🚀NExT-Chat🚀: An LMM for Chat, Detection and Segmentation
All of the demo code, training code, evaluation code, and model weights are released at .
This is a large multimodal model for chat, detection, and segmentation, as shown in the demo video:
So sad to hear the news ()😰. The conclusion of our investigation:
1. Llama3-V can be run using MiniCPM-Llama3-V 2.5's code and config.json after changing parameter names (see the sketch after this list)
2. It behaves similarly to MiniCPM-Llama3-V 2.5 in unrevealed experimental features
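As a concrete illustration of point 1, the renaming step could look like the minimal sketch below; the prefixes in NAME_MAP are hypothetical placeholders made up for illustration, not the actual parameter names involved.

```python
# Minimal sketch: load the Llama3-V checkpoint and rename its parameters so
# they match the names MiniCPM-Llama3-V 2.5's code expects.
# NOTE: the prefixes below are hypothetical placeholders, not the real names.
import torch

NAME_MAP = {
    "model.vision_tower.": "vpm.",        # hypothetical vision-encoder prefix
    "model.mm_projector.": "resampler.",  # hypothetical projector prefix
}

def rename_params(state_dict):
    renamed = {}
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in NAME_MAP.items():
            if new_key.startswith(old_prefix):
                new_key = new_prefix + new_key[len(old_prefix):]
        renamed[new_key] = value
    return renamed

ckpt = torch.load("llama3-v.bin", map_location="cpu")
torch.save(rename_params(ckpt), "llama3-v-renamed.bin")
# The renamed checkpoint can then be loaded with MiniCPM-Llama3-V 2.5's code
# and config.json for inference.
```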
One of the experimental features of MiniCPM-Llama3-V 2.5 is recognizing Tsinghua Bamboo Characters (清华简), a very special and rare type of ancient Chinese character written on bamboo during China's Warring States Period (475 BC-221 BC). These training images are recently
🌟In this work, we propose VPGTrans for building vision-language LLMs (VL-LLMs) at a lower cost (e.g., 10%)🚀🚀🚀.
With VPGTrans, we built VL-LLaMA and VL-Vicuna.
Welcome to try out our VL-Vicuna:
Codes:
Paper:
After receiving the issue from @yangzhizheng1 on GitHub, we launched a serious investigation. We can correctly obtain inference results using the Llama3-V checkpoint with MiniCPM-Llama3-V 2.5's code and config file, following @yangzhizheng1's instruction on GitHub. Even more, we also
For quantitative results, we also tested several Llama3-based VLMs on 1K Bamboo Character images and compared the exact-match predictions for each pair of models.
The overlap between every other pair of models is zero, whereas the overlap between Llama3-V and MiniCPM-Llama3-V 2.5 achieves a
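For readers who want to reproduce this kind of check, here is a minimal sketch of the pairwise comparison described above; the `predictions` dictionary is a placeholder, not our actual evaluation harness.

```python
# Sketch: given per-model predictions on the same 1K Bamboo Character images,
# compute how often each pair of models produces exactly the same output.
from itertools import combinations

def exact_match_overlap(preds_a, preds_b):
    """Fraction of images on which two models give identical predictions."""
    assert len(preds_a) == len(preds_b)
    matches = sum(a == b for a, b in zip(preds_a, preds_b))
    return matches / len(preds_a)

def pairwise_overlaps(predictions):
    """predictions: {model_name: [prediction for each image]}"""
    return {
        (m1, m2): exact_match_overlap(predictions[m1], predictions[m2])
        for m1, m2 in combinations(predictions, 2)
    }
```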
The same thing also happens with WebAgent, another unrevealed feature trained on in-house data. They even make identical errors on a WebAgent schema newly defined within our team...
🚀 Excited to introduce MiniCPM-Llama3-V 2.5! With 8B parameters, it’s our latest breakthrough, outperforming top models like GPT-4V. 📈
💪 Superior OCR capabilities
🔑 Supports 30+ languages
HuggingFace:
GitHub:
Big congrats to my friend Kai on his excellent paper💯!
It is inspiring to empower the retrieval model with knowledge from website data and LLMs. Wondering whether this will be used in the Google search engine later😁?
Proud to present 🔍MagicLens: image retrieval models following open-ended instructions.
🌟Highlights of 🔍MagicLens:
>🧐Novel Insights: Naturally occurring image pairs on the same web page contain diverse image relations (e.g., inside and outside views)
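As a rough illustration of what "retrieval following open-ended instructions" means operationally, here is a toy dual-encoder scoring sketch; `query_encoder` and `image_encoder` are hypothetical stand-ins, and this simplification is my reading rather than the released MagicLens code.

```python
# Toy sketch: encode the (query image, instruction) pair into one embedding
# and rank candidate images by dot-product similarity.
import torch

def retrieve(query_image, instruction, candidate_images,
             query_encoder, image_encoder, top_k=5):
    """Return indices of the top-k candidates for the multimodal query."""
    q = query_encoder(query_image, instruction)                 # (dim,)
    cands = torch.stack([image_encoder(im) for im in candidate_images])  # (N, dim)
    scores = cands @ q                                          # (N,)
    return scores.topk(min(top_k, len(candidate_images))).indices
```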
@mervenoyann @Microsoft Great share! Let me also recommend our recent work 🔥NExT-Chat🔥, which can output both boxes and segmentation masks! All code was released recently at .
Real-Time Video Generation: Achieved 🥳
Sharing our latest work with @JxlDragon, @VictorKaiWang1, and @YangYou1991: "Real-Time Video Generation with Pyramid Attention Broadcast."
3 features: real-time, lossless quality, and training-free!
Blog: (🧵1/6)
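Based on the title and the "training-free" claim, here is a toy sketch of the general attention-broadcast idea as I read it (not the paper's exact algorithm): cache an attention output and reuse it for a few consecutive diffusion steps instead of recomputing it every step.

```python
# Toy sketch (my reading, not the paper's algorithm): attention outputs change
# slowly between adjacent diffusion steps, so recompute them only every
# `broadcast_range` steps and reuse ("broadcast") the cached result otherwise.
import torch

class BroadcastAttention(torch.nn.Module):
    def __init__(self, attention, broadcast_range=2):
        super().__init__()
        self.attention = attention            # the wrapped attention module
        self.broadcast_range = broadcast_range
        self.cached_output = None
        self.steps_since_compute = 0

    def forward(self, x):
        if (self.cached_output is None
                or self.steps_since_compute >= self.broadcast_range):
            self.cached_output = self.attention(x)  # recompute and cache
            self.steps_since_compute = 0
        self.steps_since_compute += 1
        return self.cached_output
```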
Will the trend of scaling up AI systems end quickly🤔?
I do not think so. Imagine that there is a model & algorithm that can achieve human-like AI with only one A100 in one day.
What will the big companies do? I think they will just scale up the data and model to seek super AI😂
Since the HuggingFace page of Llama3-V has been removed, we have uploaded both the Llama3-V and MiniCPM-V checkpoints () for comparison. Since this model received several thousand downloads on HuggingFace, there should also be independent copies that can reproduce this.
[p1] 🐕Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward🐕
Paper link: Project page:
How can we effectively train video large multimodal model (LMM) alignment with preference modeling?
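For context, preference-based alignment methods like this one build on the standard DPO objective; the sketch below shows that generic loss over per-sequence log-probabilities, not the paper's exact recipe for deriving rewards from a language model.

```python
# Standard DPO loss (generic form, not the paper's specific method):
# maximize the margin between chosen and rejected responses, measured as
# log-prob ratios of the policy against a frozen reference model.
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """All inputs are per-sequence log-probability tensors of equal shape."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```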
😭😭😭A lesson from 2023 in research: do remember to give a good title to your paper!
In May, I released a paper with a similar idea to MiniGPT-4, where I built 🚀VL-Vicuna🚀 for MM conversation.
However, I had a much worse title (VPGTrans) and weaker promotion. Finally:
@siddkaramcheti @SurajNair_1 @ashwinb96 @percyliang @tkollar @DorsaSadigh
However, when I recently trained an LMM+detection+segmentation model using both VQA and detection data, I also noticed that more epochs were required. So, I am curious whether multi-epoch training will lead to some qualitative difference.
1. In this work, we explore the transferability of the visual prompt generator (VPG) across LLMs, such that one can easily create a novel high-performance vision-language LLM (VL-LLM) without training from scratch at prohibitively expensive cost.
2. We propose VPGTrans, a VPG transfer framework that is simple yet highly effective (sketched below).
3. This work also contributes two novel VL-LLMs: VL-LLaMA and VL-Vicuna.
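As a hedged sketch of the transfer setup (my simplification, not the released VPGTrans code): reuse a VPG trained with one LLM, attach it to a new target LLM, and train only the small projector that maps visual features into the new LLM's embedding space, which is far cheaper than training the VPG from scratch.

```python
# Simplified sketch: freeze the pretrained VPG and the new target LLM, and
# train only the linear projector bridging them. Shapes assumed:
# vpg(images) -> (B, T, vpg_dim); input_embeds -> (B, L, llm_dim);
# llm is an HF-style model accepting inputs_embeds.
import torch
import torch.nn as nn

class TransferredVLLLM(nn.Module):
    def __init__(self, vpg, target_llm, vpg_dim, llm_dim):
        super().__init__()
        self.vpg = vpg                                 # pretrained VPG, frozen
        self.projector = nn.Linear(vpg_dim, llm_dim)   # the only trained part
        self.llm = target_llm                          # new target LLM, frozen

        for p in self.vpg.parameters():
            p.requires_grad = False
        for p in self.llm.parameters():
            p.requires_grad = False

    def forward(self, images, input_embeds):
        visual_tokens = self.projector(self.vpg(images))
        return self.llm(
            inputs_embeds=torch.cat([visual_tokens, input_embeds], dim=1)
        )
```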