Ge Zhang Profile
Ge Zhang

@GeZhang86038849

1,221
Followers
727
Following
69
Media
406
Statuses

Founder: M-A-P() Incoming Ph.D. student: Computer Science @UWaterloo MSc: ECE & DS @UMich BSc: Computer Science @ BUPT

Joined April 2021
@GeZhang86038849
Ge Zhang
5 months
[1/n] 🚀 Excited to share our latest work on OpenCodeInterpreter! With a blend of execution results and human feedback, we've achieved significant advancements in code generation. Here are the key points: ✨ Introducing OpenCodeInterpreter - a leap in iterative code refinement.
Tweet media one
13
61
217
@GeZhang86038849
Ge Zhang
3 months
I'm extremely excited to announce "the big bomb"!: Neo and Matrix, which we're working on with colleagues and friends from the open-source community, , Wuhan AI, and . Neo is the first fully-transparent bilingual large language model, with
Tweet media one
8
50
201
@GeZhang86038849
Ge Zhang
4 months
[1/n] 🎉🎉🎉 Excited to share our latest work: "The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis"! We delve into the dynamics of LLMs across different scales and domains. 💡Highlights include: 🗺️ Comprehensive Model Evaluation:
Tweet media one
2
29
104
@GeZhang86038849
Ge Zhang
6 months
What happens when decision networks encounter multimodal instruction? We explore enhanced forms of task guidance for agents, enabling them to comprehend gameplay instructions, thereby facilitating a “read-to-play” capability. Paper: [1/n]
3
21
90
@GeZhang86038849
Ge Zhang
4 months
[1/n] 🚀🚀🚀 Excited to share our latest work: "CodeEditorBench: Evaluating Code Editing Capability of Large Language Models"! 🧐 Highlights of CodeEditorBench: > 8K meticulously collected code editing questions from five sources: namely
Tweet media one
2
19
74
@GeZhang86038849
Ge Zhang
4 months
[1/n] Happy to unveil the pioneering work behind "Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model". Our study presents CT-LLM, a model engineered with a deep focus on the Chinese language, marking a significant shift from traditional models that often
Tweet media one
Tweet media two
2
15
68
@GeZhang86038849
Ge Zhang
4 months
[1/n] Happy to share our new work "MuPT: A Generative Symbolic Music Pretrained Transformer", encompassing a series of music generation models ranging from 190 million to 4.2 billion parameters, all based on the ABC Notation. According to human preference evaluations, our models
2
22
67
@GeZhang86038849
Ge Zhang
3 months
Personally speaking, I believe that the paradigm of clearly splitting pretraining and alignment won't last long. There are tons of valuable instruction corpora, or plain extracted text with humanly understandable implicit motivations and structures, existing in the nowadays
@arankomatsuzaki
Aran Komatsuzaki
3 months
MAmmoTH2: Scaling Instructions from the Web - Proposes a paradigm to efficiently harvest 10M instruction data from web corpus to enhance LLM reasoning - 11% -> 34% on MATH and 36% -> 67% on GSM8K proj: abs:
Tweet media one
7
25
178
3
5
37
@GeZhang86038849
Ge Zhang
4 months
Thanks for sharing our work! Congrats to Tianle and the other team members! In-context learning capacity is a strong indicator of whether long-context LLMs can really achieve compositional reasoning as they can when the context is shorter. Needle-in-a-haystack focuses more on
@arankomatsuzaki
Aran Komatsuzaki
4 months
Long-context LLMs Struggle with Long In-context Learning Suggests a notable gap in current LLM capabilities for processing and understanding long, context-rich sequences.
Tweet media one
5
40
216
1
3
29
@GeZhang86038849
Ge Zhang
6 months
We are delighted to announce CMMMU, A Chinese Massive Multi-discipline Multimodal Understanding Benchmark. CMMMU is inspired by and strictly follows the pattern of MMMU. CMMMU starts the race of Bilingual LMMs on complex reasoning!
Tweet media one
1
12
29
@GeZhang86038849
Ge Zhang
2 months
MAP-Neo is another good example!() We release all resources needed to be on par with Mistral v0.2 and surpass LLaMA-2 (code/pretrain corpus/data cleaning pipeline/etc.) at a similar size. We are also glad to announce that we open-source the phase-2 SFT
Tweet media one
@percyliang
Percy Liang
2 months
We should call models like Llama 3, Mixtral, etc. “open-weight models”, not “open-source models”. For a model to be open-source, the code and training data need to be public (good examples: GPT-J, OLMo, RedPajama, StarCoder, K2, etc.). Weights are like an exe file, which would be
45
319
2K
1
3
28
@GeZhang86038849
Ge Zhang
2 months
🚀 Excited to announce that the tech report of MAP-Neo (): a fully open-source and transparent bilingual LLM suite with superior performance to bridge the gap with closed-source models, is now available: 🔧MAP-Neo's workflow
2
9
29
@GeZhang86038849
Ge Zhang
4 months
Thanks for tweeting COIG-CQIA! lol! Yea, indeed, we devoted heavy human effort to manually collecting and cleaning SFT corpora from Ruozhiba, a Chinese sarcastic-joke forum, and it indeed makes Yi-34B better! COIG-CQIA is a scaled-up Chinese version of LIMA to support open-source SFT for
@9hills
九原客
4 months
I burst out laughing reading this paper: a Yi-34B model fine-tuned on 「弱智吧」 (Ruozhiba) post titles paired with GPT-4 answers outperformed carefully collected SFT instruction datasets in the evaluation, and also ranked second in the safety evaluation. Ruozhiba is Baidu's "Ruozhi Ba" forum, where posts look like this: "Since prisons are full of criminals, 👮♀️ why don't the police go into prisons to arrest people?" Paper:
Tweet media one
Tweet media two
44
171
852
5
2
26
@GeZhang86038849
Ge Zhang
5 months
[2/n] Teaching a code LLM to self-correct based on executor feedback is the way to achieve GPT-comparable code generation! (A toy sketch of such a refinement loop is below.)
Tweet media one
1
5
24
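Not the OpenCodeInterpreter implementation itself, just a minimal Python sketch of an execution-feedback refinement loop like the one described in the tweet above; the generate_code callable and the prompt format are hypothetical placeholders.

```python
import subprocess
import tempfile

def run_candidate(code: str, timeout: int = 10):
    """Execute a candidate program and capture its outcome (toy sandbox, not production-safe)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(["python", path], capture_output=True, text=True, timeout=timeout)
        return proc.returncode == 0, proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        return False, "Execution timed out."

def refine_with_execution_feedback(task: str, generate_code, max_rounds: int = 3) -> str:
    """Iteratively ask a model for code, run it, and feed the failure log back (sketch only)."""
    prompt = task
    code = generate_code(prompt)
    for _ in range(max_rounds):
        ok, log = run_candidate(code)
        if ok:
            break
        # Append the execution log so the next attempt can self-correct.
        prompt = f"{task}\n\nPrevious attempt:\n{code}\n\nExecution feedback:\n{log}\nPlease fix the code."
        code = generate_code(prompt)
    return code
```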
@GeZhang86038849
Ge Zhang
2 months
[1/n] New Paper Alert! Funny II-Bench Coming! Excited to introduce II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models. Check it out: In this study, we aim to evaluate MLLMs' higher-order perception of images.
Tweet media one
2
8
21
@GeZhang86038849
Ge Zhang
3 months
I'm heading to Vienna after a nice trip and break. I will present MAmmoTH (MAmmoTH-2 coming soon as well) and have discussions about 4 of my recent works (MERT, MALMEN, STABLE-Alignment, and MAmmoTH) with my amazing friends, mentors, and colleagues! I also bring a big
Tweet media one
1
1
18
@GeZhang86038849
Ge Zhang
2 months
Thanks for sharing our work!
@arankomatsuzaki
Aran Komatsuzaki
2 months
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark abs: leaderboard:
Tweet media one
1
16
115
0
1
17
@GeZhang86038849
Ge Zhang
25 days
Glad to announce that StructLM and CT-LLM are accepted to #COLM2024 . CT-LLM verifies that Chinese, like any language other than English, can generalize to the emergent abilities of other languages. StructLM is the SoTA foundation model for tackling different
@WenhuChen
Wenhu Chen
25 days
Delighted to have two papers accepted to #COLM2024 . The first paper is StructLM. It's the SoTA foundation model for tackling all different types of structure knowledge grounding tasks like tableQA, KBQA,etc. This work was led by an undergrad @alexzhuang_
1
8
52
0
3
16
@GeZhang86038849
Ge Zhang
2 months
To be honest, a reviewer should be qualified to review a pretraining-related paper only if she/he has been a core member of at least one open large language model effort. Otherwise, it is really disappointing for an LLM conference like COLM.
3
1
16
@GeZhang86038849
Ge Zhang
3 years
@SpokespersonCHN Winged words indeed! PP 10043 not only destroys young scholars' academic dreams but also destroys people's confidence in academic integrity and in racial conditions in the US.
1
0
15
@GeZhang86038849
Ge Zhang
8 months
Check out our recent work on MMMU (massive multi-discipline multimodal understanding)! A multimodal foundation model should be able to understand images and reason the way LLMs do on MMLU. It's time to take the next step towards better multimodal models!
@xiangyue96
Xiang Yue
8 months
🚀 Introducing MMMU, a Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI. 🧐 Highlights of the MMMU benchmark: > 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks >
Tweet media one
Tweet media two
Tweet media three
Tweet media four
18
184
746
0
2
13
@GeZhang86038849
Ge Zhang
4 months
Thanks for sharing our work! A Love Letter to the Chinese NLP Community from the M-A-P!
@arankomatsuzaki
Aran Komatsuzaki
4 months
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model - Presents CT-LLM, a 2B LLM - Open-sourcing the full process of training, including a detailed data processing procedure hf: abs:
Tweet media one
2
32
142
0
3
12
@GeZhang86038849
Ge Zhang
4 months
Thanks for sharing our MuPT!
@arankomatsuzaki
Aran Komatsuzaki
4 months
MuPT: A Generative Symbolic Music Pretrained Transformer Presents a series of pre-trained models for symbolic music generation based on Llama architecture proj: abs:
Tweet media one
4
14
76
0
4
11
@GeZhang86038849
Ge Zhang
5 months
Thanks for sharing our work!
@_akhaliq
AK
5 months
OpenCodeInterpreter Integrating Code Generation with Execution and Refinement The introduction of large language models has significantly advanced code generation. However, open-source models often lack the execution capabilities and iterative refinement of advanced systems
18
53
246
0
4
11
@GeZhang86038849
Ge Zhang
1 month
Glad to see that Claude-3.5-Sonnet performs amazingly on our recently released II-Bench () and significantly surpasses the existing MLLMs! It really understands human humor and visual implications!
Tweet media one
Tweet media two
1
2
11
@GeZhang86038849
Ge Zhang
2 months
Thanks for including our MAP-Neo-instruct-v0.1 in the amazing WildBench! True Open-Source Power!
@billyuchenlin
Bill Yuchen Lin 🤖
2 months
M-A-P/Neo-7B-Instruct is the 1st 💎fully-open💎 LLM on WildBench leaderboard and its performance is awesome. "Fully open-source" here means that all data for pre-training & post-training are open, code is open-source, in addition to the public model weights! As @percyliang
Tweet media one
1
18
76
0
2
10
@GeZhang86038849
Ge Zhang
5 months
[4/n] A coooool demo video!!!!:
1
1
9
@GeZhang86038849
Ge Zhang
2 months
@agihippo It makes sense as well. But I believe LLaMA-3, Qwen Series, Yi Series are quite good as well.
1
0
7
@GeZhang86038849
Ge Zhang
5 months
[3/n] The statistics of our multi-round code generation instructions:
Tweet media one
1
1
8
@GeZhang86038849
Ge Zhang
2 months
Amazing! A real step towards a General World Model that: 1. Simulates world states by generating videos across any domains 2. Allows any-time control with free-text actions
@MaitrixOrg
Maitrix.org
2 months
🔥Introducing Pandora 🌏 🪐 a World Model that generates videos of world states with real-time language control 🎥🕹️ Simulate the world across domains in an _interactive_ way! check out more
7
75
233
0
1
9
@GeZhang86038849
Ge Zhang
2 months
@agihippo Maybe I'm ignorant. The data retrieval pipeline from DeepSeek, the WSD LRS from MiniCPM, the detailed tech reports of Qwen/Yi/DeepSeek/Baichuan/Skywork/... There are a lot of solid contributions from the Chinese open-source LLM community with relatively "limited" computational resources. (A rough sketch of the WSD learning-rate schedule is below.)
0
0
8
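For readers unfamiliar with the WSD (warmup-stable-decay) learning-rate schedule mentioned above, here is a rough Python sketch of its general shape (linear warmup, long flat plateau, short final decay); the exact decay form and hyperparameters used by MiniCPM may differ.

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float = 3e-4,
           warmup_frac: float = 0.01, decay_frac: float = 0.1,
           min_lr: float = 3e-5) -> float:
    """Warmup-Stable-Decay schedule sketch: linear warmup, flat plateau, linear decay."""
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    stable_end = total_steps - decay_steps
    if step < warmup_steps:                      # warmup phase
        return peak_lr * step / max(warmup_steps, 1)
    if step < stable_end:                        # stable phase
        return peak_lr
    # decay phase: anneal from peak_lr down to min_lr
    progress = (step - stable_end) / max(decay_steps, 1)
    return peak_lr + (min_lr - peak_lr) * progress

# Example: sample a few points of the schedule over 100k steps.
if __name__ == "__main__":
    for s in [0, 500, 1_000, 50_000, 90_000, 99_999]:
        print(s, round(wsd_lr(s, 100_000), 6))
```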
@GeZhang86038849
Ge Zhang
2 months
Really impressive work! Glad that MAP-Neo can be a baseline for the amazing DCLM!
@arankomatsuzaki
Aran Komatsuzaki
2 months
DataComp-LM: In search of the next generation of training sets for language models - Provides a corpus of 240T tokens from Common Crawl - Trains a LM using their filtered dataset, which performs similarly on NLU tasks w/ 6.6x less compute than Llama 3 8B proj:
Tweet media one
1
46
210
0
2
9
@GeZhang86038849
Ge Zhang
4 months
Kudos to the Qwen Team! A SOTA-level code LLM should indeed be capable of editing code to satisfy programmers' requests! Try out the amazing CodeQwen and our CodeEditorBench!
@huybery
Binyuan Hui
4 months
(4/n) 🧵 CodeQwen are Debuggers In assessing CodeQwen1.5’s proficiency in code modification tasks, we concentrated our evaluation on the CodeEditorBench suite, encompassing four distinct dimensions: Debugging, Translation, Language Switching, and Code Polishing. The results
Tweet media one
1
1
13
0
0
9
@GeZhang86038849
Ge Zhang
2 months
My Twitter interaction circle ➡️
Tweet media one
1
0
8
@GeZhang86038849
Ge Zhang
9 months
100 citations!
0
0
8
@GeZhang86038849
Ge Zhang
2 months
My personal favorite idea of 2024 so far on letting more people benefit from LLMs: domain-specific LLMs are a real thing! There are tons of traditional companies that feel uneasy about sharing their private data and are on the way to building their own LLMs. But they have very limited
@QuehryS
Quehry Que
2 months
Excited to share our latest research paper! 📄📷 In this study, we explore Scaling Law in Domain-specific Continual Pre-training scenarios. Our findings reveal the relationship between model performance and mixture ratios. Check it out here: #ScalingLaw
1
1
6
0
1
8
@GeZhang86038849
Ge Zhang
1 year
A preliminary release for researchers in the Chinese community to play with. We will keep updating it and welcome all collaborators to contribute. The general translated corpus doesn't use the OpenAI API, so it's suitable for commercial use as well.
@_akhaliq
AK
1 year
Chinese Open Instruction Generalist: A Preliminary Release abs: dataset:
Tweet media one
3
66
167
0
0
8
@GeZhang86038849
Ge Zhang
3 months
Yinghao @nicolaus625 will present MERT() at Hall B #282 from 10:45 am to 12:45 pm today. I will present MAmmoTH() and discuss MAmmoTH2() at Hall B #122 from 4:30 pm to 6:30 pm today. Come and chat with
0
1
7
@GeZhang86038849
Ge Zhang
11 months
The experimental results of MAmmoTH reveal that different math metrics do not necessarily improve simultaneously but can all benefit from transfer learning on a well-designed mixed instruction-tuning set. It's very exciting to be part of this work!
@WenhuChen
Wenhu Chen
11 months
Excited to introduce our latest math generalist model MAmmoTH 🦣, built through instruction tuning. We proposed hybrid "chain-of-thought" & "program-of-thought" training to supercharge LLMs' math reasoning capabilities. 🦣 beats the open SoTA by 20+% on many datasets like MATH.
Tweet media one
9
41
256
0
1
7
@GeZhang86038849
Ge Zhang
2 months
Thanks a lot! The LLM360 team has done amazing exploration work for the whole fully-open LLM community by developing Amber and Crystal, and by further scaling to K2! All fully-open LLM teams share the same motivation of pursuing LLM democratization and real open science!
@llm360
LLM360
2 months
🎉 Congratulations to an awesome fully open source model, by the m-a-p team! Paper: 📎 Includes great info on: -Data Curation -Infra details -Intermediate checkpoints -Scaling law LLM360 is happy to work with this thriving community on open source AI.
1
2
32
1
1
7
@GeZhang86038849
Ge Zhang
5 months
Thanks for sharing our work! Try out our StructLM!
@_akhaliq
AK
5 months
StructLM Towards Building Generalist Models for Structured Knowledge Grounding Structured data sources, such as tables, graphs, and databases, are ubiquitous knowledge sources. Despite the demonstrated capabilities of large language models (LLMs) on plain text, their
Tweet media one
2
55
332
0
1
7
@GeZhang86038849
Ge Zhang
3 months
[2/n] Neo's major metrics performance:
Tweet media one
1
0
6
@GeZhang86038849
Ge Zhang
4 months
[2/n] Cheers to the Yi Team! Thanks for sharing the Yi-34B intermediate metrics with us to boost the research!
Tweet media one
1
0
6
@GeZhang86038849
Ge Zhang
2 months
It's extremely amazing! Kudos to all the teams working on transparent LLM and LLM democratization!
@llm360
LLM360
2 months
Please welcome K2-65B🏔️, the most performant fully-open LLM released to date. As a blueprint for open-source AGI, we release all model checkpoints, code, logs, and data. About K2: 🧠65 billion parameters 🪟Fully transparent & reproducible 🔓Apache 2.0 📈Outperforms Llama 2 70B
6
145
495
1
4
6
@GeZhang86038849
Ge Zhang
1 month
Check out the amazing VideoScore led by Dongfu and Xuan He! One of the very first model-based video generation metrics!
@DongfuJiang
Dongfu Jiang
1 month
🔥Thrilled to announce 📽️VideoScore, the first-ever fine-grained and reliable evaluator/reward model for text-to-video generation tasks, which is trained on 🎞️VideoFeedback, a large-scale and fine-grained human-feedback dataset for text-to-video (T2V) generations. 🤔Why
Tweet media one
1
18
67
0
0
6
@GeZhang86038849
Ge Zhang
4 months
[8/n] Sources: 800B Chinese Pretrain Corpora(MAP-CC): CHC Bench: Intermediate CKPTs: Base Model: SFT Model: DPO Model: paper:
1
1
6
@GeZhang86038849
Ge Zhang
4 months
[3/n] MMLU, CMMLU, and CEVAL appear to assess overlapping capabilities of the models, leading to similar performance trends. Perhaps bilingual LLMs only need to measure one of them during the pretraining process.
Tweet media one
1
0
3
@GeZhang86038849
Ge Zhang
4 months
[9/n] I enjoyed collaborating a lot with Jiawei Guo, Ziming Li, Xueling Liu, and Kaijing Ma under the supervision of @WenhuChen and @bigaidream . Also thanks to @zhengtianyu4 , Zhouliang Yu, Ding Pan, @yizhilll , @RuiboLiu , Yue Wang, Shuyue Guo, Xingwei Qu, and @xiangyue96 for their
2
0
5
@GeZhang86038849
Ge Zhang
5 months
Congrats to the Team! Glad to see that MMMU and CMMMU are included!
@BoLi68567011
Li Bo
5 months
Accelerating the Development of Large Multimodal Models with LMMs-Eval Repo: Blog: We are offering a one-command evaluation API for fast and thorough evaluation of LMMs over 39 datasets (and growing).
Tweet media one
Tweet media two
1
24
113
0
0
4
@GeZhang86038849
Ge Zhang
6 months
Thanks for sharing our work!
@arankomatsuzaki
Aran Komatsuzaki
6 months
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling Presents an any-to-any multimodal LM that utilizes discrete representations for the unified processing of various modalities, including speech, text, images, and music proj: abs:
Tweet media one
5
58
236
0
2
5
@GeZhang86038849
Ge Zhang
1 month
[1/n] New Benchmark Alert! LongIns () is a little "brother" of LongICLBench (), but it makes it possible to verify LLMs' long-context reasoning performance more dynamically. Each sample in LongIns is composed of multiple
Tweet media one
1
4
5
@GeZhang86038849
Ge Zhang
3 months
[6/n] Document conversion pipeline:
Tweet media one
2
0
5
@GeZhang86038849
Ge Zhang
2 months
🔟Kudos to my co-leads: Scott Qu, @liujiaheng2 , our advisors: Jiajun Zhang, Wanli Ouyang, @HuangRubio , and @WenhuChen , and solid contributions from the whole team!
0
0
2
@GeZhang86038849
Ge Zhang
1 year
Excited to announce our acoustic music BERT. It's time to step into the next era of acoustic music understanding!
@yizhilll
Yizhi Li
1 year
1/ Excited to announce the release of our new paper "MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training"! We propose a self-supervised music understanding model, attaining overall SOTA performance on 14 MIR tasks.
Tweet media one
6
77
339
0
0
5
@GeZhang86038849
Ge Zhang
4 months
[5/n] For the Amber-7b model, there is a noticeable decline in capability in the 200B-300B token range, likely due to the pretraining corpus. Our hypothesis is that the Amber pretraining corpus is not well deduplicated.
Tweet media one
1
0
3
@GeZhang86038849
Ge Zhang
3 years
@SpoxCHNinUS A lot of comments here complain about foreign students not being taken back to China, but without citing a specific discriminatory policy. By contrast, PP10043 is a clearly systematic, racially discriminatory policy. If the USA is a country with better democracy, why not just set an example?
0
1
4
@GeZhang86038849
Ge Zhang
4 months
@chasel_yan26765 @jzli413 @WenhuChen @bigaidream @Dudodododo @xaichuxue @abc43992899 @yizhilll @liujiaheng2 [10/n] We'd like to kindly ask more open-source models to release their intermediate ckpts to boost research on scaling laws and on babysitting LLM pretraining! Thanks to @deepseek_ai and for their released intermediate ckpts as well!
1
0
4
@GeZhang86038849
Ge Zhang
4 months
[2/n] Data Statistics of Primary Branch of CodeEditorBench:
Tweet media one
1
0
4
@GeZhang86038849
Ge Zhang
3 months
[3/n] The data processing pipeline is available at: The heuristic rules for Chinese and English are carefully verified by human annotators. (An illustrative sketch of what such rules can look like is below.)
1
2
4
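Purely to illustrate what document-level heuristic filtering rules typically look like (the thresholds and rules here are invented for the example, not the ones actually used in the released pipeline), a small Python sketch:

```python
def passes_basic_heuristics(doc: str) -> bool:
    """Toy document filter with illustrative thresholds; not the released pipeline's rules."""
    lines = [l for l in doc.splitlines() if l.strip()]
    if len(doc) < 200 or not lines:
        return False  # too short to be a useful training document
    # Excessive symbol/punctuation ratio often indicates markup residue or boilerplate.
    symbol_ratio = sum(not c.isalnum() and not c.isspace() for c in doc) / len(doc)
    if symbol_ratio > 0.3:
        return False
    # Highly repetitive documents (many duplicated lines) are usually low quality.
    if len(set(lines)) / len(lines) < 0.5:
        return False
    # Documents that are mostly digits are rarely natural language.
    if sum(c.isdigit() for c in doc) / len(doc) > 0.3:
        return False
    return True
```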
@GeZhang86038849
Ge Zhang
1 year
A Larger Model and Paper Coming Soon! 🧑‍🎨
@bigaidream
Jie Fu
1 year
We’ve released two new music understanding models: and , which are trained with up to 160K hrs 24K Hz audio. Our models give strong performance on various (≥ 8) music info retrieval tasks. Paper coming soon! 🧵🔛
4
31
169
0
0
3
@GeZhang86038849
Ge Zhang
2 months
[5/n] Congrats to Ziqiang Liu, Feiteng Fang, Xi Feng, Xinrun Du, Chenhao Zhang, and Shiwen Ni co-leading the amazing project!
0
0
4
@GeZhang86038849
Ge Zhang
2 months
[2/n] We collect 20,150 raw images from various renowned illustration websites. After a carefully designed three-stage data filtration procedure (image deduplication, text-to-image ratio control, and human review), we obtain 1,222 images and 1,434 questions. II-Bench comprises
Tweet media one
Tweet media two
Tweet media three
Tweet media four
2
0
4
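A rough sketch of how the three-stage filtration described above could be wired together; the helper callables (phash, text_area_ratio, human_approved) are hypothetical stand-ins, not the actual II-Bench code.

```python
from typing import Callable, Iterable, List

def filter_images(images: Iterable[dict],
                  phash: Callable[[dict], str],
                  text_area_ratio: Callable[[dict], float],
                  human_approved: Callable[[dict], bool],
                  max_text_ratio: float = 0.3) -> List[dict]:
    """Three-stage filtration sketch: dedup -> text-to-image ratio control -> human review."""
    # Stage 1: deduplicate by perceptual hash.
    seen, unique = set(), []
    for img in images:
        h = phash(img)
        if h not in seen:
            seen.add(h)
            unique.append(img)
    # Stage 2: drop images dominated by embedded text.
    balanced = [img for img in unique if text_area_ratio(img) <= max_text_ratio]
    # Stage 3: keep only images that pass human review.
    return [img for img in balanced if human_approved(img)]
```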
@GeZhang86038849
Ge Zhang
4 months
[6/n] It is observable that all models improve their abilities in tasks involving math, physical interaction understanding and commonsense reasoning in a relatively synchronized manner.
Tweet media one
1
0
4
@GeZhang86038849
Ge Zhang
4 months
[6/n] Introducing CHC-Bench, an MT-Bench-like benchmark for evaluating a model's understanding of Chinese culture, history, traditions, humanities, geography, and STEM, in eight main categories.
Tweet media one
Tweet media two
1
0
4
@GeZhang86038849
Ge Zhang
5 months
Thrilled to work with @alexzhuang_ @WenhuChen @bigaidream on the amazing StructLM! Try our generalized language model for structured knowledge grounding tasks!
@alexzhuang_
Alex Zhuang
5 months
[1/n] Excited to share StructLM🏗️, a series of models fine-tuned to generalize over structured knowledge grounding tasks. paper: - We achieve SoTA on 7/18 SKG tasks - On held out tasks, our 7B model 0-shot is 30% better than 1-shot ChatGPT-3.5.
Tweet media one
1
4
9
0
0
4
@GeZhang86038849
Ge Zhang
8 months
Glad to contribute to UniIR and M-BEIR! Information Retrieval is also important for unlocking the door of multimodal expert AGI!
@CongWei1230
Cong Wei
8 months
🚀 Introduce UniIR, a unified instruction-guided multimodal retriever handles diverse tasks. - 1️⃣model for 8️⃣ retrieval tasks (SoTA w/ Instruction-tuning) - Generalizes to unseen retrieval tasks. - M-BEIR: multimodal retrieval benchmark w/ 10 datasets, 1.1M queries, 5.6M cands.
Tweet media one
1
45
183
0
0
4
@GeZhang86038849
Ge Zhang
4 months
[7/n] Upon analyzing the graphs, it is evident that while the trend of increasing performance with larger datasets is present, the actual scores for each model at various training checkpoints do not precisely align with the expected trajectory of the scaling law.
Tweet media one
1
0
4
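For context, the "expected trajectory" here refers to a power-law fit of benchmark score or loss against training tokens; one common parameterization (assumed for illustration, not necessarily the exact form fitted in the paper) is:

```latex
% Assumed power-law form: loss as a function of training tokens D,
% with fitted constants E (irreducible loss), B, and exponent \beta > 0.
L(D) = E + B \cdot D^{-\beta}
```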
@GeZhang86038849
Ge Zhang
3 years
@ChinaDaily #stop10043 Grasp the final chance to provide innocent Chinese scholars an opportunity to realize their academic dreams.
0
0
4
@GeZhang86038849
Ge Zhang
5 months
Kudos to the Team! Glad to see that it achieves 37.9 on our CMMMU and 36.6 on MMMU, which is amazing! Try out our CMMMU on and . Let's begin the Chinese multimodal competition!
@deepseek_ai
DeepSeek
5 months
[1/5] 🚀 Announcing DeepSeek-VL, sota 1.3B and 7B visual-language models! Paper: GitHub: 📚 Diverse training corpus 👯 Hybrid Vision Encoder 🧠 3-stage training strategy 🆓 Totally free for commercial use and fully open-source
Tweet media one
7
61
303
0
2
4
@GeZhang86038849
Ge Zhang
3 months
[7/n] We also reproduce the pipeline introduced by DeepSeekMath to retrieve high-quality data from a massive pretraining corpus (a toy sketch of the iterative recall idea is below). The pipeline is available here:
2
0
4
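Not the reproduced pipeline itself, just a toy Python sketch of the iterative classifier-based recall idea it follows (seed documents train a classifier, the classifier scores the corpus, recalled pages become new positives); train_classifier and the threshold are hypothetical.

```python
from typing import Callable, List

def iterative_recall(seed_docs: List[str],
                     corpus: List[str],
                     train_classifier: Callable[[List[str]], Callable[[str], float]],
                     rounds: int = 3,
                     threshold: float = 0.9) -> List[str]:
    """Grow a high-quality subset of a large corpus over several recall rounds (toy sketch)."""
    positives = list(seed_docs)
    recalled: List[str] = []
    for _ in range(rounds):
        score = train_classifier(positives)   # e.g., a fastText-style binary classifier
        recalled = [doc for doc in corpus if score(doc) >= threshold]
        # Recalled documents become additional positive seeds for the next round.
        positives = list(seed_docs) + recalled
    return recalled
```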
@GeZhang86038849
Ge Zhang
4 months
[8/n] Reproduce-Amber Intermediate CKPTs are now available on . OpenLLaMA intermediate CKPTs are now available on .
1
0
4
@GeZhang86038849
Ge Zhang
2 months
@sivil_taram I believe it's positive, if the query distribution is diverse enough. Several papers, like the Yi tech report, mention the importance of unifying the tone.
1
0
4
@GeZhang86038849
Ge Zhang
2 years
@yizhilll @bohao_yang @chenghua_lin @bigaidream Personal experience inspired exploring this topic. "Geography should not be destiny." People should not be defined by where they were born, nor suffer from systemic regional discrimination or any other type of discrimination. #aiforgood #fairness
1
1
4
@GeZhang86038849
Ge Zhang
4 months
Check out the amazing AnyV2V, led by Cong, Weiming, and Max!
@CongWei1230
Cong Wei
4 months
Checkout our AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks - Website: - Code: - ArXiv: - Huggingface Paper Page:
5
56
189
0
1
4
@GeZhang86038849
Ge Zhang
2 months
[3/n] Our findings are as follows: 1. A significant difference exists in performance between humans and MLLMs: the highest accuracy achieved by a model is 74.8%, whereas the average accuracy for humans is 90%, with the highest reaching 98%. 2. Closed-source models often outperform
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
0
4
@GeZhang86038849
Ge Zhang
2 months
[4/n] Sources: Paper: HomePage: GitHub: HuggingFace:
1
0
4
@GeZhang86038849
Ge Zhang
2 months
@XueFz Yes. It might be a general problem that all LLM researchers meet when they share their papers, not only at COLM. Some traditional ML or NLP researchers have difficulty getting the point. Only if you have trained one do you know what is hard and what is valuable.
0
0
2
@GeZhang86038849
Ge Zhang
9 months
MusiLingo is covered by AIHub: More effort needs to be devoted to music foundation model development! Paper: Code:
1
2
3
@GeZhang86038849
Ge Zhang
4 months
Try our ChatMusician: to write a song for the amazing COLM_conf! All you need to do is follow the prompt and ask it for ABC notation.
@yoavartzi
Yoav Artzi (PC-ing COLM)
4 months
I hear text-to-music is having its moment. Can someone generate a theme song for @COLM_conf ? I am not musical enough to even prompt it
3
3
27
0
2
2
@GeZhang86038849
Ge Zhang
4 months
Thanks for sharing our work!
@arankomatsuzaki
Aran Komatsuzaki
4 months
The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis undertakes a comprehensive comparison of model capabilities at various pretraining intermediate checkpoints
Tweet media one
2
13
66
0
1
3
@GeZhang86038849
Ge Zhang
2 years
Our newest paper is released: . It's essential to probe and mitigate discrimination based on humanity's beliefs. 🐶 It explores how manually annotated data serves as a benchmark for gender bias probing and mitigation. @yizhilll @bigaidream @chenghua_lin
0
1
3
@GeZhang86038849
Ge Zhang
4 months
AnyV2V demo available at huggingface now!
@CongWei1230
Cong Wei
4 months
Our AnyV2V Official Demo🤗is now available at
0
6
40
0
0
3
@GeZhang86038849
Ge Zhang
2 months
@sbmaruf Open-weight LLMs at least. More transparent LLMs help the community more, like BLOOM, Pythia, Amber, OpenMoE, OLMo, CT-LLM, NEO.
1
0
1
@GeZhang86038849
Ge Zhang
4 months
[3/n] Final pretraining data distribution for CT-LLM.
Tweet media one
1
0
3
@GeZhang86038849
Ge Zhang
4 months
[9/n] The amazing work is led by @Dudodododo @ZLY and @xaichuxue and mentored by @bigaidream , Guorui Zhou, @Hades317 , and @WenhuChen . Also thanks to the solid contribution from @DingPan144063 , Yuyang Cheng, @ddlbojack , @abc43992899 , Xingwei Qu, @liujiaheng2 , @zhengtianyu4 ,
0
0
3
@GeZhang86038849
Ge Zhang
9 months
ToM may reveal the next step in improving the learning efficiency and lifelong learning of LLMs. Congratulations on the good work!
@ziqiao_ma
Martin Ziqiao Ma
9 months
Emerged or not emerged, that may not be the right question to ask for Theory of Mind (ToM) in #LLM . In our theme track paper in the Findings of #EMNLP2023 @emnlpmeeting , we asked ourselves (1) what constitutes a machine ToM? (2) How to better evaluate ToM in LLMs?...🧵[1/n]
Tweet media one
4
19
80
1
0
2
@GeZhang86038849
Ge Zhang
4 months
[9/n] Thanks to @chasel_yan26765 @jzli413 @WenhuChen @bigaidream for their dedication to "The Fine Line"! Also thanks to our amazing team: Xinyao Niu, @Dudodododo @xaichuxue Haoran Zhang, Zhaoliang Chen, Xingwei Qu, @abc43992899 @yizhilll @liujiaheng2 Stephen W. Huang, and Shawn Yue
1
0
3
@GeZhang86038849
Ge Zhang
4 months
[4/n] Overview of CodeEditorBench. CodeEditorBench is built by selecting initial data from five sources and filtering based on code length. It enriches the dataset with LLM-generated test cases, which, along with all code, are verified by
Tweet media one
1
0
3
@GeZhang86038849
Ge Zhang
3 months
[4/n] English filtering pipeline:
Tweet media one
1
0
3
@GeZhang86038849
Ge Zhang
3 months
@WenhuChen @RylanSchaeffer Each global batch contains 8M tokens: we use a global batch size of 1024 sequences with an 8192 context length. In the decay stage, we use a batch size of 640 with an 8192 context length, so each global batch contains 5.24M tokens. (The arithmetic is worked out below.)
0
0
2
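The tokens-per-batch figures above follow directly from batch size times context length; a quick check in Python:

```python
# Tokens per global batch = sequences per batch * context length.
stable_batch, decay_batch, context_len = 1024, 640, 8192

print(stable_batch * context_len)  # 8388608 -> ~8.39M tokens per batch ("8M")
print(decay_batch * context_len)   # 5242880 -> ~5.24M tokens per batch
```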
@GeZhang86038849
Ge Zhang
4 months
[6/n] Benchmark results: evaluating LLMs on CodeEditorBench. All model results are generated by greedy decoding. Code Debug, Code Translate, and Code Requirement Switch are evaluated with pass@1, while Code Polish is evaluated with Mean OptScore (a sketch of pass@1 scoring is below). Values outside parentheses
Tweet media one
1
0
3
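A minimal sketch of how pass@1 with greedy decoding is typically computed (one deterministic completion per problem, counted as solved only if every test passes); generate_greedy and run_tests are hypothetical placeholders, not the CodeEditorBench harness.

```python
from typing import Callable, List

def pass_at_1(problems: List[dict],
              generate_greedy: Callable[[dict], str],
              run_tests: Callable[[dict, str], bool]) -> float:
    """pass@1 with greedy decoding: one sample per problem, score = fraction solved."""
    solved = 0
    for problem in problems:
        completion = generate_greedy(problem)   # temperature 0 / greedy decoding
        if run_tests(problem, completion):      # all public and private tests must pass
            solved += 1
    return solved / max(len(problems), 1)
```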
@GeZhang86038849
Ge Zhang
4 months
[11/n] We want to remind open-source researchers that we missed the officially released Amber intermediate ckpts(). We'll also include the results of Amber/OLMo/Pythia in the next version of the paper. Thanks for the reminder from @BlancheMinerva .
0
0
3
@GeZhang86038849
Ge Zhang
2 months
@billyuchenlin Thanks for including our MAP-Neo-7B-Instruct-v0.1() in the amazing WildBench()! Glad to see that MAP-Neo performs well on it given its 7B size! True Open-Source Power!
Tweet media one
0
0
3
@GeZhang86038849
Ge Zhang
4 months
[5/n] CodeEditorBench delineates the spectrum of code editing tasks, including Code Debugging, Code Translating, Code Polishing, and Code Requirement Switching. Each dataset entry shares similar attributes such as title, difficulty, public and private test inputs and outputs, as
Tweet media one
1
0
3