[1/n]
🚀 Excited to share our latest work on OpenCodeInterpreter! With a blend of execution results and human feedback, we've achieved significant advancements in code generation. Here are the key points:
✨ Introducing OpenCodeInterpreter - a leap in iterative code refinement.
I'm extremely excited to announce "the big bomb": Neo and Matrix, which we're working on with colleagues and friends from the open-source community, , Wuhan AI, and . Neo is the first fully-transparent bilingual large language model, with
[1/n]
🎉🎉🎉 Excited to share our latest work: "The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis"! We delve into the dynamics of LLMs across different scales and domains.
💡Highlights include:
🗺️ Comprehensive Model Evaluation:
What happens when decision networks encounter multimodal instruction?
We explore enhanced forms of task guidance for agents, enabling them to comprehend gameplay instructions, thereby facilitating a “read-to-play” capability.
Paper:
[1/n]
🚀🚀🚀 Excited to share our latest work: "CodeEditorBench: Evaluating Code Editing Capability of Large Language Models"!
### 🧐 Highlights of the CodeEditorBench:
> 8K meticulously collected code editing questions from five sources, namely
[1/n]
Happy to unveil the pioneering work behind "Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model". Our study presents CT-LLM, a model engineered with a deep focus on the Chinese language, marking a significant shift from traditional models that often
[1/n]
Happy to share our new work "MuPT: A Generative Symbolic Music Pretrained Transformer", encompassing a series of music generation models ranging from 190 million to 4.2 billion parameters, all based on the ABC Notation. According to human preference evaluations, our models
Personally speaking, I believe that the paradigm of cleanly splitting pretraining and alignment won't last long. There are tons of valuable instruction corpora, or plain extracted text with humanly understandable implicit motivations and structures, existing in today's
MAmmoTH2: Scaling Instructions from the Web
- Proposes a paradigm to efficiently harvest 10M instruction data from web corpus to enhance LLM reasoning
- 11% -> 34% on MATH and 36% -> 67% on GSM8K
proj:
abs:
Thanks for sharing our work!
Congrats to Tianle and other team members!
In-Context Learning capacity is a strong indicator of whether long-context LLMs can really achieve compositional reasoning the way they can when the context is shorter. Needle-in-a-haystack focuses more on
Long-context LLMs Struggle with Long In-context Learning
Suggests a notable gap in current LLM capabilities for processing and understanding long, context-rich sequences.
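To make the distinction concrete, here is a toy sketch of a long in-context learning probe (helper name and data are hypothetical, not the LongICLBench code):

```python
# Toy long in-context learning probe (illustrative only, not the
# LongICLBench implementation): pack many labeled demonstrations into
# one prompt and ask the model to label a held-out input, so the whole
# context must be used compositionally, not just searched.
def build_long_icl_prompt(demos, query):
    """demos: list of (text, label) pairs; query: unlabeled text."""
    lines = [f"Input: {t}\nLabel: {l}" for t, l in demos]
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

demos = [("the movie was great", "positive"), ("boring plot", "negative")]  # toy data
prompt = build_long_icl_prompt(demos * 2000, "a delightful surprise")
# A needle-in-a-haystack test would instead plant one fact inside long
# filler text and only ask the model to retrieve it verbatim.
```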
We are delighted to announce CMMMU, A Chinese Massive Multi-discipline Multimodal Understanding Benchmark. CMMMU is inspired by and strictly follows the pattern of MMMU. CMMMU starts the race of Bilingual LMMs on complex reasoning!
MAP-Neo is another good example!()
We release all resources needed to be on par with Mistral v0.2 and surpass LLaMA-2 (code / pretrain corpus / data cleaning pipeline / etc.) at a similar size.
We are also glad to announce that we open-source the phase-2 SFT
We should call models like Llama 3, Mixtral, etc. “open-weight models”, not “open-source models”. For a model to be open-source, the code and training data need to be public (good examples: GPT-J, OLMo, RedPajama, StarCoder, K2, etc.). Weights are like an exe file, which would be
🚀 Excited to announce that the tech report of MAP-Neo (): a fully open-source and transparent bilingual LLM suite with superior performance to bridge the gap with closed-source models, is now available:
🔧MAP-Neo's workflow
Thanks for tweeting COIG-CQIA! lol!
Yea, indeed, we devoted heavy human effort to manually collecting and cleaning SFT corpora from Ruozhiba, a Chinese sarcastic-joke forum, and it indeed makes Yi-34B better!
COIG-CQIA is a scaled-up Chinese version of LIMA to support open-source SFT for
[1/n]
New Paper Alert! Funny II-Bench Coming!
Excited to introduce II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models. Check it out:
In this study, we aim to evaluate MLLMs' higher-order perception of images.
I'm heading to Vienna after a nice trip and break.
I will present MAmmoTH (MAmmoTH-2 coming soon as well) and have discussions about 4 of my recent works (MERT, MALMEN, STABLE-Alignment and MAmmoTH) with my amazing friends, mentors, and colleagues!
I also bring a big
Glad to announce that StructLM and CT-LLM are accepted to
#COLM2024
.
CT-LLM verifies that Chinese, like any language besides English, can generalize to emergent abilities in other languages.
StructLM is the SoTA foundation model for tackling different
Delighted to have two papers accepted to
#COLM2024
.
The first paper is StructLM. It's the SoTA foundation model for tackling all different types of structured knowledge grounding tasks like tableQA, KBQA, etc.
This work was led by an undergrad
@alexzhuang_
To be honest, a reviewer is qualified to review a pretraining-related paper only if they are a core member of at least one open large language model effort. Otherwise, it is really disappointing for an LLM conference like COLM.
@SpokespersonCHN
Winged words indeed! P.P. 10043 not only destroys young scholars' academic dreams but also destroys people's confidence in academic integrity and in the racial climate in the US.
Check out our recent work on MMMU (Massive Multi-discipline Multimodal Understanding)!
A multimodal foundation model should be able to understand images and reason over them the way LLMs do on MMLU. It's time to take the next step towards better multimodal models!
🚀 Introducing MMMU, a Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI.
🧐 Highlights of the MMMU benchmark:
> 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks
>
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
- Presents CT-LLM, a 2B LLM
- Open-sourcing the full process of training, including a detailed data processing procedure
hf:
abs:
MuPT: A Generative Symbolic Music Pretrained Transformer
Presents a series of pre-trained models for symbolic music generation based on Llama architecture
proj:
abs:
OpenCodeInterpreter
Integrating Code Generation with Execution and Refinement
The introduction of large language models has significantly advanced code generation. However, open-source models often lack the execution capabilities and iterative refinement of advanced systems
Glad to see that Claude-3.5-Sonnet performs amazingly on our recently released II-Bench () and significantly surpasses the existing MLLMs!
It really understands human humor and visual implications!
M-A-P/Neo-7B-Instruct is the 1st 💎fully-open💎 LLM on the WildBench leaderboard and its performance is awesome. "Fully open-source" here means that all data for pre-training & post-training are open and the code is open-source, in addition to the public model weights! As
@percyliang
Amazing!
A real step towards a General World Model that:
1. Simulates world states by generating videos across any domains
2. Allows any-time control with free-text actions
🔥Introducing Pandora 🌏 🪐
a World Model that generates videos of world states with real-time language control 🎥🕹️
Simulate the world across domains in an _interactive_ way!
check out more
@agihippo
Maybe I'm ignorant. The data retrieval pipeline from DeepSeek, the WSD LRS from MiniCPM, the detailed tech reports of Qwen/Yi/DeepSeek/Baichuan/Skywork/... There are a lot of solid contributions from the Chinese open-source LLM community with relatively "limited" computational resources.
DataComp-LM: In search of the next generation of training sets for language models
- Provides a corpus of 240T tokens from Common Crawl
- Trains an LM using their filtered dataset, which performs similarly on NLU tasks w/ 6.6x less compute than Llama 3 8B
proj:
Kudos to the Qwen Team! A SOTA-level code LLM should indeed be capable of editing code to satisfy programmers' requests!
Try out the amazing CodeQwen and our CodeEditorBench!
(4/n) 🧵 CodeQwen is a Debugger
In assessing CodeQwen1.5's proficiency in code modification tasks, we concentrated our evaluation on the CodeEditorBench suite, encompassing four distinct dimensions: Debugging, Translating, Requirement Switching, and Code Polishing. The results
My personal favorite idea of 2024 on letting more people benefit from LLMs:
Domain-specific LLMs are a real thing! There are tons of traditional companies that feel uneasy about sharing their private data and are on the way to building their own LLMs. But they have very limited
Excited to share our latest research paper! 📄📷 In this study, we explore Scaling Law in Domain-specific Continual Pre-training scenarios. Our findings reveal the relationship between model performance and mixture ratios. Check it out here:
#ScalingLaw
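For readers curious what fitting such a law looks like mechanically, here is a minimal sketch with a generic power-law form (the functional form, coefficients, and data below are hypothetical, not our paper's exact parameterization):

```python
# Fit a generic power law relating domain validation loss to the
# domain-mixture ratio r in continual pretraining. Hypothetical form
# L(r) = a * r^(-alpha) + b, for illustration only.
import numpy as np
from scipy.optimize import curve_fit

def loss_vs_ratio(r, a, b, alpha):
    return a * np.power(r, -alpha) + b

ratios = np.array([0.1, 0.2, 0.4, 0.6, 0.8])        # hypothetical mixture ratios
losses = np.array([2.90, 2.60, 2.40, 2.30, 2.25])   # hypothetical val losses
params, _ = curve_fit(loss_vs_ratio, ratios, losses, p0=(1.0, 2.0, 0.5))
print(dict(zip(["a", "b", "alpha"], params)))
```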
A preliminary release for researchers in the Chinese community to play with. Will keep updating it and welcome all collaborators to contribute. The general translated corpus doesn't use the OpenAI API, so it's suitable for commercial use as well.
Yinghao
@nicolaus625
will present MERT() at Hall B
#282
from 10:45 am to 12:45 pm today.
I will present MAmmoTH() and discuss MAmmoTH2() at Hall B
#122
from 4:30 pm to 6:30 pm today.
Come and Chat with
The experimental results of MAmmoTH reveal that different math metrics do not necessarily improve simultaneously but can all benefit from transfer learning on a well-designed mixed instruction-tuning set. It's very exciting to be part of the work!
Excited to introduce our latest math generalist model MAmmoTH 🦣, built through instruction tuning. We proposed hybrid "chain-of-thought" & "program-of-thought" training to supercharge LLMs' math reasoning capabilities. 🦣 beats the open SoTA by 20+% on many datasets like MATH.
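For anyone unfamiliar with the two rationale styles, here is a toy contrast (an illustrative sample, not an actual item from the training set):

```python
# Chain-of-thought (CoT) rationales are natural language, e.g.:
#   "Each box holds 12 eggs; 7 boxes hold 7 * 12 = 84 eggs."
# Program-of-thought (PoT) rationales are executable code whose output
# is the answer, offloading arithmetic to an interpreter:
eggs_per_box = 12
boxes = 7
answer = eggs_per_box * boxes
print(answer)  # 84
```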
Thanks a lot! The LLM360 team has done amazing exploration work for the whole fully open LLM community by developing Amber and Crystal, and further scaling to K2! All fully open LLM teams share the same motivation, pursuing LLM Democratization and Real Open Science!
🎉 Congratulations to an awesome fully open source model, by the m-a-p team!
Paper: 📎
Includes great info on:
-Data Curation
-Infra details
-Intermediate checkpoints
-Scaling law
LLM360 is happy to work with this thriving community on open source AI.
StructLM
Towards Building Generalist Models for Structured Knowledge Grounding
Structured data sources, such as tables, graphs, and databases, are ubiquitous knowledge sources. Despite the demonstrated capabilities of large language models (LLMs) on plain text, their
Please welcome K2-65B🏔️, the most performant fully-open LLM released to date.
As a blueprint for open-source AGI, we release all model checkpoints, code, logs, and data.
About K2:
🧠65 billion parameters
🪟Fully transparent & reproducible
🔓Apache 2.0
📈Outperforms Llama 2 70B
🔥Thrilled to announce 📽️VideoScore, the first-ever fine-grained and reliable evaluator/reward model for text-to-video generation tasks, which is trained on 🎞️VideoFeedback, a large-scale and fine-grained human-feedback dataset for text-to-video (T2V) generations.
🤔Why
[3/n]
MMLU, CMMLU, and CEVAL appear to assess overlapping capabilities of the models, leading to similar performance trends. Maybe bilingual LLMs only need to track one of them during the pretraining process.
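One way to quantify that overlap is to correlate per-checkpoint scores across the benchmarks; a minimal sketch (the numbers below are hypothetical placeholders):

```python
# Correlate per-checkpoint accuracies of two benchmarks; a Pearson r
# near 1.0 suggests they track the same underlying capability.
from scipy.stats import pearsonr

mmlu  = [25.1, 27.3, 31.0, 36.8, 41.2]  # hypothetical checkpoint scores
cmmlu = [24.8, 26.9, 30.2, 35.9, 40.5]
r, p = pearsonr(mmlu, cmmlu)
print(f"Pearson r = {r:.3f} (p = {p:.3g})")
```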
Accelerating the Development of Large Multimodal Models with LMMs-Eval
Repo:
Blog:
We are offering a one-command evaluation API for fast and thorough evaluation of LMMs over 39 datasets (and growing).
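A run looks roughly like this (flags paraphrased from the repo's README, and the model/task names are examples; check the current docs for the exact interface):

```python
# Launch an lmms-eval run from Python; equivalent to invoking the CLI
# directly. Flags paraphrased from the README and may have changed.
import subprocess

subprocess.run([
    "python", "-m", "lmms_eval",
    "--model", "llava",                                    # model family
    "--model_args", "pretrained=liuhaotian/llava-v1.5-7b",
    "--tasks", "mme,mmbench_en",                           # comma-separated tasks
    "--batch_size", "1",
    "--output_path", "./logs/",
], check=True)
```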
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Presents an any-to-any multimodal LM that utilizes discrete representations for the unified processing of various modalities, including speech, text, images, and music
proj:
abs:
[1/n]
New Benchmark Alert!
LongIns () is a little "brother" of LongICLBench (), but it offers a more dynamic way of verifying LLMs' long-context reasoning performance.
Each sample in LongIns is composed of multiple
🔟Kudos to my co-leads: Scott Qu,
@liujiaheng2
,
our advisors: Jiajun Zhang, Wanli Ouyang,
@HuangRubio
, and
@WenhuChen
, and solid contributions from the whole team!
1/ Excited to announce the release of our new paper "MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training"! We propose a self-supervised music understanding model, attaining overall SOTA performance on 14 MIR tasks.
[5/n]
For the Amber-7B model, there is a noticeable decline in capability in the 200B-300B token range, likely due to the pretrain corpus. Our hypothesis is that the Amber pretrain corpus is not deduplicated well.
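For context on what "deduplicated well" means for a pretrain corpus, a common recipe is near-duplicate detection with MinHash + LSH; a minimal sketch (illustrative, not Amber's actual pipeline):

```python
# Near-duplicate detection with MinHash + LSH (pip install datasketch).
# Keep only the first document of each near-duplicate cluster.
from datasketch import MinHash, MinHashLSH

def minhash(doc, num_perm=128):
    m = MinHash(num_perm=num_perm)
    for token in doc.lower().split():
        m.update(token.encode("utf8"))
    return m

docs = {"d1": "the quick brown fox", "d2": "the quick brown fox jumps", "d3": "unrelated text"}
lsh = MinHashLSH(threshold=0.8, num_perm=128)
kept = []
for key, text in docs.items():
    sig = minhash(text)
    if not lsh.query(sig):        # no indexed near-duplicate found
        lsh.insert(key, sig)
        kept.append(key)          # first copy of its cluster survives
```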
@SpoxCHNinUS
A lot of comments here complain about not taking foreign students back to China, but without citing a specific discriminatory policy. Instead, PP 10043 is a clearly systematic, racist, discriminatory policy. If the USA is a country with better democracy, why not just set an example?
We've released two new music understanding models: and , which are trained on up to 160K hours of 24 kHz audio. Our models give strong performance on various (≥ 8) music information retrieval tasks. Paper coming soon! 🧵🔛
[2/n]
We collect 20,150 raw images from various renowned illustration websites. After a carefully designed three-stage data filtration procedure (image deduplication, text-to-image ratio control, and human review), we get 1,222 images and 1,434 questions.
II-Bench comprises
[6/n]
It is observable that all models improve their abilities in tasks involving math, physical interaction understanding and commonsense reasoning in a relatively synchronized manner.
[6/n]
Introducing CHC-Bench, an MT-Bench-like benchmark for evaluating models' understanding of Chinese culture, history, traditions, humanities, geography, and STEM across eight main categories.
Thrilled to work with
@alexzhuang_
@WenhuChen
@bigaidream
on the amazing StructLM!
Try our generalized language model for structured knowledge grounding tasks!
[1/n]
Excited to share StructLM🏗️, a series of models fine-tuned to generalize over structured knowledge grounding tasks.
paper:
- We achieve SoTA on 7/18 SKG tasks
- On held out tasks, our 7B model 0-shot is 30% better than 1-shot ChatGPT-3.5.
[7/n]
Upon analyzing the graphs, it is evident that while the trend of increasing performance with larger datasets is present, the actual scores for each model at various training checkpoints do not precisely align with the expected trajectory of the scaling law.
Kudos to the Team! Glad to see that it achieves 37.9 on our CMMMU and 36.6 on MMMU, which is amazing!
Try out our CMMMU on and .
Let's begin the Chinese Multimodal Competition!
[1/5] 🚀 Announcing DeepSeek-VL, sota 1.3B and 7B visual-language models!
Paper:
GitHub:
📚 Diverse training corpus
👯 Hybrid Vision Encoder
🧠 3-stage training strategy
🆓 Totally free for commercial use and fully open-source
[7/n]
We also reproduce the pipeline introduced by DeepSeekMath to retrieve high-quality data from a massive pretrain corpus. The pipeline is available here:
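The core of that recipe is a cheap quality classifier used to score and re-harvest web pages; a minimal fastText-based sketch (file names and thresholds are hypothetical):

```python
# Classifier-based retrieval loop in the spirit of DeepSeekMath:
# train a cheap fastText classifier on seed positives (e.g. known
# high-quality math pages) vs random negatives, score the web corpus,
# keep high-confidence pages, then fold survivors back into the seed
# set and repeat. Requires: pip install fasttext
import fasttext

# train.txt lines: "__label__pos <page text>" or "__label__neg <page text>"
model = fasttext.train_supervised(input="train.txt", epoch=3, wordNgrams=2)

def keep(page_text, threshold=0.9):
    labels, probs = model.predict(page_text.replace("\n", " "))
    return labels[0] == "__label__pos" and probs[0] >= threshold
```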
@sivil_taram
I believe it's positive if the query distribution is diverse enough. Several papers, like the Yi tech report, mention the importance of unifying the tone.
[3/n]
Our findings are as follows:
1. A significant difference exists in performance between humans and MLLMs: the highest accuracy achieved by the model is 74.8%, whereas the average accuracy for humans is 90%, with the highest reaching 98%.
2. Closed-source models often outperform
@XueFz
Yes. It might be a general problem that all LLM researchers meet when they share their papers, not only at COLM. Some traditional ML or NLP researchers have difficulty getting the point. Only if you have trained one do you know what is hard and valuable.
The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis
undertakes a comprehensive comparison of model capabilities at various pretraining intermediate checkpoints
Our newest paper is released: . It's essential to probe and mitigate discrimination based on human beliefs. 🐶 explores how manually annotated data serves as a benchmark for gender bias probing and mitigation.
@yizhilll
@bigaidream
@chenghua_lin
Emerged or not emerged, that may not be the right question to ask for Theory of Mind (ToM) in
#LLM
. In our theme track paper in the Findings of
#EMNLP2023
@emnlpmeeting
, we asked ourselves (1) what constitutes a machine ToM? (2) How to better evaluate ToM in LLMs?...🧵[1/n]
[4/n]
Overview of CodeEditorBench. CodeEditorBench spans multiple programming languages, selecting initial data from five sources and filtering it based on code length. It enriches the dataset with Large Language Model-generated test cases, which, along with all code, are verified by
@WenhuChen
@RylanSchaeffer
The token count of each global batch is ~8M: we use a global batch size of 1024 with an 8192 context length (1024 × 8192 ≈ 8.39M tokens).
In the decay stage, we use a batch size of 640 with an 8192 context length, so each global batch contains 640 × 8192 = 5.24M tokens.
[6/n]
## Benchmark results:
Evaluating LLMs on CodeEditorBench. All results of models are generated by greedy decoding. Code Debug, Code Translate and Code Requirement Switch are evaluated with pass@1, while Code Polish is evaluated with Mean OptScore. Values outside parentheses
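For reference, with greedy decoding pass@1 reduces to the fraction of problems whose single generated solution passes all tests; a minimal sketch (`generate` and `run_tests` are hypothetical helpers):

```python
# pass@1 under greedy decoding: one sample per problem, so the metric
# is the fraction of problems whose generated program passes all tests.
def pass_at_1(problems, generate, run_tests):
    passed = sum(run_tests(p, generate(p)) for p in problems)
    return passed / len(problems)
```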
[11/n]
We want to remind Open Source researchers that we missed the officially released Amber intermediate ckpts (). We'll also include the results of Amber/OLMo/Pythia in the next version of our paper. Thanks for the reminder from
@BlancheMinerva
.
@billyuchenlin
Thanks for including our MAP-Neo-7B-Instruct-v0.1() in the amazing WildBench()!
Glad to see that MAP-Neo performs well on it given its 7B size!
True Open-Source Power!
[5/n]
CodeEditorBench delineates the spectrum of code editing tasks, including Code Debugging, Code Translating, Code Polishing, and Code Requirement Switching. Each dataset entry shares similar attributes such as title, difficulty, public and private test inputs and outputs, as