Excited to share our R-Tuning got an outstanding paper award
@NAACL
2024! Take a look at this paper to see how to align your LLMs to honesty. This work was finished during my visit to UIUC. Thanks to Prof. Ji and Prof. Zhang for their supervision!
We have won two NAACL 2024 Outstanding Paper Awards! Congratulations to Chi Han, Shizhe Diao, Yi Fung, Xingyao Wang, Yangyi Chen, and all students and collaborators! Chi Han
@Glaciohound
will be on the academic job market next year!
Can we align LLMs to honesty via instruction finetuning?
Can we instruct LLMs to say I Don't Know?
Can uncertainty learning improve prediction ability?
Excited to share R-Tuning, Refusal-Aware Instruction Tuning to tackle hallucination in LLMs.
Paper:
🌟A new chapter in my career journey
I’m thrilled to be joining
@NVIDIA
Research as a Research Scientist working on foundation models. Looking forward to contributing to groundbreaking innovations and working with an incredible team!
#NVIDIAlife
#NVIDIA
🎉Exciting news!🎉 Our finetuned Robin-33B-V2 scored an impressive 64.1 on the
@huggingface
LLM leaderboard in our offline evaluation! 🔍Check out our Robin-V2 series models, including 7B, 13B, 33B, and 65B versions. Upgrade your language modeling game today!
#NLP
#LanguageModels
Introducing Robin V2 - a leap in LLM fine-tuning with LMFlow! 🚀Beating major open-source LLMs like Falcon, LLaMA & more. Robin-7B to 65B boast impressive scores on the Open LLM Leaderboard.🥇Deep-tuned & optimized for accuracy in multiple domains. For results, check out LMFlow-benchmark.📊
Happy to share R-Tuning got accepted to
#NAACL2024
main!
We introduce Refusal-Aware Instruction Tuning to tackle hallucination in LLMs.
Now LLMs can say "I Don't Know"!
Goal: Alignment for Honesty
Paper:
🥳LMFlow paper is out!
⏩ LMFlow is an extensible toolkit for fine-tuning and inference of LLMs (e.g., Robin🐦!).
🔎Check out our implementation details at
Everything from code to model weights is fully available for you to explore!
We are hiring!😃
Our team is expanding, and we’re looking for passionate researchers to join us in advancing the frontiers of LLM/VLM efficiency.
Join us in doing impactful and innovative research!
🚀 Our team is hiring! Join us to advance deep learning efficiency at NVIDIA! 🚀
🔗 Apply here:
Our team, Deep Learning Efficiency Research () at NVIDIA Research, is about a year old, and we are expanding. We're looking for
Can we teach LLMs to express fine-grained confidence?
Can we instruct LLMs to explain their uncertainty?
Introducing SaySelf: a framework to boost LLMs' reliability by teaching them to provide fine-grained confidence and self-reflective rationales.
Paper:
Robin-7b-v2 is a model finetuned from LLaMA-7B, and it scored 51.7 on the Open LLM Leaderboard.
@ClementDelangue
Checkpoints are released:
Try our online demo from HF space:
LMFlow got the Best Demo Paper Award! LMFlow is a lightweight toolkit for finetuning customized LLMs. We are iterating quickly, so stay tuned for more new features! 🥳
Over the past few weeks, we’ve been rolling out a series of pruned + distilled small models, along with a complete training recipe ().
Nemotron-4 15B ➡️ Minitron 4B
Llama-3.1 8B ➡️ Llama-3.1-Minitron-4B
Mistral NeMo 12B ➡️ MN-Minitron-8B
🔥 In addition,
I created a paper list about ChatGPT with the goal of helping everyone learn the techniques behind it.
🔥It reached 136 stars within just a few days.
🙌Please send PRs with your favorite work.
#MachineLearning
#ChatGPT
#paperlist
(generated by ChatGPT😁)
Check out our recent efforts on memory-efficient fine-tuning of LLMs!
We investigate the layerwise properties of LoRA on fine-tuning tasks and propose Layerwise Importance Sampled AdamW (𝑳𝑰𝑺𝑨), a promising alternative to LoRA.
HuggingFace paper:
Excited to share LISA, which enables
- 7B tuning on a 24GB GPU
- 70B tuning on 4x80GB GPUs
and obtains better performance than LoRA in ~50% less time 🚀
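For the curious, here's a rough sketch of the layerwise random-unfreezing idea behind LISA in PyTorch. It is only an illustration, not the official LMFlow implementation; the model name, the `model.model.layers` path, and the resampling period are assumptions for a LLaMA-style Hugging Face model.

```python
# Sketch of LISA-style layerwise random unfreezing (illustrative only).
import random
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed model ID
layers = model.model.layers          # decoder blocks of a LLaMA-style model (assumed path)
n_active = 2                         # number of blocks unfrozen in each period

def resample_active_layers():
    """Freeze every block, then unfreeze a random subset of n_active blocks."""
    for layer in layers:
        for p in layer.parameters():
            p.requires_grad = False
    for idx in random.sample(range(len(layers)), n_active):
        for p in layers[idx].parameters():
            p.requires_grad = True

# Embeddings and the LM head stay trainable throughout.
for p in model.get_input_embeddings().parameters():
    p.requires_grad = True
for p in model.get_output_embeddings().parameters():
    p.requires_grad = True

# Call resample_active_layers() every K optimizer steps during training;
# frozen layers receive no gradients, so AdamW never allocates optimizer
# state for them, which is where the memory savings come from.
resample_active_layers()
```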
Excited to share our new work, LMFlow, an extensible toolkit for finetuning and inference of large foundation models. Use this general workflow to train your domain/task-specific language models 🚀.
⭐️Code:
🔍Documentation:
(1/n)
Lately I've often reflected on the idea of overfitting benchmarks and wondered whether we can achieve AGI by "overfitting the world." Today, I tried a SOTA SLM, and it felt like a baby who has only ever studied textbooks and math, but can't even say hello…
#AI
#AGI
#MachineLearning
Minitron is the latest LLM series developed by NVIDIA. It is super efficient (just 2.6B active non-embedding parameters) and super powerful (+14% on MMLU)! 🥳
The competition among small-scale models (under 10B parameters) seems to be getting increasingly intense...😮
🚀 40x Faster Model Training via Pruning and Distillation!
Permissively licensed Minitron-4B and Minitron-8B models!
🔗 Paper:
🔗 GitHub:
🔗 Models on HF:
Key highlights of 4B/8B models:
📊 2.6B/6.2B active
We're excited to share that our work ‘Prefix Language Models are Unified Modal Learners’ has been accepted to
#ICLR2023
! 🥳
Paper📜:
Code🧑‍💻:
The code will be released soon!
@wangchunshu
Thanks for following our work!
Instructing LLMs toward honesty is an interesting topic, and it is also important for alignment research.
A surprising discovery: Learning uncertainty improves model calibration (expected) and prediction (surprising)! 🌟
🌟Read a paper on "Refusal-Aware Instruction Tuning" - pretty interesting. They identified the knowledge gap between LLMs' parametric knowledge and instruction tuning data. With R-Tuning, they teach LLMs to recognize when they lack knowledge and resist making things up.
Curious about how severe the alignment tax is on LLMs' general capabilities?
Eager to mitigate the alignment tax?
We explored a frustratingly easy approach: Model Averaging. It's astonishingly effective, outperforming numerous baselines!
🔎Paper:
Very insightful post! I would like to mention our R-Tuning paper (), which fine-tunes LLMs to refuse / abstain from answering unknown questions, showcasing how LLMs can effectively acknowledge when they don't know the answer.
#MitigatingExtrinsicHallucinations
Wrote about extrinsic hallucinations during the July 4th break.
Here is what ChatGPT suggested as a fun tweet for the blog:
🚀 Dive into the wild world of AI hallucinations!
🤖 Discover how LLMs can conjure up some seriously creative (and sometimes
DaVinci is a foundation model capable of various tasks across modalities (language/vision/vision+language) and types (understanding/generation)!
Don't miss our poster presentation at ICLR in Kigali tomorrow! Hit us up with a DM to hang out and chat about our research.
#ICLR2023
🌟 The best 8B Base model via pruning and distillation!
🚀 Introducing Mistral-NeMo-Minitron-8B-Base model we derived from the recent Mistral-NeMo-12B.
Our recipe: finetune teacher on 100B tokens, prune to 8B params, run teacher-student distillation on <400B tokens.
Result: the
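To make the teacher-student step concrete, here's a generic logit-distillation loss in PyTorch. This is just the textbook KL-based formulation for illustration; the actual Minitron recipe involves additional losses and pruning details described in the paper.

```python
# Generic teacher-student logit distillation loss (illustrative sketch).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence between softened teacher and student distributions."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean reduction matches the mathematical definition of KL divergence.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# Random logits standing in for real model outputs:
student_logits = torch.randn(4, 32000)   # (batch, vocab)
teacher_logits = torch.randn(4, 32000)
print(distillation_loss(student_logits, teacher_logits, temperature=2.0))
```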
@ClementDelangue
@huggingface
Hi,
Thanks for your attention!
We have released all of these checkpoints on the Hugging Face Hub. Please try them out:
Robin-65b-v2:
Robin-33b-v2:
Robin-13b-v2:
Robin-7b-v2:
🥰Happy to share LMFlow got accepted to
#NAACL2024
demo track!
The camera-ready version now highlights:
- A one-stop lightweight toolkit for LLM fine-tuning
- Support for SOTA techniques like LISA
- Streamlined scientific LLM development (e.g., AstroLLaMA-Chat, MarineGPT)
Excited to share our new work, LMFlow, an extensible toolkit for finetuning and inference of large foundation models. Use this general workflow to train your domain/task-specific language models 🚀.
⭐️Code:
🔍Documentation:
(1/n)
See how our
#NVIDIAResearch
team has developed a method to efficiently create smaller, accurate language models by using structured weight pruning and knowledge distillation - offering several advantages for developers:
✅ 16% better performance on MMLU scores
✅ 40x fewer
Excited to share DetGPT! With its powerful reasoning and image understanding capabilities, it accurately locates and provides details about target objects. It can even interpret user context to find objects that aren't explicitly named.
#Robotics
👋
🔍Check out our new object detector DetGPT! It's unlike any other - using instructions, it can not only find objects, but also reason under complex contexts. Whether you need a cold beer or help in Zelda, DetGPT has you covered.🎯
#objectdetection
#ChatGPT
LMFlow now supports Speculative Decoding! 🥳
Experience faster model inference without retraining or tweaking the architecture! We're seeing close to a 4x speed boost (gpt2-xl as the target, gpt2 as the draft). 🚀
Check out speculative_inference.py in LMFlow!
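If you want to try the same draft-then-verify idea outside LMFlow, Hugging Face transformers' assisted generation gives a comparable setup (this is not LMFlow's speculative_inference.py interface, just an illustrative alternative): gpt2 drafts tokens and gpt2-xl verifies them, so the output matches plain gpt2-xl greedy decoding.

```python
# Draft-then-verify decoding via transformers' assisted generation (sketch).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
target = AutoModelForCausalLM.from_pretrained("gpt2-xl")  # large verifier model
draft = AutoModelForCausalLM.from_pretrained("gpt2")      # small draft model

inputs = tokenizer("Speculative decoding works by", return_tensors="pt")
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```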
After months of efforts, we are pleased to announce the evolution from Qwen1.5 to Qwen2. This time, we bring to you:
⭐ Base and Instruct models of 5 sizes, including Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B. Having been trained on data in 27 additional
Check out our LMFlow to use LISA, a memory-efficient finetuning algorithm that allows a tradeoff between memory and the number of randomly unfrozen layers:
🌟 With the buzz around
#Mixtral
on Twitter, we're thrilled to share our previous work, Mixture-of-Domain-Adapters, published at ACL 2023.
🔍 We explore the effective combination of multiple domain adapters. Check out our research:
🔗
🚀 Introducing Flextron - a Many-in-One LLM - Oral at ICML!
Train one model and get many optimal models, one for each GPU, at inference time without any additional retraining. 🌟
🔗 Paper:
Main benefits with only 5% post-training finetuning:
✅ Best model for
We open-source a new benchmark for evaluating open-source LLMs! 🚀 It is cheap and user-friendly! Also, we release our model Robin-Chat-V2, achieving competitive performance on chitchat, commonsense, and instruction following. Try it out:
🤖💬Evaluating chat-style Large Language Models just got easier with LMFlow benchmark! A free and user-friendly evaluation framework now open-sourced for the entire community. Say goodbye to expensive human labeling and API calls. See 🌟
#LMFlow
#ChatGPT
Further analysis shows that learning uncertainty during training yields better results than directly applying uncertainty filtering on test data.
It unveils a surprising bonus: learning uncertainty leads to better calibration and improved prediction ability.
LMFlow proposes a new alignment algorithm, RAFT🚣, supporting RLHF! It aligns the model with human preferences & personalization through reward-function ranking. More efficient & easier to use than PPO. Try it now! 🚀
Paper:
#ChatGPT
#alignment
#RLHF
Very insightful findings! I find that research on small language models prefers MMLU-Cloze, since most SLMs perform at chance level on standard MMLU. In addition to the training-token perspective, model size also matters. The discrete metric also explains the emergent-ability phenomenon :)
MMLU performance is at a chance level even after training for 210B tokens for the standard formulation (the model is presented with all the choices and asked to predict the most relevant choice). But MMLU-Cloze gives a better signal during the early stages of the training.
[5/n]
#emnlp2022
Encountered an embarrassing situation: one reviewer thought our work lacked novelty because similar ideas had been proposed by a previous arXiv paper, A. However, A is our own paper and is 80% the same as our submission. How do we explain this without violating the double-blind policy?
Previous instruction tuning methods force the model to complete a sentence regardless of whether it actually has the relevant knowledge.
When a question falls outside its parametric knowledge, the model will make something up and fail to indicate that it lacks knowledge.
AstroLLaMA is upgraded! Thrilled to announce AstroLLaMA-chat, the successor of AstroLLaMA that you can interact with!
* Demo:
* Paper:
// Powered by LMFlow
Excited to share our new work, ExtremeBERT, an easy-to-use toolkit for accelerating your language model pre-training on customized datasets!
📃Paper:
⭐️Code:
🔍Documentation:
(1/n)
In this paper, we instruct LLMs to reject unknown questions by (1) measuring the knowledge gap between parametric knowledge and the finetuning data, (2) constructing refusal-aware data by appending an uncertainty expression, and (3) finetuning the model on the refusal-aware data.
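Here's a minimal sketch of the data-construction step (2). The templates and the `generate_answer` helper are illustrative placeholders, not the exact prompts from the paper.

```python
# Sketch of refusal-aware data construction: questions the model already
# answers correctly keep a "sure" suffix, questions it gets wrong get an
# "unsure" suffix before finetuning. (Illustrative templates only.)
SURE_SUFFIX = " Are you sure you accurately answered the question based on your internal knowledge? I am sure."
UNSURE_SUFFIX = " Are you sure you accurately answered the question based on your internal knowledge? I am unsure."

def build_refusal_aware_data(dataset, generate_answer):
    """dataset: list of {"question", "answer"}; generate_answer: hypothetical helper
    that returns the model's prediction for a question."""
    refusal_aware = []
    for example in dataset:
        prediction = generate_answer(example["question"])
        knows_it = prediction.strip() == example["answer"].strip()
        suffix = SURE_SUFFIX if knows_it else UNSURE_SUFFIX
        refusal_aware.append({
            "prompt": example["question"],
            "completion": example["answer"] + suffix,
        })
    return refusal_aware
```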
I definitely learnt a lot from all your advice & guidance, and I'm super grateful for the warm and generous support from Blender Lab during my research exchange ~ thank you so much! 😊
@colinraffel
Congrats on such great work! We studied "Mixture-of-Adapters"
@ACL2023
, where we proposed training specialized adapters and using a gating mechanism to fuse knowledge from different adapters. You might find it relevant, and we are happy to discuss!
@YangyiChen6666
SaySelf trains LLMs in two stages. First, supervised fine-tuning lets them generate self-reflective rationales and precise confidence estimates. Then, reinforcement learning refines these estimates through task supervision. [3/n]
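As a rough illustration of the RL stage, a calibration-aware reward could look like the toy function below. This is my hypothetical example, not the exact reward defined in the SaySelf paper.

```python
# Hypothetical calibration-aware reward: reward confident correct answers,
# penalize confident wrong ones (illustrative only).
def calibration_reward(is_correct: bool, confidence: float) -> float:
    """confidence is the model's stated confidence in [0, 1]."""
    if is_correct:
        return confidence    # high confidence on correct answers is rewarded
    return -confidence       # high confidence on wrong answers is penalized

print(calibration_reward(True, 0.9))    #  0.9
print(calibration_reward(False, 0.9))   # -0.9
print(calibration_reward(False, 0.2))   # -0.2
```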
Our method is called Refusal-Aware Instruction Tuning (R-Tuning).
The experiments demonstrate that R-Tuning can refuse to answer uncertain questions while improving accuracy on the questions it willingly answers.
It also exhibits superior generalization performance on unseen datasets.
Glad to introduce our new work PTUnifier, which was recently accepted to ICCV 2023. 🥳
Our latest paper presents a new prompt-based method for unifying medical vision-and-language pre-training.
Paper:
Code:
Recently, there have been several biomedical generalist foundation models at extremely large scales, e.g., MedPaLM M from
@GoogleAI
.
Last year, we also developed a small yet powerful one, PTUnifier. The code has now been released () 🔥.
@wabyking
@shizhediao
GPT-4 still doesn't support multimodal inputs☹️, but Bard can. And so can our MiniBard! 🥳
With our LMFlow toolkit, building your own multimodal conversational AI is now easier than ever.
#NLP
#AI
#multimodality
Code:
Personalized chatbot toolbox LMFlow now supports image inputs!😆 With LMFlow, you can build a "MiniBard" with ease!😉 Currently we support chatbot inference with MiniGPT-4 + Robin 7B/13B; the multimodal finetuning service will be available soon~
@TheTuringPost
Excited to share our Robin-7b and -13b, which perform quite competitively with Vicuna.
We also release LMFlow (), a full fine-tuning solution including instruction tuning and alignment tuning.
Skill
#2
Finetuning: I developed LMFlow (), a large language model fine-tuning pipeline that allows fine-tuning and deploying personalized LLMs with minimal cost and effort. It has accumulated 7000+ stars⭐️ on GitHub.
@donglixp
We released an image-text foundation model with unified objectives (PrefixLM) in our previous work DaVinci (Prefix Language Models are Unified Modal Learners, ). Glad to see BEiT-3 achieves excellent performance. Congratulations! Hope to see more discussions!
Skill
#3
Prompting: I have experience in LLM reasoning and released two works: Automate-CoT () and Active-Prompt (). I also proposed tuning LLMs with black-box prompts (BDPL, ).
@YangyiChen6666
Case studies demonstrate SaySelf’s capability to generate insightful self-reflective rationales that effectively capture the internal uncertainty in LLMs. [5/n]
Skill
#1
Pretraining: I pretrained a BERT-based model (ZEN) in 2019, achieving SOTA on several benchmark datasets (published in Findings of EMNLP 2020). I also have experience in vision-language pre-training and published work at ICLR 2023 during my internship at ByteDance.
@YangyiChen6666
We present SaySelf to teach LLMs to generate more accurate and fine-grained confidence estimates. It goes beyond the confidence elicitation in previous work, enabling LLMs to generate self-reflective rationales, indicating the knowledge gap and explaining the uncertainty. [2/n]
We find that by combining prefix language modeling and prefix image modeling, we can effectively train a multi-modal foundation model (which we named DaVinci🤣) capable of a variety of tasks across modalities (language / vision / vision+language) and types (understanding / generation).
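For readers unfamiliar with prefix language modeling, here's a tiny sketch of the attention mask it uses (a generic illustration, not DaVinci's actual code): prefix tokens attend bidirectionally, while the remaining tokens attend causally.

```python
# Prefix-LM attention mask: bidirectional within the prefix, causal after it.
import torch

def prefix_lm_mask(seq_len: int, prefix_len: int) -> torch.Tensor:
    """Return a boolean mask where True means 'may attend'."""
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))  # causal base
    mask[:prefix_len, :prefix_len] = True                              # bidirectional prefix
    return mask

print(prefix_lm_mask(seq_len=5, prefix_len=2).int())
# tensor([[1, 1, 0, 0, 0],
#         [1, 1, 0, 0, 0],
#         [1, 1, 1, 0, 0],
#         [1, 1, 1, 1, 0],
#         [1, 1, 1, 1, 1]])
```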
Skill
#4
Vision-Language: Check out my recent work Detect What You Need via Reasoning (EMNLP 2023), Generative Vision-Language Pre-training (ICLR 2023), and a Multi-Task Benchmark for Evaluating Vision-Language Models (ICML 2022).
@YangyiChen6666
LLMs often hallucinate and fail to indicate uncertainty, necessitating more effective methods for obtaining accurate confidence estimates. Previous methods to derive confidence from LLMs often yield imprecise or binary estimates, failing to truly capture model confidence. [1/n]
@shangbinfeng
Very interesting work! The refusal / abstention capability is crucial for large models, and I have been focusing on it for a long time (e.g., our R-Tuning). It is nice to see that multi-LLM collaboration can assist in abstention.😀
@paul_cal
@_jasonwei
Yeah, thanks for sharing! Strongly agree with Jason that comparison is a complication, and this is exactly the step where we put in the most effort. Fortunately, inspired by object detection, we introduced average precision to consider both precision and recall (Section 3.3).
@mrcrp96
@huggingface
For inference with the 13B model, you may need about 26GB of GPU memory.
For the delta weights, you can directly merge them with the LLaMA models. Please refer to
Thanks!
Four highlighted features of LMFlow:
🥳Extensible: Supports common backbones (LLaMA, Galactica, GPT-2, etc.)
🚀Lightweight: Extremely few trainable parameters with LoRA
🎯Task-Oriented: Comparable to ChatGPT with 7B/30B models
🌐Open: The whole pipeline is open-source
(2/n)
@5hoges
Yes, that's true. But a 0-parameter LLM has 0% accuracy and 100% IDK, right? That is worse than a model with 80% accuracy and 20% IDK (what we are aiming at).
Three highlighted features of ExtremeBERT:
🥳Easy-to-use Pipeline: a one-line command pipeline without pain
🚀Acceleration: train your own BERT in one day
🌐Customized Datasets: compatible with Hugging Face datasets, with support for customization as well
(2/n)
We systematically compare a wide range of methods, including model averaging, regularization methods, RL methods, LoRA, experience replay… We found that model averaging is surprisingly strong!
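For reference, the model averaging baseline boils down to a simple weight interpolation between the pre-alignment (SFT) model and the aligned model. The sketch below uses placeholder model paths and an interpolation weight alpha; it illustrates the general idea rather than our exact experimental setup.

```python
# Sketch of model averaging: linearly interpolate SFT and aligned weights.
from transformers import AutoModelForCausalLM

# Placeholder model paths for illustration.
sft_model = AutoModelForCausalLM.from_pretrained("path/to/sft-model")
aligned_model = AutoModelForCausalLM.from_pretrained("path/to/aligned-model")

alpha = 0.5  # interpolation weight; sweeping alpha trades alignment against general ability
sft_state = sft_model.state_dict()
aligned_state = aligned_model.state_dict()
averaged_state = {
    name: alpha * aligned_state[name] + (1 - alpha) * sft_state[name]
    for name in aligned_state
}

sft_model.load_state_dict(averaged_state)           # reuse one model object to hold the average
sft_model.save_pretrained("path/to/averaged-model")
```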