Shizhe Diao Profile Banner
Shizhe Diao Profile
Shizhe Diao

@shizhediao

2,361
Followers
1,235
Following
36
Media
299
Statuses

Research Scientist @NVIDIA focusing on efficient post-training of LLMs. Finetuning your own LLMs with LMFlow: Views are my own.

Santa Clara, CA
Joined January 2017
Pinned Tweet
@shizhediao
Shizhe Diao
3 months
Excited to share that our R-Tuning got an outstanding paper award @NAACL 2024! Take a look at this paper to see how to align your LLMs to honesty. This work was finished during my visit to UIUC. Thanks to Prof. Ji and Prof. Zhang for their supervision!
@hengjinlp
Heng Ji
3 months
We have won two NAACL2024 Outstanding Paper Awards! Congratulations to Chi Han, Shizhe Diao, Yi Fung, Xingyao Wang, Yangyi Chen and all students and collaborators! Chi Han @Glaciohound will be on academic job market next year!
Tweet media one
Tweet media two
Tweet media three
Tweet media four
17
13
228
12
9
77
@shizhediao
Shizhe Diao
8 months
Can we align LLMs to honesty via instruction finetuning? Can we instruct LLMs to say I Don't Know? Can uncertainty learning improve prediction ability? Excited to share R-Tuning, Refusal-Aware Instruction Tuning to tackle hallucination in LLMs. Paper:
Tweet media one
13
99
402
@shizhediao
Shizhe Diao
2 months
🌟A new chapter in my career journey I’m thrilled to be joining @NVIDIA Research as a Research Scientist working on foundation models. Looking forward to contributing to groundbreaking innovations and working with an incredible team! #NVIDIAlife #NVIDIA
Tweet media one
27
6
305
@shizhediao
Shizhe Diao
1 year
🎉Exciting news!🎉 Our finetuned Robin-33B-V2 scored an impressive 64.1 on the @huggingface LLM leaderboard in our offline evaluation! 🔍Check out our Robin-V2 series models, including 7B, 13B, 33B, and 65B versions. Upgrade your language modeling game today! #NLP #LanguageModels
Tweet media one
@OptimalScale
OptimalScale
1 year
Introducing the Robin V2 - a leap in LLM fine-tuning with LMFlow! 🚀Defeating major open-source LLMs like Falcon, LLaMA & more. Robin-7B to 65B boast impressive scores in OpenLLM.🥇Deep-tuned & optimized for accuracy in multiple domains. For results, check out LMFlow-benchmark.📊
1
12
82
5
43
292
@shizhediao
Shizhe Diao
6 months
Happy to share R-Tuning got accepted to the #NAACL2024 main conference! We introduce Refusal-Aware Instruction Tuning to tackle hallucination in LLMs, so that LLMs can now say "I don't know"! Goal: Alignment for Honesty. Paper:
Tweet media one
4
34
153
@shizhediao
Shizhe Diao
1 year
🥳LMFlow paper is out! ⏩ LMFlow is an extensible toolkit for fine-tuning and inference of LLMs (e.g., Robin🐦!). 🔎Check out our implementation details at Everything from code to model weights is fully available for you to explore!
Tweet media one
4
37
138
@shizhediao
Shizhe Diao
7 days
We are hiring!😃 Our team is expanding, and we’re looking for passionate researchers to join us in advancing the frontiers of LLM/VLM efficiency. Join us in doing impactful and innovative research!
@PavloMolchanov
Pavlo Molchanov
7 days
🚀 Our team is hiring! Join to Advance Efficiency in Deep Learning at NVIDIA! 🚀 🔗 Apply here: Our team, Deep Learning Efficiency Research () at NVIDIA Research, is about a year old, and we are expanding. We're looking for
Tweet media one
1
34
193
6
16
163
@shizhediao
Shizhe Diao
3 months
Can we teach LLMs to express fine-grained confidence? Can we instruct LLMs to explain their uncertainty? Introducing SaySelf: a framework to boost LLMs' reliability by teaching them to provide fine-grained confidence and self-reflective rationales. Paper:
Tweet media one
6
34
137
@shizhediao
Shizhe Diao
1 year
Robin-7b-v2 is a model finetuned from LLaMA-7B, and it scored 51.7 on the OpenLLM leaderboard. @ClementDelangue Checkpoints are released: Try our online demo from HF space:
4
21
62
@shizhediao
Shizhe Diao
3 months
LMFlow got Best Demo Paper award! LMFlow is a lightweight toolkit for finetuning customised LLMs. We are iterating quickly, so stay tuned for more new features! 🥳
@naaclmeeting
NAACL HLT 2025
3 months
Some of the award winners in this edition of #NAACL2024
Tweet media one
Tweet media two
Tweet media three
Tweet media four
0
2
47
9
7
56
@shizhediao
Shizhe Diao
11 months
I am diving into the LLM job market! 🚀 Actively seeking full-time industry jobs with my research on large foundation models. Discover more about my work! ➡️ #JobSearch #AIResearch #CareerOpportunities #OpenToWork
4
10
51
@shizhediao
Shizhe Diao
15 days
Over the past few weeks, we’ve been rolling out a series of pruned + distilled small models, along with a complete training recipe (). Nemotron-4 15B ➡️ Minitron 4B Llama-3.1 8B ➡️ Llama-3.1-Minitron-4B Mistral NeMo 12B ➡️ MN-Minitron-8B 🔥 In addition,
Tweet media one
3
6
42
@shizhediao
Shizhe Diao
2 years
I created a paper list about ChatGPT with the goal of helping everyone learn the techniques behind it. 🔥It reached 136 stars within just a few days 🙌Please, send PRs with your favorite work. #MachineLearning #ChatGPT #paperlist (generated by ChatGPT😁)
1
5
39
@shizhediao
Shizhe Diao
5 months
Check out our recent efforts on memory-efficient fine-tuning of LLMs! We investigate the layerwise properties of LoRA on fine-tuning tasks and propose Layerwise Importance Sampled AdamW (𝑳𝑰𝑺𝑨), a promising alternative for LoRA. HuggingFace paper:
@rui4research
Rui
5 months
Excited to share LISA, which enables - 7B tuning on a 24GB GPU - 70B tuning on 4x80GB GPUs and obtains better performance than LoRA in ~50% less time 🚀
Tweet media one
8
114
551
0
4
30
@shizhediao
Shizhe Diao
1 year
Excited to share our new work, LMFlow, an extensible toolkit for finetuning and inference of large foundation models. Use this general workflow to train your domain/task-specific language models 🚀. ⭐️Code: 🔍Documentation: (1/n)
3
4
30
@shizhediao
Shizhe Diao
1 month
Recently I often reflect on the idea of overfitting benchmarks and wonder if we can achieve AGI by “overfitting the world.” Today, I tried a SOTA SLM, and it felt like a baby who has only ever studied textbooks and math, but can’t even say hello… #AI #AGI #MachineLearning
Tweet media one
2
0
28
@shizhediao
Shizhe Diao
2 months
Minitron is the latest LLM series developed by NVIDIA. It is super efficient (just 2.6B active non-embedding parameters) and super powerful (MMLU + 14%)! 🥳 The competition among small-scale models (under 10B parameters) seems to be getting increasingly intense...😮
@PavloMolchanov
Pavlo Molchanov
2 months
🚀 40x Faster Model Training via Pruning and Distillation! Permissive Minitron-4B and Minitron-8B models! 🔗 Paper: 🔗 GitHub: 🔗 Models on HF: Key highlights of 4B/8B models: 📊 2.6B/6.2B active
Tweet media one
5
47
163
0
1
28
@shizhediao
Shizhe Diao
2 years
We're excited to share that our work ‘Prefix Language Models are Unified Modal Learners’ has been accepted to #ICLR2023 ! 🥳 Paper📜: Code🧑‍💻: The code will be released soon! @wangchunshu
1
3
24
@shizhediao
Shizhe Diao
9 months
Thanks for following our work! Instructing LLMs toward honesty is an interesting topic, and it is also important for alignment research. A surprising discovery: learning uncertainty improves model calibration (expected) and prediction (surprising)! 🌟
@leoyerrrr
HanRong YE
9 months
🌟Read a paper on "Refusal-Aware Instruction Tuning", which is pretty interesting. They identified the knowledge gap between LLMs and instruction tuning data. With R-Tuning, they teach LLMs to recognize when they lack knowledge and resist making things up.
Tweet media one
2
12
26
1
11
22
@shizhediao
Shizhe Diao
7 months
Curious about how severe the alignment tax is on LLMs' general capabilities? Eager to mitigate the alignment tax? We explored a frustratingly easy approach: Model Averaging. It's astonishingly effective, outperforming numerous baselines! 🔎Paper:
Tweet media one
3
3
19
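The model-averaging idea in the tweet above is simple enough to sketch in a few lines. This is an illustrative pure-Python sketch over toy parameter dictionaries, not the paper's implementation; `average_models` and the checkpoint names are hypothetical:

```python
def average_models(state_a, state_b, alpha=0.5):
    """Interpolate two checkpoints parameter-by-parameter.
    alpha trades off checkpoint A against checkpoint B."""
    assert state_a.keys() == state_b.keys()
    return {name: alpha * state_a[name] + (1 - alpha) * state_b[name]
            for name in state_a}

# Toy "state dicts": the model before and after alignment training.
sft = {"w": 1.0, "b": 0.0}    # pre-alignment checkpoint
rlhf = {"w": 3.0, "b": 2.0}   # post-alignment checkpoint
merged = average_models(sft, rlhf, alpha=0.5)  # {"w": 2.0, "b": 1.0}
```

In practice the same element-wise interpolation is applied to every tensor in the two models' state dicts, which is why the approach is "frustratingly easy".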
@shizhediao
Shizhe Diao
2 months
Very insightful post! I would like to mention our R-Tuning paper (), which fine-tunes LLMs to refuse / abstain from answering unknown questions, showcasing how LLMs can effectively acknowledge when they don't know the answer. #MitigatingExtrinsicHallucinations
@lilianweng
Lilian Weng
2 months
Wrote about extrinsic hallucinations during the July 4th break. Here is what ChatGPT suggested as a fun tweet for the blog: 🚀 Dive into the wild world of AI hallucinations! 🤖 Discover how LLMs can conjure up some seriously creative (and sometimes
21
178
967
3
1
19
@shizhediao
Shizhe Diao
1 year
Davinci is a foundation model capable of various tasks across modalities (language/vision/vision+language) and types (understanding/generation)! Don't miss our poster presentation at ICLR in Kigali tomorrow! Hit us up with a DM to hang out and chat about our research. #ICLR2023
Tweet media one
1
2
18
@shizhediao
Shizhe Diao
17 days
Exciting updates from our Minitron project. Check out our best 8B base model, built via pruning and distillation!🥳
@PavloMolchanov
Pavlo Molchanov
17 days
🌟 The best 8B Base model via pruning and distillation! 🚀 Introducing Mistral-NeMo-Minitron-8B-Base model we derived from the recent Mistral-NeMo-12B. Our recipe: finetune teacher on 100B tokens, prune to 8B params, run teacher-student distillation on <400B tokens. Result: the
Tweet media one
4
51
156
1
1
18
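The Minitron recipe quoted above pairs structured pruning with teacher-student distillation. The distillation term can be sketched as a temperature-softened KL divergence between teacher and student output distributions; this is a generic KD-loss sketch in pure Python (`distill_loss` is a hypothetical name, not NVIDIA's implementation):

```python
import math

def _softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Soft-label distillation term: KL(teacher || student) computed on
    temperature-softened distributions, scaled by T^2 as is conventional."""
    p = _softmax([x / T for x in teacher_logits])
    q = _softmax([x / T for x in student_logits])
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

When the student matches the teacher exactly the loss is zero; any divergence makes it positive, pushing the pruned student back toward the teacher's behavior.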
@shizhediao
Shizhe Diao
1 year
@ClementDelangue @huggingface Hi, Thanks for your attention! We have released all of these checkpoints at huggingface hub. Please try it out: Robin-65b-v2: Robin-33b-v2: Robin-13b-v2: Robin-7b-v2:
1
3
17
@shizhediao
Shizhe Diao
3 months
🥰Happy to share LMFlow got accepted to the #NAACL2024 demo track! The camera-ready version now highlights: -One-stop lightweight toolkit for LLM fine-tuning -Support for SOTA techniques like LISA -Streamlined scientific LLM development (e.g., AstroLLaMA-Chat, MarineGPT)
@shizhediao
Shizhe Diao
1 year
Excited to share our new work, LMFlow, an extensible toolkit for finetuning and inference of large foundation models. Use this general workflow to train your domain/task-specific language models 🚀. ⭐️Code: 🔍Documentation: (1/n)
3
4
30
1
5
15
@shizhediao
Shizhe Diao
23 days
Llama-3.1-Minitron 4B is a very good small language model developed by our team. Please check it and enjoy!🤗
@NVIDIAAIDev
NVIDIA AI Developer
23 days
See how our #NVIDIAResearch team has developed a method to efficiently create smaller, accurate language models by using structured weight pruning and knowledge distillation - offering several advantages for developers: ✅ 16% better performance on MMLU scores ✅ 40x fewer
Tweet media one
2
27
117
0
1
16
@shizhediao
Shizhe Diao
1 year
Excited to share DetGPT! With its powerful reasoning and image understanding capabilities, it accurately locates and provides details about target objects. It can even interpret user context to find objects that aren't explicitly named. #Robotics 👋
@OptimalScale
OptimalScale
1 year
🔍Check out our new object detector DetGPT! It's unlike any other - using instructions, it can not only find objects, but also reason under complex contexts. Whether you need a cold beer or help in Zelda, DetGPT has you covered.🎯 #objectdetection #ChatGPT
5
4
9
0
5
14
@shizhediao
Shizhe Diao
1 year
Speculative Decoding is a game-changer for model inference speeds! 🚀 Witness the magic with LMFlow's new update! #LMFlowBoost #LLMSpeedUp 🔥
@OptimalScale
OptimalScale
1 year
LMFlow now supports Speculative Decoding! 🥳 Experience faster model inference without retraining or tweaking the architecture! We're seeing close to a 4x speed boost (gpt2-xl <- gpt2). 🚀 Check out the speculative_inference.py in LMFlow!
Tweet media one
0
0
6
1
0
12
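The speculative decoding speedup mentioned above comes from letting a small draft model propose several tokens that the large target model then verifies in one pass. The sketch below is a simplified greedy-match variant (real implementations accept probabilistically against the two models' distributions); `speculative_step` and the toy lookup models are hypothetical:

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    """One speculation round: the small draft model proposes k tokens,
    then the large target model verifies them. We accept the longest
    prefix where the target agrees, plus one token from the target
    itself, so every round emits at least one correct token."""
    ctx = list(prefix)
    proposal = []
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)
    accepted = list(prefix)
    for t in proposal:
        if target_next(accepted) == t:
            accepted.append(t)
        else:
            break
    accepted.append(target_next(accepted))  # target's own next token
    return accepted

# Toy deterministic "models": next token looked up by position.
target_seq = ["a", "b", "c", "d", "e", "f"]
draft_seq = ["a", "b", "x", "y", "z", "w"]   # diverges at position 2
result = speculative_step([], lambda ctx: draft_seq[len(ctx)],
                          lambda ctx: target_seq[len(ctx)], k=4)
# draft proposed a b x y; target accepts "a b" and supplies "c"
```

The speedup comes from verifying all k proposals with a single target-model forward pass instead of k sequential ones.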
@shizhediao
Shizhe Diao
3 months
#LMFlow now supports the fine-tuning of Qwen2, a SOTA model series 🤗🚀 ✅Full-parameter training ✅LoRA ✅QLoRA 🚧LISA: under integration
@huybery
Binyuan Hui
3 months
After months of efforts, we are pleased to announce the evolution from Qwen1.5 to Qwen2. This time, we bring to you: ⭐ Base and Instruct models of 5 sizes, including Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B. Having been trained on data in 27 additional
Tweet media one
63
182
848
0
3
12
@shizhediao
Shizhe Diao
5 months
Check out our LMFlow to use LISA, a memory-efficient finetuning algorithm that allows a tradeoff between memory and the number of randomly unfrozen layers:
@rui4research
Rui
5 months
LISA is now supported in LMFlow🚀 Check out our latest script for tuning 7B LLMs with LISA in one line🌟. Any feedback is highly appreciated 😄
1
5
18
0
2
11
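LISA's core idea, as described above, is that only a few randomly sampled layers are unfrozen for each optimization period, which caps optimizer-state memory. A minimal sketch of the layer-sampling schedule, assuming a 32-layer model and hypothetical names (`lisa_active_layers`, `steps_per_period`); in LISA the embeddings and LM head additionally stay trainable throughout:

```python
import random

def lisa_active_layers(num_layers, n_unfrozen, rng):
    """Pick which transformer layers receive gradient updates for the
    next optimization period; all other layers stay frozen."""
    return set(rng.sample(range(num_layers), n_unfrozen))

rng = random.Random(0)
num_layers, n_unfrozen, steps_per_period = 32, 2, 20
schedule = []
for period in range(3):
    active = lisa_active_layers(num_layers, n_unfrozen, rng)
    # a real trainer would now enable requires_grad only on `active`
    # layers and run `steps_per_period` AdamW steps before resampling
    schedule.append(sorted(active))
```

Because only `n_unfrozen` layers carry AdamW moment buffers at any time, peak memory scales with the sample size rather than the full model depth.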
@shizhediao
Shizhe Diao
9 months
🌟 With the buzz around #Mixtral on Twitter, we're thrilled to share our previous work published at ACL2023, the mixture-of-domain-adapters. 🔍 We explore the effective combination of multiple domain adapters. Check out our research: 🔗
@MistralAI
Mistral AI
9 months
magnet:?xt=urn:btih:5546272da9065eddeb6fcd7ffddeef5b75be79a7&dn=mixtral-8x7b-32kseqlen&tr=udp%3A%2F%%3A6969%2Fannounce&tr=http%3A%2F%%3A80%2Fannounce RELEASE a6bbd9affe0c2725c1b7410d66833e24
538
2K
10K
1
1
11
@shizhediao
Shizhe Diao
2 months
Amazing work! Please check Flextron - a Many-in-One LLM - Train once, and deploy optimally on any GPU without retraining. 🔗:
@PavloMolchanov
Pavlo Molchanov
2 months
🚀 Introducing Flextron - a Many-in-One LLM - Oral at ICML! Train one model and get many optimal models for each GPU at inference without any additional retraining. 🌟 🔗 Paper: Main benefits with only 5% post-training finetuning: ✅ Best model for
5
66
196
1
1
11
@shizhediao
Shizhe Diao
1 year
We open-source a new benchmark for evaluating open-source LLMs! 🚀 It is cheap and user-friendly! Also, we release our model Robin-Chat-V2, achieving competitive performance on chitchat, commonsense, and instruction following. Try it out:
@OptimalScale
OptimalScale
1 year
🤖💬Evaluating chat-style Large Language Models just got easier with LMFlow benchmark! A free and user-friendly evaluation framework now open-sourced for the entire community. Say goodbye to expensive human labeling and API calls. See 🌟 #LMFlow #ChatGPT
Tweet media one
5
1
10
0
2
10
@shizhediao
Shizhe Diao
1 year
Arrived in Kigali for #ICLR2023 . I am greatly looking forward to ICLR next week!
@OptimalScale
OptimalScale
1 year
The LMFlow team has arrived at #ICLR2023 , and we can't wait to connect with fellow researchers! See you all there! #ICLR
0
0
4
0
0
10
@shizhediao
Shizhe Diao
8 months
Further analysis surprisingly shows that learning uncertainty during training yields better results than directly applying uncertainty filtering on test data. It unveils a surprising bonus: learning uncertainty leads to better calibration and improved prediction ability.
Tweet media one
0
0
10
@shizhediao
Shizhe Diao
2 years
Fantastic two days in Dubai! Heading to Abu Dhabi for #emnlp2022 ! Looking forward to meeting new and old friends. 🥳 Photo credit to @taoyds Thx!
Tweet media one
0
0
9
@shizhediao
Shizhe Diao
1 year
LMFlow proposes a new alignment algorithm RAFT🚣 supporting RLHF! It aligns the model with human preferences & personalization through reward function ranking. More efficient & easy to use than PPO. Try it now! 🚀 Paper: #ChatGPT #alignment #RLHF
3
1
9
@shizhediao
Shizhe Diao
1 month
Very insightful findings! I find that research on small language models prefers MMLU-Cloze, since most SLMs perform at chance level on standard MMLU. In addition to the training-token perspective, model size is also related. The discrete metric also explains the emergent ability :)
@louvishh
lovish
3 months
MMLU performance is at a chance level even after training for 210B tokens for the standard formulation (the model is presented with all the choices and asked to predict the most relevant choice). But MMLU-Cloze gives a better signal during the early stages of the training. [5/n]
Tweet media one
Tweet media two
1
1
9
1
0
9
@shizhediao
Shizhe Diao
8 months
Joint work with Blender Lab @UIUC , where I am a visiting scholar under the supervision of Professor Heng Ji @elgreco_winter . Thanks for everyone's contribution! @HanningZhangHK @YiFung10 @xingyaow_ @YangyiChen6666
1
0
8
@shizhediao
Shizhe Diao
2 years
#emnlp2022 Encountered an embarrassing situation: one reviewer thought our work lacked novelty because similar ideas were proposed in a previous arXiv paper A. However, A is our own paper, and it is 80% the same as our submission. How can we explain this without violating the double-blind policy?
2
0
8
@shizhediao
Shizhe Diao
8 months
Previous instruction tuning methods force the model to complete a sentence no matter whether the model knows the knowledge or not. When the question is out of the parametric knowledge, it will try to make up something and fail to indicate when it lacks knowledge.
Tweet media one
0
0
8
@shizhediao
Shizhe Diao
8 months
Nice work! Glad to see more specialized LLMs trained with LMFlow: MarineGPT (), AstroLLaMA-Chat (). What is next? 🤔
@rui4research
Rui
8 months
AstroLLaMA is upgraded! Thrilled to announce AstroLLaMA-chat, the successor of AstroLLaMA that you can interact with! * Demo: * Paper: // Powered by LMFlow
Tweet media one
1
12
38
0
0
6
@shizhediao
Shizhe Diao
6 months
It is a joint work with Blender Lab @UIUC , where I was a visiting scholar under the supervision of Professor Heng Ji @elgreco_winter . Thanks for everyone's contribution! @HanningZhangHK @YiFung10 @xingyaow_ @YangyiChen6666
0
0
5
@shizhediao
Shizhe Diao
20 days
@hengjinlp Cannot agree more🙌
0
0
7
@shizhediao
Shizhe Diao
2 years
Excited to share our new work, ExtremeBERT, an easy-to-use toolkit for accelerating your language model pre-training on customized datasets! 📃Paper: ⭐️Code: 🔍Documentation: (1/n)
Tweet media one
1
0
7
@shizhediao
Shizhe Diao
8 months
In this paper, we instruct LLMs to reject unknown questions by (1) measuring the knowledge gap between parametric knowledge and finetuning data, (2) constructing the refusal-aware data by padding the uncertainty expression, and (3) finetuning the model on the refusal-aware data.
Tweet media one
0
1
7
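The three-step recipe above (measure the knowledge gap, pad an uncertainty expression, finetune) can be sketched for the data-construction part. This is an illustrative sketch only; `build_refusal_aware_data` and the exact uncertainty phrasings are hypothetical, not the paper's verbatim templates:

```python
def build_refusal_aware_data(qa_pairs, model_answer):
    """Steps (1)-(2): probe the model's parametric knowledge against the
    finetuning data, then pad each target with an uncertainty expression.
    Step (3) would be ordinary finetuning on the returned dataset."""
    dataset = []
    for question, gold in qa_pairs:
        knows = model_answer(question) == gold          # (1) knowledge-gap probe
        suffix = "I am sure." if knows else "I am not sure."   # (2) padding
        dataset.append({"prompt": question, "target": f"{gold} {suffix}"})
    return dataset

# Toy "model" that only knows one fact.
known = {"What is the capital of France?": "Paris"}
data = build_refusal_aware_data(
    [("What is the capital of France?", "Paris"),
     ("What is the capital of Wakanda?", "Birnin Zana")],
    lambda q: known.get(q, ""))
```

After finetuning on such data, the model learns to attach the unsure expression to questions outside its parametric knowledge rather than fabricating an answer.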
@shizhediao
Shizhe Diao
11 months
I definitely learned a lot from all your advice & guidance. Super grateful for the warm and generous support from Blender Lab during my research exchange ~ thank you so much! 😊
@hengjinlp
Heng Ji
11 months
Wonderful work by Shizhe and Tong, along with a toolkit!
0
0
24
0
0
7
@shizhediao
Shizhe Diao
7 months
@colinraffel Congrats on such great work! We studied "Mixture-of-Adapters" @ACL2023 , where we proposed training specialized adapters and utilizing a gate mechanism to fuse knowledge from different adapters. You might find it relevant, and we are happy to discuss!
0
0
7
@shizhediao
Shizhe Diao
3 months
@YangyiChen6666 SaySelf trains LLMs in two stages. First, supervised fine-tuning lets them generate self-reflective rationales and precise confidence estimates. Then, reinforcement learning refines these estimates through task supervision. [3/n]
Tweet media one
0
0
6
@shizhediao
Shizhe Diao
8 months
Our method is called Refusal-Aware Instruction Tuning (R-Tuning). The experiments demonstrate its ability to refuse to answer uncertain questions and to improve accuracy on willingly answered questions. It also exhibits superior generalization performance on unseen datasets.
Tweet media one
Tweet media two
2
0
6
@shizhediao
Shizhe Diao
1 year
Glad to introduce our new work, PTUnifier, which was recently accepted to ICCV 2023!🥳 Our latest paper presents a new prompt-based method for unifying medical vision-and-language pre-training. Paper: Code:
@zhjohnchan
Zhihong Chen
1 year
Recently, there are several __biomedical generalist foundation models__ with extremely large scales, e.g., MedPaLM M from @GoogleAI . Last year, we also developed a small yet powerful one, PTUnifer. Code has been released () now 🔥. @wabyking @shizhediao
1
2
6
1
0
6
@shizhediao
Shizhe Diao
8 months
@alexgraveley @abacaj Thanks for retweeting!
0
0
3
@shizhediao
Shizhe Diao
1 year
GPT-4 still doesn't support multimodal inputs☹️, but Bard can. And so can our MiniBard! 🥳 With our LMFlow toolkit, building your own multimodal conversational AI is now easier than ever. #NLP #AI #multimodality Code:
@OptimalScale
OptimalScale
1 year
Personalized chatbot toolbox LMFlow now supports image inputs!😆 With LMFlow, you can build "MiniBard" with ease!😉 Currently we support chatbots with minigpt4 + robin 7b/13b inference, the multimodal finetuning service will be available soon~
2
0
3
2
0
5
@shizhediao
Shizhe Diao
1 year
Representative examples of HH-RLHF experiments with randomly sampled prompts.
Tweet media one
0
0
5
@shizhediao
Shizhe Diao
1 year
@TheTuringPost Excited to share our Robin-7b and -13b, performing quite competitively to Vicuna. We also release a full solution of fine-tuning including instruction tuning and alignment tuning, LMFlow ()
1
0
4
@shizhediao
Shizhe Diao
11 months
Skill #2 Finetuning: I developed a large language model fine-tuning pipeline LMFlow (), allowing fine-tuning and deploying personalized LLMs with minimal cost and effort. It has accumulated 7000+ stars⭐️ on GitHub.
0
0
4
@shizhediao
Shizhe Diao
2 years
0
0
4
@shizhediao
Shizhe Diao
2 years
@donglixp We released an image-text foundation model with unified objectives (PrefixLM) in our previous work DaVinci (Prefix Language Models are Unified Modal Learners, ). Glad to see BEIT-3 achieves excellent performance. Congratulations! Hope to see more discussions!
0
1
3
@shizhediao
Shizhe Diao
11 months
Skill #3 Prompting: I have experience in LLM reasoning and released two works: automate-CoT () and active-Prompt (). I also proposed to tune LLMs with black-box prompts (BDPL ).
0
0
3
@shizhediao
Shizhe Diao
1 year
The performance on Huggingface Open LLM Leaderboard.
Tweet media one
0
0
3
@shizhediao
Shizhe Diao
3 months
@YangyiChen6666 Case studies demonstrate SaySelf’s capability to generate insightful self-reflective rationales that effectively capture the internal uncertainty in LLMs. [5/n]
Tweet media one
0
0
3
@shizhediao
Shizhe Diao
11 months
Skill #1 Pretraining: I pretrained a BERT-based model (ZEN) in 2019 achieving SOTA on several benchmark datasets (published in Findings of EMNLP 2020). I also have some experience in vision-language pre-training and published work in ICLR 2023 during my internship in Bytedance.
Tweet media one
0
0
3
@shizhediao
Shizhe Diao
3 months
@YangyiChen6666 We present SaySelf to teach LLMs to generate more accurate and fine-grained confidence estimates. It goes beyond the confidence elicitation in previous work, enabling LLMs to generate self-reflective rationales, indicating the knowledge gap and explaining the uncertainty. [2/n]
Tweet media one
0
0
3
@shizhediao
Shizhe Diao
15 days
It seems the picture was not clear; here it is again.
Tweet media one
0
0
3
@shizhediao
Shizhe Diao
2 years
Made a bit of progress toward the unified foundation model. Welcome to check and follow our paper:
@wangchunshu
Wangchunshu Zhou
2 years
We find that by combining prefix language modeling and prefix image modeling, we can effectively train a multi-modal foundation model (we named as DaVinci🤣) capable of variety of tasks across modalities (language / vision / vision+language) and types (understanding / generation)
0
11
49
1
0
3
@shizhediao
Shizhe Diao
11 months
Skill #4 Vision-Language: Check out my recent work Detect What You Need via Reasoning (EMNLP 2023), Generative Vision-Language Pre-training (ICLR 2023), and a Multi-Task Benchmark for Evaluating Vision-Language Models (ICML 2022).
Tweet media one
Tweet media two
0
0
3
@shizhediao
Shizhe Diao
3 months
@YangyiChen6666 LLMs often hallucinate and fail to indicate uncertainty, necessitating more effective methods for obtaining accurate confidence estimates. Previous methods to derive confidence from LLMs often yield imprecise or binary estimates, failing to truly capture model confidence. [1/n]
0
0
3
@shizhediao
Shizhe Diao
3 months
@hengjinlp Thank you, Heng! The visit to UIUC was an amazing and rewarding experience for me!
0
0
3
@shizhediao
Shizhe Diao
1 year
@SamehKamalEldin @huggingface Yes, we will release a report next week. Please stay tuned~
0
0
3
@shizhediao
Shizhe Diao
5 months
🤗🤗🤗
@rui4research
Rui
5 months
Phi-3 template supported in #LMFlow , waiting for the base model 👀 Try further tuning Phi-3-instruct in LMFlow with SFT/ #LISA / #LoRA via 👇
Tweet media one
0
1
6
0
0
3
@shizhediao
Shizhe Diao
1 year
The performance of task-tuned LLaMA on three medical datasets:
Tweet media one
0
0
3
@shizhediao
Shizhe Diao
13 days
@kevinyli_ Hi Kevin, great work! A quick question: do the student Mamba model and the teacher Phi model have the same tokenizer?
1
0
3
@shizhediao
Shizhe Diao
1 year
@ClementDelangue Considering the cost of GPUs, we did not make all of them online. Try our Robin-33b-v2 online demo:
0
0
2
@shizhediao
Shizhe Diao
8 months
@hunkims Happy New Year Sung!
0
0
2
@shizhediao
Shizhe Diao
2 months
@shangbinfeng Very interesting work! The refusal / abstention capability of large models is crucial for them, and I have been focusing on this for a long time (e.g., our R-Tuning). It is nice to see that multi-LLM collaboration can assist in abstention.😀
1
0
2
@shizhediao
Shizhe Diao
8 months
@woojinrad Thanks for sharing our work!🤗
0
0
1
@shizhediao
Shizhe Diao
11 months
[3] DetGPT: Detect What You Need via Reasoning Renjie Pi, Jiahui Gao, Shizhe Diao, Rui Pan, Hanze Dong, Jipeng Zhang, Lewei Yao, Jianhua Han, Hang Xu, Lingpeng Kong, Tong Zhang
Tweet media one
0
1
2
@shizhediao
Shizhe Diao
8 months
@itsgalth Thank you for pointing that out! You're absolutely right! We appreciate your observation and will correct this in the next version. Thanks!
0
0
2
@shizhediao
Shizhe Diao
8 months
@paul_cal @_jasonwei Yeah, thanks for sharing! Strongly agree with Jason that comparison is a complication, and this is exactly the step where we put in the most effort. Fortunately, inspired by object detection, we introduced average precision to consider both precision and recall (Section 3.3).
Tweet media one
1
0
0
@shizhediao
Shizhe Diao
1 year
@NLPiation @huggingface Hi Robin models are finetuned from LLaMA checkpoints. We will release a report with more details next week.
1
1
2
@shizhediao
Shizhe Diao
1 year
We propose a new alignment method called RAFT (Reward rAnked FineTuning). Here are the results on HH-RLHF datasets. Checkout the details of RAFT:
Tweet media one
0
0
2
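RAFT's reward-ranked selection step described above can be sketched compactly: sample several responses per prompt, rank them with the reward function, and keep only the best for supervised finetuning. An illustrative sketch with hypothetical names (`raft_select`) and a toy length-based reward standing in for a real reward model:

```python
def raft_select(prompts, generate, reward, k=4):
    """One RAFT data-selection round: sample k candidate responses per
    prompt, rank them with the reward function, and keep only the best.
    The kept (prompt, response) pairs are then used for ordinary
    supervised finetuning, replacing PPO-style policy updates."""
    batch = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(k)]
        batch.append((prompt, max(candidates, key=reward)))
    return batch

# Toy sampler that returns canned responses; reward = response length.
samples = iter(["ok", "a long helpful answer", "no", "meh"])
batch = raft_select(["How do I stay safe online?"],
                    lambda prompt: next(samples), reward=len, k=4)
```

Because alignment reduces to finetuning on reward-filtered samples, the loop is simpler and more stable to run than PPO.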
@shizhediao
Shizhe Diao
7 months
@aadityaura We will consider conducting experiments on this great benchmark. Thanks!
0
0
1
@shizhediao
Shizhe Diao
27 days
0
0
2
@shizhediao
Shizhe Diao
1 year
@mrcrp96 @huggingface For inference with the 13B model, you need about 26GB of GPU memory. For the delta weights, you can directly merge them with the LLaMA models. Please refer to Thanks!
0
1
2
@shizhediao
Shizhe Diao
1 year
Four highlighted features of LMFlow: 🥳Extensible: Support common backbones (LLaMA, Galactica, GPT-2, etc.) 🚀Light-Weight: Extremely few parameters with LoRA 🎯Task-Oriented: Comparable with ChatGPT on 7B/30B models 🌐Open: The whole pipeline is open-source (2/n)
0
0
2
@shizhediao
Shizhe Diao
1 year
Try our online demo here:
0
0
2
@shizhediao
Shizhe Diao
8 months
@YiFung10 @elgreco_winter @HanningZhangHK @xingyaow_ @YangyiChen6666 Absolutely! Hanning is an amazing undergraduate student whose talents and dedication consistently exceed expectations. Please consider him if you have positions!🩷
0
0
0
@shizhediao
Shizhe Diao
8 months
@5hoges Yes, that's true, but a 0-parameter LLM has 0% accuracy and 100% IDK, right? That is worse than a model with 80% accuracy and 20% IDK (what we are aiming at).
1
0
1
@shizhediao
Shizhe Diao
10 months
🩷
@hendrydong
Hanze Dong
10 months
🚀 Exciting to share our latest TMLR paper — RAFT: Reward rAnked FineTuning for aligning Generative Foundation Models! 🧠💡
3
3
10
0
0
2
@shizhediao
Shizhe Diao
3 months
@YangyiChen6666 The ECE, AUROC, Accuracy, and faithfulness evaluation results demonstrate improved confidence calibration performance, better task performance, and more reasonable self-reflective rationales. [4/n]
Tweet media one
Tweet media two
0
0
2
@shizhediao
Shizhe Diao
23 days
@hengjinlp @Glaciohound Congratulations to the team! Amazing!
0
0
2
@shizhediao
Shizhe Diao
2 years
Three highlighted features of ExtremeBERT: 🥳Easy-to-use Pipeline: one-line command pipeline without pain 🚀Acceleration: train your own BERT in one day 🌐Customized Datasets: compatible with huggingface datasets, support customization as well (2/n)
0
0
2
@shizhediao
Shizhe Diao
1 year
@lori_1104 🤣🤣🤣
0
0
2
@shizhediao
Shizhe Diao
11 months
[2] Automate-CoT: Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data KaShun SHUM, Shizhe Diao, Tong Zhang
Tweet media one
0
0
2
@shizhediao
Shizhe Diao
5 years
@MichelYang same question
0
0
1
@shizhediao
Shizhe Diao
7 months
We systematically compare a wide range of methods, including model averaging, regularization methods, RL methods, LoRA, experience replay… We found that model averaging is surprisingly strong!
Tweet media one
0
0
1
@shizhediao
Shizhe Diao
3 months
@HuayangLi Thanks Huayang!
0
0
1