Excited to share our R-Tuning got an outstanding paper award
@NAACL
2024! Take a look at this paper to see how to align your LLMs to honesty. This work was finished during my visit to UIUC. Thanks to Prof. Ji and Prof. Zhang for their supervision!
We have won two NAACL 2024 Outstanding Paper Awards! Congratulations to Chi Han, Shizhe Diao, Yi Fung, Xingyao Wang, Yangyi Chen, and all students and collaborators! Chi Han
@Glaciohound
will be on the academic job market next year!
Can we align LLMs to honesty via instruction finetuning?
Can we instruct LLMs to say I Don't Know?
Can uncertainty learning improve prediction ability?
Excited to share R-Tuning, Refusal-Aware Instruction Tuning to tackle hallucination in LLMs.
Paper:
🌟A new chapter in my career journey
I’m thrilled to be joining
@NVIDIA
Research as a Research Scientist working on foundation models. Looking forward to contributing to groundbreaking innovations and working with an incredible team!
#NVIDIAlife
#NVIDIA
🎉Exciting news!🎉 Our finetuned Robin-33B-V2 scored an impressive 64.1 on the
@huggingface
LLM leaderboard in our offline evaluation! 🔍Check out our Robin-V2 series models, including 7B, 13B, 33B, and 65B versions. Upgrade your language modeling game today!
#NLP
#LanguageModels
Introducing Robin V2 - a leap in LLM fine-tuning with LMFlow! 🚀Beating major open-source LLMs like Falcon, LLaMA & more. Robin-7B to 65B boast impressive scores on the Open LLM Leaderboard.🥇Deep-tuned & optimized for accuracy in multiple domains. For results, check out LMFlow-benchmark.📊
Happy to share R-Tuning got accepted to
#NAACL2024
main!
We introduce Refusal-Aware Instruction Tuning to tackle hallucination in LLMs.
Now LLMs can say "I Don't Know"!
Goal: Alignment for Honesty
Paper:
🥳LMFlow paper is out!
⏩ LMFlow is an extensible toolkit for fine-tuning and inference of LLMs (e.g., Robin🐦!).
🔎Check out our implementation details at
Everything from code to model weights is fully available for you to explore!
We are hiring!😃
Our team is expanding, and we’re looking for passionate researchers to join us in advancing the frontiers of LLM/VLM efficiency.
Join us in doing impactful and innovative research!
🚀 Our team is hiring! Join us to advance deep learning efficiency at NVIDIA! 🚀
🔗 Apply here:
Our team, Deep Learning Efficiency Research () at NVIDIA Research, is about a year old, and we are expanding. We're looking for
Can we teach LLMs to express fine-grained confidence?
Can we instruct LLMs to explain their uncertainty?
Introducing SaySelf: a framework to boost LLMs' reliability by teaching them to provide fine-grained confidence and self-reflective rationales.
Paper:
Robin-7b-v2 is a model finetuned from LLaMA-7B, and it scored 51.7 on the Open LLM Leaderboard.
@ClementDelangue
Checkpoints are released:
Try our online demo from HF space:
LMFlow got the Best Demo Paper Award! LMFlow is a lightweight toolkit for finetuning customized LLMs. We are iterating quickly, so stay tuned for more new features! 🥳
Over the past few weeks, we’ve been rolling out a series of pruned + distilled small models, along with a complete training recipe ().
Nemotron-4 15B ➡️ Minitron 4B
Llama-3.1 8B ➡️ Llama-3.1-Minitron-4B
Mistral NeMo 12B ➡️ MN-Minitron-8B
🔥 In addition,
I created a paper list about ChatGPT with the goal of helping everyone learn the techniques behind it.
🔥It reached 136 stars within just a few days.
🙌Please send PRs with your favorite work.
#MachineLearning
#ChatGPT
#paperlist
(generated by ChatGPT😁)
Check out our recent efforts on memory-efficient fine-tuning of LLMs!
We investigate the layerwise properties of LoRA on fine-tuning tasks and propose Layerwise Importance Sampled AdamW (𝑳𝑰𝑺𝑨), a promising alternative to LoRA.
HuggingFace paper:
Excited to share LISA, which enables
- 7B tuning on a 24GB GPU
- 70B tuning on 4x80GB GPUs
and obtains better performance than LoRA in ~50% less time 🚀
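For the curious, here's a rough sketch of the layerwise random-unfreezing idea behind LISA in PyTorch. It is only an illustration, not the official LMFlow implementation; the model name, the `model.model.layers` path, and the resampling period are assumptions for a LLaMA-style Hugging Face model.

```python
# Sketch of LISA-style layerwise random unfreezing (illustrative only).
import random
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed model ID
layers = model.model.layers          # decoder blocks of a LLaMA-style model (assumed path)
n_active = 2                         # number of blocks unfrozen in each period

def resample_active_layers():
    """Freeze every block, then unfreeze a random subset of n_active blocks."""
    for layer in layers:
        for p in layer.parameters():
            p.requires_grad = False
    for idx in random.sample(range(len(layers)), n_active):
        for p in layers[idx].parameters():
            p.requires_grad = True

# Embeddings and the LM head stay trainable throughout.
for p in model.get_input_embeddings().parameters():
    p.requires_grad = True
for p in model.get_output_embeddings().parameters():
    p.requires_grad = True

# Call resample_active_layers() every K optimizer steps during training;
# frozen layers receive no gradients, so AdamW never allocates optimizer
# state for them, which is where the memory savings come from.
resample_active_layers()
```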
Excited to share our new work, LMFlow, an extensible toolkit for finetuning and inference of large foundation models. Use this general workflow to train your domain/task-specific language models 🚀.
⭐️Code:
🔍Documentation:
(1/n)
Lately I've often reflected on the idea of overfitting benchmarks and wondered whether we can achieve AGI by "overfitting the world." Today, I tried a SOTA SLM, and it felt like a baby who has only ever studied textbooks and math, but can't even say hello…
#AI
#AGI
#MachineLearning
Minitron is the latest LLM series developed by NVIDIA. It is super efficient (just 2.6B active non-embedding parameters) and super powerful (+14% on MMLU)! 🥳
The competition among small-scale models (under 10B parameters) seems to be getting increasingly intense...😮
🚀 40x Faster Model Training via Pruning and Distillation!
Permissively licensed Minitron-4B and Minitron-8B models!
🔗 Paper:
🔗 GitHub:
🔗 Models on HF:
Key highlights of 4B/8B models:
📊 2.6B/6.2B active
We're excited to share that our work ‘Prefix Language Models are Unified Modal Learners’ has been accepted to
#ICLR2023
! 🥳
Paper📜:
Code🧑‍💻:
The code will be released soon!
@wangchunshu
Thanks for following our work!
Instructing LLMs toward honesty is an interesting topic, and it is also important for alignment research.
A surprising discovery: Learning uncertainty improves model calibration (expected) and prediction (surprising)! 🌟
🌟Read a paper on "Refusal-Aware Instruction Tuning" - pretty interesting. They identified the knowledge gap between LLMs' parametric knowledge and instruction tuning data. With R-Tuning, they teach LLMs to recognize when they lack knowledge and resist making things up.
Curious about how severe the alignment tax is on LLMs' general capabilities?
Eager to mitigate the alignment tax?
We explored a frustratingly easy approach: Model Averaging. It's astonishingly effective, outperforming numerous baselines!
🔎Paper:
Very insightful post! I would like to mention our R-Tuning paper (), which fine-tunes LLMs to refuse / abstain from answering unknown questions, showcasing how LLMs can effectively acknowledge when they don't know the answer.
#MitigatingExtrinsicHallucinations
Wrote about extrinsic hallucinations during the July 4th break.
Here is what ChatGPT suggested as a fun tweet for the blog:
🚀 Dive into the wild world of AI hallucinations!
🤖 Discover how LLMs can conjure up some seriously creative (and sometimes
DaVinci is a foundation model capable of various tasks across modalities (language/vision/vision+language) and types (understanding/generation)!
Don't miss our poster presentation at ICLR in Kigali tomorrow! Hit us up with a DM to hang out and chat about our research.
#ICLR2023
🌟 The best 8B Base model via pruning and distillation!
🚀 Introducing Mistral-NeMo-Minitron-8B-Base model we derived from the recent Mistral-NeMo-12B.
Our recipe: finetune teacher on 100B tokens, prune to 8B params, run teacher-student distillation on <400B tokens.
Result: the
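To make the teacher-student step concrete, here's a generic logit-distillation loss in PyTorch. This is just the textbook KL-based formulation for illustration; the actual Minitron recipe involves additional losses and pruning details described in the paper.

```python
# Generic teacher-student logit distillation loss (illustrative sketch).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence between softened teacher and student distributions."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean reduction matches the mathematical definition of KL divergence.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# Random logits standing in for real model outputs:
student_logits = torch.randn(4, 32000)   # (batch, vocab)
teacher_logits = torch.randn(4, 32000)
print(distillation_loss(student_logits, teacher_logits, temperature=2.0))
```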
@ClementDelangue
@huggingface
Hi,
Thanks for your attention!
We have released all of these checkpoints on the Hugging Face Hub. Please try them out:
Robin-65b-v2:
Robin-33b-v2:
Robin-13b-v2:
Robin-7b-v2:
🥰Happy to share LMFlow got accepted to
#NAACL2024
demo track!
The camera-ready version now highlights:
- A one-stop lightweight toolkit for LLM fine-tuning
- Support for SOTA techniques like LISA
- Streamlined scientific LLM development (e.g., AstroLLaMA-Chat, MarineGPT)
Excited to share our new work, LMFlow, an extensible toolkit for finetuning and inference of large foundation models. Use this general workflow to train your domain/task-specific language models 🚀.
⭐️Code:
🔍Documentation:
(1/n)
See how our
#NVIDIAResearch
team has developed a method to efficiently create smaller, accurate language models by using structured weight pruning and knowledge distillation - offering several advantages for developers:
✅ 16% better performance on MMLU scores
✅ 40x fewer
Excited to share DetGPT! With its powerful reasoning and image understanding capabilities, it accurately locates and provides details about target objects. It can even interpret user context to find objects that aren't explicitly named.
#Robotics
👋
🔍Check out our new object detector DetGPT! It's unlike any other - using instructions, it can not only find objects, but also reason under complex contexts. Whether you need a cold beer or help in Zelda, DetGPT has you covered.🎯
#objectdetection
#ChatGPT
LMFlow now supports Speculative Decoding! 🥳
Experience faster model inference without retraining or tweaking the architecture! We're seeing close to a 4x speed boost (gpt2-xl as the target, gpt2 as the draft). 🚀
Check out speculative_inference.py in LMFlow!
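If you want to try the same draft-then-verify idea outside LMFlow, Hugging Face transformers' assisted generation gives a comparable setup (this is not LMFlow's speculative_inference.py interface, just an illustrative alternative): gpt2 drafts tokens and gpt2-xl verifies them, so the output matches plain gpt2-xl greedy decoding.

```python
# Draft-then-verify decoding via transformers' assisted generation (sketch).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
target = AutoModelForCausalLM.from_pretrained("gpt2-xl")  # large verifier model
draft = AutoModelForCausalLM.from_pretrained("gpt2")      # small draft model

inputs = tokenizer("Speculative decoding works by", return_tensors="pt")
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```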
After months of efforts, we are pleased to announce the evolution from Qwen1.5 to Qwen2. This time, we bring to you:
⭐ Base and Instruct models of 5 sizes, including Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B. Having been trained on data in 27 additional
Check out our LMFlow to use LISA, a memory-efficient finetuning algorithm that allows a tradeoff between memory and the number of randomly unfrozen layers:
🌟 With the buzz around
#Mixtral
on Twitter, we're thrilled to share our previous work, Mixture-of-Domain-Adapters, published at ACL 2023.
🔍 We explore the effective combination of multiple domain adapters. Check out our research:
🔗
🚀 Introducing Flextron - a Many-in-One LLM - Oral at ICML!
Train one model and get many optimal models, one for each GPU, at inference time without any additional retraining. 🌟
🔗 Paper:
Main benefits with only 5% post-training finetuning:
✅ Best model for
We open-source a new benchmark for evaluating open-source LLMs! 🚀 It is cheap and user-friendly! Also, we release our model Robin-Chat-V2, achieving competitive performance on chitchat, commonsense, and instruction following. Try it out:
🤖💬Evaluating chat-style Large Language Models just got easier with LMFlow benchmark! A free and user-friendly evaluation framework now open-sourced for the entire community. Say goodbye to expensive human labeling and API calls. See 🌟
#LMFlow
#ChatGPT
Further analysis shows that learning uncertainty during training yields better results than directly applying uncertainty filtering on test data.
It unveils a surprising bonus: learning uncertainty leads to better calibration and improved prediction ability.
LMFlow proposes a new alignment algorithm, RAFT🚣, supporting RLHF! It aligns the model with human preferences & personalization through reward-function ranking. More efficient & easier to use than PPO. Try it now! 🚀
Paper:
#ChatGPT
#alignment
#RLHF
Very insightful findings! I find that research on small language models prefers MMLU-Cloze, since most SLMs perform at chance level on standard MMLU. In addition to the training-token perspective, model size also matters. The discrete metric also explains the emergent-ability phenomenon :)
MMLU performance is at a chance level even after training for 210B tokens for the standard formulation (the model is presented with all the choices and asked to predict the most relevant choice). But MMLU-Cloze gives a better signal during the early stages of the training.
[5/n]
#emnlp2022
Encountered an embarrassing situation: one reviewer thought our work lacked novelty because similar ideas had been proposed by a previous arXiv paper, A. However, A is our own paper and is 80% the same as our submission. How do we explain this without violating the double-blind policy?
Previous instruction tuning methods force the model to complete a sentence regardless of whether it actually has the relevant knowledge.
When a question falls outside its parametric knowledge, the model will make something up and fail to indicate that it lacks knowledge.
AstroLLaMA is upgraded! Thrilled to announce AstroLLaMA-chat, the successor of AstroLLaMA that you can interact with!
* Demo:
* Paper:
// Powered by LMFlow
Excited to share our new work, ExtremeBERT, an easy-to-use toolkit for accelerating your language model pre-training on customized datasets!
📃Paper:
⭐️Code:
🔍Documentation:
(1/n)
In this paper, we instruct LLMs to reject unknown questions by (1) measuring the knowledge gap between parametric knowledge and the finetuning data, (2) constructing refusal-aware data by appending an uncertainty expression, and (3) finetuning the model on the refusal-aware data.
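Here's a minimal sketch of the data-construction step (2). The templates and the `generate_answer` helper are illustrative placeholders, not the exact prompts from the paper.

```python
# Sketch of refusal-aware data construction: questions the model already
# answers correctly keep a "sure" suffix, questions it gets wrong get an
# "unsure" suffix before finetuning. (Illustrative templates only.)
SURE_SUFFIX = " Are you sure you accurately answered the question based on your internal knowledge? I am sure."
UNSURE_SUFFIX = " Are you sure you accurately answered the question based on your internal knowledge? I am unsure."

def build_refusal_aware_data(dataset, generate_answer):
    """dataset: list of {"question", "answer"}; generate_answer: hypothetical helper
    that returns the model's prediction for a question."""
    refusal_aware = []
    for example in dataset:
        prediction = generate_answer(example["question"])
        knows_it = prediction.strip() == example["answer"].strip()
        suffix = SURE_SUFFIX if knows_it else UNSURE_SUFFIX
        refusal_aware.append({
            "prompt": example["question"],
            "completion": example["answer"] + suffix,
        })
    return refusal_aware
```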
I definitely learnt a lot from all your advice & guidance, and I'm super grateful for the warm and generous support from Blender Lab during my research exchange ~ thank you so much! 😊
@colinraffel
Congrats on such great work! We studied "Mixture-of-Adapters"
@ACL2023
, where we proposed training specialized adapters and using a gating mechanism to fuse knowledge from different adapters. You might find it relevant, and we are happy to discuss!
@YangyiChen6666
SaySelf trains LLMs in two stages. First, supervised fine-tuning lets them generate self-reflective rationales and precise confidence estimates. Then, reinforcement learning refines these estimates through task supervision. [3/n]
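As a rough illustration of the RL stage, a calibration-aware reward could look like the toy function below. This is my hypothetical example, not the exact reward defined in the SaySelf paper.

```python
# Hypothetical calibration-aware reward: reward confident correct answers,
# penalize confident wrong ones (illustrative only).
def calibration_reward(is_correct: bool, confidence: float) -> float:
    """confidence is the model's stated confidence in [0, 1]."""
    if is_correct:
        return confidence    # high confidence on correct answers is rewarded
    return -confidence       # high confidence on wrong answers is penalized

print(calibration_reward(True, 0.9))    #  0.9
print(calibration_reward(False, 0.9))   # -0.9
print(calibration_reward(False, 0.2))   # -0.2
```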
Our method is called Refusal-Aware Instruction Tuning (R-Tuning).
The experiments demonstrate that R-Tuning can refuse to answer uncertain questions while improving accuracy on the questions it willingly answers.
It also exhibits superior generalization performance on unseen datasets.
Glad to introduce our new work PTUnifier, which was recently accepted to ICCV 2023. 🥳
Our latest paper presents a new prompt-based method for unifying medical vision-and-language pre-training.
Paper:
Code:
Recently, there have been several biomedical generalist foundation models at extremely large scales, e.g., MedPaLM M from
@GoogleAI
.
Last year, we also developed a small yet powerful one, PTUnifier. The code has now been released () 🔥.
@wabyking
@shizhediao
GPT-4 still doesn't support multimodal inputs☹️, but Bard can. And so can our MiniBard! 🥳
With our LMFlow toolkit, building your own multimodal conversational AI is now easier than ever.
#NLP
#AI
#multimodality
Code:
Personalized chatbot toolbox LMFlow now supports image inputs!😆 With LMFlow, you can build a "MiniBard" with ease!😉 Currently we support chatbot inference with MiniGPT-4 + Robin 7B/13B; the multimodal finetuning service will be available soon~
@TheTuringPost
Excited to share our Robin-7b and -13b, which perform quite competitively with Vicuna.
We also release LMFlow (), a full fine-tuning solution including instruction tuning and alignment tuning.
Skill
#2
Finetuning: I developed LMFlow (), a large language model fine-tuning pipeline that allows fine-tuning and deploying personalized LLMs with minimal cost and effort. It has accumulated 7000+ stars⭐️ on GitHub.
@donglixp
We released an image-text foundation model with unified objectives (PrefixLM) in our previous work DaVinci (Prefix Language Models are Unified Modal Learners, ). Glad to see BEiT-3 achieves excellent performance. Congratulations! Hope to see more discussions!
Skill
#3
Prompting: I have experience in LLM reasoning and released two works: Automate-CoT () and Active-Prompt (). I also proposed tuning LLMs with black-box prompts (BDPL, ).
@YangyiChen6666
Case studies demonstrate SaySelf’s capability to generate insightful self-reflective rationales that effectively capture the internal uncertainty in LLMs. [5/n]
Skill
#1
Pretraining: I pretrained a BERT-based model (ZEN) in 2019, achieving SOTA on several benchmark datasets (published in Findings of EMNLP 2020). I also have experience in vision-language pre-training and published work at ICLR 2023 during my internship at ByteDance.
@YangyiChen6666
We present SaySelf to teach LLMs to generate more accurate and fine-grained confidence estimates. It goes beyond the confidence elicitation in previous work, enabling LLMs to generate self-reflective rationales, indicating the knowledge gap and explaining the uncertainty. [2/n]
We find that by combining prefix language modeling and prefix image modeling, we can effectively train a multi-modal foundation model (which we named DaVinci🤣) capable of a variety of tasks across modalities (language / vision / vision+language) and types (understanding / generation).
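For readers unfamiliar with prefix language modeling, here's a tiny sketch of the attention mask it uses (a generic illustration, not DaVinci's actual code): prefix tokens attend bidirectionally, while the remaining tokens attend causally.

```python
# Prefix-LM attention mask: bidirectional within the prefix, causal after it.
import torch

def prefix_lm_mask(seq_len: int, prefix_len: int) -> torch.Tensor:
    """Return a boolean mask where True means 'may attend'."""
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))  # causal base
    mask[:prefix_len, :prefix_len] = True                              # bidirectional prefix
    return mask

print(prefix_lm_mask(seq_len=5, prefix_len=2).int())
# tensor([[1, 1, 0, 0, 0],
#         [1, 1, 0, 0, 0],
#         [1, 1, 1, 0, 0],
#         [1, 1, 1, 1, 0],
#         [1, 1, 1, 1, 1]])
```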
Skill
#4
Vision-Language: Check out my recent work Detect What You Need via Reasoning (EMNLP 2023), Generative Vision-Language Pre-training (ICLR 2023), and a Multi-Task Benchmark for Evaluating Vision-Language Models (ICML 2022).
@YangyiChen6666
LLMs often hallucinate and fail to indicate uncertainty, necessitating more effective methods for obtaining accurate confidence estimates. Previous methods to derive confidence from LLMs often yield imprecise or binary estimates, failing to truly capture model confidence. [1/n]
@shangbinfeng
Very interesting work! The refusal / abstention capability is crucial for large models, and I have been focusing on it for a long time (e.g., our R-Tuning). It is nice to see that multi-LLM collaboration can assist in abstention.😀
@paul_cal
@_jasonwei
Yeah, thanks for sharing! Strongly agree with Jason that comparison is a complication, and this is exactly the step where we put in the most effort. Fortunately, inspired by object detection, we introduced average precision to consider both precision and recall (Section 3.3).
@mrcrp96
@huggingface
For inference with the 13B model, you may need about 26GB of GPU memory.
For the delta weights, you can directly merge them with the LLaMA models. Please refer to
Thanks!
Four highlighted features of LMFlow:
🥳Extensible: Supports common backbones (LLaMA, Galactica, GPT-2, etc.)
🚀Lightweight: Extremely few trainable parameters with LoRA
🎯Task-Oriented: Comparable to ChatGPT with 7B/30B models
🌐Open: The whole pipeline is open-source
(2/n)
@5hoges
Yes, that's true. But a 0-parameter LLM has 0% accuracy and 100% IDK, right? That is worse than a model with 80% accuracy and 20% IDK (what we are aiming at).
Three highlighted features of ExtremeBERT:
🥳Easy-to-use Pipeline: a one-line command pipeline without pain
🚀Acceleration: train your own BERT in one day
🌐Customized Datasets: compatible with Hugging Face datasets, with support for customization as well
(2/n)
We systematically compare a wide range of methods, including model averaging, regularization methods, RL methods, LoRA, experience replay… We found that model averaging is surprisingly strong!
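For reference, the model averaging baseline boils down to a simple weight interpolation between the pre-alignment (SFT) model and the aligned model. The sketch below uses placeholder model paths and an interpolation weight alpha; it illustrates the general idea rather than our exact experimental setup.

```python
# Sketch of model averaging: linearly interpolate SFT and aligned weights.
from transformers import AutoModelForCausalLM

# Placeholder model paths for illustration.
sft_model = AutoModelForCausalLM.from_pretrained("path/to/sft-model")
aligned_model = AutoModelForCausalLM.from_pretrained("path/to/aligned-model")

alpha = 0.5  # interpolation weight; sweeping alpha trades alignment against general ability
sft_state = sft_model.state_dict()
aligned_state = aligned_model.state_dict()
averaged_state = {
    name: alpha * aligned_state[name] + (1 - alpha) * sft_state[name]
    for name in aligned_state
}

sft_model.load_state_dict(averaged_state)           # reuse one model object to hold the average
sft_model.save_pretrained("path/to/averaged-model")
```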