younes

@younesbelkada

3,793 Followers
294 Following
126 Media
1,490 Statuses

Joined July 2022
@younesbelkada
younes
1 year
Llama-2 just got released by @Meta AI and you can already use it in the @huggingface ecosystem. How do you fine-tune the model on your own data? We release a simple fine-tuning script for single- & multi-GPU setups to get you ready in a few lines of code
Tweet media one
9
266
1K
@younesbelkada
younes
2 years
You asked for it. You can now fine-tune a model that has been loaded in 8-bit. With 8-bit fine-tuning, every 1B parameters costs only ~1GB of GPU RAM, making it easy to fine-tune any large model. Colab to fine-tune OPT-6.7B in int8 below 🧵
Tweet media one
11
159
891
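A minimal sketch of the recipe described above, using today's transformers/PEFT APIs (the model id and LoRA hyperparameters are illustrative, not from the original Colab):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model with its weights quantized to int8 via bitsandbytes.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-6.7b",
    load_in_8bit=True,
    device_map="auto",
)

# Cast layer norms and enable input gradients so training stays stable.
model = prepare_model_for_kbit_training(model)

# The int8 base stays frozen; only the small LoRA matrices are trained.
lora_config = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05, bias="none", task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```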
@younesbelkada
younes
2 years
The first trillion-parameter model on the Hub 🤯 Today we are proud to announce the release of the first Mixture of Experts (MoE) 🧙 models in @huggingface transformers! You can now easily run, train, and explore this fascinating architecture in the Hugging Face ecosystem! ⬇️
14
129
841
@younesbelkada
younes
1 year
A huge day for open source! 🔥 You can now load models from @huggingface in 4-bit precision using `load_in_4bit` and the bitsandbytes library, with no performance degradation. Announcement notes here: Useful resources below
Tweet media one
@Tim_Dettmers
Tim Dettmers
1 year
QLoRA: 4-bit finetuning of LLMs is here! With it comes Guanaco, a chatbot on a single GPU, achieving 99% ChatGPT performance on the Vicuna benchmark: Paper: Code+Demo: Samples: Colab:
Tweet media one
90
948
4K
5
169
753
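For reference, a minimal sketch of the flag mentioned above (model id is a placeholder; requires `bitsandbytes` and `accelerate` installed):

```python
from transformers import AutoModelForCausalLM

# Weights are quantized to 4-bit on the fly at load time.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    load_in_4bit=True,
    device_map="auto",
)
```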
@younesbelkada
younes
2 years
Fine-tune a 20B language model with RLHF using a 24GB consumer GPU? 🤯 It is now possible using TRL + PEFT! Check out the blogpost that explains how we achieve this, step by step! Blogpost:
Tweet media one
6
161
718
@younesbelkada
younes
1 year
New feature alert in the @huggingface ecosystem! Flash Attention 2 is natively supported in huggingface transformers, and it supports training, PEFT, and quantization (GPTQ, QLoRA, LLM.int8). First `pip install flash-attn`, then pass `use_flash_attention_2=True` when loading the model!
Tweet media one
8
103
524
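A hedged sketch of the flag from the tweet (model id illustrative; newer transformers versions expose this as `attn_implementation="flash_attention_2"` instead):

```python
import torch
from transformers import AutoModelForCausalLM

# Flash Attention 2 requires fp16/bf16 weights and a CUDA GPU.
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    torch_dtype=torch.bfloat16,
    use_flash_attention_2=True,
    device_map="auto",
)
```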
@younesbelkada
younes
2 years
Interested in applying RLHF (Reinforcement Learning from Human Feedback)? Try out trl! At @huggingface we now officially support RLHF training using PPO (Proximal Policy Optimization). Easily train your model in a single- or multi-GPU setup. 🧵
Tweet media one
3
87
459
@younesbelkada
younes
1 year
MatCha and DePlot from @GoogleAI! 🧠 A set of foundation models for plots and charts that can perform complex visual reasoning tasks such as plot summarisation/VQA. When combined with instruction-tuned LMs, you can create interesting demos, such as the one below ↓
Tweet media one
8
88
446
@younesbelkada
younes
10 months
The IPO algorithm, a new method from Google DeepMind, has just been added to the Hugging Face TRL library! Try it out now by installing TRL from source; simply pass `loss_type="ipo"` when initializing DPOTrainer:
5
72
429
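A minimal sketch of switching DPOTrainer to the IPO loss, per the TRL API of that release (model id and the toy preference dataset are illustrative):

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

# DPO-style trainers expect prompt/chosen/rejected preference triples.
train_dataset = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "chosen": ["Paris."],
    "rejected": ["London."],
})

trainer = DPOTrainer(
    model,
    beta=0.1,
    loss_type="ipo",  # use the IPO objective instead of the default DPO loss
    args=TrainingArguments(output_dir="opt-ipo", per_device_train_batch_size=1),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```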
@younesbelkada
younes
2 years
BLIP-2 in 8-bit! 🧠 @Salesforce has uploaded the first multi-modal chatbot to the Hugging Face Hub! 🤯 BLIP-2 was released and open-sourced last week by @Salesforce; run the model in 8-bit and start dialoguing with it in a few lines of code!
Tweet media one
7
55
289
@younesbelkada
younes
7 months
Mixtral on a free-tier Google Colab with AQLM 2-bit quantization! 🤯 Similarly to QuIP#, the AQLM quantization method makes it possible to squeeze LLMs into an impressively compact format, with a peak memory of ~13GB for Mixtral! notebook:
5
55
277
@younesbelkada
younes
2 years
You liked Flan-T5? 🍮 You'll like Flan-UL2, now on Hugging Face, even more! Thanks @YiTayML @google for making the weights of the Flan-UL2 model open source! Repo: Spaces: Inference endpoint: 🧵
Tweet media one
11
42
273
@younesbelkada
younes
9 months
Following up on the great work from the community that enabled bitsandbytes 4-bit serialization, I pushed Mixtral-Instruct-bnb-4bit to @huggingface for anyone who wants to easily load the model
7
32
243
@younesbelkada
younes
2 years
Did you know that you can load @OpenAI's Whisper model in 8-bit using LLM.int8() from bitsandbytes & @TimDettmers? How does this quantization technique affect the model's performance? @ArthurZucker ran some evaluations with 8-bit models and here are the results ⬇️
Tweet media one
5
36
244
@younesbelkada
younes
1 year
New release for the PEFT library! 🔥 Did you know that you can now easily "merge" the LoRA adapter weights into the base model and use the merged model as a standalone model? How is this possible? Let's do some math... 🧵 Release notes:
Tweet media one
5
60
231
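A sketch of the merge described above (adapter repo is a placeholder). Mathematically, merging folds each LoRA update into the frozen weight, W ← W + (α/r)·BA, so no extra modules are needed at inference:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
model = PeftModel.from_pretrained(base, "my-username/opt-350m-lora")  # placeholder adapter repo

# Fold the LoRA weights into the base model and drop the adapter wrappers.
merged = model.merge_and_unload()
merged.save_pretrained("opt-350m-merged")  # usable as a standalone transformers model
```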
@younesbelkada
younes
1 year
MPT models from @MosaicML are now part of the @huggingface transformers family 🤗 Among other things, this means a direct integration with PEFT library; you can fine-tune the 7B model on a free Google Colab instance or the 30B model on a 40GB GPU! 🧵
Tweet media one
5
60
233
@younesbelkada
younes
11 months
New feature alert! 🚨 NEFTune, a new regularization technique that enhances model performance for supervised fine-tuning, has now shipped in @huggingface's TRL library! Boost your model's performance with a single line of code! 🧵
Tweet media one
6
56
208
@younesbelkada
younes
1 year
Brand new release for the TRL library! - Train LLMs on small infra with QLoRA - Instruction-tune in a few lines with the SFTTrainer - Simple reward modeling with the RewardTrainer And much more! Check out the 🧵 with all features:
@argilla_io
Argilla
1 year
🔥 Thrilled to share our new tutorial: collecting human preference data and training a reward model with the awesome trl by @huggingface. The very first end-to-end example of the new RewardTrainer in trl. Colab with @younesbelkada & @lvwerra
Tweet media one
1
25
93
4
44
188
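A minimal sketch of the SFTTrainer flow from that release (dataset is illustrative; this mirrors the TRL quickstart of the time):

```python
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")

trainer = SFTTrainer(
    "facebook/opt-350m",        # a model id or an already-instantiated model
    train_dataset=dataset,
    dataset_text_field="text",  # column holding the raw text to train on
    max_seq_length=512,
)
trainer.train()
```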
@younesbelkada
younes
6 months
Very excited about this new collaboration between @AndrewYNg's team and the HF team (@_marcsun, @mariaKhalusova)! You will learn how to get started with the transformers library and the HF ecosystem, and how to build AI applications from scratch. Enjoy!
@AndrewYNg
Andrew Ng
6 months
New short course: Open Source Models with Hugging Face 🤗, taught by @mariaKhalusova , @_marcsun , and Younes Belkada! @huggingface has been a game changer by letting you quickly grab any of hundreds of thousands of already-trained open source models to assemble into new
39
194
1K
7
24
148
@younesbelkada
younes
7 months
Huge news for the open-source community! 💎 @Google just released a new model, Gemma, a series of language models from 2B to 7B parameters, with their instruction fine-tuned versions! Play with it right now, and below is a thread of all the things you can do 🧵
Tweet media one
3
27
172
@younesbelkada
younes
2 years
A new model has been added for Christmas in @huggingface transformers! 🎄 With BLIP from Salesforce you can perform 1- Visual question answering 2- Image captioning (with and without context) 3- Image-text retrieval Here are some cool demos you can build easily with this model 🧵
3
40
172
@younesbelkada
younes
4 months
Another quantization method has dropped in the @huggingface transformers library! Half-Quadratic Quantization 🔥 HQQ implements on-the-fly quantization via fast, robust optimization. It doesn't require calibration data and can be used to quantize any model, down to 1-bit precision!
Tweet media one
5
33
168
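A hedged sketch of on-the-fly HQQ loading (config fields follow the transformers quantization docs; model id is a placeholder):

```python
from transformers import AutoModelForCausalLM, HqqConfig

# HQQ needs no calibration data; weights are quantized at load time.
quant_config = HqqConfig(nbits=4, group_size=64)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    device_map="cuda",
    quantization_config=quant_config,
)
```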
@younesbelkada
younes
2 years
Can we detoxify a language model using the same techniques as in RL? Yes! We used TRL to detoxify LM with up to 6B parameters. A report of our journey 🧵 Docs: Demo:
7
44
160
@younesbelkada
younes
9 months
Blazing fast text generation using AWQ and fused modules! 🚀 Up to 3x speedup compared to native fp16, which you can use right now on any model supported by @TheBlokeAI. Simply pass an `AwqConfig` with `do_fuse=True` to the `from_pretrained` method!
6
22
161
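A sketch of the fused-modules setup (the checkpoint name is a placeholder for any AWQ model; `fuse_max_seq_len` bounds the fused cache):

```python
from transformers import AutoModelForCausalLM, AwqConfig

quant_config = AwqConfig(
    bits=4,
    do_fuse=True,          # fuse attention/MLP blocks into faster kernels
    fuse_max_seq_len=512,  # required when fusing
)

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-OpenOrca-AWQ",  # any AWQ checkpoint works here
    quantization_config=quant_config,
).to("cuda")
```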
@younesbelkada
younes
1 year
A new release for TRL! 🤯 1- Larger-model support using naive pipeline parallelism: you can now fit 100B-scale models and apply RLHF! 2- PEFT + data parallelism: parallelize your training to train on more data and more GPUs Full release notes here:
Tweet media one
0
41
150
@younesbelkada
younes
1 year
Pix2Struct, a set of image captioning & visual question answering models, has recently been released by @GoogleAI and added to the Hub - this includes no fewer than 20 new fine-tuned checkpoints! 🤯 How do you use it in @huggingface transformers?
Tweet media one
2
34
149
@younesbelkada
younes
9 months
In the recent TRL release (0.7.6) we are happy to ship many cool features to the community for the DPOTrainer! IPO, KTO, and cDPO losses, support for pre-computed logits, and many more! Check out the thread below for more details 🧵
Tweet media one
3
34
150
@younesbelkada
younes
10 months
New feature alert! 🚨 transformers 4.35.0 just got released... and NEFTune has now been extended to transformers' Trainer API! Simply pass `neftune_noise_alpha=xxx` in your TrainingArguments... and that's it!
Tweet media one
4
16
143
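The one-liner in context, as a minimal sketch (the alpha value is illustrative; the NEFTune paper suggests values such as 5, 10, or 15):

```python
from transformers import TrainingArguments

# NEFTune adds uniform noise to embedding outputs during training only;
# it is automatically disabled at evaluation/inference time.
args = TrainingArguments(
    output_dir="my-sft-run",
    neftune_noise_alpha=5,
)
```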
@younesbelkada
younes
1 year
Great that Segment Anything from @Meta has been open sourced! You can now use SAM with @huggingface transformers in a few lines of code and generate segmentation masks. How do you use it? 🧵 More to come very soon in the 🤗 ecosystem!
Tweet media one
4
33
136
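A minimal sketch using the mask-generation pipeline (the checkpoint and image URL are illustrative):

```python
from transformers import pipeline

generator = pipeline("mask-generation", model="facebook/sam-vit-base", device=0)

# Returns a dict with boolean segmentation masks and their quality scores.
outputs = generator(
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    points_per_batch=64,
)
masks = outputs["masks"]
```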
@younesbelkada
younes
11 months
Diffusers + PEFT 🤝 We have recently shipped the PEFT integration in diffusers! You can now combine LoRA adapters from any format in @huggingface diffusers with a few lines of code Check out the new features and API in the thread below! 🧵
Tweet media one
7
44
137
@younesbelkada
younes
11 months
Did you know that you can use 🤗 PEFT library to inject LoRA/AdaLoRA/IA3 into any PyTorch module? Simply use `inject_adapter_in_model` by passing the corresponding peft config and model. As simple as that! 🧵
Tweet media one
1
25
133
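A minimal sketch close to the PEFT docs, injecting LoRA into a plain PyTorch module:

```python
import torch
from peft import LoraConfig, inject_adapter_in_model

class DummyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(10, 10)

    def forward(self, x):
        return self.linear(x)

# Target the module by name; it is swapped for a LoRA-wrapped layer in place.
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["linear"])
model = inject_adapter_in_model(lora_config, DummyModel())
print(model.linear)  # now a LoRA layer wrapping the original nn.Linear
```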
@younesbelkada
younes
10 months
A few months ago, researchers from the MIT HAN Lab released AWQ. The method is now supported in the 🤗 transformers library! As simple as 1- `pip install autoawq` (or install the llm-awq kernels) and 2- calling `from_pretrained`. Great work from the MIT HAN Lab folks, Casper Hansen & @TheBlokeAI 🧵
Tweet media one
2
21
129
@younesbelkada
younes
1 year
PEFT + Transformers 🤝 With the new release of PEFT, we propose an alternative, new way to load and train adapters with PEFT as the backend. What does that change in your codebase, and what are the differences? 🧵
Tweet media one
4
38
121
@younesbelkada
younes
7 months
Great work from the community to make @huggingface transformers + quantization easily extensible, so that new quantization methods can be added to transformers core! poedator worked hard to create a new class: `HfQuantizer`. Why is this important, and what does the class do exactly?
Tweet media one
1
16
116
@younesbelkada
younes
1 year
Excited to announce a new set of features in the TRL library: RewardTrainer and SFTTrainer! Also with PEFT support 🔥 You can now use TRL and get all the components out of the box for end-to-end Reinforcement Learning from Human Feedback (RLHF) training
Tweet media one
2
39
118
@younesbelkada
younes
2 years
Use BetterTransformer from @PyTorch for faster @huggingface models with a one-liner. On CPU and GPU. Supports most text, vision, and audio models. Use it now! blogpost: Colab demo: ⬇️
Tweet media one
@PyTorch
PyTorch
2 years
Better Transformer for #PyTorch out of the box performance on 🤗 @huggingface models now available! Want to know more about the collaboration? 👀 the blog: 🖥️ Watch the 🤗 talk at #PyTorchConference w/ @GuggerSylvain & @lysandrejik :
1
19
111
2
20
116
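The one-liner in question, as a sketch (requires `pip install optimum`; model id illustrative):

```python
from transformers import AutoModel
from optimum.bettertransformer import BetterTransformer

model = AutoModel.from_pretrained("bert-base-uncased")
# Swap supported layers for PyTorch's fused fastpath kernels; works on CPU and GPU.
model = BetterTransformer.transform(model)
```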
@younesbelkada
younes
8 months
To increase QLoRA performance, one should use the LoftQ initialization trick: the amazing thing is that it should work out of the box for SFT, DPO, etc. Has anyone tried it with DPO? Read more about LoftQ:
2
22
115
@younesbelkada
younes
7 months
New release for the TRL library! 0.7.11: DPO & IPO fixes, faster data processing in multi-GPU environments, and many more nice contributions from the community! 🔥
Tweet media one
1
21
100
@younesbelkada
younes
10 months
New release for the PEFT library! Version 0.6.0 🤩 Diffusers integration, merging 4-bit weights, new adapter methods, and many more! Check out the highlights of this new release in this thread, or check out the release notes:
Tweet media one
2
26
99
@younesbelkada
younes
4 months
New to @huggingface transformers + quantization? We just refactored the quantization documentation a bit to make it clearer which features are supported by each quantization method Any feedback appreciated!
Tweet media one
2
26
98
@younesbelkada
younes
9 months
Thanks @MuCai7 @imhaotian for your great work on open-sourcing VipLlava, a new version of Llava! Thanks to that, the model is now fully integrated into @huggingface transformers for anyone to use
Tweet media one
2
25
92
@younesbelkada
younes
6 months
Excited to share the 0.8.0 release of TRL, which packs many new features: a CLI (command-line interface), KTOTrainer, and QLoRA + FSDP integration! Run SFT, DPO, and chat with your model directly from the terminal! 🧵
3
14
88
@younesbelkada
younes
9 months
unsloth: seems to be a great library for LLM fine-tuning that relies on the HF ecosystem (PEFT, TRL, ...) and is compatible with HF models. Has anyone tried it yet, and what do you think?
2
13
86
@younesbelkada
younes
1 year
Did you know that Flash Attention 1 was already integrated in @huggingface transformers? Let's see how to use it, and when it cannot be used 🧵
Tweet media one
3
27
84
@younesbelkada
younes
4 months
🚨 New optimizer in the @huggingface transformers Trainer 🚨 The LOMO optimizer can now be used in the transformers library Great work from the LOMO authors! 🧵
Tweet media one
2
17
80
@younesbelkada
younes
1 year
Brand new release for the PEFT library! This includes a more user-friendly experience for PEFT users. You can now load a PEFT model in one line of code using PEFT's auto mapping. What does that mean, and how/when should you use it? 🧵 Full release notes:
Tweet media one
3
20
79
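A sketch of the one-line auto-mapping load (the adapter repo is a placeholder): the Auto class reads the adapter config, pulls the matching base model, and wires the adapter in for you.

```python
from peft import AutoPeftModelForCausalLM

# One line: base model + adapter resolved from the adapter repo's config.
model = AutoPeftModelForCausalLM.from_pretrained("my-username/opt-350m-lora")
```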
@younesbelkada
younes
9 months
Awesome to see Llava / BakLlava integrated into transformers with out-of-the-box support for optimizations such as pipelines, bitsandbytes, and Flash Attention! Big thanks to @imhaotian and all the Llava and BakLlava authors for making this happen!
4
17
76
@younesbelkada
younes
1 year
With the recent releases of @huggingface transformers and @TimDettmers' bitsandbytes, you can push and load 8-bit models out of the box! Let's make model repositories lighter and open up many use cases (Google Colab, etc.). Who is going to push the first bloom-176-8bit to the Hub? 🧵
2
13
74
@younesbelkada
younes
5 months
Llama-3 from @metaai is out! Fine-tune the model and chat with it directly from the terminal using TRL! Below is an example of fine-tuning Llama-3 with QLoRA
Tweet media one
1
7
71
@younesbelkada
younes
1 year
Fine-tune BLIP-2 for captioning custom images at low cost using int8 quantization and PEFT on a Google Colab! 🧠 Here we decided to fine-tune BLIP-2 on some favorite football players! Code: Notebook link:
Tweet media one
5
22
66
@younesbelkada
younes
2 years
You can now fine-tune Whisper large on a single Google Colab using 8-bit quantization! 🤯
0
7
65
@younesbelkada
younes
1 year
New release for the @huggingface PEFT library 🤩 GPTQ integration, a low-level API, and more. What's new in the release? 🧵 Release notes:
Tweet media one
1
12
66
@younesbelkada
younes
4 months
🚨 New PEFT release! 0.11.x 🚨 Along with new PEFT methods (BOFT, VeRA, PiSSA initialization), you can also use LoRA with the HQQ and EETQ quantization methods!
Tweet media one
2
17
67
@younesbelkada
younes
2 years
Check out the latest blogpost on @huggingface! What is 8-bit quantization? What is so special about LLM.int8() quantization? How do you use it in your Hugging Face models? We answer all these questions there!
0
15
63
@younesbelkada
younes
1 year
I asked an RLHF-ed Llama whether I should use Flutter or React Native to build an app, and here's what it said. Try the amazing app now: 🦙
Tweet media one
3
15
62
@younesbelkada
younes
8 months
If you are interested in faster LLM fine-tuning, have a look at the unsloth library! @danielhanchen made a very nice and clear blogpost on how to use the library, together with the expected speedups, with reproducible Google Colab notebooks!
@danielhanchen
Daniel Han
8 months
Want to finetune LLMs 2x faster and use 60% less VRAM? We did a blog with @huggingface to show how @unslothai makes SFT & DPO 2x faster with QLoRA! We provide benchmarks on A100s and Tesla T4s, and provide 4 free finetuning notebooks through Google Colab!
10
32
170
2
17
63
@younesbelkada
younes
6 months
🚨 New PEFT release 🚨 Check out the new 0.10.0 release of PEFT, which packs some new features! How can we efficiently perform layer expansion using LoRA? Let's dive into a new feature: layer replication / expansion 🧵
1
14
60
@younesbelkada
younes
4 months
🚨 New feature alert in transformers 🚨 Load GGUF models and convert them to transformers format - you can load any supported GGUF model from the Hub and convert it to transformers with this simple API 🧵
Tweet media one
2
12
57
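A sketch of the API (repo and filename are placeholders for any supported GGUF checkpoint):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
gguf_file = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"

# The GGUF weights are dequantized into a regular transformers state dict on load.
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
```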
@younesbelkada
younes
2 years
What a contribution from @mennen_lars & @_navjotts_! You can now load and use flan-t5-xxl and flan-t5-xl in 8-bit with no performance degradation 🔥 The PR for the fix:
Tweet media one
1
12
53
@younesbelkada
younes
11 months
You can combine xformers memory-efficient attention with PEFT + diffusers, thanks to a contribution from GitHub! Below is a script that mixes a toy-style and a pixel-art adapter to produce a combined result! Check out the documentation:
Tweet media one
4
11
52
@younesbelkada
younes
2 years
MoE models are considered to be the next breakthrough in NLP architectures due to their efficient scaling properties (👀 GPT-4). The openly accessible Switch Transformers () scale up to 1.6T parameters! 🤯 But what exactly is an "expert"?
1
3
51
@younesbelkada
younes
8 months
A new feature in the 🤗 ecosystem - introducing model tagging You can add custom tags to your transformers model so that models can be filtered on the Hub. Below is an example of how to tag a model with `trl` and `dpo` tags and push it to the Hub! ⏬
Tweet media one
3
17
51
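A hedged sketch of the tagging flow (repo id is a placeholder; `add_model_tags` is, to the best of my reading, the helper this feature shipped with):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

# Tags are written into the model card so the model can be filtered on the Hub.
model.add_model_tags(["trl", "dpo"])
model.push_to_hub("my-username/opt-350m-dpo")  # placeholder repo id
```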
@younesbelkada
younes
9 months
New release for PEFT! PEFT is now safer through safetensors, faster, and has more adapter methods. `pip install -U peft`
Tweet media one
1
12
50
@younesbelkada
younes
2 years
Very cool to see that BLIP from @salesforce has been integrated into @huggingface transformers! Based on @NielsRogge's tutorial on how to fine-tune GiT on your custom image captioning dataset, we just made a similar notebook for BLIP: Check it out now!
0
10
50
@younesbelkada
younes
9 months
Thanks to efforts from the community, 4-bit models from bitsandbytes are now serializable and you can push them to the @huggingface Hub! Great work from everyone involved!
2
10
48
@younesbelkada
younes
2 years
... seems that you can also load all these models in 8-bit for free using @Tim_Dettmers' bitsandbytes library! Check it out!
Tweet media one
@art_zucker
Arthur Zucker
2 years
@quocleix Ohhh looks like FLAN T5 is now also available in 🤗 ✌️👉() Was that quick? @younesbelkada anything to add?
Tweet media one
3
19
140
1
8
44
@younesbelkada
younes
4 months
The EETQ library is now integrated into the @huggingface ecosystem (transformers & PEFT) - great work from the EETQ authors and collaborators! EETQ uses simple int8 RTN quantization with efficient kernels for faster inference 🚀
Tweet media one
1
10
41
@younesbelkada
younes
6 months
🚨 New release for PEFT! 🚨 Support for AWQ & AQLM quantization - you can now: - Train adapters on top of 2-bit quantized models with AQLM - Train adapters on top of powerful AWQ quantized models Note: for inference, you can't merge the LoRA weights into the base model!
4
8
42
@younesbelkada
younes
11 months
Very cool to see the @salesforce collection for the BLIP family on the 🤗 Hub! You can now easily pick up your favorite BLIP model and use it right away. BLIP is one of my favorite architectures as of today. Curious to hear: what is your favorite collection on the Hub?
Tweet media one
2
6
39
@younesbelkada
younes
1 year
Big milestone for the @salesforce AI team! BLIP image captioning models have now reached more than half a million downloads on the 🤗 @huggingface Hub! 🤯 How do you use these models, and what have you built on top of them?
Tweet media one
2
6
33
@younesbelkada
younes
5 months
You can now upvote @huggingface blogposts! What is your favorite blogpost, and what blogpost would you like to see next?
Tweet media one
0
4
35
@younesbelkada
younes
6 months
GaLore is officially supported in @huggingface transformers! Check out the blogpost below for more details!
@Titus_vK
Titus von Koeller
6 months
🔥 Level up your model training w/ GaLore + Transformers for SOTA results on consumer-grade hardware! ⬇️ 82.5% less optimizer state memory footprint without performance degradation, by expressing the gradient weight matrix as low-rank. 🔬 Read the blog:
Tweet media one
2
37
158
0
6
33
@younesbelkada
younes
2 years
Now, users can easily exchange adapter weights and take advantage of fine-tuned performance without downloading the entire set of fine-tuned models. This leverages a new library in the @huggingface ecosystem called "peft" (Parameter-Efficient Fine-Tuning)
1
0
33
@younesbelkada
younes
2 years
We also released a demo on how to fine-tune your first MoE model on text summarization. Check it out, fine-tune your first MoE, and share it on the Hub!
1
2
30
@younesbelkada
younes
1 year
Very excited about @1littlecoder's recent tutorial on how to fine-tune Falcon-7B using many HF ecosystem tools (TRL, PEFT, transformers). SFTTrainer from TRL offers a smoother UX when training models. Check it out now:
Tweet media one
2
9
32
@younesbelkada
younes
1 year
New release for the TRL library! A brand-new README and multiple important bug fixes for the SFTTrainer. Very cool to see the adoption of TRL in more and more OSS libraries, such as autotrain: Many cool features coming in the next few weeks, so stay tuned! 🧵
Tweet media one
2
7
30
@younesbelkada
younes
7 months
New release for the PEFT library: 0.8.0! Very glad to see amazing contributions from the community adding new PEFT methods and enhancing the UX for LoRA! Check out the full release notes: 🧵
Tweet media one
1
9
30
@younesbelkada
younes
1 year
A few days ago, I had the honour of attending the DataFest conference at the American University of Armenia in Yerevan! I learned a lot during that conference and met many amazing people. Curious about what I talked about? ⬇️ 👀
Tweet media one
1
4
29
@younesbelkada
younes
1 year
New release for the TRL library! 🚨 DDPO (Denoising Diffusion Policy Optimization) for applying RLHF to diffusion models is now part of the TRL library! A great contribution from metric-space that adds the first support for diffusion models in TRL
Tweet media one
1
11
28
@younesbelkada
younes
2 years
Do you want to get rid of OOM issues when using huggingface models 🤗 on GPU? Load your model in 8-bit using bitsandbytes with `load_in_8bit=True`! Learn more about this feature here, in case you missed it. Also check out the new release of bitsandbytes!
Tweet media one
@Tim_Dettmers
Tim Dettmers
2 years
Release 0.35 of bitsandbytes brings CUDA 11.8 to the library, making it more straightforward to fine-tune #stablediffusion Dreambooth on 12 GB Colab! At this point, bnb has been pip installed more than 100k times. Thanks for all your support and bug reports!
1
11
85
0
1
27
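A minimal sketch of the flag (model id illustrative; requires `bitsandbytes` and `accelerate`):

```python
from transformers import AutoModelForCausalLM

# LLM.int8() quantization via bitsandbytes roughly halves fp16 memory usage.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",
    load_in_8bit=True,
    device_map="auto",
)
```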
@younesbelkada
younes
2 years
With this integration, we hope to make RLHF applied to large language models more accessible to the community! Docs and links to scripts: Stay tuned for the next steps in TRL!
Tweet media one
4
2
25
@younesbelkada
younes
6 months
This is live in bitsandbytes 0.43.0!
@jeremyphoward
Jeremy Howard
6 months
Today, with @Tim_Dettmers , @huggingface , & @mobius_labs , we're releasing FSDP/QLoRA, a new project that lets you efficiently train very large (70b) models on a home computer with consumer gaming GPUs. 1/🧵
85
682
4K
1
2
25
@younesbelkada
younes
2 years
Link to the largest model (Switch-C):
2
2
23
@younesbelkada
younes
2 years
A few months ago, @GoogleAI publicly released Switch Transformers, a set of MoE models from 300M to 1.6T parameters! They are all available on the Hub for anyone to try out! Note that the weights are pre-trained; one should fine-tune these models before using them!
1
0
24
@younesbelkada
younes
7 months
There are now cute and nice tags on the @huggingface Hub, such as the axolotl / TRL tags! Which tag should we add next? 👀
Tweet media one
Tweet media two
2
1
23
@younesbelkada
younes
2 years
The architecture is very similar to a classic T5 model, with the feed-forward layer replaced by a sparse feed-forward layer. Each token is processed by a different MLP thanks to the router module, which guides each token to a specific MLP (called an "expert").
Tweet media one
1
1
23
@younesbelkada
younes
1 year
You can now load GPTQ models out of the box thanks to @huggingface transformers and the AutoGPTQ library. The integration also comes with various features (exllama kernels, PEFT support, ...). Check out the thread below for more details.
@_marcsun
Marc Sun
1 year
LLMs just got faster and lighter with 🤗 Transformers x AutoGPTQ ! You can now load your models from @huggingface with GPTQ quantization. Enjoy faster inference speed and lower memory usage than existing supported quantization schemes 🚀 Blogpost:
6
59
225
0
5
22
@younesbelkada
younes
2 years
... and the Colab link to fine-tune your model loaded in int8:
3
3
19
@younesbelkada
younes
9 months
Check out the thread below for a summary of the amazing work from @casper_hansen_ and the community! 🔥 You can now run Mixtral-AWQ in 4-bit using autoawq and transformers
@casper_hansen_
Casper Hansen
9 months
Since the summer of 2023, I have been working hard on AutoAWQ. You can now quantize Mixtral 8x7B, LLaVa, and other models and run inference through the transformers integration. 🧵 1/5
4
14
89
0
2
19
@younesbelkada
younes
1 year
Use 4-bit quantization to fit the model on your GPU and train LoRA adapters on top of it. This method, termed QLoRA, was introduced by @ArtidoroPagnoni, @Tim_Dettmers et al. in this paper:
1
2
18
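A sketch of the QLoRA recipe from the thread (model id and hyperparameters are illustrative, loosely following the paper's defaults):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants too
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

# Frozen 4-bit base + small trainable LoRA matrices = QLoRA.
model = get_peft_model(model, LoraConfig(r=64, lora_alpha=16, task_type="CAUSAL_LM"))
```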
@younesbelkada
younes
2 years
The DPT-hybrid depth estimation model from @Intel labs has just been added to the @huggingface transformers main branch! Guess what is coming next... 🧨 Check the hint below
1
6
19
@younesbelkada
younes
1 year
Leveraging 4-bit, you can even fine-tune the largest model (70B) on a single A100 80GB GPU! 🤯
1
0
18
@younesbelkada
younes
2 years
Run the model locally now using @huggingface transformers! You can run it in half precision or in 8-bit for memory-efficient usage! 🔥
Tweet media one
2
0
17
@younesbelkada
younes
2 years
Step 1: load the active model in 8-bit precision, drastically reducing its memory footprint: you now need roughly 1GB per billion parameters. This means that to load a 20B-parameter model, you would need roughly 20GB of GPU VRAM.
Tweet media one
1
1
17
@younesbelkada
younes
7 months
5/ Load the GGUF quantized models. GGUF weights have been updated in the model repositories; you can load them using llama.cpp! Find the GGUF weights here:
3
1
15
@younesbelkada
younes
2 years
Step 3: In PPO we also need a copy of the reference model. No worries: we can simply deactivate the adapters when doing inference, thus using the same model as both the reference and the active model!
Tweet media one
1
0
16
@younesbelkada
younes
2 years
As we strive to make large models more accessible to everyone, the model supports int8 loading using @TimDettmers' bitsandbytes, making it possible to run some checkpoints in Google Colab!
Tweet media one
1
1
16
@younesbelkada
younes
1 year
General usage notebook for loading any HF model in 4bit:
1
3
16
@younesbelkada
younes
2 years
News from Japan! Thanks to @eltociear's contribution, the @huggingface transformers README has been translated into Japanese! Please check it out! We look forward to more contributions 🙏
0
6
16