younes

@younesbelkada

3,793 Followers
294 Following
126 Media
1,490 Statuses

Joined July 2022
@younesbelkada
younes
1 year
Llama-2 just got released by @Meta AI and you can already use it in the @huggingface ecosystem. How do you fine-tune the model on your own data? We release a simple fine-tuning script for single- & multi-GPU setups to get you ready in a few lines of code
Tweet media one
9
266
1K
@younesbelkada
younes
2 years
You asked for it. You can now fine-tune a model that has been loaded in 8-bit. With 8-bit fine-tuning, every 1B parameters costs only ~1GB of GPU RAM, making it easy to fine-tune any large model. Colab to fine-tune OPT-6.7B in int8 below 🧵
Tweet media one
11
159
891
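A minimal sketch of the recipe described above, using today's transformers/PEFT APIs (the model id and LoRA hyperparameters are illustrative, not from the original Colab):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model with its weights quantized to int8 via bitsandbytes.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-6.7b",
    load_in_8bit=True,
    device_map="auto",
)

# Cast layer norms and enable input gradients so training stays stable.
model = prepare_model_for_kbit_training(model)

# The int8 base stays frozen; only the small LoRA matrices are trained.
lora_config = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05, bias="none", task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```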
@younesbelkada
younes
2 years
The first trillion-parameter model on the Hub 🤯 Today we are proud to announce the release of the first Mixture of Experts (MoE) 🧙 models in @huggingface transformers! You can now easily run, train, and explore this fascinating architecture in the Hugging Face ecosystem! ⬇️
14
129
841
@younesbelkada
younes
1 year
A huge day for open source! 🔥 You can now load models from @huggingface in 4-bit precision using `load_in_4bit` and the bitsandbytes library, with no performance degradation. Announcement notes here: Useful resources below
Tweet media one
@Tim_Dettmers
Tim Dettmers
1 year
QLoRA: 4-bit finetuning of LLMs is here! With it comes Guanaco, a chatbot on a single GPU, achieving 99% ChatGPT performance on the Vicuna benchmark: Paper: Code+Demo: Samples: Colab:
Tweet media one
90
948
4K
5
169
753
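For reference, a minimal sketch of the flag mentioned above (model id is a placeholder; requires `bitsandbytes` and `accelerate` installed):

```python
from transformers import AutoModelForCausalLM

# Weights are quantized to 4-bit on the fly at load time.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    load_in_4bit=True,
    device_map="auto",
)
```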
@younesbelkada
younes
2 years
Fine-tune a 20B language model with RLHF using a 24GB consumer GPU? 🤯 It is now possible using TRL + PEFT! Check out the blogpost that explains how we achieve this, step by step! Blogpost:
Tweet media one
6
161
718
@younesbelkada
younes
1 year
New feature alert in the @huggingface ecosystem! Flash Attention 2 is natively supported in huggingface transformers, and it supports training, PEFT, and quantization (GPTQ, QLoRA, LLM.int8). First `pip install flash-attn`, then pass `use_flash_attention_2=True` when loading the model!
Tweet media one
8
103
524
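A hedged sketch of the flag from the tweet (model id illustrative; newer transformers versions expose this as `attn_implementation="flash_attention_2"` instead):

```python
import torch
from transformers import AutoModelForCausalLM

# Flash Attention 2 requires fp16/bf16 weights and a CUDA GPU.
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    torch_dtype=torch.bfloat16,
    use_flash_attention_2=True,
    device_map="auto",
)
```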
@younesbelkada
younes
2 years
Interested in applying RLHF (Reinforcement Learning from Human Feedback)? Try out trl! At @huggingface we now officially support RLHF training using PPO (Proximal Policy Optimization). Easily train your model in a single- or multi-GPU setup. 🧵
Tweet media one
3
87
459
@younesbelkada
younes
1 year
MatCha and DePlot from @GoogleAI! 🧠 A set of foundation models for plots and charts that can perform complex visual reasoning tasks such as plot summarisation/VQA. When combined with instruction-tuned LMs, you can create interesting demos, such as the one below ↓
Tweet media one
8
88
446
@younesbelkada
younes
10 months
The IPO algorithm, a new method from Google DeepMind, has just been added to the Hugging Face TRL library! Try it out now by installing TRL from source; simply pass `loss_type="ipo"` when initializing DPOTrainer:
5
72
429
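A minimal sketch of switching DPOTrainer to the IPO loss, per the TRL API of that release (model id and the toy preference dataset are illustrative):

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

# DPO-style trainers expect prompt/chosen/rejected preference triples.
train_dataset = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "chosen": ["Paris."],
    "rejected": ["London."],
})

trainer = DPOTrainer(
    model,
    beta=0.1,
    loss_type="ipo",  # use the IPO objective instead of the default DPO loss
    args=TrainingArguments(output_dir="opt-ipo", per_device_train_batch_size=1),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```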
@younesbelkada
younes
2 years
BLIP-2 in 8-bit! 🧠 @Salesforce has uploaded the first multi-modal chatbot to the Hugging Face Hub! 🤯 BLIP-2 was released and open-sourced last week by @Salesforce; run the model in 8-bit and start dialoguing with it in a few lines of code!
Tweet media one
7
55
289
@younesbelkada
younes
7 months
Mixtral on a free-tier Google Colab with AQLM 2-bit quantization! 🤯 Similarly to QuIP#, the AQLM quantization method makes it possible to squeeze LLMs into an impressively compact format, with a peak memory of ~13GB for Mixtral! notebook:
5
55
277
@younesbelkada
younes
2 years
You liked Flan-T5? 🍮 You'll like Flan-UL2, now on Hugging Face, even more! Thanks @YiTayML @google for making the weights of the Flan-UL2 model open source! Repo: Spaces: Inference endpoint: 🧵
Tweet media one
11
42
273
@younesbelkada
younes
9 months
Following up on the great work from the community that enabled bitsandbytes 4-bit serialization, I pushed Mixtral-Instruct-bnb-4bit to @huggingface for anyone who wants to easily load the model
7
32
243
@younesbelkada
younes
2 years
Did you know that you can load @OpenAI's Whisper model in 8-bit using LLM.int8() from bitsandbytes & @TimDettmers? How does this quantization technique affect the model's performance? @ArthurZucker ran some evaluations with 8-bit models and here are the results ⬇️
Tweet media one
5
36
244
@younesbelkada
younes
1 year
New release for the PEFT library! 🔥 Did you know that you can now easily "merge" the LoRA adapter weights into the base model and use the merged model as a standalone model? How is this possible? Let's do some math... 🧵 Release notes:
Tweet media one
5
60
231
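A sketch of the merge described above (adapter repo is a placeholder). Mathematically, merging folds each LoRA update into the frozen weight, W ← W + (α/r)·BA, so no extra modules are needed at inference:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
model = PeftModel.from_pretrained(base, "my-username/opt-350m-lora")  # placeholder adapter repo

# Fold the LoRA weights into the base model and drop the adapter wrappers.
merged = model.merge_and_unload()
merged.save_pretrained("opt-350m-merged")  # usable as a standalone transformers model
```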
@younesbelkada
younes
1 year
MPT models from @MosaicML are now part of the @huggingface transformers family 🤗 Among other things, this means a direct integration with PEFT library; you can fine-tune the 7B model on a free Google Colab instance or the 30B model on a 40GB GPU! 🧵
Tweet media one
5
60
233
@younesbelkada
younes
11 months
New feature alert! 🚨 NEFTune, a new regularization technique that enhances model performance for supervised fine-tuning, has now shipped in @huggingface's TRL library! Boost your model's performance with a single line of code! 🧵
Tweet media one
6
56
208
@younesbelkada
younes
1 year
Brand new release for the TRL library! - Train LLMs on small infra with QLoRA - Instruction-tune in a few lines with the SFTTrainer - Simple reward modeling with the RewardTrainer And much more! Check out the 🧵 with all features:
@argilla_io
Argilla
1 year
🔥 Thrilled to share our new tutorial: collecting human preference data and training a reward model with the awesome trl by @huggingface. The very first end-to-end example of the new RewardTrainer in trl. Colab with @younesbelkada & @lvwerra
Tweet media one
1
25
93
4
44
188
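A minimal sketch of the SFTTrainer flow from that release (dataset is illustrative; this mirrors the TRL quickstart of the time):

```python
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")

trainer = SFTTrainer(
    "facebook/opt-350m",        # a model id or an already-instantiated model
    train_dataset=dataset,
    dataset_text_field="text",  # column holding the raw text to train on
    max_seq_length=512,
)
trainer.train()
```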
@younesbelkada
younes
6 months
Very excited about this new collaboration between @AndrewYNg's team and the HF team (@_marcsun, @mariaKhalusova)! You will learn how to get started with the transformers library and the HF ecosystem, and how to build AI applications from scratch. Enjoy!
@AndrewYNg
Andrew Ng
6 months
New short course: Open Source Models with Hugging Face 🤗, taught by @mariaKhalusova , @_marcsun , and Younes Belkada! @huggingface has been a game changer by letting you quickly grab any of hundreds of thousands of already-trained open source models to assemble into new
39
194
1K
7
24
148
@younesbelkada
younes
7 months
Huge news for the open-source community! 💎 @Google just released a new model, Gemma, a series of language models from 2B to 7B parameters, with their instruction fine-tuned versions! Play with it right now, and below is a thread of all the things you can do 🧵
Tweet media one
3
27
172
@younesbelkada
younes
2 years
A new model has been added for Christmas in @huggingface transformers! 🎄 With BLIP from Salesforce you can perform 1- Visual question answering 2- Image captioning (with and without context) 3- Image-text retrieval Here are some cool demos you can build easily with this model 🧵
3
40
172
@younesbelkada
younes
4 months
Another quantization method has dropped in the @huggingface transformers library! Half-Quadratic Quantization 🔥 HQQ implements on-the-fly quantization via fast, robust optimization. It doesn't require calibration data and can be used to quantize any model, down to 1-bit precision!
Tweet media one
5
33
168
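A hedged sketch of on-the-fly HQQ loading (config fields follow the transformers quantization docs; model id is a placeholder):

```python
from transformers import AutoModelForCausalLM, HqqConfig

# HQQ needs no calibration data; weights are quantized at load time.
quant_config = HqqConfig(nbits=4, group_size=64)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    device_map="cuda",
    quantization_config=quant_config,
)
```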
@younesbelkada
younes
2 years
Can we detoxify a language model using the same techniques as in RL? Yes! We used TRL to detoxify LM with up to 6B parameters. A report of our journey 🧵 Docs: Demo:
7
44
160
@younesbelkada
younes
9 months
Blazing fast text generation using AWQ and fused modules! 🚀 Up to 3x speedup compared to native fp16, which you can use right now on any model supported by @TheBlokeAI. Simply pass an `AwqConfig` with `do_fuse=True` to the `from_pretrained` method!
6
22
161
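A sketch of the fused-modules setup (the checkpoint name is a placeholder for any AWQ model; `fuse_max_seq_len` bounds the fused cache):

```python
from transformers import AutoModelForCausalLM, AwqConfig

quant_config = AwqConfig(
    bits=4,
    do_fuse=True,          # fuse attention/MLP blocks into faster kernels
    fuse_max_seq_len=512,  # required when fusing
)

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-OpenOrca-AWQ",  # any AWQ checkpoint works here
    quantization_config=quant_config,
).to("cuda")
```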
@younesbelkada
younes
1 year
A new release for TRL! 🤯 1- Larger-model support using naive pipeline parallelism: you can now fit 100B-scale models and apply RLHF! 2- PEFT + data parallelism: parallelize your training to train on more data and more GPUs Full release notes here:
Tweet media one
0
41
150
@younesbelkada
younes
1 year
Pix2Struct, a set of image captioning & visual question answering models, has recently been released by @GoogleAI and added to the Hub - this includes no fewer than 20 new fine-tuned checkpoints! 🤯 How do you use it in @huggingface transformers?
Tweet media one
2
34
149
@younesbelkada
younes
9 months
In the recent TRL release (0.7.6) we are happy to ship many cool features to the community for the DPOTrainer! IPO, KTO, and cDPO losses, support for pre-computed logits, and many more! Check out the thread below for more details 🧵
Tweet media one
3
34
150
@younesbelkada
younes
10 months
New feature alert! 🚨 transformers 4.35.0 just got released... and NEFTune has now been extended to transformers' Trainer API! Simply pass `neftune_noise_alpha=xxx` in your TrainingArguments... and that's it!
Tweet media one
4
16
143
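The one-liner in context, as a minimal sketch (the alpha value is illustrative; the NEFTune paper suggests values such as 5, 10, or 15):

```python
from transformers import TrainingArguments

# NEFTune adds uniform noise to embedding outputs during training only;
# it is automatically disabled at evaluation/inference time.
args = TrainingArguments(
    output_dir="my-sft-run",
    neftune_noise_alpha=5,
)
```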
@younesbelkada
younes
1 year
Great that Segment Anything from @Meta has been open sourced! You can now use SAM with @huggingface transformers in a few lines of code and generate segmentation masks. How do you use it? 🧵 More to come very soon in the 🤗 ecosystem!
Tweet media one
4
33
136
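A minimal sketch using the mask-generation pipeline (the checkpoint and image URL are illustrative):

```python
from transformers import pipeline

generator = pipeline("mask-generation", model="facebook/sam-vit-base", device=0)

# Returns a dict with boolean segmentation masks and their quality scores.
outputs = generator(
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    points_per_batch=64,
)
masks = outputs["masks"]
```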
@younesbelkada
younes
11 months
Diffusers + PEFT 🤝 We have recently shipped the PEFT integration in diffusers! You can now combine LoRA adapters from any format in @huggingface diffusers with a few lines of code Check out the new features and API in the thread below! 🧵
Tweet media one
7
44
137
@younesbelkada
younes
11 months
Did you know that you can use 🤗 PEFT library to inject LoRA/AdaLoRA/IA3 into any PyTorch module? Simply use `inject_adapter_in_model` by passing the corresponding peft config and model. As simple as that! 🧵
Tweet media one
1
25
133
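A minimal sketch close to the PEFT docs, injecting LoRA into a plain PyTorch module:

```python
import torch
from peft import LoraConfig, inject_adapter_in_model

class DummyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(10, 10)

    def forward(self, x):
        return self.linear(x)

# Target the module by name; it is swapped for a LoRA-wrapped layer in place.
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["linear"])
model = inject_adapter_in_model(lora_config, DummyModel())
print(model.linear)  # now a LoRA layer wrapping the original nn.Linear
```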
@younesbelkada
younes
10 months
A few months ago, researchers from the MIT HAN Lab released AWQ. The method is now supported in the 🤗 transformers library! As simple as 1- `pip install autoawq` (or install the llm-awq kernels) and 2- calling `from_pretrained`. Great work from the MIT HAN Lab folks, Casper Hansen & @TheBlokeAI 🧵
Tweet media one
2
21
129
@younesbelkada
younes
1 year
PEFT + Transformers 🤝 With the new release of PEFT, we propose an alternative, new way to load and train adapters with PEFT as the backend. What does that change in your codebase, and what are the differences? 🧵
Tweet media one
4
38
121
@younesbelkada
younes
7 months
Great work from the community to make @huggingface transformers + quantization easily extensible, so that new quantization methods can be added to transformers core! poedator worked hard to create a new class: `HfQuantizer`. Why is this important, and what does the class do exactly?
Tweet media one
1
16
116
@younesbelkada
younes
1 year
Excited to announce a new set of features in the TRL library: RewardTrainer and SFTTrainer! Also with PEFT support 🔥 You can now use TRL and get all the components out of the box for end-to-end Reinforcement Learning from Human Feedback (RLHF) training
Tweet media one
2
39
118
@younesbelkada
younes
2 years
Use BetterTransformer from @PyTorch for faster @huggingface models with a one-liner. On CPU and GPU. Supports most text, vision, and audio models. Use it now! blogpost: Colab demo: ⬇️
Tweet media one
@PyTorch
PyTorch
2 years
Better Transformer for #PyTorch out of the box performance on 🤗 @huggingface models now available! Want to know more about the collaboration? 👀 the blog: 🖥️ Watch the 🤗 talk at #PyTorchConference w/ @GuggerSylvain & @lysandrejik :
1
19
111
2
20
116
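The one-liner in question, as a sketch (requires `pip install optimum`; model id illustrative):

```python
from transformers import AutoModel
from optimum.bettertransformer import BetterTransformer

model = AutoModel.from_pretrained("bert-base-uncased")
# Swap supported layers for PyTorch's fused fastpath kernels; works on CPU and GPU.
model = BetterTransformer.transform(model)
```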
@younesbelkada
younes
8 months
To increase QLoRA performance, one should use the LoftQ initialization trick: the amazing thing is that it should work out of the box for SFT, DPO, etc. Has anyone tried it with DPO? Read more about LoftQ:
2
22
115
@younesbelkada
younes
7 months
New release for the TRL library! 0.7.11: DPO & IPO fixes, faster data processing in multi-GPU environments, and many more nice contributions from the community! 🔥
Tweet media one
1
21
100
@younesbelkada
younes
10 months
New release for the PEFT library! Version 0.6.0 🤩 Diffusers integration, merging 4-bit weights, new adapter methods, and many more! Check out the highlights of this new release in this thread, or check out the release notes:
Tweet media one
2
26
99
@younesbelkada
younes
4 months
New to @huggingface transformers + quantization? We just refactored the quantization documentation a bit to make it clearer which features are supported by each quantization method Any feedback appreciated!
Tweet media one
2
26
98
@younesbelkada
younes
9 months
Thanks @MuCai7 @imhaotian for your great work on open-sourcing VipLlava, a new version of Llava! Thanks to that, the model is now fully integrated into @huggingface transformers for anyone to use
Tweet media one
2
25
92
@younesbelkada
younes
6 months
Excited to share the 0.8.0 release of TRL, which packs many new features: a CLI (command-line interface), KTOTrainer, and QLoRA + FSDP integration! Run SFT, DPO, and chat with your model directly from the terminal! 🧵
3
14
88
@younesbelkada
younes
9 months
unsloth: seems to be a great library for LLM fine-tuning that relies on the HF ecosystem (PEFT, TRL, ...) and is compatible with HF models. Has anyone tried it yet, and what do you think?
2
13
86
@younesbelkada
younes
1 year
Did you know that Flash Attention 1 was already integrated in @huggingface transformers? Let's see how to use it, and when it cannot be used 🧵
Tweet media one
3
27
84
@younesbelkada
younes
4 months
🚨 New optimizer in the @huggingface transformers Trainer 🚨 The LOMO optimizer can now be used in the transformers library Great work from the LOMO authors! 🧵
Tweet media one
2
17
80
@younesbelkada
younes
1 year
Brand new release for the PEFT library! This includes a more user-friendly experience for PEFT users. You can now load a PEFT model in one line of code using PEFT's auto mapping. What does that mean, and how/when should you use it? 🧵 Full release notes:
Tweet media one
3
20
79
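A sketch of the one-line auto-mapping load (the adapter repo is a placeholder): the Auto class reads the adapter config, pulls the matching base model, and wires the adapter in for you.

```python
from peft import AutoPeftModelForCausalLM

# One line: base model + adapter resolved from the adapter repo's config.
model = AutoPeftModelForCausalLM.from_pretrained("my-username/opt-350m-lora")
```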
@younesbelkada
younes
9 months
Awesome to see Llava / BakLlava integrated into transformers with out-of-the-box support for optimizations such as pipelines, bitsandbytes, and Flash Attention! Big thanks to @imhaotian and all the Llava and BakLlava authors for making this happen!
4
17
76
@younesbelkada
younes
1 year
With the recent releases of @huggingface transformers and @TimDettmers' bitsandbytes, you can push and load 8-bit models out of the box! Let's make model repositories lighter and open up many use cases (Google Colab, etc.). Who is going to push the first bloom-176-8bit to the Hub? 🧵
2
13
74
@younesbelkada
younes
5 months
Llama-3 from @metaai is out! Fine-tune the model and chat with it directly from the terminal using TRL! Below is an example of fine-tuning Llama-3 with QLoRA
Tweet media one
1
7
71
@younesbelkada
younes
1 year
Fine-tune BLIP-2 for captioning custom images at low cost using int8 quantization and PEFT on a Google Colab! 🧠 Here we decided to fine-tune BLIP-2 on some favorite football players! Code: Notebook link:
Tweet media one
5
22
66
@younesbelkada
younes
2 years
You can now fine-tune Whisper large on a single Google Colab using 8-bit quantization! 🤯
0
7
65
@younesbelkada
younes
1 year
New release for the @huggingface PEFT library 🤩 GPTQ integration, a low-level API, and more. What's new in the release? 🧵 Release notes:
Tweet media one
1
12
66
@younesbelkada
younes
4 months
🚨 New PEFT release! 0.11.x 🚨 Along with new PEFT methods (BOFT, VeRA, PiSSA initialization), you can also use LoRA with the HQQ and EETQ quantization methods!
Tweet media one
2
17
67
@younesbelkada
younes
2 years
Check out the latest blogpost on @huggingface! What is 8-bit quantization? What is so special about LLM.int8() quantization? How do you use it in your Hugging Face models? We answer all these questions there!
0
15
63
@younesbelkada
younes
1 year
I asked an RLHF-ed Llama whether I should use Flutter or React Native to build an app, and here's what it said. Try the amazing app now: 🦙
Tweet media one
3
15
62
@younesbelkada
younes
8 months
If you are interested in faster LLM fine-tuning, have a look at the unsloth library! @danielhanchen made a very nice and clear blogpost on how to use the library, together with the expected speedups, with reproducible Google Colab notebooks!
@danielhanchen
Daniel Han
8 months
Want to finetune LLMs 2x faster and use 60% less VRAM? We did a blog with @huggingface to show how @unslothai makes SFT & DPO 2x faster with QLoRA! We provide benchmarks on A100s and Tesla T4s, and provide 4 free finetuning notebooks through Google Colab!
10
32
170
2
17
63
@younesbelkada
younes
6 months
🚨 New PEFT release 🚨 Check out the new 0.10.0 release of PEFT, which packs some new features! How can we efficiently perform layer expansion using LoRA? Let's dive into a new feature: layer replication / expansion 🧵
1
14
60
@younesbelkada
younes
4 months
🚨 New feature alert in transformers 🚨 Load GGUF models and convert them to transformers format - you can load any supported GGUF model from the Hub and convert it to transformers with this simple API 🧵
Tweet media one
2
12
57
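A sketch of the API (repo and filename are placeholders for any supported GGUF checkpoint):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
gguf_file = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"

# The GGUF weights are dequantized into a regular transformers state dict on load.
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
```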
@younesbelkada
younes
2 years
What a contribution from @mennen_lars & @_navjotts_! You can now load and use flan-t5-xxl and flan-t5-xl in 8-bit with no performance degradation 🔥 The PR for the fix:
Tweet media one
1
12
53
@younesbelkada
younes
11 months
You can combine xformers memory-efficient attention with PEFT + diffusers, thanks to a contribution from GitHub! Below is a script that mixes a toy-style and a pixel-art adapter to produce a combined result! Check out the documentation:
Tweet media one
4
11
52
@younesbelkada
younes
2 years
MoE models are considered to be the next breakthrough in NLP architectures due to their efficient scaling properties (👀 GPT-4). The openly accessible Switch Transformers () scale up to 1.6T parameters! 🤯 But what exactly is an "expert"?
1
3
51
@younesbelkada
younes
8 months
A new feature in the 🤗 ecosystem - introducing model tagging You can add custom tags to your transformers model so that models can be filtered on the Hub. Below is an example of how to tag a model with `trl` and `dpo` tags and push it to the Hub! ⏬
Tweet media one
3
17
51
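A hedged sketch of the tagging flow (repo id is a placeholder; `add_model_tags` is, to the best of my reading, the helper this feature shipped with):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

# Tags are written into the model card so the model can be filtered on the Hub.
model.add_model_tags(["trl", "dpo"])
model.push_to_hub("my-username/opt-350m-dpo")  # placeholder repo id
```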
@younesbelkada
younes
9 months
New release for PEFT! PEFT is now safer through safetensors, faster, and has more adapter methods. `pip install -U peft`
Tweet media one
1
12
50
@younesbelkada
younes
2 years
Very cool to see that BLIP from @salesforce has been integrated into @huggingface transformers! Based on @NielsRogge's tutorial on how to fine-tune GiT on your custom image captioning dataset, we just made a similar notebook for BLIP: Check it out now!
0
10
50
@younesbelkada
younes
9 months
Thanks to efforts from the community, 4-bit models from bitsandbytes are now serializable and you can push them to the @huggingface Hub! Great work from everyone involved!
2
10
48
@younesbelkada
younes
2 years
... seems that you can also load all these models in 8-bit for free using @Tim_Dettmers' bitsandbytes library! Check it out!
Tweet media one
@art_zucker
Arthur Zucker
2 years
@quocleix Ohhh looks like FLAN T5 is now also available in 🤗 ✌️👉() Was that quick? @younesbelkada anything to add?
Tweet media one
3
19
140
1
8
44
@younesbelkada
younes
4 months
The EETQ library is now integrated into the @huggingface ecosystem (transformers & PEFT) - great work from the EETQ authors and collaborators! EETQ uses simple int8 RTN quantization with efficient kernels for faster inference 🚀
Tweet media one
1
10
41
@younesbelkada
younes
6 months
🚨 New release for PEFT! 🚨 Support for AWQ & AQLM quantization - you can now: - Train adapters on top of 2-bit quantized models with AQLM - Train adapters on top of powerful AWQ quantized models Note: for inference, you can't merge the LoRA weights into the base model!
4
8
42
@younesbelkada
younes
11 months
Very cool to see the @salesforce collection for the BLIP family on the 🤗 Hub! You can now easily pick up your favorite BLIP model and use it right away. BLIP is one of my favorite architectures as of today. Curious to hear: what is your favorite collection on the Hub?
Tweet media one
2
6
39
@younesbelkada
younes
1 year
Big milestone for the @salesforce AI team! BLIP image captioning models have now reached more than half a million downloads on the 🤗 @huggingface Hub! 🤯 How do you use these models, and what have you built on top of them?
Tweet media one
2
6
33
@younesbelkada
younes
5 months
You can now upvote @huggingface blogposts! What is your favorite blogpost, and what blogpost would you like to see next?
Tweet media one
0
4
35
@younesbelkada
younes
6 months
GaLore is officially supported in @huggingface transformers! Check out the blogpost below for more details!
@Titus_vK
Titus von Koeller
6 months
🔥 Level up your model training w/ GaLore + Transformers for SOTA results on consumer-grade hardware! ⬇️ 82.5% less optimizer state memory footprint without performance degradation, by expressing the gradient weight matrix as low-rank. 🔬 Read the blog:
Tweet media one
2
37
158
0
6
33
@younesbelkada
younes
2 years
Now, users can easily exchange adapter weights and take advantage of fine-tuned performance without downloading the entire set of fine-tuned models. This leverages a new library in the @huggingface ecosystem called "peft" (Parameter-Efficient Fine-Tuning)
1
0
33
@younesbelkada
younes
2 years
We also released a demo on how to fine-tune your first MoE model on text summarization. Check it out, fine-tune your first MoE, and share it on the Hub!
1
2
30
@younesbelkada
younes
1 year
Very excited about @1littlecoder's recent tutorial on how to fine-tune Falcon-7B using many HF ecosystem tools (TRL, PEFT, transformers). SFTTrainer from TRL offers a smoother UX when training models. Check it out now:
Tweet media one
2
9
32
@younesbelkada
younes
1 year
New release for the TRL library! A brand-new README and multiple important bug fixes for the SFTTrainer. Very cool to see the adoption of TRL in more and more OSS libraries, such as autotrain: Many cool features coming in the next few weeks, so stay tuned! 🧵
Tweet media one
2
7
30
@younesbelkada
younes
7 months
New release for the PEFT library: 0.8.0! Very glad to see amazing contributions from the community adding new PEFT methods and enhancing the UX for LoRA! Check out the full release notes: 🧵
Tweet media one
1
9
30
@younesbelkada
younes
1 year
A few days ago, I had the honour of attending the DataFest conference at the American University of Armenia in Yerevan! I learned a lot during that conference and met many amazing people. Curious about what I talked about? ⬇️ 👀
Tweet media one
1
4
29
@younesbelkada
younes
1 year
New release for the TRL library! 🚨 DDPO (Denoising Diffusion Policy Optimization) for applying RLHF to diffusion models is now part of the TRL library! A great contribution from metric-space that adds the first support for diffusion models in TRL
Tweet media one
1
11
28
@younesbelkada
younes
2 years
Do you want to get rid of OOM issues when using huggingface models 🤗 on GPU? Load your model in 8-bit using bitsandbytes with `load_in_8bit=True`! Learn more about this feature here, in case you missed it. Also check out the new release of bitsandbytes!
Tweet media one
@Tim_Dettmers
Tim Dettmers
2 years
Release 0.35 of bitsandbytes brings CUDA 11.8 to the library, making it more straightforward to fine-tune #stablediffusion Dreambooth on 12 GB Colab! At this point, bnb has been pip installed more than 100k times. Thanks for all your support and bug reports!
1
11
85
0
1
27
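A minimal sketch of the flag (model id illustrative; requires `bitsandbytes` and `accelerate`):

```python
from transformers import AutoModelForCausalLM

# LLM.int8() quantization via bitsandbytes roughly halves fp16 memory usage.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",
    load_in_8bit=True,
    device_map="auto",
)
```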
@younesbelkada
younes
2 years
With this integration, we hope to make RLHF applied to large language models more accessible to the community! Docs and links to scripts: Stay tuned for the next steps in TRL!
Tweet media one
4
2
25
@younesbelkada
younes
6 months
This is live in bitsandbytes 0.43.0!
@jeremyphoward
Jeremy Howard
6 months
Today, with @Tim_Dettmers , @huggingface , & @mobius_labs , we're releasing FSDP/QLoRA, a new project that lets you efficiently train very large (70b) models on a home computer with consumer gaming GPUs. 1/🧵
85
682
4K
1
2
25
@younesbelkada
younes
2 years
Link to the largest model (Switch-C):
2
2
23
@younesbelkada
younes
2 years
A few months ago, @GoogleAI publicly released Switch Transformers, a set of MoE models from 300M to 1.6T parameters! They are all available on the Hub for anyone to try out! Note that the weights are pre-trained; one should fine-tune these models before using them!
1
0
24
@younesbelkada
younes
7 months
There are now cute and nice tags on the @huggingface Hub, such as the axolotl / TRL tags! Which tag should we add next? 👀
Tweet media one
Tweet media two
2
1
23
@younesbelkada
younes
2 years
The architecture is very similar to a classic T5 model, with the feed-forward layer replaced by a sparse feed-forward layer. Each token is processed by a different MLP thanks to the router module, which guides each token to a specific MLP (called an "expert").
Tweet media one
1
1
23
@younesbelkada
younes
1 year
You can now load GPTQ models out of the box thanks to @huggingface transformers and the AutoGPTQ library. The integration also comes with various features (exllama kernels, PEFT support, ...). Check out the thread below for more details.
@_marcsun
Marc Sun
1 year
LLMs just got faster and lighter with 🤗 Transformers x AutoGPTQ ! You can now load your models from @huggingface with GPTQ quantization. Enjoy faster inference speed and lower memory usage than existing supported quantization schemes 🚀 Blogpost:
6
59
225
0
5
22
@younesbelkada
younes
2 years
... and the Colab link to fine-tune your model loaded in int8:
3
3
19
@younesbelkada
younes
9 months
Check out the thread below for a summary of the amazing work from @casper_hansen_ and the community! 🔥 You can now run Mixtral-AWQ in 4-bit using autoawq and transformers
@casper_hansen_
Casper Hansen
9 months
Since the summer of 2023, I have been working hard on AutoAWQ. You can now quantize Mixtral 8x7B, LLaVa, and other models and run inference through the transformers integration. 🧵 1/5
4
14
89
0
2
19
@younesbelkada
younes
1 year
Use 4-bit quantization to fit the model on your GPU and train LoRA adapters on top of it. This method, termed QLoRA, was introduced by @ArtidoroPagnoni, @Tim_Dettmers et al. in this paper:
1
2
18
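A sketch of the QLoRA recipe from the thread (model id and hyperparameters are illustrative, loosely following the paper's defaults):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants too
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)

# Frozen 4-bit base + small trainable LoRA matrices = QLoRA.
model = get_peft_model(model, LoraConfig(r=64, lora_alpha=16, task_type="CAUSAL_LM"))
```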
@younesbelkada
younes
2 years
The DPT-hybrid depth estimation model from @Intel labs has just been added to the @huggingface transformers main branch! Guess what is coming next... 🧨 Check the hint below
1
6
19
@younesbelkada
younes
1 year
Leveraging 4-bit, you can even fine-tune the largest model (70B) on a single A100 80GB GPU! 🤯
1
0
18
@younesbelkada
younes
2 years
Run the model locally now using @huggingface transformers! You can run it in half precision or in 8-bit for memory-efficient usage! 🔥
Tweet media one
2
0
17
@younesbelkada
younes
2 years
Step 1: load the active model in 8-bit precision, drastically reducing its memory footprint: you now need roughly 1GB per billion parameters. This means that to load a 20B-parameter model, you would need roughly 20GB of GPU VRAM.
Tweet media one
1
1
17
@younesbelkada
younes
7 months
5/ Load the GGUF quantized models. GGUF weights have been updated in the model repositories; you can load them using llama.cpp! Find the GGUF weights here:
3
1
15
@younesbelkada
younes
2 years
Step 3: In PPO we also need a copy of the reference model. No worries: we can simply deactivate the adapters when doing inference, thus using the same model as both the reference and the active model!
Tweet media one
1
0
16
@younesbelkada
younes
2 years
As we strive to make large models more accessible to everyone, the model supports int8 loading using @TimDettmers' bitsandbytes, making it possible to run some checkpoints in Google Colab!
Tweet media one
1
1
16
@younesbelkada
younes
1 year
General usage notebook for loading any HF model in 4bit:
1
3
16
@younesbelkada
younes
2 years
News from Japan! Thanks to @eltociear's contribution, the @huggingface transformers README has been translated into Japanese! Please check it out! We look forward to more contributions 🙏
0
6
16