@ylecun
« unpredictable regulatory environment. »
You broke the already existing rules against using your users' personal and private data to train models, and now you're dangling those models while complaining about the EU's regulations? If you want to cross the line of going from research to
Filling the gap between the Llama-3-8B and Llama-3-70B models by
@AIatMeta
? Not sure how to use your extra VRAM? Look no further!
I am excited to introduce three new Llama-3 models in 11B, 13B, and 16B sizes!
Find all 3 models on
@huggingface
1/4 🚀 Exciting news for AI enthusiasts! Check out NuExtract, a cutting-edge LLM designed for structured extraction tasks. It transforms any text into a structured output with just a template!
🤗 Open-source and available on
@huggingface
!
🌟 More info:
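The template-driven extraction described above can be sketched in a few lines. The prompt layout below is an assumption based on NuExtract's model card, and the model reply is mocked for illustration:

```python
import json

# Template-driven extraction, NuExtract-style: you supply a JSON template
# whose keys name the fields to extract, and the model fills in the values.
# (Prompt layout is an assumption; the reply below is mocked.)
template = {"name": "", "size": "", "license": ""}
text = "Yi-1.5 is a 34B model released under the Apache 2.0 license."

prompt = f"### Template:\n{json.dumps(template, indent=4)}\n### Text:\n{text}\n"

# A structured-extraction model would return JSON matching the template:
reply = '{"name": "Yi-1.5", "size": "34B", "license": "Apache 2.0"}'
extracted = json.loads(reply)
print(extracted["license"])
```

The nice part of the template approach is that the output is machine-parseable by construction: `json.loads` either succeeds with the schema you asked for or fails loudly.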
Introducing an experimental Llama-3-8B-Instruct with 32k context-length in GGUF format:
- A big thanks to
@winglian
for doing the thorough testing
- A big thanks to
@nisten
for running tests
Available on
@huggingface
and
@LMStudioAI
The most downloaded model I've ever had on
@huggingface
is the GGUF models for Llama-3-70B!!! 🚀
It has been downloaded more than 36,000 times in just under 24 hours!!! You people need to get a life! ❤️🙏🏽
Finally! A fix for Llama-3 tokenizer has been merged!
We had to make workarounds, but if the original tokenizer has the "eos_token" correctly set, we won't need any extra steps anymore.
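For anyone who hit this before the fix: the symptom was generation running past the end of a chat turn, because Llama-3's config pointed "eos_token" at the document-end token `<|end_of_text|>` while chat turns actually end with `<|eot_id|>`. A minimal sketch of the config-level workaround (file contents simplified for illustration):

```python
import json

# Simplified view of Llama-3's tokenizer_config.json: before the fix,
# eos_token pointed at the document-end token, so chat generation
# never stopped at the end of a turn.
cfg = {"eos_token": "<|end_of_text|>"}

# The workaround: point eos_token at the end-of-turn token instead
# (or pass both token ids as stop tokens at generation time).
cfg["eos_token"] = "<|eot_id|>"
print(json.dumps(cfg))
```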
Something terrible happened over the weekend! I waited until now, but it's time to bring it up.
@MistralAI
has put all their models behind gated access. You must now individually accept their use of your data for each model!
I have uploaded them all back on
@huggingface
It finally happened! I made it to the Top 10 of the Open LLM Leaderboard by
@huggingface
, right on the edge! Thank you all! ❤️
This is the very first fine-tuned model I've created based on Llama-3-70B, released by
@metaai
. I will be releasing 16 more fine-tuned models!🚀
Wait, we have new DeepSeek models?! For coding this time! 😳
It’s a 236B MoE that supports up to 338 programming languages! 😱 And it has 128k context length! 🚀
Available on
@huggingface
:
Google released PaliGemma models! They are open vision-language models inspired by PaLI-3 and built with open components such as the SigLIP vision model and the Gemma language model.
This space includes models fine-tuned on a mix of downstream tasks, with inference via
@huggingface
🤗
You guys are just too much! 😂 More than 45K downloads in less than 24 hours?
The local LLM community on
@huggingface
is on fire!🔥
Hugging Face recently introduced a "Use this model" button to load your favorite model directly onto your desktop.
Say hello to the new model, 🦎Meta Chameleon, just released yesterday by
@AIatMeta
! 🔥
Available in 7B and 34B sizes that support mixed-modal input and text-only outputs.
Thanks for the open-source multimodal models! 🚀 (Please release them on
@huggingface
)
@mervenoyann
I duplicated your LLaVA NeXT space and made this one for LLaVa-Llama-3-8B:
Very interesting model! The space needs some improvements, especially in the chat interface to fill the height. Thanks for the jump start 😊
🎉 Excited to share some incredible milestones in my journey with fine-tuning Llama-3 and Phi-3 models on
@huggingface
! 🚀
🔹 My fine-tuned Llama-3 70B model is ranked #1 among all other FT models
🔹 My Phi-3 Mini fine-tuned model holds the #1 spot among all Phi-3 Mini FTs
Weekend plan! 🎉
Attempting to learn TextGrad by
@stanford
, perhaps a mix of:
- AutoGrad,
- DSPy,
- natural language gradients,
- and some magic!
Just when I thought I was getting DSPy, boom, TextGrad drops in! My brain: "You shall not pass!" 😂
Wow! This thing is flying! This is a 2-bit quantized model of Qwen-7B-Instruct! It can’t get any smaller!
You can open this directly from
@huggingface
into
@LMStudioAI
. 🚀
Congrats to
@Alibaba_Qwen
and the whole team! 👏🏽💙
Thrilled to announce Mistral-Nemo-Instruct-2407 support in Llama.cpp!
This wouldn't have been possible without the incredible contributions from the community. A huge thank you to everyone who participated and helped make this happen! 🩷
🔥Having fun with Dolly v2 by
@databricks
!
Here is a quick demo of Dolly v2 (12B). Thanks to TextStreamer by
@huggingface
I can now get that nice ChatGPT-style streaming feel 🤗
Just uploaded "WizardLM_evol_instruct_V2_196k" back on the
@huggingface
Datasets.
It was removed and I used this dataset a lot in my fine-tuning. Hope it helps others.
Something big is about to happen to the Open LLM Leaderboard by
@huggingface
! 🤗
Place your bets, people! The closest guess wins a 1-year free HF Pro subscription. 🥳 (just kidding)
@OpenLLMLeaders
I spent the whole weekend trying to implement parallel function calling with local LLMs! I had some success, and now I'm off to create some GGUFs for the newly released 'Yi-1.5' models, which are licensed under Apache 2.0 by
@01AI_Yi
.
There is new Mistral-7B V3! It just dropped an hour ago by
@MistralAI
on
@huggingface
- Both Base and Instruct are available 👏🏽
- 32K context length
- Extended vocabulary to 32768
- Function calling support!
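For context, function calling means the model can emit a structured call to a tool you declare up front. Here is a minimal sketch of a tool declaration in the common OpenAI-style JSON schema; `get_weather` is a hypothetical tool, and the exact format Mistral's endpoints expect is documented on their side:

```python
# Hypothetical tool declaration in the common OpenAI-style schema;
# check Mistral's docs for the exact format their API expects.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Given this schema and a user message like "What's the weather in Paris?",
# a function-calling model replies with a structured call such as
# {"name": "get_weather", "arguments": {"city": "Paris"}} instead of prose.
print(tools[0]["function"]["name"])
```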
Another day, another series of Llama-3 models released on
@HuggingFace
! This time, it's the big brother, the 70B! 🚀
Since Llama-3's release last week, I've created 27 models with more than 230K downloads.
The community's support has been priceless! Thanks to all and
@metaai
❤️
@DC_Draino
@InvestigateJ6
Wait, is this real? What did I just watch? What was the point of launching that into the crowd for no reason! (Also, he seemed pretty proud!)
And it's done! 🚀 Every single possible GGUF model for Mixtral-8x22B-Instruct-v0.1 by
@MistralAI
is available to use on
@huggingface
.
From IQ1 all the way to the FP16! (you'll have imatrix data as well)
Thanks 🙏🏽
Phi-3: Do you love me? ❤️
Coming up next to
@huggingface
, new series of Phi-3 fine-tunes. 🚀
- I'd love it to code!
- I'd love it to say everything in JSON!
- I'd love it to talk!
Let's see if they will love me back!
🚀 The first fine-tuned models to score higher than Llama-3-70B & achieve the best MMLU/GSM8K at the same time!
- 3 of the top 10 spots on the Open LLM Leaderboard are now held by these fine-tuned models
- Achieved the highest MMLU / GSM8K on
@huggingface
Leaderboard
Oh my god! It's raining Llama-3 today!
Based on an amazing work of
@winglian
❤️, I created "Llama-3-8B-Instruct-64k" model!
Already being tested, quantized, and uploaded on
@huggingface
🚀
Who wants them?!
I'm up to 96k context for Llama 3 8B. Using PoSE, we did continued pre-training of the base model with 300M tokens to extend the context length to 64k. From there, we increased the RoPE theta to attempt to extend the context length further.
🧵
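The RoPE-theta trick above works because each rotary dimension pair rotates at frequency base**(-2i/d); raising the base slows every rotation, so positions far beyond the training length stay distinguishable. A quick sketch (Llama-3 ships with theta = 500k; the 8M "extended" value here is made up for illustration):

```python
def rope_freqs(base: float, dim: int) -> list:
    # Rotation frequency of rotary dimension pair i, out of dim/2 pairs.
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

default = rope_freqs(500_000.0, 128)      # Llama-3's shipped RoPE theta
extended = rope_freqs(8_000_000.0, 128)   # hypothetical raised theta

# A larger base lowers every non-trivial frequency, i.e. slower rotation
# per position, which stretches the usable context window.
assert all(e <= d for d, e in zip(default, extended))
```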
🎉 Exciting news for all Qwen fans! 🚀 I've just released 8 new models based on the powerful Qwen2 base model!
Developed by
@Alibaba_Qwen
, these models are leading the way at
@OpenLLMLeaders
🤗 Find them all on
@huggingface
(7B models are under Apache 2.0 license 💙):
🚨New GGUF alert!
- 5x new high-quality IQ based quantized models for "Meta-Llama-3-70B-Instruct" by
@AIatMeta
- 10x new GGUF models for "Llama-3-Smaug-8B" by
@abacusai
Models are available on
@huggingface
(links are in the comments) - as always, thanks for the support ❤️
How is the fine-tuning of Qwen2-72B going so far? Getting there! 😅 Who's winning:
- MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.2
- MaziyarPanahi/Qwen2-72B-Instruct-v0.1
- meta-llama/Meta-Llama-3-70B-Instruct
- Qwen/Qwen2-72B
Thanks,
@winglian
for Qwen2 + FSDP in
@axolotl_ai
🚀
🎉 Another big news today! Llama.cpp now supports SmolLM models by
@huggingface
! 🤖
All 3 models will be quantized and available soon 🔥
✨ HuggingFaceTB/SmolLM-135M-Instruct
🌟 HuggingFaceTB/SmolLM-360M-Instruct
💪 HuggingFaceTB/SmolLM-1.7B-Instruct
I have generated over 6K rows for a synthetic French CoT Legal dataset, and I'm quite satisfied with the results achieved using only a local LLM.
Special thanks to
@Teknium1
for the "Nous-Hermes-2-Mixtral-8x7B-DPO" model! Excellent balance between quality and inference speed.
I love using local LLMs to generate synthetic datasets! I test various models, have Claude judge the outputs, and then choose the best local LLM for each dataset.
@NousResearch
, this model consistently produces high-quality results. Any insights into why?
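The selection loop described here (generate with several local models, have a stronger judge pick the winner per dataset) can be sketched as follows; `generate` and `judge` are hypothetical stubs standing in for calls to a local LLM server and to Claude:

```python
def generate(model: str, prompt: str) -> str:
    # Stub: in practice, call your local LLM server here.
    return f"[{model}] draft answer for: {prompt}"

def judge(outputs: dict) -> str:
    # Stub: in practice, send all candidates to a judge model and parse
    # its verdict; here we just pick the longest answer for illustration.
    return max(outputs, key=lambda m: len(outputs[m]))

candidates = ["nous-hermes-2-mixtral-8x7b-dpo", "llama-3-70b-instruct"]
prompt = "Rédige un raisonnement juridique étape par étape."  # French CoT-style prompt
outputs = {m: generate(m, prompt) for m in candidates}
best = judge(outputs)
print(best)
```

Running the same prompts through each candidate and only then judging keeps the comparison fair: every model sees identical inputs, and the judge never knows which model produced which answer if you strip the tags first.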
Excuse me?! Did they seriously just drop a record-breaking model like it's nothing? Zero shame! 🥰
Hey
@huggingface
, how about a new feature? Trending Alert: ping us when a model gets 20+ likes in 24 hours? For science, obviously! 🤗😅
New model added to the leaderboard!
Model Name
Overall rank: 1530
Rank in 3B category: 1
Benchmarks
Average: 70.26
ARC: 63.48
HellaSwag: 80.86
MMLU: 69.24
TruthfulQA: 60.66
Winogrande: 72.77
GSM8K: 74.53
From my experience testing Gemma-2 (27B), recently released by Google, in my advanced medical RAG setup, I've learned the following.
Specs:
- DSPy
- Citations
- Long context (up to 7K to work with Llama-3 & Gemma-2)
- Handling patient reports (messy)
- Must answer fully!
New model added to the leaderboard!
Model Name
Overall rank: 527
Rank in 70+B category: 42
Benchmarks
Average: 75.04
ARC: 67.58
HellaSwag: 86.4
MMLU: 77.19
TruthfulQA: 54.68
Winogrande: 83.98
GSM8K: 80.44
So it is finally here! The new and much improved Open LLM Leaderboard 2.0! 👿
In honor of this new leaderboard, I am publishing new fine-tuned Qwen2 models! 🚀
Follow my 👑Queen collection on
@huggingface
for all the new Qwen2 models!
Now that's why I subscribe to
@huggingface
Pro plan! I don't even have access to the model, but I can hit those Inference Endpoints to use Llama-3.1 models!!! 🤗
One of the community members tried the IQ3-XS quants of "WizardLM-2-8x22B" by
@WizardLM_AI
—this is such a complicated question!
Such an advanced and coherent response! I am quite impressed!
People started to enjoy the new "WizardLM-2-8x22B" by
@WizardLM_AI
in GGUF format!
If this is IQ3-XS, I don't even wanna know what the 8bit is capable of! 🚀
I am happy to announce the release of the v0.3 fine-tuning of Llama-3-8B-Instruct using the DPO dataset. As always, the template is ChatML! 😊
You can download it on
@huggingface
I am about to upload my second fine-tuned Llama-3-8B-Instruct (v0.2) on
@huggingface
In the meantime, could you please explain to me what is happening with my 3rd run? (v0.3)
Loss starts at: 0.6931
Loss ends at: 0.0026
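For what it's worth, the starting value is exactly what DPO predicts at initialization: with the policy equal to the reference model, the implicit reward margin is zero, so the loss is -log sigmoid(0) = ln 2 ≈ 0.6931. A loss collapsing to ~0.0026 means the model separates nearly every chosen/rejected pair, which often signals preference pairs that are too easy (or memorized) rather than a great run. A quick check:

```python
import math

def dpo_loss(margin: float, beta: float = 0.1) -> float:
    # DPO loss = -log(sigmoid(beta * reward_margin))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

init = dpo_loss(0.0)  # policy == reference, so the margin is 0
assert abs(init - math.log(2)) < 1e-12
print(round(init, 4))  # 0.6931
```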
How big is your LLM? 😁 Putting "databricks/dolly-v2-3b" to work! 😎
- I love taking AI solutions apart and trying to replace the closed parts with open-source solutions
- Nothing wrong with OpenAI, in fact, it's a great service. But it locks you in & you need to share data
Seriously?! It hasn't even been a day, and you've all downloaded the GGUF models as if your lives depend on them! Animals! ❤️
Nearly 17K downloads on
@huggingface
for the newly released Mathstral-7B model by
@MistralAI
! 🤗
It's like Black Friday for nerds!
We just released the first Llama-3 8B with a context length of over 160K onto Hugging Face! SOTA LLMs can learn to operate on long contexts with minimal training (< 200M tokens, powered by
@CrusoeEnergy
's compute) by appropriately adjusting RoPE theta.
🔗
Up next on
@huggingface
! Coming to you this week:
- New fine-tuned Llama-3-70B models
- New fine-tuned Llamixtral-3 models (Mixture of Llama-3 in 24B and 47B)
- New fine-tuned Qwen1.5-32B models
Oh no! I just kicked myself out of the top 10 on the Open LLM Leaderboard
@huggingface
! 🤣
v0.4, welcome to the top 10! Goodbye, v0.1; we wish we could have kept you around!
Mistral NeMo
Mistral NeMo: our new best small model. A state-of-the-art 12B model with 128k context length, built in collaboration with NVIDIA, and released under the Apache 2.0 license.
Thanks to this beautiful work, here are the GGUF models for the new “Codestral-22B-v0.1” model by
@MistralAI
🩷
You can find them on
@huggingface
and use them directly in your local LLM apps! 🚀
I won't go into detail about why such behavior, especially over the weekend, is far from professional. Nor will I entertain the idea that you should be grateful for free stuff and prepared for them to pull the rug out from under you!
Truly free access:
📢 New dataset alert!
🚀 Ready to use in
@axolotl_ai
! Just drop these into your SFT fine-tune YAML and you're all set! 💻✨ (maybe warm up a bit for Llama-3.1 release tomorrow!)
Can't wait to give this a spin! Big thanks to
@arcee_ai
for making it happen! 🙌
Today Arcee is releasing two datasets:
1. The Tome - this is a 1.75 million sample dataset that has been filtered to train strong generalist models. This is the dataset that was used to train Spark and Nova
2. Agent-Data: This is Arcee-Agent's dataset, comprising different
🔥 Excited to share the other key Technology of WizardLM-2!
📙AutoEvol: Automatic Instruction Evolving for Large Language Models
🚀We build a fully automated Evol-Instruct pipeline to create high-quality, highly complex instruction tuning data:
-------- 🧵 --------
Damn straight! Mistral just dropped the Mistral 8x22B Instruct weights 🔥
> 90.8% on GSM8K maj@8
> 44.6% on MATH maj@4
Also Mistral throwing shade on Cohere lol
12K downloads in less than 24 hours!!! Please have some mercy on
@huggingface
🤗
Make AI accessible, and lo and behold, it's like everyone suddenly has a need for it! ❤️🚀
⚡️ Haha! I am planning to bring OpenAI in-house!
🖥️ Snagging a pre-loved Mac Studio to become my offline AI powerhouse!
💪 I am talking Qwen2, Llama-3, Yi-1.5, and Gemma – all the cool kids of the SLM world, ready to party on one machine!
Lab squad, assemble! 🔥
FastMLX: Turn your powerful Mac into an AI home server 🚀
Using my M3 Max (96GB unified RAM) to run a VLM, streaming responses to my M1 MacBook Air over WiFi. 🔥
> pip install -U fastmlx
I am tired of your mediocrity, the "Real Housewives of AI" show, all the "regarding" statements, and the deception that something incredible is coming!
Maybe it's just me, but I get more from my local LLMs for €0! See ya! 👋🏼
@OpenAI
Finished up testing the experimental quants, with some interesting results (tl;dr: no more FP16 embed/output; now Q8 embed/output)
Basically, across 8 categories, I found that quantizing the embeddings and outputs to Q8 was equal to or better than FP16 in 6 of them. I feel
I just re-uploaded back "WizardLM-70B-V1.0" to
@huggingface
This has been one of my favorite LLMs for a long time, trained and released by
@WizardLM_AI
I am uploading it back since it doesn't exist anymore and I think it deserves to live on! ❤️🚀
Spark NLP just hit 100 million downloads on PyPI! 🚀Huge THANK YOU to our incredible community for your support over the past 7 years. I'm profoundly grateful to be part of this remarkable journey with such an inspiring and dedicated team. Here's to many more milestones ahead! 🥳
Considering deploying local AI for your business? Here's something to consider for your team:
1. Mac Studio: Approx. $2,500
- Full-featured workstation
- Versatile for various tasks
2. NVIDIA H100 GPU: $33K-$40K (if available)
- Requires server infrastructure
- High
How long does it take to get distributed inference running locally across 2 MacBook GPUs from a fresh install?
About 60 seconds, running
@exolabs_
Watch till the end, I chat to the cluster using
@__tinygrad__
ChatGPT web interface
Code is open source 👇
I am just gonna say it: being in Central European Time (CET) sucks! You wake up and models have already been announced, converted, quantized, hell, even fine-tuned on
@huggingface
There's nothing left to do but download and enjoy!
I’ve been testing the new GPT-4o (Omni) in ChatGPT. I am not impressed! Not even a little! Faster, cheaper, multimodal: these are not for me.
Code Interpreter is all I care about, and it’s as lazy as it was before!
We have
@AnthropicAI
in the EU, so I am considering the switch. Any feedback?