I’ve been leading a secret project for months … and the word is finally out!
🛠️ I'm proud to announce the Llama 3 Groq Tool Use 8B and 70B models 🔥
An open source Tool Use full finetune of Llama 3 that reaches the #1 position on BFCL, beating all other models, including
Never write a single shell command ever again! Today I'm releasing Shell AI ✨, an open source CLI you can `pip install shell-ai` right now to run things like `shai git diff but without the json blobs`
MIT licensed. Install, fork, PR & have fun!
Interesting idea from @karpathy at CUDA MODE: can LLMs become compilers such that we can skip building (imperfect) abstractions through frameworks and libraries?
If you’re manually prompting, you probably want to start thinking about meta-prompting strategies that let you treat prompting as a programming problem instead of a string manipulation problem.
DSPy is a library that takes a page out of PyTorch’s module-based API for
I strongly feel that users should be able to steer which X posts they see based on their personal preferences.
👨‍💻 Which is why I've created an open source browser extension that runs fast LLM evaluation on posts and auto-hides a post when it crosses your personal filter's threshold
Frontier-level Tool Calling now live on @GroqInc, powered by Llama 3 🫡
Outperforms GPT-4 Turbo 2024-04-09 and Claude 3 Opus (FC version) in multiple subcategories
At 300 tokens/s 🚀
I've personally been working on this feature, and man, the new Llama is good!
Thank you @swyx for organizing the AI Engineer conference 🙏
Here are the key takeaways:
1. Better RAG = better LLM apps. If you’re not moving beyond the basics, you’re leaving performance on the table.
2. Structured outputs (Pydantic, TypeScript, OpenAI Functions) and
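The structured-outputs idea in point 2 can be sketched with nothing but the stdlib: ask the model for JSON matching a schema, then validate the reply before using it. A toy illustration (the schema, the hard-coded reply, and the `validate` helper are all hypothetical, not any particular library's API):

```python
import json

# Expected shape of the model's reply (made-up example schema).
SCHEMA = {"name": str, "priority": int}

def validate(raw: str) -> dict:
    """Parse a model reply and check it matches the expected schema."""
    data = json.loads(raw)
    for key, typ in SCHEMA.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"field {key!r} missing or not {typ.__name__}")
    return data

reply = '{"name": "fix login bug", "priority": 2}'  # stand-in for an LLM reply
ticket = validate(reply)
print(ticket["priority"])  # → 2
```

Libraries like Pydantic or Instructor do the same thing with richer types and automatic retries, which is the point of point 2: the model's output becomes a typed value, not a string.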
300 minutes of audio in 10 minutes
Open question: how much better do LLMs' internal models of the data-generating processes of human thought become when fed hundreds of billions of tokens of humans thinking out loud (e.g. podcasts)?
We're expanding our GroqCloud support to image, audio & text. With LLaVA v1.5 7B, developers & businesses can tap into the vast potential of multimodal AI, enabling innovative applications that combine visual, auditory & textual inputs. Read more here:
GPT-4 consistently getting outclassed on code understanding was not on my list for Q1 ’24. Great start to the year for AI 🔥
Must read 120K token task 👇
@nonmayorpete Given GPT-3-equivalent pricing of $0.02 per 1k tokens, an avg ChatGPT answer length of 150 tokens, and 50% margin, $150B profit would require every human to ask a question every day for the next 34 years.
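A rough sanity check of that arithmetic (all figures from the post; the ~8 billion world population is my assumption):

```python
# Back-of-the-envelope check of the "$150B profit" claim.
price_per_token = 0.02 / 1000       # $0.02 per 1k tokens
answer_tokens = 150                 # average ChatGPT answer length
margin = 0.50
profit_per_answer = answer_tokens * price_per_token * margin  # $0.0015

target_profit = 150e9               # $150B
answers_needed = target_profit / profit_per_answer            # ~1e14 answers

population = 8e9                    # assumption: ~8 billion people
answers_per_year = population * 365 # one question per human per day
years = answers_needed / answers_per_year
print(round(years, 1))  # → 34.2
```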
My take on Reflection 70B is simple: the scores I'm seeing on HumanEval, GPQA and MMLU are interesting and suggest that "training for test-time-inference CoT" is working.
Glad everything is open source so the community can dive deep to see if this is some weird form of
On September 5th, @mattshumer_ announced Reflection 70B, a model fine-tuned on top of Llama 3.1 70B showing SoTA benchmark numbers, which I trained on Glaive-generated data.
Today, I'm sharing model artifacts to reproduce the initial claims and a post-mortem to address
This is a fantastic new standard being set by @GoogleDeepMind with Gemma and they should be applauded for it. It should be the new bar for calling a model open weights (the open source bar is even higher, and understandably not always an option for the org creating the model).
cc
Learn more about what we're up to at @GroqInc around tool use/function calling from today's AMA.
h/t to @karpathy for discussing his LLM OS ideas publicly - they are a big contributing factor to our vision for driving low-latency agentic loops with Groq LPUs!
@xiao_ted Maybe MSR should have shipped a ChatGPT-level product based on the fruits of their research and leadership would have taken a different stance.
I absolutely LOVE this! Agentless baselines should be mandatory for anyone claiming to have found an agentic approach that is better than direct model prompting.
If we don’t run ablations, how will we learn collectively what works and doesn’t?
Introducing OpenAutoCoder-Agentless 😺:
A simple agentless solution solves 27.3% of GitHub issues on SWE-bench Lite at ~$0.34 each, outperforming all open-source AI SW agents!
It's fully open-source, try it out:
🧑💻
📝
With accelerated growth at @GroqInc we're excited to announce the acquisition of @DefinitiveIO. Co-founder and CEO @sundeep will head and scale our GroqCloud™ business unit to meet increasing demand for our revolutionary AI inference technology. Read more:
@stanfordnlp has released a framework for composing retrieval and language models. No need to re-invent the wheel when prompt engineering for knowledge-heavy use cases; check out the Demonstrate-Search-Predict framework for Python
.@GroqInc API support for combining streaming with tool use has just been released. It was quietly announced on their Discord just in time for the weekend 🔥
Here is why I think Kevin is wrong. Story time!
A software veteran once confessed to me that they had wasted a few years reinventing a worse version of git. When they found out about git and took a closer look they realized how much better the abstractions in git were and how
I know @tldraw went viral with sketch-to-AI but can we just appreciate the attention to detail of their canvas for a moment 🙇‍♂️
h/t to @steveruizok for giving a great presentation in AMS yesterday - the company has a beautiful engineering+tinkering soul seen in few (Framer, @recursiverealms
@karpathy The idea is you can let the LLM compile from a much higher level “source code” than the current code that goes into a compiler. It can still be somewhat formal, but definitely doesn’t need to be as detailed as the current level of expression (think Python, React as current, think
🚨 We've been working on something very exciting at @DefinitiveIO and I can finally show it to you:
Code Indexer Loop: a fully automated vector-based indexer for your source code
Apache 2.0, continuous code chunking, embedding & indexing, based on
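To illustrate the general idea of vector-based code search (a toy analogy only: real indexers like this one use learned embeddings, not the bag-of-words "embedding" sketched here):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical indexed code chunks.
chunks = {
    "auth.py": "def login(user, password): check credentials",
    "db.py": "def connect(url): open database connection",
}
query = embed("how do I check a user password")
best = max(chunks, key=lambda f: cosine(query, embed(chunks[f])))
print(best)  # → auth.py
```

The real system replaces `embed` with a model-generated vector and the `max` scan with an approximate nearest-neighbor index, but the retrieval shape is the same.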
Came across this neat project: a 💯% local, grammar-focused text rewriter for macOS, based on Mistral.
The fact that this is so easy is mind-blowing. It's a lot of "standing on the shoulders of giants", for sure.
Completely free (MIT) too 🙌 h/t @ivanfioravanti @MistralAI
Autogram-ollama and Autogram-mlx for your Apple Silicon devices are here!
Open source, free, easy and fast grammar checker powered by:
- Model: Mistral 7B Instruct 0.2 @MistralAI
- Ollama @ollama
- Apple MLX @apple
Go, play, copy, fork, experiment, have fun! 🎉🥳
You've probably all seen the phi-1 model from the MSFT paper "Textbooks Are All You Need". While a lot of attention has rightly gone to the efficiency and hence affordability (go Open Source!) that can be achieved in terms of training best-in-class language
Tool Use/Function Calling (beta) for Groq API is now available! 🚀 This highly anticipated feature allows models available on GroqCloud to take user-defined functions as inputs and generate structured output to invoke them from external tools / codebases.
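The general tool-use loop looks roughly like this. A toy sketch: the `tools` registry and the hard-coded model output are made up for illustration and are not the actual Groq API response format, though the name/arguments JSON shape follows the common convention:

```python
import json

# Hypothetical user-defined functions the model is allowed to call.
tools = {
    "get_weather": lambda city: f"22C and sunny in {city}",
}

# Stand-in for the model's structured tool-call output.
model_output = json.dumps(
    {"name": "get_weather", "arguments": {"city": "Amsterdam"}}
)

# Dispatch: parse the structured output and invoke the matching function.
call = json.loads(model_output)
result = tools[call["name"]](**call["arguments"])
print(result)  # → 22C and sunny in Amsterdam
```

In a real agentic loop, `result` would be fed back to the model as a tool message so it can compose a final answer.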
.@GroqInc fast LLM inference is a practical example of the Jevons paradox.
"technological progress increases the efficiency with which a resource is used (reducing the amount necessary for any one use), but the falling cost of use induces increases in demand enough that resource
Groq has set a world record in LLM inference API speed by serving Llama 3.2 1B at >3k output tokens/s 🏁
Meta's Llama 3.2 3B and 1B models are well positioned for two categories of use-cases. Firstly, applications running on edge devices or on-device, where compute resources are
Breaking! GitHub Copilot is experimenting with a new skill-based agent. The endpoint lists these skills:
1. Code search
2. Find snippets
3. Find symbols from file
4. Ping
5. Read blob
6. Recent Changes
7. Docs search
And the model is apparently called “copilot-gpt-4-2”
@jeremyphoward @__tinygrad__ Sounds like a great candidate for
Saw yesterday at the PyTorch conference how torch.compile, packing, activation checkpointing, and (q)LoRAs allow you to get real ambitious real fast :)
I made a thing while at @DataCouncilAI. @lloydtabb made me think: wouldn’t it be interesting if you could write SQL-only ETL pipelines with a better SQL? Better SQL, you ask? Enter Malloy! Check out this Malloy pipeline built on top of Node-RED
@Teknium1 @intrstllrninja @NousResearch Definitely think in general you guys are doing the community a service with your public work on LLMs and want to acknowledge that. Added a special mention to both HF repos 🙌
Put Llama3 from @GroqInc live in production. The speed boost is incredible, but what's more interesting is our average session duration has jumped from 18 to around 31 minutes! Fast Responses = Better Experience = Stickier product. Thanks @sundeep!
I'm excited to announce that I'll be speaking at the AI Quality Conference in SF. What about? My favorite topic: evaluating LLM tool use 🙌
Let me know if you're in town!!
I gave a talk at @aiDotEngineer about Tool Use with Open-Source LLMs and luckily many of the other interesting talks at the event were recorded. I've listed some of the most interesting ones in this week's newsletter, check it out:
If you’re working on LLM powered products you want to watch this one:
Why? Oh I don’t know, maybe because this guy is responsible for the most successful LLM powered product in the world: GitHub Copilot. It’s absolute gold, I promise.
Places that host OSS models with pricing per token👇
Fireworks AI
Together AI
OpenRouter
Anyscale Endpoints
Vertex AI
You're welcome 🤗 Know of more? Leave them below!
.@deepseek_ai is the best open source code generation model. Just announced: their next code model is based on MoE for even more efficient inference. Check out their 16B MoE Chat performance gap:
Now imagine that efficiency jump for code:
Honestly, I can't wait 🙌
A new image editing technique quietly landed on the hub 👀🤫
✨Turbo Edit ✨
🌬️blazing fast - works with as little as 3-4 steps
⚡️using SDXL Turbo
✍️ super clever approach for adapting edit friendly ddpm inversion to distilled & fast sampling models
we worked with groq to train the sota open source function calling model, yes literally the best! + we rank #1 on berkeley's function calling leaderboard.
if you or your company needs custom language models, try @glaiveai. also, we do highly custom language models, dm us :)
This will be _a lot_ of fun. The low latency Speech-to-Speech starter project that I’ve been developing for this hackathon has been continuously blowing my mind, truly where the Groq latency shines. Includes early access to our very low latency Whisper model 👀
We're excited to cosponsor the UC Berkeley AI Hackathon. Don't miss our workshop by @RickLamers and increased API rates during the event. Stop by our table to say hi. We can't wait to see what you build on Groq!
What I'm seeing when running benchmarks: boosts on MMLU, GPQA, HumanEval compared to vanilla Instruct Llama 3.1 70B
MATH and GSM8K were misreported earlier because of a bug in the LLM-as-a-judge code as I understand from Sahil.
I rarely try to give people FOMO but if you're not going to Normconf you're ... missing out.
P.S. we couldn’t be more proud to be a gold sponsor of the conference as we consider it a vote for this fresh and positive direction for the data industry. 11/11
We're in the LLM build phase, everyone is building and a lot of tools are coming out to expedite the process. Discover some you might not have heard about in this week's CoWI!
This week is all about SkyPilot: a project from Berkeley that makes it easy to launch compute jobs across heterogeneous cloud resources: bare metal k8s, AWS, GCP, Azure, …
A fine-tuning and a serving example should get you underway! h/t @skypilot_org
This project lets you expose your local codebase (vectordb-indexed) to ChatGPT's GPT-4 through `localhost` ChatGPT plug-ins.
Try it out & study the code. This is a neat project!
@loladotdev 👏
‼️ You Don't Need To Depend On Proprietary LLMs! Open Source LLMs are becoming better because of:
- higher quality data;
- fewer bits during training/inference;
- inference sampling optimizations;
- decoding constraints;
- stronger base models;
- combining large and small
@fchollet Do you believe we can formulate a test that, if passed, implies we have AGI? It would make it easier to understand what people mean by this AGI thing…
The implication is what @AndrewYNg has been saying for a while: focus on the training data. Mentally model transformers as lookup tables with some margin around the samples.
Introducing: BuildAnything
This was so much fun to build! It feels like true magic when using it 🪄
Generate any HTML page or app and see it appear fully functional right in front of you. MIT licensed & on GitHub!
ByteDance presents MegaScale
Scaling Large Language Model Training to More Than 10,000 GPUs
They present the design, implementation and engineering experience in building and deploying MegaScale, a production system for training large language models (LLMs) at the scale of more than
@skypilot_org
I want to give a shoutout to @jxnlco for helping folks with fine-tuning. One of the pesky problems when fine-tuning is generating the right training-data traces to improve task performance.
The Instructor library can make your life easy here by
Keeping up or keeping track? The team is working hard nights and weekends. They keep telling us there's more to come, and we believe them. Thanks @sundeep for the screen grab!
30k t/s Input. 🔥⚡️ Llama 3, 8b.
The pace of innovation in Large Language Models & ML is truly mind-blowing 🤯 Even as a full-time Machine Learning Engineer I find staying up-to-date to be challenging. 1/5 🧵
The vLLM team (@zhuohan123, @woosuk_k, @simon_mo_, et al.) is doing incredible work with the vLLM project. These were their priorities 3 months ago and boy did they deliver: their goals & more.
E.g. prefix caching; especially useful if you have beefy system prompts 🙌🏻
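Why prefix caching helps with beefy system prompts, as a toy analogy (this is a conceptual sketch of the idea, not vLLM's actual implementation):

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def prefill(prefix: str) -> int:
    """Stand-in for the expensive KV-cache computation over a prompt prefix."""
    return sum(ord(c) for c in prefix)  # pretend this is costly

SYSTEM = "You are a helpful assistant. " * 50  # a beefy shared system prompt

# Three requests sharing the same system prompt: the prefix work is
# computed once and served from cache afterwards.
for question in ["Q1", "Q2", "Q3"]:
    _ = prefill(SYSTEM)

print(prefill.cache_info().hits)  # → 2
```

In vLLM the cached object is the attention KV cache for the shared token prefix, so every request after the first skips recomputing the system prompt entirely.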
While these headlines can spark the imagination, I'd argue that this has practically nothing to do with reality for nearly all data professionals working on actual data initiatives in their companies. 3/
@malcolmtyson @nonmayorpete What eventual margin are you suggesting? Google search has become orders of magnitude cheaper since launch but Alphabet sits at 55% gross margin (granted, this is more than search).
Hyper-parameters are tricky to dial in for optimal performance. Luckily we can build on existing benchmarks that match closely with the task we care about for parameter value selection.
To contribute to the community's understanding, @DefinitiveIO is releasing a highly
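The benchmark-driven parameter selection idea can be sketched like this (`score_on_benchmark` is a hypothetical stand-in for a real eval harness; the peak at 0.4 is made up):

```python
def score_on_benchmark(temperature: float) -> float:
    """Pretend benchmark: quality peaks around temperature 0.4."""
    return 1.0 - abs(temperature - 0.4)

# Try a small grid of candidate values and keep the best scorer.
candidates = [0.0, 0.2, 0.4, 0.7, 1.0]
best = max(candidates, key=score_on_benchmark)
print(best)  # → 0.4
```

The same grid-and-score loop works for any hyper-parameter (top_p, chunk size, retrieval k) once you have a benchmark that tracks your actual task.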