Huge day at @GroqInc! 🚀
Our world-class engineering team has been relentlessly advancing the field of AI inference.
Today, their hard work pays off as we secure $640M in funding.
Massive kudos to the team! 🫡
AI chip startup Groq raised a $640M Series D led by BlackRock at a $2.8B valuation, up from $1B after raising $300M in 2021, and added an Intel executive as COO (@vandermey / Bloomberg)
The @GroqInc compiler team are all literal geniuses.
We improved Mixtral 8x7B t/s/u by an entire GPT-4o t/s/u (474 to 585 median) with compiler improvements.
Just getting started. 🫡
We just pushed another optimization to @MistralAI Mixtral 8x7B on @GroqInc. Users will see a ~20% throughput improvement 🙌. These enhancements are driven by the compiler software team’s relentless focus on throughput and latency.
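As a quick back-of-envelope check (a hypothetical sketch, not Groq tooling), the 474→585 median t/s/u jump quoted above works out to roughly the ~20% throughput improvement mentioned:

```python
# Relative throughput improvement from the quoted Mixtral median
# tokens-per-second-per-user (t/s/u) figures: 474 before, 585 after.
before_tsu = 474
after_tsu = 585

improvement = (after_tsu - before_tsu) / before_tsu
print(f"{improvement:.1%}")  # → 23.4%
```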
The @GroqInc team just shipped some optimizations pushing per-user tokens per second higher for @AIatMeta Llama 3 70B. Looking forward to seeing what everyone builds this weekend.
AI gateway now supports @GroqInc and @Cohere! Unleash the full potential of your language model, no matter where you are.
👂 We're all ears - let us know which providers or features you'd like to see next!
I’ve been leading a secret project for months … and the word is finally out!
🛠️ I'm proud to announce the Llama 3 Groq Tool Use 8B and 70B models 🔥
An open-source Tool Use full finetune of Llama 3 that reaches the #1 position on BFCL, beating all other models, including
Are you:
1. A world-class software engineer?
2. Obsessed with performance optimization?
3. Into helping build the world's fastest inference engine?
Nice. 🫡
@GroqInc is hiring distributed systems engineers:
Join us on our quest to 0ms TTFT.
Groq extends its lead and is serving Llama 3 8B at almost 1,200 output tokens/s!
We can now confirm that the Llama 3 8B speed improvements seen in @GroqInc's chat interface are reflected in the performance of their API. This represents the fastest language model inference performance
Fast to launch & very fast output speed! Groq has launched their Gemma 2 9B offering and is serving it at ~600 output tokens/s
Gemma 2 9B is a worthy alternative to Llama 3 8B and other smaller models. It is particularly attractive for generalist and communication-focused
It really sucks that thousands of software companies and their owners, employees, etc are beholden to the dumbest people imaginable, but that’s where we are I guess.
H.R. 7024 is the easiest win-win-win to come out of Congress in 2 years. Just pass it.
News: Senate Minority Whip Thune says Senate Republicans will block the House-passed tax deal without an opportunity to amend it on the floor or in committee. Says GOP wants changes to child tax credit work requirements
@MingXDynasty @GroqInc Awesome.
IMHO, competition is great, and it’s nice to see what the H200 can do at scale.
OpenAI should simply run GPT on the LPU. :)
(The exciting thing about this is that we’re still super early in the LPU performance story.)
@samcraigjohnson 99% of SaaS co’s can get to 10M+ users on a single Scale-A5 from @OVHcloud_US for $663/m (maybe get a couple for redundancy, Postgres, etc).
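Taking the figures in that post at face value, the per-user cost is striking (a rough illustration; the arithmetic is mine, not an @OVHcloud_US quote):

```python
# Back-of-envelope monthly hosting cost per user from the quoted figures:
# one Scale-A5 at $663/month, doubled for the suggested redundancy pair,
# spread across the claimed 10M users.
monthly_cost = 663
servers = 2            # "maybe get a couple for redundancy, Postgres, etc."
users = 10_000_000

cost_per_user = servers * monthly_cost / users
print(f"${cost_per_user:.6f}/user/month")  # → $0.000133/user/month
```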
Want to know how Groq can scale to accommodate the growing demand for inference and how the scaling limitations of traditional legacy architectures can be overcome? Tune in on June 5 to find out at our upcoming AMA.
Starting today, open source is leading the way. Introducing Llama 3.1: Our most capable models yet.
Today we’re releasing a collection of new Llama 3.1 models including our long awaited 405B. These models deliver improved reasoning capabilities, a larger 128K token context
Good news: The Child Tax Credit bill is headed to the Senate.
While @POTUS and I continue to fight for the full expanded Child Tax Credit, this bill should be passed quickly. President Biden is ready to sign it into law.
Can’t believe this hasn’t been fixed yet.
Everyone just keeps hoping Congress fixes this and we’re already at Dec 13.
Literal death sentence for most small tech companies. Even worse for LLCs and S-Corps as you personally generate highly inflated phantom income that gets taxed.
@brianwilt My Waymo drove through the smoke and flames of a car that was on fire yesterday. Took like 2 seconds to think/wait for a lull in oncoming traffic and go around. Was awesome.
Wonder if you guys had that in the simulator…
I’ve retired from software… process. No Scrum, DDs, TDD, stand-ups, DevOps, SRE, microservices, retrospectives, pre- and post-mortems…
Instead, we just build and run software together.
We do use an issue tracker and a good readme.
Everyone posts an EOD update to our group
@hive_echo This hasn’t been fully answered yet because Zuck claims L3 was designed for tool use:
We (well, @RickLamers) implemented it ourselves.
Maybe you could work with Rick to see how we can improve our support.
@felixchin1 4o is running at 109 t/s/u on an H200, which is pretty impressive, but we don’t know enough about the model to say if that’s “super fast”, IMHO.
It’s very unlikely to be faster than the same model running on the LPU.
I would be more than happy to spin that up for OAI. :)
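Putting the speeds quoted across these posts side by side (a rough comparison, not apples-to-apples, since the models differ in size and architecture):

```python
# Output speeds (tokens/s per user) as quoted in this thread.
speeds = {
    "GPT-4o on H200": 109,
    "Mixtral 8x7B on Groq LPU (median)": 585,
    "Gemma 2 9B on Groq LPU": 600,
    "Llama 3 8B on Groq LPU": 1200,
}
baseline = speeds["GPT-4o on H200"]
for name, tps in sorted(speeds.items(), key=lambda kv: kv[1]):
    print(f"{name}: {tps} t/s ({tps / baseline:.1f}x the H200 figure)")
```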