Aarush Sah Profile Banner
Aarush Sah Profile
Aarush Sah

@AarushSah_

1,812
Followers
305
Following
202
Media
1,132
Statuses

18 | Evals @GroqInc | ex. @NousResearch | Opinions aren't my employer's

Bay
Joined September 2022
Don't wanna be here? Send us removal request.
Pinned Tweet
@AarushSah_
Aarush Sah
3 months
🚨New Benchmark Alert!🚨 Introducing Set-Eval: a novel multimodal benchmark for testing visual reasoning capabilities of large language models. Claude 3.5 Sonnet has a score double that of GPT-4o, and both are below 15%! More details, precise scores, and analysis below: 🧡
Tweet media one
9
14
141
@AarushSah_
Aarush Sah
2 months
Internship got cut a little short. Happy to share that I'm now full-time at @GroqInc - LET'S COOK
Tweet media one
50
16
930
@AarushSah_
Aarush Sah
4 months
GUYS I MET KARPATHY!
Tweet media one
21
2
296
@AarushSah_
Aarush Sah
2 months
Just saying... Ever since I joined Groq a month ago we raised $640,000,000 πŸ€·β€β™‚οΈ
20
2
262
@AarushSah_
Aarush Sah
4 months
Currently at SFO - banger ad from @AnthropicAI ngl
Tweet media one
15
3
244
@AarushSah_
Aarush Sah
4 months
Excited to share that I'll be joining @GroqInc 's Cloud division as an intern. Huge thanks to @GavinSherry , @kraken_9076 , Jon Tait and Joanna Juarez for making this happen. Time to cook.
Tweet media one
25
5
223
@AarushSah_
Aarush Sah
3 months
I'm sitting on an eval that frontier models get 0% on - should I drop it now or wait for 3.5 Opus or GPT-5?
36
2
155
@AarushSah_
Aarush Sah
7 months
Another @naval observation.
Tweet media one
3
7
142
@AarushSah_
Aarush Sah
7 months
The 4 types of luck by @naval
Tweet media one
1
5
97
@AarushSah_
Aarush Sah
3 months
I JUST PUSHED TO PROD FOR THE FIRST TIME EVER LFGGGGGG
9
0
89
@AarushSah_
Aarush Sah
2 months
I turn 18 today! πŸ₯³
28
0
77
@AarushSah_
Aarush Sah
4 months
I graduated high school today. Here's what it taught me about B2B SaaS: πŸ‘‡πŸ§΅
Tweet media one
17
1
77
@AarushSah_
Aarush Sah
6 months
@gaxrav @naval His thesis is that audio is the best for output, text is the best for input. That's exactly what AirChat does.
9
0
75
@AarushSah_
Aarush Sah
3 months
@pixlg1rl YOU MADE THE MACHINE LEARN TO SAY HELLO WORLD LFGG
1
1
72
@AarushSah_
Aarush Sah
4 months
LLaMA 405B is on WhatsApp now?
Tweet media one
4
6
60
@AarushSah_
Aarush Sah
4 months
@nearcyan 's tweets have made it to the Anthropic keynote
Tweet media one
0
0
52
@AarushSah_
Aarush Sah
24 days
oh hey that's me
@sundeep
sunny madra
24 days
Come visit the @GroqInc team at Meta Connect!
Tweet media one
3
6
96
2
0
45
@AarushSah_
Aarush Sah
3 months
Introducing Eris: A Novel Evaluation Framework Using Debate Simulations Eris pits leading AI models against each other in structured debates, assessing reasoning, knowledge, and communication skills simultaneously. 1/ 🧡
Tweet media one
2
5
45
@AarushSah_
Aarush Sah
1 month
The Internship Game and why you shouldn't play it πŸ§΅πŸ‘‡
5
3
43
@AarushSah_
Aarush Sah
7 months
@LangChainAI @alexalbert__ Bro I literally did this a week ago without LangSmith in a single file
@AarushSah_
Aarush Sah
7 months
I hacked together a quick implementation of @alexalbert__ 's prompt engineering workflow! An explanation 🧡: 1/
2
8
67
2
4
43
@AarushSah_
Aarush Sah
4 months
graduating high school tomorrow
9
0
41
@AarushSah_
Aarush Sah
4 months
One of the most underrated things about Claude 3.5 Sonnet is that it knows the Anthropic API. You can ask 3.5 Sonnet to write code to call 3 Opus with no documentation and it'll generate working code
7
0
37
@AarushSah_
Aarush Sah
6 months
Why is my entire feed airchat
6
1
36
@AarushSah_
Aarush Sah
2 months
openai dot com is down
Tweet media one
4
0
32
@AarushSah_
Aarush Sah
4 months
I'm at @aiDotEngineer ! stop by the @GroqInc booth and say hi :))
Tweet media one
3
2
28
@AarushSah_
Aarush Sah
3 months
Two weeks at Groq finished. I need to cook harder So much to do and so much to learn
1
0
24
@AarushSah_
Aarush Sah
3 months
@HrubyOnRails but then it'll end up in the training set :/
1
0
24
@AarushSah_
Aarush Sah
2 months
Learning Go is...an experience
5
0
24
@AarushSah_
Aarush Sah
3 months
Meta just released Llama 3.1 8B, 70B, and 405B, setting new SOTA across the board! Llama 405B beats GPT-4o in all benchmarks except for HumanEval and MMLU_social_sciences. Benchmarks below:
Tweet media one
1
1
23
@AarushSah_
Aarush Sah
2 months
Github is down
Tweet media one
1
3
23
@AarushSah_
Aarush Sah
3 months
Announcing SOTA Tool Use models on Groq! Available in 8B and 70B, these models are state-of-the art, outperforming Claude 3.5 Sonnet in function calling. Try it out on the Groq Console, or download the weights on Huggingface!
Tweet media one
@RickLamers
Rick Lamers
3 months
I’ve been leading a secret project for months … and the word is finally out! πŸ› οΈ I'm proud to announce the Llama 3 Groq Tool Use 8B and 70B models πŸ”₯ An open source Tool Use full finetune of Llama 3 that reaches the #1 position on BFCL beating all other models, including
Tweet media one
74
236
1K
4
2
23
@AarushSah_
Aarush Sah
3 months
Excited to announce that we've partnered with @AIatMeta to bring Llama 3.1 to @GroqInc - any you can try it out RIGHT NOW on our website or via API! We have all three models hosted - 3.1 8B, 70B, and 405B!
Tweet media one
1
4
22
@AarushSah_
Aarush Sah
3 months
@DavidJAlba94 good idea - I'm considering this!
1
0
23
@AarushSah_
Aarush Sah
2 months
We at @GroqInc just released Distil-Whisper, our new English only Speech to Text endpoint. With a speed factor of 240x, it's the new SOTA for speedy STT! check it out at groq dot com!
@ArtificialAnlys
Artificial Analysis
2 months
Groq has just launched their record breaking Distil-Whisper endpoint! With a Speed Factor of 240x, it is the fastest Speech to Text endpoint we have benchmarked. It is also the lowest-cost Speech to Text endpoint we benchmark at $0.33 per 1000 minutes of audio. This means you
Tweet media one
6
46
287
2
2
23
@AarushSah_
Aarush Sah
4 months
@sundeep , GM of GroqCloud, is presenting!
Tweet media one
0
2
22
@AarushSah_
Aarush Sah
3 months
Llama 405B July 23rd
Tweet media one
2
2
22
@AarushSah_
Aarush Sah
1 month
I think a good way to describe working at @GroqInc is building affordable housing for sand aliens
1
1
21
@AarushSah_
Aarush Sah
4 months
@aidan_mclau im writing a blog post on this - it's so good with custom prompts but so trash on the leaderboard
2
0
21
@AarushSah_
Aarush Sah
2 months
Slack needs some unseriousness
Tweet media one
2
0
20
@AarushSah_
Aarush Sah
3 months
Bro isn't just shipping, bro is packaging, labeling, tracking, insuring, and expediting up in here, bro a whole logistics company
@AnthropicAI
Anthropic
3 months
You can now fine-tune Claude 3 Haikuβ€”our fastest and most cost-effective modelβ€”in Amazon Bedrock.
47
151
997
2
1
20
@AarushSah_
Aarush Sah
2 months
I just got called an 'engineer with rizz' and I'm not entirely sure how to feel about that
2
0
19
@AarushSah_
Aarush Sah
4 months
it's unfortunate that the bay and the east coast have two completely different attitudes toward young people. I was talking to a family friend's daughter in NJ and proferred my opinion on the intersection of AI and another field, and this is how the conversation went: Me:
4
0
19
@AarushSah_
Aarush Sah
2 months
went on a walk
Tweet media one
0
0
17
@AarushSah_
Aarush Sah
3 months
❀️❀️❀️
@alexalbert__
Alex Albert
3 months
Good news for @AnthropicAI devs: We've doubled the max output token limit for Claude 3.5 Sonnet from 4096 to 8192 in the Anthropic API. Just add the header "anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15" to your API calls.
Tweet media one
161
268
3K
0
1
17
@AarushSah_
Aarush Sah
2 months
Holy cow Mistral Large 2 is SMART
1
1
17
@AarushSah_
Aarush Sah
4 months
@theojaffee what's the 2023 one
1
0
17
@AarushSah_
Aarush Sah
3 months
RESULTS: Claude 3.5 Sonnet: 11.6% GPT-4o: 5.9% Set-Eval stands out from other visual reasoning tasks. While benchmarks like MathVista, AI2D, and MMMU have seen high performance from top models (60-95% accuracy), Set-Eval proves far more challenging, with SOTA being 11.6%. It
Tweet media one
1
0
16
@AarushSah_
Aarush Sah
4 months
I'm at @aiDotEngineer ! would love to chat with anyone who's here
Tweet media one
1
0
16
@AarushSah_
Aarush Sah
4 months
I'm doing s5.
Tweet media one
3
0
16
@AarushSah_
Aarush Sah
4 months
BUMRAH I LOVE YOU
Tweet media one
3
0
15
@AarushSah_
Aarush Sah
6 months
I'm on Airchat btw
Tweet media one
4
0
15
@AarushSah_
Aarush Sah
6 months
Airchat beef is crazy @indstryoutsider @calebsirak
3
0
14
@AarushSah_
Aarush Sah
3 months
Built on the Inspect-AI framework, Set-Eval offers easy reproducibility. The code for generating the dataset, evaluating it, and the dataset itself are all available on Github and Huggingface. Code: Dataset:
1
0
15
@AarushSah_
Aarush Sah
3 months
We aren't getting any new open-source models because everyone is waiting for LLaMA 3 405B to drop. Nobody wants to drop SoTA and get shown up in a few weeks
2
0
15
@AarushSah_
Aarush Sah
2 months
Huge thanks to @GavinSherry , Jon Tait, @kraken_9076 , @omarkilani , Joanna Juarez, @sundeep , @JonathanRoss321 , and so many other cool people for being able to do the things that I love to do ❀️
1
0
15
@AarushSah_
Aarush Sah
17 days
3,100 tokens per second for LLaMA 3.2 1B on @GroqInc
Tweet media one
1
1
15
@AarushSah_
Aarush Sah
1 month
Today, we at @GroqInc released LLaVA 1.5. With this release, developers now have access to three input modalities using Groq models - Audio, Text and Vision. Try LLaVA out on Groq at console dot groq dot com!! (P.S. - we offer a generous free tier πŸ˜‰)
1
0
15
@AarushSah_
Aarush Sah
1 month
Why does this approach work? 1. It's rare. Most students won't do it because it's harder than following tutorials. 2. It's concrete. You're not claiming competence; you're demonstrating it. 3. It's a natural network builder. Interesting work attracts interesting people. 4. It
1
0
15
@AarushSah_
Aarush Sah
3 months
For example, a valid set for this arrangement could be: One Green Empty Oval, Two Purple Empty Squiggles, and Three Red Empty Diamonds.
Tweet media one
1
0
15
@AarushSah_
Aarush Sah
5 months
it was so fun judging the @runpod_io hackathon - shout-out to @LukePiette , @aadillpickle , @trishaprile , @xPolarrr and so many other cool people for working tirelessly to make such a cool event happen :))
2
0
14
@AarushSah_
Aarush Sah
7 months
@victormustar @ClementDelangue GPT-4 weights, code, and data under an Apache-2.0 License
1
0
14
@AarushSah_
Aarush Sah
5 months
@JustinLin610 multimodality :))
1
0
14
@AarushSah_
Aarush Sah
6 months
@cameron_pfiffer Nougat by Meta
1
0
13
@AarushSah_
Aarush Sah
4 months
@karpathy SIGNED MY PHONE CASE
Tweet media one
@AarushSah_
Aarush Sah
4 months
GUYS I MET KARPATHY!
Tweet media one
21
2
296
3
0
14
@AarushSah_
Aarush Sah
1 month
The answer, I think, is disarmingly simple: Make things that matter. "Matter" here doesn't mean passing your data structures class or building a todo app. It means creating something that real engineers find useful or interesting.
2
0
14
@AarushSah_
Aarush Sah
3 months
@lumpenspace the general idea behind each example is the same, so if I share one example I share all
1
0
14
@AarushSah_
Aarush Sah
1 month
The best opportunities in tech, internships included, often come through side doors. They're not so much applied for as created. This isn't chance; it's a consequence of how value is generated in our industry.
1
1
12
@AarushSah_
Aarush Sah
3 months
Mistral releases NeMo - 12B model with 128K context length, built in collaboration with NVIDIA. New SOTA for small models!
Tweet media one
@MistralAI
Mistral AI
3 months
99
233
2K
1
0
13
@AarushSah_
Aarush Sah
4 months
any good grants for ai research out there? I'm running an experiment and need money for @OpenRouterAI credits πŸ’€
5
0
13
@AarushSah_
Aarush Sah
3 months
New eval dropping tomorrow morning Code is cleaned, dataset is ready, evaluation is done and announcement is written You'll like this one - Claude 3.5 Sonnet gets <15%
0
0
13
@AarushSah_
Aarush Sah
6 months
2
1
13
@AarushSah_
Aarush Sah
3 months
First, what are the rules of Set? - 12 cards are laid out - Each card has 4 features: color, shape, number, and shading - A valid set is 3 cards where for each, it's either all the same or all different across the 3 cards - No two cards can be identical The task of the model is
1
0
13
@AarushSah_
Aarush Sah
4 months
this kind of post makes me sick. why is it that racism towards Indians is normalized?
@leonardaisfunE
Leonarda Jonie
4 months
My gym is full of Indian men. Great. Now I have to compete with them for the 10-pound dumbbells.
5K
4K
106K
2
1
10
@AarushSah_
Aarush Sah
6 months
What happened to @TheBlokeAI
2
2
12
@AarushSah_
Aarush Sah
1 month
This could be many things: - A novel tool that simplifies a common development task - An insightful analysis of an emerging technology - Thoughtful contributions to open-source projects The common thread is value creation. Not for a grade or a resume line, but for its own sake.
1
0
12
@AarushSah_
Aarush Sah
4 months
@TommyFalkowski @WolframRvnwlf i made a hacky version of this a few months ago lol
@AarushSah_
Aarush Sah
7 months
Introducing LLM-PCI: a Python script that injects your entire project context into long-context LLMs like Claude Opus. Designed to enhance coding assistance by providing comprehensive project information to the AI. 🧡:
1
1
2
1
1
12
@AarushSah_
Aarush Sah
2 months
Does my Groqpoasting count as DevRel?
4
0
12
@AarushSah_
Aarush Sah
4 months
WHAT THE FUCK IS A KILOMETER πŸ‡ΊπŸ‡²πŸ‡ΊπŸ‡²πŸ‡ΊπŸ‡²πŸ‡ΊπŸ‡²πŸ‡ΊπŸ‡²πŸ‡ΊπŸ‡²πŸ¦…πŸ¦…πŸ¦…πŸ¦…
@Meleern
Meleern
4 months
based Zuck "Happy birthday, America!πŸ‡ΊπŸ‡Έ"
75
102
976
0
0
11
@AarushSah_
Aarush Sah
1 month
The encouraging thing is that this doesn't require years. A few months of focused, meaningful work can outweigh years of coursework in terms of practical value. The key is to find something you care about. It could be systems design, user interfaces, machine learningβ€”anything.
1
0
12
@AarushSah_
Aarush Sah
1 month
So perhaps we're asking the wrong question. Instead of "How do I get an internship?", maybe we should ask, "How do I become the kind of person companies are looking for?"
1
0
12
@AarushSah_
Aarush Sah
3 months
. @omarkilani is a true 100x engineer
4
0
12
@AarushSah_
Aarush Sah
3 months
We are in the calm before the storm. LLaMA 3 405B, Gemini 2.0, and new models from Cohere and OpenAI are all gonna drop one after the other.
1
0
11
@AarushSah_
Aarush Sah
3 months
JOIN THE GROQ DISCORD SERVER OR I WILL BE SAD
@AarushSah_
Aarush Sah
3 months
Announcing SOTA Tool Use models on Groq! Available in 8B and 70B, these models are state-of-the art, outperforming Claude 3.5 Sonnet in function calling. Try it out on the Groq Console, or download the weights on Huggingface!
Tweet media one
4
2
23
1
1
11
@AarushSah_
Aarush Sah
1 month
But perhaps most importantly, it shifts the conversation. You're no longer an applicant, hoping for a chance. You become someone worth talking to. This isn't just theory. Look at interns at top tech companies. Many were known quantities before they applied, because of work they'd
1
0
11
@AarushSah_
Aarush Sah
1 month
Do that consistently, and opportunities may start chasing you. ---
1
0
11
@AarushSah_
Aarush Sah
3 months
Would highly recommend reading this - very interesting implementation of MoA
@KapadiaSoami
Soami Kapadia
3 months
Mixture of Agents on Groq Introducing a fully configurable, Mixture-of-Agents framework powered by @GroqInc using @LangChainAI You can configure your own MoA version using the @streamlit UI through the framework. details + links belowπŸ‘‡πŸ§΅
9
131
785
2
2
11
@AarushSah_
Aarush Sah
3 months
@sam_paech will finetune a model and check before I release 🫑
0
0
11
@AarushSah_
Aarush Sah
3 months
Great application of LLaMA 3 on Groq!
@mrsiipa
maharshi
3 months
pip install smoltex
Tweet media one
12
15
425
1
0
11
@AarushSah_
Aarush Sah
6 months
We should have a teen tech bro gc
2
1
10
@AarushSah_
Aarush Sah
7 months
@shauseth 10 steps ahead of you - 17 and working on B2B SaaS
1
0
9
@AarushSah_
Aarush Sah
3 months
8/12, 10:00 AM PT evals for the next generation of models
1
1
10
@AarushSah_
Aarush Sah
3 months
We've got the Karpathy stamp of approval, folks
@karpathy
Andrej Karpathy
3 months
@JonathanRoss321 This is so cool. Feeling the AGI - you just talk to your computer and it does stuff, instantly. Speed really makes AI so much more pleasing.
28
73
2K
2
0
10
@AarushSah_
Aarush Sah
1 month
It's kinda wild that I work with people who have been coding for longer than I've been alive
1
0
10
@AarushSah_
Aarush Sah
7 months
Loved @alexalbert__ 's presentation at the memory hackathon yesterday! Didn't get a chance to meet you but would love to chat about @AnthropicAI 's tool use and using it in the most effective way
Tweet media one
1
1
10
@AarushSah_
Aarush Sah
1 month
In an environment where many are trying to game the system, there's a surprising edge in simply not playing the game. Instead of trying to check all the right boxes, focus on becoming genuinely, deeply good at things that matter.
1
0
9
@AarushSah_
Aarush Sah
3 months
Acknowledgements Eris was built with a generous grant from @OpenRouterAI and leveraging @weights_biases ' Weave library for monitoring and visualization. Thanks to @DynamicWebPaige for helping refine the idea!
1
0
10
@AarushSah_
Aarush Sah
2 months
"Hey Claude, roast my Twitter with dripping sarcasm in one paragraph"
Tweet media one
3
0
10