Aarush Sah @AarushSah_ Twitter profile | Pikagi

Pikagi

Aarush Sah

@AarushSah_

1,812

Followers

305

Following

202

Media

1,132

Statuses

18 | Evals @GroqInc | ex. @NousResearch | Opinions aren't my employer's

Bay

https://t.co/R83BgUKu1Q

Joined September 2022

Don't wanna be here? Send us removal request.

Pinned Tweet

@AarushSah_

Aarush Sah

3 months

🚨New Benchmark Alert!🚨 Introducing Set-Eval: a novel multimodal benchmark for testing visual reasoning capabilities of large language models. Claude 3.5 Sonnet has a score double that of GPT-4o, and both are below 15%! More details, precise scores, and analysis below: 🧵

Tweet media one

9

14

141

Last Seen Profiles

@TransitRidersEB

@canola_flower75

@jagerchief

@BosDanang73091

@auwroral

@bokeplokalmalam

@stw_pdg

@100_unite

@dea_semok

@the_NaturalDiva

@xhelent

@Tante_Binal69

@bokeplokalmalam

@jandakembangstw

@BOKEPBOCILVIRA3

@HaPhuong1976

@stwmaniax

@jandakembangstw

@_Tammy_pesa_

@stw_pdg

@galery_basah10

@rama_brownies

@yanisapostat

@_Tammy_pesa_

@miyahlmaoo

@ANDREASHAD61119

@bokeplokalmalam

@nagatafuuk68492

@vilens

@Tante_Binal69

@JMarie98458

@BinorRaja

@llluuunnaaa

@bokeplokalmalam

@galery_basah10

@_Tammy_pesa_

@AarushSah_

Aarush Sah

2 months

Internship got cut a little short. Happy to share that I'm now full-time at @GroqInc - LET'S COOK

Tweet media one

50

16

930

@AarushSah_

Aarush Sah

4 months

GUYS I MET KARPATHY!

Tweet media one

21

2

296

@AarushSah_

Aarush Sah

2 months

Just saying... Ever since I joined Groq a month ago we raised $640,000,000 🤷‍♂️

20

2

262

@AarushSah_

Aarush Sah

4 months

Currently at SFO - banger ad from @AnthropicAI ngl

Tweet media one

15

3

244

@AarushSah_

Aarush Sah

4 months

Excited to share that I'll be joining @GroqInc 's Cloud division as an intern. Huge thanks to @GavinSherry , @kraken_9076 , Jon Tait and Joanna Juarez for making this happen. Time to cook.

Tweet media one

25

5

223

@AarushSah_

Aarush Sah

3 months

I'm sitting on an eval that frontier models get 0% on - should I drop it now or wait for 3.5 Opus or GPT-5?

36

2

155

@AarushSah_

Aarush Sah

7 months

Another @naval observation.

Tweet media one

3

7

142

@AarushSah_

Aarush Sah

7 months

The 4 types of luck by @naval

Tweet media one

1

5

97

@AarushSah_

Aarush Sah

3 months

I JUST PUSHED TO PROD FOR THE FIRST TIME EVER LFGGGGGG

9

0

89

@AarushSah_

Aarush Sah

2 months

I turn 18 today! 🥳

28

0

77

@AarushSah_

Aarush Sah

4 months

I graduated high school today. Here's what it taught me about B2B SaaS: 👇🧵

Tweet media one

17

1

77

@AarushSah_

Aarush Sah

6 months

@gaxrav @naval His thesis is that audio is the best for output, text is the best for input. That's exactly what AirChat does.

9

0

75

@AarushSah_

Aarush Sah

3 months

@pixlg1rl YOU MADE THE MACHINE LEARN TO SAY HELLO WORLD LFGG

1

1

72

@AarushSah_

Aarush Sah

7 months

I hacked together a quick implementation of @alexalbert__ 's prompt engineering workflow! An explanation 🧵: 1/

Tweet card media

GitHub - AarushSah/prompt-optimizer: Automates the process of prompt engineering using Anthropic's...

Automates the process of prompt engineering using Anthropic's Claude language model. - AarushSah/prompt-optimizer

2

8

67

@AarushSah_

Aarush Sah

4 months

LLaMA 405B is on WhatsApp now?

Tweet media one

4

6

60

@AarushSah_

Aarush Sah

4 months

@nearcyan 's tweets have made it to the Anthropic keynote

Tweet media one

0

0

52

@AarushSah_

Aarush Sah

24 days

oh hey that's me

@sundeep

sunny madra

24 days

Come visit the @GroqInc team at Meta Connect!

Tweet media one

3

6

96

2

0

45

@AarushSah_

Aarush Sah

3 months

Introducing Eris: A Novel Evaluation Framework Using Debate Simulations Eris pits leading AI models against each other in structured debates, assessing reasoning, knowledge, and communication skills simultaneously. 1/ 🧵

Tweet media one

2

5

45

@AarushSah_

Aarush Sah

1 month

The Internship Game and why you shouldn't play it 🧵👇

5

3

43

@AarushSah_

Aarush Sah

7 months

@LangChainAI @alexalbert__ Bro I literally did this a week ago without LangSmith in a single file

@AarushSah_

Aarush Sah

7 months

I hacked together a quick implementation of @alexalbert__ 's prompt engineering workflow! An explanation 🧵: 1/

2

8

67

2

4

43

@AarushSah_

Aarush Sah

4 months

graduating high school tomorrow

9

0

41

@AarushSah_

Aarush Sah

4 months

One of the most underrated things about Claude 3.5 Sonnet is that it knows the Anthropic API. You can ask 3.5 Sonnet to write code to call 3 Opus with no documentation and it'll generate working code

7

0

37

@AarushSah_

Aarush Sah

6 months

Why is my entire feed airchat

6

1

36

@AarushSah_

Aarush Sah

2 months

openai dot com is down

Tweet media one

4

0

32

@AarushSah_

Aarush Sah

4 months

I'm at @aiDotEngineer ! stop by the @GroqInc booth and say hi :))

Tweet media one

3

2

28

@AarushSah_

Aarush Sah

3 months

Two weeks at Groq finished. I need to cook harder So much to do and so much to learn

1

0

24

@AarushSah_

Aarush Sah

3 months

@HrubyOnRails but then it'll end up in the training set :/

1

0

24

@AarushSah_

Aarush Sah

2 months

Learning Go is...an experience

5

0

24

@AarushSah_

Aarush Sah

3 months

Meta just released Llama 3.1 8B, 70B, and 405B, setting new SOTA across the board! Llama 405B beats GPT-4o in all benchmarks except for HumanEval and MMLU_social_sciences. Benchmarks below:

Tweet media one

1

1

23

@AarushSah_

Aarush Sah

2 months

Github is down

Tweet media one

1

3

23

@AarushSah_

Aarush Sah

3 months

Announcing SOTA Tool Use models on Groq! Available in 8B and 70B, these models are state-of-the art, outperforming Claude 3.5 Sonnet in function calling. Try it out on the Groq Console, or download the weights on Huggingface!

Tweet media one

@RickLamers

Rick Lamers

3 months

I’ve been leading a secret project for months … and the word is finally out! 🛠️ I'm proud to announce the Llama 3 Groq Tool Use 8B and 70B models 🔥 An open source Tool Use full finetune of Llama 3 that reaches the #1 position on BFCL beating all other models, including

Tweet media one

74

236

1K

4

2

23

@AarushSah_

Aarush Sah

3 months

Excited to announce that we've partnered with @AIatMeta to bring Llama 3.1 to @GroqInc - any you can try it out RIGHT NOW on our website or via API! We have all three models hosted - 3.1 8B, 70B, and 405B!

Tweet media one

1

4

22

@AarushSah_

Aarush Sah

3 months

@DavidJAlba94 good idea - I'm considering this!

1

0

23

@AarushSah_

Aarush Sah

2 months

We at @GroqInc just released Distil-Whisper, our new English only Speech to Text endpoint. With a speed factor of 240x, it's the new SOTA for speedy STT! check it out at groq dot com!

@ArtificialAnlys

Artificial Analysis

@ArtificialAnlys

2 months

Groq has just launched their record breaking Distil-Whisper endpoint! With a Speed Factor of 240x, it is the fastest Speech to Text endpoint we have benchmarked. It is also the lowest-cost Speech to Text endpoint we benchmark at $0.33 per 1000 minutes of audio. This means you

Tweet media one

6

46

287

2

2

23

@AarushSah_

Aarush Sah

3 months

Excited to share that @weights_biases Weave now has a native @GroqInc integration! Weave will now automatically capture traces of any LLM calls that use the Groq SDK.

Groq | W&B Weave

Groq is the AI infrastructure company that delivers fast AI inference. The LPU™ Inference Engine by Groq is a hardware and software platform that delivers exceptional compute speed, quality, and...

weave-docs.wandb.ai

0

5

22

@AarushSah_

Aarush Sah

4 months

@sundeep , GM of GroqCloud, is presenting!

Tweet media one

0

2

22

@AarushSah_

Aarush Sah

3 months

Llama 405B July 23rd

Tweet media one

2

2

22

@AarushSah_

Aarush Sah

1 month

I think a good way to describe working at @GroqInc is building affordable housing for sand aliens

1

1

21

@AarushSah_

Aarush Sah

4 months

@aidan_mclau im writing a blog post on this - it's so good with custom prompts but so trash on the leaderboard

2

0

21

@AarushSah_

Aarush Sah

2 months

Slack needs some unseriousness

Tweet media one

2

0

20

@AarushSah_

Aarush Sah

3 months

Bro isn't just shipping, bro is packaging, labeling, tracking, insuring, and expediting up in here, bro a whole logistics company

@AnthropicAI

Anthropic

3 months

You can now fine-tune Claude 3 Haiku—our fastest and most cost-effective model—in Amazon Bedrock.

47

151

997

2

1

20

@AarushSah_

Aarush Sah

2 months

I just got called an 'engineer with rizz' and I'm not entirely sure how to feel about that

2

0

19

@AarushSah_

Aarush Sah

4 months

it's unfortunate that the bay and the east coast have two completely different attitudes toward young people. I was talking to a family friend's daughter in NJ and proferred my opinion on the intersection of AI and another field, and this is how the conversation went: Me:

4

0

19

@AarushSah_

Aarush Sah

2 months

went on a walk

Tweet media one

0

0

17

@AarushSah_

Aarush Sah

3 months

❤️❤️❤️

@alexalbert__

Alex Albert

3 months

Good news for @AnthropicAI devs: We've doubled the max output token limit for Claude 3.5 Sonnet from 4096 to 8192 in the Anthropic API. Just add the header "anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15" to your API calls.

Tweet media one

161

268

3K

0

1

17

@AarushSah_

Aarush Sah

2 months

Holy cow Mistral Large 2 is SMART

1

1

17

@AarushSah_

Aarush Sah

4 months

@theojaffee what's the 2023 one

1

0

17

@AarushSah_

Aarush Sah

3 months

RESULTS: Claude 3.5 Sonnet: 11.6% GPT-4o: 5.9% Set-Eval stands out from other visual reasoning tasks. While benchmarks like MathVista, AI2D, and MMMU have seen high performance from top models (60-95% accuracy), Set-Eval proves far more challenging, with SOTA being 11.6%. It

Tweet media one

1

0

16

@AarushSah_

Aarush Sah

4 months

I'm at @aiDotEngineer ! would love to chat with anyone who's here

Tweet media one

1

0

16

@AarushSah_

Aarush Sah

4 months

I'm doing s5.

Tweet media one

3

0

16

@AarushSah_

Aarush Sah

4 months

BUMRAH I LOVE YOU

Tweet media one

3

0

15

@AarushSah_

Aarush Sah

6 months

I'm on Airchat btw

Tweet media one

4

0

15

@AarushSah_

Aarush Sah

6 months

Airchat beef is crazy @indstryoutsider @calebsirak

3

0

14

@AarushSah_

Aarush Sah

3 months

Built on the Inspect-AI framework, Set-Eval offers easy reproducibility. The code for generating the dataset, evaluating it, and the dataset itself are all available on Github and Huggingface. Code: Dataset:

Tweet card media

AarushSah/Set_Eval · Datasets at Hugging Face

1

0

15

@AarushSah_

Aarush Sah

3 months

We aren't getting any new open-source models because everyone is waiting for LLaMA 3 405B to drop. Nobody wants to drop SoTA and get shown up in a few weeks

2

0

15

@AarushSah_

Aarush Sah

2 months

Huge thanks to @GavinSherry , Jon Tait, @kraken_9076 , @omarkilani , Joanna Juarez, @sundeep , @JonathanRoss321 , and so many other cool people for being able to do the things that I love to do ❤️

1

0

15

@AarushSah_

Aarush Sah

17 days

3,100 tokens per second for LLaMA 3.2 1B on @GroqInc

Tweet media one

1

1

15

@AarushSah_

Aarush Sah

1 month

Today, we at @GroqInc released LLaVA 1.5. With this release, developers now have access to three input modalities using Groq models - Audio, Text and Vision. Try LLaVA out on Groq at console dot groq dot com!! (P.S. - we offer a generous free tier 😉)

1

0

15

@AarushSah_

Aarush Sah

1 month

Why does this approach work? 1. It's rare. Most students won't do it because it's harder than following tutorials. 2. It's concrete. You're not claiming competence; you're demonstrating it. 3. It's a natural network builder. Interesting work attracts interesting people. 4. It

1

0

15

@AarushSah_

Aarush Sah

3 months

For example, a valid set for this arrangement could be: One Green Empty Oval, Two Purple Empty Squiggles, and Three Red Empty Diamonds.

Tweet media one

1

0

15

@AarushSah_

Aarush Sah

5 months

it was so fun judging the @runpod_io hackathon - shout-out to @LukePiette , @aadillpickle , @trishaprile , @xPolarrr and so many other cool people for working tirelessly to make such a cool event happen :))

2

0

14

@AarushSah_

Aarush Sah

7 months

@victormustar @ClementDelangue GPT-4 weights, code, and data under an Apache-2.0 License

1

0

14

@AarushSah_

Aarush Sah

5 months

@JustinLin610 multimodality :))

1

0

14

@AarushSah_

Aarush Sah

6 months

@cameron_pfiffer Nougat by Meta

1

0

13

@AarushSah_

Aarush Sah

4 months

@karpathy SIGNED MY PHONE CASE

Tweet media one

@AarushSah_

Aarush Sah

4 months

GUYS I MET KARPATHY!

Tweet media one

21

2

296

3

0

14

@AarushSah_

Aarush Sah

1 month

The answer, I think, is disarmingly simple: Make things that matter. "Matter" here doesn't mean passing your data structures class or building a todo app. It means creating something that real engineers find useful or interesting.

2

0

14

@AarushSah_

Aarush Sah

3 months

@lumpenspace the general idea behind each example is the same, so if I share one example I share all

1

0

14

@AarushSah_

Aarush Sah

1 month

The best opportunities in tech, internships included, often come through side doors. They're not so much applied for as created. This isn't chance; it's a consequence of how value is generated in our industry.

1

1

12

@AarushSah_

Aarush Sah

3 months

Mistral releases NeMo - 12B model with 128K context length, built in collaboration with NVIDIA. New SOTA for small models!

Tweet media one

@MistralAI

Mistral AI

3 months

99

233

2K

1

0

13

@AarushSah_

Aarush Sah

4 months

any good grants for ai research out there? I'm running an experiment and need money for @OpenRouterAI credits 💀

5

0

13

@AarushSah_

Aarush Sah

3 months

New eval dropping tomorrow morning Code is cleaned, dataset is ready, evaluation is done and announcement is written You'll like this one - Claude 3.5 Sonnet gets <15%

0

0

13

@AarushSah_

Aarush Sah

6 months

Peak Airchat @alyriadefi @0xSigil

2

1

13

@AarushSah_

Aarush Sah

3 months

First, what are the rules of Set? - 12 cards are laid out - Each card has 4 features: color, shape, number, and shading - A valid set is 3 cards where for each, it's either all the same or all different across the 3 cards - No two cards can be identical The task of the model is

1

0

13

@AarushSah_

Aarush Sah

4 months

this kind of post makes me sick. why is it that racism towards Indians is normalized?

@leonardaisfunE

Leonarda Jonie

@leonardaisfunE

4 months

My gym is full of Indian men. Great. Now I have to compete with them for the 10-pound dumbbells.

5K

4K

106K

2

1

10

@AarushSah_

Aarush Sah

6 months

What happened to @TheBlokeAI

2

2

12

@AarushSah_

Aarush Sah

1 month

This could be many things: - A novel tool that simplifies a common development task - An insightful analysis of an emerging technology - Thoughtful contributions to open-source projects The common thread is value creation. Not for a grade or a resume line, but for its own sake.

1

0

12

@AarushSah_

Aarush Sah

4 months

@TommyFalkowski @WolframRvnwlf i made a hacky version of this a few months ago lol

@AarushSah_

Aarush Sah

7 months

Introducing LLM-PCI: a Python script that injects your entire project context into long-context LLMs like Claude Opus. Designed to enhance coding assistance by providing comprehensive project information to the AI. 🧵:

1

1

2

1

1

12

@AarushSah_

Aarush Sah

2 months

Does my Groqpoasting count as DevRel?

4

0

12

@AarushSah_

Aarush Sah

4 months

WHAT THE FUCK IS A KILOMETER 🇺🇲🇺🇲🇺🇲🇺🇲🇺🇲🇺🇲🦅🦅🦅🦅

@Meleern

Meleern

4 months

based Zuck "Happy birthday, America!🇺🇸"

75

102

976

0

0

11

@AarushSah_

Aarush Sah

1 month

The encouraging thing is that this doesn't require years. A few months of focused, meaningful work can outweigh years of coursework in terms of practical value. The key is to find something you care about. It could be systems design, user interfaces, machine learning—anything.

1

0

12

@AarushSah_

Aarush Sah

1 month

So perhaps we're asking the wrong question. Instead of "How do I get an internship?", maybe we should ask, "How do I become the kind of person companies are looking for?"

1

0

12

@AarushSah_

Aarush Sah

3 months

. @omarkilani is a true 100x engineer

4

0

12

@AarushSah_

Aarush Sah

3 months

We are in the calm before the storm. LLaMA 3 405B, Gemini 2.0, and new models from Cohere and OpenAI are all gonna drop one after the other.

1

0

11

@AarushSah_

Aarush Sah

3 months

JOIN THE GROQ DISCORD SERVER OR I WILL BE SAD

Tweet card media

Join the GroqCloud Discord Server!

Groq provides the world's fastest AI inference. | 21097 members

@AarushSah_

Aarush Sah

3 months

Announcing SOTA Tool Use models on Groq! Available in 8B and 70B, these models are state-of-the art, outperforming Claude 3.5 Sonnet in function calling. Try it out on the Groq Console, or download the weights on Huggingface!

Tweet media one

4

2

23

1

1

11

@AarushSah_

Aarush Sah

1 month

But perhaps most importantly, it shifts the conversation. You're no longer an applicant, hoping for a chance. You become someone worth talking to. This isn't just theory. Look at interns at top tech companies. Many were known quantities before they applied, because of work they'd

1

0

11

@AarushSah_

Aarush Sah

3 months

@Suhail @GroqInc @GroqInc @GroqInc !

0

0

11

@AarushSah_

Aarush Sah

1 month

Do that consistently, and opportunities may start chasing you. ---

1

0

11

@AarushSah_

Aarush Sah

3 months

Would highly recommend reading this - very interesting implementation of MoA

@KapadiaSoami

Soami Kapadia

3 months

Mixture of Agents on Groq Introducing a fully configurable, Mixture-of-Agents framework powered by @GroqInc using @LangChainAI You can configure your own MoA version using the @streamlit UI through the framework. details + links below👇🧵

9

131

785

2

2

11

@AarushSah_

Aarush Sah

3 months

@sam_paech will finetune a model and check before I release 🫡

0

0

11

@AarushSah_

Aarush Sah

3 months

Great application of LLaMA 3 on Groq!

@mrsiipa

maharshi

3 months

pip install smoltex

Tweet media one

12

15

425

1

0

11

@AarushSah_

Aarush Sah

6 months

We should have a teen tech bro gc

2

1

10

@AarushSah_

Aarush Sah

7 months

@shauseth 10 steps ahead of you - 17 and working on B2B SaaS

1

0

9

@AarushSah_

Aarush Sah

3 months

8/12, 10:00 AM PT evals for the next generation of models

1

1

10

@AarushSah_

Aarush Sah

3 months

We've got the Karpathy stamp of approval, folks

@karpathy

Andrej Karpathy

3 months

@JonathanRoss321 This is so cool. Feeling the AGI - you just talk to your computer and it does stuff, instantly. Speed really makes AI so much more pleasing.

28

73

2K

2

0

10

@AarushSah_

Aarush Sah

1 month

It's kinda wild that I work with people who have been coding for longer than I've been alive

1

0

10

@AarushSah_

Aarush Sah

7 months

Loved @alexalbert__ 's presentation at the memory hackathon yesterday! Didn't get a chance to meet you but would love to chat about @AnthropicAI 's tool use and using it in the most effective way

Tweet media one

1

1

10

@AarushSah_

Aarush Sah

1 month

In an environment where many are trying to game the system, there's a surprising edge in simply not playing the game. Instead of trying to check all the right boxes, focus on becoming genuinely, deeply good at things that matter.

1

0

9

@AarushSah_

Aarush Sah

3 months

Acknowledgements Eris was built with a generous grant from @OpenRouterAI and leveraging @weights_biases ' Weave library for monitoring and visualization. Thanks to @DynamicWebPaige for helping refine the idea!

1

0

10

@AarushSah_

Aarush Sah

2 months

"Hey Claude, roast my Twitter with dripping sarcasm in one paragraph"

Tweet media one

3

0

10