Mahesh Sathiamoorthy

@madiator

9,298 Followers · 996 Following · 435 Media · 3,013 Statuses

LLMs and Data. Founder of something new. I discuss data for LLMs: Personal page: . Ex-GoogleDeepMind

Bay Area
Joined February 2008
@madiator
Mahesh Sathiamoorthy
11 months
This makes me sad:
Tweet media one
401
828
7K
@madiator
Mahesh Sathiamoorthy
1 year
Someone wrote an article on Transformers from scratch: I mean, really from scratch.
Tweet media one
37
441
2K
@madiator
Mahesh Sathiamoorthy
7 months
Update: I recently left my dream job at Google DeepMind to start something new. It was not an easy decision, given how amazing Google DeepMind has been and how much fun it has been to work with the incredible set of people there. I want to thank all my colleagues, managers, and
95
31
1K
@madiator
Mahesh Sathiamoorthy
1 year
If you are a PhD student, you should check out the book called "How to take smart notes". I wrote a bit about this book and what I have learned about note-taking in .
Tweet media one
12
169
1K
@madiator
Mahesh Sathiamoorthy
10 months
Wow, run LLMs like BitTorrent!
Tweet media one
14
131
812
@madiator
Mahesh Sathiamoorthy
1 year
Transformers optimized for Apple laptops. "..achieve up to 10x faster and 14x lower peak memory consumption compared to baseline implementations." People are already running LLaMA on their M1 laptops. This makes room for even bigger models on *laptops*!
8
134
766
@madiator
Mahesh Sathiamoorthy
1 year
This is a very interesting project worth keeping an eye on. It's not AGI, of course, but it points in that direction (autonomous agents). No wonder it's been trending on GitHub in the number 1 position! Description: "Auto-GPT is an experimental
Tweet media one
13
78
581
@madiator
Mahesh Sathiamoorthy
2 months
gpt-4o overhyped and failing expectations. gemini-1.5-flash underhyped and exceeding expectations.
31
43
568
@madiator
Mahesh Sathiamoorthy
1 year
Paper: HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face from MSR. I came across this paper a few weeks back and I can't get it out of my head! It's such a powerful idea that shows us how things will evolve in the future. 🧵⬇️
Tweet media one
11
102
516
@madiator
Mahesh Sathiamoorthy
11 months
@cwizprod1 I have benefited a lot from it, especially when I was new to programming.
6
1
510
@madiator
Mahesh Sathiamoorthy
1 year
Meta released a paper called "A Cookbook of Self-supervised Learning" (44 pages of content + the rest for references). Seems to cover a lot, from the role of data augmentation to multi-modality to hyperparameters..
Tweet media one
@omarsar0
elvis
1 year
“The Dark Matter of Intelligence” Self-supervised learning (SSL) underpins the recent success of deep learning in areas like language modeling and computer vision. This 70 pages cookbook released by Meta AI and collaborators provides an overview of fundamental techniques and
Tweet media one
7
104
420
3
128
484
@madiator
Mahesh Sathiamoorthy
1 year
Happy to share our recent work "Recommender Systems with Generative Retrieval"! Joint work with @shashank_r12 , @_nikhilmehta , @YiTayML , @vqctran and other awesome colleagues at Google Brain, Research, and YouTube. Preprint: #GenerativeAI 🧵 (1/n)
Tweet media one
13
73
479
@madiator
Mahesh Sathiamoorthy
1 year
Me: Read this book that's like 100k tokens and answer a question I have. LLM: sure, let me read the book first and .. here's your answer. Me: That's very good! You are smart! Now answer this other question. LLM: Let me start reading the book from the beginning again.. Me: oh
36
41
448
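For a sense of the economics behind the joke (a back-of-the-envelope sketch; the per-token price below is an assumed placeholder for illustration, not any provider's actual rate):

```python
# Cost of re-reading a 100k-token book per question, versus an idealized
# setup where the book's processed context (KV cache) could be reused.
BOOK_TOKENS = 100_000
PRICE_PER_1K_PROMPT_TOKENS = 0.06  # assumed $/1k tokens, for illustration only

def cost_without_caching(num_questions: int) -> float:
    """Every question re-sends the whole book as context."""
    return num_questions * BOOK_TOKENS / 1_000 * PRICE_PER_1K_PROMPT_TOKENS

def cost_with_cached_context(num_questions: int, question_tokens: int = 200) -> float:
    """Idealized: pay for the book once, then only for each short question."""
    one_time = BOOK_TOKENS / 1_000 * PRICE_PER_1K_PROMPT_TOKENS
    return one_time + num_questions * question_tokens / 1_000 * PRICE_PER_1K_PROMPT_TOKENS

print(cost_without_caching(10))      # 60.0 -- the "start reading again" tax
print(cost_with_cached_context(10))  # 6.12 -- if the context could be reused
```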
@madiator
Mahesh Sathiamoorthy
2 years
I am hiring for a Student Researcher position at Google Brain for 2023. If you have experience with LLMs, are interested in doing research on recommender systems, and are not graduating soon, please email me (email below). Others, please help refer, or *retweet*! More info below⬇️
10
121
436
@madiator
Mahesh Sathiamoorthy
3 months
PSA: Stanford's "CS25: Transformers United V4" course is available for free to the public: * Thursdays 4:30 - 5:50pm PDT. * Zoom link: [Meeting ID: 999 2215 1759, Password: 123456] More info:
Tweet media one
3
85
427
@madiator
Mahesh Sathiamoorthy
1 year
is actually very good. I have used it a few times and plan to use it often since it is a great tool for learning. Here's why: * Anytime you read something, it is very useful to ask yourself questions about what you just read and try to answer them. That will
Tweet media one
14
72
426
@madiator
Mahesh Sathiamoorthy
5 months
In case you thought Perplexity's journey was straightforward and linear.
Tweet media one
6
24
410
@madiator
Mahesh Sathiamoorthy
1 year
ResearchGPT: "An autonomous statistics helper that converts your natural language queries about a data set to insights." Site: Repo: Video: What's cool is that it writes Python code, executes it, and
Tweet media one
9
87
375
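In outline, the loop described here looks something like this (a hedged sketch; `llm` and the prompts are hypothetical stand-ins, not ResearchGPT's actual code, and exec() on model output should be sandboxed in any real use):

```python
import contextlib
import io

import pandas as pd

def llm(prompt: str) -> str:
    """Placeholder for an LLM completion call (assumption, not a real API)."""
    raise NotImplementedError

def answer_question(df: pd.DataFrame, question: str) -> str:
    # 1. Ask the model to write analysis code against the DataFrame.
    code = llm(
        f"You are a statistics helper. The DataFrame `df` has columns "
        f"{list(df.columns)}. Write Python that prints the answer to: {question}"
    )
    # 2. Execute the generated code and capture whatever it prints.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {"df": df, "pd": pd})
    result = buffer.getvalue()
    # 3. Feed the raw result back so the model can phrase it in natural language.
    return llm(f"Question: {question}\nComputed result: {result}\nAnswer concisely:")
```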
@madiator
Mahesh Sathiamoorthy
1 year
Nice and comprehensive set of slides on vector search. Meta's library for this, called Faiss:
Tweet media one
4
84
375
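As a taste of the library (the Faiss calls below are its standard documented API; the dimensions and data are made up):

```python
import numpy as np
import faiss

d = 64                                            # vector dimensionality
xb = np.random.random((10_000, d)).astype("float32")  # database vectors
index = faiss.IndexFlatL2(d)                      # exact L2 nearest-neighbor index
index.add(xb)

xq = np.random.random((5, d)).astype("float32")   # query vectors
distances, ids = index.search(xq, 4)              # 4 nearest neighbors per query
print(ids.shape)                                  # (5, 4)
```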
@madiator
Mahesh Sathiamoorthy
11 months
Our team at Google DeepMind is hiring in the area of LLMs + Recommender Systems. Come talk to me if you are at @icmlconf !
Tweet media one
4
19
364
@madiator
Mahesh Sathiamoorthy
1 year
Three year olds think that if they can't see you, you can't see them. 🙈 It makes for hilarious hide and seek games, where they are "hiding" in plain sight of you. They have simply closed their eyes: they can't see you, so they think you can't see them. This is
Tweet media one
13
63
354
@madiator
Mahesh Sathiamoorthy
4 months
Very interesting paper from Colin's group: just fine-tune with the few-shot examples via PEFT and it's better than just using ICL (in terms of accuracy and cost). Wondering why it is not widely adopted.
Tweet media one
9
56
322
@madiator
Mahesh Sathiamoorthy
1 year
Cerebras recently released Cerebras-GPT, their own LLMs trained following Chinchilla strategy on Cerebras wafers. These wafers are so different compared to other offerings. Models up to 13B in size are available at . Paper:
Tweet media one
Tweet media two
Tweet media three
7
46
313
@madiator
Mahesh Sathiamoorthy
10 months
Without a doubt this is one of the best books I have read as well. It totally changes how you view and understand the world. I highly recommend it!
@AviSchiffmann
Avi
11 months
Thinking in Systems is the best book I’ve read all year. Feels like I gained 10 IQ points just by opening it. Having clarity over how the world works is essential for anyone trying to do anything worthwhile.
Tweet media one
237
845
12K
3
29
298
@madiator
Mahesh Sathiamoorthy
1 year
Universal Speech Model from Google Research. Impressive that it works for languages that have very few speakers (in the millions). Key idea: "We demonstrate that utilizing a large unlabeled multilingual dataset to pre-train the encoder of our model and
Tweet media one
7
58
289
@madiator
Mahesh Sathiamoorthy
1 year
This looks extremely promising: being able to replace RLHF with a much simpler supervised learning algorithm called DPO. Paper: Code:
Tweet media one
@archit_sharma97
Archit Sharma
1 year
Ever wondered if the RL in RLHF is really needed? Worried that you might really need to understand how PPO works? Worry no more, Direct Preference Optimization (DPO) allows you to fine-tune LMs directly from preferences via a simple classification loss, no RL required. 🧵 ->
16
134
788
5
74
285
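For reference, the classification-style objective referred to here is, up to notation, the DPO loss from the paper (y_w is the preferred response, y_l the dispreferred one, and β controls how far the tuned policy may drift from the reference policy):

```latex
\mathcal{L}_{\text{DPO}}(\pi_\theta;\pi_{\text{ref}})
= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[
    \log \sigma\!\Big(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)}
      \;-\;
      \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}
    \Big)
  \right]
```

Minimizing this is just a binary classification loss over preference pairs, which is why no RL machinery is needed.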
@madiator
Mahesh Sathiamoorthy
1 year
I am baffled how the authors didn't realize that using GPT4 to evaluate GPT4 is a bad idea.
24
12
282
@madiator
Mahesh Sathiamoorthy
1 year
I am hiring a research intern for the summer. If you are interested in the intersection of LLMs and Recommender systems and are graduating, please get in touch with me at nlogn at . Or if you know someone who fits the profile, please share this with them.
5
59
279
@madiator
Mahesh Sathiamoorthy
10 months
This is honestly a very good and well written article on fine-tuning with LLaMA 2:
Tweet media one
2
48
278
@madiator
Mahesh Sathiamoorthy
9 months
I am devastated by this finding.
@OwainEvans_UK
Owain Evans
10 months
Does a language model trained on “A is B” generalize to “B is A”? E.g. When trained only on “George Washington was the first US president”, can models automatically answer “Who was the first US president?” Our new paper shows they cannot!
Tweet media one
175
709
4K
27
11
258
@madiator
Mahesh Sathiamoorthy
1 year
Detailed instructions on how to run LLaMA on Macbook M1: It's crazy how one guy (Georgi Gerganov, @ggerganov ) changed the landscape for so many people, by releasing llama.cpp.
6
40
252
@madiator
Mahesh Sathiamoorthy
1 year
Left: me Right: AI progress
10
38
248
@madiator
Mahesh Sathiamoorthy
10 months
Our work "Recommender Systems with Generative Retrieval" got accepted to NeurIPS 😊🎉 Congrats again to my co-authors @shashank_r12 , @_nikhilmehta , @vqctran , @YiTayML , @jonahsamost , @Maciej_Kula , @edchi Latest version at
@madiator
Mahesh Sathiamoorthy
1 year
Happy to share our recent work "Recommender Systems with Generative Retrieval"! Joint work with @shashank_r12 , @_nikhilmehta , @YiTayML , @vqctran and other awesome colleagues at Google Brain, Research, and YouTube. Preprint: #GenerativeAI 🧵 (1/n)
Tweet media one
13
73
479
7
31
248
@madiator
Mahesh Sathiamoorthy
8 months
Is it just me? My timeline is only ❤️s.
29
5
239
@madiator
Mahesh Sathiamoorthy
1 year
Another interesting repo: "The system uses OpenAI and Pinecone APIs to create, prioritize, and execute tasks. The main idea behind this system is that it creates tasks based on the result of previous tasks and a predefined objective." Writeup:
Tweet media one
2
41
246
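A stripped-down sketch of that loop (the `llm` helper is a hypothetical stand-in for the OpenAI call; the actual repo also persists results to Pinecone):

```python
from collections import deque

def run_agent(objective: str, first_task: str, llm, max_steps: int = 10):
    tasks = deque([first_task])
    for _ in range(max_steps):
        if not tasks:
            break
        task = tasks.popleft()
        # Execute the current task in service of the fixed objective.
        result = llm(f"Objective: {objective}\nComplete this task: {task}")
        # Create follow-up tasks based on the result and the objective.
        new_tasks = llm(
            f"Objective: {objective}\nLast result: {result}\n"
            f"List new tasks, one per line:"
        ).splitlines()
        tasks.extend(t.strip() for t in new_tasks if t.strip())
        # Reprioritize the whole queue against the objective.
        ordered = llm(
            f"Objective: {objective}\nReorder these tasks by priority, "
            f"one per line:\n" + "\n".join(tasks)
        ).splitlines()
        tasks = deque(t.strip() for t in ordered if t.strip())
```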
@madiator
Mahesh Sathiamoorthy
1 year
Interesting paper from Meta: Learning to Reason and Memorize with Self-Notes So as the LLM reads the context, it can deviate any time and generate notes for itself. Example from the paper: Given “Alice has the box” and “Alice is at the park” one can
Tweet media one
5
61
235
@madiator
Mahesh Sathiamoorthy
9 months
I am so humbled to get an opportunity to see all the luminaries who inspired me to get into deep learning, all in one place. And I wish Geoffrey a great retirement! So yeah, ~seven years back, I was happily doing distributed storage at Google and impacting jaw-dropping amounts
Tweet media one
@AndrewYNg
Andrew Ng
9 months
Attending @geoffreyhinton's retirement celebration at Google with old friends. Thank you for everything you've done for AI! @JeffDean @quocleix
Tweet media one
70
262
4K
0
8
238
@madiator
Mahesh Sathiamoorthy
6 months
@SchmidhuberAI @DjokerNole Training the weights and training with weights are both useful.
4
8
234
@madiator
Mahesh Sathiamoorthy
6 months
You can now run Mixtral on free Google colab instances:
Tweet media one
3
40
233
@madiator
Mahesh Sathiamoorthy
1 year
LangChain's implementation of AutoGPT:
Tweet media one
1
56
222
@madiator
Mahesh Sathiamoorthy
1 year
Anthropic introduces 100k context length for their Claude model. This is probably going to be expensive. For example, GPT-4's 32k context length costs $1.96. So yeah, the cost can quickly add up if you are not careful. This is why vector databases are going to get more popular
Tweet media one
23
33
219
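Where a figure like that comes from (a sketch assuming the mid-2023 list price of $0.06 per 1k prompt tokens for gpt-4-32k; prices change often, and the tweet's $1.96 likely also counts some output tokens):

```python
def prompt_cost(context_tokens: int, usd_per_1k: float = 0.06) -> float:
    """Dollar cost of sending `context_tokens` as the prompt of one call."""
    return context_tokens / 1_000 * usd_per_1k

print(prompt_cost(32_000))    # 1.92 -- one full 32k-context call, prompt only
print(prompt_cost(100_000))   # 6.00 -- a 100k context at the same rate adds up fast
```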
@madiator
Mahesh Sathiamoorthy
2 years
@daniel_eth When they stop working, they become airpeace.
5
16
213
@madiator
Mahesh Sathiamoorthy
1 year
Came across this repository of instruction-tuning papers. This so far has 71 papers! First paper on this list is the pioneering work from @Swarooprm7 !
Tweet media one
5
49
207
@madiator
Mahesh Sathiamoorthy
1 year
Good article from less than a month back on prompt engineering: It's crazy how much you can do with just prompt engineering. This paper from Stanford is an example:
1
35
205
@madiator
Mahesh Sathiamoorthy
1 year
Paper: This is Toolformer on steroids. But I think the "million" is somewhat misleading: I think they could support 1M APIs, but they don't do it currently. Do we even have 1M APIs? In fact, if there are 1M APIs, how do we know the selection is correct?
Tweet media one
5
30
201
@madiator
Mahesh Sathiamoorthy
1 year
Vint Cerf’s Career Advice for Engineers • “If you really want to do something big, get help, and preferably from people who are smarter than you are.” • “Be humble, because unless you approach things with the understanding that you really don’t know
1
35
203
@madiator
Mahesh Sathiamoorthy
1 year
This paper scales up Toolformer to 1000s of APIs. The novelty here is that they use the Self-Instruct framework to generate finetuning data (for LLaMA). What is exciting here is that this can help automate a lot of mundane tasks (see the example video).
Tweet media one
@shishirpatil_
Shishir Patil
1 year
📢 Excited to release Gorilla🦍 Gorilla picks from 1000s of APIs to complete user tasks, surpassing even GPT-4! LLMs need to interact with the world through APIs, and Gorilla teaches LLMs APIs. Presenting Gorilla-Spotlight demo🤩 Webpage:
32
207
977
1
39
203
@madiator
Mahesh Sathiamoorthy
1 year
Another cool repo! Multi-GPT: using multiple agents to perform a given task. "Multiple expertGPTs collaborate to perform a task. Each with their own short and long-term memory and the ability to communicate with each other." From @md_rumpf
4
48
192
@madiator
Mahesh Sathiamoorthy
11 months
There is some misconception here: the downward trend started in Jan 2022, way before ChatGPT. Reading through the comments, I see a lot of people express their frustration with SO, primarily due to the toxicity they experienced there. So perhaps it's a bittersweet outcome: we
12
8
182
@madiator
Mahesh Sathiamoorthy
10 months
We need more papers to follow this recipe :) Found this in
Tweet media one
1
22
185
@madiator
Mahesh Sathiamoorthy
1 year
I read the Stanford "Simulacra paper" [1] almost end-to-end. Hopefully I will tweet about it with some comments, but two quick observations: 1. The paper is very well-written. 2. Nowhere do they use the term LLM or even "Foundation model". [1]
3
36
178
@madiator
Mahesh Sathiamoorthy
3 months
Happy to share our survey preprint on using generative models for recommender systems. Awesome collaboration across industry and academia! This is my first paper after GDM. :) Paper:
Tweet media one
@yashardel
Yashar Deldjoo
3 months
📘 New Research Alert📊 "A Review of Modern #RecommenderSystems Using Generative Models (Gen-RecSys)" is online. link: An important milestone in generative information-seeking research. #recsys #generative #llm #evaluation #harm #foundationmodel
2
9
34
3
37
167
@madiator
Mahesh Sathiamoorthy
4 months
So the model tried to train another model, but failed to debug multi-GPU training. Thank God for multi-GPU setups..
Tweet media one
5
13
164
@madiator
Mahesh Sathiamoorthy
11 months
Haha, hope we don't have to do this meme. #LK99
Tweet media one
1
10
151
@madiator
Mahesh Sathiamoorthy
1 year
This looks useful: Use LLM on your Pandas DataFrames to get answers. I think this will truly shine when the DataFrame is quite complex. You can describe a transformation and it will return the transformed DF. Author: @lele_venturi
Tweet media one
2
40
153
@madiator
Mahesh Sathiamoorthy
1 year
Paper: Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor So they start with 3 instructions and use GPT3 to generate 64k instructions. They use this to finetune T5 XXL and get pretty good results.
Tweet media one
5
25
153
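In outline, the bootstrapping recipe looks like this (a hedged sketch with hypothetical `llm` and `finetune` stand-ins, not the paper's code):

```python
import random

def generate_instructions(seeds: list[str], llm, target: int = 64_000) -> list[str]:
    """Grow a seed set of instructions into a large synthetic dataset."""
    dataset = list(seeds)
    while len(dataset) < target:
        # Show the model a few existing instructions as in-context examples.
        examples = "\n".join(random.sample(dataset, min(3, len(dataset))))
        candidate = llm(
            f"Here are example task instructions:\n{examples}\n"
            f"Write one new, different task instruction:"
        ).strip()
        if candidate and candidate not in dataset:   # crude de-duplication
            dataset.append(candidate)
    return dataset

# instructions = generate_instructions(three_seed_instructions, llm)
# finetune("t5-xxl", instructions)   # hypothetical finetuning entry point
```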
@madiator
Mahesh Sathiamoorthy
1 year
I want an AI assistant partner when reading books: one I can talk to and ask questions about the book I am reading, and that can also quiz me.
16
6
145
@madiator
Mahesh Sathiamoorthy
1 year
Glad to call this my new office. It's quite colorful inside :D
Tweet media one
9
0
145
@madiator
Mahesh Sathiamoorthy
11 months
Our paper won the Best paper award in the Applied Data Science (ADS) category at #KDD2023 ! 🥳 Tagging the awesome co-authors: @jmgilmer , @edchi !
Tweet media one
@madiator
Mahesh Sathiamoorthy
1 year
Happy to share that our #MLOps paper "Improving Training Stability for Multitask Ranking Models in Recommender Systems" got accepted to KDD 2023 🎉 Joint work between Google DeepMind ( @edchi , @jmgilmer and others) and YouTube. Link to code below. 🧵
Tweet media one
3
15
77
9
12
146
@madiator
Mahesh Sathiamoorthy
1 year
Offend an ML Researcher in one tweet.
110
10
135
@madiator
Mahesh Sathiamoorthy
1 year
Congrats @jmgilmer for solving a tough conjecture 🤯! Proud to count you as a collaborator but it's not helping my impostor syndrome 😂
4
10
136
@madiator
Mahesh Sathiamoorthy
1 year
Came across "Parameter-Efficient Fine-Tuning (PEFT)" from Huggingface 🤗. Supports a list of methods to decrease fine-tuning cost, including the popular LoRA method.
Tweet media one
4
37
131
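As a concrete sketch of what the library wraps (based on PEFT's documented usage; the base model and hyperparameters are arbitrary examples, not from the tweet):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,             # rank of the low-rank update matrices
    lora_alpha=32,   # scaling factor for the update
    lora_dropout=0.05,
)
model = get_peft_model(model, config)   # freezes the base, injects LoRA adapters
model.print_trainable_parameters()      # typically well under 1% of all weights
```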
@madiator
Mahesh Sathiamoorthy
5 months
Ok so a few days back when I posted this, nobody noticed. And now my timeline is full of groq. Look at this throughput! I think the founders are ex-TPU folks, a great testament to Google engineering. :)
Tweet media one
@madiator
Mahesh Sathiamoorthy
5 months
The latency and throughput of is insanely good. (together seems to claim about 100T/s: )
Tweet media one
0
3
17
7
11
130
@madiator
Mahesh Sathiamoorthy
1 year
Cool paper that distills the rationales from larger LLMs into smaller ones, giving a big reduction in model size plus task-specific models. See this: ".. our 770M T5 model outperforms the 540B PaLM model using only 80% of available data on a benchmark task."
Tweet media one
2
40
129
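Roughly, the mechanism (my paraphrase of the paper's multi-task setup, up to notation): the small model f is trained to produce both the teacher-provided label ŷ and rationale r̂ for each input x, with a weighted sum of the two cross-entropy losses:

```latex
\mathcal{L} \;=\; \mathcal{L}_{\text{label}} \;+\; \lambda\,\mathcal{L}_{\text{rationale}},
\qquad
\mathcal{L}_{\text{label}} = \ell\big(f(x),\,\hat{y}\big),
\quad
\mathcal{L}_{\text{rationale}} = \ell\big(f(x),\,\hat{r}\big)
```

The rationale acts as extra supervision during training only; at inference the small model just predicts the label.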
@madiator
Mahesh Sathiamoorthy
10 months
Cool work from Meta + UCSD: LLMs for Compiler Optimization Paper: They train a 7B seq2seq* model from scratch to optimize LLVM assembly code for size and the model is able to reduce the instruction count by 3% on average.
Tweet media one
2
31
129
@madiator
Mahesh Sathiamoorthy
1 year
An LLM trained specifically for Bloomberg financial data. "We plan to release training logs.. " 😂
Tweet media one
2
11
130
@madiator
Mahesh Sathiamoorthy
1 year
Nice article: Prove to yourself that you can do hard things. "The proof you can do hard things is one of the most powerful gifts you can give yourself."
0
22
128
@madiator
Mahesh Sathiamoorthy
2 months
1) what
Tweet media one
31
3
126
@madiator
Mahesh Sathiamoorthy
1 year
Oh haha, our society could collapse in the future 😅
Tweet media one
8
16
125
@madiator
Mahesh Sathiamoorthy
1 year
Very good read if you are getting started with the field (and even if you know this stuff). Covers a lot of good stuff.
1
23
125
@madiator
Mahesh Sathiamoorthy
8 months
Tool use + LLaVA:
Tweet media one
4
27
123
@madiator
Mahesh Sathiamoorthy
1 year
Looks like a new useful dataset for LLM + RecSys:
Tweet media one
0
24
121
@madiator
Mahesh Sathiamoorthy
1 month
Yann is killing it.
Tweet media one
Tweet media two
7
9
123
@madiator
Mahesh Sathiamoorthy
1 year
Head over to to see various links and resources for LLMs (some sections seem to be still under construction). Curating this graph must have taken a lot of time :)
3
33
121
@madiator
Mahesh Sathiamoorthy
11 months
This is probably the reason why Apple had a market cap of 4B USD in 2000 and now is above 3T.
Tweet media one
6
16
119
@madiator
Mahesh Sathiamoorthy
2 months
Here you go. This is what I was talking about earlier. The paper goes up to k=2048, I think. Use k-shot for high k, get the best long-context teacher model, then distill it and profit.
Tweet media one
@madiator
Mahesh Sathiamoorthy
4 months
What's the largest k, for which someone has tried k-shot prompt?
4
0
16
8
20
120
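A sketch of that recipe as stated (all names are hypothetical; this is one plausible reading, not the paper's code):

```python
def build_distillation_set(teacher_llm, shots: list[tuple[str, str]], inputs: list[str]):
    """Label unlabeled inputs with a long-context teacher given a huge k-shot prompt."""
    k_shot_prefix = "\n".join(f"Q: {q}\nA: {a}" for q, a in shots)  # e.g. k = 2048
    return [
        (x, teacher_llm(f"{k_shot_prefix}\nQ: {x}\nA:"))
        for x in inputs
    ]

# pairs = build_distillation_set(long_context_teacher, shots_2048, unlabeled_inputs)
# finetune(student_model, pairs)   # the student then answers zero-shot, cheaply
```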
@madiator
Mahesh Sathiamoorthy
1 year
Someone made a chat app out of LLaMA. It obviously is very rough, but kudos to the author for shipping something!
9
17
115
@madiator
Mahesh Sathiamoorthy
4 months
Noam Shazeer has never been wrong, except once. It looks like he wanted to name the architecture in the "attention is all you need" paper CargoNet instead of Transformers. Good God.
5
4
111
@madiator
Mahesh Sathiamoorthy
11 months
LinkedIn is getting out of hand. I saw someone posting a two paragraph announcement about an online course that they ENROLLED IN.
6
4
112
@madiator
Mahesh Sathiamoorthy
9 months
"Chain-of-Verification Reduces Hallucination in Large Language Models" from Meta. Intuitive method in image:
Tweet media one
2
24
108
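The method in the image, roughly (my summary of the paper's steps; `llm` is a hypothetical completion function): draft an answer, plan verification questions, answer them independently, then revise.

```python
def chain_of_verification(query: str, llm) -> str:
    # 1. Draft a baseline answer.
    baseline = llm(f"Answer the question: {query}")
    # 2. Plan verification questions that check each fact in the draft.
    questions = llm(
        f"Question: {query}\nDraft answer: {baseline}\n"
        f"List verification questions that check each fact, one per line:"
    ).splitlines()
    # 3. Answer each check WITHOUT showing the draft, to avoid copying its errors.
    checks = [(q, llm(f"Answer concisely: {q}")) for q in questions if q.strip()]
    evidence = "\n".join(f"{q} -> {a}" for q, a in checks)
    # 4. Revise the answer to be consistent with the verification results.
    return llm(
        f"Question: {query}\nDraft answer: {baseline}\n"
        f"Verification results:\n{evidence}\n"
        f"Write a final answer consistent with the verification results:"
    )
```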
@madiator
Mahesh Sathiamoorthy
1 year
Haven't read this yet but looks interesting. "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
Tweet media one
2
24
106
@madiator
Mahesh Sathiamoorthy
6 months
Was completely blown away by @thtrieu_ 's talk a few months ago on this at our team meeting. Trieu worked persistently on this for a few years and designed his own symbolic engine and what not. Insane amount of dedication and hard work and no wonder he "pulled a rabbit out of the
@GoogleDeepMind
Google DeepMind
6 months
Introducing AlphaGeometry: an AI system that solves Olympiad geometry problems at a level approaching a human gold-medalist. 📐 It was trained solely on synthetic data and marks a breakthrough for AI in mathematical reasoning. 🧵
127
1K
4K
1
8
104
@madiator
Mahesh Sathiamoorthy
1 year
Avogadro's constant (6.022 × 10^23) was always something that I thought of as an absurdly high number. It's more than the number of stars in the universe. But now we have the FLOPs of modern LLMs exceeding this number! Amusingly, emergence seems to occur around this value. :)
Tweet media one
9
16
100
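A quick sanity check of the comparison, using the standard ~6·N·D estimate for training FLOPs (N = parameters, D = training tokens) and PaLM's public numbers (540B parameters, 780B tokens) as the example:

```python
AVOGADRO = 6.022e23

def training_flops(n_params: float, n_tokens: float) -> float:
    """Rough transformer training compute: ~6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

palm = training_flops(540e9, 780e9)
print(f"{palm:.2e}")     # ~2.53e+24 FLOPs
print(palm > AVOGADRO)   # True -- more than a "mole" of floating point operations
```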
@madiator
Mahesh Sathiamoorthy
1 year
Live footage of me trying to keep up with ML stuff
3
9
100
@madiator
Mahesh Sathiamoorthy
1 year
Google Research was, is, and will continue to be an incredible powerhouse of research. Many years ago, when I wasn't even in the Research org, I would go to the annual Google Research conference and have my mind blown all the time; and that's what made me decide to jump into ML.
@JeffDean
Jeff Dean (@🏡)
1 year
Very proud to see the many ways that @GoogleResearch contributed to the many announcements at the recent Google IO event. PaLM 2 in dozens of uses, Imagen, Phenaki, Chirp, MusicLM, flood forecasting, Green Light, fair and inclusive ML work, and more!
12
60
414
0
5
99
@madiator
Mahesh Sathiamoorthy
3 months
Before putting out ads about your model's capabilities, please check that they make sense!
Tweet media one
14
2
99
@madiator
Mahesh Sathiamoorthy
1 year
Every now and then I look at this from the PaLM paper and it always blows my mind.
Tweet media one
6
11
97
@madiator
Mahesh Sathiamoorthy
1 year
Hugging Face 🤗 has a new kind of inference method called "contrastive search" implemented: Based on . The demo looks good (see image from the paper).
Tweet media one
1
14
94
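A minimal sketch of invoking it through transformers' generate(), per the Hugging Face docs (penalty_alpha and top_k are the documented knobs; the model choice and values are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("DeepMind Company is", return_tensors="pt")
# penalty_alpha trades model confidence against degeneration (repetition);
# top_k is the candidate pool considered at each decoding step.
out = model.generate(**inputs, penalty_alpha=0.6, top_k=4, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```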
@madiator
Mahesh Sathiamoorthy
10 months
Great read on open challenges in LLM research:
@chipro
Chip Huyen
11 months
Open challenges in LLM research The first two challenges, hallucinations and context learning, are probably the most talked about today. I’m the most excited about 3 (multimodality), 5 (new architecture), and 6 (GPU alternatives). Number 5 and number 6, new architectures and
Tweet media one
53
410
2K
1
16
93
@madiator
Mahesh Sathiamoorthy
1 year
Great video from @FelixHill84 on Transformers and why they work well for language modeling. Video:
Tweet media one
@scychan_brains
Stephanie Chan
1 year
Why do transformers work so well? @FelixHill84 explains how the architectural features of transformers correspond to features of language! Alternatively check out his excellent lecture covering similar topics:
1
10
73
2
28
91
@madiator
Mahesh Sathiamoorthy
1 year
Hot take: we should accept more papers (in CS/ML). Let a thousand flowers bloom. The community is smart enough to figure out what is a good paper and what is not, instead of having a handful of people decide that.
18
3
86
@madiator
Mahesh Sathiamoorthy
10 months
Great article covering six papers on Mixture of Experts, of which one is ours 🙂 (DSelect-K with @hazimeh_h , @achowdhery , and others):
@finbarrtimbers
finbarr
10 months
my article about MoE routing layers is out! I took it down to 6 routing papers:
Tweet media one
3
80
438
1
11
80