Mahesh Sathiamoorthy

@madiator

9,298 Followers · 996 Following · 435 Media · 3,013 Statuses

LLMs and Data. Founder of something new. I discuss data for LLMs: Personal page: . Ex-GoogleDeepMind

Bay Area
Joined February 2008
@madiator
Mahesh Sathiamoorthy
11 months
This makes me sad:
Tweet media one
401
828
7K
@madiator
Mahesh Sathiamoorthy
1 year
Someone wrote an article on Transformers from scratch: I mean, really from scratch.
Tweet media one
37
441
2K
@madiator
Mahesh Sathiamoorthy
7 months
Update: I recently left my dream job at Google DeepMind to start something new. It was not an easy decision, given how amazing Google DeepMind has been and how much fun it has been to work with the incredible set of people there. I want to thank all my colleagues, managers, and
95
31
1K
@madiator
Mahesh Sathiamoorthy
1 year
If you are a PhD student, you should check out the book called "How to take smart notes". I wrote a bit about this book and what I have learned about note-taking in .
Tweet media one
12
169
1K
@madiator
Mahesh Sathiamoorthy
10 months
Wow, run LLMs like BitTorrent!
Tweet media one
14
131
812
@madiator
Mahesh Sathiamoorthy
1 year
Transformers optimized for Apple laptops. "..achieve up to 10x faster and 14x lower peak memory consumption compared to baseline implementations." People are already running LLaMA on their M1 laptops. This makes room for even bigger models on *laptops*!
8
134
766
@madiator
Mahesh Sathiamoorthy
1 year
This is a very interesting project worth keeping an eye on. It's not AGI, of course, but it points in that direction (autonomous agents). No wonder it's been trending on GitHub in the number 1 position! Description: "Auto-GPT is an experimental
Tweet media one
13
78
581
@madiator
Mahesh Sathiamoorthy
2 months
gpt-4o overhyped and failing expectations. gemini-1.5-flash underhyped and exceeding expectations.
31
43
568
@madiator
Mahesh Sathiamoorthy
1 year
Paper: HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face from MSR. I came across this paper a few weeks back and I can't get it out of my head! It's such a powerful idea that shows us how things will evolve in the future. 🧵⬇️
Tweet media one
11
102
516
@madiator
Mahesh Sathiamoorthy
11 months
@cwizprod1 I have benefited a lot from it, especially when I was new to programming.
6
1
510
@madiator
Mahesh Sathiamoorthy
1 year
Meta released a paper called "A Cookbook of Self-supervised Learning" (44 pages of content + the rest for references). Seems to cover a lot, from the role of data augmentation to multi-modality to hyperparameters..
Tweet media one
@omarsar0
elvis
1 year
“The Dark Matter of Intelligence” Self-supervised learning (SSL) underpins the recent success of deep learning in areas like language modeling and computer vision. This 70 pages cookbook released by Meta AI and collaborators provides an overview of fundamental techniques and
Tweet media one
7
104
420
3
128
484
@madiator
Mahesh Sathiamoorthy
1 year
Happy to share our recent work "Recommender Systems with Generative Retrieval"! Joint work with @shashank_r12 , @_nikhilmehta , @YiTayML , @vqctran and other awesome colleagues at Google Brain, Research, and YouTube. Preprint: #GenerativeAI 🧵 (1/n)
Tweet media one
13
73
479
@madiator
Mahesh Sathiamoorthy
1 year
Me: Read this book that's like 100k tokens and answer a question I have. LLM: sure, let me read the book first and .. here's your answer. Me: That's very good! You are smart! Now answer this other question. LLM: Let me start reading the book from the beginning again.. Me: oh
36
41
448
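For a sense of the economics behind the joke (a back-of-the-envelope sketch; the per-token price below is an assumed placeholder for illustration, not any provider's actual rate):

```python
# Cost of re-reading a 100k-token book per question, versus an idealized
# setup where the book's processed context (KV cache) could be reused.
BOOK_TOKENS = 100_000
PRICE_PER_1K_PROMPT_TOKENS = 0.06  # assumed $/1k tokens, for illustration only

def cost_without_caching(num_questions: int) -> float:
    """Every question re-sends the whole book as context."""
    return num_questions * BOOK_TOKENS / 1_000 * PRICE_PER_1K_PROMPT_TOKENS

def cost_with_cached_context(num_questions: int, question_tokens: int = 200) -> float:
    """Idealized: pay for the book once, then only for each short question."""
    one_time = BOOK_TOKENS / 1_000 * PRICE_PER_1K_PROMPT_TOKENS
    return one_time + num_questions * question_tokens / 1_000 * PRICE_PER_1K_PROMPT_TOKENS

print(cost_without_caching(10))      # 60.0 -- the "start reading again" tax
print(cost_with_cached_context(10))  # 6.12 -- if the context could be reused
```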
@madiator
Mahesh Sathiamoorthy
2 years
I am hiring for a Student Researcher position at Google Brain for 2023. If you have experience with LLMs, are interested in doing research on recommender systems, and are not graduating soon, please email me (email below). Others, please help refer, or *retweet*! More info below⬇️
10
121
436
@madiator
Mahesh Sathiamoorthy
3 months
PSA: Stanford's "CS25: Transformers United V4" course is available for free to the public: * Thursdays 4:30 - 5:50pm PDT. * Zoom link: [Meeting ID: 999 2215 1759, Password: 123456] More info:
Tweet media one
3
85
427
@madiator
Mahesh Sathiamoorthy
1 year
is actually very good. I have used it a few times and plan to use it often since it is a great tool for learning. Here's why: * Anytime you read something, it is very useful to ask yourself questions about what you just read and try to answer them. That will
Tweet media one
14
72
426
@madiator
Mahesh Sathiamoorthy
5 months
In case you thought Perplexity's journey was straightforward and linear.
Tweet media one
6
24
410
@madiator
Mahesh Sathiamoorthy
1 year
ResearchGPT: "An autonomous statistics helper that converts your natural language queries about a data set to insights." Site: Repo: Video: What's cool is that it writes Python code, executes it, and
Tweet media one
9
87
375
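In outline, the loop described here looks something like this (a hedged sketch; `llm` and the prompts are hypothetical stand-ins, not ResearchGPT's actual code, and exec() on model output should be sandboxed in any real use):

```python
import contextlib
import io

import pandas as pd

def llm(prompt: str) -> str:
    """Placeholder for an LLM completion call (assumption, not a real API)."""
    raise NotImplementedError

def answer_question(df: pd.DataFrame, question: str) -> str:
    # 1. Ask the model to write analysis code against the DataFrame.
    code = llm(
        f"You are a statistics helper. The DataFrame `df` has columns "
        f"{list(df.columns)}. Write Python that prints the answer to: {question}"
    )
    # 2. Execute the generated code and capture whatever it prints.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {"df": df, "pd": pd})
    result = buffer.getvalue()
    # 3. Feed the raw result back so the model can phrase it in natural language.
    return llm(f"Question: {question}\nComputed result: {result}\nAnswer concisely:")
```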
@madiator
Mahesh Sathiamoorthy
1 year
Nice and comprehensive set of slides on vector search. Meta's library for this, called Faiss:
Tweet media one
4
84
375
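As a taste of the library (the Faiss calls below are its standard documented API; the dimensions and data are made up):

```python
import numpy as np
import faiss

d = 64                                            # vector dimensionality
xb = np.random.random((10_000, d)).astype("float32")  # database vectors
index = faiss.IndexFlatL2(d)                      # exact L2 nearest-neighbor index
index.add(xb)

xq = np.random.random((5, d)).astype("float32")   # query vectors
distances, ids = index.search(xq, 4)              # 4 nearest neighbors per query
print(ids.shape)                                  # (5, 4)
```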
@madiator
Mahesh Sathiamoorthy
11 months
Our team at Google DeepMind is hiring in the area of LLMs + Recommender Systems. Come talk to me if you are at @icmlconf !
Tweet media one
4
19
364
@madiator
Mahesh Sathiamoorthy
1 year
Three year olds think that if they can't see you, you can't see them. 🙈 It makes for hilarious hide and seek games, where they are "hiding" in plain sight of you. They have simply closed their eyes: they can't see you, so they think you can't see them. This is
Tweet media one
13
63
354
@madiator
Mahesh Sathiamoorthy
4 months
Very interesting paper from Colin's group: just fine-tune with the few-shot examples via PEFT and it's better than just using ICL (in terms of accuracy and cost). Wondering why it is not widely adopted.
Tweet media one
9
56
322
@madiator
Mahesh Sathiamoorthy
1 year
Cerebras recently released Cerebras-GPT, their own LLMs trained following Chinchilla strategy on Cerebras wafers. These wafers are so different compared to other offerings. Models up to 13B in size are available at . Paper:
Tweet media one
Tweet media two
Tweet media three
7
46
313
@madiator
Mahesh Sathiamoorthy
10 months
Without a doubt this is one of the best books I have read as well. It totally changes how you view and understand the world. I highly recommend it!
@AviSchiffmann
Avi
11 months
Thinking in Systems is the best book I’ve read all year. Feels like I gained 10 IQ points just by opening it. Having clarity over how the world works is essential for anyone trying to do anything worthwhile.
Tweet media one
237
845
12K
3
29
298
@madiator
Mahesh Sathiamoorthy
1 year
Universal Speech Model from Google Research. Impressive that it works for languages that have very few speakers (in the millions). Key idea: "We demonstrate that utilizing a large unlabeled multilingual dataset to pre-train the encoder of our model and
Tweet media one
7
58
289
@madiator
Mahesh Sathiamoorthy
1 year
This looks extremely promising: being able to replace RLHF with a much simpler supervised learning algorithm called DPO. Paper: Code:
Tweet media one
@archit_sharma97
Archit Sharma
1 year
Ever wondered if the RL in RLHF is really needed? Worried that you might really need to understand how PPO works? Worry no more, Direct Preference Optimization (DPO) allows you to fine-tune LMs directly from preferences via a simple classification loss, no RL required. 🧵 ->
16
134
788
5
74
285
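For reference, the classification-style objective referred to here is, up to notation, the DPO loss from the paper (y_w is the preferred response, y_l the dispreferred one, and β controls how far the tuned policy may drift from the reference policy):

```latex
\mathcal{L}_{\text{DPO}}(\pi_\theta;\pi_{\text{ref}})
= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[
    \log \sigma\!\Big(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)}
      \;-\;
      \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}
    \Big)
  \right]
```

Minimizing this is just a binary classification loss over preference pairs, which is why no RL machinery is needed.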
@madiator
Mahesh Sathiamoorthy
1 year
I am baffled how the authors didn't realize that using GPT4 to evaluate GPT4 is a bad idea.
24
12
282
@madiator
Mahesh Sathiamoorthy
1 year
I am hiring a research intern for the summer. If you are interested in the intersection of LLMs and Recommender systems and are graduating, please get in touch with me at nlogn at . Or if you know someone who fits the profile, please share this with them.
5
59
279
@madiator
Mahesh Sathiamoorthy
10 months
This is honestly a very good and well written article on fine-tuning with LLaMA 2:
Tweet media one
2
48
278
@madiator
Mahesh Sathiamoorthy
9 months
I am devastated by this finding.
@OwainEvans_UK
Owain Evans
10 months
Does a language model trained on “A is B” generalize to “B is A”? E.g. When trained only on “George Washington was the first US president”, can models automatically answer “Who was the first US president?” Our new paper shows they cannot!
Tweet media one
175
709
4K
27
11
258
@madiator
Mahesh Sathiamoorthy
1 year
Detailed instructions on how to run LLaMA on Macbook M1: It's crazy how one guy (Georgi Gerganov, @ggerganov ) changed the landscape for so many people, by releasing llama.cpp.
6
40
252
@madiator
Mahesh Sathiamoorthy
1 year
Left: me Right: AI progress
10
38
248
@madiator
Mahesh Sathiamoorthy
10 months
Our work "Recommender Systems with Generative Retrieval" got accepted to NeurIPS 😊🎉 Congrats again to my co-authors @shashank_r12 , @_nikhilmehta , @vqctran , @YiTayML , @jonahsamost , @Maciej_Kula , @edchi Latest version at
@madiator
Mahesh Sathiamoorthy
1 year
Happy to share our recent work "Recommender Systems with Generative Retrieval"! Joint work with @shashank_r12 , @_nikhilmehta , @YiTayML , @vqctran and other awesome colleagues at Google Brain, Research, and YouTube. Preprint: #GenerativeAI 🧵 (1/n)
Tweet media one
13
73
479
7
31
248
@madiator
Mahesh Sathiamoorthy
8 months
Is it just me? My timeline is only ❤️s.
29
5
239
@madiator
Mahesh Sathiamoorthy
1 year
Another interesting repo: "The system uses OpenAI and Pinecone APIs to create, prioritize, and execute tasks. The main idea behind this system is that it creates tasks based on the result of previous tasks and a predefined objective." Writeup:
Tweet media one
2
41
246
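A stripped-down sketch of that loop (the `llm` helper is a hypothetical stand-in for the OpenAI call; the actual repo also persists results to Pinecone):

```python
from collections import deque

def run_agent(objective: str, first_task: str, llm, max_steps: int = 10):
    tasks = deque([first_task])
    for _ in range(max_steps):
        if not tasks:
            break
        task = tasks.popleft()
        # Execute the current task in service of the fixed objective.
        result = llm(f"Objective: {objective}\nComplete this task: {task}")
        # Create follow-up tasks based on the result and the objective.
        new_tasks = llm(
            f"Objective: {objective}\nLast result: {result}\n"
            f"List new tasks, one per line:"
        ).splitlines()
        tasks.extend(t.strip() for t in new_tasks if t.strip())
        # Reprioritize the whole queue against the objective.
        ordered = llm(
            f"Objective: {objective}\nReorder these tasks by priority, "
            f"one per line:\n" + "\n".join(tasks)
        ).splitlines()
        tasks = deque(t.strip() for t in ordered if t.strip())
```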
@madiator
Mahesh Sathiamoorthy
1 year
Interesting paper from Meta: Learning to Reason and Memorize with Self-Notes So as the LLM reads the context, it can deviate any time and generate notes for itself. Example from the paper: Given “Alice has the box” and “Alice is at the park” one can
Tweet media one
5
61
235
@madiator
Mahesh Sathiamoorthy
9 months
I am so humbled to get an opportunity to see all the luminaries who inspired me to get into deep learning, all in one place. And I wish Geoffrey a great retirement! So yeah, ~seven years back, I was happily doing distributed storage at Google and impacting jaw-dropping amounts
Tweet media one
@AndrewYNg
Andrew Ng
9 months
Attending @geoffreyhinton's retirement celebration at Google with old friends. Thank you for everything you've done for AI! @JeffDean @quocleix
Tweet media one
70
262
4K
0
8
238
@madiator
Mahesh Sathiamoorthy
6 months
@SchmidhuberAI @DjokerNole Training the weights and training with weights are both useful.
4
8
234
@madiator
Mahesh Sathiamoorthy
6 months
You can now run Mixtral on free Google colab instances:
Tweet media one
3
40
233
@madiator
Mahesh Sathiamoorthy
1 year
LangChain's implementation of AutoGPT:
Tweet media one
1
56
222
@madiator
Mahesh Sathiamoorthy
1 year
Anthropic introduces 100k context length for their Claude model. This is probably going to be expensive. For example, GPT-4's 32k context length costs $1.96. So yeah, the cost can quickly add up if you are not careful. This is why vector databases are going to get more popular
Tweet media one
23
33
219
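Where a figure like that comes from (a sketch assuming the mid-2023 list price of $0.06 per 1k prompt tokens for gpt-4-32k; prices change often, and the tweet's $1.96 likely also counts some output tokens):

```python
def prompt_cost(context_tokens: int, usd_per_1k: float = 0.06) -> float:
    """Dollar cost of sending `context_tokens` as the prompt of one call."""
    return context_tokens / 1_000 * usd_per_1k

print(prompt_cost(32_000))    # 1.92 -- one full 32k-context call, prompt only
print(prompt_cost(100_000))   # 6.00 -- a 100k context at the same rate adds up fast
```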
@madiator
Mahesh Sathiamoorthy
2 years
@daniel_eth When they stop working, they become airpeace.
5
16
213
@madiator
Mahesh Sathiamoorthy
1 year
Came across this repository of instruction-tuning papers. This so far has 71 papers! First paper on this list is the pioneering work from @Swarooprm7 !
Tweet media one
5
49
207
@madiator
Mahesh Sathiamoorthy
1 year
Good article from less than a month back on prompt engineering: It's crazy how much you can do with just prompt engineering. This paper from Stanford is an example:
1
35
205
@madiator
Mahesh Sathiamoorthy
1 year
Paper: This is Toolformer on steroids. But I think the "million" is somewhat misleading: I think they could support 1M APIs, but they don't do it currently. Do we even have 1M APIs? In fact, if there are 1M APIs, how do we know the selection is correct?
Tweet media one
5
30
201
@madiator
Mahesh Sathiamoorthy
1 year
Vint Cerf’s Career Advice for Engineers • “If you really want to do something big, get help, and preferably from people who are smarter than you are.” • “Be humble, because unless you approach things with the understanding that you really don’t know
1
35
203
@madiator
Mahesh Sathiamoorthy
1 year
This paper scales up Toolformer to 1000s of APIs. The novelty here is that they use the Self-Instruct framework to generate finetuning data (for LLaMA). What is exciting here is that this can help automate a lot of mundane tasks (see the example video).
Tweet media one
@shishirpatil_
Shishir Patil
1 year
📢 Excited to release Gorilla🦍 Gorilla picks from 1000s of APIs to complete user tasks, surpassing even GPT-4! LLMs need to interact with the world through APIs, and Gorilla teaches LLMs APIs. Presenting Gorilla-Spotlight demo🤩 Webpage:
32
207
977
1
39
203
@madiator
Mahesh Sathiamoorthy
1 year
Another cool repo! Multi-GPT: using multiple agents to perform a given task. "Multiple expertGPTs collaborate to perform a task. Each with their own short and long-term memory and the ability to communicate with each other." From @md_rumpf
4
48
192
@madiator
Mahesh Sathiamoorthy
11 months
There is some misconception here: the downward trend started in Jan 2022, way before ChatGPT. Reading through the comments, I see a lot of people express their frustration with SO, primarily due to the toxicity they experienced there. So perhaps it's a bittersweet outcome: we
12
8
182
@madiator
Mahesh Sathiamoorthy
10 months
We need more papers to follow this recipe :) Found this in
Tweet media one
1
22
185
@madiator
Mahesh Sathiamoorthy
1 year
I read the Stanford "Simulacra paper" [1] almost end-to-end. Hopefully I will tweet about it with some comments, but two quick observations: 1. The paper is very well-written. 2. Nowhere do they use the term LLM or even "Foundation model". [1]
3
36
178
@madiator
Mahesh Sathiamoorthy
3 months
Happy to share our survey preprint on using generative models for recommender systems. Awesome collaboration across industry and academia! This is my first paper after GDM. :) Paper:
Tweet media one
@yashardel
Yashar Deldjoo
3 months
📘 New Research Alert📊 "A Review of Modern #RecommenderSystems Using Generative Models (Gen-RecSys)" is online. link: An important milestone in generative information-seeking research. #recsys #generative #llm #evaluation #harm #foundationmodel
2
9
34
3
37
167
@madiator
Mahesh Sathiamoorthy
4 months
So the model tried to train another model, but failed to debug multi-GPU training. Thank God for multi-GPU setups..
Tweet media one
5
13
164
@madiator
Mahesh Sathiamoorthy
11 months
Haha, hope we don't have to do this meme. #LK99
Tweet media one
1
10
151
@madiator
Mahesh Sathiamoorthy
1 year
This looks useful: Use LLM on your Pandas DataFrames to get answers. I think this will truly shine when the DataFrame is quite complex. You can describe a transformation and it will return the transformed DF. Author: @lele_venturi
Tweet media one
2
40
153
@madiator
Mahesh Sathiamoorthy
1 year
Paper: Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor So they start with 3 instructions and use GPT3 to generate 64k instructions. They use this to finetune T5 XXL and get pretty good results.
Tweet media one
5
25
153
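In outline, the bootstrapping recipe looks like this (a hedged sketch with hypothetical `llm` and `finetune` stand-ins, not the paper's code):

```python
import random

def generate_instructions(seeds: list[str], llm, target: int = 64_000) -> list[str]:
    """Grow a seed set of instructions into a large synthetic dataset."""
    dataset = list(seeds)
    while len(dataset) < target:
        # Show the model a few existing instructions as in-context examples.
        examples = "\n".join(random.sample(dataset, min(3, len(dataset))))
        candidate = llm(
            f"Here are example task instructions:\n{examples}\n"
            f"Write one new, different task instruction:"
        ).strip()
        if candidate and candidate not in dataset:   # crude de-duplication
            dataset.append(candidate)
    return dataset

# instructions = generate_instructions(three_seed_instructions, llm)
# finetune("t5-xxl", instructions)   # hypothetical finetuning entry point
```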
@madiator
Mahesh Sathiamoorthy
1 year
I want an AI assistant partner when reading books: one I can talk to and ask questions about the book I am reading, and that can also quiz me.
16
6
145
@madiator
Mahesh Sathiamoorthy
1 year
Glad to call this my new office. It's quite colorful inside :D
Tweet media one
9
0
145
@madiator
Mahesh Sathiamoorthy
11 months
Our paper won the Best paper award in the Applied Data Science (ADS) category at #KDD2023 ! 🥳 Tagging the awesome co-authors: @jmgilmer , @edchi !
Tweet media one
@madiator
Mahesh Sathiamoorthy
1 year
Happy to share that our #MLOps paper "Improving Training Stability for Multitask Ranking Models in Recommender Systems" got accepted to KDD 2023 🎉 Joint work between Google DeepMind ( @edchi , @jmgilmer and others) and YouTube. Link to code below. 🧵
Tweet media one
3
15
77
9
12
146
@madiator
Mahesh Sathiamoorthy
1 year
Offend an ML Researcher in one tweet.
110
10
135
@madiator
Mahesh Sathiamoorthy
1 year
Congrats @jmgilmer for solving a tough conjecture 🤯! Proud to count you as a collaborator but it's not helping my impostor syndrome 😂
4
10
136
@madiator
Mahesh Sathiamoorthy
1 year
Came across "Parameter-Efficient Fine-Tuning (PEFT)" from Huggingface 🤗. Supports a list of methods to decrease fine-tuning cost, including the popular LoRA method.
Tweet media one
4
37
131
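As a concrete sketch of what the library wraps (based on PEFT's documented usage; the base model and hyperparameters are arbitrary examples, not from the tweet):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,             # rank of the low-rank update matrices
    lora_alpha=32,   # scaling factor for the update
    lora_dropout=0.05,
)
model = get_peft_model(model, config)   # freezes the base, injects LoRA adapters
model.print_trainable_parameters()      # typically well under 1% of all weights
```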
@madiator
Mahesh Sathiamoorthy
5 months
Ok so a few days back when I posted this, nobody noticed. And now my timeline is full of groq. Look at this throughput! I think the founders are ex-TPU folks, a great testament to Google engineering. :)
Tweet media one
@madiator
Mahesh Sathiamoorthy
5 months
The latency and throughput of is insanely good. (together seems to claim about 100T/s: )
Tweet media one
0
3
17
7
11
130
@madiator
Mahesh Sathiamoorthy
1 year
Cool paper that distills the rationales from larger LLMs into smaller ones, giving a big reduction in model size plus task-specific models. See this: ".. our 770M T5 model outperforms the 540B PaLM model using only 80% of available data on a benchmark task."
Tweet media one
2
40
129
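Roughly, the mechanism (my paraphrase of the paper's multi-task setup, up to notation): the small model f is trained to produce both the teacher-provided label ŷ and rationale r̂ for each input x, with a weighted sum of the two cross-entropy losses:

```latex
\mathcal{L} \;=\; \mathcal{L}_{\text{label}} \;+\; \lambda\,\mathcal{L}_{\text{rationale}},
\qquad
\mathcal{L}_{\text{label}} = \ell\big(f(x),\,\hat{y}\big),
\quad
\mathcal{L}_{\text{rationale}} = \ell\big(f(x),\,\hat{r}\big)
```

The rationale acts as extra supervision during training only; at inference the small model just predicts the label.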
@madiator
Mahesh Sathiamoorthy
10 months
Cool work from Meta + UCSD: LLMs for Compiler Optimization Paper: They train a 7B seq2seq* model from scratch to optimize LLVM assembly code for size and the model is able to reduce the instruction count by 3% on average.
Tweet media one
2
31
129
@madiator
Mahesh Sathiamoorthy
1 year
An LLM trained specifically for Bloomberg financial data. "We plan to release training logs.. " 😂
Tweet media one
2
11
130
@madiator
Mahesh Sathiamoorthy
1 year
Nice article: Prove to yourself that you can do hard things. "The proof you can do hard things is one of the most powerful gifts you can give yourself."
0
22
128
@madiator
Mahesh Sathiamoorthy
2 months
1) what
Tweet media one
31
3
126
@madiator
Mahesh Sathiamoorthy
1 year
Oh haha, our society could collapse in the future 😅
Tweet media one
8
16
125
@madiator
Mahesh Sathiamoorthy
1 year
Very good read if you are getting started with the field (and even if you know this stuff). Covers a lot of good stuff.
1
23
125
@madiator
Mahesh Sathiamoorthy
8 months
Tool use + LLaVA:
Tweet media one
4
27
123
@madiator
Mahesh Sathiamoorthy
1 year
Looks like a new useful dataset for LLM + RecSys:
Tweet media one
0
24
121
@madiator
Mahesh Sathiamoorthy
1 month
Yann is killing it.
Tweet media one
Tweet media two
7
9
123
@madiator
Mahesh Sathiamoorthy
1 year
Head over to to see various links and resources for LLMs (some sections seem to be still under construction). Curating this graph must have taken a lot of time :)
3
33
121
@madiator
Mahesh Sathiamoorthy
11 months
This is probably the reason why Apple had a market cap of 4B USD in 2000 and now is above 3T.
Tweet media one
6
16
119
@madiator
Mahesh Sathiamoorthy
2 months
Here you go. This is what I was talking about earlier. The paper goes up to k=2048, I think. Use k-shot for high k, get the best long-context teacher model, then distill it and profit.
Tweet media one
@madiator
Mahesh Sathiamoorthy
4 months
What's the largest k, for which someone has tried k-shot prompt?
4
0
16
8
20
120
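A sketch of that recipe as stated (all names are hypothetical; this is one plausible reading, not the paper's code):

```python
def build_distillation_set(teacher_llm, shots: list[tuple[str, str]], inputs: list[str]):
    """Label unlabeled inputs with a long-context teacher given a huge k-shot prompt."""
    k_shot_prefix = "\n".join(f"Q: {q}\nA: {a}" for q, a in shots)  # e.g. k = 2048
    return [
        (x, teacher_llm(f"{k_shot_prefix}\nQ: {x}\nA:"))
        for x in inputs
    ]

# pairs = build_distillation_set(long_context_teacher, shots_2048, unlabeled_inputs)
# finetune(student_model, pairs)   # the student then answers zero-shot, cheaply
```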
@madiator
Mahesh Sathiamoorthy
1 year
Someone made a chat app out of LLaMA. It obviously is very rough, but kudos to the author for shipping something!
9
17
115
@madiator
Mahesh Sathiamoorthy
4 months
Noam Shazeer has never been wrong, except once. It looks like he wanted to name the architecture in the "attention is all you need" paper CargoNet instead of Transformers. Good God.
5
4
111
@madiator
Mahesh Sathiamoorthy
11 months
LinkedIn is getting out of hand. I saw someone posting a two paragraph announcement about an online course that they ENROLLED IN.
6
4
112
@madiator
Mahesh Sathiamoorthy
9 months
"Chain-of-Verification Reduces Hallucination in Large Language Models" from Meta. Intuitive method in image:
Tweet media one
2
24
108
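The method in the image, roughly (my summary of the paper's steps; `llm` is a hypothetical completion function): draft an answer, plan verification questions, answer them independently, then revise.

```python
def chain_of_verification(query: str, llm) -> str:
    # 1. Draft a baseline answer.
    baseline = llm(f"Answer the question: {query}")
    # 2. Plan verification questions that check each fact in the draft.
    questions = llm(
        f"Question: {query}\nDraft answer: {baseline}\n"
        f"List verification questions that check each fact, one per line:"
    ).splitlines()
    # 3. Answer each check WITHOUT showing the draft, to avoid copying its errors.
    checks = [(q, llm(f"Answer concisely: {q}")) for q in questions if q.strip()]
    evidence = "\n".join(f"{q} -> {a}" for q, a in checks)
    # 4. Revise the answer to be consistent with the verification results.
    return llm(
        f"Question: {query}\nDraft answer: {baseline}\n"
        f"Verification results:\n{evidence}\n"
        f"Write a final answer consistent with the verification results:"
    )
```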
@madiator
Mahesh Sathiamoorthy
1 year
Haven't read this yet but looks interesting. "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
Tweet media one
2
24
106
@madiator
Mahesh Sathiamoorthy
6 months
Was completely blown away by @thtrieu_ 's talk a few months ago on this at our team meeting. Trieu worked persistently on this for a few years and designed his own symbolic engine and what not. Insane amount of dedication and hard work and no wonder he "pulled a rabbit out of the
@GoogleDeepMind
Google DeepMind
6 months
Introducing AlphaGeometry: an AI system that solves Olympiad geometry problems at a level approaching a human gold-medalist. 📐 It was trained solely on synthetic data and marks a breakthrough for AI in mathematical reasoning. 🧵
127
1K
4K
1
8
104
@madiator
Mahesh Sathiamoorthy
1 year
Avogadro's constant (6.022 × 10^23) was always something that I thought of as an absurdly high number. It's more than the number of stars in the universe. But now we have the FLOPs of modern LLMs exceeding this number! Amusingly, emergence seems to occur around this value. :)
Tweet media one
9
16
100
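A quick sanity check of the comparison, using the standard ~6·N·D estimate for training FLOPs (N = parameters, D = training tokens) and PaLM's public numbers (540B parameters, 780B tokens) as the example:

```python
AVOGADRO = 6.022e23

def training_flops(n_params: float, n_tokens: float) -> float:
    """Rough transformer training compute: ~6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

palm = training_flops(540e9, 780e9)
print(f"{palm:.2e}")     # ~2.53e+24 FLOPs
print(palm > AVOGADRO)   # True -- more than a "mole" of floating point operations
```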
@madiator
Mahesh Sathiamoorthy
1 year
Live footage of me trying to keep up with ML stuff
3
9
100
@madiator
Mahesh Sathiamoorthy
1 year
Google Research was, is, and will continue to be an incredible powerhouse of research. Many years ago, when I wasn't even in the Research org, I would go to the annual Google Research conference and have my mind blown all the time; and that's what made me decide to jump into ML.
@JeffDean
Jeff Dean (@🏡)
1 year
Very proud to see the many ways that @GoogleResearch contributed to the many announcements at the recent Google IO event. PaLM 2 in dozens of uses, Imagen, Phenaki, Chirp, MusicLM, flood forecasting, Green Light, fair and inclusive ML work, and more!
12
60
414
0
5
99
@madiator
Mahesh Sathiamoorthy
3 months
Before putting out ads about your model's capabilities, please check that they make sense!
Tweet media one
14
2
99
@madiator
Mahesh Sathiamoorthy
1 year
Every now and then I look at this from the PaLM paper and it always blows my mind.
Tweet media one
6
11
97
@madiator
Mahesh Sathiamoorthy
1 year
Hugging Face 🤗 has a new kind of inference method called "contrastive search" implemented: Based on . The demo looks good (see image from the paper).
Tweet media one
1
14
94
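A minimal sketch of invoking it through transformers' generate(), per the Hugging Face docs (penalty_alpha and top_k are the documented knobs; the model choice and values are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("DeepMind Company is", return_tensors="pt")
# penalty_alpha trades model confidence against degeneration (repetition);
# top_k is the candidate pool considered at each decoding step.
out = model.generate(**inputs, penalty_alpha=0.6, top_k=4, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```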
@madiator
Mahesh Sathiamoorthy
10 months
Great read on open challenges in LLM research:
@chipro
Chip Huyen
11 months
Open challenges in LLM research The first two challenges, hallucinations and context learning, are probably the most talked about today. I’m the most excited about 3 (multimodality), 5 (new architecture), and 6 (GPU alternatives). Number 5 and number 6, new architectures and
Tweet media one
53
410
2K
1
16
93
@madiator
Mahesh Sathiamoorthy
1 year
Great video from @FelixHill84 on Transformers and why they work well for language modeling. Video:
Tweet media one
@scychan_brains
Stephanie Chan
1 year
Why do transformers work so well? @FelixHill84 explains how the architectural features of transformers correspond to features of language! Alternatively check out his excellent lecture covering similar topics:
1
10
73
2
28
91
@madiator
Mahesh Sathiamoorthy
1 year
Hot take: we should accept more papers (in CS/ML). Let a thousand flowers bloom. The community is smart enough to figure out what is a good paper and what is not, instead of having a handful of people decide that.
18
3
86
@madiator
Mahesh Sathiamoorthy
10 months
Great article covering six papers on Mixture of Experts, of which one is ours 🙂 (DSelect-K with @hazimeh_h , @achowdhery , and others):
@finbarrtimbers
finbarr
10 months
my article about MoE routing layers is out! I took it down to 6 routing papers:
Tweet media one
3
80
438
1
11
80