Long overdue, but here's a new blog post on training LLMs in the wilderness from the ground up.
In this blog post, I discuss:
1. Experiences procuring compute & the variance across different compute providers. Our biggest finding/surprise is that variance is super high and it's
New open source Flan-UL2 20B checkpoints :)
- Truly open source. No forms! Apache license.
- Best open-source model on MMLU/Big-Bench Hard.
- Better than Flan-T5 XXL & competitive with Flan-PaLM 62B.
- The size ceiling of the Flan family just got higher!
Blog:
It's been a short 6 months since I left Google Brain, and it has been a uniquely challenging yet interesting experience to build everything from the ground up in an entirely new environment (i.e., the wilderness).
Today, we're excited to announce the first version of the
We are excited to announce the first version of our multimodal assistant, Yasa-1, a language assistant with visual and auditory sensors that can take actions via code execution.
Yasa-1 can understand text, images, videos, sounds & more!
Check out more details below.
Hot take: Lots of buzz these days about new open-source foundation models, but what if I told you there have been no real advances since 2019's T5 models?
Take a look at this table from this new InstructEval paper: . Some thoughts/observations:
1.
Over the past 3.3 years at Google, I have been blessed with so many wonderful friendships and experiences.
I have grown so much. However, itโs time to move on to a new adventure!
I wrote a blog post about my wonderful experience here:
"Scaling laws vs Model Architectures" from
@GoogleAI
.
Lessons:
- Not all architectures scale the same way.
- The vanilla Transformer does pretty well.
- Touching the attention too much is "dangerous".
- Performance at base size may not translate to large+ scales.
pdf:
It's been a wild ride. Just 20 of us burning through thousands of H100s over the past months, and we're glad to finally share this with the world!
One of the goals we've had when starting Reka was to build cool innovative models at the frontier. Reaching GPT-4/Opus level was a
Meet Reka Core, our best and most capable multimodal language model yet.
It's been a busy few months training this model and we are glad to finally ship it!
Core has a lot of capabilities, and one of them is understanding video --- let's see what Core thinks of the 3 body
We're coming out of stealth with $58M in funding to build generative models and advance AI research at
@RekaAILabs
Language models and their multimodal counterparts are already ubiquitous and massively impactful everywhere.
That said, we are still at the beginning of this
Inspired by the dizzying number of efficient Transformers ("x-formers") models that are coming out lately, we wrote a survey paper to organize all this information. Check it out at .
Joint work with
@m__dehghani
@dara_bahri
and
@metzlerd
.
@GoogleAI
Excited to share our latest work at
@GoogleAI
on "Transformer Memory as a Differentiable Search Index"!
TL;DR? We parameterize a search system with only a single Transformer model. Everything in the corpus is encoded in the model!
Paper:
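For readers who haven't seen DSI, here's a tiny illustrative sketch (my own toy framing, not the paper's code) of what "everything in the corpus is encoded in the model" means: one seq2seq Transformer is trained on both indexing examples (document text -> docid) and retrieval examples (query -> docid), so retrieval at inference time is just decoding a docid string, typically constrained to valid docids.

```python
# Toy illustration of DSI-style supervision (hypothetical docids and text).
corpus = {
    "doc-017": "the eiffel tower is a wrought-iron tower in paris",
    "doc-042": "python is a programming language created by guido van rossum",
}
labeled_queries = [
    ("where is the eiffel tower", "doc-017"),
    ("who created python", "doc-042"),
]

indexing_examples = [(text, docid) for docid, text in corpus.items()]      # memorize the corpus
retrieval_examples = [(query, docid) for query, docid in labeled_queries]  # map queries to docids

for src, tgt in indexing_examples + retrieval_examples:
    print(f"input: {src!r}  ->  target: {tgt!r}")
```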
not true, especially for language. if you trained a large & deep MLP language model with no self-attention, no matter how much data you feed it, you'll still be lagging behind a transformer (with much less data). will it get to the same point? i don't think so. your tokens
Decided to start a new blog series about model architectures in the era of LLMs.
Here's part 1 on broader architectures like Transformer Encoders/Encoder-Decoders, PrefixLM and denoising objectives.
A frequently asked question: "The people who worked on language and NLP
So many misconceptions about architectures (esp encoder-decoder vs decoder) partially due to nomenclature being confusing.
- EncDec, PrefixLMs, Causal Dec-onlys are *all* autoregressive. Even T5/UL2's objective is autoregressive.
- All 3 archs are not that different. People
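To make the "not that different" point concrete, here's a minimal sketch (illustrative, not any particular codebase) of the attention masks involved: a causal decoder-only model uses a lower-triangular mask, a PrefixLM additionally allows bidirectional attention over the input prefix, and an encoder-decoder factors the same pattern into two stacks.

```python
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    # Position i may attend to positions <= i.
    return np.tril(np.ones((n, n), dtype=bool))

def prefix_lm_mask(n: int, prefix_len: int) -> np.ndarray:
    # Bidirectional attention over the prefix (the "inputs"),
    # causal attention over the remaining positions (the "targets").
    mask = causal_mask(n)
    mask[:, :prefix_len] = True
    return mask

print(causal_mask(6).astype(int))
print(prefix_lm_mask(6, prefix_len=3).astype(int))
```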
A survey of LLMs with a practical guide and evolutionary tree.
Number of LLMs from Meta = 7
Number of open source LLMs from Meta = 7
The architecture nomenclature for LLMs is somewhat confusing and unfortunate.
What's called "encoder only" actually has an encoder and a decoder
"We offer no explanation as to why these
architectures seem to work; we attribute their success, as all else, to divine benevolence." - This has got to be my favorite line written in a paper ever (from ).
We are excited to share Reka Flash, a new state-of-the-art 21B multimodal model that rivals Gemini Pro and GPT-3.5 on key language & vision benchmarks.
We've trained this model from scratch and ground zero with a small (but amazingly capable) team and relatively finite
Introducing Reka Flash, our efficient and highly capable multimodal language model.
Try it at the Reka playground for free today.
Thread, blog & links below
New paper from
@RekaAILabs
(yes, an actual paper).
This time we're releasing part of our internal evals, which we call Vibe-Eval. This comprises a hard set which, imo, is pretty challenging for frontier models today.
The fun part here is that we constructed it by trying to
Efficient 64k context window!
A new long-range transformer that uses the UL2 objective for long-context few-shot!
Glad to advise on the UL2 training in this work.
CoLT5: Faster Long-Range Transformers with Conditional Computation
Achieves:
- stronger performance than LongT5 with much faster training and inference
- SOTA on the SCROLLS benchmark
- strong gains up to 64k input length
Happy to share our latest paper from
@GoogleAI
,
@DeepMind
"ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning" where we scaled multi-task learning to 107 NLP tasks!
link :
Excited to share our new work from
@GoogleAI
and
@DeepMind
. "Charformer: Fast Character Transformers via Gradient-based Subword Tokenization (paper: )
My wife was planning our upcoming trip to Tokyo and was having a hard time with transport planning.
Jokingly, I asked her to ask ChatGPT to do it for us.
Within 1 min, ChatGPT found us an optimal & efficient route that we never came across while doing research.
Mind blown.
Introducing U-PaLM 540B!
@GoogleAI
Training PaLM with UL2's mixture-of-denoisers with only 0.1% more compute unlocks:
- Much better scaling
- Emergent abilities on BIG-Bench
- Saving 2x compute (4.4 million TPU hours!)
- New prompting ability
link:
When comparing two models, a common reference point of compute is often used.
If you trained a 7B model with 3x the number of tokens/compute to beat a 13B model, did you really beat it? Probably not.
Here's a paper we wrote in 2021 () that I still
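For intuition, a quick back-of-the-envelope using the common training-compute approximation C ≈ 6·N·D (N = parameters, D = tokens); the numbers below are made up for illustration:

```python
def train_flops(params: float, tokens: float) -> float:
    # Widely used rule of thumb for dense Transformer training compute.
    return 6 * params * tokens

small = train_flops(7e9, 3 * 1e12)   # hypothetical 7B model trained on 3x the tokens
big = train_flops(13e9, 1e12)        # hypothetical 13B model trained on 1x the tokens
print(f"7B on 3T tokens : {small:.2e} FLOPs")
print(f"13B on 1T tokens: {big:.2e} FLOPs")
print(f"ratio: {small / big:.2f}x")  # the 'smaller' model actually spent ~1.6x the compute
```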
For many years I have eagerly camped at the "gates of arxiv" to check out cool new stuff/papers. But this has somehow stopped being the case.
- If there's a paper important enough to read, it will somehow "intrusively" appear in my face on twitter anyway.
- AI research
Bard announcement!
We are working hard to bring the best large language models to the world!
Stoked and excited to be part of this, i.e., the Bard team.
1/ In 2021, we shared next-gen language + conversation capabilities powered by our Language Model for Dialogue Applications (LaMDA). Coming soon: Bard, a new experimental conversational
#GoogleAI
service powered by LaMDA.
Our
@RekaAILabs
Tech Report / Paper is out!
Tech reports with no information at all are kinda boring, so we're revealing some interesting information on how we train our series of Reka models, including tokens, architecture, data & human evaluation workflows.
We tried
The False Promise of Imitating Proprietary LLMs
Open-sourced LLMs are adept at mimicking ChatGPT's style but not its factuality. There exists a substantial capabilities gap, which requires better base LMs.
Working idea, but I've noticed a bunch of archetypes of AI researchers & engineers in my career. Here are some of them:
1. Carry: Hero-level person capable of doing unprecedented things (alone or in a small group). Either in terms of modeling, infra or making impact in general. Very
Community: Evals for LLMs are broken! Academic benchmarks are not representative of real-world performance! We need better evals!
Also the same community: Let's make definitive rankings & leaderboards based on just four zero-shot "LM harness" tasks!
Not wanting to single
Just a few years ago, research was mostly sorted by "applications". When folks asked what research you were working on, you were expected to say something like "oh I work in question answering" or "sentiment analysis" or something. In fact, all the conference tracks were sorted as
Happy to share that we have updated and published v2 of our "efficient transformer" survey!
Major updates:
- Expanded our scope to sparse models and added a ton of new models!
- Wrote a retrospective post about the advances in the past year.
Link:
New blog post about my favorite language/NLP AI papers of 2022.
10 awesome best papers + 22 interesting papers to read. Check it out!
Hope it's helpful for everyone!
Singaporean AI researchers are kind of rare. Over the years, many people have asked me if I know other Singaporean AI researchers working on the bleeding edge.
Here's a thread introducing some of these Singaporean AI researchers who I know of that are doing amazing work!
1)
New UL2 model/paper from
@GoogleAI
!
"Unifying Language Learning Paradigms"
- SOTA on 50-ish diverse NLP tasks
- Outperforms GPT-3 175B on 0-shot SuperGLUE
- 3x performance vs T5 XXL (+LM) on 1-shot XSUM
- Open code & 20B Flax checkpoints.
Paper:
Excited to share that we have released 170+ pretrained transformer checkpoints of many different shapes & sizes as part of our
#ICLR2022
paper on "Scaling Transformers Efficiently" ๐.
Checkpoints:
Paper:
There are two major things that have happened in my life since a year ago. The first is that I am now a co-founder of a globally distributed LLM startup. The second is that I had a baby recently. Here's a typical day of my life:
[3:00pm] Wake up officially.
[3:30pm] Stagger to desk and
Hey Singapore government, if you're interested in LLMs, instead of "Le Model", we can build you "Model La".
Just 200 million dollars will do, and it would be light years ahead of anything you'll be able to train by yourselves.
New PaLM API launched!
Feels amazing to be able to share your work with the world!
Glad to have contributed to this massive team effort by helping to lead architecture and objective improvements for this latest generation of PaLM!
Excited about PaLM API: an easy and safe way for developers to build on top of our language models, and MakerSuite, a tool to jumpstart prototyping - both in private preview today.
@googlecloud
customers can also access these models + more via Vertex AI.
i've gotten so many "how do u keep up with research" type of questions over the years. The answer is simple. You don't; you just sign up for a twitter account and let the algorithm do the work for you.
if you don't see the paper on twitter, maybe it's for a good reason.
Agreed. There's so many opportunities in AI now. It's a pretty suboptimal career choice to do a PhD at the moment.
Also, many outstanding AI researchers and hard carry engineers that I know of don't have an AI or CS PhD.
As PhD applications season draws closer, I have an alternative suggestion for people starting their careers in artificial intelligence/machine learning:
Don't Do A PhD in Machine Learning
(or, at least, not right now)
1/4
Glad to see Google crushing it! I've always maintained that Google is the best at LLMs & AI. It seems like this is not even their best model yet. Congrats to all my friends for this well-deserved victory.
Over the past year, I've heard so many bad and grossly wrong takes
Breaking News from Arena
Google's Bard has just made a stunning leap, surpassing GPT-4 to the SECOND SPOT on the leaderboard! Big congrats to
@Google
for the remarkable achievement!
The race is heating up like never before! Super excited to see what's next for Bard + Gemini
We're beyond excited to share the first release of LegalBench, a collaboratively constructed open-source benchmark for evaluating legal reasoning in English large language models.
friendly reminder to everyone that there isn't yet a good & proper systematic blind eval/benchmark of LLMs, especially on real-world data/use-cases.
if i were in academia this is something i'd work on immediately.
Sharing a piece of work I contributed to while at
@GoogleAI
:
* a new improved mC4 corpus (29T char tokens and 107 languages) that gets language sampling right with UniMax sampling.
* open source pretrained uMT5 models trained on 1T tokens.
* UniMax sampling solves some
UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining
Proposes a new sampling method, UniMax, that delivers more uniform coverage of head languages while mitigating overfitting on tail languages
Releases:
- an improved mC4 consisting of
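For intuition, here's a hedged sketch of how I read the UniMax budgeting idea (a water-filling style allocation; my own illustration under assumed parameter names, not the paper's reference code): spread the total character budget as evenly as possible across languages, but never exceed a fixed number of epochs over any one language's corpus.

```python
def unimax_budgets(corpus_chars: dict, total_budget: float, max_epochs: int = 4) -> dict:
    """Assign a per-language character budget (sketch of a water-filling allocation)."""
    budgets = {}
    remaining = total_budget
    langs = sorted(corpus_chars, key=corpus_chars.get)  # smallest corpora first
    for i, lang in enumerate(langs):
        fair_share = remaining / (len(langs) - i)        # uniform split of what's left
        cap = max_epochs * corpus_chars[lang]            # don't over-repeat tail languages
        budgets[lang] = min(fair_share, cap)
        remaining -= budgets[lang]
    return budgets

# Hypothetical corpus sizes in characters: the tail language is capped at 4 epochs,
# and the leftover budget is shared by the head languages.
print(unimax_budgets({"sw": 1e9, "de": 200e9, "en": 3000e9}, total_budget=600e9))
```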
research is an immensely taxing endeavour. hours spent doing IC work, debugging and what not. a paper is a canvas for researchers to express themselves after all the hard work, at the end of the day.
it's my art. at least let me paint the way i want to paint. The reason why i am
In retrospect, UL2 was a wild paper. "Yeah so this just sort of cooked, turns out it's more optimal than Chinchilla, idk check it out". This almost farcically casual tone makes me suspect that Yi was thinking in detail about founding Reka at the moment.
Don't retrieve, recite!
Introducing Recitation-Augmented Language models "RECITE" from
@GoogleAI
by
@EdwardSun0909
.
RECITE is really powerful at knowledge-intensive NLP tasks with its new recite-answer paradigm.
Check it out here:
1/N
It's been slightly more than a year since the UL2 paper () was released.
Hereโs a summary thread of some notable models/research papers that use the UL2 objective for training (aside from the original UL2/Flan-UL2 of course).
Thread below
#1
-
Sharing "The Benchmark Lottery" from
@GoogleAI
&
@DeepMind
.
In this meta-paper (), we examine the challenges of ML benchmarking (e.g., model comparisons) and how it affects long-term progress. 1/
New paper alert! ๐ "Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers"
We study scaling laws of Transformers pertaining to both upstream & downstream transfer by pretraining over 200+ T5 models.
Paper:
@GoogleAI
@DeepMind
here's me serving as living proof & a live specimen that one can be 8,400 miles away from SFO, in a completely non-overlapping time zone, and still be right in the middle of all the action.
The notion that "if you do AI, you have to be in San Francisco" is narcissistic BS. Easily >90% of the people who are pushing AI forward aren't located in SF. In fact it's likely >95%.
Check out our recent
@GoogleAI
"HyperPrompt" paper at
#ICML2022
.
TL;DR: Hypernetwork-learned task prompts outperform prompt tuning, adapters & hyperformer.
pdf:
Work led by
@YunHeHe17
&
@HuaixiuZheng
at Google Brain.
UL2 20B is a language model that trains on multiple objectives and can perform both language modeling & infilling.
Nice stuff:
1. Public checkpoints
2. Great at both fine-tuning & few-shot!
3. Works well with chain-of-thought reasoning.
Paper:
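As a concrete illustration of what "both language modeling & infilling" looks like at the data level, here's a toy sketch using T5-style sentinel tokens (my own example; the actual UL2 mixture also varies span lengths and corruption rates and prepends paradigm tokens):

```python
text = "flan ul2 is trained with a mixture of denoising objectives".split()

# 1) Language-modeling / S-denoising style: predict the continuation of a prefix.
print("LM input :", " ".join(text[:6]))
print("LM target:", " ".join(text[6:]))

# 2) Infilling / span-corruption style: mask spans, predict them behind sentinels.
inputs = text[:2] + ["<extra_id_0>"] + text[4:8] + ["<extra_id_1>"]
targets = ["<extra_id_0>"] + text[2:4] + ["<extra_id_1>"] + text[8:]
print("Infill input :", " ".join(inputs))
print("Infill target:", " ".join(targets))
```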
Introducing UL2, a novel language pre-training paradigm that improves performance of language models across datasets and setups by using a mixture of training objectives, each with different configurations. Read more and grab model checkpoints at
There are not many human beings in the entire world who have as much holistic full-stack LLM experience as this man here. Tons of wisdom in these slides by
@hwchung27
.
I gave a talk at Seoul National University.
I titled the talk "Large Language Models (in 2023)". This was an ambitious attempt to summarize our exploding field.
Video:
Slides:
Trying to summarize the field forced me to think
I've been looking more closely into the evaluation based on human preferences in the draft Open Assistant (OA) paper, and I'm finding it's actually a really interesting case study in how tricky evaluation is...
Even though
@_jasonwei
left Brain for OpenAI, did you know that Brain still has
@JerryWeiAI
, Jason's exceptionally talented brother!
Research ability clearly runs in the family!
Check out this amazing thought-provoking paper led by Jerry Wei:
Recently, I went on my first podcast hosted by
@swyx
.
It was a fun, unfiltered 2-hour-long conversation. Could have gone on longer, but we got chased out of the studio..
Talked about a lot of stuff, i.e., reminiscing old stuff at
@Google
and newer stuff at
@RekaAILabs
.
pod: The Yolo Researcher Metagame with
@YiTayML
!
OpenAI (ca. GPT4): ~600 people
Google Gemini: ~950 coauthors
@RekaAILabs
: 20 people
@sama
once speculated on the qualities of "10,000x AI researchers", and more recently
@_jasonwei
described the "Yolo
"Then comes FLANv2 โ very important, I may have read it more than ten times and suggest just memorizing the entire content". Wow.
This is yet another great blogpost by
@Francis_YAO_
that is ultra-meta but insightful/useful.
More personal thoughts:
- Yes, +100 to "FLAN is
Bard knows Flan-UL2! Another model that was released just a few weeks ago.
It's fresh and up to date!
I can also tell it's conditioning on the blog post I wrote. It's also accurate.
I am super excited and honoured to join forces with
@artetxem
(along with other amazing people).
It's gonna be an amazing and incredible journey!
๐ข Life update ๐ข
After 2.5 wonderful years, I recently left FAIR to start a new adventure. It has been a privilege to be part of such an amazing team; I have learned a lot and had so much fun!
I am super excited about what is coming next, and I hope to share more details soon!
Pretty cool idea!
Great to see Flan-T5 (despite being the smallest model here) hold its ground pretty well. It even outperforms other LMs like Dolly or StableLM.
Another noteworthy point is that at "compute-match", Flan-T5 3B is equivalent to the cost of a 1.5B
Evaluating LLMs is notoriously difficult, and academic benchmarks may fail.
Inspired by chess and MOBA games, we are taking a new approach by calculating Elo ratings of models with crowdsourced battle data.
- Blog:
- Leaderboard:
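For readers unfamiliar with Elo, here's a minimal sketch of the rating update this kind of leaderboard is built on (standard Elo with K=32; not necessarily the exact implementation used for the arena):

```python
def elo_update(r_a: float, r_b: float, a_wins: bool, k: float = 32.0):
    # Expected score of A given the current rating gap, then a K-factor update.
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    return r_a + k * (score_a - expected_a), r_b - k * (score_a - expected_a)

ratings = {"model_a": 1000.0, "model_b": 1000.0}
for winner in ["model_a", "model_a", "model_b"]:  # crowdsourced battle outcomes
    ratings["model_a"], ratings["model_b"] = elo_update(
        ratings["model_a"], ratings["model_b"], winner == "model_a")
print(ratings)
```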
Happy to share this new work on Generative Retrieval for Recommender Systems in collaboration with YouTube!
This paper draws inspiration from our Differentiable Search Index (DSI) paper, which pioneered the generative retrieval paradigm for document retrieval.
Wow this is a great technical lecture by
@hwchung27
.
Really glad someone finally dived deep into that encoder-decoder / decoder discussion!
I think not many people understand the intricacies of this topic, and these days many people don't even know what "input" and
I gave a lecture at
@Stanford
CS 25.
Lecture video:
AI is moving so fast that it's hard to keep up. Instead of spending all our energy catching up with the latest development, we should study the change itself.
First step is to identify and understand
Introducing ViT-22B!
ViT-22B is the largest dense Vision Transformer ever trained.
It's time for vision to catch up to language in the scaling game!
I am excited to find out what other emergent abilities can be found by scaling up vision models!
1/ There is a huge headroom for improving capabilities of our vision models and given the lessons we've learned from LLMs, scaling is a promising bet. We are introducing ViT-22B, the largest vision backbone reported to date:
Wow! Not cool, Meta!
Gotta admit the first thing I looked for in the llama-2 paper was
@GuillaumeLample
on the author list.
PS: Google always retained authors on the papers even after they left the company. Don't be evil!
When you think you found a witty rebuttal to a popular paper, only to find out that your ideas have already been scooped by the original paper itself.
one step ahead bro.
Disclaimer: I have not read this fancy "mirage" paper in detail but here's an excerpt from the original
Are Emergent Abilities of Large Language Models a Mirage?
Presents an alternative explanation for emergent abilities: one can choose a metric which leads to the inference of an emergent ability or another metric which does not.
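A tiny numeric illustration of that argument (mine, not from the paper): if per-token accuracy improves smoothly with scale, an exact-match metric over a k-token answer behaves roughly like p^k and can look like a sudden, "emergent" jump.

```python
k = 10  # length of the target answer in tokens (illustrative)
for p in [0.5, 0.7, 0.8, 0.9, 0.95, 0.99]:
    # Smooth per-token accuracy p vs. the all-or-nothing exact-match metric ~ p**k.
    print(f"per-token acc {p:.2f} -> exact-match (k={k}) ~ {p**k:.3f}")
```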
Happy New Year / New Year's Eve! (depending on where you are)
Here's a thread of me reflecting on 2022 and all the research I've contributed to this year and stuff I did.
Quite a long chronological thread below.
My first blog post on "emergence, scaling and inductive bias"!
This is a medley discussion piece of some of our recent works on Emergence, U-PaLM, scaling laws vs models, CoT vs inverse scaling and more!
Check it out:
As a companion to our recent efficient Transformer survey, we designed "Long Range Arena" a new challenging benchmark to help understand and analyze trade-offs between recent efficient Transformer models. Check out our paper at .
@GoogleAI
@DeepMind
I really enjoyed my casual Saturday morning coffee chat with
@hwchung27
. Tons of technical wisdom and fun.
With respect to life, he basically said "extreme experiences are way more valuable even if they are hard", just like how training on only easy examples doesn't produce
Someone did a vibe check comparison of GPT-4, Claude-3, Gemini Advanced, Mistral Large and Reka Flash.
I think Reka Flash did pretty well for a 21B model.
Damn, this is such a big deal!
MHA vs MQA has always been hotly debated, and I've always felt this was the right "de-risked" way to go about it.
Congrats to
@michielsdj
on the great & impactful work!
New paper! Multi-query attention trades quality for speed and requires training a new model. Instead, uptrain an improved MQ variant from an existing multi-head model!
Work with Joshua Ainslie, James Lee-Thorp,
@_theopompus
, Federico Lebron, Sumit Sanghai.
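For context, here's a hedged sketch of the conversion step as I understand it: the per-head key/value projections of an existing multi-head checkpoint are pooled into a single shared head (mean pooling is one natural choice), and the model is then "uptrained" for a small fraction of the original pretraining steps. Shapes and names below are illustrative.

```python
import numpy as np

n_heads, d_model, d_head = 8, 512, 64
W_k = np.random.randn(n_heads, d_model, d_head)  # per-head key projections of an MHA checkpoint
W_v = np.random.randn(n_heads, d_model, d_head)  # per-head value projections

# Multi-query initialization: collapse all key/value heads into one shared head.
W_k_mq = W_k.mean(axis=0)  # (d_model, d_head)
W_v_mq = W_v.mean(axis=0)
print(W_k_mq.shape, W_v_mq.shape)
# Query heads stay per-head; training then resumes ("uptraining") to recover quality.
```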
Now this is the type of excellent work that the community needs more of!
Everyone cranking out minute-of-fame "model distillation" papers should take a look at this fine exemplar of good science below:
great work
@EdwardSun0909
Move over Alpaca, IBM just changed the game for open-source LLMs.
Dromedary, their instruction-tuned LLaMA model, beats Alpaca in performance *without* distilling ChatGPT and *without* human feedback! How do they do it?
(1/4)
most solid architecture is the "Noam" architecture. stop calling it a llama or whatever. this is the Noam transformer. (you can call it PaLM architecture too!)
Just realized today that I have almost the same number of twitter followers and citations, but I missed the moment I had the exact balance. This should have been a few hours ago. Dang!
Introducing Gemini 1.0, our most capable and general AI model yet. Built natively to be multimodal, it's the first step in our Gemini era of models. Gemini is optimized in three sizes - Ultra, Pro, and Nano.
Gemini Ultra's performance exceeds current state-of-the-art results on
In the spirit of being very meta here. Here's my personal meta-review of all the leaderboard-ing methodologies.
1. I like the elo ranking based on chatbot arena from
@lmsysorg
2. LM harness (e.g., zero-shot PIQA, Hellaswag etc) is the equivalent of "MNIST" for LLMs. Okay-ish
so many problems i don't know where to begin.
- yea, put sparse and dense models in the same plot with the # params. good job.
- i'm sure you know the size of palm-2 and gpt-4.
- fwiw, t5 is still one of the best LM models out there. it started way earlier than 2021.
-
Proliferation of LLMs. Some highlights:
1.
@Google
started early in 2021 with LaMDA and FLAN
2. Now
@Google
,
@OpenAI
, and Chinese players are actively competing on the top half of chart
3. The bottom of chart is dominated by the open source community, with impressive output speed
PokemonChat: Auditing ChatGPT for Pokémon Universe Knowledge
paper page:
probe ChatGPT for its conversational understanding and introduce a conversational framework (protocol) that can be adopted in future studies. The Pokémon universe serves as an
ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks
ChatGPT outperforms crowd-workers for several annotation tasks, including relevance, stance, topics, and frames detection, w/ 20x less cost.
Why aren't any recent LLMs (OPT, PaLM, etc.) using "efficient" architectures (Reformer, Longformer, etc.)?
There's 20+ of them, and they've been around since 2020! Are they actually *not* more efficient?
Interesting paper from my ex-colleagues at
@GoogleAI
led by
@vqctran
. Generative retrieval (i.e., DSI) is one of the most fun works I've worked on (and pioneered) during my Google career.
Also,
@vqctran
is driving a lot of the agenda that we worked on together back then. He has
How Does Generative Retrieval Scale to Millions of Passages?
Finds that the use of synthetic queries as a document representation strategy is the only approach that remained effective as they scaled up the corpus size using MS MARCO passages.