Yi Tay

@YiTayML

31,463
Followers
89
Following
104
Media
3,141
Statuses

chief scientist / cofounder @RekaAILabs 🫠 | past: research scientist @Google Brain 🤯 | currently learning to be a dad 🍼

mixture-of-locations
Joined October 2016
@YiTayML
Yi Tay
5 months
Long overdue but here's a new blogpost on training LLMs in the wilderness from the ground up 😄 In this blog post, I discuss: 1. Experiences in procuring compute & variance in different compute providers. Our biggest finding/surprise is that variance is super high and it's
Tweet media one
44
255
2K
@YiTayML
Yi Tay
1 year
New open source Flan-UL2 20B checkpoints :) - Truly open source 😎 No forms! 🤭 Apache license 🔥 - Best OS model on MMLU/Big-Bench hard 🤩 - Better than Flan-T5 XXL & competitive to Flan-PaLM 62B. - Size ceiling of Flan family just got higher! Blog:
51
346
2K
@YiTayML
Yi Tay
10 months
It's been a short 6 months since I left Google Brain and it has been a uniquely challenging yet interesting experience to build everything from the ground up in an entirely new environment (e.g., the wilderness). Today, we're excited to announce the first version of the
@RekaAILabs
Reka
10 months
We are excited to announce the 1st version of our multimodal assistant, Yasa-1, a language assistant with visual and auditory sensors that can take actions via code execution 🪄. Yasa-1 can understand text, images, videos, sounds & more! 🚀 Check out more details below 👇
24
159
901
84
140
1K
@YiTayML
Yi Tay
1 year
Hot take 🔥: Lots of buzz these days about new foundation open-source models but what if I told you there have been no real advances since 2019's T5 models 😀 Take a look at this table from this new InstructEval paper: . Some thoughts/observations: 1.
Tweet media one
48
207
1K
@YiTayML
Yi Tay
1 year
Over the past 3.3 years at Google, I have been blessed with so many wonderful friendships and experiences. I have grown so much. However, it's time to move on to a new adventure! I wrote a blogpost about my wonderful experience here:
65
63
992
@YiTayML
Yi Tay
2 years
"Scaling laws vs Model Architectures" from @GoogleAI . Lessons: - Not all arch scale the same way. - Vanilla Transformer does pretty well ๐Ÿ˜€ - Touching the attention too much is "dangerous". ๐Ÿ˜” - Perf at base may not translate to large+ scale. pdf:
Tweet media one
18
205
987
@YiTayML
Yi Tay
4 months
It's been a wild ride. With just 20 of us burning through thousands of H100s over the past months, we're glad to finally share this with the world! 💪 One of the goals we've had when starting Reka was to build cool, innovative models at the frontier. Reaching GPT-4/Opus level was a
@RekaAILabs
Reka
4 months
Meet Reka Core, our best and most capable multimodal language model yet. 🔮 It's been a busy few months training this model and we are glad to finally ship it! 💪 Core has a lot of capabilities, and one of them is understanding video --- let's see what Core thinks of the 3 body
53
244
1K
66
96
960
@YiTayML
Yi Tay
1 year
We're coming out of stealth with $58M in funding to build generative models and advance AI research at @RekaAILabs 🔥🚀 Language models and their multimodal counterparts are already ubiquitous and massively impactful everywhere. That said, we are still at the beginning of this
Tweet media one
94
75
925
@YiTayML
Yi Tay
4 years
Inspired by the dizzying number of efficient Transformer ("x-former") models that are coming out lately, we wrote a survey paper to organize all this information. Check it out at . Joint work with @m__dehghani @dara_bahri and @metzlerd . @GoogleAI 😀😃
Tweet media one
Tweet media two
16
266
873
@YiTayML
Yi Tay
2 years
Excited to share our latest work at @GoogleAI on "Transformer Memory as a Differentiable Search Index"! TL;DR? We parameterize a search system with only a single Transformer model 😎. Everything in the corpus is encoded in the model! 🙌 Paper:
Tweet media one
10
153
728
@YiTayML
Yi Tay
4 months
not true, especially for language. if you trained a large & deep MLP language model with no self-attention, no matter how much data you feed it you'll still be lagging behind a transformer (with much less data). will it get to the same point? i don't think so. your tokens
@mattshumer_
Matt Shumer
4 months
The dataset is everything. Great read:
Tweet media one
120
585
3K
31
62
663
@YiTayML
Yi Tay
25 days
Decided to start a new blog series about model architectures in the era of LLMs. 😀 Here's part 1 on broader architectures like Transformer Encoders/Encoder-Decoders, PrefixLM and denoising objectives. 😄 A frequently asked question: "The people who worked on language and NLP
Tweet media one
5
120
664
@YiTayML
Yi Tay
1 year
So many misconceptions about architectures (esp encoder-decoder vs decoder), partially due to nomenclature being confusing. - EncDec, PrefixLMs, Causal Dec-onlys are *all* autoregressive. Even T5/UL2's objective is autoregressive. - All 3 archs are not that different (see the mask sketch below). People
@ylecun
Yann LeCun
1 year
A survey of LLMs with a practical guide and evolutionary tree. Number of LLMs from Meta = 7 Number of open source LLMs from Meta = 7 The architecture nomenclature for LLMs is somewhat confusing and unfortunate. What's called "encoder only" actually has an encoder and a decoder
Tweet media one
75
742
3K
26
91
633
@YiTayML
Yi Tay
2 years
"We offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence." - This has got to be my favorite line written in a paper ever (from ).
Tweet media one
7
84
587
@YiTayML
Yi Tay
6 months
We are excited to share Reka Flash ✨, a new state-of-the-art 21B multimodal model that rivals Gemini Pro and GPT-3.5 on key language & vision benchmarks 📈. We've trained this model from scratch with a small (but amazingly capable) team 🧙‍♂️ and relatively finite
@RekaAILabs
Reka
6 months
Introducing Reka Flash, our efficient and highly capable multimodal language model. Try it at Reka playground for free today. 🧵 Thread, blog & links below 👇
Tweet media one
13
44
257
52
72
571
@YiTayML
Yi Tay
3 months
New paper from @RekaAILabs 🔥 (yes, an actual paper). This time we're releasing part of our internal evals, which we call Vibe-Eval 😃 It comprises a hard set which imo is pretty challenging for frontier models today. The fun part here is that we constructed it by trying to
Tweet media one
22
86
575
@YiTayML
Yi Tay
1 year
Efficient 64k context window! A new long-range transformer that uses the UL2 objective for long-context few-shot! 🔥 Glad to have advised on the UL2 training for this work.
@arankomatsuzaki
Aran Komatsuzaki
1 year
CoLT5: Faster Long-Range Transformers with Conditional Computation Achieves: - stronger performance than LongT5 with much faster training and inference - SOTA on the SCROLLS benchmark - strong gains up to 64k input length
Tweet media one
6
110
666
15
69
521
@YiTayML
Yi Tay
3 years
Happy to share our latest paper from @GoogleAI , @DeepMind "ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning" where we scaled multi-task learning to 107 NLP tasks! link :
Tweet media one
8
96
520
@YiTayML
Yi Tay
3 years
Excited to share our new work from @GoogleAI and @DeepMind : "Charformer: Fast Character Transformers via Gradient-based Subword Tokenization" (paper: )
Tweet media one
Tweet media two
5
108
513
@YiTayML
Yi Tay
2 years
My wife was planning our upcoming trip to Tokyo and was having a hard time with transport planning. Jokingly, I asked her to ask ChatGPT to do it for us. Within 1 min, ChatGPT found us an optimal & efficient route that we never came across while doing research. Mindblown 🤯
11
34
501
@YiTayML
Yi Tay
2 years
Introducing U-PaLM 540B! @GoogleAI Training PaLM w/ UL2's mixture-of-denoisers with only 0.1% more compute unlocks: - Much better scaling 📈 - Emergent abilities on BIG-Bench 😎 - Saving 2x compute (4.4 million TPU hours!) 🔥 - New prompting ability link:
Tweet media one
8
87
509
@YiTayML
Yi Tay
10 months
When comparing two models, a common reference point of compute is often used. If you trained a 7b model with 3x the number of tokens/compute to beat a 13b model, did you really beat it? Probably not. 😶 (Rough arithmetic sketch below.) Here's a paper we wrote in 2021 () that I still
17
73
491
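A rough back-of-the-envelope version of this comparison, using the common ~6 × params × tokens approximation for training FLOPs; the token counts below are made up for illustration only.

```python
# Back-of-the-envelope training-compute comparison (illustrative numbers only;
# the 6 * params * tokens rule is a common approximation, not an exact count).
def train_flops(params: float, tokens: float) -> float:
    return 6.0 * params * tokens

small = train_flops(params=7e9, tokens=3e12)    # hypothetical 7B model, 3T tokens
big   = train_flops(params=13e9, tokens=1e12)   # hypothetical 13B model, 1T tokens

print(f"7B @ 3T tokens : {small:.2e} FLOPs")
print(f"13B @ 1T tokens: {big:.2e} FLOPs")
print(f"ratio (7B / 13B): {small / big:.2f}x")  # ~1.6x more compute for the '7B'
```

In other words, under these made-up numbers the "smaller" 7B run actually spends more training compute than the 13B run, which is the point of the tweet.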
@YiTayML
Yi Tay
1 year
For many years I have always eagerly camped at the "gates of arxiv" to check out cool new stuff/papers. But this has somehow stopped being the case. - If there's a paper important enough to read, it will somehow "intrusively" appear in my face on twitter anyway. - AI research
16
38
472
@YiTayML
Yi Tay
7 months
Promoted to dad 😃
58
1
469
@YiTayML
Yi Tay
1 year
How about pausing all the LLM influencer nonsense on twitter for 6 months! 😃 That, I would sign.
@fchollet
Franรงois Chollet
1 year
Personally I'd suggest a 6 month moratorium on people overreacting to LLMs (in either direction)
56
210
2K
18
39
462
@YiTayML
Yi Tay
2 years
Bard announcement! 🔥🎉 We are working hard to bring the best large language models to the world! Stoked and excited to be part of this, i.e., the Bard team.
@sundarpichai
Sundar Pichai
2 years
1/ In 2021, we shared next-gen language + conversation capabilities powered by our Language Model for Dialogue Applications (LaMDA). Coming soon: Bard, a new experimental conversational #GoogleAI service powered by LaMDA.
741
3K
15K
10
18
429
@YiTayML
Yi Tay
4 months
Our @RekaAILabs Tech Report / Paper is out! 🔥 Tech reports with no information at all are kinda boring, so we're revealing some interesting information on how we train our series of Reka models, including tokens, architecture, data & human evaluation workflows. 😃 We tried
Tweet media one
10
56
415
@YiTayML
Yi Tay
4 months
"To be frontier you first need to be be pareto-frontier". ~ First law of LLM training. ๐Ÿ˜ƒ
Tweet media one
21
40
410
@YiTayML
Yi Tay
1 year
Someone said it finally 🔥. Can you believe a 'leading institution' started this trend? 👀 This is why desperate hype grabbing is so harmful 😦
@arankomatsuzaki
Aran Komatsuzaki
1 year
The False Promise of Imitating Proprietary LLMs Open-sourced LLMs are adept at mimicking ChatGPT's style but not its factuality. There exists a substantial capabilities gap, which requires a better base LM.
Tweet media one
51
260
1K
11
50
404
@YiTayML
Yi Tay
22 days
Working idea, but I've noticed a bunch of archetypes of AI researchers & engineers in my career. 😂 Here are some of them: 1. Carry: Hero-level person capable of making unprecedented contributions (alone or in a small group). Either in terms of modeling, infra or making impact in general. Very
18
42
412
@YiTayML
Yi Tay
5 months
Blogpost link: Happy to hear feedback. I may get back to blogging more in general, so ideas would also be welcome! 😄
12
57
394
@YiTayML
Yi Tay
1 year
Community: Evals for LLMs are broken! Academic benchmarks are not representative of real world performance! 🙅‍♂️ We need better evals! Also the same community: Let's make definitive rankings & leaderboards based on just four zero-shot "LM harness" tasks! 🤷‍♂️🤷‍♂️ Not wanting to single
12
43
373
@YiTayML
Yi Tay
1 year
Just a few years ago, research was mostly sorted by "applications". When folks asked what research you were working on, you were expected to say something like "oh I work in question answering" or "sentiment analysis" or something 😅. In fact, all the conference tracks are sorted as
20
43
370
@YiTayML
Yi Tay
2 years
Happy to share that we have updated and published v2 of our "efficient transformer" survey! Major updates: ✅ Expanded our scope to sparse models and added a ton of new models! ✅ Wrote a retrospective post about the advances in the past year. Link:
Tweet media one
6
70
366
@YiTayML
Yi Tay
2 years
🚨 New blogpost about my fav language/NLP AI papers of 2022. 10 awesome best papers + 22 interesting papers to read. Check it out! 😎 Hope it's helpful for everyone! 😀
7
78
365
@YiTayML
Yi Tay
1 year
Singaporean AI researchers are kind of rare. Over the years, many people have asked me if I know other Singaporean AI researchers working on the bleeding edge. Here's a thread introducing some of these Singaporean AI researchers who I know of that are doing amazing work! 👇 1)
26
38
325
@YiTayML
Yi Tay
1 year
Happy birthday to me 🎂🥳 I turn 33 this year. 33 is my lucky number so I'm hoping to have an amazing year. 😀
43
4
290
@YiTayML
Yi Tay
2 years
New UL2 model/paper from @GoogleAI ! "Unifying Language Learning Paradigms" ✅ SOTA on 50-ish diverse NLP tasks ✅ Outperforms GPT-3 175B on 0-shot SGLUE ✅ 3x perf vs T5 XXL (+LM) on 1-shot XSUM ✅ Open code & 20B Flax checkpoints. Paper:
Tweet media one
5
70
293
@YiTayML
Yi Tay
3 years
Excited to share that we have released 170+ pretrained transformer checkpoints of many different shapes & sizes as part of our #ICLR2022 paper on "Scaling Transformers Efficiently" 😄. Checkpoints: Paper:
Tweet media one
7
40
287
@YiTayML
Yi Tay
5 months
There are two major things that have happened in my life since 1 year ago. The 1st is that I am now a co-founder of a globally distributed LLM startup. The 2nd is that I had a baby recently. Here's a typical day of my life 😂: [3:00pm] Wake up officially. [3:30pm] Stagger to desk and
18
10
285
@YiTayML
Yi Tay
5 months
Hey Singapore government 🇸🇬 if you're interested in LLMs, instead of "Le Model", we can build you "Model La". Just 200 million dollars will do and it would be light years ahead of anything you'll be able to train by yourselves. 🤭
25
12
280
@YiTayML
Yi Tay
1 year
Hot take: 2023 is not a good year for being an AI researcher.
@pcastr
Pablo Samuel Castro
1 year
Hot take: 2017-2019 were the golden years for being an AI researcher.
19
7
213
22
15
275
@YiTayML
Yi Tay
1 year
New PaLM API launched! 🔥🔥🔥 Feels amazing to be able to share your work with the world! Glad to have contributed to this massive team effort by helping to lead architecture and objective improvements for this latest generation of PaLM! 😎
@sundarpichai
Sundar Pichai
1 year
Excited about PaLM API: an easy and safe way for developers to build on top of our language models, and MakerSuite, a tool to jumpstart prototyping - both in private preview today. @googlecloud customers can also access these models + more via Vertex AI.
174
769
4K
19
26
264
@YiTayML
Yi Tay
1 year
"I'm still not completely sure what LangChain does, and at this point I'm too afraid to ask" Same lol.
@kohjingyu
Jing Yu Koh
1 year
@srush_nlp This is how I feel with most LLM wrappers. I'm still not completely sure what LangChain does, and at this point I'm too afraid to ask
10
9
134
27
8
255
@YiTayML
Yi Tay
1 year
i've gotten so many "how do u keep up with research" type of questions over the years. The answer is simple. You don't; you just sign up for a twitter account and let the algorithm do the work for you. if you don't see the paper on twitter, maybe it's for a good reason. 🤯
11
16
251
@YiTayML
Yi Tay
9 months
Agreed. There are so many opportunities in AI now. It's a pretty suboptimal career choice to do a PhD at the moment. Also, many outstanding AI researchers and hard-carry engineers that I know of don't have an AI or CS PhD.
@sshkhr16
Shashank Shekhar
9 months
As PhD applications season draws closer, I have an alternative suggestion for people starting their careers in artificial intelligence/machine learning: Don't Do A PhD in Machine Learning ❌ (or, at least, not right now) 1/4 🧵
36
53
514
16
25
251
@YiTayML
Yi Tay
7 months
Glad to see Google crushing it! I've always maintained that Google is the best at LLMs & AI. It seems like this is not even their best model yet 😂. Congrats to all my friends for this well deserved victory. 😆 Over the past year, I've heard so many bad and grossly wrong takes
@lmsysorg
lmsys.org
7 months
🔥 Breaking News from Arena Google's Bard has just made a stunning leap, surpassing GPT-4 to the SECOND SPOT on the leaderboard! Big congrats to @Google for the remarkable achievement! The race is heating up like never before! Super excited to see what's next for Bard + Gemini
Tweet media one
154
624
3K
6
22
253
@YiTayML
Yi Tay
1 year
Flan-T5 crushes all other OSS models here. 😎
Tweet media one
@NeelGuha
Neel Guha
1 year
We're beyond excited to share the first release of LegalBench, a collaboratively constructed open-source benchmark for evaluating legal reasoning in English large language models. 🔗 📜
Tweet media one
12
116
422
10
36
250
@YiTayML
Yi Tay
1 year
friendly reminder to everyone that there isn't yet a good & proper systematic blind eval/benchmark of LLMs, especially on real world data/use-cases. if i were in academia this is something i'd work on immediately.
9
25
245
@YiTayML
Yi Tay
1 year
Sharing a piece of work I contributed to while at @GoogleAI : 😄 * a new improved mC4 corpus (29T char tokens and 107 languages) that gets language sampling right with UniMax sampling (allocation sketch below). * open source pretrained uMT5 models trained on 1T tokens. * UniMax sampling solves some
@arankomatsuzaki
Aran Komatsuzaki
1 year
UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining Proposes a new sampling method, UniMax, that delivers more uniform coverage of head languages while mitigating overfitting on tail languages Releases: - an improved mC4 consisting of
Tweet media one
1
28
117
8
32
242
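A rough sketch of the UniMax idea referenced above (my paraphrase of the sampling scheme, not the paper's reference implementation): spread the total token budget as uniformly as possible across languages while capping how many epochs any low-resource corpus gets repeated.

```python
# Hedged sketch of a UniMax-style budget allocation (my paraphrase, not the
# official algorithm): cap tail languages at `max_epochs` passes over their
# corpus, and split whatever budget remains uniformly over the other languages.
def unimax_budgets(corpus_tokens, total_budget, max_epochs=4.0):
    langs = sorted(corpus_tokens, key=corpus_tokens.get)  # smallest corpus first
    remaining_budget = float(total_budget)
    budgets = {}
    for i, lang in enumerate(langs):
        n_left = len(langs) - i
        uniform_share = remaining_budget / n_left
        cap = corpus_tokens[lang] * max_epochs
        budgets[lang] = min(uniform_share, cap)
        remaining_budget -= budgets[lang]
    return budgets

# Toy example: one huge "head" language and two small "tail" languages.
sizes = {"en": 1_000_000, "sw": 10_000, "mi": 2_000}
print(unimax_budgets(sizes, total_budget=300_000))
# tail languages are capped at max_epochs * corpus size; the rest goes to 'en'
```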
@YiTayML
Yi Tay
4 months
research is an immensely taxing endeavour. hours spent doing IC work, debugging and whatnot. a paper is a canvas for researchers to express themselves after all the hard work, at the end of the day. it's my art. at least let me paint the way i want to paint. The reason why i am
@teortaxesTex
Teortaxes▶️
4 months
In retrospect, UL2 was a wild paper. «Yeah so this just sort of cooked, turns out it's more optimal than Chinchilla, idk check it out». This almost farcically casual tone makes me suspect that Yi was thinking in detail about founding Reka at the moment.
Tweet media one
Tweet media two
1
6
71
7
18
241
@YiTayML
Yi Tay
2 years
Don't retrieve, recite! Introducing Recitation-Augmented Language Models ("RECITE") from @GoogleAI by @EdwardSun0909 . RECITE is really powerful at knowledge-intensive NLP tasks with its new recite-answer paradigm (prompt sketch below). Check it out here: 1/N
Tweet media one
5
39
238
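An illustrative recite-then-answer loop in the spirit of the paradigm mentioned above (my own minimal paraphrase, not code from the paper; `generate` is a hypothetical stand-in for any LLM completion call):

```python
# Illustrative two-step recite-then-answer prompting (paraphrase of the idea,
# not the paper's code). `generate` is a placeholder for an LLM API call.
def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your favourite LLM API here")

def recite_then_answer(question: str) -> str:
    # Step 1: ask the model to recite, from its own memory, a passage that
    # could contain the evidence needed to answer the question.
    recitation = generate(
        f"Recite a passage from your memory that helps answer the question.\n"
        f"Question: {question}\nPassage:"
    )
    # Step 2: answer the question conditioned on the recited passage.
    return generate(
        f"Passage: {recitation}\nQuestion: {question}\nAnswer:"
    )
```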
@YiTayML
Yi Tay
1 year
It's been slightly more than a year since the UL2 paper () was released. Here's a summary thread of some notable models/research papers that use the UL2 objective for training (aside from the original UL2/Flan-UL2 of course). 🧵 thread below #1 -
6
51
232
@YiTayML
Yi Tay
1 year
🙄🙄🙄 when you just completely give up and blatantly tell people you distill.
@nomic_ai
Nomic AI
1 year
Today we're releasing GPT4All, an assistant-style chatbot distilled from 430k GPT-3.5-Turbo outputs that you can run on your laptop.
44
345
2K
20
16
232
@YiTayML
Yi Tay
2 years
View from my desk at the Google Singapore office! 😀
Tweet media one
4
4
231
@YiTayML
Yi Tay
3 years
Sharing "The Benchmark Lottery" from @GoogleAI & @DeepMind . In this meta-paper (), we examine the challenges of ML benchmarking (e.g., model comparisons) and how it affects long-term progress. 1/
Tweet media one
Tweet media two
Tweet media three
3
43
222
@YiTayML
Yi Tay
3 years
New paper alert! 😀 "Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers" We study scaling laws of Transformers pertaining to both upstream & downstream transfer by pretraining 200+ T5 models. Paper: @GoogleAI @DeepMind
Tweet media one
2
44
203
@YiTayML
Yi Tay
1 year
Hit 10k citations today 😀
25
2
201
@YiTayML
Yi Tay
1 year
ICML accepted 2 of our papers. Congrats ICML! 🎉
5
4
197
@YiTayML
Yi Tay
1 year
🙋 here's me serving as living proof & live specimen that one can be 8400 miles away from SFO, in a completely non-overlapping time zone, and still be right in the middle of all the action. 🔥🤓
@fchollet
Franรงois Chollet
1 year
The notion that "if you do AI, you have to be in San Francisco" is narcissistic BS. Easily >90% of the people who are pushing AI forward aren't located in SF. In fact it's likely >95%.
46
67
1K
6
10
198
@YiTayML
Yi Tay
2 years
Check out our recent @GoogleAI "HyperPrompt" paper at #ICML2022 . TL;DR: Hypernetwork-learned task prompts outperform prompt tuning, adapters & hyperformer. pdf: Work led by @YunHeHe17 & @HuaixiuZheng at Google Brain.
Tweet media one
1
27
194
@YiTayML
Yi Tay
2 years
Prompt:"write a tweet from Yann but in the style of @GaryMarcus "?
@ylecun
Yann LeCun
2 years
On the highway towards Human-Level AI, Large Language Model is an off-ramp.
271
321
3K
8
7
194
@YiTayML
Yi Tay
2 years
UL2 20B is a language model that trains on a mixture of objectives and can perform both language modeling & infilling (format sketch below). Nice stuff: 1. Public checkpoints 2. Great at both fine-tuning & few-shot! 3. Works well with chain-of-thought reasoning. Paper:
@GoogleAI
Google AI
2 years
Introducing UL2, a novel language pre-training paradigm that improves performance of language models across datasets and setups by using a mixture of training objectives, each with different configurations. Read more and grab model checkpoints at
18
169
715
4
48
191
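A schematic sketch of the two behaviours mentioned above, written in T5/UL2-style input/target formats (my paraphrase; exact sentinel spellings and paradigm tokens may differ from the released checkpoints):

```python
# Schematic UL2-style training examples (illustration only, not from the paper;
# sentinel spellings follow T5 conventions, [S]/[X] stand in for paradigm tokens).

# Infilling / span corruption: masked spans become sentinels in the input and
# are reproduced after their sentinels in the target.
infill_input  = "[X] The quick <extra_id_0> jumps over the <extra_id_1> dog."
infill_target = "<extra_id_0> brown fox <extra_id_1> lazy"

# Prefix language modeling: the model simply continues a prefix.
lm_input  = "[S] The quick brown fox jumps"
lm_target = " over the lazy dog."

for inp, tgt in [(infill_input, infill_target), (lm_input, lm_target)]:
    print(f"input : {inp}\ntarget: {tgt}\n")
```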
@YiTayML
Yi Tay
1 year
meta ai = open ai?
9
6
188
@YiTayML
Yi Tay
3 years
Are pre-trained convolutions better than pre-trained Transformers? Check out our recent paper at #ACL2021nlp #NLProc 😀 Joint work with @m__dehghani @_jai_gupta @dara_bahri @VAribandi @pierceqin @metzlerd at @GoogleAI
Tweet media one
3
37
188
@YiTayML
Yi Tay
10 months
There are not many human beings in the entire world who have as much holistic full-stack LLM experience as this man here. 👇 Tons of wisdom in these slides by @hwchung27 .
@hwchung27
Hyung Won Chung
10 months
I gave a talk at Seoul National University. I titled the talk "Large Language Models (in 2023)". This was an ambitious attempt to summarize our exploding field. Video: Slides: Trying to summarize the field forced me to think
Tweet media one
42
615
3K
2
30
189
@YiTayML
Yi Tay
1 year
Imo the biggest open research area in LLMs now is how exactly to do evaluation correctly. It's tricky, as pointed out below 👇
@jeremyphoward
Jeremy Howard
1 year
I've been looking more closely into the evaluation based on human preferences in the draft Open Assistant (OA) paper, and I'm finding it's actually a really interesting case study in how tricky evaluation is... 🧵
Tweet media one
8
91
651
5
13
190
@YiTayML
Yi Tay
1 year
Even though @_jasonwei left Brain for OpenAI, did you know that Brain still has @JerryWeiAI , Jason's exceptionally talented brother! 🔥 Research ability clearly runs in the family! Check out this amazing thought-provoking paper led by Jerry Wei:
Tweet media one
10
21
187
@YiTayML
Yi Tay
1 month
Recently, I went on my first podcast, hosted by @swyx . 😄 It was a fun, unfiltered 2-hour-long conversation. Could have gone on longer but we got chased out of the studio.. 😅 Talked about a lot of stuff, i.e., reminiscing about old stuff at @Google and newer stuff at @RekaAILabs .
@latentspacepod
Latent.Space
1 month
🆕 pod: The Yolo Researcher Metagame with @YiTayML ! OpenAI (ca. GPT-4): ~600 people Google Gemini: ~950 coauthors @RekaAILabs : 20 people @sama once speculated on the qualities of "10,000x AI researchers", and more recently @_jasonwei described the "Yolo
Tweet media one
Tweet media two
Tweet media three
Tweet media four
5
18
109
6
27
187
@YiTayML
Yi Tay
1 year
Literally what is going on on everyone's minds now when they open twitter.
@chriswolfvision
Christian Wolf
1 year
How can one get the tweets of AI research folks but not the tweets of AI influencers? Asking for a friend.
71
48
923
7
6
183
@YiTayML
Yi Tay
1 year
Expectation: Writes LLM survey hoping to get tons of citations. Reality: No one writes LLM papers anymore 🥲
@arankomatsuzaki
Aran Komatsuzaki
1 year
A Survey of Large Language Models
Tweet media one
11
219
1K
4
12
181
@YiTayML
Yi Tay
1 year
"Then comes FLANv2 โ€” very important, I may have read it more than ten times and suggest just memorizing the entire content". Wow. This is yet another great blogpost by @Francis_YAO_ that is ultra-meta but insightful/useful. More more personal thoughts: - Yes, +100 to "FLAN is
@Francis_YAO_
Yao Fu
1 year
New blog post! ✒️ June 2023, A Stage Review of Instruction Tuning
20
126
511
5
16
178
@YiTayML
Yi Tay
1 year
Bard knows Flan-UL2! Another model that was released just a few weeks ago. It's fresh and up to date! 🔥 I can also tell it's conditioning on the blogpost I wrote. 😀 It's also accurate 🤩
Tweet media one
5
22
173
@YiTayML
Yi Tay
1 year
🔥🔥🔥 I am super excited and honoured to join forces with @artetxem (along with other amazing people). It's gonna be an amazing and incredible journey! 💪✨🤩
@artetxem
Mikel Artetxe
1 year
📢 Life update 📢 After 2.5 wonderful years, I recently left FAIR to start a new adventure. It has been a privilege to be part of such an amazing team - I have learned a lot and had so much fun! I am super excited about what is coming next, and I hope to share more details soon!
6
3
233
12
12
173
@YiTayML
Yi Tay
1 year
Pretty cool idea! Great to see Flan-T5 (despite being the smallest model here) hold its ground pretty well 🔥. It even outperforms other LMs like Dolly or StableLM. Also another noteworthy point is that at "compute-match", Flan-T5 3B is equivalent to the cost of a 1.5B
@lmsysorg
lmsys.org
1 year
Evaluating LLMs is notoriously difficult, and academic benchmarks may fail. Inspired by chess and MOBA games, we are taking a new approach by calculating Elo ratings of models with crowdsourced battle data. - Blog: - Leaderboard:
Tweet media one
31
276
1K
2
24
171
@YiTayML
Yi Tay
1 year
Happy to share this new work on Generative Retrieval for Recommender Systems in collaboration with YouTube! 💫✨ This paper draws inspiration from our Differentiable Search Index (DSI) paper, which pioneered the generative retrieval paradigm for document retrieval. 😎
@madiator
Mahesh Sathiamoorthy
1 year
Happy to share our recent work "Recommender Systems with Generative Retrieval"! Joint work with @shashank_r12 , @_nikhilmehta , @YiTayML , @vqctran and other awesome colleagues at Google Brain, Research, and YouTube. Preprint: #GenerativeAI 🧵 (1/n)
Tweet media one
13
72
478
4
15
168
@YiTayML
Yi Tay
2 months
Wow, this is a great technical lecture by @hwchung27 . 😄 Really glad someone finally dived deep into that encoder-decoder / decoder discussion! 😄 I think not many people understand the intricacies of this topic, and these days many people don't even know what "input" and
@hwchung27
Hyung Won Chung
2 months
I gave a lecture at @Stanford CS 25. Lecture video: AI is moving so fast that it's hard to keep up. Instead of spending all our energy catching up with the latest development, we should study the change itself. The first step is to identify and understand
Tweet media one
25
205
1K
4
20
163
@YiTayML
Yi Tay
1 year
Introducing ViT-22B! ViT-22B is the largest dense Vision Transformer ever trained 🔥. It's time for vision to catch up to language in the scaling game! I am excited to find out what other emergent abilities can be found by scaling up vision models!
@m__dehghani
Mostafa Dehghani
1 year
1/ There is huge headroom for improving the capabilities of our vision models, and given the lessons we've learned from LLMs, scaling is a promising bet. We are introducing ViT-22B, the largest vision backbone reported to date:
Tweet media one
12
134
801
2
19
161
@YiTayML
Yi Tay
1 year
Wow! 😶 Not cool, Meta! Gotta admit the first thing I looked for in the Llama-2 paper was @GuillaumeLample on the author list. PS: Google always retained authors on papers even after they left the company. Don't be evil!
1
11
157
@YiTayML
Yi Tay
1 year
When you think you found a witty rebuttal to a popular paper only to find out that your ideas have already been scooped by the original paper itself. 😬 one step ahead bro. Disclaimer: I have not read this fancy "mirage" paper in detail but here's an excerpt from the original
Tweet media one
@arankomatsuzaki
Aran Komatsuzaki
1 year
Are Emergent Abilities of Large Language Models a Mirage? Presents an alternative explanation for emergent abilities: one can choose a metric which leads to the inference of an emergent ability or another metric which does not.
Tweet media one
24
185
973
7
18
157
@YiTayML
Yi Tay
2 years
Happy New Year / New Year's Eve! 🥳 (depending on where you are) Here's a thread of me reflecting on 2022, all the research I've contributed to this year, and stuff I did. 🧵 quite a long chronological thread below 👇
3
12
155
@YiTayML
Yi Tay
2 years
My first blog post on "emergence, scaling and inductive bias"! This is a medley discussion piece of some of our recent works on Emergence, U-PaLM, scaling laws vs models, CoT vs inverse scaling and more! Check it out:
3
23
154
@YiTayML
Yi Tay
4 years
As a companion to our recent efficient Transformer survey, we designed "Long Range Arena" a new challenging benchmark to help understand and analyze trade-offs between recent efficient Transformer models. Check out our paper at . @GoogleAI @DeepMind
Tweet media one
Tweet media two
5
39
151
@YiTayML
Yi Tay
1 year
a nice plot that sums up things nicely. my vote is to make this the cover of the "state of AI report" or whatever that people love to make.
@savvyRL
Rosanne Liu
1 year
Fixed.
Tweet media one
48
230
1K
7
6
144
@YiTayML
Yi Tay
1 year
a flan-ul2 32b with a new (not-horrible) tokenizer is going to beat all open source models out there today. someone at google should do it.
@cephaloform
Maxine (t/prog)
1 year
it's a shame y'all aren't using ul2-20b more often
6
2
33
5
12
140
@YiTayML
Yi Tay
1 year
I really enjoyed my casual Saturday morning coffee chat with @hwchung27 . Tons of technical wisdom and fun. With respect to life, he basically said "extreme experiences are way more valuable even if they are hard", just like how training on only easy examples doesn't produce
4
10
140
@YiTayML
Yi Tay
2 years
OPT-IML 175B is outperformed by the much smaller Flan-T5 11B. 😶🤔
@OwainEvans_UK
Owain Evans at ICML Vienna
2 years
Meta's new instruction-tuned model vs Google's PaLM (their best published model) and OAI's GPT3.5 models (which power ChatGPT).
Tweet media one
5
22
212
4
18
139
@YiTayML
Yi Tay
5 months
Someone did a vibe-check comparison of GPT-4, Claude-3, Gemini Advanced, Mistral Large and Reka Flash. I think Reka Flash did pretty well for a 21B model 😃.
Tweet media one
12
20
139
@YiTayML
Yi Tay
1 year
Damn 🔥🔥 This is such a big deal! MHA vs MQA has always been hotly debated and I've always felt this was the right "de-risked" way to go about it (head-sharing sketch below). Congrats to @michielsdj on the great & impactful work!
@michielsdj
Michiel de Jong
1 year
New paper! Multi-query attention trades quality for speed and requires training a new model. Instead uptrain improved MQ variant from existing multi-head model! Work with Joshua Ainslie, James Lee-Thorp, @_theopompus , Federico Lebron, Sumit Sanghai.
9
65
320
2
17
135
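A shape-level sketch of the MHA vs MQA trade-off discussed above (my own illustration, not code from the paper): MQA keeps many query heads but shares a single key/value head, which shrinks the KV cache roughly by the number of heads.

```python
# Minimal shape-level comparison of multi-head vs multi-query attention
# (illustration only): MQA shares one K/V head across all query heads.
import numpy as np

def attention_shapes(n_heads, d_head, seq_len, multi_query=False):
    q = np.zeros((n_heads, seq_len, d_head))   # one query projection per head
    kv_heads = 1 if multi_query else n_heads   # MQA: a single shared K/V head
    k = np.zeros((kv_heads, seq_len, d_head))
    v = np.zeros((kv_heads, seq_len, d_head))
    kv_cache_floats = k.size + v.size          # per-layer cache footprint
    return q.shape, k.shape, kv_cache_floats

for mq in (False, True):
    q_shape, k_shape, cache = attention_shapes(n_heads=16, d_head=128,
                                               seq_len=2048, multi_query=mq)
    name = "MQA" if mq else "MHA"
    print(f"{name}: Q {q_shape}, K {k_shape}, KV-cache floats per layer = {cache:,}")
```

With these toy numbers the per-layer KV cache drops from ~8.4M floats (MHA) to ~0.5M (MQA), which is the main reason MQA speeds up decoding.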
@YiTayML
Yi Tay
1 year
Now this is the type of excellent work that the community needs more of! Everyone cranking out minute-of-fame "model distillation" papers should take a look at this fine exemplar of good science below: 👇 great work @EdwardSun0909
@generatorman_ai
generatorman
1 year
Move over Alpaca, IBM just changed the game for open-source LLMs 💥 Dromedary 🐪, their instruction-tuned Llama model, beats Alpaca in performance *without* distilling ChatGPT, and *without* human feedback! How do they do it? 👇 (1/4) 🧵
19
250
1K
1
22
133
@YiTayML
Yi Tay
4 months
most solid architecture is the "Noam" architecture (component sketch below). stop calling it a llama or whatever. this is the Noam transformer. (you can call it the PaLM architecture too!)
@visheratin
Alexander Visheratin
4 months
The most solid architecture =)
Tweet media one
0
2
36
2
11
131
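A config-style summary of what people usually mean by the "Noam"/PaLM-style decoder recipe (my paraphrase of common descriptions; the exact component list here is an assumption, not an official spec):

```python
# Hedged illustration only: the decoder recipe often described as
# "Noam-style" / "PaLM-style". Components are my paraphrase, not a spec.
noam_style_decoder = {
    "attention": "multi-query attention (one shared K/V head)",
    "ffn": "SwiGLU gated feed-forward",
    "positional_encoding": "rotary embeddings (RoPE)",
    "norm": "pre-normalization (e.g. RMSNorm / no-bias LayerNorm)",
    "biases": "no bias terms in the dense layers",
    "objective": "plain causal language modeling",
}

for part, choice in noam_style_decoder.items():
    print(f"{part:>20}: {choice}")
```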
@YiTayML
Yi Tay
1 year
Just realized today I have almost the same number of twitter followers and citations 😂 but missed the moment I had the exact balance. This should have been a few hours ago. Dang!
Tweet media one
Tweet media two
13
2
131
@YiTayML
Yi Tay
8 months
Congrats to Google and all my friends for this amazing launch!
@sundarpichai
Sundar Pichai
8 months
Introducing Gemini 1.0, our most capable and general AI model yet. Built natively to be multimodal, it's the first step in our Gemini era of models. Gemini is optimized in three sizes - Ultra, Pro, and Nano. Gemini Ultra's performance exceeds current state-of-the-art results on
Tweet media one
996
4K
24K
2
4
128
@YiTayML
Yi Tay
1 year
In the spirit of being very meta here, here's my personal meta-review of all the leaderboard-ing methodologies. 1. I like the elo ranking based on chatbot arena from @lmsysorg 2. LM harness (e.g., zero-shot PIQA, Hellaswag etc) is the equivalent of "MNIST" for LLMs. Okay-ish
10
11
131
@YiTayML
Yi Tay
1 year
so many problems i don't know where to begin. - yea put sparse and dense models in the same plot with the # params. good job 👍 - i'm sure you know the size of palm-2 and gpt-4. 👀 - fwiw, t5 is still one of the best LM models out there. it started way earlier than 2021. -
@ivanzhouyq
Ivan Zhou
1 year
Proliferation of LLMs. Some highlights: 1. @Google started early in 2021 with LaMDA and FLAN 2. Now @Google , @OpenAI , and Chinese players are actively competing on the top half of chart 3. The bottom of chart is dominated by the open source community, with impressive output speed
Tweet media one
10
95
318
6
14
130
@YiTayML
Yi Tay
1 year
The best "can ChatGPT do X" paper I've seen! 💯💯💯
@_akhaliq
AK
1 year
PokemonChat: Auditing ChatGPT for Pokémon Universe Knowledge paper page: probe ChatGPT for its conversational understanding and introduce a conversational framework (protocol) that can be adopted in future studies. The Pokémon universe serves as an
Tweet media one
5
134
591
2
14
129
@YiTayML
Yi Tay
1 year
We are at the point now where model outputs are better than the average human being's, huh. 🤔
@arankomatsuzaki
Aran Komatsuzaki
1 year
ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks ChatGPT outperforms crowd-workers for several annotation tasks, including relevance, stance, topics, and frames detection, w/ 20x less cost.
Tweet media one
21
152
732
5
15
124
@YiTayML
Yi Tay
2 years
Section 4.4 of is the answer to your question 😃
@MitchellAGordon
Mitchell Gordon
2 years
Why aren't any recent LLMs (OPT, PaLM, etc.) using "efficient" architectures (Reformer, Longformer, etc.)? There's 20+ of them, and they've been around since 2020! Are they actually *not* more efficient?
16
33
387
2
12
124
@YiTayML
Yi Tay
1 year
Interesting paper from my ex-colleagues at @GoogleAI led by @vqctran . Generative retrieval (i.e., DSI) is one of the most fun works I've worked on (and pioneered) during my Google career. Also, @vqctran is driving a lot of the agenda that we worked on together back then. He has
@arankomatsuzaki
Aran Komatsuzaki
1 year
How Does Generative Retrieval Scale to Millions of Passages? Finds that the use of synthetic queries as a document representation strategy is the only approach that remained effective as they scaled up the corpus size using MS MARCO passages.
Tweet media one
1
48
192
1
18
122