I gave a lecture at
@Stanford
CS 25.
Lecture video:
AI is moving so fast that it's hard to keep up. Instead of spending all our energy catching up with the latest development, we should study the change itself.
The first step is to identify and understand
I gave a talk at Seoul National University.
I titled the talk “Large Language Models (in 2023)”. This was an ambitious attempt to summarize our exploding field.
Video:
Slides:
Trying to summarize the field forced me to think
Here is my talk at
@MIT
(after some delay😅)
I made this talk last year when I was thinking about a paradigm shift. This delayed posting is timely as we just released o1, which I believe is a new paradigm.
It's a good time to zoom out for high level thinking.
(1/11)
I gave an invited lecture on Instruction finetuning and RLHF for
@hhexiy
's class at NYU.
One unique perspective of my lecture is that I introduce RLHF as an instance of using a learned objective function.
Video:
Slides:
We are hiring in the ChatGPT team! Happy to chat about this position. DMs are open.
Instead of your papers, I’d love to learn about the most difficult technical problem you worked on and your lessons. It doesn’t have to be ML.
I value exceptional technical skill a lot more than
Our team at OpenAI is hiring! We're looking for engineers/researchers who do rigorous and thoughtful work understanding and evaluating LLMs like ChatGPT.
If you're interested, please apply online and DM me with work that you've done!
An interesting confounding factor in comparing these models is that training details really matter.
For Flan-T5, resetting the Adafactor optimizer states during instruction finetuning was the biggest factor. It increased MMLU by almost double digits, from 43 to 52. This was
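For intuition on what "resetting the optimizer states" means, here is a toy sketch: an Adafactor-like optimizer that keeps a running second-moment estimate per parameter, and a reset that discards the statistics accumulated during pretraining. Everything here (the class, the numbers, the update rule) is an illustration, not the Flan training code.

```python
class ToyAdafactor:
    """Toy optimizer with Adafactor-style running second-moment statistics."""

    def __init__(self, decay=0.8):
        self.decay = decay
        self.second_moment = {}  # per-parameter running statistics

    def step(self, grads, params, lr=0.1):
        for name, g in grads.items():
            v = self.second_moment.get(name, 0.0)
            # exponential moving average of squared gradients
            v = self.decay * v + (1 - self.decay) * g * g
            self.second_moment[name] = v
            params[name] -= lr * g / (v ** 0.5 + 1e-8)

    def reset_state(self):
        # discard statistics accumulated during "pretraining" so that
        # "finetuning" starts with fresh optimizer state
        self.second_moment = {}


params = {"w": 1.0}
opt = ToyAdafactor()
opt.step({"w": 0.5}, params)   # accumulates state during "pretraining"
assert "w" in opt.second_moment
opt.reset_state()              # reset before "finetuning"
assert opt.second_moment == {}
```

The point is only that "resetting" means throwing away the accumulated statistics, not changing the weights themselves.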
Hot take 🔥: Lots of buzz these days about new foundation open-source models but what if I told you there have been no real advances since 2019's T5 models 😀
Take a look at this table from this new InstructEval paper: . Some thoughts/observations:
1.
Many visionaries talk about the future. But talking to
@sama
is another level. It feels like he is already in 2030 and talking back at me.
Then thinking about the future becomes an "interpolation" between where I am and where he is, as opposed to an extrapolation into the wild
Can't think
Research code that doesn’t make readers feel dumb is great. Too often, code is written to showcase the author's advanced knowledge of the language/framework, which overwhelms the reader.
Researchers come to the code with other thoughts/hypotheses in mind. The mental bandwidth is
New paper + models!
We extend instruction finetuning by
1. scaling to 540B model
2. scaling to 1.8K finetuning tasks
3. finetuning on chain-of-thought (CoT) data
With these, our Flan-PaLM model achieves a new SoTA of 75.2% on MMLU.
New open-source language model from Google AI: Flan-T5 🍮
Flan-T5 is instruction-finetuned on 1,800+ language tasks, leading to dramatically improved prompting and multi-step reasoning abilities.
Public models:
Paper:
I've been experimenting with Test Driven Development with GPT-4.
I first write test cases to formalize the desired behavior, then ask GPT-4 to write a function and suggest additional tests if needed.
I've found this method more efficient than writing the function first and then
2020 at Google
Michelle (manager): Do you want to be a mentor for an incoming resident?
Me: Hmm not sure if I am qualified
Michelle: Yes you are
Me: Ok I will try
A month later
mentee: Hi I am Jason Wei. I just joined
2023 me: "This year I am especially thankful for Michelle"
This year I am especially thankful for
@hwchung27
, who has been my closest collaborator for more than a year now.
I have many good things to say about Hyung Won, but to me his most salient trait is original thinking. I would describe his thinking style as highly logical, based
Random life hack to fall asleep quickly.
Pick a recursion or backtracking problem that is non-trivial. Run through test cases in your head.
This will quickly max out your working memory. Your brain will beg to be shut off. And you will fall asleep.
TLDR; make your brain OOM
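For a concrete problem to trace, a classic backtracking candidate is subset-sum. The sketch below is just one example of the kind of recursion to run through mentally; tracing the include/exclude branches on a few inputs saturates working memory fast.

```python
def subset_sum(nums, target):
    """Return True if some subset of nums sums exactly to target."""
    if target == 0:
        return True
    if not nums:
        return False
    head, rest = nums[0], nums[1:]
    # branch 1: include head; branch 2: exclude it
    return subset_sum(rest, target - head) or subset_sum(rest, target)


assert subset_sum([3, 34, 4, 12, 5, 2], 9)       # 4 + 5
assert not subset_sum([3, 34, 4, 12, 5, 2], 30)  # no subset works
```

Try tracing the second call in your head, branch by branch. Good night.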
I see many people self-impose imaginary rules that hold them back from achieving more.
When I first started working on deep learning, I admired Noam’s work on scaling and wanted his advice. But I imposed an imaginary rule: “I have to be good enough not to waste his time”. So I
Machine unlearning is important but human unlearning is equally so, especially for LLM researchers.
Without a strong theoretical framework to guide us, LLM researchers heavily rely on intuitions formed from empirical observations.
The emergent abilities of LLMs, however, mean
Many find this crazy but I use a single screen workflow. WHY? Fingers🖐️ are faster than head/eyes 👀.
You move your head and/or eyes to switch between monitors. With keyboard shortcuts + multiple desktops, I always stare at the same thing and only my fingers move. Much faster!
A counterintuitive implication of scale: trying to solve a more general version of the problem is an easier way to solve the original problem than directly tackling it.
Attempting a more general problem encourages you to come up with a more general and simpler approach. This
Not having a strong ego is pretty useful.
- I don't fear becoming a beginner again.
- In fact, I like being below-average in the room as my rate of learning is likely above-average.
- I am fine with working on ideas that I didn't come up with. I just want to work on the most
@tszzl
I don’t think this is specific to AI. People have a tendency to underestimate changes in the future despite having witnessed substantial changes in the past
Happy to release:
1. upgraded mT5 checkpoints:
2. refreshed mC4, a multilingual pre-training dataset:
The new mC4 covers CommonCrawls in 101 languages up to Aug. 2022
3. And a new ICLR paper:
A Korean character is formed by combining consonants and vowels in various ways. So one way to corrupt a character is to add an unnecessary consonant (e.g. ㅅ). The resulting combination is so unnatural to Koreans that they can automatically undo this change.
This is
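The combination rule above has a direct Unicode counterpart: precomposed Hangul syllables are laid out so that a syllable's code point is 0xAC00 + (lead × 21 + vowel) × 28 + tail, with 19 lead consonants, 21 vowels, and 28 tail codes (0 meaning no tail). A small sketch of the corruption; the example characters '바'/'밧' and the helper names are my own illustration.

```python
HANGUL_BASE = 0xAC00  # code point of '가', the first precomposed syllable


def decompose(ch):
    """Split a Hangul syllable into (lead, vowel, tail) jamo indices."""
    code = ord(ch) - HANGUL_BASE
    lead, rest = divmod(code, 21 * 28)
    vowel, tail = divmod(rest, 28)
    return lead, vowel, tail


def add_tail(ch, tail_index):
    """Corrupt a tail-less syllable by appending an unnecessary consonant."""
    lead, vowel, tail = decompose(ch)
    assert tail == 0, "syllable already has a tail consonant"
    return chr(HANGUL_BASE + (lead * 21 + vowel) * 28 + tail_index)


# e.g. adding the consonant ㅅ (tail index 19) to '바' yields '밧'
assert add_tail("바", 19) == "밧"
```

Undoing the corruption is just dropping the tail index back to 0, which is exactly the operation Korean readers perform automatically.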
The biggest surprise from working on the Flan project was how good Flan-T5-XXL was for its size.
However, this model was less accessible because it required some knowledge of model parallelism.
Happy to see tutorials like this, which make the XXL model more accessible!
🚨Attention
#NLP
enthusiasts!
We just published a new blog post on how to fine-tune FLAN-T5-XXL using DeepSpeed & Hugging Face Transformers! 🚀
👉
We ran a series of experiments to help you choose the right hardware setup.🤖💻
Last week marked 1 year at OpenAI. Reflecting back, I think the most unique aspect of OpenAI is the importance of mission, which seems to be less emphasized elsewhere. To be honest, I didn’t realize this either when I first joined. Now I believe mission is critical because:
1)
For research, it is more important to deeply understand the basics and have the right perspective than to dive into fancy ideas.
In this lecture, Jason shares how he thinks about language models. I find it so unique and insightful that I sneaked into Stanford to listen 🥷
It was an honor to give a guest lecture yesterday at Stanford’s CS330 class, "Deep Multi-Task and Meta-Learning"!
I discussed a few very simple intuitions for how I personally think about large language models.
Slides:
Here are the six intuitions:
(1)
Being brutally honest with oneself is difficult, especially when it requires facing harsh reality. Here is how I strive for self-honesty. I observe myself as if I were a ghost floating above, and in doing so, I replace the subject "I" with "this monkey". For example
Inner
towards intelligence too cheap to meter:
15 cents per million input tokens, 60 cents per million output tokens, MMLU of 82%, and fast.
most importantly, we think people will really, really like using the new model.
An extended version of the 3min video from last week!
If you're interested in
@OpenAI
's research but weren't sure how it feels to work here, this is the closest thing.
It shares what researchers value (e.g. challenges involved in scaling), what they
Recently the level of stress has been creeping up quite a bit. In general, I love scaling beyond measure but this is not the thing I want to scale. So I did some introspection.
As I am working in a field that is advancing exponentially, the range of outcomes is getting larger
When working on intellectually challenging problems, I often notice that I have subconsciously closed my eyes. It's as if my mental capacity is reaching its limit, and my brain is desperately freeing up cognitive resources by eliminating unrelated signals like visual stimuli.
“Is what I am working on irrelevant?” has been one of the most useful questions for my career.
Being extremely honest in answering that requires courage but it increases the chance of working on the right thing, which matters more than how good I am
And I ask this very often
Excited to present a new ICLR paper from Google Research and DeepMind:
Our key contributions:
- New insights on creating more parameter-efficient and transferable models via embedding decoupling
- RemBERT, which outperforms XLM-R and mT5-Large
Learning, if defined from first principles, shouldn't need to assume that a student is of a particular type (human, monkeys, machines).
I believe that machines are now capable enough that the education for humans and machines is converging!
“Don’t teach, incentivize” is a great concept that applies to both machines and humans. Huge credit to
@hwchung27
for being able to convey a lot of wisdom in very few words.
Working in the field, positive surprises are pretty rare. But this one surprised me. Wow.
Having a hard time thinking about the implications of text-to-video when it improves 100x from this point 🤯
Introducing Sora, our text-to-video model.
Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions.
Prompt: “Beautiful, snowy
One reason why encoder-decoder/decoder-only is still confusing is that not many people need to implement Transformers from scratch these days.
There is a level of understanding that can only be achieved by struggling to implement something from scratch. I highly recommend this!
Wow this is a great technical lecture by
@hwchung27
. 😄
Really glad someone finally dived deep into that encoder-decoder / decoder discussion! 😄
I think not many people understand the intricacies of this topic, and these days many people don't even know what "input" and
Compute + data + transformer doesn’t automatically lead to a good model.
It needs people who have systematically suffered through debugging these models at various scales.
@YiTayML
has suffered enough. So I expect some good models.
We’re coming out of stealth with $58M in funding to build generative models and advance AI research at
@RekaAILabs
🔥🚀
Language models and their multimodal counterparts are already ubiquitous and massively impactful everywhere.
That said, we are still at the beginning of this
In 2013, I took a class taught by Prof Strang. At the time he had been teaching at MIT for 52 years.
He continued teaching for another 10 years. Yesterday he gave his final lecture
He taught me how to like Linear Algebra and how invaluable teaching is.
“how many r’s in strawberry?”
I had to ask this to demo our new model o1-preview 😎
LLMs process text at a subword level. A question that requires understanding the notion of both character and word confuses them.
OpenAI o1-preview "thinks harder" to avoid mistakes.
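A few lines of Python make the contrast concrete: at the character level the question is trivial, while a subword view has to reason across token boundaries. The token split below is only a stand-in; real tokenizers segment differently.

```python
word = "strawberry"
# At the character level, the question is one library call:
assert word.count("r") == 3

# A crude stand-in for a subword tokenization (illustrative pieces only;
# the actual split depends on the tokenizer):
tokens = ["str", "aw", "berry"]
assert "".join(tokens) == word

# Counting characters now requires reasoning across token boundaries,
# which is roughly the position an LLM is in.
assert sum(t.count("r") for t in tokens) == 3
```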
Long term thinking is surprisingly rare even in places like Silicon Valley. One of the causes is that even if you are working towards long term impact, the day-to-day often feels incremental and mundane.
I find it really useful to practice zooming out of the incremental progress
People don’t like to repeat because it doesn’t feel like making progress. But repetition is necessary for deeper understanding. E.g.
- Re-reading books
- Repeating the thought process of understanding a new concept
Unfortunate side effect of over-reliance on quantitative metrics
Jason walked into the classroom without anything (no laptop, no notes) and gave a lecture out of memory.
I felt so glad that I refused to also give a blackboard lecture.
As a kid I loved whiteboard lectures way more than slides, so for Stanford’s CS25 class I gave a whiteboard lecture!
My goal was to simply and clearly explain why language models work so well, purely via intuitions.
Youtube video: (w/
@hwchung27
)
“[9:45 am] Recite OpenAI charter. Pray to optimization Gods. Learn the Bitter Lesson”
This has it all. Think about AGI, drop the “scientist ego” and seek divine benevolence.
This is AI research at its core
My typical day as a Member of Technical Staff at OpenAI:
[9:00am] Wake up
[9:30am] Commute to Mission SF via Waymo. Grab avocado toast from Tartine
[9:45 am] Recite OpenAI charter. Pray to optimization Gods. Learn the Bitter Lesson
[10:00am] Meetings (Google Meet). Discuss how to
Finally found time to read this blog post.
For researchers, fellow researchers are like customers. Learning that my research affected other researchers in such a positive way is the best customer feedback.
This made my day!
A lot of AI research has shifted from “building” models to “using” models. Creativity and curiosity play much bigger roles in this new era.
Not sure about creativity but you can complement curiosity to some extent. Think what your curious friend would have done in a given
A saturated benchmark gives a false impression that the underlying progress is slowing down.
Benchmarks are proxies for what we care about, which is often hard to measure. When they are saturated, they are useless and even misleading.
A good model satisfies users’ prompts. A great model changes the types of prompts by expanding what is possible.
Benchmarks like LMSYS provide good insight but they can't measure the latter. We should at least be aware of it. Otherwise, we incentivize incremental progress
I am very excited that the MedPaLM paper is now published in
@Nature
It is a great way to invite the broader scientific community to LLMs. I feel like LLMs are more adopted in the general public than in the scientific community. There are just so many
I'd like to clarify a few points on this slide from my previous talk to avoid potential confusion ()
1) As cited in the slides, this function is adapted from Noam's multiquery paper, which I highly recommend. This is the best resource to learn about
In an empirical research field such as deep learning, willingness to discard one’s own hard work is crucial. Try out a bunch of approaches, ruthlessly prioritize and trim less promising directions.
But in practice it is hard every time. Good bye my dear code 😞
I strive not to be too organized because doing so misses a lot of deep lessons that tend to compound in the long run.
I sometimes work on things that don’t generate output for some time. From a highly organized person’s perspective, I am not being “productive” and this is a
Compression begets clarity.
- Kill 90% of Chrome tabs at the end of each day. What do you leave open?
- Summarize the entire field of LLMs into a 50-min talk.
- What is one foundational principle behind every major AI breakthrough?
- If you could recommend only one book, what
Just like some books have an audiobook version, I'd love to see an LLM version of a book.
Books represent a unidirectional mode of knowledge transfer. With "LLMs for books"—maybe achieved through fine-tuning or in-context learning—the knowledge transfer could become
I'm getting used to AI surpassing me in more areas, much like how I trust Google Maps over my own sense of direction.
Even two years ago, it was so easy to look at the model generation and grade it myself. Now it is quite difficult for some domains (e.g. GPQA eval). Such a humbling experience.
Such an honor to have an opportunity to work together and learn from these researchers!
This video doesn't show all the great people who worked on this project. Please check out
We have reached an agreement in principle for Sam Altman to return to OpenAI as CEO with a new initial board of Bret Taylor (Chair), Larry Summers, and Adam D'Angelo.
We are collaborating to figure out the details. Thank you so much for your patience through this.
Flan2 paper is now on JMLR, 1.5 years after the initial arXiv release. It already feels quite dated, reflecting how fast the field is moving.
That said, the Flan-T5 series is still going strong, with an astonishing 52M cumulative downloads 🤯
How are people using these models?
I will be on the panel for the Instruction following workshop at
#NeurIPS2023
. Of course I will interpret everything from a scaling perspective 😎
Today 10:45-11:30am, Room 220-222
Great talk by
@hwchung27
. I really like the interesting analogies here (he's great at that!).
My favourite one is "no amount of bananas will incentivize monkeys to do mathematical reasoning" 🤣
Many people learn about the tools just enough to get the job done. I prefer to dive deeper; understanding my tools in detail makes my work much more fun.
Not sure if it's good or bad. Just more fun! Perhaps that’s what truly matters in the end.
I titled the talk “Don’t teach. Incentivize”.
We can’t enumerate every single skill we want from an AGI system because there are just too many of them.
In my view, the only feasible way is to incentivize the model such that general skills emerge.
(3/11)
Additional benefits of pair programming:
1. When I think deeply about a problem, in my head I make logical jumps and stitch thoughts together in an incoherent manner. I am very generous to myself when it comes to such logical flaws. Often the implication of this is uncovered
Pair programming isn’t standard at most companies and basically non-existent in academia, but I’ve been doing it with
@hwchung27
for almost a year now. While it naively seems slower than coding individually, I’ve realized that there are many benefits:
(1) In AI, what you work on can
Many people fear reading because if they fail to understand what they are reading, it doesn't feel good and can even hurt their ego. If this unpleasant experience happens repeatedly, they avoid reading, as it becomes associated with negative rewards.
Take "Googling" as an
Here is how AI can revolutionize education.
1. AI estimates the capability of a student (human, AI, etc)
2. It consistently provides materials, say, 0.1% beyond the current capability. Consistency is the key; learning compounds exponentially.
3. Scale to all
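A toy model of why step 2 compounds, assuming the student simply absorbs whatever is served. The 0.1% margin and the growth model are illustrative, not a claim about real learners.

```python
def train(capability, margin=0.001, steps=10_000):
    """Serve material a fixed fraction beyond current capability, repeatedly."""
    for _ in range(steps):
        material = capability * (1 + margin)  # slightly beyond current level
        capability = material                  # student absorbs it fully
    return capability


# Consistent small steps compound exponentially:
# (1.001)^10000 is roughly a 21,000x gain over the starting capability.
final = train(1.0)
assert final > 20_000
```

The key assumption is the *consistency*: the same 0.1% applied every single step. Miss steps, or serve material too far beyond the student, and the exponent shrinks.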
Can’t fathom the impact
Today I am pleased to announce the new board of directors for my relationship.
The new board of directors will be:
1. My mom
2. My girlfriend’s sister
3.
@hwchung27
, who I pair program with frequently
4. Bret Taylor (we’ve only met once, but every board should have Bret Taylor)
I intentionally did not fix the broken Copilot for a few days because that makes me more grateful for what I take for granted.
Remove what I use all the time, and when I later put it back, I realize how great that thing has been.
@_jasonwei
i invite you to drop cursor and use
One time I was pair programming with
@hwchung27
, and his github co-pilot extension was broken so he was manually typing every word. What an awful experience, it was like watching my granddad typing on apple notes on his iphone 7
Nice reminder for how quickly we have used AI to
A few times I found myself questioning my own judgment when disagreeing with GPT-4. This reminded me of Google Maps; I began to trust its guidance more than my instincts once it crossed a certain threshold.
GPT-4 is poised to usher in significant shifts in our perception of AI
One of the most important aspects of Flan is its generality. This paper extends that further; instruction finetuning benefits "single-task" finetuning as well.
You can further finetune Flan-T5 on your custom tasks and that is likely better than finetuning T5!
✨New Paper✨What’s the best completely public competitor to
#ChatGPT
?
Flan-T5 beats all public models we tested:
Flan-T5 3B ▶️ T0++ 3B ▶️ OPT-IML 175B ▶️ GLM-130B ▶️ Flan 2021 3B ▶️ NIv2 3B
We release the
@GoogleAI
🌟Flan Collection🌟data + methods for Instruction Tuning!
1/
With macOS, I use multiple desktops each with a shortcut
- option-1 to get to desktop 1 for project 1
- option-5 to get email/calendar
I also use
@apptivateapp
to set
- option-t for iTerm (Vim ftw!)
- option-s for Slack
- option-c for Chrome
As the field matures, it becomes rarer to build something from scratch. So the difficulties associated with such an endeavor are often overlooked.
Huge congrats to
@YiTayML
and the team for achieving this milestone so quickly!
We are excited to share Reka Flash ✨, a new state-of-the-art 21B multimodal model that rivals Gemini Pro and GPT 3.5 on key language & vision benchmarks 📈.
We've trained this model from scratch and ground zero with a small (but amazingly capable 🧙♂️) team and relatively finite
Manually examining the data and model output is a great way to deeply understand the problem.
It is like lubricating the brain. It reduces the friction in thinking within the domain; I can think faster and make deeper reasoning steps.
This could mean the difference between
One pattern I noticed is that great AI researchers are willing to manually inspect lots of data. And more than that, they build infrastructure that allows them to manually inspect data quickly. Though not glamorous, manually examining data gives valuable intuitions about the
In Dragon Ball, there is “Room of spirit and time”. You train one year inside the room and it is only a day outside. The multiplier is 365.
For machines it is a lot higher. So a strong generalist with more compute is often better at special domains than specialists.
(10/11)
I hope this lecture sparks interest in high level thinking, which will be useful in building better perspectives. This in turn will lead to finding more impactful problems to solve.
Thanks
@hjterrysuh
and MIT EI Seminar for hosting me!
(11/11)
Great to see such detailed descriptions of challenges training large models from scratch. Such knowledge is extremely valuable and scarce. Hope more people share their unique experience!
Long overdue but here's a new blogpost on training LLMs in the wilderness from the ground up 😄🧐
In this blog post, I discuss:
1. Experiences in procuring compute & variance in different compute providers. Our biggest finding/surprise is that variance is super high and it's
Thanks, this totally justifies my struggle getting the optimal amount of squiggles 😅
I am obviously biased but pretty much all learning-based AI should be understood with this plot as the unified perspective!
I really liked that this podcast is unfiltered. Feels raw but when a lot of videos on internet are highly polished, this raw feeling stands out to me.
As always Yi is very transparent about sharing his experience, which is really helpful for those who want to learn about AI
Recently, I went on my first podcast hosted by
@swyx
. 😄
It was a fun unfiltered 2 hour long conversation. Could have gone on longer but we got chased out of the studio.. 😅
Talked about a lot of stuff, e.g., reminiscing about old stuff at
@Google
and newer stuff at
@RekaAILabs
.
An analogy I used is extending the old saying:
"Give a man a fish, you feed him for a day. Teach him how to fish, you feed him for a lifetime."
I go one step further and solve this task with an incentive-based method:
"Teach him the taste of fish and make him hungry."
(6/11)
Try Code Interpreter! One use case for me is data visualization.
This figure took me 3+ hours to manually plot with matplotlib. It was a pain.
With Code Interpreter, I can probably get it done in 10 min.
But this is just a simple use case. I am excited to see how people
Code Interpreter will be available to all ChatGPT Plus users over the next week.
It lets ChatGPT run code, optionally with access to files you've uploaded. You can ask ChatGPT to analyze data, create charts, edit files, perform math, etc.
Plus users can opt in via settings.
We’re hosting an AMA for developers from 10–11 AM PT today. Reply to this thread with any questions and the OpenAI o1 team will answer as many as they can.
Congrats to
@YiTayML
and the reka team on this launch!
In the tech report i see this huge spike in the loss curve. Hope you did not lose much sleep when that happened
@YiTayML
Our
@RekaAILabs
Tech Report / Paper is out! 🔥
Tech reports with completely no information are kinda boring so we’re revealing some interesting information on how we train our series of Reka models including tokens, architecture, data & human evaluation workflows. 😃
We tried
New talk from
@hwchung27
about how to think "meta-level" in AI research. I have been impressed by Hyung Won's ability to identify new paradigms and totally give up any sunk cost. In late 2022 he realized the power of RL and has been preaching it ever since
A fun story: when
@YiTayML
Flan-UL2 is trained with prefix LM objective much more than the Flan-T5. The benefit might not be well-captured by the academic benchmarks (they don't require long-form generation) but the "model usability" of Flan-UL2 will probably be better
@zhansheng
It doesn't help for all cases. I have seen a few cases where this actually hurts mildly.
Here is my (very unscientific) intuition. Not resetting the states is good if you are finetuning on a task that is "similar" to pretraining. For example, SuperGLUE tasks have at least
When the petition started, the google doc exploded due to traffic. I felt pretty anxious not being able to sign. Being alone without peers in Korea certainly did not help. I’d say it was more of FOMO than "peer pressure" for me.
not to longpost, and I can only speak for myself, but this is a very inaccurate representation of the mood from an employee perspective
- “employees felt pressured” -> at some point hundreds of us were in a backyard learning about the petition. people were so upset at the
If you try to solve tens of tasks with as little effort as possible, then pattern-recognizing each task separately might be easiest
If you try to solve trillions of tasks, it might be easier to solve them by learning generalizable skills, e.g. language, reasoning, etc.
(5/11)
I believe that
1. the energy to desire is finite
2. the more you desire something, the higher the chance of achieving it
Corollary: ruthlessly reduce the number of desires in order to increase the chances of achieving what truly matters to you.
It’s been a short 6 months since I left Google Brain and it has been a uniquely challenging yet interesting experience to build everything from the ground up in an entirely new environment (e.g., the wilderness)
Today, we’re excited to announce the first version of the
You might think that it takes too long to teach via the incentive instead of direct teaching. That is true for humans, but for machines, we can give more compute to shorten the time.
In fact, I'd say this "slower" method allows us to put in more compute.
(8/11)
Leverage dilemma: if you are truly leveraged, you benefit greatly even if you don't work hard. But if you do work hard, the additional benefit will be so significant that it is too costly not to work hard
This blog explains pretraining objectives and Transformer architectures. Studying these old ideas tells us the long-term consequences of research decisions.
I believe such lessons are more important than knowing a lot of recent advances whose long-term consequences we don't know yet
Decided to start a new blog series about model architectures in the era of LLMs. 😀
Here's part 1 on broader architectures like Transformer Encoders/Encoder-Decoders, PrefixLM and denoising objectives. 😄
A frequently asked question: "The people who worked on language and NLP
Received an overwhelming number of DMs and emails, so the processing has been slow. We are going to read all of them today and this weekend. Thanks for your interest.