Hyung Won Chung Profile
Hyung Won Chung

@hwchung27

23,818
Followers
261
Following
41
Media
500
Statuses

Research Scientist @OpenAI. Past: @Google Brain / PhD @MIT

Mountain View, CA
Joined February 2021
Pinned Tweet
@hwchung27
Hyung Won Chung
4 months
I gave a lecture at @Stanford CS 25. Lecture video: AI is moving so fast that it's hard to keep up. Instead of spending all our energy catching up with the latest development, we should study the change itself. First step is to identify and understand
Tweet media one
29
230
1K
@hwchung27
Hyung Won Chung
1 year
I gave a talk at Seoul National University. I titled the talk “Large Language Models (in 2023)”. This was an ambitious attempt to summarize our exploding field. Video: Slides: Trying to summarize the field forced me to think
Tweet media one
42
618
3K
@hwchung27
Hyung Won Chung
2 years
Excited to share that I joined @OpenAI after 3 incredible years at Google Brain! Can’t wait to work on #ChatGPT and help drive the future of AI.
36
44
1K
@hwchung27
Hyung Won Chung
12 days
Here is my talk at @MIT (after some delay😅) I made this talk last year when I was thinking about a paradigm shift. This delayed posting is timely as we just released o1, which I believe is a new paradigm. It's a good time to zoom out for high level thinking. (1/11)
12
148
906
@hwchung27
Hyung Won Chung
11 months
OpenAI is nothing without its people
13
46
764
@hwchung27
Hyung Won Chung
1 year
I gave an invited lecture on Instruction finetuning and RLHF for @hhexiy 's class at NYU. One unique perspective of my lecture is that I introduce RLHF as an instance of using a learned objective function. Video: Slides:
Tweet media one
9
156
678
@hwchung27
Hyung Won Chung
1 year
We are hiring in the ChatGPT team! Happy to chat about this position. DMs are open. Instead of your papers, I’d love to learn about the most difficult technical problem you worked on and your lessons. It doesn’t have to be ML. I value exceptional technical skill a lot more than
@barret_zoph
Barret Zoph
1 year
Our team at OpenAI is hiring! We're looking for engineers/researchers who do rigorous and thoughtful work understanding and evaluating LLMs like ChatGPT. If you're interested, please apply online and DM me with work that you've done!
44
106
749
24
73
610
@hwchung27
Hyung Won Chung
1 year
An interesting confounding factor in comparing these models is that training details really matter. For Flan-T5, resetting the Adafactor optimizer states during instruction finetuning was the biggest factor. It increased MMLU by almost double digits, from 43 to 52. This was
@YiTayML
Yi Tay
1 year
Hot take 🔥: Lots of buzz these days about new foundation open-source models but what if I told you there have been no real advance since 2019's T5 models 😀 Take a look at this table from this new InstructEval paper: . Some thoughts/observations: 1.
Tweet media one
49
207
1K
13
66
474
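A minimal sketch of what "resetting the optimizer states" in the tweet above means in practice. This is a generic PyTorch analogy, not the original Flan-T5 setup (which used Adafactor in a JAX/T5X stack): the pretrained weights are restored, but the optimizer's accumulated statistics are not.

```python
# Hypothetical sketch: restore pretrained weights but start finetuning with a
# freshly initialized optimizer (the "reset"). The original Flan-T5 recipe used
# Adafactor in T5X; AdamW is used here only as a stand-in.
import torch

def load_for_finetuning(model, ckpt_path, reset_optimizer_state=True):
    ckpt = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(ckpt["model"])  # always restore the model weights
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    if not reset_optimizer_state and "optimizer" in ckpt:
        # carry over the moment/accumulator statistics from pretraining
        optimizer.load_state_dict(ckpt["optimizer"])
    # with reset_optimizer_state=True the accumulators start fresh,
    # which is the setting the tweet credits for the 43 -> 52 MMLU jump
    return model, optimizer
```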
@hwchung27
Hyung Won Chung
11 months
Many visionaries talk about the future. But talking to @sama is another level. It feels like he is already in 2030 and talks back at me. Then thinking about the future becomes an "interpolation" between where I am and where he is, as opposed to extrapolation into the wild. Can't think
17
15
399
@hwchung27
Hyung Won Chung
1 year
Research code that doesn’t make readers feel dumb is great. Too often, code is written to showcase the author's advanced knowledge of the language/framework, which overwhelms the reader. Researchers come to the code with other thoughts/hypotheses in mind. The mental bandwidth is
Tweet media one
9
33
356
@hwchung27
Hyung Won Chung
2 years
New paper + models! We extend instruction finetuning by
1. scaling to a 540B model
2. scaling to 1.8K finetuning tasks
3. finetuning on chain-of-thought (CoT) data
With these, our Flan-PaLM model achieves a new SoTA of 75.2% on MMLU.
Tweet media one
@quocleix
Quoc Le
2 years
New open-source language model from Google AI: Flan-T5 🍮 Flan-T5 is instruction-finetuned on 1,800+ language tasks, leading to dramatically improved prompting and multi-step reasoning abilities. Public models: Paper:
Tweet media one
40
489
2K
9
64
350
@hwchung27
Hyung Won Chung
5 months
End-to-end eventually wins. And when it does, it is elegant. It feels like cleaning up the ML-debt we have put in.
Tweet media one
10
16
321
@hwchung27
Hyung Won Chung
1 year
I've been experimenting with Test Driven Development with GPT-4. I first write test cases to formalize the desired behavior, then ask GPT-4 to write a function and suggest additional tests if needed. I've found this method more efficient than writing the function first and then
25
37
318
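A minimal sketch of the test-first loop described in the tweet above, using a hypothetical function `merge_intervals`: the pytest-style tests are written first to pin down the desired behavior, and the implementation shown is simply the kind of function body one might ask GPT-4 to draft and iterate on until the tests pass.

```python
# Hypothetical example: the tests define the spec; the function body stands in
# for a model-drafted implementation that is accepted once the tests pass.
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals and return them sorted."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def test_overlapping_intervals_are_merged():
    assert merge_intervals([(1, 3), (2, 6)]) == [(1, 6)]

def test_disjoint_intervals_are_kept():
    assert merge_intervals([(4, 5), (1, 2)]) == [(1, 2), (4, 5)]

def test_empty_input():
    assert merge_intervals([]) == []
```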
@hwchung27
Hyung Won Chung
8 months
For the first time, OpenAI nailed the model naming. We are getting better
15
6
303
@hwchung27
Hyung Won Chung
9 months
2020 at Google
Michelle (manager): Do you want to be a mentor for an incoming resident?
Me: Hmm not sure if I am qualified
Michelle: Yes you are
Me: Ok I will try
A month later
Mentee: Hi I am Jason Wei. I just joined
2023 me: "This year I am especially thankful for Michelle"
@_jasonwei
Jason Wei
9 months
This year I am especially thankful for @hwchung27 , who has been my closest collaborator for more than a year now. I have many good things to say about Hyung Won, but to me his most salient trait is original thinking. I would describe his thinking style as highly logical, based
Tweet media one
10
9
338
4
9
297
@hwchung27
Hyung Won Chung
1 year
Random life hack to fall asleep quickly. Pick a recursion or backtracking problem that is non-trivial. Run through test cases in your head. This will quickly max out your working memory. Your brain will beg to be shut off, and you will fall asleep. TL;DR: make your brain OOM
13
19
270
@hwchung27
Hyung Won Chung
1 year
I see many people self-impose imaginary rules that hold them back from achieving more. When I first started working on deep learning, I admired Noam’s work on scaling and wanted his advice. But I imposed an imaginary rule: “I have to be good enough not to waste his time”. So I
6
18
267
@hwchung27
Hyung Won Chung
1 year
Machine unlearning is important but human unlearning is equally so, especially for LLM researchers. Without a strong theoretical framework to guide us, LLM researchers heavily rely on intuitions formed from empirical observations. The emergent abilities of LLMs, however, mean
9
35
250
@hwchung27
Hyung Won Chung
1 year
Many find this crazy but I use a single screen workflow. WHY? Fingers🖐️ are faster than head/eyes 👀. You move your head and/or eyes to switch between monitors. With keyboard shortcuts + multiple desktops, I always stare at the same thing and only my fingers move. Much faster!
30
16
244
@hwchung27
Hyung Won Chung
11 months
🤍
@sama
Sam Altman
11 months
i love the openai team so much
5K
4K
72K
3
8
221
@hwchung27
Hyung Won Chung
1 year
A counterintuitive implication of scale: trying to solve a more general version of the problem is an easier way to solve the original problem than directly tackling it. Attempting a more general problem encourages you to come up with a more general and simpler approach. This
11
25
222
@hwchung27
Hyung Won Chung
1 year
Not having a strong ego is pretty useful.
- I don't fear becoming a beginner again.
- In fact, I like being below-average in the room as my rate of learning is likely above-average.
- I am fine with working on ideas that I didn't come up with. I just want to work on the most
7
24
212
@hwchung27
Hyung Won Chung
1 year
@tszzl I don't think this is specific to AI. People have a tendency to underestimate the changes in the future despite having witnessed substantial changes in the past
10
8
199
@hwchung27
Hyung Won Chung
1 year
Happy to release:
1. Upgraded mT5 checkpoints:
2. Refreshed mC4, a multilingual pre-training dataset: the new mC4 covers CommonCrawl in 101 languages up to Aug. 2022
3. And a new ICLR paper:
8
33
193
@hwchung27
Hyung Won Chung
20 days
A Korean character is formed by combining consonants and vowels in various ways. So one way to corrupt a character is to add an unnecessary consonant (e.g. ㅅ). The resulting combination is so unnatural to Koreans that they can automatically undo this change. This is
@OpenAI
OpenAI
20 days
OpenAI o1 translates a corrupted sentence.
33
189
1K
8
36
196
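A small sketch of the corruption described in the tweet above (an assumed illustration, not the exact method used in the o1 demo). A Hangul syllable is a single code point composed as 0xAC00 + (lead*21 + vowel)*28 + tail, so adding an unnecessary final consonant such as ㅅ to a syllable that has none yields a character that Korean readers can read past and mentally undo.

```python
# Assumed illustration: corrupt Hangul syllables by adding a spurious final
# consonant ㅅ (tail index 19) to syllables that have no final consonant.
NUM_LEADS, NUM_VOWELS, NUM_TAILS = 19, 21, 28
HANGUL_BASE = 0xAC00
SIOT_TAIL = 19  # final-consonant (jongseong) index of ㅅ

def corrupt(text: str) -> str:
    out = []
    for ch in text:
        code = ord(ch) - HANGUL_BASE
        is_syllable = 0 <= code < NUM_LEADS * NUM_VOWELS * NUM_TAILS
        if is_syllable and code % NUM_TAILS == 0:  # no final consonant yet
            out.append(chr(ord(ch) + SIOT_TAIL))   # e.g. 오 -> 옷
        else:
            out.append(ch)
    return "".join(out)

print(corrupt("오늘 날씨가 좋아요"))  # still decipherable to a Korean reader
```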
@hwchung27
Hyung Won Chung
2 years
The biggest surprise from working on the Flan project was how good Flan-T5-XXL was for its size. However, this model has been less accessible because it requires some knowledge of model parallelism. Happy to see tutorials like this, which make the XXL model more accessible!
@_philschmid
Philipp Schmid
2 years
🚨Attention #NLP enthusiasts! We just published a new blog post on how to fine-tune FLAN-T5-XXL using DeepSpeed & Hugging Face Transformers! 🚀 👉 We ran a series of experiments to help you choose the right hardware setup.🤖💻
7
84
444
2
19
191
@hwchung27
Hyung Won Chung
8 months
Last week marked 1 year at OpenAI. Reflecting back, I think the most unique aspect of OpenAI is the importance of mission, which seems to be less emphasized elsewhere. To be honest, I didn’t realize this either when I first joined. Now I believe mission is critical because: 1)
4
11
171
@hwchung27
Hyung Won Chung
10 months
For research, it is more important to deeply understand the basics and have the right perspective than to dive into fancy ideas. In this lecture, Jason shares how he thinks about language models. I find it so unique and insightful that I sneaked into Stanford to listen 🥷
@_jasonwei
Jason Wei
10 months
It was an honor to give a guest lecture yesterday at Stanford’s CS330 class, "Deep Multi-Task and Meta-Learning"! I discussed a few very simple intuitions for how I personally think about large language models. Slides: Here are the six intuitions: (1)
Tweet media one
22
281
2K
0
11
150
@hwchung27
Hyung Won Chung
7 months
Being brutally honest with oneself is difficult, especially when it requires facing harsh reality. Here is how I strive for self-honesty. I observe myself as if I were a ghost floating above. And in doing so, I replace the subject "I" with "this monkey". For example Inner
7
8
149
@hwchung27
Hyung Won Chung
3 months
While we love to train big models more than anyone, OpenAI also knows how to train small models. Very well.
@sama
Sam Altman
3 months
towards intelligence too cheap to meter: 15 cents per million input tokens, 60 cents per million output tokens, MMLU of 82%, and fast. most importantly, we think people will really, really like using the new model.
509
1K
10K
2
6
147
@hwchung27
Hyung Won Chung
11 days
An extended version of the 3min video from last week! If you're interested in @OpenAI 's research but weren't sure how it feels to work here, this is the closest thing. It shares what researchers value (e.g. challenges involved in scaling), what they
1
20
141
@hwchung27
Hyung Won Chung
5 months
Recently the level of stress has been creeping up quite a bit. In general, I love scaling beyond measure but this is not the thing I want to scale. So I did some introspection. As I am working in a field that is advancing exponentially, the range of outcomes is getting larger
6
3
136
@hwchung27
Hyung Won Chung
1 year
When working on intellectually challenging problems, I often notice that I have subconsciously closed my eyes. It's as if my mental capacity is reaching its limit, and my brain is desperately freeing up cognitive resources by eliminating unrelated signals like visual stimuli.
6
7
121
@hwchung27
Hyung Won Chung
27 days
“Is what I am working on irrelevant?” has been one of the most useful questions for my career. Being extremely honest in answering that requires courage, but it increases the chance of working on the right thing, which matters more than how good I am. And I ask this very often
0
15
120
@hwchung27
Hyung Won Chung
3 years
Excited to present a new ICLR paper from Google Research and DeepMind: Our key contributions:
- New insights on creating more parameter-efficient and transferable models via embedding decoupling
- RemBERT, which outperforms XLM-R and mT5-Large
Tweet media one
2
31
116
@hwchung27
Hyung Won Chung
10 days
Learning, if defined from first principles, shouldn't need to assume that a student is of a particular type (human, monkey, machine). I believe that machines are now capable enough that education for humans and machines is converging!
@MillionInt
Jerry Tworek
11 days
“Don’t teach, incentivize” is a great concept that applies to both machines and humans. Huge credit to @hwchung27 for being able to convey a lot of wisdom in very few words.
5
7
90
5
19
119
@hwchung27
Hyung Won Chung
8 months
Working in the field, positive surprises are pretty rare. But this one surprised me. Wow. Having a hard time thinking about the implications of text-to-video when it improves 100x from this point 🤯
@OpenAI
OpenAI
8 months
Introducing Sora, our text-to-video model. Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. Prompt: “Beautiful, snowy
10K
32K
138K
1
0
108
@hwchung27
Hyung Won Chung
4 months
One reason why encoder-decoder/decoder-only is still confusing is that not many people need to implement Transformers from scratch these days. There is a level of understanding that can only be achieved by struggling to implement something from scratch. I highly recommend this!
@YiTayML
Yi Tay
4 months
Wow this is a great technical lecture by @hwchung27 . 😄 Really glad someone finally dived deep into that encoder-decoder / decoder discussion! 😄 I think not many people understand the intricacies of this topic, and these days many people don't even know what "input" and
4
20
163
2
11
100
@hwchung27
Hyung Won Chung
1 year
Compute + data + transformer doesn’t automatically lead to a good model. It needs people who systematically suffered debugging these models at various scales. @YiTayML has suffered enough. So I expect some good models.
@YiTayML
Yi Tay
1 year
We’re coming out of stealth with $58M in funding to build generative models and advance AI research at @RekaAILabs 🔥🚀 Language models and their multimodal counterparts are already ubiquitous and massively impactful everywhere. That said, we are still at the beginning of this
Tweet media one
94
75
924
3
9
95
@hwchung27
Hyung Won Chung
1 year
In 2013, I took a class taught by Prof. Strang. At the time he had been teaching at MIT for 52 years. He continued teaching for another 10 years. Yesterday he gave his final lecture. He taught me how to like Linear Algebra and how invaluable teaching is.
3
4
95
@hwchung27
Hyung Won Chung
20 days
“how many r’s in strawberry?” I had to ask this to demo our new model o1-preview 😎 LLMs process text at a subword level. A question that requires understanding the notion of both character and word confuses them. OpenAI o1-preview "thinks harder" to avoid mistakes.
Tweet media one
@OpenAI
OpenAI
20 days
OpenAI o1 answers a famously tricky question for large language models.
57
189
2K
5
7
92
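A small illustration of the subword issue behind the question, using tiktoken's cl100k_base vocabulary as a stand-in (the tokenizer behind o1 may differ, and the split shown in the comment is only an example): the model consumes token chunks rather than letters, so counting characters is not the direct lookup it is for a string.

```python
# Stand-in illustration with tiktoken; the exact split depends on the vocabulary.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("strawberry")
print([enc.decode([t]) for t in tokens])  # e.g. ['str', 'aw', 'berry']
print("strawberry".count("r"))            # 3, trivial at the character level
```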
@hwchung27
Hyung Won Chung
1 year
Long term thinking is surprisingly rare even in places like Silicon Valley. One of the causes is that even if you are working towards long term impact, the day-to-day often feels incremental and mundane. I find it really useful to practice zooming out of the incremental progress
1
10
92
@hwchung27
Hyung Won Chung
6 months
People don't like to repeat because it doesn't feel like making progress. But repetition is necessary for deeper understanding. E.g.
- Re-reading books
- Repeating the thought process of understanding a new concept
An unfortunate side effect of over-reliance on quantitative metrics
2
10
91
@hwchung27
Hyung Won Chung
4 months
Jason walked into the classroom without anything (no laptop, no notes) and gave the lecture from memory. I felt so glad that I had refused to give a blackboard lecture as well.
@_jasonwei
Jason Wei
4 months
As a kid I loved whiteboard lectures way more than slides, so for Stanford’s CS25 class I gave a whiteboard lecture! My goal was to simply and clearly explain why language models work so well, purely via intuitions. Youtube video: (w/ @hwchung27 )
7
97
658
0
0
88
@hwchung27
Hyung Won Chung
7 months
“[9:45 am] Recite OpenAI charter. Pray to optimization Gods. Learn the Bitter Lesson” This has it all. Think about AGI, drop the “scientist ego” and seek divine benevolence. This is AI research at its core
@_jasonwei
Jason Wei
7 months
My typical day as a Member of Technical Staff at OpenAI: [9:00am] Wake up [9:30am] Commute to Mission SF via Waymo. Grab avocado toast from Tartine [9:45 am] Recite OpenAI charter. Pray to optimization Gods. Learn the Bitter Lesson [10:00am] Meetings (Google Meet). Discuss how to
152
308
4K
1
3
86
@hwchung27
Hyung Won Chung
1 year
Finally found time to read this blog post. For researchers, fellow researchers are like customers. Learning that my research affected other researchers in such a positive way is the best customer feedback. This made my day!
Tweet media one
@Francis_YAO_
Yao Fu
1 year
New blog post! ✒️ June 2023, A Stage Review of Instruction Tuning
20
126
510
3
4
83
@hwchung27
Hyung Won Chung
1 year
A lot of AI research has shifted from “building” models to “using” models. Creativity and curiosity play much bigger roles in this new era. Not sure about creativity but you can complement curiosity to some extent. Think what your curious friend would have done in a given
0
18
83
@hwchung27
Hyung Won Chung
4 months
A saturated benchmark gives a false impression that the underlying progress is slowing down. Benchmarks are proxies for the things we care about, which are often hard to measure. When they are saturated, they are useless and even misleading.
4
7
83
@hwchung27
Hyung Won Chung
21 days
A good model satisfies users’ prompts. A great model changes the types of prompts by expanding what is possible. Benchmarks like LMSYS provide good insight but they can't measure the latter. We should at least be aware of it. Otherwise, we incentivize incremental progress
2
10
82
@hwchung27
Hyung Won Chung
1 year
I am very excited that the MedPaLM paper is now published in @Nature. It is a great way to invite the broader scientific community to LLMs. I feel like LLMs are more widely adopted by the general public than by the scientific community. There are just so many
5
5
77
@hwchung27
Hyung Won Chung
2 years
Hey Bard…? 🤔
Tweet media one
5
9
77
@hwchung27
Hyung Won Chung
28 days
I'd like to clarify a few points on this slide from my previous talk to avoid potential confusion () 1) As cited in the slides, this function is adapted from Noam's multiquery paper, which I highly recommend. This is the best resource to learn about
Tweet media one
@cosminnegruseri
Cosmin Negruseri
29 days
love this compact multi-head attention code from @hwchung27
Tweet media one
17
27
436
0
10
79
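For readers who have not seen the slide, here is a minimal NumPy sketch in the same einsum style (adapted from the pseudocode conventions of Noam Shazeer's multi-query attention paper, which the tweet recommends; it is not the exact code shown in the talk). Shapes: b=batch, n=query length, m=memory length, h=heads, d=model dim, k=head dim.

```python
# Minimal einsum-style multi-head attention sketch (adapted, not the slide's code).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multihead_attention(X, M, P_q, P_k, P_v, P_o):
    # X: [b, n, d] query inputs, M: [b, m, d] memory,
    # P_q/P_k/P_v: [h, d, k] projections, P_o: [h, k, d] output projection.
    Q = np.einsum("bnd,hdk->bhnk", X, P_q)
    K = np.einsum("bmd,hdk->bhmk", M, P_k)
    V = np.einsum("bmd,hdk->bhmk", M, P_v)
    logits = np.einsum("bhnk,bhmk->bhnm", Q, K) / np.sqrt(Q.shape[-1])
    weights = softmax(logits, axis=-1)          # attention over memory positions
    O = np.einsum("bhnm,bhmk->bhnk", weights, V)
    return np.einsum("bhnk,hkd->bnd", O, P_o)   # combine heads back to model dim
```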
@hwchung27
Hyung Won Chung
1 year
In an empirical research field such as deep learning, willingness to discard one’s own hard work is crucial. Try out a bunch of approaches, ruthlessly prioritize and trim less promising directions. But in practice it is hard every time. Good bye my dear code 😞
1
5
78
@hwchung27
Hyung Won Chung
7 months
I strive not to be too organized because doing so misses a lot of deep lessons that tend to compound in the long run. I sometimes work on things that don’t generate output for some time. From a highly organized person’s perspective, I am not being “productive” and this is a
2
5
74
@hwchung27
Hyung Won Chung
1 year
Compression begets clarity.
- Kill 90% of Chrome tabs at the end of each day. What do you leave open?
- Summarize the entire field of LLMs into a 50-min talk.
- What is one foundational principle behind every major AI breakthrough?
- If you could recommend only one book, what
4
14
71
@hwchung27
Hyung Won Chung
1 year
Just like some books have an audiobook version, I'd love to see an LLM version of a book. Books represent a unidirectional mode of knowledge transfer. With "LLMs for books"—maybe achieved through fine-tuning or in-context learning—the knowledge transfer could become
4
8
65
@hwchung27
Hyung Won Chung
5 months
I'm getting used to AI surpassing me in more areas, much like how I trust Google Maps over me. Even two years ago, it was so easy to look at the model generation and grade it myself. Now it is quite difficult for some domains (e.g. GPQA eval). Such a humbling experience.
0
2
68
@hwchung27
Hyung Won Chung
19 days
Such an honor to have an opportunity to work with and learn from these researchers! This video doesn't show all the great people who worked on this project. Please check out
@OpenAI
OpenAI
19 days
Some of our researchers behind OpenAI o1 🍓
228
843
7K
3
7
67
@hwchung27
Hyung Won Chung
11 months
Just before Thanksgiving!
@OpenAI
OpenAI
11 months
We have reached an agreement in principle for Sam Altman to return to OpenAI as CEO with a new initial board of Bret Taylor (Chair), Larry Summers, and Adam D'Angelo. We are collaborating to figure out the details. Thank you so much for your patience through this.
6K
13K
66K
2
0
67
@hwchung27
Hyung Won Chung
6 months
The Flan2 paper is now in JMLR, 1.5 years after the initial arXiv release. It already feels quite dated, reflecting how fast the field is moving. That said, the Flan-T5 series is still going strong, with an astonishing 52M cumulative downloads 🤯 How are people using these models?
5
5
66
@hwchung27
Hyung Won Chung
10 months
I will be on the panel for the Instruction Following workshop at #NeurIPS2023. Of course I will interpret everything from a scaling perspective 😎 Today 10:45-11:30am, Room 220-222
0
12
58
@hwchung27
Hyung Won Chung
12 days
Unfortunately, this analogy also referred to my childhood self, who did not enjoy math, no matter the reward 😂
@YiTayML
Yi Tay
12 days
Great talk by @hwchung27 . I really like the interesting analogies here (he's great at that!). My favourite one is "no amount of bananas will incentivize monkeys to do mathematical reasoning" 🤣
1
12
72
3
2
56
@hwchung27
Hyung Won Chung
7 months
Many people learn about the tools just enough to get the job done. I prefer to dive deeper; understanding my tools in detail makes my work much more fun. Not sure if it's good or bad. Just more fun! Perhaps that’s what truly matters in the end.
5
3
56
@hwchung27
Hyung Won Chung
12 days
I titled the talk “Don’t teach. Incentivize”. We can’t enumerate every single skill we want from an AGI system because there are just too many of them. In my view, the only feasible way is to incentivize the model such that general skills emerge. (3/11)
1
7
55
@hwchung27
Hyung Won Chung
1 year
Additional benefits of pair programming: 1. When I think deeply about a problem, in my head I make logical jumps and stitch thoughts together in an incoherent manner. I am very generous to myself when it comes to such logical flaws. Often the implication of this is uncovered
@_jasonwei
Jason Wei
1 year
Pair programming isn’t standard at most companies and basically non-existent in academia, but I’ve been doing it with @hwchung27 for almost a year now. While it naively seems slower to code individually, I’ve realized that there are many benefits: (1) In AI, what you work on can
37
118
984
0
4
53
@hwchung27
Hyung Won Chung
1 year
Many people fear reading because if they fail to understand what they are reading, it doesn't feel good and can even hurt their ego. If this unpleasant experience happens repeatedly, they avoid reading, as it becomes associated with negative rewards. Take "Googling" as an
5
4
52
@hwchung27
Hyung Won Chung
1 year
Here is how AI can revolutionize education.
1. AI estimates the capability of a student (human, AI, etc.)
2. It consistently provides materials, say, 0.1% beyond the current capability. Consistency is the key; learning compounds exponentially.
3. Scale to all
Can't fathom the impact
5
4
51
@hwchung27
Hyung Won Chung
2 years
@achowdhery and I are at the Google booth (Hall G) talking about PaLM and Flan! #NeurIPS2022
Tweet media one
1
2
50
@hwchung27
Hyung Won Chung
10 months
Board hiring process for transparency
Tweet media one
@_jasonwei
Jason Wei
10 months
Today I am pleased to announce the new board of directors for my relationship. The new board of directors will be: 1. My mom 2. My girlfriend’s sister 3. @hwchung27 , who I pair program with frequently 4. Bret Taylor (we’ve only met once, but every board should have Bret Taylor)
41
24
1K
1
0
49
@hwchung27
Hyung Won Chung
5 months
I intentionally did not fix the broken Copilot for a few days because that makes me more grateful for what I take for granted. Remove what I use all the time, and when I put it back I realize how great that thing has been. @_jasonwei I invite you to drop Cursor and use
@_jasonwei
Jason Wei
5 months
One time I was pair programming with @hwchung27 , and his github co-pilot extension was broken so he was manually typing every word. What an awful experience, it was like watching my granddad typing on apple notes on his iphone 7 Nice reminder for how quickly we have used AI to
5
8
143
2
3
49
@hwchung27
Hyung Won Chung
2 years
A few times I found myself questioning my own judgment when disagreeing with GPT-4. This reminded me of Google Maps; I began to trust its guidance more than my instincts once it crossed a certain threshold. GPT-4 is poised to usher in significant shifts in our perception of AI
@OpenAI
OpenAI
2 years
Announcing GPT-4, a large multimodal model, with our best-ever results on capabilities and alignment:
2K
17K
63K
1
6
48
@hwchung27
Hyung Won Chung
2 years
One of the most important aspects of Flan is its generality. This paper extends that further; instruction finetuning benefits "single-task" finetuning as well. You can further finetune Flan-T5 on your custom tasks and that is likely better than finetuning T5!
@ShayneRedford
Shayne Longpre
2 years
✨New Paper✨What’s the best completely public competitor to #ChatGPT ? Flan-T5 beats all public models we tested: Flan-T5 3B ▶️ T0++ 3B ▶️ OPT-IML 175B ▶️ GLM-130B ▶️ Flan 2021 3B ▶️ NIv2 3B We release the @GoogleAI 🌟Flan Collection🌟data + methods for Instruction Tuning! 1/
Tweet media one
Tweet media two
24
249
1K
1
8
48
@hwchung27
Hyung Won Chung
1 year
With macOS, I use multiple desktops, each with a shortcut:
- option-1 to get to desktop 1 for project 1
- option-5 to get email/calendar
I also use @apptivateapp to set:
- option-t for iTerm (Vim ftw!)
- option-s for Slack
- option-c for Chrome
4
1
48
@hwchung27
Hyung Won Chung
8 months
As the field matures, it becomes rarer to build something from scratch. So the difficulties associated with such an endeavor are often overlooked. Huge congrats to @YiTayML and the team for achieving this milestone so quickly!
@YiTayML
Yi Tay
8 months
We are excited to share Reka Flash ✨, a new state-of-the-art 21B multimodal model that rivals Gemini Pro and GPT 3.5 on key language & vision benchmarks 📈. We've trained this model from scratch and ground zero with a small (but amazingly capable) team 🧙‍♂️ and relatively finite
52
72
571
1
1
46
@hwchung27
Hyung Won Chung
1 year
Manually examining the data and model output is a great way to deeply understand the problem. It is like lubricating the brain. It reduces the friction in thinking within the domain; I can think faster and make deeper reasoning steps. This could mean the difference between
@_jasonwei
Jason Wei
1 year
One pattern I noticed is that great AI researchers are willing to manually inspect lots of data. And more than that, they build infrastructure that allows them to manually inspect data quickly. Though not glamorous, manually examining data gives valuable intuitions about the
47
207
2K
0
3
47
@hwchung27
Hyung Won Chung
12 days
In Dragon Ball, there is the "Room of Spirit and Time". You train for one year inside the room and only a day passes outside. The multiplier is 365. For machines it is a lot higher. So a strong generalist with more compute is often better at specialized domains than specialists. (10/11)
2
3
48
@hwchung27
Hyung Won Chung
12 days
I hope this lecture sparks interest in high level thinking, which will be useful in building better perspectives. This in turn will lead to finding more impactful problems to solve. Thanks @hjterrysuh and MIT EI Seminar for hosting me! (11/11)
3
0
48
@hwchung27
Hyung Won Chung
7 months
Great to see such detailed descriptions of the challenges of training large models from scratch. Such knowledge is extremely valuable and scarce. Hope more people share their unique experiences!
@YiTayML
Yi Tay
7 months
Long overdue but here's a new blogpost on training LLMs in the wilderness from the ground up 😄🧐 In this blog post, I discuss: 1. Experiences in procuring compute & variance in different compute providers. Our biggest finding/surprise is that variance is super high and it's
Tweet media one
44
254
2K
2
4
45
@hwchung27
Hyung Won Chung
4 months
Thanks, this totally justifies my struggle getting the optimal amount of squiggles 😅 I am obviously biased but pretty much all learning-based AI should be understood with this plot as the unified perspective!
@MillionInt
Jerry Tworek
4 months
I think plot from @hwchung27 and @_jasonwei if understood is worth more than any other education in AI
Tweet media one
4
6
99
2
4
46
@hwchung27
Hyung Won Chung
3 months
I really liked that this podcast is unfiltered. It feels raw, but when so many videos on the internet are highly polished, that rawness stands out to me. As always, Yi is very transparent about sharing his experience, which is really helpful for those who want to learn about AI
@YiTayML
Yi Tay
3 months
Recently, I went on my first podcast hosted by @swyx . 😄 It was a fun unfiltered 2 hour long conversation. Could have gone on longer but we got chased out of the studio.. 😅 Talked about a lot of stuff, i.e., reminiscing old stuff at @Google and newer stuff at @RekaAILabs .
6
27
189
1
5
42
@hwchung27
Hyung Won Chung
12 days
An analogy I used is extending the old saying: "Give a man a fish, you feed him for a day. Teach him how to fish, you feed him for a lifetime." I go one step further and solve this task with an incentive-based method: "Teach him the taste of fish and make him hungry." (6/11)
1
5
44
@hwchung27
Hyung Won Chung
1 year
Try Code Interpreter! One use case for me is data visualization. This figure took me 3+ hours to manually plot with matplotlib. It was a pain. With Code Interpreter, I can probably get it done in 10 min. But this is just a simple use case. I am excited to see how people
Tweet media one
@OpenAI
OpenAI
1 year
Code Interpreter will be available to all ChatGPT Plus users over the next week. It lets ChatGPT run code, optionally with access to files you've uploaded. You can ask ChatGPT to analyze data, create charts, edit files, perform math, etc. Plus users can opt in via settings.
Tweet media one
727
3K
16K
2
4
37
@hwchung27
Hyung Won Chung
19 days
I will be answering questions about o1 between 10-11am PT!
@OpenAIDevs
OpenAI Developers
19 days
We’re hosting an AMA for developers from 10–11 AM PT today. Reply to this thread with any questions and the OpenAI o1 team will answer as many as they can.
480
133
1K
1
1
38
@hwchung27
Hyung Won Chung
6 months
Congrats to @YiTayML and the Reka team on this launch! In the tech report I see this huge spike in the loss curve. Hope you did not lose much sleep when that happened @YiTayML
Tweet media one
@YiTayML
Yi Tay
6 months
Our @RekaAILabs Tech Report / Paper is out! 🔥 Tech reports with completely no information are kinda boring so we’re revealing some interesting information on how we train our series of Reka models including tokens, architecture, data & human evaluation workflows. 😃 We tried
Tweet media one
10
56
416
3
3
38
@hwchung27
Hyung Won Chung
2 years
Had to rely on machine translation since I took only one semester of Japanese but this is a great and detailed summary of our work!
@jaguring1
小猫遊りょう(たかにゃし・りょう)
2 years
Google's AI is astonishing (Flan-T5, Flan-PaLM, Flan-U-PaLM). A general-purpose language model is finetuned on 1,836 tasks (instruction finetuning). Performance keeps improving as the number of tasks and the model size increase. It scores 75.2% on MMLU, a four-choice question benchmark spanning 57 subjects such as math, physics, law, and history (the average human rater scores 34.5%).
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
192
733
1
11
36
@hwchung27
Hyung Won Chung
12 days
RIP oai closet gym 🏋️
@_jasonwei
Jason Wei
12 days
New talk from @hwchung27 about how to think "meta-level" in AI research. I have been impressed by Hyung Won's ability to identify new paradigms and totally give up any sunk cost. In late 2022 he realized the power of RL and has been preaching it ever since A fun story: when
3
29
271
0
0
36
@hwchung27
Hyung Won Chung
2 years
@YiTayML Flan-UL2 is trained with the prefix LM objective much more than Flan-T5. The benefit might not be well captured by the academic benchmarks (they don't require long-form generation), but the "model usability" of Flan-UL2 will probably be better
1
2
36
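A small sketch of what the prefix LM objective changes at the attention-mask level (an assumed illustration, not Flan-UL2's actual training code): positions in the prefix attend bidirectionally, while the remaining positions stay causal.

```python
# Assumed illustration of a prefix-LM attention mask vs. a fully causal one.
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    return np.tril(np.ones((n, n), dtype=bool))

def prefix_lm_mask(n: int, prefix_len: int) -> np.ndarray:
    mask = causal_mask(n)
    mask[:, :prefix_len] = True  # every position can see the full prefix
    return mask

print(prefix_lm_mask(5, 2).astype(int))  # rows: queries, cols: keys
```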
@hwchung27
Hyung Won Chung
10 months
I am going to #NeurIPS2023 next week! See you there
0
0
32
@hwchung27
Hyung Won Chung
1 year
@zhansheng It doesn't help in all cases. I have seen a few cases where this actually hurts mildly. Here is my (very unscientific) intuition. Not resetting the states is good if you are finetuning on a task that is "similar" to pretraining. For example, SuperGLUE tasks have at least
0
0
32
@hwchung27
Hyung Won Chung
10 months
When the petition started, the Google Doc exploded due to traffic. I felt pretty anxious not being able to sign. Being alone without peers in Korea certainly did not help. I'd say it was more FOMO than "peer pressure" for me.
@tszzl
roon
10 months
not to longpoast, and I can only speak for myself, but this is a very inaccurate representation of the mood from an employee perspective - “employees felt pressured” -> at some point hundreds of us were in a backyard learning about the petition. people were so upset at the
77
94
2K
0
0
30
@hwchung27
Hyung Won Chung
12 days
If you try to solve tens of tasks with the minimal effort possible, then pattern-recognizing each task separately might be easiest. If you try to solve trillions of tasks, it might be easier to solve them by learning generalizable skills, e.g. language, reasoning, etc. (5/11)
1
2
33
@hwchung27
Hyung Won Chung
1 year
I believe that
1. the energy to desire is finite
2. the more you desire something, the higher the chance of achieving it
Corollary: ruthlessly reduce the number of desires in order to increase the chances of achieving what truly matters to you.
1
3
31
@hwchung27
Hyung Won Chung
1 year
Congrats @YiTayML and the team!! Building such a system from scratch in 6 months is an incredible feat. This makes me think about my past 6 months 🥹
@YiTayML
Yi Tay
1 year
It’s been a short 6 months since I left Google Brain and it has been a uniquely challenging yet interesting experience to build everything from the ground up in an entirely new environment (e.g., the wilderness) Today, we’re excited to announce the first version of the
84
140
1K
1
2
30
@hwchung27
Hyung Won Chung
12 days
You might think that it takes too long to teach via the incentive instead of direct teaching. That is true for humans, but for machines, we can give more compute to shorten the time. In fact, I'd say this "slower" method allows us to put in more compute. (8/11)
Tweet media one
3
1
32
@hwchung27
Hyung Won Chung
6 months
Leverage dilemma: if you are truly leveraged, you benefit greatly even if you don't work hard. But if you do work hard, the additional benefit will be so significant that it is too costly not to work hard
0
1
30
@hwchung27
Hyung Won Chung
6 months
Really excited about the expansion to Asia!
@OpenAI
OpenAI
6 months
Introducing OpenAI Japan, our first office in Asia, along with a new GPT-4 custom model specifically optimized for 日本語 (the Japanese language).
484
2K
9K
2
3
29
@hwchung27
Hyung Won Chung
3 months
This blog explains pretraining objectives and Transformer architectures. Studying these old ideas tells us the long-term consequences of research decisions. I believe such lessons are more important than knowing a lot of recent advances whose long-term consequences we don't know yet
@YiTayML
Yi Tay
3 months
Decided to start a new blog series about model architectures in the era of LLMs. 😀 Here's part 1 on broader architectures like Transformer Encoders/Encoder-Decoders, PrefixLM and denoising objectives. 😄 A frequently asked question: "The people who worked on language and NLP
Tweet media one
5
125
690
0
6
30
@hwchung27
Hyung Won Chung
1 year
Received an overwhelming number of DMs and emails, so the processing has been slow. We are going to read all of them today and this weekend. Thanks for your interest.
@hwchung27
Hyung Won Chung
1 year
We are hiring in the ChatGPT team! Happy to chat about this position. DMs are open. Instead of your papers, I’d love to learn about the most difficult technical problem you worked on and your lessons. It doesn’t have to be ML. I value exceptional technical skill a lot more than
24
73
610
2
0
28