Yu Bai

@yubai01

3,529
Followers
1,623
Following
17
Media
221
Statuses

Researcher @OpenAI. Previously @SFResearch, PhD @Stanford.

San Francisco, CA
Joined November 2010
Pinned Tweet
@yubai01
Yu Bai
4 months
📢 Life update: Amid the vibes of today's announcement, I am thrilled to share that I have joined @OpenAI as a researcher! It's only my second week here, but I'm already amazed by so many things the team has achieved. Looking forward to learning more and contributing!
@LiamFedus
William Fedus
4 months
GPT-4o is our new state-of-the-art frontier model. We’ve been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot 🙂. Here’s how it’s been doing.
Tweet media one
191
901
5K
51
10
357
@yubai01
Yu Bai
1 year
Thrilled to share our new work "Transformers as Statisticians👩‍🎓👨‍🎓" Unveiling a new mechanism "In-Context Algorithm Selection" for In-Context Learning (ICL) in LLMs/transformers. ++ A comprehensive theory for transformers to do ICL. Thread⬇️
Tweet media one
3
51
227
@yubai01
Yu Bai
3 years
🚨 New blog post on Deep Learning Theory Beyond NTKs: Salesforce research blog: offconvex: An exposition of "escaping the NTK ball with stronger learning guarantees". Joint w/ @jasondeanlee @MinshuoC
Tweet media one
3
39
178
@yubai01
Yu Bai
1 year
Excited to share that "Transformers as Statisticians" will appear at #NeurIPS2023 as an oral! We have two other posters on learning with attention models and RL theory (thread may follow):
@yubai01
Yu Bai
1 year
Thrilled to share our new work "Transformers as Statisticians👩‍🎓👨‍🎓" Unveiling a new mechanism "In-Context Algorithm Selection" for In-Context Learning (ICL) in LLMs/transformers. ++ A comprehensive theory for transformers to do ICL. Thread⬇️
Tweet media one
3
51
227
2
13
143
@yubai01
Yu Bai
3 years
📜🆕New extended blog post on recent progress in multi-agent RL theory (joint w/ Chi Jin): We talk about "How does RL theory become different when it's multi-agent", and present the various recent developments and opportunities therein.
2
16
120
@yubai01
Yu Bai
9 months
Flying to #NeurIPS2023 now. Looking forward to meeting old and new friends, and talking about everything LLM / RL! The "Transformers as Statisticians" oral will be in the Wed afternoon session.
Tweet media one
1
6
85
@yubai01
Yu Bai
4 months
And as my journey at Salesforce Research @SFResearch comes to an end after 4.5 years, I can't help but feel so fortunate to have been a part of this amazing AI research team. Thanks @huan__wang @CaimingXiong @silviocinguetta @RichardSocher ++everyone for all the support ♥️
3
2
80
@yubai01
Yu Bai
4 years
How do deep networks perform hierarchical learning? We theoretically show that networks with wide intermediate representations can express functions hierarchically, and be more sample efficient than "shallow learners" such as the NTK.
2
13
69
@yubai01
Yu Bai
2 years
#ICLR2022 We present CP-Gen, a modular approach for improving the efficiency (e.g. length, volume) of conformal prediction by tuning prediction sets with more than one parameter. Paper: Poster (Monday 6:30pm PT):
Tweet media one
2
10
55
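For context on what CP-Gen improves, here is the standard single-parameter split-conformal baseline it generalizes. This is a sketch of textbook split conformal for regression, not CP-Gen itself; the function name and toy data are mine:

```python
import numpy as np

def conformal_halfwidth(cal_residuals, alpha=0.1):
    """Split conformal baseline: the ceil((1-alpha)(n+1))/n empirical
    quantile of |y - yhat| on a held-out calibration set gives an
    interval half-width with finite-sample (1-alpha) coverage."""
    n = len(cal_residuals)
    level = min(np.ceil((1.0 - alpha) * (n + 1)) / n, 1.0)
    return np.quantile(np.abs(cal_residuals), level)

# Toy check: standard-normal residuals, nominal 90% coverage
rng = np.random.default_rng(0)
cal = rng.normal(size=500)              # calibration residuals
q = conformal_halfwidth(cal, alpha=0.1)
test = rng.normal(size=2000)            # fresh residuals
coverage = np.mean(np.abs(test) <= q)
```

CP-Gen's point, per the tweet, is that prediction sets often have several tunable parameters, so tuning them jointly for efficiency (subject to coverage) can beat this single-quantile recipe.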
@yubai01
Yu Bai
1 year
We also used ReLU attention to study the expressive power of transformers. It matches softmax in our small (gpt2 scale) experiment in the first paper below. Nice work @hoonkp @skornblith and co to get ReLU transformers in action!
@arankomatsuzaki
Aran Komatsuzaki
1 year
Replacing softmax with ReLU in Vision Transformers ReLU-attention has better compute-performance scaling than softmax-attention on Vision Transformers
Tweet media one
7
69
382
1
7
51
@yubai01
Yu Bai
2 years
📜 Our paper on efficient learning in Extensive-Form Games will appear at #NeurIPS2022 as an Oral Presentation! 🔗 Paper: 📢 Poster: Joint work with @chijinML @WispyMay Ziang Song @tianchengyu14 🧵1/
Tweet media one
1
7
50
@yubai01
Yu Bai
9 months
CoT is a nice name! And excited that "Transformers as Statisticians" oral will be in this "CoT/Reasoning" session (Wed)--Being a statistician probably does mean that you're good at reasoning 😃
Tweet media one
@denny_zhou
Denny Zhou
9 months
In NeurIPS 2023, there is a section “CoT/Reasoning”. When preparing our CoT paper, I kicked off a discussion on the title. Different names were proposed, like stream of thought (Jason), train of thought (Dale), chain of thought (Dale). Finally I decided to choose “chain of
Tweet media one
8
12
108
0
5
44
@yubai01
Yu Bai
4 years
New preprint on offline RL: * A variance reduction algorithm for offline RL * Optimal horizon dependence: O(H^2/d_m) sample complexity on time-homogeneous MDPs Joint w/ Ming Yin ( @MingYin_0312 ) and Yu-Xiang Wang
2
6
40
@yubai01
Yu Bai
3 years
Check out our #ICML2021 paper---We theoretically analyze calibration, and show that over-confident prediction happens for well-specified logistic regression too, not just on large NNs! Paper: Poster: , Wed 9am PT 1/4
Tweet media one
2
4
38
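For readers unfamiliar with the calibration errors discussed in this tweet, over-confidence is commonly measured with the binned expected calibration error (ECE). A minimal sketch of the standard binned estimator for binary classifiers (my own illustration, not the paper's specific analysis):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: in each confidence bin, compare average confidence
    against empirical accuracy, weighted by the bin's mass."""
    conf = np.maximum(probs, 1.0 - probs)   # confidence of predicted class
    correct = ((probs > 0.5).astype(float) == labels)
    edges = np.linspace(0.5, 1.0, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        in_bin = (conf >= lo) & ((conf < hi) if i < n_bins - 1 else (conf <= hi))
        if in_bin.any():
            ece += in_bin.mean() * abs(conf[in_bin].mean() - correct[in_bin].mean())
    return ece
```

"Over-confident" in the tweet's sense means the confidence term systematically exceeds the accuracy term in these bins, giving a positive ECE even for well-specified models.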
@yubai01
Yu Bai
1 year
Congrats on the ICML best paper @GoogleDeepMind @misovalko @Tdash_Koz & team! Wraps a "trilogy" on learning Extensive-Form Games: * Our #ICML2022 paper, which first got O(X) (tight): * Their #NeurIPS2021 paper which got O(X^2):
@demishassabis
Demis Hassabis
1 year
Congrats to @GoogleDeepMind ’s Remi Munos, @misovalko , & team on the Outstanding Paper Award at @ICMLConf ! “Adapting to game trees in zero-sum imperfect information games” helps answer: how do you make the best move in a game w/ only partial info? Paper:
4
33
318
1
4
36
@yubai01
Yu Bai
1 year
I'll be presenting "Transformers as Statisticians" at the ES-FOMO workshop at #ICML2023 , at 1:00pm HT. See you there! Workshop website:
@yubai01
Yu Bai
1 year
Thrilled to share our new work "Transformers as Statisticians👩‍🎓👨‍🎓" Unveiling a new mechanism "In-Context Algorithm Selection" for In-Context Learning (ICL) in LLMs/transformers. ++ A comprehensive theory for transformers to do ICL. Thread⬇️
Tweet media one
3
51
227
2
2
35
@yubai01
Yu Bai
5 years
Our paper on low switching cost RL () has been accepted at @NeurIPSConf 2019. We showed that efficient PAC exploration can be achieved by switching the policy only logarithmically many times. Congrats Tengyang, @nanjiang_cs , and Yu-Xiang!
1
2
33
@yubai01
Yu Bai
2 years
Attending #NeurIPS2022 from Mon Evening -> Sat, and presenting 4 papers (1 oral + 3 posters) on multi-agent RL, games, and deep learning theory. I will also be at Salesforce's booth Tuesday afternoon, starting 2:45pm. Let me know if you want to chat!
Tweet media one
0
1
32
@yubai01
Yu Bai
4 years
The AI Economist: Using multi-agent RL to simulate complex economic systems, guide policy designs, and improve social equality. Impressive work by colleagues @StephanZheng @alexrtrott and all at @SFResearch !
@RichardSocher
Richard Socher
4 years
Excited to introduce the AI Economist: Extends ideas from Reinforcement Learning for tackling inequality through learned tax policy design. The framework optimizes productivity and equality. Blog: Paper: Q&A:
57
681
2K
0
7
30
@yubai01
Yu Bai
5 years
Can wide neural nets be systematically analyzed beyond the kernel / linearized regime? Our recent work shows that wide NNs can couple with higher-order (e.g. quadratic) submodels and generalize better than the linearized ones! (joint w/ @jasondeanlee )
0
2
27
@yubai01
Yu Bai
1 year
En route to #ICML2023 ✈️🌴. Let's chat about LLMs / in-context learning, (multi-agent) RL, and their theories. You can also find me at our posters and workshop papers:
Tweet media one
0
0
25
@yubai01
Yu Bai
2 months
GPT-4o mini is out!
@OpenAIDevs
OpenAI Developers
2 months
Introducing GPT-4o mini! It’s our most intelligent and affordable small model, available today in the API. GPT-4o mini is significantly smarter and cheaper than GPT-3.5 Turbo.
Tweet media one
163
618
3K
0
0
22
@yubai01
Yu Bai
5 years
Excited to be attending #NeurIPS2019 in Vancouver next week!
0
0
21
@yubai01
Yu Bai
1 year
Happy to be selected as an expert reviewer for @TmlrOrg ! Time to send in a submission for earning that expert certificate :)
@hugo_larochelle
Hugo Larochelle
1 year
We have just finalized our first selection of TMLR Expert Reviewers. These are reviewers who have done particularly exemplary work in evaluating TMLR submissions. See the following page for details and the list of reviewers:
2
23
186
1
0
21
@yubai01
Yu Bai
5 months
Check out NPO, a simple objective for LLM unlearning.
@Song__Mei
Song Mei
5 months
LLM unlearning was mostly based on variants of gradient ascent (GA), susceptible to catastrophic forgetting. We propose Negative Preference Optimization (NPO), demonstrating efficient unlearning on TOFU benchmark. w/ @RuiqiZhang0614 @ Licong Lin, @yubai01 .
4
21
105
1
2
20
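The quoted thread only names the objective; written out, the NPO loss on the forget set takes the following form. This is from my recollection of the NPO paper, so treat the exact form and constants as an assumption:

```latex
% NPO loss on the forget set D_f (beta > 0; pi_ref is the pre-unlearning model)
\mathcal{L}_{\mathrm{NPO},\beta}(\theta)
  = \frac{2}{\beta}\,
    \mathbb{E}_{(x,y)\sim \mathcal{D}_f}
    \left[\log\!\left(1
      + \left(\frac{\pi_\theta(y\mid x)}{\pi_{\mathrm{ref}}(y\mid x)}\right)^{\beta}
    \right)\right]
```

As β → 0 this recovers the plain gradient-ascent objective (up to an additive constant), while a finite β keeps the per-example gradient weight bounded, which is the proposed fix for GA's susceptibility to catastrophic forgetting.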
@yubai01
Yu Bai
4 years
Excited to present our paper "Provable Self-Play Algorithms for Competitive Reinforcement Learning" at #ICML2020 ! Talk: Wednesday (July 15) 9am PT / 10pm PT Paper: Poster: Joint work with Chi Jin. 1/2
Tweet media one
1
1
20
@yubai01
Yu Bai
1 year
Flying to #ICML2023 tomorrow. Ping me if you'd like to chat!
0
0
19
@yubai01
Yu Bai
5 months
Exciting opportunity for working with Song on LLMs!
@Song__Mei
Song Mei
5 months
My group at Berkeley Stats and EECS has a postdoc opening in the theoretical (e.g., scaling laws, watermark) and empirical aspects (e.g., efficiency, safety, alignment) of LLMs or diffusion models. Send me an email with your CV if interested!
0
23
96
0
0
19
@yubai01
Yu Bai
2 years
Looking forward to this tomorrow! Thanks for organizing @CsabaSzepesvari @neu_rips @CiaraPikeBurke
@CsabaSzepesvari
Csaba Szepesvari
2 years
Thinking of scaling up multiagent RL to a large number of agents? Provably? Choose your equilibrium concept right and you may be rewarded! Yu Bai will tell us tomorrow how! For details see
Tweet media one
0
7
33
0
0
18
@yubai01
Yu Bai
4 years
Appearing at #NeurIPS2020 ! Come to our poster session at Tuesday 9-11am PT to have some fun with NTKs, shallow Taylorized models, and better sample complexity than all these via neural hierarchical learning. w/ @MinshuoC @jasondeanlee ++
Tweet media one
@yubai01
Yu Bai
4 years
How do deep networks perform hierarchical learning? We theoretically show that networks with wide intermediate representations can express functions hierarchically, and be more sample efficient than "shallow learners" such as the NTK.
2
13
69
1
3
17
@yubai01
Yu Bai
4 years
Come chat with us about our Beyond Linearization paper and more! ICLR poster session today 10am - 12pm and 1 - 3pm PDT: Paper:
@yubai01
Yu Bai
5 years
Our Beyond Linearization paper is accepted at #ICLR2020 !
0
1
15
0
3
17
@yubai01
Yu Bai
2 years
In a new paper led by @EshaanNichani , we utilize the spectral structure + higher-order "QuadNTK" approximation to show the benefit of "After NTK" learning.
@EshaanNichani
Eshaan Nichani
2 years
What happens “after NTK” in wide neural nets, and how does it improve over the NTK? Excited to announce a new paper with @yubai01 and @jasondeanlee ! A thread on the main takeaways below: (1/9)
Tweet media one
1
12
73
0
0
15
@yubai01
Yu Bai
5 years
Our Beyond Linearization paper is accepted at #ICLR2020 !
@yubai01
Yu Bai
5 years
Can wide neural nets be systematically analyzed beyond the kernel / linearized regime? Our recent work shows that wide NNs can couple with higher-order (e.g. quadratic) submodels and generalize better than the linearized ones! (joint w/ @jasondeanlee )
0
2
27
0
1
15
@yubai01
Yu Bai
2 years
Check out our new work for efficiently learning "rationalizable equilibria" in multiplayer games---Strategies that are both approximate CE/CCE, and supported on rationalizable actions.
@chijinML
Chi Jin
2 years
We are excited to announce our recent work with @YuanhaoWang3 , Dingwen Kong, @yubai01 , which presents new algorithms and the first sample-efficient guarantees for learning rationalizable equilibria.
0
1
21
0
0
12
@yubai01
Yu Bai
4 years
#NeurIPS2020 What is the optimal algorithm for multi-agent reinforcement learning in zero-sum Markov games? We present "Near-Optimal Reinforcement Learning via Self-Play" Paper: Poster session: Tuesday 9-11am PT Joint w/ Chi Jin, Tiancheng Yu.
Tweet media one
0
2
11
@yubai01
Yu Bai
3 years
🆕"When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?" We theoretically study what RL can learn in multi-player general-sum MGs without exp(# players) samples. Joint w/ Ziang Song (Peking U.) & @WispyMay . 🧵
2
0
11
@yubai01
Yu Bai
5 years
Will be attending this workshop Thu - Fri. Looking forward to it!
@prfsanjeevarora
Sanjeev Arora
5 years
2-day workshop "New Directions in Reinforcement Learning and Control" @the_IAS in Princeton Nov 7-8. Schedule and livestream here .
0
23
92
0
0
10
@yubai01
Yu Bai
4 years
Our annual AI research grant is now open for applications!
@SFResearch
Salesforce AI Research
4 years
Announcing the Third Annual AI Research Grant! For more details and how to apply: Blog: Website: Good luck to our future applicants!
1
36
75
0
0
11
@yubai01
Yu Bai
2 years
#ICLR2022 We present provably sample-efficient algorithms for multi-agent RL with large # players, without exp(# players) blowup! Poster session today (Tue 6:30 - 8:30pm PT): Paper👇
@yubai01
Yu Bai
3 years
🆕"When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?" We theoretically study what RL can learn in multi-player general-sum MGs without exp(# players) samples. Joint w/ Ziang Song (Peking U.) & @WispyMay . 🧵
2
0
11
0
0
9
@yubai01
Yu Bai
3 years
Welcome to check out our new AI Residency Program!
@SFResearch
Salesforce AI Research
3 years
Our new AI Residency Program aims to foster the next generation of AI researchers. Our program gives candidates real-world experience and makes them more qualified for top PhD programs. Applications close January 3, 2022:
0
9
26
0
0
10
@yubai01
Yu Bai
5 years
Check out Song's blog for more nice stuff on statistical physics <-> theoretical ML.
@gabrielpeyre
Gabriel Peyré
5 years
Very nice blog post from Song Mei on the replica method from statistical physics.
Tweet media one
3
64
254
1
0
8
@yubai01
Yu Bai
1 year
Curious experiment: A single transformer (TF_alg_select) can simultaneously match Bayes optimal in-context predictions on two tasks (noisy linear models with different noise levels). Those two tasks required different optimal algorithms (ridge regression with different \lambda's)!
Tweet media one
1
1
8
@yubai01
Yu Bai
3 years
Welcome to check out our work!
@huan__wang
Huan Wang
3 years
If you are attending ICML @icmlconf this week, you are most welcome to check out some recent work from our team: #ICML2021 @SFResearch
Tweet media one
1
7
28
0
0
8
@yubai01
Yu Bai
3 years
Acknowledgement: Thanks @prfsanjeevarora for hosting offconvex and for all the help! Thanks to the other co-authors of our paper @tourzhao @huan__wang @CaimingXiong @RichardSocher
1
0
7
@yubai01
Yu Bai
2 years
Nice 🧵 by @yuxiangw_cs on low-switching cost RL (aka deployment efficiency). It's a practically relevant setting in between offline RL and "truly" online RL, with much exciting progress and many open questions for both deep RL and RL theory!
@yuxiangw_cs
Yu-Xiang Wang
2 years
Online RL guarantees good exploration but has limited applicability (due to safety / legal concerns for online trials-and-errors). Offline RL (aka Batch RL) shows great promise but requires strong assumptions on logged data. Is there anything in between? 1/7
Tweet media one
3
21
116
0
0
6
@yubai01
Yu Bai
1 year
Personally, one thing I really like in this project is that, **both experiments and ML theory** (statistics, linear algebra with transformers) played crucial roles in isolating and rigorizing the phenomenon.
1
1
6
@yubai01
Yu Bai
1 year
To discover and understand the capabilities of LLMs, I believe this combination will become even more powerful. This is joint work with an amazing team of collaborators: Fan Chen (Peking U), @Song__Mei (Berkeley) @huan__wang @CaimingXiong (Salesforce)
1
1
6
@yubai01
Yu Bai
1 year
We coin this capability "In-Context Algorithm Selection". This is similar to what a statistician / ML expert can do in real-life: Choose the best algorithm for their data at hand. How can a Transformer (TF) do that? We construct two mechanisms in theory.
1
1
5
@yubai01
Yu Bai
1 year
++ Along the way, we develop a comprehensive & quantitative theory for TFs to do ICL: * Implementing many more ML algs by TF (Lasso, Logistic regression, neural networks...) * New efficient implementation of in-context gradient descent as backbone * Analysis of pretraining * ...
1
1
5
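The "in-context gradient descent as backbone" bullet can be illustrated with a toy sketch: model each transformer "layer" as one gradient step on the in-context least-squares loss, then read out the query. This is my simplification for illustration; the paper's construction is an actual attention-layer implementation:

```python
import numpy as np

def in_context_gd(X, y, x_query, n_layers=300, lr=0.1):
    """In-context GD sketch: each 'layer' applies one gradient step on the
    in-context least-squares loss; the query is read out with the final w."""
    w = np.zeros(X.shape[1])
    for _ in range(n_layers):
        w -= lr * X.T @ (X @ w - y) / len(y)   # one GD step per layer
    return x_query @ w
```

With enough "layers" this converges to the least-squares predictor on the in-context examples.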
@yubai01
Yu Bai
3 years
The Accuracy-Calibration frontier is much more informative than just calibration errors alone. Nice to see this extensive empirical study.
@MJLM3
Matthias Minderer
3 years
New paper: Revisiting the Calibration of Modern Neural Networks (). We studied the calibration of MLP-Mixer, Vision Transformers, BiT, and many others. Non-convolutional models are doing surprisingly well! 1/5
Tweet media one
2
72
302
0
0
5
@yubai01
Yu Bai
1 year
In the first mechanism, “Post-ICL Validation”, the TF executes many ICL algorithms in parallel on a train split, and outputs the one with lowest loss on a validation split. Example: A TF can do ridge regression with \lambda_1 on input 1 and \lambda_2 on input 2.
Tweet media one
1
1
5
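The "Post-ICL Validation" mechanism can be mimicked outside a transformer with a toy numpy sketch (my own illustration, not the paper's transformer construction): run several candidate ICL algorithms, here ridge regression with different λ's, on a train split, then output the one with lowest loss on a validation split. Function names and the split size are made up for the example:

```python
import numpy as np

def ridge_predict(X_tr, y_tr, X_te, lam):
    """Ridge regression: w = (X'X + lam*I)^{-1} X'y."""
    d = X_tr.shape[1]
    w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)
    return X_te @ w

def post_icl_validation(X, y, x_query, lams, n_val=10):
    """Run several candidate 'ICL algorithms' (ridge with different lambdas)
    on a train split; output the one with lowest validation loss."""
    X_tr, y_tr = X[:-n_val], y[:-n_val]
    X_val, y_val = X[-n_val:], y[-n_val:]
    val_losses = [np.mean((ridge_predict(X_tr, y_tr, X_val, lam) - y_val) ** 2)
                  for lam in lams]
    best = lams[int(np.argmin(val_losses))]
    return ridge_predict(X_tr, y_tr, x_query[None, :], best)[0], best
```

On a low-noise linear task this selection rule rejects heavily over-regularized λ's, which is exactly the behavior the thread attributes to the transformer.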
@yubai01
Yu Bai
1 year
In the second mechanism, "Pre-ICL testing", the TF runs a certain distribution test to decide which ICL algorithm to use. Example: A TF can do linear regression on a regression problem, and logistic regression on a classification problem, using a binary type check.
Tweet media one
1
1
4
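The "Pre-ICL testing" mechanism has the same flavor as this toy numpy sketch (again my own illustration of the selection rule, not the paper's attention-layer construction): a binary type check on the in-context labels decides between logistic and linear regression:

```python
import numpy as np

def pre_icl_testing(X, y, x_query, steps=300, lr=0.1):
    """'Binary type check': if the in-context labels are all 0/1, run
    logistic regression; otherwise fall back to linear regression."""
    if np.all(np.isin(y, (0.0, 1.0))):               # distribution test
        w = np.zeros(X.shape[1])
        for _ in range(steps):                        # GD on the logistic loss
            p = 1.0 / (1.0 + np.exp(-X @ w))
            w -= lr * X.T @ (p - y) / len(y)
        return 1.0 / (1.0 + np.exp(-x_query @ w))     # P(y = 1 | x_query)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)         # least squares
    return x_query @ w
```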
@yubai01
Yu Bai
4 months
Looking forward to seeing more exciting works from the team!
0
0
4
@yubai01
Yu Bai
1 year
@GoogleDeepMind @misovalko @Tdash_Koz X = number of information sets for a single player, the main measure of game size for EFGs. Their new ICML paper improves over ours in the H (game horizon) dependency, and importantly does not require the structure of the game tree to be known ahead.
1
0
4
@yubai01
Yu Bai
2 years
@ml_angelopoulos @davidstutz92 Thanks for flagging! Conditional coverage is definitely an important goal, great to see backproping thru conformal works here. May be interesting to see whether a proper efficiency loss could be designed for conditional coverage (and be optimized) too.
0
0
4
@yubai01
Yu Bai
4 years
Congrats team!
@CaimingXiong
Caiming Xiong
4 years
Our NLP team got 16 papers (11 long, 2 short, and 3 finds) at #emnlp2020 , which cover dialogue, summarization, question answering, multilingual, few-shot, NLI, semantic parsing, data augmentation, etc. Congrats to team members and coauthors. More info about papers coming soon!
Tweet media one
13
67
459
0
0
4
@yubai01
Yu Bai
1 year
These mechanisms not only match our findings in experiments. They also allow TFs to achieve strong ICL performance in theory. Example: We construct a TF to do nearly Bayes-optimal ICL in a challenging task---noisy linear models with **mixed** noise levels.
Tweet media one
1
1
3
@yubai01
Yu Bai
4 months
0
0
3
@yubai01
Yu Bai
3 months
@nanjiang_cs Congrats Nan!
0
0
3
@yubai01
Yu Bai
1 month
@johnschulman2 It's been an honor to have been a colleague of yours, and I wish it could have been longer. Thank you and all the best!
0
0
7
@yubai01
Yu Bai
2 years
Our results generalize theirs to the case of EFCEs. Besides, we unveil a new connection in their setting as well: Hedge in NFG space = Kernelized MWU (Farina et al.'s efficient impl.) = Standard OMD with dilated entropy regularizer. Once again, OMD <-> NFG 😎 15/
1
0
3
@yubai01
Yu Bai
3 years
@Guodzh @SimonShaoleiDu Yeah the meaning of self-play does depend on the context. In many theory works we do have >=2 *different* agents playing against each other, and we called it "self-play" too, to emphasize we don't need guidance from expert opponents / demonstrations (think AlphaZero vs. AlphaGo)
1
0
3
@yubai01
Yu Bai
4 years
Observed same thing on CIFAR too -- square loss is as powerful as cross-entropy loss across various architectures and hyperparameters.
@deepcohen
Jeremy Cohen
4 years
Square loss works basically as well as cross-entropy loss on classification tasks. For example, square loss gets 76.0 accuracy for ResNet-50 on ImageNet, compared to 76.1 for cross-entropy.
7
16
112
0
0
2
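The claim in these two tweets, square loss roughly matching cross-entropy on classification, is easy to reproduce on a toy problem. A minimal numpy sketch with my own setup (two separable Gaussian blobs and a linear model; nothing here is from the CIFAR/ImageNet experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated 2-D Gaussian blobs, labels in {0, 1}
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)),
               rng.normal(2.0, 1.0, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

def train(loss, steps=1000, lr=0.05):
    """Linear classifier trained by GD under the chosen loss."""
    w, b = np.zeros(2), 0.0
    for _ in range(steps):
        z = X @ w + b
        if loss == "square":
            g = 2.0 * (z - y) / len(y)                    # d/dz (z - y)^2
        else:
            g = (1.0 / (1.0 + np.exp(-z)) - y) / len(y)   # cross-entropy grad
        w -= lr * X.T @ g
        b -= lr * g.sum()
    thresh = 0.5 if loss == "square" else 0.0
    return np.mean(((X @ w + b) > thresh) == y)           # train accuracy

acc_sq, acc_ce = train("square"), train("xent")
```

On this toy data both losses reach near-perfect accuracy, mirroring the parity the tweets report at much larger scale.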
@yubai01
Yu Bai
2 years
Note that the modified algorithm is **no longer an algorithm in the NFG space**. However, it is still an OMD algorithm, just with a different (dilated) regularizer. So we're back to OMD again, and we're much better at designing OMD algorithms, via the NFG connection 😃 13/
1
0
2
@yubai01
Yu Bai
1 year
@stats_stephen Congrats Stephen!!
1
0
2
@yubai01
Yu Bai
7 months
@Diyi_Yang Congratulations Diyi!
1
0
2
@yubai01
Yu Bai
4 months
Thanks everyone!! ❤️
0
0
2
@yubai01
Yu Bai
2 months
0
0
2
@yubai01
Yu Bai
2 years
0
0
2
@yubai01
Yu Bai
4 months
@Song__Mei Thanks Song, that means a lot! ♥️
0
0
2
@yubai01
Yu Bai
2 years
What's even nicer about the OMD connection: We build on this connection to design a modified OMD algorithm that achieves improved, and in fact the first near-optimal, sample complexity for learning EFCE under bandit feedback. That is our second main result. 12/
1
0
2
@yubai01
Yu Bai
2 years
@dragomir_radev Congrats Drago!
0
0
2
@yubai01
Yu Bai
2 years
@SurbhiGoel_ @PennCIS Congrats Surbhi!!
1
0
1
@yubai01
Yu Bai
3 years
This is joint work w/ @WispyMay @huan__wang @CaimingXiong . 4/4
0
0
1
@yubai01
Yu Bai
2 years
@hausman_k Hi @hausman_k , nice work! We had a similar algorithm in our Policy Finetuning paper (Alg. 2) where we first use a ref policy \mu within steps 1:h to guide exploration for steps h+1:H, and then further improve \mu using the learned exploration policy
2
0
1
@yubai01
Yu Bai
2 years
Pros and cons of converting to NFGs: ✅ Much easier algorithm design, can just apply known NFG algorithms such as Hedge or Phi-Hedge. ✅ Good convergence rate / sample complexity. ❌ Computationally intractable, as size of converted NFG = exp(size of original EFG) ! 8/
1
0
1
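For readers unfamiliar with Hedge from the ✅ above, here is a minimal sketch of generic exponential weights (not the paper's EFG-specific machinery). The ❌ still applies: over a converted NFG the number of actions K is exponential in the EFG size, so this is only tractable on small games:

```python
import numpy as np

def hedge(loss_matrix, eta=0.5):
    """Hedge (exponential weights) over K actions: each round, play the
    normalized weights, then down-weight each action by exp(-eta * loss)."""
    T, K = loss_matrix.shape
    w = np.ones(K)
    plays = np.empty((T, K))
    for t in range(T):
        plays[t] = w / w.sum()                   # mixed strategy this round
        w = w * np.exp(-eta * loss_matrix[t])    # full-information update
    return plays

# Toy 'NFG': action 0 always has loss 0, action 1 always has loss 1
losses = np.tile([0.0, 1.0], (20, 1))
plays = hedge(losses)
```

The weight on the good action converges quickly, which is what drives the good convergence rates mentioned in the ✅.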
@yubai01
Yu Bai
1 year
0
0
1
@yubai01
Yu Bai
2 years
For the future, we believe EFGs are an exciting topic: they're mathematically elegant, there are many open questions, and theoretically principled algorithms are super relevant for AI practice! 16/n, n=16
1
1
1
@yubai01
Yu Bai
2 years
Adding @Song__Mei correct handle for Song
0
0
1
@yubai01
Yu Bai
2 years
1
0
1
@yubai01
Yu Bai
3 years
We then turn to Markov Potential Games (MPGs), a well-studied subclass of general-sum MGs. In this case, there exist recent algorithms that can find Nash with poly(m, 1/eps) samples. We design an alternative algorithm with similar poly(m) and improved dependence on eps. 4/n
1
0
1
@yubai01
Yu Bai
2 years
EFGs admit an elegant game structure. People have utilized this structure to develop algorithmic principles, such as Online Mirror Descent (OMD), and Counterfactual Regret Minimization (CFR). OMD/CFR-type algorithms are both efficient in theory and work well in practice! 5/
1
0
1
@yubai01
Yu Bai
1 year
@SametOymac Thanks! Same congrats to your dissecting CoT paper!
0
0
1
@yubai01
Yu Bai
2 years
An alternative route to algorithm design was long known, but not as popular---Convert them to Normal-Form Games (NFGs). That is, re-express the EFG as a (much bigger) NFG whose action space is the strategy space of the original EFG. 7/
1
0
1
@yubai01
Yu Bai
2 years
We further find that, this algorithm is **equivalent** to an OMD-type algorithm in the reparametrized space! This resolves the aforementioned open problem as well: It's the first OMD type algorithm for learning EFCEs. 11/
1
1
1
@yubai01
Yu Bai
3 years
@Guodzh @SimonShaoleiDu I would still consider that as multi-agent. Basically that's still solving a min-max problem over policies \mu, \nu, but with the special parametrization \mu = \nu (which makes sense in a symmetric game such as Go)
1
0
1
@yubai01
Yu Bai
9 months
@adityagrover_ Congrats Aditya!!
1
0
1