Yu Bai

@yubai01

3,529
Followers
1,623
Following
17
Media
221
Statuses

Researcher @OpenAI. Previously @SFResearch, PhD @Stanford.

San Francisco, CA
Joined November 2010
Pinned Tweet
@yubai01
Yu Bai
4 months
📢 Life update: Amid the vibes of today's announcement, I am thrilled to share that I have joined @OpenAI as a researcher! It's only my second week here, but I'm already amazed by so many things the team has achieved. Looking forward to learning more and contributing!
@LiamFedus
William Fedus
4 months
GPT-4o is our new state-of-the-art frontier model. We’ve been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot 🙂. Here’s how it’s been doing.
Tweet media one
191
901
5K
51
10
357
@yubai01
Yu Bai
1 year
Thrilled to share our new work "Transformers as Statisticians👩‍🎓👨‍🎓" Unveiling a new mechanism "In-Context Algorithm Selection" for In-Context Learning (ICL) in LLMs/transformers. ++ A comprehensive theory for transformers to do ICL. Thread⬇️
Tweet media one
3
51
227
@yubai01
Yu Bai
3 years
🚨 New blog post on Deep Learning Theory Beyond NTKs: Salesforce research blog: offconvex: An exposition of "escaping the NTK ball with stronger learning guarantees". Joint w/ @jasondeanlee @MinshuoC
Tweet media one
3
39
178
@yubai01
Yu Bai
1 year
Excited to share that "Transformers as Statisticians" will appear at #NeurIPS2023 as an oral! We have two other posters on learning with attention models and RL theory (thread may follow):
@yubai01
Yu Bai
1 year
Thrilled to share our new work "Transformers as Statisticians👩‍🎓👨‍🎓" Unveiling a new mechanism "In-Context Algorithm Selection" for In-Context Learning (ICL) in LLMs/transformers. ++ A comprehensive theory for transformers to do ICL. Thread⬇️
Tweet media one
3
51
227
2
13
143
@yubai01
Yu Bai
3 years
📜🆕New extended blog post on recent progress in multi-agent RL theory (joint w/ Chi Jin): We talk about "How does RL theory become different when it's multi-agent", and present the various recent developments and opportunities therein.
2
16
120
@yubai01
Yu Bai
9 months
Flying to #NeurIPS2023 now. Looking forward to meeting old and new friends, and talking about everything LLM / RL! The "Transformers as Statisticians" oral will be in the Wed afternoon session.
Tweet media one
1
6
85
@yubai01
Yu Bai
4 months
And as my journey at Salesforce Research @SFResearch comes to an end after 4.5 years, I can't help but feel so fortunate to have been a part of this amazing AI research team. Thanks @huan__wang @CaimingXiong @silviocinguetta @RichardSocher ++everyone for all the support ♥️
3
2
80
@yubai01
Yu Bai
4 years
How do deep networks perform hierarchical learning? We theoretically show that networks with wide intermediate representations can express functions hierarchically, and be more sample efficient than "shallow learners" such as the NTK.
2
13
69
@yubai01
Yu Bai
2 years
#ICLR2022 We present CP-Gen, a modular approach for improving the efficiency (e.g. length, volume) of conformal prediction by tuning prediction sets with more than one parameter. Paper: Poster (Monday 6:30pm PT):
Tweet media one
2
10
55
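For context on what CP-Gen improves, here is the standard single-parameter split-conformal baseline it generalizes. This is a sketch of textbook split conformal for regression, not CP-Gen itself; the function name and toy data are mine:

```python
import numpy as np

def conformal_halfwidth(cal_residuals, alpha=0.1):
    """Split conformal baseline: the ceil((1-alpha)(n+1))/n empirical
    quantile of |y - yhat| on a held-out calibration set gives an
    interval half-width with finite-sample (1-alpha) coverage."""
    n = len(cal_residuals)
    level = min(np.ceil((1.0 - alpha) * (n + 1)) / n, 1.0)
    return np.quantile(np.abs(cal_residuals), level)

# Toy check: standard-normal residuals, nominal 90% coverage
rng = np.random.default_rng(0)
cal = rng.normal(size=500)              # calibration residuals
q = conformal_halfwidth(cal, alpha=0.1)
test = rng.normal(size=2000)            # fresh residuals
coverage = np.mean(np.abs(test) <= q)
```

CP-Gen's point, per the tweet, is that prediction sets often have several tunable parameters, so tuning them jointly for efficiency (subject to coverage) can beat this single-quantile recipe.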
@yubai01
Yu Bai
1 year
We also used ReLU attention to study the expressive power of transformers. It matches softmax in our small (gpt2 scale) experiment in the first paper below. Nice work @hoonkp @skornblith and co to get ReLU transformers in action!
@arankomatsuzaki
Aran Komatsuzaki
1 year
Replacing softmax with ReLU in Vision Transformers ReLU-attention has better compute-performance scaling than softmax-attention on Vision Transformers
Tweet media one
7
69
382
1
7
51
@yubai01
Yu Bai
2 years
📜 Our paper on efficient learning in Extensive-Form Games will appear at #NeurIPS2022 as an Oral Presentation! 🔗 Paper: 📢 Poster: Joint work with @chijinML @WispyMay Ziang Song @tianchengyu14 🧵1/
Tweet media one
1
7
50
@yubai01
Yu Bai
9 months
CoT is a nice name! And excited that "Transformers as Statisticians" oral will be in this "CoT/Reasoning" session (Wed)--Being a statistician probably does mean that you're good at reasoning 😃
Tweet media one
@denny_zhou
Denny Zhou
9 months
In NeurIPS 2023, there is a section “CoT/Reasoning”. When preparing our CoT paper, I kicked off a discussion on the title. Different names were proposed, like stream of thought (Jason), train of thought (Dale), chain of thought (Dale). Finally I decided to choose “chain of
Tweet media one
8
12
108
0
5
44
@yubai01
Yu Bai
4 years
New preprint on offline RL: * A variance reduction algorithm for offline RL * Optimal horizon dependence: O(H^2/d_m) sample complexity on time-homogeneous MDPs Joint w/ Ming Yin ( @MingYin_0312 ) and Yu-Xiang Wang
2
6
40
@yubai01
Yu Bai
3 years
Check out our #ICML2021 paper---We theoretically analyze calibration, and show that over-confident prediction happens for well-specified logistic regression too, not just on large NNs! Paper: Poster: , Wed 9am PT 1/4
Tweet media one
2
4
38
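For readers unfamiliar with the calibration errors discussed in this tweet, over-confidence is commonly measured with the binned expected calibration error (ECE). A minimal sketch of the standard binned estimator for binary classifiers (my own illustration, not the paper's specific analysis):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: in each confidence bin, compare average confidence
    against empirical accuracy, weighted by the bin's mass."""
    conf = np.maximum(probs, 1.0 - probs)   # confidence of predicted class
    correct = ((probs > 0.5).astype(float) == labels)
    edges = np.linspace(0.5, 1.0, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        in_bin = (conf >= lo) & ((conf < hi) if i < n_bins - 1 else (conf <= hi))
        if in_bin.any():
            ece += in_bin.mean() * abs(conf[in_bin].mean() - correct[in_bin].mean())
    return ece
```

"Over-confident" in the tweet's sense means the confidence term systematically exceeds the accuracy term in these bins, giving a positive ECE even for well-specified models.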
@yubai01
Yu Bai
1 year
Congrats on the ICML best paper @GoogleDeepMind @misovalko @Tdash_Koz & team! Wraps a "trilogy" on learning Extensive-Form Games: * Our #ICML2022 paper, which first got O(X) (tight): * Their #NeurIPS2021 paper which got O(X^2):
@demishassabis
Demis Hassabis
1 year
Congrats to @GoogleDeepMind ’s Remi Munos, @misovalko , & team on the Outstanding Paper Award at @ICMLConf ! “Adapting to game trees in zero-sum imperfect information games” helps answer: how do you make the best move in a game w/ only partial info? Paper:
4
33
318
1
4
36
@yubai01
Yu Bai
1 year
I'll be presenting "Transformers as Statisticians" at the ES-FOMO workshop at #ICML2023 , at 1:00pm HT. See you there! Workshop website:
@yubai01
Yu Bai
1 year
Thrilled to share our new work "Transformers as Statisticians👩‍🎓👨‍🎓" Unveiling a new mechanism "In-Context Algorithm Selection" for In-Context Learning (ICL) in LLMs/transformers. ++ A comprehensive theory for transformers to do ICL. Thread⬇️
Tweet media one
3
51
227
2
2
35
@yubai01
Yu Bai
5 years
Our paper on low switching cost RL () has been accepted at @NeurIPSConf 2019. We showed that efficient PAC exploration can be achieved by switching the policy only logarithmically many times. Congrats Tengyang, @nanjiang_cs , and Yu-Xiang!
1
2
33
@yubai01
Yu Bai
2 years
Attending #NeurIPS2022 from Mon Evening -> Sat, and presenting 4 papers (1 oral + 3 posters) on multi-agent RL, games, and deep learning theory. I will also be at Salesforce's booth Tuesday afternoon, starting 2:45pm. Let me know if you want to chat!
Tweet media one
0
1
32
@yubai01
Yu Bai
4 years
The AI Economist: Using multi-agent RL to simulate complex economic systems, guide policy designs, and improve social equality. Impressive work by colleagues @StephanZheng @alexrtrott and all at @SFResearch !
@RichardSocher
Richard Socher
4 years
Excited to introduce the AI Economist: Extends ideas from Reinforcement Learning for tackling inequality through learned tax policy design. The framework optimizes productivity and equality. Blog: Paper: Q&A:
57
681
2K
0
7
30
@yubai01
Yu Bai
5 years
Can wide neural nets be systematically analyzed beyond the kernel / linearized regime? Our recent work shows that wide NNs can couple with higher-order (e.g. quadratic) submodels and generalize better than the linearized ones! (joint w/ @jasondeanlee )
0
2
27
@yubai01
Yu Bai
1 year
En route to #ICML2023 ✈️🌴. Let's chat about LLMs / in-context learning, (multi-agent) RL, and their theories. You can also find me at our posters and workshop papers:
Tweet media one
0
0
25
@yubai01
Yu Bai
2 months
GPT-4o mini is out!
@OpenAIDevs
OpenAI Developers
2 months
Introducing GPT-4o mini! It’s our most intelligent and affordable small model, available today in the API. GPT-4o mini is significantly smarter and cheaper than GPT-3.5 Turbo.
Tweet media one
163
618
3K
0
0
22
@yubai01
Yu Bai
5 years
Excited to be attending #NeurIPS2019 in Vancouver next week!
0
0
21
@yubai01
Yu Bai
1 year
Happy to be selected as an expert reviewer for @TmlrOrg ! Time to send in a submission for earning that expert certificate :)
@hugo_larochelle
Hugo Larochelle
1 year
We have just finalized our first selection of TMLR Expert Reviewers. These are reviewers who have done particularly exemplary work in evaluating TMLR submissions. See the following page for details and the list of reviewers:
2
23
186
1
0
21
@yubai01
Yu Bai
5 months
Check out NPO, a simple objective for LLM unlearning.
@Song__Mei
Song Mei
5 months
LLM unlearning was mostly based on variants of gradient ascent (GA), susceptible to catastrophic forgetting. We propose Negative Preference Optimization (NPO), demonstrating efficient unlearning on TOFU benchmark. w/ @RuiqiZhang0614 @ Licong Lin, @yubai01 .
4
21
105
1
2
20
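The quoted thread only names the objective; written out, the NPO loss on the forget set takes the following form. This is from my recollection of the NPO paper, so treat the exact form and constants as an assumption:

```latex
% NPO loss on the forget set D_f (beta > 0; pi_ref is the pre-unlearning model)
\mathcal{L}_{\mathrm{NPO},\beta}(\theta)
  = \frac{2}{\beta}\,
    \mathbb{E}_{(x,y)\sim \mathcal{D}_f}
    \left[\log\!\left(1
      + \left(\frac{\pi_\theta(y\mid x)}{\pi_{\mathrm{ref}}(y\mid x)}\right)^{\beta}
    \right)\right]
```

As β → 0 this recovers the plain gradient-ascent objective (up to an additive constant), while a finite β keeps the per-example gradient weight bounded, which is the proposed fix for GA's susceptibility to catastrophic forgetting.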
@yubai01
Yu Bai
4 years
Excited to present our paper "Provable Self-Play Algorithms for Competitive Reinforcement Learning" at #ICML2020 ! Talk: Wednesday (July 15) 9am PT / 10pm PT Paper: Poster: Joint work with Chi Jin. 1/2
Tweet media one
1
1
20
@yubai01
Yu Bai
1 year
Flying to #ICML2023 tomorrow. Ping me if you'd like to chat!
0
0
19
@yubai01
Yu Bai
5 months
Exciting opportunity for working with Song on LLMs!
@Song__Mei
Song Mei
5 months
My group at Berkeley Stats and EECS has a postdoc opening in the theoretical (e.g., scaling laws, watermark) and empirical aspects (e.g., efficiency, safety, alignment) of LLMs or diffusion models. Send me an email with your CV if interested!
0
23
96
0
0
19
@yubai01
Yu Bai
2 years
Looking forward to this tomorrow! Thanks for organizing @CsabaSzepesvari @neu_rips @CiaraPikeBurke
@CsabaSzepesvari
Csaba Szepesvari
2 years
Thinking of scaling up multiagent RL to a large number of agents? Provably? Choose your equilibrium concept right and you may be rewarded! Yu Bai will tell us tomorrow how! For details see
Tweet media one
0
7
33
0
0
18
@yubai01
Yu Bai
4 years
Appearing at #NeurIPS2020 ! Come to our poster session at Tuesday 9-11am PT to have some fun with NTKs, shallow Taylorized models, and better sample complexity than all these via neural hierarchical learning. w/ @MinshuoC @jasondeanlee ++
Tweet media one
@yubai01
Yu Bai
4 years
How do deep networks perform hierarchical learning? We theoretically show that networks with wide intermediate representations can express functions hierarchically, and be more sample efficient than "shallow learners" such as the NTK.
2
13
69
1
3
17
@yubai01
Yu Bai
4 years
Come chat with us about our Beyond Linearization paper and more! ICLR poster session today 10am - 12pm and 1 - 3pm PDT: Paper:
@yubai01
Yu Bai
5 years
Our Beyond Linearization paper is accepted at #ICLR2020 !
0
1
15
0
3
17
@yubai01
Yu Bai
2 years
In a new paper led by @EshaanNichani , we utilize the spectral structure + higher-order "QuadNTK" approximation to show the benefit of "After NTK" learning.
@EshaanNichani
Eshaan Nichani
2 years
What happens “after NTK” in wide neural nets, and how does it improve over the NTK? Excited to announce a new paper with @yubai01 and @jasondeanlee ! A thread on the main takeaways below: (1/9)
Tweet media one
1
12
73
0
0
15
@yubai01
Yu Bai
5 years
Our Beyond Linearization paper is accepted at #ICLR2020 !
@yubai01
Yu Bai
5 years
Can wide neural nets be systematically analyzed beyond the kernel / linearized regime? Our recent work shows that wide NNs can couple with higher-order (e.g. quadratic) submodels and generalize better than the linearized ones! (joint w/ @jasondeanlee )
0
2
27
0
1
15
@yubai01
Yu Bai
2 years
Check out our new work for efficiently learning "rationalizable equilibria" in multiplayer games---Strategies that are both approximate CE/CCE, and supported on rationalizable actions.
@chijinML
Chi Jin
2 years
We are excited to announce our recent work with @YuanhaoWang3 , Dingwen Kong, @yubai01 , which presents new algorithms and the first sample-efficient guarantees for learning rationalizable equilibria.
0
1
21
0
0
12
@yubai01
Yu Bai
4 years
#NeurIPS2020 What is the optimal algorithm for multi-agent reinforcement learning in zero-sum Markov games? We present "Near-Optimal Reinforcement Learning via Self-Play" Paper: Poster session: Tuesday 9-11am PT Joint w/ Chi Jin, Tiancheng Yu.
Tweet media one
0
2
11
@yubai01
Yu Bai
3 years
🆕"When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?" We theoretically study what RL can learn in multi-player general-sum MGs without exp(# players) samples. Joint w/ Ziang Song (Peking U.) & @WispyMay . 🧵
2
0
11
@yubai01
Yu Bai
5 years
Will be attending this workshop Thu - Fri. Looking forward to it!
@prfsanjeevarora
Sanjeev Arora
5 years
2-day workshop "New Directions in Reinforcement Learning and Control" @the_IAS in Princeton Nov 7-8. Schedule and livestream here .
0
23
92
0
0
10
@yubai01
Yu Bai
4 years
Our annual AI research grant is now open for applications!
@SFResearch
Salesforce AI Research
4 years
Announcing the Third Annual AI Research Grant! For more details and how to apply: Blog: Website: Good luck to our future applicants!
1
36
75
0
0
11
@yubai01
Yu Bai
2 years
#ICLR2022 We present provably sample-efficient algorithms for multi-agent RL with large # players, without exp(# players) blowup! Poster session today (Tue 6:30 - 8:30pm PT): Paper👇
@yubai01
Yu Bai
3 years
🆕"When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?" We theoretically study what RL can learn in multi-player general-sum MGs without exp(# players) samples. Joint w/ Ziang Song (Peking U.) & @WispyMay . 🧵
2
0
11
0
0
9
@yubai01
Yu Bai
3 years
Welcome to check out our new AI Residency Program!
@SFResearch
Salesforce AI Research
3 years
Our new AI Residency Program aims to foster the next generation of AI researchers. Our program gives candidates real-world experience and makes them more qualified for top PhD programs. Applications close January 3, 2022:
0
9
26
0
0
10
@yubai01
Yu Bai
5 years
Check out Song's blog for more nice stuff on statistical physics <-> theoretical ML.
@gabrielpeyre
Gabriel Peyré
5 years
Very nice blog post from Song Mei on the replica method from statistical physics.
Tweet media one
3
64
254
1
0
8
@yubai01
Yu Bai
1 year
Curious experiment: A single transformer (TF_alg_select) can simultaneously match Bayes optimal in-context predictions on two tasks (noisy linear models with different noise levels). Those two tasks required different optimal algorithms (ridge regression with different \lambda's)!
Tweet media one
1
1
8
@yubai01
Yu Bai
3 years
Welcome to check out our work!
@huan__wang
Huan Wang
3 years
If you are attending ICML @icmlconf this week, you are most welcome to check out some recent work from our team: #ICML2021 @SFResearch
Tweet media one
1
7
28
0
0
8
@yubai01
Yu Bai
3 years
Acknowledgement: Thanks @prfsanjeevarora for hosting offconvex and for all the help! Thanks to the other co-authors of our paper @tourzhao @huan__wang @CaimingXiong @RichardSocher
1
0
7
@yubai01
Yu Bai
2 years
Nice 🧵 by @yuxiangw_cs on low-switching cost RL (aka deployment efficiency). It's a practically relevant setting in between offline RL and "truly" online RL, with much exciting progress and many open questions for both deep RL and RL theory!
@yuxiangw_cs
Yu-Xiang Wang
2 years
Online RL guarantees good exploration but has limited applicability (due to safety / legal concerns for online trials-and-errors). Offline RL (aka Batch RL) shows great promise but requires strong assumptions on logged data. Is there anything in between? 1/7
Tweet media one
3
21
116
0
0
6
@yubai01
Yu Bai
1 year
Personally, one thing I really like in this project is that, **both experiments and ML theory** (statistics, linear algebra with transformers) played crucial roles in isolating and rigorizing the phenomenon.
1
1
6
@yubai01
Yu Bai
1 year
To discover and understand the capabilities of LLMs, I believe this combination will become even more powerful. This is joint work with an amazing team of collaborators: Fan Chen (Peking U), @Song__Mei (Berkeley) @huan__wang @CaimingXiong (Salesforce)
1
1
6
@yubai01
Yu Bai
1 year
We coin this capability "In-Context Algorithm Selection". This is similar to what a statistician / ML expert can do in real-life: Choose the best algorithm for their data at hand. How can a Transformer (TF) do that? We construct two mechanisms in theory.
1
1
5
@yubai01
Yu Bai
1 year
++ Along the way, we develop a comprehensive & quantitative theory for TFs to do ICL: * Implementing many more ML algs by TF (Lasso, Logistic regression, neural networks...) * New efficient implementation of in-context gradient descent as backbone * Analysis of pretraining * ...
1
1
5
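The "in-context gradient descent as backbone" bullet can be illustrated with a toy sketch: model each transformer "layer" as one gradient step on the in-context least-squares loss, then read out the query. This is my simplification for illustration; the paper's construction is an actual attention-layer implementation:

```python
import numpy as np

def in_context_gd(X, y, x_query, n_layers=300, lr=0.1):
    """In-context GD sketch: each 'layer' applies one gradient step on the
    in-context least-squares loss; the query is read out with the final w."""
    w = np.zeros(X.shape[1])
    for _ in range(n_layers):
        w -= lr * X.T @ (X @ w - y) / len(y)   # one GD step per layer
    return x_query @ w
```

With enough "layers" this converges to the least-squares predictor on the in-context examples.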
@yubai01
Yu Bai
3 years
The Accuracy-Calibration frontier is much more informative than just calibration errors alone. Nice to see this extensive empirical study.
@MJLM3
Matthias Minderer
3 years
New paper: Revisiting the Calibration of Modern Neural Networks (). We studied the calibration of MLP-Mixer, Vision Transformers, BiT, and many others. Non-convolutional models are doing surprisingly well! 1/5
Tweet media one
2
72
302
0
0
5
@yubai01
Yu Bai
1 year
In the first mechanism, “Post-ICL Validation”, the TF executes many ICL algorithms in parallel on a train split, and outputs the one with lowest loss on a validation split. Example: A TF can do ridge regression with \lambda_1 on input 1 and \lambda_2 on input 2.
Tweet media one
1
1
5
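The "Post-ICL Validation" mechanism can be mimicked outside a transformer with a toy numpy sketch (my own illustration, not the paper's transformer construction): run several candidate ICL algorithms, here ridge regression with different λ's, on a train split, then output the one with lowest loss on a validation split. Function names and the split size are made up for the example:

```python
import numpy as np

def ridge_predict(X_tr, y_tr, X_te, lam):
    """Ridge regression: w = (X'X + lam*I)^{-1} X'y."""
    d = X_tr.shape[1]
    w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)
    return X_te @ w

def post_icl_validation(X, y, x_query, lams, n_val=10):
    """Run several candidate 'ICL algorithms' (ridge with different lambdas)
    on a train split; output the one with lowest validation loss."""
    X_tr, y_tr = X[:-n_val], y[:-n_val]
    X_val, y_val = X[-n_val:], y[-n_val:]
    val_losses = [np.mean((ridge_predict(X_tr, y_tr, X_val, lam) - y_val) ** 2)
                  for lam in lams]
    best = lams[int(np.argmin(val_losses))]
    return ridge_predict(X_tr, y_tr, x_query[None, :], best)[0], best
```

On a low-noise linear task this selection rule rejects heavily over-regularized λ's, which is exactly the behavior the thread attributes to the transformer.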
@yubai01
Yu Bai
1 year
In the second mechanism, "Pre-ICL testing", the TF runs a certain distribution test to decide which ICL algorithm to use. Example: A TF can do linear regression on a regression problem, and logistic regression on a classification problem, using a binary type check.
Tweet media one
1
1
4
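The "Pre-ICL testing" mechanism has the same flavor as this toy numpy sketch (again my own illustration of the selection rule, not the paper's attention-layer construction): a binary type check on the in-context labels decides between logistic and linear regression:

```python
import numpy as np

def pre_icl_testing(X, y, x_query, steps=300, lr=0.1):
    """'Binary type check': if the in-context labels are all 0/1, run
    logistic regression; otherwise fall back to linear regression."""
    if np.all(np.isin(y, (0.0, 1.0))):               # distribution test
        w = np.zeros(X.shape[1])
        for _ in range(steps):                        # GD on the logistic loss
            p = 1.0 / (1.0 + np.exp(-X @ w))
            w -= lr * X.T @ (p - y) / len(y)
        return 1.0 / (1.0 + np.exp(-x_query @ w))     # P(y = 1 | x_query)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)         # least squares
    return x_query @ w
```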
@yubai01
Yu Bai
4 months
Looking forward to seeing more exciting works from the team!
0
0
4
@yubai01
Yu Bai
1 year
@GoogleDeepMind @misovalko @Tdash_Koz X = number of information sets for a single player, the main measure of game size for EFGs. Their new ICML paper improves over ours in the H (game horizon) dependency, and importantly does not require the structure of the game tree to be known ahead.
1
0
4
@yubai01
Yu Bai
2 years
@ml_angelopoulos @davidstutz92 Thanks for flagging! Conditional coverage is definitely an important goal, great to see backproping thru conformal works here. May be interesting to see whether a proper efficiency loss could be designed for conditional coverage (and be optimized) too.
0
0
4
@yubai01
Yu Bai
4 years
Congrats team!
@CaimingXiong
Caiming Xiong
4 years
Our NLP team got 16 papers (11 long, 2 short, and 3 finds) at #emnlp2020 , which cover dialogue, summarization, question answering, multilingual, few-shot, NLI, semantic parsing, data augmentation, etc. Congrats to team members and coauthors. More info about papers coming soon!
Tweet media one
13
67
459
0
0
4
@yubai01
Yu Bai
1 year
These mechanisms not only match our findings in experiments. They also allow TFs to achieve strong ICL performance in theory. Example: We construct a TF to do nearly Bayes-optimal ICL in a challenging task---noisy linear models with **mixed** noise levels.
Tweet media one
1
1
3
@yubai01
Yu Bai
4 months
0
0
3
@yubai01
Yu Bai
3 months
@nanjiang_cs Congrats Nan!
0
0
3
@yubai01
Yu Bai
1 month
@johnschulman2 It's been an honor to have been a colleague of yours, and I wish it could have been longer. Thank you and all the best!
0
0
7
@yubai01
Yu Bai
2 years
Our results generalize theirs to the case of EFCEs. Besides, we unveil a new connection in their setting as well: Hedge in NFG space = Kernelized MWU (Farina et al.'s efficient impl.) = Standard OMD with dilated entropy regularizer. Once again, OMD <-> NFG 😎 15/
1
0
3
@yubai01
Yu Bai
3 years
@Guodzh @SimonShaoleiDu Yeah the meaning of self-play does depend on the context. In many theory works we do have >=2 *different* agents playing against each other, and we called it "self-play" too, to emphasize we don't need guidance from expert opponents / demonstrations (think AlphaZero vs. AlphaGo)
1
0
3
@yubai01
Yu Bai
4 years
Observed same thing on CIFAR too -- square loss is as powerful as cross-entropy loss across various architectures and hyperparameters.
@deepcohen
Jeremy Cohen
4 years
Square loss works basically as well as cross-entropy loss on classification tasks. For example, square loss gets 76.0 accuracy for ResNet-50 on ImageNet, compared to 76.1 for cross-entropy.
7
16
112
0
0
2
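The claim in these two tweets, square loss roughly matching cross-entropy on classification, is easy to reproduce on a toy problem. A minimal numpy sketch with my own setup (two separable Gaussian blobs and a linear model; nothing here is from the CIFAR/ImageNet experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated 2-D Gaussian blobs, labels in {0, 1}
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)),
               rng.normal(2.0, 1.0, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

def train(loss, steps=1000, lr=0.05):
    """Linear classifier trained by GD under the chosen loss."""
    w, b = np.zeros(2), 0.0
    for _ in range(steps):
        z = X @ w + b
        if loss == "square":
            g = 2.0 * (z - y) / len(y)                    # d/dz (z - y)^2
        else:
            g = (1.0 / (1.0 + np.exp(-z)) - y) / len(y)   # cross-entropy grad
        w -= lr * X.T @ g
        b -= lr * g.sum()
    thresh = 0.5 if loss == "square" else 0.0
    return np.mean(((X @ w + b) > thresh) == y)           # train accuracy

acc_sq, acc_ce = train("square"), train("xent")
```

On this toy data both losses reach near-perfect accuracy, mirroring the parity the tweets report at much larger scale.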
@yubai01
Yu Bai
2 years
Note that the modified algorithm is **no longer an algorithm in the NFG space**. However, it is still an OMD algorithm, just with a different (dilated) regularizer. So we're back to OMD again, and we're much better at designing OMD algorithms, via the NFG connection 😃 13/
1
0
2
@yubai01
Yu Bai
1 year
@stats_stephen Congrats Stephen!!
1
0
2
@yubai01
Yu Bai
7 months
@Diyi_Yang Congratulations Diyi!
1
0
2
@yubai01
Yu Bai
4 months
Thanks everyone!! ❤️
0
0
2
@yubai01
Yu Bai
2 months
0
0
2
@yubai01
Yu Bai
2 years
0
0
2
@yubai01
Yu Bai
4 months
@Song__Mei Thanks Song, that means a lot! ♥️
0
0
2
@yubai01
Yu Bai
2 years
What's even nicer about the OMD connection: We build on this connection to design a modified OMD algorithm that achieves improved, and in fact the first near-optimal, sample complexity for learning EFCE under bandit feedback. That is our second main result. 12/
1
0
2
@yubai01
Yu Bai
2 years
@dragomir_radev Congrats Drago!
0
0
2
@yubai01
Yu Bai
2 years
@SurbhiGoel_ @PennCIS Congrats Surbhi!!
1
0
1
@yubai01
Yu Bai
3 years
This is joint work w/ @WispyMay @huan__wang @CaimingXiong . 4/4
0
0
1
@yubai01
Yu Bai
2 years
@hausman_k Hi @hausman_k , nice work! We had a similar algorithm in our Policy Finetuning paper (Alg. 2) where we first use a ref policy \mu within steps 1:h to guide exploration for steps h+1:H, and then further improve \mu using the learned exploration policy
2
0
1
@yubai01
Yu Bai
2 years
Pros and cons of converting to NFGs: ✅ Much easier algorithm design, can just apply known NFG algorithms such as Hedge or Phi-Hedge. ✅ Good convergence rate / sample complexity. ❌ Computationally intractable, as size of converted NFG = exp(size of original EFG) ! 8/
1
0
1
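For readers unfamiliar with Hedge from the ✅ above, here is a minimal sketch of generic exponential weights (not the paper's EFG-specific machinery). The ❌ still applies: over a converted NFG the number of actions K is exponential in the EFG size, so this is only tractable on small games:

```python
import numpy as np

def hedge(loss_matrix, eta=0.5):
    """Hedge (exponential weights) over K actions: each round, play the
    normalized weights, then down-weight each action by exp(-eta * loss)."""
    T, K = loss_matrix.shape
    w = np.ones(K)
    plays = np.empty((T, K))
    for t in range(T):
        plays[t] = w / w.sum()                   # mixed strategy this round
        w = w * np.exp(-eta * loss_matrix[t])    # full-information update
    return plays

# Toy 'NFG': action 0 always has loss 0, action 1 always has loss 1
losses = np.tile([0.0, 1.0], (20, 1))
plays = hedge(losses)
```

The weight on the good action converges quickly, which is what drives the good convergence rates mentioned in the ✅.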
@yubai01
Yu Bai
1 year
0
0
1
@yubai01
Yu Bai
2 years
For the future, we believe EFGs are an exciting topic: they're mathematically elegant, there are many open questions, and theoretically principled algorithms are super relevant for AI practice! 16/n, n=16
1
1
1
@yubai01
Yu Bai
2 years
Adding @Song__Mei correct handle for Song
0
0
1
@yubai01
Yu Bai
2 years
1
0
1
@yubai01
Yu Bai
3 years
We then turn to Markov Potential Games (MPGs), a well-studied subclass of general-sum MGs. In this case, there exist recent algorithms that can find Nash with poly(m, 1/eps) samples. We design an alternative algorithm with similar poly(m) and improved dependence on eps. 4/n
1
0
1
@yubai01
Yu Bai
2 years
EFGs admit an elegant game structure. People have utilized this structure to develop algorithmic principles, such as Online Mirror Descent (OMD), and Counterfactual Regret Minimization (CFR). OMD/CFR-type algorithms are both efficient in theory and work well in practice! 5/
1
0
1
@yubai01
Yu Bai
1 year
@SametOymac Thanks! Same congrats to your dissecting CoT paper!
0
0
1
@yubai01
Yu Bai
2 years
An alternative route to algorithm design was long known, but not as popular---Convert them to Normal-Form Games (NFGs). That is, re-express the EFG as a (much bigger) NFG whose action space is the strategy space of the original EFG. 7/
1
0
1
@yubai01
Yu Bai
2 years
We further find that, this algorithm is **equivalent** to an OMD-type algorithm in the reparametrized space! This resolves the aforementioned open problem as well: It's the first OMD type algorithm for learning EFCEs. 11/
1
1
1
@yubai01
Yu Bai
3 years
@Guodzh @SimonShaoleiDu I would still consider that as multi-agent. Basically that's still solving a min-max problem over policies \mu, \nu, but with the special parametrization \mu = \nu (which makes sense in a symmetric game such as Go)
1
0
1
@yubai01
Yu Bai
9 months
@adityagrover_ Congrats Aditya!!
1
0
1