Anton Bakhtin Profile
Anton Bakhtin

@anton_bakhtin

1,591 Followers · 131 Following · 3 Media · 59 Statuses

Researcher at @AnthropicAI, ex-FAIR @MetaAI, ex-@Google

Joined August 2019
@anton_bakhtin
Anton Bakhtin
7 months
RL never works, until it does :) It was incredible to be part of the adventure. Besides being smart, the model is more fun to interact with. Go check it out!
@AnthropicAI
Anthropic
7 months
Today, we're announcing Claude 3, our next generation of AI models. The three state-of-the-art models—Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku—set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision.
@anton_bakhtin
Anton Bakhtin
2 years
AI mastered many purely adversarial games (Go, Poker, StarCraft) by using self-play at scale. However, this doesn't work in Diplomacy, which requires cooperation and coordination that do not emerge naturally from self-play. Here's how we tackled this problem piece by piece over the last 3 years. 🧵
@AIatMeta
AI at Meta
2 years
Meta AI’s @polynoamial and @anton_bakhtin talk about strategic reasoning and how it enables #CICERObyMetaAI to predict moves from billions of possibilities. Want to know how CICERO uses planning to find opportunities for mutually beneficial cooperation? Read more on our blog ⬇️
@anton_bakhtin
Anton Bakhtin
2 years
I played with PyTorch 2.0 a little, and oh boy it's the best thing after... PyTorch itself! It made inference for a random bidirectional transformer I tried 3x faster with one line of code.
@PyTorch
PyTorch
2 years
We just introduced PyTorch 2.0 at the #PyTorchConference, introducing torch.compile! Available in the nightlies today, stable release early March 2023. Read the full post: 🧵 below! 1/5
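For reference, a minimal sketch of the kind of one-liner being referenced (the stand-in encoder and shapes here are illustrative; requires PyTorch >= 2.0, and the 3x figure is specific to the model in the tweet):

import torch
import torch.nn as nn

# Stand-in for "a random bidirectional transformer": any nn.Module works.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=4,
).eval()

# The one line: wrap the module with torch.compile.
compiled = torch.compile(model)

x = torch.randn(8, 128, 256)  # (batch, sequence, d_model)
with torch.no_grad():
    out = compiled(x)  # first call triggers compilation; later calls are fast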
@anton_bakhtin
Anton Bakhtin
2 years
I hope one day AI will be able to act as a good friend: understand you, reason about what could be done, and communicate this. Today, we got one step closer by achieving human-level performance in "Diplomacy" - a game that models many of these aspects. Paper:
@AIatMeta
AI at Meta
2 years
Meta AI presents CICERO — the first AI to achieve human-level performance in Diplomacy, a strategy game which requires building trust, negotiating and cooperating with multiple players. Learn more about #CICERObyMetaAI :
@anton_bakhtin
Anton Bakhtin
2 years
A new paper: how to get a human-friendly 🦕 using a PiKL 🥒! More generally, how to make RL optimize some non-trivial objective while staying true to what humans would call common sense.
@polynoamial
Noam Brown
2 years
After building on years of work from MILA, DeepMind, ourselves, and others, our AIs are now expert-human-level in no-press Diplomacy and Hanabi! Unlike Go and Dota, Diplomacy/Hanabi involve *cooperation*, which breaks naive RL. 🧵👇
@anton_bakhtin
Anton Bakhtin
4 years
Want to understand why AlphaZero can't play poker, and how to fix that? Come see our NeurIPS poster at 9am PT/noon ET Thursday!
@AIatMeta
AI at Meta
4 years
AI bots have bested humans in both chess and poker but the algorithms used to win each game were very different. Today we introduce ReBeL, a major step towards a single AI algorithm that can play all games including chess, Go, poker, Liar's Dice and more.
@anton_bakhtin
Anton Bakhtin
2 years
This is a multi-year effort by the Diplomacy team. A shout-out to our star interns @apjacob03 (PiKL) and @aweisawei (value-based filtering). Special thanks to Diplomacy experts @DiploStrats, @TheGoffy, and Karthik Konath for sharing their wisdom about the game.
@anton_bakhtin
Anton Bakhtin
3 years
We applied RL to Diplomacy, with its 10^20 action space, and found it's not enough to play well with humans due to multiple equilibria among 7 players. Accepted to NeurIPS. Does the research community finally care about negative results? Nah, we're also superhuman in the 2-player variant.
@polynoamial
Noam Brown
3 years
Introducing DORA, an AI that learns no-press Diplomacy from scratch with no human data! Our #NeurIPS2021 paper shows DORA is superhuman in 1v1 Diplomacy. In 7p Diplomacy, the results are more subtle. Joint work w/ @anton_bakhtin , David Wu, and @adamlerer :
@anton_bakhtin
Anton Bakhtin
2 years
Alignment. We developed a new search algorithm, PiKL (🥒), that incorporates the likelihood of each action under a human policy into position evaluation. The resulting agent, Diplodocus (🦕), demonstrated expert-human performance in dialogue-free Diplomacy using RL+PiKL.
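A minimal sketch of the idea (illustrative names, not the paper's code): score each action by its expected value regularized toward the human policy, i.e. pi(a) proportional to pi_human(a) * exp(Q(a)/lambda), so a small lambda recovers greedy value maximization and a large lambda recovers imitation.

import numpy as np

def pikl_policy(q, pi_human, lam):
    # pi(a) proportional to pi_human(a) * exp(q(a) / lam):
    # lam -> 0 recovers the greedy (pure value) action,
    # lam -> inf recovers the human imitation policy.
    logits = np.log(pi_human + 1e-12) + q / lam
    logits -= logits.max()  # numerical stability
    pi = np.exp(logits)
    return pi / pi.sum()

# Tiny example: a slightly higher-value action that humans rarely play.
q = np.array([1.0, 1.1])         # expected values of two candidate actions
pi_human = np.array([0.9, 0.1])  # imitation policy strongly prefers action 0
print(pikl_policy(q, pi_human, lam=0.5))   # stays close to the human choice
print(pikl_policy(q, pi_human, lam=0.01))  # nearly greedy on value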
@anton_bakhtin
Anton Bakhtin
2 years
Coordination. Finally, dialogue allows players to coordinate and changes the expected value of each position. While directly modeling dialogue is intractable in RL, we model the joint policy marginalized over possible dialogues in the CoShar-PiKL algorithm. This is how we got to Cicero.
@anton_bakhtin
Anton Bakhtin
2 years
Scale. First, we developed DORA, a self-play algorithm that can handle games the size of Diplomacy. It's superhuman in a simplified 2-player version of Diplomacy but, as expected, played worse than SearchBot with humans in 7-player Diplomacy, as it doesn't ally well.
@anton_bakhtin
Anton Bakhtin
2 years
Bonus: value-based filtering. We can compute the expected value not only of actions but of messages too. Each message changes the expected policies of the players and hence the expected value. We use this to filter out unwise messages, e.g., ones where we leak information.
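A minimal sketch of this filter, with hypothetical helpers (expected_value and the candidate list are stand-ins, not the actual Cicero code):

def filter_messages(state, candidates, expected_value, margin=0.0):
    # Keep only messages that don't reduce our expected value relative to
    # staying silent. A candidate message shifts the policies we expect the
    # other players to follow, and hence the value of the position; messages
    # that leak information (or otherwise hurt us) fall below the baseline.
    baseline = expected_value(state, message=None)
    return [m for m in candidates
            if expected_value(state, message=m) >= baseline - margin]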
@anton_bakhtin
Anton Bakhtin
2 years
The power of LLMs as translators into a reasoning system is mind-blowing. Especially if the feedback loop gets there.
@perplexity_ai
Perplexity
2 years
Introducing Bird SQL, a Twitter search interface that is powered by Perplexity’s structured search engine. It uses OpenAI Codex to translate natural language into SQL, giving everyone the ability to navigate large datasets like Twitter.
@anton_bakhtin
Anton Bakhtin
2 years
If you want to see non-cherry-picked performance of the agent, there is a great video by @DiploStrats, a professional Diplomacy player, where he comments in real time on his game against 6 copies of Cicero.
@anton_bakhtin
Anton Bakhtin
3 years
AI is known to be good at two-player zero-sum games, where each move either makes me better off or my opponent. But what if the agent sometimes has to cooperate or compromise with other players to win? Check out the poster + oral tomorrow to learn more!
@anton_bakhtin
Anton Bakhtin
2 years
This work was made possible by the power of strategic reasoning and communication, and by a multidisciplinary team of amazing people I had the honor to be a part of.
@anton_bakhtin
Anton Bakhtin
2 years
A mesmerizing thing about AI art is that it lets you go from 3D to funky manifolds of your liking. Everything, all at once.
@GlennIsZen
Glenn Marshall
2 years
A Dance, My Lord
@anton_bakhtin
Anton Bakhtin
3 years
Wow, the dream is coming true! True multithreading in Python instead of multiprocessing hell.
@soumithchintala
Soumith Chintala
3 years
PyTorch co-author Sam Gross (@colesbury) has been working on removing the GIL from Python. Like...we can start using threads again instead of multiprocessing hacks! This was a multi-year project by Sam. Great article summarizing it:
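For a sense of what this unlocks, a minimal sketch assuming a no-GIL build of CPython: CPU-bound work split across threads, which today needs multiprocessing to scale.

from concurrent.futures import ThreadPoolExecutor

def count_primes(limit):
    # Pure-Python, CPU-bound work; with the GIL, threads gain nothing here.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

# On a no-GIL build these threads run in parallel on separate cores; on a
# standard build the same speedup requires multiprocessing and its
# serialization overhead.
with ThreadPoolExecutor(max_workers=4) as pool:
    totals = list(pool.map(count_primes, [200_000] * 4))
print(totals)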
@anton_bakhtin
Anton Bakhtin
2 years
@elontimes @ylecun In contrast, Diplomacy is a general-sum game, so there are no guarantees that the equilibrium we find through self-play is compatible with human play. The same is true for simpler games, e.g., the iterated prisoner's dilemma. But Diplomacy has more subtlety in its cooperation. 3/3
@anton_bakhtin
Anton Bakhtin
2 years
If you're in Baltimore this steamy morning, drop by our spotlight talk at 10:30 in 307. Will be presented live!
@apjacob03
Athul Paul Jacob
2 years
We are excited to present our work on building strong, human-like gameplay agents at #ICML2022 next week! In chess, Go, Hanabi and no-press Diplomacy, we get SOTA human prediction accuracy while being substantially stronger than imitation learning. 🧵(1/7)
@anton_bakhtin
Anton Bakhtin
2 years
@drmehmetismail By "purely adversarial" I was referring to *two-player* zero-sum, where no cooperation is needed. More precisely, any n-player general-sum game is equivalent to an (n+1)-player zero-sum game, so for n>2 it's all the same. Fun fact: some Diplomacy scoring systems are not fixed-sum either.
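For reference, the textbook reduction being alluded to: given an n-player general-sum game with payoffs u_1, ..., u_n, add a dummy player n+1 whose payoff is

u_{n+1} = -\sum_{i=1}^{n} u_i, \quad \text{so that} \quad \sum_{i=1}^{n+1} u_i = 0.

The augmented game is zero-sum while players 1..n face exactly the same strategic problem, which is why "zero-sum" alone buys nothing once more than two players are involved.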
@anton_bakhtin
Anton Bakhtin
2 years
@karpathy Text protobufs. Hierarchy and oneofs naturally capture components, Python typing comes with the generated .pyi, and there's an option to pass them to pybind C++.
@anton_bakhtin
Anton Bakhtin
2 years
@DanceScholar Diplomacy is zero-sum, just not 2-player, so it's possible to extend Elo to such games. Our best agent got first place, but the variance is too high for us to claim anything more than expert level.
@anton_bakhtin
Anton Bakhtin
1 year
Wow, maybe the future is bright
@DigThatData
David Marx
1 year
Absolutely dope #AIart transformation of @marcrebillet with @devdef 's #stablediffusion #warpfusion + a few other tools (see OP for workflow and links to tutorials), by redditor AthleteEducational63:
@anton_bakhtin
Anton Bakhtin
2 years
@elontimes @ylecun Self-play converges to some local equilibrium strategies. It's theoretically proven that for any two-player zero-sum game (e.g., Go) strategies from different equilibria are compatible. Thus, there is no need to know how humans play the game - the rules of the game are enough. 2/3
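The classical property being invoked, stated here for reference: in a two-player zero-sum game with payoff matrix A, if (x_1, y_1) and (x_2, y_2) are Nash equilibria, then the mixed pairs (x_1, y_2) and (x_2, y_1) are equilibria as well, and all of them achieve the same value

x_1^\top A y_1 = x_2^\top A y_2 = \max_x \min_y x^\top A y,

so a self-play agent's equilibrium strategy guarantees the game's value regardless of which equilibrium its opponent happens to play.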
@anton_bakhtin
Anton Bakhtin
2 years
@hr0nix Or you can ask the model itself to give a judgment, as Constitutional AI does.
@anton_bakhtin
Anton Bakhtin
4 years
made my day!
@NPCollapse
Connor Leahy
4 years
@anton_bakhtin
Anton Bakhtin
2 years
@elontimes @ylecun It's a general property. Here's a gif that shows how a game is played by 7 humans (top) or 7 pure self-play agents. Humans can cooperate, and at the end two smaller survivors stop the hegemon. In contrast, the AI does constant precise rebalancing. Hard and boring for humans. 1/3