Anton Bakhtin Profile
Anton Bakhtin

@anton_bakhtin

1,591 Followers · 131 Following · 3 Media · 59 Statuses

Researcher at @AnthropicAI, ex-FAIR @MetaAI, ex-@Google

Joined August 2019
@anton_bakhtin
Anton Bakhtin
7 months
RL never works, until it does :) It was incredible to be part of the adventure. Besides being smart, the model is more fun to interact with. Go check it out!
@AnthropicAI
Anthropic
7 months
Today, we're announcing Claude 3, our next generation of AI models. The three state-of-the-art models—Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku—set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision.
@anton_bakhtin
Anton Bakhtin
2 years
AI mastered many purely adversarial games (Go, Poker, StarCraft) by using self-play at scale. However, this doesn't work in Diplomacy, which requires cooperation and coordination that do not emerge naturally from self-play. Here's how we tackled this problem piece by piece over the last 3 years. 🧵
@AIatMeta
AI at Meta
2 years
Meta AI’s @polynoamial and @anton_bakhtin talk about strategic reasoning and how it enables #CICERObyMetaAI to predict moves from billions of possibilities. Want to know how CICERO uses planning to find opportunities for mutually beneficial cooperation? Read more on our blog ⬇️
@anton_bakhtin
Anton Bakhtin
2 years
I played with PyTorch 2.0 a little, and oh boy it's the best thing after... PyTorch itself! It made inference for a random bidirectional transformer I tried 3x faster with one line of code.
@PyTorch
PyTorch
2 years
We just introduced PyTorch 2.0 at the #PyTorchConference, introducing torch.compile! Available in the nightlies today, stable release early March 2023. Read the full post: 🧵 below! 1/5
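For reference, a minimal sketch of the kind of one-liner being referenced (the stand-in encoder and shapes here are illustrative; requires PyTorch >= 2.0, and the 3x figure is specific to the model in the tweet):

import torch
import torch.nn as nn

# Stand-in for "a random bidirectional transformer": any nn.Module works.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=4,
).eval()

# The one line: wrap the module with torch.compile.
compiled = torch.compile(model)

x = torch.randn(8, 128, 256)  # (batch, sequence, d_model)
with torch.no_grad():
    out = compiled(x)  # first call triggers compilation; later calls are fast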
@anton_bakhtin
Anton Bakhtin
2 years
I hope one day AI will be able to act as a good friend: understand you, reason about what could be done, and communicate this. Today, we got one step closer by achieving human-level performance in "Diplomacy" - a game that models many of these aspects. Paper:
@AIatMeta
AI at Meta
2 years
Meta AI presents CICERO — the first AI to achieve human-level performance in Diplomacy, a strategy game which requires building trust, negotiating and cooperating with multiple players. Learn more about #CICERObyMetaAI :
@anton_bakhtin
Anton Bakhtin
2 years
A new paper: how to get a human-friendly 🦕 using a PiKL 🥒! More generally, how to make RL optimize some non-trivial objective while staying true to what humans would call common sense.
@polynoamial
Noam Brown
2 years
After building on years of work from MILA, DeepMind, ourselves, and others, our AIs are now expert-human-level in no-press Diplomacy and Hanabi! Unlike Go and Dota, Diplomacy/Hanabi involve *cooperation*, which breaks naive RL. 🧵👇
@anton_bakhtin
Anton Bakhtin
4 years
Want to understand why AlphaZero can't play poker, and how to fix that? Come see our NeurIPS poster at 9am PT/noon ET Thursday!
@AIatMeta
AI at Meta
4 years
AI bots have bested humans in both chess and poker but the algorithms used to win each game were very different. Today we introduce ReBeL, a major step towards a single AI algorithm that can play all games including chess, Go, poker, Liar's Dice and more.
@anton_bakhtin
Anton Bakhtin
2 years
This is a multi-year effort by the Diplomacy team. A shout-out to our star interns @apjacob03 (PiKL) and @aweisawei (value-based filtering). Special thanks to Diplomacy experts @DiploStrats, @TheGoffy, and Karthik Konath for sharing their wisdom about the game.
@anton_bakhtin
Anton Bakhtin
3 years
We applied RL to Diplomacy, with its 10^20 action space, and found it's not enough to play well with humans due to multiple equilibria among 7 players. Accepted to NeurIPS. Does the research community finally care about negative results? Nah, we're also superhuman in the 2-player variant.
@polynoamial
Noam Brown
3 years
Introducing DORA, an AI that learns no-press Diplomacy from scratch with no human data! Our #NeurIPS2021 paper shows DORA is superhuman in 1v1 Diplomacy. In 7p Diplomacy, the results are more subtle. Joint work w/ @anton_bakhtin , David Wu, and @adamlerer :
@anton_bakhtin
Anton Bakhtin
2 years
Alignment. We developed a new search algorithm, PiKL (🥒), that incorporates the likelihood of each action under a human policy into position evaluation. The resulting agent, Diplodocus (🦕), demonstrated expert-human performance in dialogue-free Diplomacy using RL+PiKL.
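A minimal sketch of the idea (illustrative names, not the paper's code): score each action by its expected value regularized toward the human policy, i.e. pi(a) proportional to pi_human(a) * exp(Q(a)/lambda), so a small lambda recovers greedy value maximization and a large lambda recovers imitation.

import numpy as np

def pikl_policy(q, pi_human, lam):
    # pi(a) proportional to pi_human(a) * exp(q(a) / lam):
    # lam -> 0 recovers the greedy (pure value) action,
    # lam -> inf recovers the human imitation policy.
    logits = np.log(pi_human + 1e-12) + q / lam
    logits -= logits.max()  # numerical stability
    pi = np.exp(logits)
    return pi / pi.sum()

# Tiny example: a slightly higher-value action that humans rarely play.
q = np.array([1.0, 1.1])         # expected values of two candidate actions
pi_human = np.array([0.9, 0.1])  # imitation policy strongly prefers action 0
print(pikl_policy(q, pi_human, lam=0.5))   # stays close to the human choice
print(pikl_policy(q, pi_human, lam=0.01))  # nearly greedy on value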
@anton_bakhtin
Anton Bakhtin
2 years
Coordination. Finally, dialogue allows players to coordinate and changes the expected value of each position. While directly modeling dialogue is intractable in RL, we model the joint policy marginalized over possible dialogues in the CoShar-PiKL algorithm. This is how we got to Cicero.
@anton_bakhtin
Anton Bakhtin
2 years
Scale. First, we developed DORA, a self-play algorithm that can handle games the size of Diplomacy. It's superhuman in a simplified 2-player version of Diplomacy but, as expected, played worse than SearchBot with humans in 7-player Diplomacy, as it doesn't ally well.
@anton_bakhtin
Anton Bakhtin
2 years
Bonus: value-based filtering. We can compute the expected value not only of actions but of messages too. Each message changes the expected policies of the players and hence the expected value. We use this to filter out unwise messages, e.g., ones where we leak information.
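A minimal sketch of this filter, with hypothetical helpers (expected_value and the candidate list are stand-ins, not the actual Cicero code):

def filter_messages(state, candidates, expected_value, margin=0.0):
    # Keep only messages that don't reduce our expected value relative to
    # staying silent. A candidate message shifts the policies we expect the
    # other players to follow, and hence the value of the position; messages
    # that leak information (or otherwise hurt us) fall below the baseline.
    baseline = expected_value(state, message=None)
    return [m for m in candidates
            if expected_value(state, message=m) >= baseline - margin]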
@anton_bakhtin
Anton Bakhtin
2 years
The power of LLMs as translators into a reasoning system is mind-blowing. Especially if the feedback loop gets there.
@perplexity_ai
Perplexity
2 years
Introducing Bird SQL, a Twitter search interface that is powered by Perplexity’s structured search engine. It uses OpenAI Codex to translate natural language into SQL, giving everyone the ability to navigate large datasets like Twitter.
@anton_bakhtin
Anton Bakhtin
2 years
If you want to see non-cherry-picked performance of the agent, there is a great video by @DiploStrats, a professional Diplomacy player, where he comments in real time on his game against 6 copies of Cicero.
@anton_bakhtin
Anton Bakhtin
3 years
AI is known to be good at two-player zero-sum games, where each move either makes me better off or my opponent. But what if the agent sometimes has to cooperate or compromise with other players to win? Check out the poster + oral tomorrow to learn more!
@anton_bakhtin
Anton Bakhtin
2 years
This work was made possible by the power of strategic reasoning and communication, and by a multidisciplinary team of amazing people I had the honor to be a part of.
@anton_bakhtin
Anton Bakhtin
2 years
A mesmerizing thing about AI art is that it lets you go from 3D to funky manifolds of your liking. Everything, all at once.
@GlennIsZen
Glenn Marshall
2 years
A Dance, My Lord
@anton_bakhtin
Anton Bakhtin
3 years
Wow, the dream is coming true! True multithreading in Python instead of multiprocessing hell.
@soumithchintala
Soumith Chintala
3 years
PyTorch co-author Sam Gross (@colesbury) has been working on removing the GIL from Python. Like...we can start using threads again instead of multiprocessing hacks! This was a multi-year project by Sam. Great article summarizing it:
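For a sense of what this unlocks, a minimal sketch assuming a no-GIL build of CPython: CPU-bound work split across threads, which today needs multiprocessing to scale.

from concurrent.futures import ThreadPoolExecutor

def count_primes(limit):
    # Pure-Python, CPU-bound work; with the GIL, threads gain nothing here.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

# On a no-GIL build these threads run in parallel on separate cores; on a
# standard build the same speedup requires multiprocessing and its
# serialization overhead.
with ThreadPoolExecutor(max_workers=4) as pool:
    totals = list(pool.map(count_primes, [200_000] * 4))
print(totals)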
@anton_bakhtin
Anton Bakhtin
2 years
@elontimes @ylecun In contrast, Diplomacy is a general-sum game, so there are no guarantees that the equilibrium we find through self-play is compatible with human play. The same is true for simpler games, e.g., the iterated prisoner's dilemma. But Diplomacy has more subtlety in its cooperation. 3/3
@anton_bakhtin
Anton Bakhtin
2 years
If you're in Baltimore this steamy morning, drop by our spotlight talk at 10:30 in 307. Will be presented live!
@apjacob03
Athul Paul Jacob
2 years
We are excited to present our work on building strong, human-like gameplay agents at #ICML2022 next week! In chess, Go, Hanabi and no-press Diplomacy, we get SOTA human prediction accuracy while being substantially stronger than imitation learning. 🧵(1/7)
@anton_bakhtin
Anton Bakhtin
2 years
@drmehmetismail By "purely adversarial" I was referring to *two-player* zero-sum, where no cooperation is needed. More precisely, any n-player general-sum game is equivalent to an (n+1)-player zero-sum game, so for n>2 it's all the same. Fun fact: some Diplomacy scoring systems are not fixed-sum either.
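For reference, the textbook reduction being alluded to: given an n-player general-sum game with payoffs u_1, ..., u_n, add a dummy player n+1 whose payoff is

u_{n+1} = -\sum_{i=1}^{n} u_i, \quad \text{so that} \quad \sum_{i=1}^{n+1} u_i = 0.

The augmented game is zero-sum while players 1..n face exactly the same strategic problem, which is why "zero-sum" alone buys nothing once more than two players are involved.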
@anton_bakhtin
Anton Bakhtin
2 years
@karpathy Text protobufs. Hierarchy and oneofs naturally capture components, Python typing comes with the generated .pyi, and there's an option to pass them to pybind C++.
@anton_bakhtin
Anton Bakhtin
2 years
@DanceScholar Diplomacy is zero-sum, just not 2-player, so it's possible to extend Elo to such games. Our best agent got first place, but the variance is too high for us to claim anything more than expert level.
@anton_bakhtin
Anton Bakhtin
1 year
Wow, maybe the future is bright
@DigThatData
David Marx
1 year
Absolutely dope #AIart transformation of @marcrebillet with @devdef 's #stablediffusion #warpfusion + a few other tools (see OP for workflow and links to tutorials), by redditor AthleteEducational63:
@anton_bakhtin
Anton Bakhtin
2 years
@elontimes @ylecun Self-play converges to some local equilibrium strategies. It's theoretically proven that for any two-player zero-sum game (e.g., Go) strategies from different equilibria are compatible. Thus, there is no need to know how humans play the game - the rules of the game are enough. 2/3
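The classical property being invoked, stated here for reference: in a two-player zero-sum game with payoff matrix A, if (x_1, y_1) and (x_2, y_2) are Nash equilibria, then the mixed pairs (x_1, y_2) and (x_2, y_1) are equilibria as well, and all of them achieve the same value

x_1^\top A y_1 = x_2^\top A y_2 = \max_x \min_y x^\top A y,

so a self-play agent's equilibrium strategy guarantees the game's value regardless of which equilibrium its opponent happens to play.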
@anton_bakhtin
Anton Bakhtin
2 years
@hr0nix Or you can ask the model itself to give a judgment, as Constitutional AI does.
@anton_bakhtin
Anton Bakhtin
4 years
made my day!
@NPCollapse
Connor Leahy
4 years
@anton_bakhtin
Anton Bakhtin
2 years
@elontimes @ylecun It's a general property. Here's a gif that shows how a game is played by 7 humans (top) or 7 pure self-play agents. Humans can cooperate, and at the end two smaller survivors stop the hegemon. In contrast, the AI does constant precise rebalancing. Hard and boring for humans. 1/3