Tom Zahavy

@TZahavy

2,071 Followers · 344 Following · 27 Media · 397 Statuses

Building agents that discover knowledge and get better at doing so over time. Staff research scientist @GoogleDeepMind

London, England
Joined December 2018
Pinned Tweet
@TZahavy
Tom Zahavy
1 year
I'm super excited to share AlphaZeroᵈᵇ, a team of diverse #AlphaZero agents that collaborate to solve #Chess puzzles and demonstrate increased creativity. Check out our paper to learn more! A quick 🧵(1/n)
Tweet media one
5
72
336
@TZahavy
Tom Zahavy
5 months
We are looking for brilliant and creative candidates with strong programming skills to join us at the Discovery team at @GoogleDeepMind 🧙 We build AI agents that discover new knowledge using RL, planning and LLMs. DM me if you have specific questions about working with us 🙏
6
33
287
@TZahavy
Tom Zahavy
24 days
We are looking for brilliant and creative candidates with strong programming skills to join us at the Discovery team at @GoogleDeepMind 🧙 We are building AI agents that create new knowledge using RL, planning and LLMs in domains like Mathematics, chess and more. Please apply
8
30
256
@TZahavy
Tom Zahavy
3 years
In our #Neurips2021 spotlight, we study RL problems where the goal is to minimize a cost over the state occupancy. When this cost is linear, we get the standard RL problem. When it is non-linear, we get apprenticeship learning, pure exploration, diversity and more. [1/7]
Tweet media one
3
30
159
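For readers outside RL, the setup in this thread can be written compactly (a sketch of the framing above; the specific objectives named for the special cases are illustrative):

```latex
% Convex-RL objective (sketch): d_\pi is the state occupancy of policy \pi.
\min_{\pi}\; f(d_\pi),
\qquad
d_\pi(s) \;=\; (1-\gamma)\,\sum_{t \ge 0} \gamma^{t}\,\Pr(s_t = s \mid \pi).
% Linear cost f(d) = \langle c, d \rangle recovers standard RL, since the
% expected discounted cost of \pi is exactly \langle c, d_\pi \rangle.
% Non-linear f covers, e.g., apprenticeship learning
% (f(d) = \|d - d_E\| for an expert occupancy d_E) and pure exploration
% (f(d) = \langle d, \log d \rangle, i.e. negative entropy).
```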
@TZahavy
Tom Zahavy
2 years
Excited to share DOMiNO, a method for discovering qualitatively diverse policies using a single latent-conditioned architecture and the "reward is enough" principle. Read more about it here: DOMiNO's🍕 in Walker walk:
5
27
152
@TZahavy
Tom Zahavy
2 years
Super excited to share that our Bootstrapped Meta-Learning paper, led by @flennerhag , received an Outstanding Paper Award at #iclr2022 . Better meta-learning -> doubled the performance of STACX on Atari to a new SOTA. Come talk with us at the poster session!
@flennerhag
Sebastian Flennerhag
3 years
What should a meta-learner optimize? What if we make it chase its own future outputs? Turns out, it can improve meta-optimization, set new SOTAs, and lead to new types of meta-learning. w. Y. Schroecker, @TZahavy , @hado , D. Silver, S. Singh. 🧵👇
Tweet media one
4
39
185
0
9
93
@TZahavy
Tom Zahavy
3 years
A rejection story with a happy ending. A paper from my #PhD was accepted to #ICML2021 after 4-5 rejections (I lost count, honestly). Each time, some reviewers liked it and some didn't. Believing in it and improving it over time eventually got it in. Don't lose hope!
@khimya
Khimya
3 years
2/2 #ICML2021 submissions rejected. I would like to thank all parties involved. Also, this marks 3 consecutive years of #PhD #fellowship #rejections . But hey, who cares, I am a rising 🌟 in EECS 😂 and will rise above this one day✌️Encourage you to share your rejection speech👩‍🏫
6
10
520
4
4
80
@TZahavy
Tom Zahavy
28 days
Very excited to share AlphaProof, an agent that taught itself mathematics in Lean and achieved a silver-medal standard at the International Math Olympiad 🥈🥈🥈🥈 @leanprover is a functional programming language for formal mathematics and a theorem prover. It enables you to
@GoogleDeepMind
Google DeepMind
28 days
We’re presenting the first AI to solve International Mathematical Olympiad problems at a silver medalist level.🥈 It combines AlphaProof, a new breakthrough model for formal reasoning, and AlphaGeometry 2, an improved version of our previous system. 🧵
306
1K
5K
3
15
71
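For anyone who has not seen Lean, here is a minimal, self-contained Lean 4 snippet (purely illustrative, not AlphaProof output): a statement whose proof the system checks mechanically.

```lean
-- Minimal Lean 4 example (illustrative only, not from AlphaProof):
-- a formally verified statement about natural numbers.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```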
@TZahavy
Tom Zahavy
2 years
Late on arXiv (oral @CoLLAs_Conf ): @jelennal_ , who did a fantastic internship with us on the Discovery team @DeepMind , studies how adding context to meta-gradients can help agents adapt when the environment changes. Thanks for sharing @_akhaliq
@_akhaliq
AK
2 years
Meta-Gradients in Non-Stationary Environments abs:
Tweet media one
0
13
57
0
11
63
@TZahavy
Tom Zahavy
2 years
Happy to share that DOMiNO has been accepted to ICLR, details in the 🧵
@TZahavy
Tom Zahavy
2 years
Excited to share DOMiNO, a method for discovering qualitatively diverse policies using a single latent-conditioned architecture and the "reward is enough" principle. Read more about it here: DOMiNO's🍕 in Walker walk:
5
27
152
1
6
62
@TZahavy
Tom Zahavy
2 years
Our internship program @DeepMind opens for applications today. At the Discovery team we look for brilliant and creative people with a passion for RL to join us. Please apply, and DM me if you have specific questions about working with us 🙋
0
11
61
@TZahavy
Tom Zahavy
2 years
Excited about meta-gradients (MGs)? Join our intern @jelennal_ tomorrow at @aloeworkshop to hear about a systematic study of MGs in non-stationary RL environments. We study how adding a context to MGs can assist in adapting to changes and visualise what these MGs learn.
Tweet media one
Tweet media two
2
9
51
@TZahavy
Tom Zahavy
4 years
Excited that our paper "Discovering a set of policies for the worst case reward" was accepted to #ICLR2021 as a spotlight!   We present a policy iteration algorithm that discovers meaningful behaviours in the no-reward setting by maximising robustness!
1
8
50
@TZahavy
Tom Zahavy
3 years
Which reward is enough? Is the reward a gradient? Of what? Come chat with me tomorrow at the Unsupervised RL workshop poster session about my cherry pie recipe.
Tweet media one
Tweet media two
4
6
44
@TZahavy
Tom Zahavy
3 years
I had a fantastic time talking with @ecsquendor , @ykilcher and @RobertTLange about my deep-RL journey, the automatic discovery of structure, and our recent meta-gradient work @deepmind . You can watch it at
@MLStreetTalk
Machine Learning Street Talk
3 years
We go into full inception mode with @DeepMind research scientist Dr. Tom Zahavy @TZahavy and discuss metagradients in reinforcement learning! With @ykilcher , @RobertTLange and @ecsquendor
Tweet media one
2
6
49
1
8
43
@TZahavy
Tom Zahavy
2 years
Super proud to share ReLOAD (), an RL agent that converges in challenging constrained RL problems. The project was led by our fantastic intern @ted_moskovitz at the Discovery team last summer. A thread 🧵
@ted_moskovitz
Ted Moskovitz
2 years
Tired of ChatGPT screenshots? Miss the old days of watching RL agents walking around doing weird stuff? Look no further–I'm excited to share my @DeepMind internship project, where we develop a method to stabilize optimization in constrained RL. Link: 🧵
10
64
723
2
8
41
@TZahavy
Tom Zahavy
1 year
Training RL agents to satisfy constraints is important for safety and efficiency, but standard methods often oscillate between maximizing reward and satisfying constraints. With @ted_moskovitz we propose a solution that converges: ReLOAD. Visit us @icml !
@GoogleDeepMind
Google DeepMind
1 year
What are some of the papers we’ll be talking about? 🔵 Using VLMs to help train embodied agents 🔵 Introducing a new family of recurrent neural networks that perform better on long-term reasoning tasks 🔵 Better training RL algorithms within constraints for AI safety
Tweet media one
1
3
34
0
9
39
@TZahavy
Tom Zahavy
3 years
Reward is enough 😹
@SteveStuWill
Steve Stewart-Williams
3 years
Rat uses pencil to activate trap and get food 😲
39
185
1K
1
1
36
@TZahavy
Tom Zahavy
2 years
Excited to be traveling to a conference again and, in particular, to my third #EWRL ! I will give a talk about DOMiNO, our recent paper on discovering high-quality diverse policies. Ping me if you want to chat about diversity, convex RL or anything else :)
@TZahavy
Tom Zahavy
2 years
Excited to share DOMiNO, a method for discovering qualitatively diverse policies using a single latent-conditioned architecture and the "reward is enough" principle. Read more about it here: DOMiNO's🍕 in Walker walk:
5
27
152
1
1
34
@TZahavy
Tom Zahavy
1 year
I couldn't make it to #ICLR this year :( but please check out our virtual poster for DOMiNO and DM me if you have questions!
@TZahavy
Tom Zahavy
2 years
Excited to share DOMiNO, a method for discovering qualitatively diverse policies using a single latent-conditioned architecture and the "reward is enough" principle. Read more about it here: DOMiNO's🍕 in Walker walk:
5
27
152
0
6
27
@TZahavy
Tom Zahavy
3 years
Wondering what a monkey's median score over the 57 Atari games would look like 🧐
@TechCrunch
TechCrunch
3 years
Watch a monkey equipped with Elon Musk's Neuralink device play Pong with its brain by @etherington
151
888
4K
3
1
26
@TZahavy
Tom Zahavy
3 years
200X faster DRL training in 6 years — Sebulba is all you need. Huge thanks to this team for this amazing project #JAX #TPU #DRL
0
0
24
@TZahavy
Tom Zahavy
3 years
Predicting the value of auxiliary policies, in an off-policy manner, can significantly improve #DRL agents (UNREAL, STACX). In recent work led by Ray Jiang @icmlconf we study how to improve this off-policy learning setup and design emphatic TD algorithms suited for DRL agents.
@_akhaliq
AK
3 years
Emphatic Algorithms for Deep Reinforcement Learning pdf: abs:
Tweet media one
Tweet media two
Tweet media three
1
1
15
0
0
23
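For background, a minimal sketch of the classic linear, one-step emphatic TD update of Sutton, Mahmood & White (2016), which the paper extends to deep RL; the function and argument names here are illustrative:

```python
import numpy as np

def emphatic_td0_step(w, F, phi, phi_next, reward, rho, rho_prev,
                      gamma=0.99, alpha=0.1, interest=1.0):
    """One emphatic TD(0) step for linear values v(s) = w @ phi(s).

    rho, rho_prev: importance ratios pi(a|s)/mu(a|s) for the current and
    previous transitions; F: follow-on trace carried between calls.
    Sketch of Sutton, Mahmood & White (2016) with lambda = 0.
    """
    F = interest + gamma * rho_prev * F              # follow-on trace F_t
    M = F                                            # emphasis M_t (lambda = 0)
    delta = reward + gamma * w @ phi_next - w @ phi  # TD error
    w = w + alpha * rho * M * delta * phi            # emphasis-weighted update
    return w, F
```

The emphasis M reweights each state by how much the target policy would have visited it, which is what restores stability to off-policy TD.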
@TZahavy
Tom Zahavy
2 years
I disagree. Building is not understanding. Computers excel at predicting the future from the past. They scale with compute and history. We are more predictable than we think. But the future is not fixed. We are creative, diverse, irrational and incomputable. The prize has a long tail.
@RichardSSutton
Richard Sutton
2 years
The case for ambition in artificial intelligence research: Within your lifetime, AI researchers will understand the principles of intelligence—what it is and how it works—well enough to create beings of far greater intelligence than current humans.
81
139
1K
3
0
22
@TZahavy
Tom Zahavy
4 months
We are now accepting applications for RS and RE positions:
0
8
22
@TZahavy
Tom Zahavy
3 years
Interested to chat about this work? Please give us a visit at the #NeurIPS2021 poster session (16:30 GMT)!
@TZahavy
Tom Zahavy
3 years
In our #Neurips2021 spotlight, we study RL problems where the goal is to minimize a cost over the state occupancy. When this cost is linear, we get the standard RL problem. When it is non-linear, we get apprenticeship learning, pure exploration, diversity and more. [1/7]
Tweet media one
3
30
159
0
5
21
@TZahavy
Tom Zahavy
11 months
Very happy to present AlphaZeroᵈᵇ this week at the MARL seminar. Thanks for inviting me!
@EugeneVinitsky
Eugene Vinitsky
11 months
This week at the MARL seminar, @TZahavy presents new work extending the capabilities of AlphaZero through diversity. More details below: 👇
Tweet media one
2
2
14
0
1
21
@TZahavy
Tom Zahavy
3 years
If you want to hear more about this work, please join us today at 17:00 (GMT) for the poster session and on Thu 11:45 for the spotlight and Q&A. #ICLR2021
@TZahavy
Tom Zahavy
4 years
Excited that our paper "Discovering a set of policies for the worst case reward" was accepted to #ICLR2021 as a spotlight!   We present a policy iteration algorithm that discovers meaningful behaviours in the no-reward setting by maximising robustness!
1
8
50
1
1
21
@TZahavy
Tom Zahavy
9 months
Great article by @StephenOrnes in @QuantaMagazine on our recent AlphaZeroᵈᵇ paper:
@QuantaMagazine
Quanta Magazine
9 months
Computer scientists at Google DeepMind have built a diverse array of AI systems, starting with AlphaZero, that can better solve chess problems. The implications of this work extend beyond games. @StephenOrnes reports:
4
60
166
1
6
20
@TZahavy
Tom Zahavy
2 years
Finally on my way to #NeurIPS2022 ! Pls reach out if you want to chat about anything RL (discovery, creativity, adaptivity, diversity or convex utilities) or if you want to hang out around Frenchman St 🥳
0
0
20
@TZahavy
Tom Zahavy
9 months
I had a great time chatting with @ron_itelman about diversity and creativity in AI. You can read more about this work in our recent AlphaZeroᵈᵇ paper:
@ron_itelman
Ron Itelman
9 months
It was an honor and a privilege to have the esteemed @TZahavy , staff research scientist at @GoogleDeepMind , to share his research in creativity and problem solving, on Episode 11 of Principles of Designing Intelligence.
0
3
8
0
3
20
@TZahavy
Tom Zahavy
3 years
Totally forgot how good it feels to get back home from the office :)
0
1
20
@TZahavy
Tom Zahavy
2 years
Working on self-tuning agents like STACX and BMG, I was curious to understand what agents can gain from adapting their hyperparameters while (meta) learning. In this brilliant paper led by @khimya we give one answer, using an "ARUBA-style" analysis of meta-RL. Details in the 🧵
@khimya
Khimya
2 years
Happy to share our work **POMRL: No-Regret Learning-to-Plan with Increasing Horizons** at the intersection of online meta learning and planning in reinforcement learning jointly with 🐘vernadec @mastodon .social…
Tweet media one
1
4
37
0
7
20
@TZahavy
Tom Zahavy
3 years
Check out this fantastic summary by @RobertTLange on our bootstrapped meta gradient paper!
@RobertTLange
Robert Lange
3 years
How to overcome short-sightedness⌛️& ill-conditioned outer objectives in meta-learning❓BMG constructs a bootstrap target by updating the learner for a couple more steps 👣 Provides a new ATARI SOTA 🕹️ & opens up new perspectives 🎨 #mlcollage [35/52] 📜:
Tweet media one
2
8
42
0
1
20
@TZahavy
Tom Zahavy
1 year
Aloha 👋 I am in Honolulu for #ICML2023 and it will be great to meet old and new friends! Give me a ping and let's meet for coffee or a chat. I am interested in anything RL, and more recently in general utilities, diversity and bounded rationality.
0
0
20
@TZahavy
Tom Zahavy
4 years
Very excited about this work! We present an algorithm that learns from demonstrations in personalized medicine. The idea is to model each patient as an MDP and generalize to new patients. We show, in theory & practice, that the right way to do it is via IRL (not BC/IL/AL)!
@Technion_RL
Technion - Reinforcement Learning Research Labs
4 years
Interested in #RealWorld applications of #ReinforcementLearning ? Check out our recent work, “Inverse Reinforcement Learning in Contextual MDPs” where we learn from clinicians how to treat patients with Sepsis! Paper: Code:
1
1
12
0
3
18
@TZahavy
Tom Zahavy
1 year
Aloha!
@ted_moskovitz
Ted Moskovitz
1 year
Happy to say ReLOAD hopped its way into #ICML2023 —looking forward to seeing everyone in Hawaii!!
2
3
37
1
0
17
@TZahavy
Tom Zahavy
3 years
Check out our recent work, led by @vivek_veeriah , on using meta-gradients to #discover useful sub-goals and options that maximize them. And thanks @RobertTLange for another fantastic summary!
@RobertTLange
Robert Lange
3 years
🤖 How can we learn useful temporal abstractions that transfer across tasks? Veeriah et al. (21') propose to discover options by optimizing their parametrization via meta-∇ 🧠 Loved the idea to disentangle option reward & policy. #mlcollage [9/52] 📜:
Tweet media one
1
12
44
0
2
17
@TZahavy
Tom Zahavy
3 years
Neural linear bandits combine a DNN with a linear contextual bandit on top of its last layer to explore efficiently. In our @icmlconf paper led by Ofir Nabati, we address the problem of catastrophic forgetting (due to feature learning) and propose a moment-matching solution.
Tweet media one
@Technion_RL
Technion - Reinforcement Learning Research Labs
3 years
Deep networks combined with finite memory pose a problem for bandit-based algorithms. In this blog post, Ofir explains their method for overcoming the drift in the learned features and avoiding catastrophic forgetting!
0
2
5
1
0
16
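For context, a generic neural-linear Thompson-sampling sketch (the standard construction the tweet describes; the paper's moment-matching fix for feature drift is not shown, and `phi` stands for the DNN's last-layer features):

```python
import numpy as np

class NeuralLinearTS:
    """Bayesian linear regression on top of DNN features phi(x) (sketch)."""

    def __init__(self, dim, noise_var=1.0, prior_var=1.0):
        self.A = np.eye(dim) / prior_var   # posterior precision
        self.b = np.zeros(dim)             # precision-weighted mean
        self.noise_var = noise_var

    def update(self, phi, reward):
        self.A += np.outer(phi, phi) / self.noise_var
        self.b += reward * phi / self.noise_var

    def sample_scores(self, Phi):
        """Sample theta from the posterior and score the candidate arms Phi."""
        cov = np.linalg.inv(self.A)
        theta = np.random.multivariate_normal(cov @ self.b, cov)
        return Phi @ theta                 # act greedily w.r.t. the sample
```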
@TZahavy
Tom Zahavy
4 years
Finally some silence #ICML2021 #DeadlineDay
0
0
15
@TZahavy
Tom Zahavy
3 years
Nice summary by @Synced_Global of our recent #ICML2021 paper, "Emphatic Algorithms for Deep RL".
@Synced_Global
Synced
3 years
DeepMind & Amii Extend Emphatic Algorithms for Deep RL, Improving Performance on Atari Games | #AI #ML #ArtificialIntelligence #MachineLearning #Technology #ReinforcementLearning #DeepLearning
Tweet media one
0
3
1
1
3
15
@TZahavy
Tom Zahavy
1 year
Exciting results by @RobertTLange on algorithms that discover learning without access to gradients!
@RobertTLange
Robert Lange
1 year
🚀 How can meta-learning, self-attention & JAX power the next generation of Evolutionary Optimizers 🦎? Excited to share my @DeepMind internship project and our #ICLR2023 paper ‘Discovering Evolution Strategies via Meta-Black-Box Optimization’ 🎉 📜:
2
90
386
0
2
14
@TZahavy
Tom Zahavy
1 year
Our experiments show that players in AlphaZeroᵈᵇ play chess differently, solve more puzzles together, outperform a more homogeneous team and specialize in different openings. Remarkably, some of the players solve the challenging @penrose positions. (5/n)
1
4
15
@TZahavy
Tom Zahavy
3 years
Another virtual conference is around the corner, so it's a good time to thank the organizers and everyone who came to talk with me! I enjoyed the discussions, suggestions and constructive feedback a lot this time; this is what conferences are for! #ICLR2021
2
0
14
@TZahavy
Tom Zahavy
2 years
ChatGPT to hallucinate and win!
@JoINrbs
jorbs
2 years
this is such an incredible illustration. stockfish (white) plays chatgpt (black) (source: )
250
1K
6K
3
1
13
@TZahavy
Tom Zahavy
8 months
FunSearch is one of those projects that really surprised me when I first heard about it, and impressed me even more as I learned more about it. Huge congrats to the team!
@GoogleDeepMind
Google DeepMind
8 months
Introducing FunSearch in @Nature : a method using large language models to search for new solutions in mathematics & computer science. 🔍 It pairs the creativity of an LLM with an automated evaluator to guard against hallucinations and incorrect ideas. 🧵
48
517
2K
0
1
13
@TZahavy
Tom Zahavy
5 years
On the importance of randomization for decomposition in reinforcement learning, @ALT2020 . Joint work with Avinatan H., Haim K., and Yishay M.
0
1
12
@TZahavy
Tom Zahavy
1 year
Yes! 🧙 AlphaZeroᵈᵇ discovers a group of strong chess players with distinct playing styles. It models the players with a single, player-conditioned architecture and encourages them to play differently using behavioural and response diversity techniques. (4/n)
Tweet media one
1
2
12
@TZahavy
Tom Zahavy
1 year
This research was a special journey for me. I would like to thank my collaborators @vivek_veeriah , @shaobohou , @kdub0 , Matthew Lai, @eleurent , @weballergy , @miouantoinette , @demishassabis and Satinder Singh amongst many more friends @GoogleDeepMind . (6/n)
0
1
10
@TZahavy
Tom Zahavy
3 years
@ylecun @HazanPrinceton The reward is a gradient
1
0
10
@TZahavy
Tom Zahavy
2 years
Work done by the Discovery team @DeepMind , with @TZahavy , Yannick Schroecker, @FeryalMP , @katebaumli , @flennerhag , @shaobohou and Satinder Singh. DOMiNO's🍕 in Dog stand:
1
1
9
@TZahavy
Tom Zahavy
1 year
We therefore take inspiration from humans, who are also computationally bounded, to improve AI performance. In “The Diversity Bonus”, @Scott_E_Page observed that diverse teams of thinkers outperform homogeneous teams in complex tasks. Can we bring these ideas to AI? (3/n)
1
2
9
@TZahavy
Tom Zahavy
24 days
Two clarifications: we have a second posting for RE roles on the team, and both roles are London-based.
1
1
9
@TZahavy
Tom Zahavy
11 months
Unable to grasp the horror.
@HenMazzig
Hen Mazzig
11 months
More than 50 Israeli mothers, babies, and men were kidnapped by Hamas today from Israel and brutally taken into the Gaza Strip. This could be your mother
500
2K
5K
1
0
9
@TZahavy
Tom Zahavy
3 years
This team has a fantastic ability to summarize knowledge, ask challenging questions and produce high-quality content. They are doing a great service to the ML community, so make sure you follow them!
0
1
8
@TZahavy
Tom Zahavy
1 year
♗♞ Chess is a complex game with many possible moves. Even though modern chess programs can play superhuman chess, they have their blind spots. For some examples, check out this article by @Sam_Copeland , and read our paper to learn more! (2/n)
1
2
8
@TZahavy
Tom Zahavy
2 years
A reminder that the internship applications for @DeepMind CLOSE this Friday
@TZahavy
Tom Zahavy
2 years
Our internship program @DeepMind opens for applications today. At the Discovery team we look for brilliant and creative people with a passion for RL to join us. Please apply, and DM me if you have specific questions about working with us 🙋
0
11
61
0
1
8
@TZahavy
Tom Zahavy
4 years
Work by @TZahavy , @andre_s_barreto , @bodonoghue85 , @DJ_Mankowitz , Shaobo Hou, @hbq111 and Satinder Baveja Singh
0
0
7
@TZahavy
Tom Zahavy
2 years
Almost… #NeurIPS2022
0
0
7
@TZahavy
Tom Zahavy
3 years
Are data scientists the new rock stars? I have recently been harassed by a bot pretending to be "a single woman looking to meet people". It uses the same font in its bio + deep-fake images & videos generated from the same network. @Twitter please help
0
1
7
@TZahavy
Tom Zahavy
2 years
Our reward has three components: (1) the extrinsic reward; (2) the gradient of a diversity objective, defined as a non-linear function of the state occupancies (Hausdorff distance, Van der Waals); and (3) a multi-objective mechanism to combine the rewards. DOMiNO's🍕 in Walker stand:
1
0
7
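Schematically (a sketch of the three components above; λ stands in for the multi-objective combination mechanism, which is adaptive rather than a fixed coefficient):

```latex
% Per-policy DOMiNO reward (sketch): d_i is the occupancy of policy i.
r_i(s) \;=\; r_{\text{ext}}(s)
\;+\; \lambda \,\big[\nabla_{d_i}\, \mathrm{Div}(d_1,\dots,d_n)\big](s),
% where Div is a diversity objective over the set of occupancies
% (e.g. Hausdorff- or Van der Waals-style functionals, as above).
```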
@TZahavy
Tom Zahavy
3 years
Reviewer #2 : only one seed?
0
0
7
@TZahavy
Tom Zahavy
4 years
Very nice blog post by @RobertTLange covering old and new work on meta gradients!
@RobertTLange
Robert Lange
4 years
This includes a lot of recent #NeurIPS2020 work by @zhongwen2009 , @TZahavy , @junh_oh as well as fundamental work by @rein_houthooft , @LouisKirschAI and many others. P.S.: Here is also a pdf version of the post 📚
Tweet media one
0
1
15
0
1
7
@TZahavy
Tom Zahavy
4 years
Using JAX's auto-diff abilities together with its RL libraries made meta-gradient research much easier for me. I highly recommend the JAX ecosystem to everyone!
@GoogleDeepMind
Google DeepMind
4 years
In a new blog post, @davidmbudden and @matteohessel discuss how JAX has helped accelerate our mission, and describe an ecosystem of open source libraries that have been developed to make JAX even better for machine learning researchers everywhere:
Tweet media one
4
116
560
0
1
6
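A minimal sketch of why JAX suits meta-gradient research (toy quadratic, illustrative names only): differentiating an outer objective through an inner SGD step is a single nested `jax.grad`.

```python
import jax
import jax.numpy as jnp

def inner_loss(w, x, y):
    # Simple regression loss for the inner learner.
    return jnp.mean((w * x - y) ** 2)

def meta_loss(lr, w, x, y):
    # Loss after one inner SGD step taken with hyperparameter lr.
    w_new = w - lr * jax.grad(inner_loss)(w, x, y)
    return inner_loss(w_new, x, y)

x, y, w = jnp.ones(8), 2.0 * jnp.ones(8), 0.0
# Meta-gradient of the outer objective w.r.t. the learning rate:
print(jax.grad(meta_loss)(0.1, w, x, y))
```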
@TZahavy
Tom Zahavy
3 years
@mooopan We had closely related work on that a while back, but with a DQN. The idea was to use the representation of the DQN for LSPI:
1
0
6
@TZahavy
Tom Zahavy
2 years
@NandoDF @ylecun @sirbayes Nice! If you use this gradient as a reward you can also retrieve almost any unsupervised RL method:
1
0
6
@TZahavy
Tom Zahavy
5 years
In a new paper, , we study the projection method of Abbeel and Ng (2004) for Apprenticeship Learning. We show that it is, in fact, an instantiation of the Frank-Wolfe method -- a projection-free method for convex optimization. (1/5)
1
12
6
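For context, the Frank-Wolfe iteration over the set of achievable feature expectations (a sketch consistent with this framing; μ_E denotes the expert's feature expectations):

```latex
% Frank-Wolfe for apprenticeship learning (sketch), minimizing
% f(\mu) = \tfrac{1}{2}\|\mu - \mu_E\|^2 over achievable feature
% expectations \mu \in \mathcal{M}:
\bar\mu^{(k)} \in \arg\min_{\mu \in \mathcal{M}}
    \big\langle \nabla f(\mu^{(k)}),\, \mu \big\rangle,
\qquad
\mu^{(k+1)} = \mu^{(k)} + \gamma_k\,\big(\bar\mu^{(k)} - \mu^{(k)}\big).
% The linear-minimization oracle is itself an RL problem: find the
% policy that is optimal for the reward -\nabla f(\mu^{(k)}).
```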
@TZahavy
Tom Zahavy
1 year
Diversity is all you need!
@JustAnimalss_
Just ANIMALS 🐾🌍
1 year
105
2K
11K
0
0
6
@TZahavy
Tom Zahavy
3 years
@bodonoghue85 Meta-lica
0
0
6
@TZahavy
Tom Zahavy
2 years
@ted_moskovitz @bodonoghue85 @kevinjmiller10 Mmm not sure, have you tried intern descent?
1
0
5
@TZahavy
Tom Zahavy
4 years
New GOAT debate! ⛹🏾🤖🐐
@GalantiTomer
Tomer Galanti
4 years
I guess only the ML community would understand.. 😄
Tweet media one
3
1
118
1
0
5
@TZahavy
Tom Zahavy
1 year
AI ∩ chess Twitter: I found this method peculiar and decided to participate myself. It suggests overfitting to puzzles by repeatedly solving them. Do we humans not overfit? Or is there something special about puzzles that makes them hard to overfit on? 🤔
Tweet media one
2
1
5
@TZahavy
Tom Zahavy
2 years
With great collaborators @jelennal_ , @flennerhag , Yannick Schroecker, @dabelcs , @TZahavy and Satinder Singh. #ICLR22 More details in our paper:
0
0
5
@TZahavy
Tom Zahavy
2 years
@ted_moskovitz @DeepMind @bodonoghue85 Excited to have you in the Discovery team @ted_moskovitz !
0
0
5
@TZahavy
Tom Zahavy
2 years
Lastly, thanks again to @ted_moskovitz for doing brilliant work on this, and to our co-authors and advisors at @DeepMind n/n
0
0
4
@TZahavy
Tom Zahavy
3 years
@ylecun @HazanPrinceton In its dual formulation, RL is bilinear in the reward and the state occupancy; hence the reward is indeed the (negative) gradient of the objective w.r.t. the state occupancy.
1
0
4
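In symbols (a standard sketch of the dual linear-programming view this reply refers to, with K the polytope of valid occupancy measures):

```latex
% Dual (occupancy-measure) form of RL (sketch):
\max_{d \in \mathcal{K}} \;\langle r, d \rangle,
\qquad
\nabla_{d}\,\langle r, d \rangle \;=\; r,
% so for a cost c = -r, the reward is the negative gradient of the
% objective with respect to the state occupancy d.
```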
@TZahavy
Tom Zahavy
2 years
@CULLYAntoine Thanks! Waiting to hear your thoughts :)
0
0
3
@TZahavy
Tom Zahavy
6 months
@du_yilun Interesting work! You might find our recent paper relevant
1
1
3
@TZahavy
Tom Zahavy
3 years
And in case you missed them, please check out our works: Meta gradients in constrained MDPs: Discovering a set of policies for the worst case reward:
0
0
3
@TZahavy
Tom Zahavy
5 months
1
0
3
@TZahavy
Tom Zahavy
2 years
@RobertTLange @flennerhag Welcome to the team Rob!
0
0
3
@TZahavy
Tom Zahavy
2 years
One example is DOMiNO (): an RL agent that discovers high-quality and diverse policies by maximizing diversity under the constraint of being nearly optimal, and demonstrates robustness to perturbed environments. 3/n
1
0
3
@TZahavy
Tom Zahavy
3 years
On a personal note, I've been working on this problem for the last few years, focusing mainly on apprenticeship learning (, ). I am very excited that we were able to generalize this framework to cover many important problems [7/7].
1
0
3
@TZahavy
Tom Zahavy
2 years
But gradient descent-ascent only guarantees that the average (over training) of the policies converges; it is impossible to guarantee that the actual policy is good at any given point in time! 4/n
Tweet media one
1
0
3
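The standard illustration of this failure (a textbook fact, not specific to the paper): on the bilinear saddle f(x, y) = xy, simultaneous gradient descent-ascent spirals outward even though its averaged iterates converge.

```latex
% Gradient descent-ascent on f(x, y) = xy with step size \eta:
x_{t+1} = x_t - \eta\, y_t, \qquad y_{t+1} = y_t + \eta\, x_t
\;\;\Longrightarrow\;\;
x_{t+1}^2 + y_{t+1}^2 = (1 + \eta^2)\,(x_t^2 + y_t^2),
% so the iterates drift away from the saddle (0, 0), while the running
% averages \bar{x}_T, \bar{y}_T still converge to it.
```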
@TZahavy
Tom Zahavy
3 years
@CsabaSzepesvari @peter_richtarik Fight gatekeeping: when a reviewer targets a specific paper from conference to conference. This is happening, and we need to monitor and stop it.
0
0
3
@TZahavy
Tom Zahavy
4 years
@BachFrancis It would be great if we could get a clarification about this, as it is being actively ignored atm
0
0
3
@TZahavy
Tom Zahavy
5 years
Just finished my ICML submissions and on my way to AAAI & ALT !!! Starting the week at the GenPlan workshop, where we present work on inverse RL in contextual MDPs and show, theoretically and empirically, zero-shot transfer.
1
1
3
@TZahavy
Tom Zahavy
2 years
1
0
3
@TZahavy
Tom Zahavy
3 years
We then show how to reformulate the problem as a min-max game between policy and cost (negative reward) 'players', using Fenchel duality. We propose a meta-algorithm for solving this game and show that it unifies many existing algorithms in the literature. [3/7]
Tweet media one
1
0
3
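In symbols (a sketch using the Fenchel conjugate f*; valid for convex, closed f):

```latex
% Fenchel-duality reformulation (sketch):
\min_{d \in \mathcal{K}} f(d)
\;=\;
\min_{d \in \mathcal{K}} \;\max_{c}\;
\big\langle c, d \big\rangle - f^{*}(c),
% a min-max game between a policy player choosing the occupancy d
% and a cost player choosing c.
```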
@TZahavy
Tom Zahavy
3 years
We then show that, for a specific instance of the meta-algorithm, the non-stationary reward is simply the gradient of the cost with respect to the state occupancy of the previous policies. [5/7]
Tweet media one
1
0
3
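Concretely (a sketch following the min-max form above, where the cost player's best response to an occupancy is a gradient of f):

```latex
% Non-stationary reward as a gradient (sketch):
c_k \;=\; \nabla f(\bar{d}_k), \qquad r_k \;=\; -\,c_k,
% where \bar{d}_k is the state occupancy induced by the previous
% policies; the next policy is then trained against r_k.
```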
@TZahavy
Tom Zahavy
3 years
Oldies but goldies 😂
@ArashMarkazi
Arash Markazi
3 years
Carmelo Anthony — 37 years old LeBron James – 36 years old Trevor Ariza – 36 years old Marc Gasol – 36 years old Dwight Howard – 35 years old (turning 36 Dec. 8) Wayne Ellington – 33 years old Kent Bazemore – 32 years old Russell Westbrook — 32 years old
642
1K
7K
0
0
3
@TZahavy
Tom Zahavy
4 years
Follow the regularized leader 🧝🏻‍♀️
@bodonoghue85
Brendan O'Donoghue
4 years
End the duality gap! 💪
0
1
7
0
0
3
@TZahavy
Tom Zahavy
3 years
@pcastr @JohnCLangford In a similar fashion, one thing I found really nice at #UAI was the discussant format: an author from each accepted paper was asked to prepare a slide with a question or discussion point about another paper, which really improved the Q&A.
0
0
3
@TZahavy
Tom Zahavy
3 years
See you soon!
Tweet media one
0
0
2
@TZahavy
Tom Zahavy
2 years
We demonstrate that our QD policies are robust to environment perturbations: we train DOMiNO in a "baseline" domain and highlight the QD policies that remain robust under domain perturbations.
1
0
2
@TZahavy
Tom Zahavy
2 years
Confused? Consider a simple saddle-point problem. The dynamics of gradient descent-ascent (orange) spiral away from the optimal point, but an OPTIMISTIC version of the gradient bends inward and converges 🧙🧙‍♂️🧙‍♀️ 5/n
Tweet media one
1
0
2
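A minimal numeric sketch of that picture on the toy saddle f(x, y) = x*y (the "optimistic" update here is the generic optimistic-gradient rule, not the paper's exact algorithm):

```python
import numpy as np

def run(optimistic, eta=0.1, steps=200):
    """Min over x, max over y of f(x, y) = x * y, starting from (1, 1)."""
    x, y = 1.0, 1.0
    gx_prev, gy_prev = y, x                    # previous gradients
    for _ in range(steps):
        gx, gy = y, x                          # df/dx = y, df/dy = x
        if optimistic:                         # extrapolate with last gradient
            x, y = x - eta * (2 * gx - gx_prev), y + eta * (2 * gy - gy_prev)
        else:                                  # plain gradient descent-ascent
            x, y = x - eta * gx, y + eta * gy
        gx_prev, gy_prev = gx, gy
    return np.hypot(x, y)                      # distance from the saddle (0, 0)

print(run(optimistic=False))  # grows: GDA spirals away
print(run(optimistic=True))   # shrinks: the optimistic version converges
```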