Tom Zahavy

@TZahavy

2,071 Followers · 344 Following · 27 Media · 397 Statuses

Building agents that discover knowledge and get better at doing so over time. Staff research scientist @GoogleDeepMind

London, England
Joined December 2018
Pinned Tweet
@TZahavy
Tom Zahavy
1 year
I'm super excited to share AlphaZeroᵈᵇ, a team of diverse #AlphaZero agents that collaborate to solve #Chess puzzles and demonstrate increased creativity. Check out our paper to learn more! A quick 🧵(1/n)
Tweet media one
5
72
336
@TZahavy
Tom Zahavy
5 months
We are looking for brilliant and creative candidates with strong programming skills to join us at the Discovery team at @GoogleDeepMind 🧙 We build AI agents that discover new knowledge using RL, planning and LLMs. DM me if you have specific questions about working with us 🙏
6
33
287
@TZahavy
Tom Zahavy
24 days
We are looking for brilliant and creative candidates with strong programming skills to join us at the Discovery team at @GoogleDeepMind 🧙 We are building AI agents that create new knowledge using RL, planning and LLMs in domains like Mathematics, chess and more. Please apply
8
30
256
@TZahavy
Tom Zahavy
3 years
In our #Neurips2021 spotlight, we study RL problems where the goal is to minimize a cost over the state occupancy. When this cost is linear, we get the standard RL problem. When it is non-linear, we get apprenticeship learning, pure exploration, diversity and more. [1/7]
Tweet media one
3
30
159
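For readers outside RL, the setup in this thread can be written compactly (a sketch of the framing above; the specific objectives named for the special cases are illustrative):

```latex
% Convex-RL objective (sketch): d_\pi is the state occupancy of policy \pi.
\min_{\pi}\; f(d_\pi),
\qquad
d_\pi(s) \;=\; (1-\gamma)\,\sum_{t \ge 0} \gamma^{t}\,\Pr(s_t = s \mid \pi).
% Linear cost f(d) = \langle c, d \rangle recovers standard RL, since the
% expected discounted cost of \pi is exactly \langle c, d_\pi \rangle.
% Non-linear f covers, e.g., apprenticeship learning
% (f(d) = \|d - d_E\| for an expert occupancy d_E) and pure exploration
% (f(d) = \langle d, \log d \rangle, i.e. negative entropy).
```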
@TZahavy
Tom Zahavy
2 years
Excited to share DOMiNO, a method for discovering qualitatively diverse policies using a single latent-conditioned architecture and the "reward is enough" principle. Read more about it here: DOMiNO's🍕 in Walker walk:
5
27
152
@TZahavy
Tom Zahavy
2 years
Super excited to share that our Bootstrapped Meta-Learning paper, led by @flennerhag , received an Outstanding Paper Award at #iclr2022 . Better meta-learning -> doubled the performance of STACX on Atari to a new SOTA. Come talk with us at the poster session!
@flennerhag
Sebastian Flennerhag
3 years
What should a meta-learner optimize? What if we make it chase its own future outputs? Turns out, it can improve meta-optimization, set new SOTAs, and lead to new types of meta-learning. w. Y. Schroecker, @TZahavy , @hado , D. Silver, S. Singh. 🧵👇
Tweet media one
4
39
185
0
9
93
@TZahavy
Tom Zahavy
3 years
A rejection story with a happy ending. A paper from my #PhD was accepted to #ICML2021 after 4-5 rejections (I lost count, honestly). Each time, some reviewers liked it and some didn't. Believing in it and improving it over time eventually got it in. Don't lose hope!
@khimya
Khimya
3 years
2/2 #ICML2021 submissions rejected. I would like to thank all parties involved. Also, this marks 3 consecutive years of #PhD #fellowship #rejections . But hey, who cares, I am a rising 🌟 in EECS 😂 and will rise above this one day✌️Encourage you to share your rejection speech👩‍🏫
6
10
520
4
4
80
@TZahavy
Tom Zahavy
28 days
Very excited to share AlphaProof, an agent that taught itself mathematics in Lean and achieved a silver-medal standard at the International Math Olympiad 🥈🥈🥈🥈 @leanprover is a functional programming language for formal mathematics and a theorem prover. It enables you to
@GoogleDeepMind
Google DeepMind
28 days
We’re presenting the first AI to solve International Mathematical Olympiad problems at a silver medalist level.🥈 It combines AlphaProof, a new breakthrough model for formal reasoning, and AlphaGeometry 2, an improved version of our previous system. 🧵
306
1K
5K
3
15
71
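For anyone who has not seen Lean, here is a minimal, self-contained Lean 4 snippet (purely illustrative, not AlphaProof output): a statement whose proof the system checks mechanically.

```lean
-- Minimal Lean 4 example (illustrative only, not from AlphaProof):
-- a formally verified statement about natural numbers.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```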
@TZahavy
Tom Zahavy
2 years
Late on arXiv (oral @CoLLAs_Conf ): @jelennal_ , who did a fantastic internship with us on the Discovery team @DeepMind , studies how adding context to meta-gradients can help agents adapt when the environment changes. Thanks for sharing @_akhaliq
@_akhaliq
AK
2 years
Meta-Gradients in Non-Stationary Environments abs:
Tweet media one
0
13
57
0
11
63
@TZahavy
Tom Zahavy
2 years
Happy to share that DOMiNO has been accepted to ICLR, details in the 🧵
@TZahavy
Tom Zahavy
2 years
Excited to share DOMiNO, a method for discovering qualitatively diverse policies using a single latent-conditioned architecture and the "reward is enough" principle. Read more about it here: DOMiNO's🍕 in Walker walk:
5
27
152
1
6
62
@TZahavy
Tom Zahavy
2 years
Our internship program @DeepMind opens for applications today. At the Discovery team we look for brilliant and creative people with a passion for RL to join us. Please apply, and DM me if you have specific questions about working with us 🙋
0
11
61
@TZahavy
Tom Zahavy
2 years
Excited about meta-gradients (MGs)? Join our intern @jelennal_ tomorrow at @aloeworkshop to hear about a systematic study of MGs in non-stationary RL environments. We study how adding a context to MGs can assist in adapting to changes and visualise what these MGs learn.
Tweet media one
Tweet media two
2
9
51
@TZahavy
Tom Zahavy
4 years
Excited that our paper "Discovering a set of policies for the worst case reward" was accepted to #ICLR2021 as a spotlight!   We present a policy iteration algorithm that discovers meaningful behaviours in the no-reward setting by maximising robustness!
1
8
50
@TZahavy
Tom Zahavy
3 years
Which reward is enough? Is the reward a gradient? Of what? Come chat with me tomorrow at the Unsupervised RL workshop poster session about my cherry pie recipe.
Tweet media one
Tweet media two
4
6
44
@TZahavy
Tom Zahavy
3 years
I had a fantastic time talking with @ecsquendor , @ykilcher and @RobertTLange about my deep-RL journey, the automatic discovery of structure, and our recent meta-gradient work @deepmind . You can watch it at
@MLStreetTalk
Machine Learning Street Talk
3 years
We go into full inception mode with @DeepMind research scientist Dr. Tom Zahavy @TZahavy and discuss metagradients in reinforcement learning! With @ykilcher , @RobertTLange and @ecsquendor
Tweet media one
2
6
49
1
8
43
@TZahavy
Tom Zahavy
2 years
Super proud to share ReLOAD (), an RL agent that converges in challenging constrained RL problems. The project was led by our fantastic intern @ted_moskovitz at the Discovery team last summer. A thread 🧵
@ted_moskovitz
Ted Moskovitz
2 years
Tired of ChatGPT screenshots? Miss the old days of watching RL agents walking around doing weird stuff? Look no further–I'm excited to share my @DeepMind internship project, where we develop a method to stabilize optimization in constrained RL. Link: 🧵
10
64
723
2
8
41
@TZahavy
Tom Zahavy
1 year
Training RL agents to satisfy constraints is important for safety and efficiency, but standard methods often oscillate between maximizing reward and satisfying constraints. With @ted_moskovitz we propose a solution that converges: ReLOAD. Visit us @icml !
@GoogleDeepMind
Google DeepMind
1 year
What are some of the papers we’ll be talking about? 🔵 Using VLMs to help train embodied agents 🔵 Introducing a new family of recurrent neural networks that perform better on long-term reasoning tasks 🔵 Better training RL algorithms within constraints for AI safety
Tweet media one
1
3
34
0
9
39
@TZahavy
Tom Zahavy
3 years
Reward is enough 😹
@SteveStuWill
Steve Stewart-Williams
3 years
Rat uses pencil to activate trap and get food 😲
39
185
1K
1
1
36
@TZahavy
Tom Zahavy
2 years
Excited to be traveling to a conference again and, in particular, to my third #EWRL ! I will give a talk about DOMiNO, our recent paper on discovering high-quality diverse policies. Ping me if you want to chat about diversity, convex RL or anything else :)
@TZahavy
Tom Zahavy
2 years
Excited to share DOMiNO, a method for discovering qualitatively diverse policies using a single latent-conditioned architecture and the "reward is enough" principle. Read more about it here: DOMiNO's🍕 in Walker walk:
5
27
152
1
1
34
@TZahavy
Tom Zahavy
1 year
I couldn't make it to #ICLR this year :( but please check out our virtual poster for DOMiNO and DM me if you have questions!
@TZahavy
Tom Zahavy
2 years
Excited to share DOMiNO, a method for discovering qualitatively diverse policies using a single latent-conditioned architecture and the "reward is enough" principle. Read more about it here: DOMiNO's🍕 in Walker walk:
5
27
152
0
6
27
@TZahavy
Tom Zahavy
3 years
Wondering what a monkey's median score over the 57 Atari games would look like 🧐
@TechCrunch
TechCrunch
3 years
Watch a monkey equipped with Elon Musk's Neuralink device play Pong with its brain by @etherington
151
888
4K
3
1
26
@TZahavy
Tom Zahavy
3 years
200X faster DRL training in 6 years — Sebulba is all you need. Huge thanks to this team for this amazing project #JAX #TPU #DRL
0
0
24
@TZahavy
Tom Zahavy
3 years
Predicting the value of auxiliary policies, in an off-policy manner, can significantly improve #DRL agents (UNREAL, STACX). In recent work led by Ray Jiang @icmlconf we study how to improve this off-policy learning setup and design emphatic TD algorithms suited for DRL agents.
@_akhaliq
AK
3 years
Emphatic Algorithms for Deep Reinforcement Learning pdf: abs:
Tweet media one
Tweet media two
Tweet media three
1
1
15
0
0
23
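For background, a minimal sketch of the classic linear, one-step emphatic TD update of Sutton, Mahmood & White (2016), which the paper extends to deep RL; the function and argument names here are illustrative:

```python
import numpy as np

def emphatic_td0_step(w, F, phi, phi_next, reward, rho, rho_prev,
                      gamma=0.99, alpha=0.1, interest=1.0):
    """One emphatic TD(0) step for linear values v(s) = w @ phi(s).

    rho, rho_prev: importance ratios pi(a|s)/mu(a|s) for the current and
    previous transitions; F: follow-on trace carried between calls.
    Sketch of Sutton, Mahmood & White (2016) with lambda = 0.
    """
    F = interest + gamma * rho_prev * F              # follow-on trace F_t
    M = F                                            # emphasis M_t (lambda = 0)
    delta = reward + gamma * w @ phi_next - w @ phi  # TD error
    w = w + alpha * rho * M * delta * phi            # emphasis-weighted update
    return w, F
```

The emphasis M reweights each state by how much the target policy would have visited it, which is what restores stability to off-policy TD.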
@TZahavy
Tom Zahavy
2 years
I disagree. Building is not understanding. Computers excel at predicting the future from the past. They scale with compute and history. We are more predictable than we think. But the future is not fixed. We are creative, diverse, irrational and incomputable. The prize has a long tail.
@RichardSSutton
Richard Sutton
2 years
The case for ambition in artificial intelligence research: Within your lifetime, AI researchers will understand the principles of intelligence—what it is and how it works—well enough to create beings of far greater intelligence than current humans.
81
139
1K
3
0
22
@TZahavy
Tom Zahavy
4 months
We are now accepting applications for RS and RE positions:
0
8
22
@TZahavy
Tom Zahavy
3 years
Interested to chat about this work? Please give us a visit at the #NeurIPS2021 poster session (16:30 GMT)!
@TZahavy
Tom Zahavy
3 years
In our #Neurips2021 spotlight, we study RL problems where the goal is to minimize a cost over the state occupancy. When this cost is linear, we get the standard RL problem. When it is non-linear, we get apprenticeship learning, pure exploration, diversity and more. [1/7]
Tweet media one
3
30
159
0
5
21
@TZahavy
Tom Zahavy
11 months
Very happy to present AlphaZeroᵈᵇ this week at the MARL seminar. Thanks for inviting me!
@EugeneVinitsky
Eugene Vinitsky
11 months
This week at the MARL seminar, @TZahavy presents new work extending the capabilities of AlphaZero through diversity. More details below: 👇
Tweet media one
2
2
14
0
1
21
@TZahavy
Tom Zahavy
3 years
If you want to hear more about this work, please join us today at 17:00 (GMT) for the poster session and on Thu 11:45 for the spotlight and Q&A. #ICLR2021
@TZahavy
Tom Zahavy
4 years
Excited that our paper "Discovering a set of policies for the worst case reward" was accepted to #ICLR2021 as a spotlight!   We present a policy iteration algorithm that discovers meaningful behaviours in the no-reward setting by maximising robustness!
1
8
50
1
1
21
@TZahavy
Tom Zahavy
9 months
Great article by @StephenOrnes in @QuantaMagazine on our recent AlphaZeroᵈᵇ paper:
@QuantaMagazine
Quanta Magazine
9 months
Computer scientists at Google DeepMind have built a diverse array of AI systems, starting with AlphaZero, that can better solve chess problems. The implications of this work extend beyond games. @StephenOrnes reports:
4
60
166
1
6
20
@TZahavy
Tom Zahavy
2 years
Finally on my way to #NeurIPS2022 ! Pls reach out if you want to chat about anything RL (discovery, creativity, adaptivity, diversity or convex utilities) or if you want to hang out around Frenchman St 🥳
0
0
20
@TZahavy
Tom Zahavy
9 months
I had a great time chatting with @ron_itelman about diversity and creativity in AI. You can read more about this work in our recent AlphaZeroᵈᵇ paper:
@ron_itelman
Ron Itelman
9 months
It was an honor and a privilege to have the esteemed @TZahavy , staff research scientist at @GoogleDeepMind , to share his research in creativity and problem solving, on Episode 11 of Principles of Designing Intelligence.
0
3
8
0
3
20
@TZahavy
Tom Zahavy
3 years
Totally forgot how good it feels to get back home from the office :)
0
1
20
@TZahavy
Tom Zahavy
2 years
Working on self-tuning agents like STACX and BMG, I was curious to understand what agents can gain from adapting their hyperparameters while (meta) learning. In this brilliant paper led by @khimya we give one answer, using an "ARUBA-style" analysis of meta-RL. Details in the 🧵
@khimya
Khimya
2 years
Happy to share our work **POMRL: No-Regret Learning-to-Plan with Increasing Horizons** at the intersection of online meta learning and planning in reinforcement learning jointly with 🐘vernadec @mastodon .social…
Tweet media one
1
4
37
0
7
20
@TZahavy
Tom Zahavy
3 years
Check out this fantastic summary by @RobertTLange on our bootstrapped meta gradient paper!
@RobertTLange
Robert Lange
3 years
How to overcome short-sightedness⌛️& ill-conditioned outer objectives in meta-learning❓BMG constructs a bootstrap target by updating the learner for a couple more steps 👣 Provides a new ATARI SOTA 🕹️ & opens up new perspectives 🎨 #mlcollage [35/52] 📜:
Tweet media one
2
8
42
0
1
20
@TZahavy
Tom Zahavy
1 year
Aloha 👋 I am in Honolulu for #ICML2023 and it will be great to meet old and new friends! Give me a ping and let's meet for coffee or a chat. I am interested in anything RL, and more recently in general utilities, diversity and bounded rationality.
0
0
20
@TZahavy
Tom Zahavy
4 years
Very excited about this work! We present an algorithm that learns from demonstrations in personalized medicine. The idea is to model each patient as an MDP and generalize to new patients. We show, in theory & practice, that the right way to do it is via IRL (not BC/IL/AL)!
@Technion_RL
Technion - Reinforcement Learning Research Labs
4 years
Interested in #RealWorld applications of #ReinforcementLearning ? Check out our recent work, “Inverse Reinforcement Learning in Contextual MDPs” where we learn from clinicians how to treat patients with Sepsis! Paper: Code:
1
1
12
0
3
18
@TZahavy
Tom Zahavy
1 year
Aloha!
@ted_moskovitz
Ted Moskovitz
1 year
Happy to say ReLOAD hopped its way into #ICML2023 —looking forward to seeing everyone in Hawaii!!
2
3
37
1
0
17
@TZahavy
Tom Zahavy
3 years
Check out our recent work, led by @vivek_veeriah , on using meta-gradients to #discover useful sub-goals and options that maximize them. And thanks @RobertTLange for another fantastic summary!
@RobertTLange
Robert Lange
3 years
🤖 How can we learn useful temporal abstractions that transfer across tasks? Veeriah et al. (21') propose to discover options by optimizing their parametrization via meta-∇ 🧠 Loved the idea to disentangle option reward & policy. #mlcollage [9/52] 📜:
Tweet media one
1
12
44
0
2
17
@TZahavy
Tom Zahavy
3 years
Neural linear bandits combine a DNN with a linear contextual bandit on top of its last layer to explore efficiently. In our @icmlconf paper led by Ofir Nabati, we address the problem of catastrophic forgetting (due to feature learning) and propose a moment-matching solution.
Tweet media one
@Technion_RL
Technion - Reinforcement Learning Research Labs
3 years
Deep networks combined with finite memory pose a problem for bandit-based algorithms. In this blog post, Ofir explains their method for overcoming the drift in the learned features and avoiding catastrophic forgetting!
0
2
5
1
0
16
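For context, a generic neural-linear Thompson-sampling sketch (the standard construction the tweet describes; the paper's moment-matching fix for feature drift is not shown, and `phi` stands for the DNN's last-layer features):

```python
import numpy as np

class NeuralLinearTS:
    """Bayesian linear regression on top of DNN features phi(x) (sketch)."""

    def __init__(self, dim, noise_var=1.0, prior_var=1.0):
        self.A = np.eye(dim) / prior_var   # posterior precision
        self.b = np.zeros(dim)             # precision-weighted mean
        self.noise_var = noise_var

    def update(self, phi, reward):
        self.A += np.outer(phi, phi) / self.noise_var
        self.b += reward * phi / self.noise_var

    def sample_scores(self, Phi):
        """Sample theta from the posterior and score the candidate arms Phi."""
        cov = np.linalg.inv(self.A)
        theta = np.random.multivariate_normal(cov @ self.b, cov)
        return Phi @ theta                 # act greedily w.r.t. the sample
```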
@TZahavy
Tom Zahavy
4 years
Finally some silence #ICML2021 #DeadlineDay
0
0
15
@TZahavy
Tom Zahavy
3 years
Nice summary by @Synced_Global of our recent #ICML2021 paper, "Emphatic Algorithms for Deep RL".
@Synced_Global
Synced
3 years
DeepMind & Amii Extend Emphatic Algorithms for Deep RL, Improving Performance on Atari Games | #AI #ML #ArtificialIntelligence #MachineLearning #Technology #ReinforcementLearning #DeepLearning
Tweet media one
0
3
1
1
3
15
@TZahavy
Tom Zahavy
1 year
Exciting results by @RobertTLange on algorithms that discover learning without access to gradients!
@RobertTLange
Robert Lange
1 year
🚀 How can meta-learning, self-attention & JAX power the next generation of Evolutionary Optimizers 🦎? Excited to share my @DeepMind internship project and our #ICLR2023 paper ‘Discovering Evolution Strategies via Meta-Black-Box Optimization’ 🎉 📜:
2
90
386
0
2
14
@TZahavy
Tom Zahavy
1 year
Our experiments show that players in AlphaZeroᵈᵇ play chess differently, solve more puzzles together, outperform a more homogeneous team and specialize in different openings. Remarkably, some of the players solve the challenging @penrose positions. (5/n)
1
4
15
@TZahavy
Tom Zahavy
3 years
Another virtual conference is around the corner, so it's a good time to thank the organizers and everyone who came to talk with me! I enjoyed the discussions, suggestions and constructive feedback a lot this time; this is what conferences are for! #ICLR2021
2
0
14
@TZahavy
Tom Zahavy
2 years
ChatGPT to hallucinate and win!
@JoINrbs
jorbs
2 years
this is such an incredible illustration. stockfish (white) plays chatgpt (black) (source: )
250
1K
6K
3
1
13
@TZahavy
Tom Zahavy
8 months
FunSearch is one of those projects that really surprised me when I first heard about it, and impressed me even more as I learned more about it. Huge congrats to the team!
@GoogleDeepMind
Google DeepMind
8 months
Introducing FunSearch in @Nature : a method using large language models to search for new solutions in mathematics & computer science. 🔍 It pairs the creativity of an LLM with an automated evaluator to guard against hallucinations and incorrect ideas. 🧵
48
517
2K
0
1
13
@TZahavy
Tom Zahavy
5 years
On the importance of randomization for decomposition in reinforcement learning, @ALT2020 . Joint work with Avinatan H., Haim K., and Yishay M.
0
1
12
@TZahavy
Tom Zahavy
1 year
Yes! 🧙 AlphaZeroᵈᵇ discovers a group of strong chess players with distinct playing styles. It models the players with a single, player-conditioned architecture and encourages them to play differently using behavioural and response diversity techniques. (4/n)
Tweet media one
1
2
12
@TZahavy
Tom Zahavy
1 year
This research was a special journey for me. I would like to thank my collaborators @vivek_veeriah , @shaobohou , @kdub0 , Matthew Lai, @eleurent , @weballergy , @miouantoinette , @demishassabis and Satinder Singh amongst many more friends @GoogleDeepMind . (6/n)
0
1
10
@TZahavy
Tom Zahavy
3 years
@ylecun @HazanPrinceton The reward is a gradient
1
0
10
@TZahavy
Tom Zahavy
2 years
Work done by the Discovery team @DeepMind , with @TZahavy , Yannick Schroecker, @FeryalMP , @katebaumli , @flennerhag , @shaobohou and Satinder Singh. DOMiNO's🍕 in Dog stand:
1
1
9
@TZahavy
Tom Zahavy
1 year
We therefore take inspiration from humans, who are also computationally bounded, to improve AI performance. In “The Diversity Bonus”, @Scott_E_Page observed that diverse teams of thinkers outperform homogeneous teams in complex tasks. Can we bring these ideas to AI? (3/n)
1
2
9
@TZahavy
Tom Zahavy
24 days
Two clarifications: we have a second posting for RE roles on the team, and both roles are London-based.
1
1
9
@TZahavy
Tom Zahavy
11 months
Unable to grasp the horror.
@HenMazzig
Hen Mazzig
11 months
More than 50 Israeli mothers, babies, and men were kidnapped by Hamas today from Israel and brutally taken into the Gaza Strip. This could be your mother
500
2K
5K
1
0
9
@TZahavy
Tom Zahavy
3 years
This team has a fantastic ability to summarize knowledge, ask challenging questions and produce high-quality content. They are doing a great service to the ML community, so make sure you follow them!
0
1
8
@TZahavy
Tom Zahavy
1 year
♗♞ Chess is a complex game with many possible moves. Even though modern chess programs can play superhuman chess, they have their blind spots. For some examples, check out this article by @Sam_Copeland , and read our paper to learn more! (2/n)
1
2
8
@TZahavy
Tom Zahavy
2 years
A reminder that the internship applications for @DeepMind CLOSE this Friday
@TZahavy
Tom Zahavy
2 years
Our internship program @DeepMind opens for applications today. At the Discovery team we look for brilliant and creative people with a passion for RL to join us. Please apply, and DM me if you have specific questions about working with us 🙋
0
11
61
0
1
8
@TZahavy
Tom Zahavy
4 years
Work by @TZahavy , @andre_s_barreto , @bodonoghue85 , @DJ_Mankowitz , Shaobo Hou, @hbq111 and Satinder Baveja Singh
0
0
7
@TZahavy
Tom Zahavy
2 years
Almost… #NeurIPS2022
0
0
7
@TZahavy
Tom Zahavy
3 years
Are data scientists the new rock stars? I have recently been harassed by a bot pretending to be "a single woman looking to meet people". It uses the same font in its bio + deep-fake images & videos generated from the same network. @Twitter please help
0
1
7
@TZahavy
Tom Zahavy
2 years
Our reward has three components: (1) the extrinsic reward; (2) the gradient of a diversity objective, defined as a non-linear function of the state occupancies (Hausdorff distance, Van der Waals); and (3) a multi-objective mechanism to combine the rewards. DOMiNO's🍕 in Walker stand:
1
0
7
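Schematically (a sketch of the three components above; λ stands in for the multi-objective combination mechanism, which is adaptive rather than a fixed coefficient):

```latex
% Per-policy DOMiNO reward (sketch): d_i is the occupancy of policy i.
r_i(s) \;=\; r_{\text{ext}}(s)
\;+\; \lambda \,\big[\nabla_{d_i}\, \mathrm{Div}(d_1,\dots,d_n)\big](s),
% where Div is a diversity objective over the set of occupancies
% (e.g. Hausdorff- or Van der Waals-style functionals, as above).
```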
@TZahavy
Tom Zahavy
3 years
Reviewer #2 : only one seed?
0
0
7
@TZahavy
Tom Zahavy
4 years
Very nice blog post by @RobertTLange covering old and new work on meta gradients!
@RobertTLange
Robert Lange
4 years
This includes a lot of recent #NeurIPS2020 work by @zhongwen2009 , @TZahavy , @junh_oh as well as fundamental work by @rein_houthooft , @LouisKirschAI and many others. P.S.: Here is also a pdf version of the post 📚
Tweet media one
0
1
15
0
1
7
@TZahavy
Tom Zahavy
4 years
Using JAX's auto-diff abilities together with its RL libraries made meta-gradient research much easier for me. I highly recommend the JAX ecosystem to everyone!
@GoogleDeepMind
Google DeepMind
4 years
In a new blog post, @davidmbudden and @matteohessel discuss how JAX has helped accelerate our mission, and describe an ecosystem of open source libraries that have been developed to make JAX even better for machine learning researchers everywhere:
Tweet media one
4
116
560
0
1
6
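A minimal sketch of why JAX suits meta-gradient research (toy quadratic, illustrative names only): differentiating an outer objective through an inner SGD step is a single nested `jax.grad`.

```python
import jax
import jax.numpy as jnp

def inner_loss(w, x, y):
    # Simple regression loss for the inner learner.
    return jnp.mean((w * x - y) ** 2)

def meta_loss(lr, w, x, y):
    # Loss after one inner SGD step taken with hyperparameter lr.
    w_new = w - lr * jax.grad(inner_loss)(w, x, y)
    return inner_loss(w_new, x, y)

x, y, w = jnp.ones(8), 2.0 * jnp.ones(8), 0.0
# Meta-gradient of the outer objective w.r.t. the learning rate:
print(jax.grad(meta_loss)(0.1, w, x, y))
```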
@TZahavy
Tom Zahavy
3 years
@mooopan We had closely related work on that a while back, but with a DQN. The idea was to use the representation of the DQN for LSPI:
1
0
6
@TZahavy
Tom Zahavy
2 years
@NandoDF @ylecun @sirbayes Nice! If you use this gradient as a reward you can also retrieve almost any unsupervised RL method:
1
0
6
@TZahavy
Tom Zahavy
5 years
In a new paper, , we study the projection method of Abbeel and Ng (2004) for Apprenticeship Learning. We show that it is, in fact, an instantiation of the Frank-Wolfe method -- a projection-free method for convex optimization. (1/5)
1
12
6
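For context, the Frank-Wolfe iteration over the set of achievable feature expectations (a sketch consistent with this framing; μ_E denotes the expert's feature expectations):

```latex
% Frank-Wolfe for apprenticeship learning (sketch), minimizing
% f(\mu) = \tfrac{1}{2}\|\mu - \mu_E\|^2 over achievable feature
% expectations \mu \in \mathcal{M}:
\bar\mu^{(k)} \in \arg\min_{\mu \in \mathcal{M}}
    \big\langle \nabla f(\mu^{(k)}),\, \mu \big\rangle,
\qquad
\mu^{(k+1)} = \mu^{(k)} + \gamma_k\,\big(\bar\mu^{(k)} - \mu^{(k)}\big).
% The linear-minimization oracle is itself an RL problem: find the
% policy that is optimal for the reward -\nabla f(\mu^{(k)}).
```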
@TZahavy
Tom Zahavy
1 year
Diversity is all you need!
@JustAnimalss_
Just ANIMALS 🐾🌍
1 year
105
2K
11K
0
0
6
@TZahavy
Tom Zahavy
3 years
@bodonoghue85 Meta-lica
0
0
6
@TZahavy
Tom Zahavy
2 years
@ted_moskovitz @bodonoghue85 @kevinjmiller10 Mmm not sure, have you tried intern descent?
1
0
5
@TZahavy
Tom Zahavy
4 years
New GOAT debate! ⛹🏾🤖🐐
@GalantiTomer
Tomer Galanti
4 years
I guess only the ML community would understand.. 😄
Tweet media one
3
1
118
1
0
5
@TZahavy
Tom Zahavy
1 year
AI ∩ chess Twitter: I found this method peculiar and decided to participate myself. It suggests overfitting to puzzles by repeatedly solving them. Do we humans not overfit? Or is there something special about puzzles that makes them hard to overfit on? 🤔
Tweet media one
2
1
5
@TZahavy
Tom Zahavy
2 years
With great collaborators @jelennal_ , @flennerhag , Yannick Schroecker, @dabelcs , @TZahavy and Satinder Singh. #ICLR22 More details in our paper:
0
0
5
@TZahavy
Tom Zahavy
2 years
@ted_moskovitz @DeepMind @bodonoghue85 Excited to have you in the Discovery team @ted_moskovitz !
0
0
5
@TZahavy
Tom Zahavy
2 years
Lastly, thanks again to @ted_moskovitz for doing brilliant work on this, and to our co-authors and advisors at @DeepMind n/n
0
0
4
@TZahavy
Tom Zahavy
3 years
@ylecun @HazanPrinceton In its dual formulation, RL is bilinear in the reward and the state occupancy; hence the reward is indeed the (negative) gradient of the objective w.r.t. the state occupancy.
1
0
4
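In symbols (a standard sketch of the dual linear-programming view this reply refers to, with K the polytope of valid occupancy measures):

```latex
% Dual (occupancy-measure) form of RL (sketch):
\max_{d \in \mathcal{K}} \;\langle r, d \rangle,
\qquad
\nabla_{d}\,\langle r, d \rangle \;=\; r,
% so for a cost c = -r, the reward is the negative gradient of the
% objective with respect to the state occupancy d.
```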
@TZahavy
Tom Zahavy
2 years
@CULLYAntoine Thanks! Waiting to hear your thoughts :)
0
0
3
@TZahavy
Tom Zahavy
6 months
@du_yilun Interesting work! You might find our recent paper relevant
1
1
3
@TZahavy
Tom Zahavy
3 years
And in case you missed them, please check out our works: Meta gradients in constrained MDPs: Discovering a set of policies for the worst case reward:
0
0
3
@TZahavy
Tom Zahavy
5 months
1
0
3
@TZahavy
Tom Zahavy
2 years
@RobertTLange @flennerhag Welcome to the team Rob!
0
0
3
@TZahavy
Tom Zahavy
2 years
One example is DOMiNO (): an RL agent that discovers high-quality and diverse policies by maximizing diversity under the constraint of being nearly optimal, and demonstrates robustness to perturbed environments. 3/n
1
0
3
@TZahavy
Tom Zahavy
3 years
On a personal note, I've been working on this problem for the last few years, focusing mainly on apprenticeship learning (, ). I am very excited that we were able to generalize this framework to cover many important problems [7/7].
1
0
3
@TZahavy
Tom Zahavy
2 years
But gradient descent-ascent only guarantees that the average (over training) of the policies converges; it is impossible to guarantee that the actual policy is good at any given point in time! 4/n
Tweet media one
1
0
3
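The standard illustration of this failure (a textbook fact, not specific to the paper): on the bilinear saddle f(x, y) = xy, simultaneous gradient descent-ascent spirals outward even though its averaged iterates converge.

```latex
% Gradient descent-ascent on f(x, y) = xy with step size \eta:
x_{t+1} = x_t - \eta\, y_t, \qquad y_{t+1} = y_t + \eta\, x_t
\;\;\Longrightarrow\;\;
x_{t+1}^2 + y_{t+1}^2 = (1 + \eta^2)\,(x_t^2 + y_t^2),
% so the iterates drift away from the saddle (0, 0), while the running
% averages \bar{x}_T, \bar{y}_T still converge to it.
```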
@TZahavy
Tom Zahavy
3 years
@CsabaSzepesvari @peter_richtarik Fight gatekeeping: when a reviewer targets a specific paper from conference to conference. This is happening, and we need to monitor and stop it.
0
0
3
@TZahavy
Tom Zahavy
4 years
@BachFrancis It would be great if we could get a clarification about this, as it is being actively ignored atm
0
0
3
@TZahavy
Tom Zahavy
5 years
Just finished my ICML submissions and on my way to AAAI & ALT !!! Starting the week at the GenPlan workshop, where we present work on inverse RL in contextual MDPs and show, theoretically and empirically, zero-shot transfer.
1
1
3
@TZahavy
Tom Zahavy
2 years
1
0
3
@TZahavy
Tom Zahavy
3 years
We then show how to reformulate the problem as a min-max game between policy and cost (negative reward) 'players', using Fenchel duality. We propose a meta-algorithm for solving this game and show that it unifies many existing algorithms in the literature. [3/7]
Tweet media one
1
0
3
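In symbols (a sketch using the Fenchel conjugate f*; valid for convex, closed f):

```latex
% Fenchel-duality reformulation (sketch):
\min_{d \in \mathcal{K}} f(d)
\;=\;
\min_{d \in \mathcal{K}} \;\max_{c}\;
\big\langle c, d \big\rangle - f^{*}(c),
% a min-max game between a policy player choosing the occupancy d
% and a cost player choosing c.
```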
@TZahavy
Tom Zahavy
3 years
We then show that, for a specific instance of the meta-algorithm, the non-stationary reward is simply the gradient of the cost with respect to the state occupancy of the previous policies. [5/7]
Tweet media one
1
0
3
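Concretely (a sketch following the min-max form above, where the cost player's best response to an occupancy is a gradient of f):

```latex
% Non-stationary reward as a gradient (sketch):
c_k \;=\; \nabla f(\bar{d}_k), \qquad r_k \;=\; -\,c_k,
% where \bar{d}_k is the state occupancy induced by the previous
% policies; the next policy is then trained against r_k.
```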
@TZahavy
Tom Zahavy
3 years
Oldies but goldies 😂
@ArashMarkazi
Arash Markazi
3 years
Carmelo Anthony — 37 years old LeBron James – 36 years old Trevor Ariza – 36 years old Marc Gasol – 36 years old Dwight Howard – 35 years old (turning 36 Dec. 8) Wayne Ellington – 33 years old Kent Bazemore – 32 years old Russell Westbrook — 32 years old
642
1K
7K
0
0
3
@TZahavy
Tom Zahavy
4 years
Follow the regularized leader 🧝🏻‍♀️
@bodonoghue85
Brendan O'Donoghue
4 years
End the duality gap! 💪
0
1
7
0
0
3
@TZahavy
Tom Zahavy
3 years
@pcastr @JohnCLangford In a similar fashion, one thing I found really nice at #UAI was the discussant format: an author from each accepted paper was asked to prepare a slide with a question or discussion point about another paper, which really improved the Q&A.
0
0
3
@TZahavy
Tom Zahavy
3 years
See you soon!
Tweet media one
0
0
2
@TZahavy
Tom Zahavy
2 years
We demonstrate that our QD policies are robust to environment perturbations: we train DOMiNO in a "baseline" domain and highlight the QD policies that remain robust under domain perturbations.
1
0
2
@TZahavy
Tom Zahavy
2 years
Confused? Consider a simple saddle-point problem. The dynamics of gradient descent-ascent (orange) spiral away from the optimal point, but an OPTIMISTIC version of the gradient bends inward and converges 🧙🧙‍♂️🧙‍♀️ 5/n
Tweet media one
1
0
2
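A minimal numeric sketch of that picture on the toy saddle f(x, y) = x*y (the "optimistic" update here is the generic optimistic-gradient rule, not the paper's exact algorithm):

```python
import numpy as np

def run(optimistic, eta=0.1, steps=200):
    """Min over x, max over y of f(x, y) = x * y, starting from (1, 1)."""
    x, y = 1.0, 1.0
    gx_prev, gy_prev = y, x                    # previous gradients
    for _ in range(steps):
        gx, gy = y, x                          # df/dx = y, df/dy = x
        if optimistic:                         # extrapolate with last gradient
            x, y = x - eta * (2 * gx - gx_prev), y + eta * (2 * gy - gy_prev)
        else:                                  # plain gradient descent-ascent
            x, y = x - eta * gx, y + eta * gy
        gx_prev, gy_prev = gx, gy
    return np.hypot(x, y)                      # distance from the saddle (0, 0)

print(run(optimistic=False))  # grows: GDA spirals away
print(run(optimistic=True))   # shrinks: the optimistic version converges
```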