When we started this project the idea of training world models *exclusively* from Internet videos seemed wild, but it turns out latent actions are the key and the bitter lesson holds. Now we have a viable path to generating the rich diversity of environments we need for AGI. 🚀
I am really excited to reveal what @GoogleDeepMind's Open Endedness Team has been up to 🚀. We introduce Genie 🧞, a foundation world model trained exclusively from Internet videos that can generate an endless variety of action-controllable 2D worlds given image prompts.
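For intuition, here is a minimal sketch of the latent-action idea, under heavy simplifications of my own (feature vectors in place of video frames, a Gumbel-softmax in place of Genie's actual tokenization): infer a small discrete action code between consecutive frames, then condition a dynamics model on it, so no action labels are ever needed.

```python
# Minimal latent-action world model sketch (illustrative, NOT Genie's architecture).
# A latent action model infers a discrete code from (x_t, x_next); the dynamics
# model learns to predict x_next from (x_t, code). At inference time a user picks
# the code directly, making the model action-controllable without action labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_ACTIONS, D = 8, 64  # tiny discrete action codebook; frame-embedding width

class LatentActionWorldModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(2 * D, N_ACTIONS)      # (x_t, x_next) -> action logits
        self.codebook = nn.Embedding(N_ACTIONS, D)  # embeddings of latent actions
        self.dyn = nn.Sequential(nn.Linear(2 * D, 256), nn.ReLU(), nn.Linear(256, D))

    def forward(self, x_t, x_next):
        logits = self.enc(torch.cat([x_t, x_next], dim=-1))
        code = F.gumbel_softmax(logits, tau=1.0, hard=True)  # discrete, differentiable
        a_emb = code @ self.codebook.weight
        pred = self.dyn(torch.cat([x_t, a_emb], dim=-1))     # predict the next frame
        return F.mse_loss(pred, x_next)

model = LatentActionWorldModel()
x_t, x_next = torch.randn(32, D), torch.randn(32, D)  # stand-ins for frame embeddings
model(x_t, x_next).backward()  # trained purely from video pairs: no action annotations
```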
I'm super excited to be joining @DeepMind today as a Research Scientist, working with @_rockt! Thank you to everyone who helped make this possible! Watch this space 🌱
🤖 Introducing the first survey on AutoRL: methods for automatically discovering multiple components of the RL training pipeline, from tuning hyperparameters and architectures to learning algorithms or automatically designing environments. Link 👉 [1/4]
Evolving Curricula with Regret-Based Environment Design
Website:
Paper:
TL;DR: We introduce a new open-ended RL algorithm that produces complex levels and a robust agent that can solve them (e.g. below).
Highlights ⬇️! [1/N]
I always love hearing from former ML PhD students about the days before tensorflow/pytorch... maybe in a few years we will tell current PhD students about the time before free MuJoCo 🙌
We’ve acquired the MuJoCo physics simulator () and are making it free for all, to support research everywhere. MuJoCo is a fast, powerful, easy-to-use, and soon to be open-source simulation tool, designed for robotics research:
Feel very fortunate to have contributed to this as my first project @DeepMind! It is amazing to see what can be done when combining Transformer models with meta-RL and PLR in a vast, open-ended task space!
I’m super excited to share our work on AdA: An Adaptive Agent capable of hypothesis-driven exploration which solves challenging unseen tasks with just a handful of experience, at a similar timescale to humans.
See the thread for more details 👇 [1/N]
The case for offline RL is clear: we often have access to real world data in settings where it is expensive (and potentially even dangerous) to collect new experience. But what happens if this offline data doesn’t perfectly match the test environment? [1/8]
Population Based Training (PBT) has been shown to be successful in a variety of RL settings, but often requires vast computational resources 💰. To address this, last year we introduced Population Based Bandits (PB2) [1/N]
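For anyone new to these methods, a stripped-down sketch of the exploit/explore loop PBT runs (illustrative only; PB2's contribution is replacing the random perturbation below with a bandit-driven choice of new hyperparameters):

```python
import copy
import random

# Toy PBT step: the bottom quartile copies weights from top performers (exploit)
# and randomly perturbs the inherited hyperparameters (explore).
def pbt_step(population, evaluate):
    scores = [evaluate(member) for member in population]
    ranked = sorted(range(len(population)), key=lambda i: scores[i])
    quartile = max(1, len(population) // 4)
    for bad in ranked[:quartile]:
        good = random.choice(ranked[-quartile:])
        population[bad]["weights"] = copy.deepcopy(population[good]["weights"])
        population[bad]["hparams"] = {
            k: v * random.choice([0.8, 1.2])  # PB2 replaces this with a GP-bandit
            for k, v in population[good]["hparams"].items()
        }
    return population

population = [{"weights": [0.0], "hparams": {"lr": 10 ** -random.uniform(2, 5)}}
              for _ in range(8)]
population = pbt_step(population, evaluate=lambda m: -m["hparams"]["lr"])  # dummy metric
```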
Super exciting time to work on population-based methods! We already have fast data collection, now this paper shows vectorizing agent updates can lead to huge speedups (on a GPU):
Looking forward to discussing with the authors (@instadeepai) at #ICML2022 😀
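The gist of that speedup can be sketched with jax.vmap, with a toy supervised loss standing in for each agent's RL update (this is illustrative, not the paper's code):

```python
import jax
import jax.numpy as jnp

# One fused, jitted kernel updates the whole population at once instead of
# looping over agents in Python.
def loss(params, batch):
    x, y = batch
    return jnp.mean((x @ params - y) ** 2)  # stand-in for an agent's RL objective

def sgd_step(params, batch, lr=1e-2):
    return params - lr * jax.grad(loss)(params, batch)

pop_params = jnp.zeros((128, 16))                         # 128 agents, 16 weights each
batches = (jnp.ones((128, 32, 16)), jnp.ones((128, 32)))  # one batch per agent
pop_update = jax.jit(jax.vmap(sgd_step, in_axes=(0, 0)))
pop_params = pop_update(pop_params, batches)              # all agents updated in parallel
```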
Working closely with many amazing members of @UCL_DARK (and @robertarail) over the past few years has been a privilege and I am *also* super excited to make this official!! 😎🚀
We are super excited to announce that Dr Roberta Raileanu (@robertarail) and Dr Jack Parker-Holder (@jparkerholder) have joined @UCL_DARK as Honorary Lecturers! Both have done impressive work in Reinforcement Learning and Open-Endedness, and our lab is lucky to get their support.
Heading to Baltimore for #ICML2022 ✈️ Will be presenting ACCEL on Thursday and would love to chat about unsupervised environment design and open-endedness with many of you there! DM if you're around and want to catch up 😀
Heading to @NeurIPSConf tomorrow, would be great to chat about open-endedness, RL, world models or England's chances at the World Cup 😀 DMs open! #NeurIPS2022
If you're thinking of applying for PhDs, interested in open-endedness/foundation models and don't mind rainy weather 🇬🇧, then consider applying to @UCL_DARK! My DMs are open and I'll be in New Orleans for NeurIPS so please get in touch if this sounds like you! 😀
We (@_rockt, @egrefen, @robertarail, and @jparkerholder) are looking for PhD students to join us in Fall 2024. If you are interested in Open-Endedness, RL & Foundation Models, then apply here: and also write us at ucl-dark-admissions@googlegroups.com
I’ll be ✈️ to #NeurIPS2023 on Monday and hoping to discuss:
- open-endedness and why it matters for AGI #iykyk
- world models
- why it’s never been a better time to do a PhD in ML (especially @UCL_DARK 😉)!
Find me at two posters + @aloeworkshop + hanging around the GDM booth 🤪
Not sure who needs to hear this, but effectively filtering large and noisy datasets is a gift that keeps on giving!! 🎁 Often more impactful than fancy new model architectures 😅 We found this same thing in RL with autocurricula (e.g. PLR, ACCEL), and I'd bet it works elsewhere
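The underlying pattern is simple enough to sketch: score every example, keep only the best slice. Everything below is hypothetical; in PLR/ACCEL the score would be a per-level regret estimate, while for web data it might come from a quality classifier:

```python
import numpy as np

def filter_dataset(examples, score_fn, keep_frac=0.3):
    """Keep only the top `keep_frac` of examples by score."""
    scores = np.array([score_fn(ex) for ex in examples])
    cutoff = np.quantile(scores, 1.0 - keep_frac)
    return [ex for ex, s in zip(examples, scores) if s >= cutoff]

data = [{"text": f"doc {i}", "noise": np.random.rand()} for i in range(1000)]
clean = filter_dataset(data, score_fn=lambda ex: 1.0 - ex["noise"])  # toy quality score
print(len(clean))  # roughly the 300 highest-quality examples survive
```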
Going for action-free training is a total game changer and it helps to do it with someone who has been thinking about this for years () who happens to also be one of the nicest people ever
This was such a fun and rewarding project to work on. Amazing job by the team! The most exciting thing for me is that we were able to achieve this without using a single doggone action label, which believe me, was not easy!
I'm super excited to share AlphaZeroᵈᵇ, a team of diverse #AlphaZero agents that collaborate to solve #Chess puzzles and demonstrate increased creativity. Check out our paper to learn more!
A quick 🧵(1/n)
For anyone interested in finding diverse solutions for exploration or generalization, this is worth checking out! Was awesome to work on this project and I'm excited to see where the next ridges take us!! 🚀
The gradient is a locally greedy direction. Where do you get to if you follow the eigenvectors of the Hessian instead? Our new paper, “Ridge Rider” (), explores how to do this and what happens in a variety of (toy) problems (if you dare to do so)... Thread 1/N
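A toy illustration of the idea, with simplifications of my own (a finite-difference Hessian and a 2D saddle): from a saddle point, each negative-curvature eigenvector is a distinct "ridge", and riding different ridges reaches different solutions, rather than the single one gradient descent would greedily pick.

```python
import numpy as np

def f(x):  # simple saddle at the origin, with two minima at x = (+-1, 0)
    return (x[0] ** 2 - 1) ** 2 + x[1] ** 2

def hessian(f, x, eps=1e-4):
    """Central-difference Hessian, fine for a toy 2D problem."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i] * eps, np.eye(n)[j] * eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps ** 2)
    return H

x = np.zeros(2)  # the saddle point
eigvals, eigvecs = np.linalg.eigh(hessian(f, x))
for k in np.where(eigvals < 0)[0]:   # each negative-curvature direction is a ridge
    for sign in (+1, -1):            # ride it both ways -> two different minima
        step = x + 0.5 * sign * eigvecs[:, k]
        print(f"ridge {k}, direction {sign:+d}: f = {f(step):.3f}")
```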
With Bayesian Generational PBT we can update *both* architectures and >10 hyperparameters on the fly in a single run 😮 Even better, it’s fast with parallel simulators ⚡️… great time to work in this area!!
(1/7) Population Based Training (PBT) has been shown to be highly effective for tuning hyperparameters (HPs) for deep RL. Now with the advent of massively parallel simulators, there has never been a better time to use these methods! However, PBT has a couple of key problems…
In addition to a Research Engineer, we are also looking for a Research Scientist 🧑‍🔬 to join @DeepMind's Open-Endedness Team!
If you are excited about the intersection of open-ended, self-improving, generalist AI and foundation models, please apply 👇
Predicting the next word "only" is sufficient for language models to learn a large body of knowledge that enables them to code, answer questions, understand many topics, chat, and so on.
This is clear to many researchers now, and there are nice tutorials on why this works by…
By curating *randomly generated* environments we can produce a curriculum that makes it possible for a student agent to transfer zero-shot to challenging human designed ones, including Formula One tracks 🏎️... maybe one day F1 teams will use PLR? 😀 come check it out @NeurIPSConf
🏎️ Replay-Guided Adversarial Environment Design
Prioritized Level Replay (PLR) is secretly a form of unsupervised environment design. This leads to new theory improving PLR + impressive zero-shot transfer, like driving the Nürburgring Grand Prix.
paper:
We introduce ACCEL, a new algorithm that extends replay-based Unsupervised Environment Design (UED) (e.g. ) by including an *editor*. The editor makes small changes to previously useful levels, which compound over time to produce complex structures. [2/N]
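The loop is compact enough to sketch. This is an illustrative skeleton with hypothetical function names, not the released codebase: random generation proposes levels, the agent trains only on replayed high-regret levels, and the editor mutates those levels so complexity compounds.

```python
import random

def accel_step(buffer, agent, generate, edit, regret, train):
    if not buffer or random.random() < 0.5:
        level = generate()  # propose a fresh randomly generated level
    else:
        level = max(buffer, key=lambda l: regret(agent, l))  # replay high regret
        train(agent, level)   # the agent only ever trains on replayed levels
        level = edit(level)   # small mutation of a previously useful level
    if regret(agent, level) > 0:  # curate: keep only levels the agent can't yet master
        buffer.append(level)
    return buffer
```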
I think the most exciting thing about the current research paradigm is a shift in focus from *solutions* -> *stepping stones*.
Every time a new LLM or VLM comes out it immediately enables new capabilities in a variety of unexpected downstream areas. What a time to be alive 🌱
Was super fun chatting with @kanjun and @joshalbrecht, hopefully I said something useful in there somewhere! Also interesting to see how much has changed since we spoke in August (both in the field and for @genintelligent 🚀) what a time to be an AI researcher!! 😀
Had a really fun convo with @jparkerholder about co-evolving RL agents & environments, alternatives & blockers to population-based training, and why we aren't thinking properly about data efficiency in RL. We also discussed how Jack managed so many papers during his PhD 💪!
We are hiring for @DeepMind’s Open-Endedness team. If you have expertise in topics such as RL, evolutionary computation, PCG, quality diversity, novelty search, generative modelling, world models, intrinsic motivation etc., then please consider applying!
As we see with Genie, foundation world models trained from videos offer the potential for generating the environments we need for AGI 🎮. New paper by @mengjiao_yang laying out all the possibilities in the space, exciting times 🚀
Video as the New Language for Real-World Decision Making
Both text and video data are abundant on the internet and support large-scale self-supervised learning through next token or frame prediction. However, they have not been equally leveraged: language models have had…
Check out our #NeurIPS2022 paper showing we can train more general world models by collecting data with a diverse population of agents! Great work by @YingchenX and team!! Come chat to us in New Orleans 😀
PSA: you can use linear models in deep RL papers and still get accepted at #ICML2021!! Congrats to @philipjohnball and @cong_ml... now let’s try and beat ViT with ridge regression :)
We can now scale UED to competitive multi-agent RL!! This plot is my favorite, showing that the agent-level dependence clearly matters 🤹‍♂️ come check out the paper at #ICLR2023
A key insight for multi-agent settings is that, from the perspective of the teacher, maximising the student’s regret over co-players independently of the environment (and vice versa) doesn’t guarantee maximising regret in the joint space of co-players and environments.
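In symbols (my notation, not necessarily the paper's): writing Regret(π, θ) for the student's regret against co-player π in environment θ, optimising each coordinate against the average of the other can fall strictly short of the joint maximum:

```latex
\[
\max_{\pi,\,\theta} \operatorname{Regret}(\pi, \theta)
\;\ge\;
\operatorname{Regret}\!\Big(
  \arg\max_{\pi} \mathbb{E}_{\theta}\big[\operatorname{Regret}(\pi, \theta)\big],\;
  \arg\max_{\theta} \mathbb{E}_{\pi}\big[\operatorname{Regret}(\pi, \theta)\big]
\Big),
\]
% with the inequality strict in general, so the teacher must search the
% joint space of co-players and environments.
```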
Probably the shortest reviews I’ve ever seen for a top tier conference… maybe we can use them as a prompt for a language model to generate more thorough reviews?? 🤔 #ICML2022
We're excited to announce that the Genie Team from @GoogleDeepMind will be our next invited speakers!
Title: Genie: Generative Interactive Environments
Speakers: @ashrewards, @jparkerholder, @YugeTen
Sign up:
📌 90 High Holborn
📅 Tue 30 Apr, 17:00
Thank you @maxjaderberg!! XLand was super inspiring for us, it showed that our current RL algorithms are already capable of amazing things when given sufficiently rich and diverse environments. Can't wait to push this direction further with future versions of Genie 🚀🚀
Very cool to see the @GoogleDeepMind Genie results: learning an action-conditional generative model purely unsupervised from video data. This is close to my heart in getting towards truly open-ended environments to train truly general agents with RL 1/
Great news!! ALOE is back and in person. If you’re heading to @NeurIPSConf and interested in open-endedness, adaptive curricula or self-driven learning systems then hopefully see you there 🕺
🌱 The 2nd Agent Learning in Open-Endedness Workshop will be held at NeurIPS 2023 (Dec 10–16) in magnificent New Orleans. ⚜️
If your research considers learning in open-ended settings, consider submitting your work (by 11:59 PM Sept. 29th, AoE).
One amazing thing Genie enables: anyone, including children, can draw a world and then *step into it* and explore it!! How cool is that!?! We tried this with drawings my children made, to their delight. My child drew this, and now can fly the eagles around. Magic!🧞✨
Thanks to a fantastic effort from @MinqiJiang, all the code from our recent work on UED is now public!! Excited to see the new ideas that come from this! 🍿
We have open sourced our recent algorithms for Unsupervised Environment Design! These algorithms produce adaptive curricula that result in robust RL agents. This codebase includes our implementations of ACCEL, Robust PLR, and PAIRED.
How can we learn a foundational world model directly from Internet-scale videos without any action annotations?
@YugeTen, @ashrewards and @jparkerholder from @GoogleDeepMind's Open-Endedness Team are presenting "Genie: Generative Interactive Environments" at the @UCL_DARK Seminar
💯 and as many have pointed out, this is the worst video models are ever going to be. Super exciting to see the impact these models will have when used as world simulators with open-ended learning
So, rather than considering video models as a poor approximation to a real simulation engine, I think it's interesting to also consider them as something more: a new kind of world simulation that is in many ways far more complete than anything we have had before.
3/3
Super exciting to see improved techniques for generating synthetic data for agents! Awesome work from @JacksonMattT and team, plenty more to be done in this space 🚀🚀🚀
🎮 Introducing the new and improved Policy-Guided Diffusion!
Vastly more accurate trajectory generation than autoregressive models, with strong gains in offline RL performance!
Plus a ton of new theory and results since our NeurIPS workshop paper...
Check it out ⤵️
🧬 For ACCEL, we made an interactive paper to accompany the typical PDF we all know and love. "Figure 1" is a demo that lets you challenge our agents by designing your own environments! Now you can also view agents from many training runs simultaneously.
We are open for submissions!
I know there are lots of people working on large models, pretraining, cross-domain/agent generalization for RL. Please submit your papers to the 1st FMDM workshop at NeurIPS 2022!
We are pleased to announce the first *controllable video generation* workshop at @icmlconf 2024! 📽️📽️📽️
We welcome submissions that explore video generation via different modes of control (e.g. text, pose, action).
Deadline: 31st May AOE
Website:
PSA: we are super excited to announce the workshop on Agent Learning in Open-Endedness (ALOE) at #ICLR2022! If you're interested in open-ended learning systems then check out the amazing speaker line-up and the CfP 😀
Announcing the first Agent Learning in Open-Endedness (ALOE) Workshop at #ICLR2022!
We're calling for papers across many fields: If you work on open-ended learning, consider submitting. Paper deadline is February 25, 2022, AoE.
🥚Eggsclusive🥚… introducing the first workshop on Environment Generation for Generalizable Robots at #RSS2023!! This workshop brings together many topics close to my heart: PCG, large offline datasets, generative modelling and much more! More info from @vbhatt_cs ⬇️⬇️⬇️
We are excited to announce the first workshop on Environment Generation for Generalizable Robots (EGG) at #RSS2023 ()! Consider submitting if you are working in any area relevant to environment generation for robotics. Submissions due on May 17, 2023, AoE.
Uncovering vulnerabilities in multi-agent systems with the power of Open-Endedness!
Introducing MADRID: Multi-Agent Diagnostics for Robustness via Illuminated Diversity ⚽️
Paper:
Site:
Code: 🔜
Here's what it's all about: 🧵👇
Thanks to @Bam4d, we now have a MiniHack Level Editor inside a browser which allows you to easily design custom MiniHack environments using convenient drag-and-drop functionality. Check it out at
Spent time with the Google DeepMind team in London this week, including the people working on our next generation models. Great to see the exciting progress and talk to @demishassabis and the teams about the future of AI.
It has been a dream to work on Genie with such fantastic people, I’ve learned so much from all of them. We've also had a lot of fun, for example, using our model trained on platformers to convert random pictures of our pets into playable worlds 🤯🐶
Access to diverse partners is crucial when training robust cooperators or evaluating ad-hoc coordination. In our top 25% #iclr2023 paper, we tackle the challenge of generating diverse cooperative policies and expose the issue of "sabotages" affecting simpler methods.
A 🧵!
Learned adversaries are back 😎... after some amazing work from @ishitamed, a variant of PAIRED can now match our previous SOTA UED algorithms (ACCEL and Robust PLR). This should unlock some exciting new research directions for autocurricula and environment generation 🚀
Despite starting simple, levels in the replay buffer quickly become complex. Not only that, but ACCEL agents are capable of transfer to challenging human designed out-of-distribution environments, outperforming several strong baselines! [3/N]
We're excited to present @UCL_DARK's work at #NeurIPS2021 and look forward to seeing you at the virtual conference!
Check out all poster sessions and activities by our members below 👇
📉 GD can be biased towards finding 'easy' solutions 🐈 By following the eigenvectors of the Hessian with negative eigenvalues, Ridge Rider explores a diverse set of solutions 🎨
#mlcollage [40]
📜:
💻:
🎬:
Super stoked to be back at @DeepMind in London, this time as a Research Scientist in the Open-Endedness team! I look forward to working with all my brilliant colleagues here!
Soccer players have to master a range of dynamic skills, from turning and kicking to chasing a ball. How could robots do the same? ⚽
We trained our AI agents to demonstrate a range of agile behaviors using reinforcement learning.
Here’s how. 🧵
Why generate one adversarial prompt when you can instead generate them all… And then train a drastically more robust model 🌈🌈🌈
Amazing work from @_samvelyan, @_andreilupu, @sharathraparthy and team!!
Introducing 🌈 Rainbow Teaming, a new method for generating diverse adversarial prompts for LLMs via LLMs
It's a versatile tool 🛠️ for diagnosing model vulnerabilities across domains and creating data to enhance robustness & safety 🦺
Co-lead w/ @sharathraparthy & @_andreilupu
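A hedged sketch of the quality-diversity pattern behind it, with every function name hypothetical: `mutate_prompt` stands in for LLM-driven mutation and `attack_success` for a judge scoring whether the target model misbehaves.

```python
import random

def rainbow_style_search(seed_prompts, features, mutate_prompt, attack_success,
                         iters=1000):
    archive = {}  # one elite prompt per feature cell (e.g. risk category x style)
    for p in seed_prompts:
        archive[features(p)] = (attack_success(p), p)
    for _ in range(iters):
        _, parent = random.choice(list(archive.values()))
        child = mutate_prompt(parent)              # e.g. ask an LLM for a variation
        cell, score = features(child), attack_success(child)
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, child)         # keep the per-cell best attack
    return [p for _, p in archive.values()]        # diverse *and* effective prompts
```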
Given the empirical gains, we wanted to see how far we could push the ACCEL agent. It turns out it gets over 50% success rate on mazes over an order of magnitude larger than the training curriculum! The next best baseline was PLR (25% success), while other methods failed. [4/N]
New Article: "Automated Reinforcement Learning (AutoRL): A Survey and Open Problems" by Parker-Holder, Rajan, Song, Biedenkapp, Miao, Eimer, Zhang, Nguyen, Calandra, Faust, Hutter and Lindauer
AutoRL faces significant challenges not seen in typical AutoML problems, leading to a distinct set of methods. In addition, the diversity of RL problems means methods span a wide range of communities. We provide a common taxonomy, discuss each area and pose open problems. [3/4]
Access to useful data is critical for training (and scaling) RL agents... and now we can cheaply generate it 😎! We have been discussing this type of thing for a while and diffusion seems to be the missing ingredient 🧑‍🍳 Amazing work as always by @cong_ml & @philipjohnball!!
RL agents 🤖 need a lot of data, which they usually need to gather themselves. But does that data need to be real? Enter *Synthetic Experience Replay*, leveraging recent advances in #GenerativeAI in order to vastly upsample ⬆️ an agent’s training data!
[1/N]
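The recipe reduces to a tiny skeleton. Here a Gaussian model stands in for the diffusion model purely to keep the sketch self-contained and runnable:

```python
import numpy as np

class GaussianStandIn:  # placeholder for the diffusion model over transitions
    def fit(self, X):
        self.mu, self.sigma = X.mean(0), X.std(0) + 1e-6
        return self

    def sample(self, n):
        return self.mu + self.sigma * np.random.randn(n, len(self.mu))

real = np.random.randn(10_000, 8)           # flattened (s, a, r, s') transitions
generator = GaussianStandIn().fit(real)     # fit the generative model to real data
synthetic = generator.sample(90_000)        # vastly upsample the experience
buffer = np.concatenate([real, synthetic])  # the agent trains on the mixture
```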
Couldn’t agree more with this! It’s also unfairly biased towards people who have more recently taken an algorithms class. I don’t see how there’s any useful signal in these interviews and it just adds loads of stress for candidates
I rarely rant, but Leetcode is about the stupidest interview for AI scientist positions. It is totally out of distribution for daily tasks in AI, and doesn’t at all reflect research taste & skills.
I honestly don’t think most tenured AI profs can solve hard Leetcode Qs without…
Effective Diversity in Population-Based Reinforcement Learning
Interesting work that looks at ways to increase diversity in behaviors found using population-based methods for RL. Comparisons made to existing Evolution Strategies and Novelty Search methods
We also tested ACCEL in the BipedalWalker environment. ACCEL produces agents that are robust to a wide range of individual challenges, while the baselines often struggle to solve even the simple test tasks. [5/N]
AutoRL has been shown to be effective for training RL agents on new problems where optimal configurations are not known, while also providing opportunities for significant performance gains on existing problems with access to more resources. 🚀 [2/4]
Given the strength and simplicity of ACCEL, we think there is huge potential for future work. In particular, scaling to larger problems may require additional mechanisms to directly encourage diversity or adapt agent configurations. Plenty to do here! [7/N]
If you don’t know about UED yet then check out this thread👇 These methods look set to play an increasingly prominent role as we seek to train more general agents for the real world 🚀
If after 3 years of @MichaelD1729's work on Unsupervised Environment Design (leading to & ) you are still using domain randomization (DR) for training more robust agents, consider PLR as a drop-in replacement!
@DrJimFan Totally agreed, lucky I met @ashrewards in my first week at GDM, who has been doing this for years - but I think we were all surprised by how consistent the actions become at scale and it makes so much sense for world models
Offline RL from pixels starter pack:
* new datasets featuring visual observations ✅
* competitive baselines ✅
* a set of exciting open problems ✅
...time to get started!! 🚀
Offline RL offers tremendous potential for training agents from large pre-collected datasets. However, the majority of work focuses on the proprioceptive setting. In this work we release the first public benchmark for continuous control using *visual observations*, V-D4RL. [1/N]
Interested in learning behaviors from offline data? Check out V-D4RL for a set of standardized datasets and baselines… already used in some exciting recent papers and now published in @TmlrOrg 🔥🔥
Delighted that V-D4RL has been accepted at TMLR! Our benchmark and algorithms are the perfect way to start studying offline RL from pixels.
As performance in proprioceptive envs saturates, it’s increasingly necessary to look further! 🧐 Here are some notable uses so far…
[1/N]
PS: if these don't happen to be your research interests... I'd also happily spend hours talking about being a new parent or Chelsea FC's prospects for the upcoming season!
Excited to say that our #AISTATS2022 paper “Towards an Understanding of Default Policies in Multitask Policy Optimization” was given an Honorable Mention for Best Paper! If you’re interested in hearing more (or are very bored), stop by our poster tomorrow at 4:30 BST
1/
One of the main questions we get asked about Genie is where the rewards would come from. This work shows we can learn "well shaped rewards purely from internet-video" 😎... Looks like the pieces are coming together 🧩
Also, since this is simple classification, we can apply it to non-robotic datasets such as Ego4D - ranking frames temporally within a video and using *other* videos as negatives for the discriminator. This results in well-shaped rewards purely from internet video (9/10)
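A minimal sketch of that classification setup, assuming precomputed frame embeddings (the architecture and shapes are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

# Train a discriminator to score temporal progress: later frames from a video are
# positives; earlier frames and frames from *other* videos are negatives. Its
# output then serves as a shaped reward for RL.
disc = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def train_step(later, earlier, other_videos):
    pos = disc(later)
    neg = disc(torch.cat([earlier, other_videos]))
    loss = bce(torch.cat([pos, neg]),
               torch.cat([torch.ones_like(pos), torch.zeros_like(neg)]))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

reward = lambda frame_emb: disc(frame_emb).detach()  # shaped reward from video alone
```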
Proud and thankful for my wonderful (human) collaborators. We are all thrilled with our accepted @NeurIPSConf papers... except for Doris who is now fighting for authorship. Will get her a bone instead :)
cc @aldopacchiano @nguyentienvu @j_foerst and others!
Note that in all cases the complexity is emergent: There is no bonus for adding blocks or stumps, but this naturally occurs in the pursuit of high regret. Using the criteria from POET, we see that the ACCEL agent actually produces “Extremely Challenging” levels. [6/N]
Autocurricula can produce more general agents...but can be expensive to run 💸.
Today, we're releasing minimax, a JAX library for RL autocurricula with 120x faster baselines. Runs that took 1 week now take < 3 hours.
Paper:
This work was led by @philipjohnball and @cong_ml and will be presented as a Spotlight at the #ICLR2021 SSL-RL workshop.
Paper:
Website:
Please get in touch with any questions!! [8/8]
The surge in #OpenEndedness research on arXiv marks a burgeoning interest in the field!
The ascent is largely propelled by the trailblazing contributions of visionaries like @kenneth0stanley, @jeffclune, and @joelbot3000, whose work continues to pave new pathways.
This is one of the biggest issues with RL papers in my view... and it is compounded when there are also different versions of benchmarks or when baselines use different hyperparameters/architectures. Looks like great work! 👀
We also show that alternative evaluation protocols, such as taking the maximum across runs or during training, are incompatible with end-of-training performance results. On Atari 100k, we find that the two protocols produce substantially different results. (5/N)
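A toy simulation shows why the protocols disagree even when every run has identical underlying performance:

```python
import numpy as np

rng = np.random.default_rng(0)
curves = rng.normal(loc=1.0, scale=0.3, size=(5, 100))  # 5 seeds x 100 noisy evals

end_performance = curves[:, -1].mean()         # final score, averaged over seeds
max_over_training = curves.max(axis=1).mean()  # best checkpoint per seed, averaged
max_over_runs = curves[:, -1].max()            # best single seed

print(end_performance, max_over_training, max_over_runs)
# The max-based protocols are biased upward by noise alone; only the first
# estimates what a fresh training run would actually deliver.
```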
(1/2) In 2024 I will be joining Boston University as an Assistant Professor in Computing and Data Sciences (CDS). Seeking Ph.D. students passionate about sequential decision making, reinforcement learning, and/or algorithmic fairness.
I’m curious about effective altruism: how do so many smart people with the goal “do good for the world” wind up with the subgoal “analyze the neurons of GPT-2 small” or something similar?