Mikael Henaff

@HenaffMikael

1,329 Followers · 379 Following · 21 Media · 169 Statuses

Research Scientist at @MetaAI, previously postdoc at @MSFTResearch and PhD at @nyuniversity. All views my own.

New York, USA
Joined January 2019
Pinned Tweet
@HenaffMikael
Mikael Henaff
1 year
Super stoked to share this work led by @proceduralia & @MartinKlissarov . Our method Motif uses LLMs to rank pairs of observation captions and synthesize dense intrinsic rewards specified by natural language. New SOTA on NetHack while being easily steerable. Paper+code in thread!
@proceduralia
Pierluca D'Oro
1 year
Can reinforcement learning from AI feedback unlock new capabilities in AI agents? Introducing Motif, an LLM-powered method for intrinsic motivation from AI feedback. Motif extracts reward functions from Llama 2's preferences and uses them to train agents with reinforcement
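As a rough illustration of the recipe described above, here is a toy, self-contained sketch: a hypothetical `llm_prefers` function stands in for the Llama 2 annotator, and a bag-of-words Bradley-Terry model stands in for the learned reward. All names and modeling choices here are illustrative, not the paper's implementation.

```python
import math

# Hypothetical stand-in for the LLM annotator: Motif asks an LLM which
# of two observation captions shows more progress. This toy scorer
# counts mentions of "level" as a proxy for progress.
def llm_prefers(cap_a, cap_b):
    score = lambda c: c.split().count("level")
    return 0 if score(cap_a) >= score(cap_b) else 1

VOCAB = ["level", "wall", "nothing"]

def featurize(caption):
    return [caption.split().count(w) for w in VOCAB]

def train_reward(pairs, epochs=200, lr=0.5):
    # Bradley-Terry preference model: P(a preferred over b) =
    # sigmoid(r(a) - r(b)), with a linear bag-of-words reward r.
    w = [0.0] * len(VOCAB)
    for _ in range(epochs):
        for cap_a, cap_b in pairs:
            target = 1.0 if llm_prefers(cap_a, cap_b) == 0 else 0.0
            xa, xb = featurize(cap_a), featurize(cap_b)
            ra = sum(wi * xi for wi, xi in zip(w, xa))
            rb = sum(wi * xi for wi, xi in zip(w, xb))
            p_a = 1.0 / (1.0 + math.exp(min(rb - ra, 30.0)))  # clamp avoids overflow
            grad = target - p_a  # gradient of the log-likelihood
            w = [wi + lr * grad * (fa - fb) for wi, fa, fb in zip(w, xa, xb)]
    return w

pairs = [("you descend a level", "you see a wall"),
         ("nothing happens", "you reach a new level")]
w = train_reward(pairs)

def reward(caption):  # dense intrinsic reward for the RL agent
    return sum(wi * xi for wi, xi in zip(w, featurize(caption)))

assert reward("you descend a level") > reward("you see a wall")
```

In the actual system the preferences come from Llama 2 over real NetHack captions, and the resulting reward model is used as a dense intrinsic reward when training the agent with RL.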
@HenaffMikael
Mikael Henaff
13 days
I'm looking for a PhD intern for next year! If you are interested in any combination of: intrinsic motivation, LLM/VLM-guided reward design, long-horizon tasks, hierarchical RL, NetHack, MineCraft, representation learning, I'd love to hear from you. Details below...
@HenaffMikael
Mikael Henaff
2 years
Hiring a #research #intern for 2023 at FAIR ( @MetaAI ), if you're interested in working on exploration, generalization, imitation learning or hierarchical RL please get in touch :)
@HenaffMikael
Mikael Henaff
2 years
Excited to share our @NeurIPSConf paper where we propose E3B--a new algorithm for exploration in varying environments. Paper: Website: E3B sets new SOTA for both MiniHack and reward-free exploration on Habitat. A thread [1/N]
@HenaffMikael
Mikael Henaff
9 months
I am looking for an intern for 2024 to work on the Cortex project in @AIatMeta 's Embodied AI team! Relevant skills include: experience with LLMs/VLMs, EAI simulators such as Habitat, and RL. DM or email at mikaelhenaff [at] meta [dot] com ✨ #AI #InternshipOpportunity #LLM
@HenaffMikael
Mikael Henaff
1 year
Signed. Keeping models open is the best way to ensure high scientific standards for safety research and fair representation in AI development. via @mozilla
@HenaffMikael
Mikael Henaff
1 year
Exploration is well-studied for singleton MDPs, but many envs of interest change across episodes (i.e. procgen envs or embodied AI tasks). How should we explore in this case? In our upcoming @icmlconf oral, we study this question. A thread...1/N
@HenaffMikael
Mikael Henaff
9 months
Very happy to share that our Motif work was accepted at #ICLR2024 :) come say hi in Vienna!
@proceduralia
Pierluca D'Oro
1 year
Can reinforcement learning from AI feedback unlock new capabilities in AI agents? Introducing Motif, an LLM-powered method for intrinsic motivation from AI feedback. Motif extracts reward functions from Llama 2's preferences and uses them to train agents with reinforcement
@HenaffMikael
Mikael Henaff
4 years
A simple way to help with #Covid_19 and medicine generally is to donate spare computer time to biomedical researchers through projects like @foldingathome or @RosettaAtHome . Small contributions add up to make distributed peta/exaFLOP supercomputers!
@HenaffMikael
Mikael Henaff
2 years
@_aidan_clark_ Step 1 doesn't have to be random: there is a large literature on directed exploration strategies, going back at least to Kearns and Singh's 2003 E^3 work that showed you can avoid the exponential sample complexity of random exploration.
@HenaffMikael
Mikael Henaff
1 year
The embodied AI team I'm part of at @MetaAI has multiple Research Scientist / Research Engineer positions open, come work with us ✨
@mkalakrishnan
Mrinal Kalakrishnan
1 year
(1/6) The FAIR Embodied AI team at @MetaAI has multiple full-time openings! If you’re interested in cutting-edge research in AI for robotics, AR and VR, and sharing it with the world, read on. 🧵
@HenaffMikael
Mikael Henaff
1 year
Also feel free to reach out if you want to grab coffee and chat about RL, exploration, generalization, LLMs for decision making, or anything else :) #ICML2023
@HenaffMikael
Mikael Henaff
4 years
Excited to share some recent work in imitation learning at #iclr2020 , which uses an ensemble of policies to reduce covariate shift. Joint work with @xkianteb and Wen Sun. Paper: Talk:
@HenaffMikael
Mikael Henaff
3 months
Stoked about this new benchmark for long-horizon planning, intrinsic motivation, procedural generalization and memory
@mitrma
Michael Matthews
8 months
I’m excited to announce Craftax, a new benchmark for open-ended RL! ⚔️ Extends the popular Crafter benchmark with Nethack-like dungeons ⚡Implemented entirely in Jax, achieving speedups of over 100x 1/
@HenaffMikael
Mikael Henaff
2 years
This is a very exciting dataset - stochastic policies/dynamics, large action space, partial observability, rich dynamics, *very* large scale while still enabling fast experiments. Can't wait to start playing with it and hope others do too!
@erichammy
Eric Hambro
2 years
Delighted to present the NetHack Learning Dataset (NLD) at #NeurIPS2022 next week! NLD is a new large-scale dataset for NetHack and MiniHack, aimed at supercharging research in offline RL, learning from observations, and imitation learning. 1/
@HenaffMikael
Mikael Henaff
6 months
Latest work where we present OpenEQA, a modern embodied Q&A benchmark which tests multiple capabilities such as spatial reasoning, object recognition and world knowledge, on which SOTA VLMs like GPT4V/Claude/Gemini fail. A new challenge for embodied AI! To be presented @CVPR.
@AIatMeta
AI at Meta
6 months
Today we’re releasing OpenEQA — the Open-Vocabulary Embodied Question Answering Benchmark. It measures an AI agent’s understanding of physical environments by probing it with open vocabulary questions like “Where did I leave my badge?” More details ➡️
@HenaffMikael
Mikael Henaff
8 months
Interview of @sharathraparthy discussing our recent work showing that transformers can in-context learn new sequential decision-making tasks in new environments. Check it out!
@TalkRLPodcast
TalkRL Podcast
8 months
Episode 48: Sharath Chandra Raparthy @sharathraparthy (AI Resident at @AIatMeta ) on In-Context Learning for Sequential Decision Tasks, GFlowNets, and more!
@HenaffMikael
Mikael Henaff
1 year
In Hawaii for #ICML2023 , presenting two works Tuesday: - A Study of Global and Episodic Bonuses in Contextual MDPs (poster at 2pm, oral at 6:10 pm) - Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories (poster at 11am) Hope to see you there :)
@HenaffMikael
Mikael Henaff
2 years
We are hiring a research intern for next year - if you would like to work on hierarchical RL, world models, modular networks and related topics with @shagunsodhani , myself and other researchers at FAIR please reach out! :)
@shagunsodhani
Shagun Sodhani
2 years
We are hiring a #research #intern at FAIR ( @MetaAI ) to work in areas related to #RL , #hierarchical RL, #modular #networks , and #world #models . Location: Montreal / New York / Remote. You can dm me your questions and resume!
@HenaffMikael
Mikael Henaff
2 years
Presenting our E3B work on exploration in changing environments at #NeurIPS at 11 am NOLA time in Hall J #105 ...come by and say hi! with @robertarail @MinqiJiang @_rockt
@HenaffMikael
Mikael Henaff
1 year
@_rockt @HeinrichKuttler @_samvelyan @erichammy you might be interested, this method is able to make progress on the Oracle task without demos (although sometimes in unexpected ways ;))
@HenaffMikael
Mikael Henaff
6 years
Nice article in @techreview about our paper on model-based RL with uncertainty regularization for #autonomousdriving
@techreview
MIT Technology Review
6 years
Reinforcement learning makes mistakes as it learns. That's fine when playing a board game. It's, erm, not great in a life-or-death situation.
@HenaffMikael
Mikael Henaff
9 months
@EugeneVinitsky The difference between algorithms that explore efficiently vs. not is essentially polynomial vs. exponential sample complexity (itself a lower bound on compute complexity). Imo more compute can crack some harder poly problems but will eventually hit a wall with exponential ones:)
@HenaffMikael
Mikael Henaff
1 year
@patrickmineault @ylecun End to end memory networks in 2015 () by @tesatory were an important precursor in the sense that like the transformer (and unlike the NTM), they maintain the sequence structure and perform multiple layers of attention over it.
@HenaffMikael
Mikael Henaff
11 months
New work led by @sharathraparthy and jointly with @robertarail @erichammy @_robertkirk showing that one can in-context learn completely *new tasks* on *new environments* via large-scale pretraining and few shot examples. To be presented at upcoming @NeurIPSConf FMDM workshop!
@sharathraparthy
Sharath Raparthy
11 months
🚨 🚨 !!New Paper Alert!! 🚨 🚨 How can we train agents that learn new tasks (with different states, actions, dynamics and reward functions) from only a few demonstrations and no weight updates? In-context learning to the rescue! In our new paper, we show that by training
@HenaffMikael
Mikael Henaff
2 months
Takes me back to my days as a starry-eyed master's student, when Pytorch's grandparent Lush was still used in @ylecun 's lab <3 Lush was actually the first programming language I seriously learned (I'd been studying math until then). Such fond memories counting parentheses!
@alfcnz
Alfredo Canziani
2 months
I wrote two blog posts about SN, Léon Bottou and @ylecun 's 1988 Simulateur de Neurones. One is an English translation of the original paper, for which I've reproduced the figures. The other is a tutorial on how to run their code on Apple silicon.
@HenaffMikael
Mikael Henaff
10 months
@jsuarez5341 Procedural generation or settings where the environment changes across episodes. Exploration operates very differently in that setting and a lot of algorithms for static MDPs fail.
@HenaffMikael
Mikael Henaff
2 years
Nice opportunity to work with some great researchers!
@robertarail
Roberta Raileanu
2 years
Our group has multiple openings for internships at FAIR London ( @MetaAI ). I’m looking for someone to work on language models + decision making e.g. augmenting LMs with actions / tools / goals, interactive / open-ended learning for LMs, or RLHF. Apply at
@HenaffMikael
Mikael Henaff
28 days
High performing, open source VLMs and smol LLMs now available...nature is healing🌱
@AIatMeta
AI at Meta
28 days
📣 Introducing Llama 3.2: Lightweight models for edge devices, vision models and more! What’s new? • Llama 3.2 1B & 3B models deliver state-of-the-art capabilities for their class for several on-device use cases — with support for @Arm , @MediaTek & @Qualcomm on day one. •
@HenaffMikael
Mikael Henaff
2 years
E3B sets a new SOTA on 16 challenging sparse-reward tasks from the MiniHack suite. In particular, it does so without requiring any feature engineering or task-specific prior knowledge. [6/N]
@HenaffMikael
Mikael Henaff
2 years
@akbirkhan @MetaAI definitely possible in NYC, would have to see about London...feel free to apply here and send your cv to mikaelhenaff@meta.com :)
@HenaffMikael
Mikael Henaff
2 years
@HaqueIshfaq Minihack is quite nice, there are lots of tasks and many of them are sparse reward, and it has the additional interesting twist of being procedurally generated. We have some code to train a variety of exploration algorithms here:
@HenaffMikael
Mikael Henaff
1 year
It was also a pleasure working with @shagunsodhani @robertarail @yayitsamyzhang and Pascal Vincent on this project!
@HenaffMikael
Mikael Henaff
6 months
@TongzhouWang I think reorganizing information can be seen as adding new information encoded in the reorganization scheme. For example, if you are reorganizing N bits of information with program P of length K bits, you are effectively adding K bits of new information.
@HenaffMikael
Mikael Henaff
2 years
@_rockt @NetHack_LE 'nutritiously hard', sounds like a juicy problem and a hard nut to crack ;)
@HenaffMikael
Mikael Henaff
2 years
Exploration in standard MDPs is well studied, but what about contextual MDPs (CMDPs) where the environment changes each episode? This general framework captures scenarios such as procgen video games or embodied AI tasks where the agent must generalize across physical spaces.[2/N]
@HenaffMikael
Mikael Henaff
2 years
@HaqueIshfaq It has several of the minigrid envs ported but is a lot more challenging because count-based episodic bonuses do not work, we discuss some more here:
@HenaffMikael
Mikael Henaff
15 days
@FelixHill84 Sorry to hear about this Felix, but I'm glad things are starting to look up. I remember when we interned together in the early days which felt like a different world. I really admire both your scientific and human contributions to the field, wishing you well!
@HenaffMikael
Mikael Henaff
2 years
While exploration in CMDPs has recently started receiving attention, we show that existing methods critically rely on an episodic count-based bonus, and fail if this bonus is removed. This also means they fail in complex envs where each state is seen at most once. [3/N]
@HenaffMikael
Mikael Henaff
2 years
@SinghAyush2811 @MetaAI these are for students in a PhD program, but we sometimes have AI resident spots too which do not have this requirement...will advertise if so
@HenaffMikael
Mikael Henaff
2 years
@alfcnz When the turntable was invented, some people thought it was the end of music. Then people used it to make entirely new kinds of music (sampling, DJing etc). Human creativity always finds a way to express itself given the tools available :)
@HenaffMikael
Mikael Henaff
4 years
New paper accepted to #icml2020 - this takes steps towards bridging the theory-practice gap in RL by providing a provably sample-efficient algorithm for block MDPs which uses contrastive learning. Long version: #ReinforcementLearning
@HenaffMikael
Mikael Henaff
2 years
just checked out Movetodon and it's a very easy way to automatically follow all your twitter contacts on Mastodon...i was pleasantly surprised that lots of people are there already! hope to see you there
@HenaffMikael
Mikael Henaff
9 months
@CupiaBart Nice work, it's great to see interest in NetHack! If you're in this space you might be interested in a couple other repos: In particular, Motif makes some progress on the very challenging Oracle task and uses SF as the RL env.
@HenaffMikael
Mikael Henaff
2 years
My friend Kelsey (aka @arcanelibrary ) designed a new D&D system inspired by the earlier versions of the game - simple, fast and deadly. I playtested the game during development and can't recommend it enough :) it's now available on Kickstarter!
@HenaffMikael
Mikael Henaff
2 years
To address this limitation, we propose Exploration via Elliptical Episodic Bonuses (E3B). E3B uses an elliptical episodic bonus, which generalizes count-based episodic bonuses to continuous state spaces, paired with a feature extractor learned with an inverse dynamics model.[5/N]
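A minimal numpy sketch of an elliptical episodic bonus of this general form, assuming a fixed feature map phi (E3B itself learns phi with an inverse dynamics model; this toy just uses raw 2-D features):

```python
import numpy as np

class EllipticalBonus:
    """Toy elliptical episodic bonus: b(s) = phi(s)^T C^{-1} phi(s),
    where C accumulates phi phi^T over the current episode."""

    def __init__(self, dim, lam=0.1):
        self.dim, self.lam = dim, lam
        self.reset()

    def reset(self):
        # Called at the start of every episode: C = lam * I.
        self.cov_inv = np.eye(self.dim) / self.lam

    def bonus(self, phi):
        b = float(phi @ self.cov_inv @ phi)
        # Sherman-Morrison rank-1 update of C^{-1} after C += phi phi^T.
        u = self.cov_inv @ phi
        self.cov_inv -= np.outer(u, u) / (1.0 + phi @ u)
        return b

eb = EllipticalBonus(dim=2)
first = eb.bonus(np.array([1.0, 0.0]))  # unvisited direction: large bonus
again = eb.bonus(np.array([1.0, 0.0]))  # revisited this episode: shrinks
novel = eb.bonus(np.array([0.0, 1.0]))  # orthogonal direction: large again
assert again < first and novel > again
```

With one-hot features this reduces to an inverse episodic visit count, which is why it can be read as a generalization of count-based episodic bonuses to continuous state spaces.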
@HenaffMikael
Mikael Henaff
2 years
We also evaluate E3B for reward-free exploration on Habitat, which provides photorealistic simulations of real indoor environments. Here, E3B outperforms existing methods by a wide margin. [7/N]
@HenaffMikael
Mikael Henaff
1 year
@UCL_DARK @MinqiJiang Big congrats Dr. @MinqiJiang !!! Very well deserved and it's been a pleasure collaborating during your time at FAIR. Looking forward to seeing what you come up with next :)
@HenaffMikael
Mikael Henaff
2 years
@_aidan_clark_ Near-Optimal Reinforcement Learning in Polynomial Time. This is based on the idea of novelty bonuses, which has also been extended to deep RL settings (e.g. RND, ICM, pseudocounts, etc)
@HenaffMikael
Mikael Henaff
4 years
New paper at #NeurIPS2020 presenting PC-PG, a policy gradient algorithm that explores by growing a set of policies covering the set of possible states. Polynomial sample complexity in the linear case, and plays nice with modern deep RL methods.
@HenaffMikael
Mikael Henaff
2 years
@cedcolas @robertarail @MinqiJiang @_rockt ...for NGU's KNN-based bonus, if one of the dimensions has much larger scale than the others it can dominate the bonus due to euclidean distance being used
@HenaffMikael
Mikael Henaff
2 years
@NicoBohlinger Thanks! We didn't compare to NGU but others have found it not to work well on procgen envs: One conceptual difference is that the elliptical bonus automatically normalizes wrt scale but NGU's KNN-based one doesn't which means a few features could dominate
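A toy illustration of this scale point (all numbers invented, not from either paper): with one large-scale and one small-scale feature dimension, a euclidean nearest-neighbor bonus is dominated by the large dimension, while an elliptical bonus q^T C^{-1} q rescales each direction by its observed spread.

```python
import numpy as np

# Visited feature vectors: dim 0 has scale ~100, dim 1 has scale ~1.
visited = np.array([[100.0, 0.0], [-100.0, 0.0],
                    [200.0, 1.0], [-200.0, -1.0], [0.0, 0.0]])

novel   = np.array([0.0, 2.0])   # large displacement *relative to scale* in dim 1
mundane = np.array([20.0, 0.0])  # small relative displacement in dim 0

# KNN-style bonus (euclidean distance to the nearest visited state):
# the large dimension dominates, so the mundane point looks *more* novel.
knn = lambda q: np.min(np.linalg.norm(visited - q, axis=1))
assert knn(mundane) > knn(novel)

# Elliptical bonus q^T C^{-1} q: the inverse covariance rescales each
# direction, so the genuinely novel point gets the larger bonus.
C = visited.T @ visited / len(visited) + np.eye(2)
ell = lambda q: q @ np.linalg.inv(C) @ q
assert ell(novel) > ell(mundane)
```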
@HenaffMikael
Mikael Henaff
6 months
@TongzhouWang Oh interesting, yes that sounds quite related! Yeah algorithmic complexity leads to cool thought experiments despite being not practical unless you have a universe sized computer ;p
@HenaffMikael
Mikael Henaff
3 years
@robertarail Congrats and welcome Roberta!
@HenaffMikael
Mikael Henaff
1 year
Finally, we conduct a systematic comparison of global & episodic design choices across 16 MiniHack tasks. We find that combining the episodic E3B bonus with the global RND bonus sets a new SOTA on MiniHack. Multiplying is also consistently better than adding. 14/N
@HenaffMikael
Mikael Henaff
1 year
Contextual MDPs are MDPs where the environment changes each episode, and have been gathering increasing interest. For example, Procgen, NetHack/MiniHack, Minecraft/Crafter and embodied AI envs all fall within this category. How should we best explore in this setting? 2/N
@HenaffMikael
Mikael Henaff
1 year
Overall, this clarifies our understanding of how different exploration algorithms operate in CMDPs and opens up a number of exciting new directions. See paper for full details: Thanks to my collaborators @MinqiJiang and @robertarail ! 15/N, N=15
@HenaffMikael
Mikael Henaff
1 year
Very nice work by @mklissar on learning long-horizon exploratory behaviors using Laplacian eigenfunctions.
@MartinKlissarov
Martin Klissarov
1 year
🎉I'm particularly excited to share this project I worked on under the guidance of @MarlosCMachado 🧙 We ask: what is the *right scaffold* for building temporal abstractions, from the ground up? Website: It will be presented next week at #ICML2023 🏝️
@HenaffMikael
Mikael Henaff
3 years
@yayitsamyzhang Woo congrats Amy!! UT is a great place (I did my undergrad there)
@HenaffMikael
Mikael Henaff
2 years
@alfcnz achievement unlocked
@HenaffMikael
Mikael Henaff
1 year
Can we design a bonus which is robust to differing amounts of shared structure? We investigate a simple multiplicative combination of global and episodic bonuses, which performs robustly across envs with differing degrees of shared structure. 12/N
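A count-based toy sketch of this multiplicative combination (simple visit counts stand in for the global and episodic novelty estimates; illustrative only, not the paper's implementation):

```python
import math

class CombinedBonus:
    """Toy multiplicative global x episodic bonus: 1/sqrt(count) terms
    stand in for RND-style global and E3B-style episodic bonuses."""

    def __init__(self):
        self.global_counts = {}

    def new_episode(self):
        self.episodic_counts = {}  # episodic term resets every episode

    def __call__(self, state):
        self.global_counts[state] = self.global_counts.get(state, 0) + 1
        self.episodic_counts[state] = self.episodic_counts.get(state, 0) + 1
        b_global = 1.0 / math.sqrt(self.global_counts[state])
        b_episodic = 1.0 / math.sqrt(self.episodic_counts[state])
        # Multiplying rewards only states that are novel both globally
        # *and* within the current episode.
        return b_global * b_episodic

bonus = CombinedBonus()
bonus.new_episode()
b1 = bonus("s")   # new globally and within the episode: full bonus
b2 = bonus("s")   # repeat within the same episode: both terms shrink
bonus.new_episode()
b3 = bonus("s")   # episodic term resets, global term keeps decaying
assert b1 == 1.0 and b2 < b3 < b1
```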
@HenaffMikael
Mikael Henaff
1 year
@healingfromlc Don't lose hope, I got it almost 3 years ago, was mostly non-functional for a year but it got better little by little & I am now doing MUCH better to the point where symptoms are mostly just an inconvenience. It will get better just v slowly and with lots of ups & downs
@HenaffMikael
Mikael Henaff
1 year
However, because they aim to cover the entire feature space each episode, episodic bonuses can be inefficient when there is lots of shared structure across episodes. Covering the entire feature space may also simply be impossible, as shown in our counterexample above. 11/N
@HenaffMikael
Mikael Henaff
2 years
An alternative idea could be to count handcrafted features extracted from states, but this relies heavily on prior knowledge. We show that while this can be effective in some cases, it is difficult to design a feature extractor which works well across many tasks. [4/N]
@HenaffMikael
Mikael Henaff
2 years
Hi! I'm on Mastodon: @mikaelhenaff@sigmoid.social See you there :)
@HenaffMikael
Mikael Henaff
1 year
We develop a new conceptual framework which provides a unifying explanation of our empirical results. Specifically, we define a value function in feature space which depends both on the feature extractor and the structure of the CMDP. 8/N
@HenaffMikael
Mikael Henaff
5 years
Excited to share recent work to be presented at #NeurIPS2019 : explicit exploration/exploitation using dynamics models. - Polynomial sample complexity bound in idealized setting, independent of number of states - Practical algorithm using neural networks
@HenaffMikael
Mikael Henaff
1 year
To shed light on the strengths & weaknesses of global & episodic bonuses, we first show that for certain CMDPs, as the number of unique contexts |C| increases, global bonuses perform increasingly worse. On the other hand, episodic bonuses retain decent performance. 5/N
@HenaffMikael
Mikael Henaff
6 months
@TongzhouWang That's the premise of a nice short story called The Library of Babel, by Borges.
@HenaffMikael
Mikael Henaff
4 years
@WeLoveDogsHNL @RosettaAtHome @foldingathome That's awesome, 15 years is a lot of number crunching!
@HenaffMikael
Mikael Henaff
1 year
If this value function changes significantly across episodes, the global bonus may get exhausted in areas that later give high reward. In the example below, once the agent has explored some area it will no longer visit it, even though its value becomes high later on. 9/N
@HenaffMikael
Mikael Henaff
9 days
I'll be presenting a poster about some of our recent work on LLM-guided exploration and intrinsic motivation at the New York Academy of Sciences this coming Friday: if you're in the tri-state area, it's a nice event to chat about ML in a relaxed setting.
@HenaffMikael
Mikael Henaff
2 years
@jordan_t_ash next level
@HenaffMikael
Mikael Henaff
1 year
However, if this value function changes little (as in the case below), the global bonus will make the agent progressively visit different parts of the feature space, eventually finding one which provides high value across all contexts. The task is then close to solved. 10/N
@HenaffMikael
Mikael Henaff
2 years
@abreanac I quite liked this one, it covers the main ideas and appeals more to intuitions than rigorous proofs. The book by James Gleick is also great for an even more informal overview and historical perspective.
@HenaffMikael
Mikael Henaff
6 months
@TongzhouWang For example: a very short program can generate every possible book of N words, including works of genius and undiscovered scientific truths. But now, finding such needles in the haystack requires inputting information into the system...
@HenaffMikael
Mikael Henaff
1 year
Conversely, we also provide singleton MDP counterexamples where episodic bonuses fail and global bonuses succeed. In the example below, the episodic bonus will not incentivize the agent to visit more than one corridor since it keeps being reset each episode. 6/N
@HenaffMikael
Mikael Henaff
1 year
The high-level takeaway of our study is that global bonuses succeed in settings with lots of shared structure across different contexts/episodes, whereas episodic bonuses are better when little structure is shared. Combining the two improves robustness across regimes. 4/N
@HenaffMikael
Mikael Henaff
1 year
Most recent exploration algorithms use some combination of global bonuses (measuring novelty wrt all the agent's experience) and episodic bonuses (measuring novelty wrt the current episode only). However, the use of these has been ad-hoc and poorly understood. 3/N
@HenaffMikael
Mikael Henaff
9 months
@petrenko_ai @CupiaBart It's great to see more interest in NetHack! We also used SF in a couple other repos which ran NetHack/MiniHack. Not sure if you remember but you answered several of my questions on the SF Discord which was very helpful :)
@HenaffMikael
Mikael Henaff
2 years
@cedcolas @robertarail @MinqiJiang @_rockt Thanks! we didn't compare to NGU but others have reported it not to work well on procgen envs (). A conceptual advantage of the elliptical bonus is that it automatically adjusts the scale over each dimension...
@HenaffMikael
Mikael Henaff
9 months
@TacoCohen Such good news, it's great to have you!
@HenaffMikael
Mikael Henaff
2 years
@martingale_li @MetaAI feel free to apply here and send your cv to mikaelhenaff@meta.com!
@HenaffMikael
Mikael Henaff
1 year
We also conduct pixel-based experiments on Habitat and Montezuma's Revenge, which suggest that the tradeoffs between global & episodic bonuses we identified previously apply more broadly. The combined bonus helps, but less than before - this remains an open area of research. 13/N
@HenaffMikael
Mikael Henaff
2 years
@EugeneVinitsky Congrats, and good to hear you'll be in NY!
@HenaffMikael
Mikael Henaff
2 years
@ikostrikov @OpenAI Awesome, congrats Ilya!!