Excited to announce DreamerV3 🌍, a scalable and general RL algorithm that masters a wide range of applications with fixed hyperparameters!
Applied out of the box, it solves the Minecraft Diamond challenge without human data. 💎
👇 Thread
Introducing DreamerV3: the first general algorithm to collect diamonds in Minecraft from scratch - solving an important challenge in AI. 💎
It learns to master many domains without tuning, making reinforcement learning broadly applicable.
Find out more:
What I value most about Jupyter notebooks is having all results and figures together in a doc.
Today I'm releasing Python Handout, a package that lets you create docs with inline figures, images, videos directly from Python scripts. ✨ 📰 ✨
Thread 👇
The full training run of the A1 quadruped robot learning to walk from scratch in the real world in 1 hour! Made possible by training a world model online and planning inside of it. Excited to see what we can do next with this!
@philippswu
@AleEscontrela
@Ken_Goldberg
@pabbeel
A dream come true! We introduce DayDreamer, where we apply world models for fast end-to-end learning on 4 physical robots, without simulators.
We learn quadruped walking from scratch in 1 hour. We also learn to pick & place balls directly from pixels and sparse rewards 🤖🌏👇
Excited to share Director, a practical, general, and interpretable reinforcement learning algorithm for learning hierarchical behaviors from pixels!
Director explores and solves long-horizon tasks with very sparse rewards by breaking them down into internal subgoals.
Thread 👇
Excited to present Clockwork VAEs for video prediction!
Clockwork VAEs (CW-VAEs) leverage hierarchies of latent sequences, where higher levels tick slower. They learn long-term dependencies across 1000 frames, semantically separate content, and outperform strong video models.
👇 Thread
World models are the future and the future is now! 🌎🚀
Proud to share DreamerV2, the first agent that achieves human-level Atari performance by learning behaviors purely within a separately trained world model.
Paper:
Thread 👇
Excited to introduce Dynalang, an interactive agent that understands diverse types of language in visual environments! 🤖💬
By learning a multimodal world model 🌍, Dynalang understands task prompts, corrective feedback, simple manuals, hints about out of view objects, and more
People want AGI so when they see meaningful progress in AI they think it might be THE ONE MISSING KEY. Many now try to massage LLMs into "AGI" but it won't work. LLMs are far from AGI (🥲) and only 1 piece of the solution. Focusing on the unsolved pieces would mean faster progress!
Excited to introduce Crafter! 🌴🤖💎
Crafter is a game that evaluates a wide range of agent abilities within a single env with visual inputs. It tests generalization, exploration, and long-term reasoning. Made for both reward agents and unsupervised agents
Thread 👇
We introduce Dreamer, an RL agent that solves long-horizon tasks from images purely by latent imagination inside a world model. Dreamer improves over existing methods across 20 tasks.
paper
code
Thread 👇
Excited to share our Deep Planning Network (PlaNet), an RL agent planning in latent space to solve control tasks from pixels. Now with Google AI post, animated paper, and open source code.
Post:
Paper:
Code:
What objectives can an intelligent agent optimize?
In this 3-year collab, we categorized the possible objectives. APD is a unifying principle that explains representation learning, reward, infogain exploration, empowerment, skill discovery, and niche seeking.
👇 Thread
Excited to share our Google AI Blog post on DreamerV2, the first RL agent based on a general world model to achieve human-level performance on the Atari benchmark! 🌏🤖🚀
Presenting DreamerV2, the first world model-based #ReinforcementLearning agent to achieve top-level performance on the Atari benchmark, learning general representations from images to discover successful behaviors in latent space. Read more at
Tried mixed precision yet? Took 10 min to set up and my model runs almost 2x faster with same results.
Vars and grads are still 32 bits so it usually doesn't affect predictive performance.
E.g. in TF2, set option and make all input to your layers float16 (data, RNN states, ..):
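The memory/speed win comes from a simple pattern: keep a float32 master copy of the variables and gradients, but run the forward math in float16. Here's a minimal numpy sketch of why the results barely change (the TF2 one-liner is roughly `tf.keras.mixed_precision.set_global_policy('mixed_float16')` in recent versions; sizes below are made up):

```python
import numpy as np

# Sketch of the mixed-precision pattern: parameters stay float32
# ("master weights"), while the forward pass runs in float16.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # fp32 master weights
x = rng.normal(size=(32, 256)).astype(np.float32)   # fp32 inputs

# Forward pass in half precision (what mixed precision computes).
y16 = x.astype(np.float16) @ w.astype(np.float16)

# Reference forward pass in full precision.
y32 = x @ w

# Relative error is tiny, which is why predictions are unaffected.
err = np.max(np.abs(y16.astype(np.float32) - y32)) / np.max(np.abs(y32))
print(err)
```

Since variables and gradient accumulation stay in float32, only the activations lose precision, and the rounding noise is far below the noise floor of SGD.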
🌎 Excited to share a major update of the DreamerV3 agent!
A couple of smaller changes, more benchmarks, and substantially improved performance.
👇 Main differences from our earlier preprint:
Current video gen models are breathtaking! But they aren't that useful for acting yet: prompt Sora with a photo & "find me a screwdriver" and it'll swing the camera to conveniently reveal one lying there for you, but in reality there won't be one
Let me clear a *huge* misunderstanding here.
The generation of mostly realistic-looking videos from prompts *does not* indicate that a system understands the physical world.
Generation is very different from causal prediction from a world model.
The space of plausible videos is
I'm excited about large general agents but I don't quite understand this paper. Surely you can fit many experts into a transformer with BC. The difficulty is to then learn new tasks faster. But 1000 expert demos to swing up a cart pole is worse than training PPO from scratch?
Gato🐈a scalable generalist agent that uses a single transformer with exactly the same weights to play Atari, follow text instructions, caption images, chat with people, control a real robot arm, and more:
Paper: 1/
Excited to share Evaluating Agents without Rewards!
We compare intrinsic objectives with task reward and similarity to human players. Turns out they all correlate more w/ human than w/ reward. Two of them even correlate more w/ human than reward does.
👇
For practitioners and researchers who want to solve hard reinforcement learning tasks without having to tune any knobs, DreamerV3 is now available on GitHub! 🧑💻🤖
Runs on 1 GPU, supports image/proprio/both inputs, discrete/continuous actions, etc
Exploring worlds by planning for expected novelty is what originally motivated PlaNet and Dreamer. Excited to share Plan2Explore, a new RL agent that explores to learn an accurate world model 🌍, independent of any task. SOTA zero-shot on DMControl 🚀
Thread 👇
RL agents get specific to tasks they are trained on. What if we remove the task itself during training?
Turns out, a self-supervised planning agent can both explore efficiently & achieve SOTA on test tasks w/ zero or few samples in DMControl from images!
Current RL algorithms still struggle under partial observability, which is common e.g. in real 3D environments. Excited to introduce the Memory Maze benchmark, carefully designed for evaluating long-term memory of RL algorithms! 🏠🤖🚀
@jurgisp
@countzerozzz
Video prediction has seen great progress recently but long videos are still inconsistent, e.g. when moving around 3D scenes. I'm excited to share Temporally Consistent Video Transformer (TECO), a scalable transformer that substantially improves learning of long dependencies! 🚀
Excited to announce TECO, an efficient video prediction model that can generate long, temporally consistent video for complex datasets in 3D scenes such as DMLab, Minecraft, Habitat, and real-world video from Kinetics!
📜
🌐
(🧵)
A big day for Python! The steering council has decided to accept PEP 703, making the GIL optional:
- Will unlock fast multithreading
- User code can stay exactly the same
- Experimental support planned for 3.13 (Oct 2024)
Very excited to present LEXA, a reinforcement learning agent that learns to achieve challenging goal images without any supervision, through forward-looking exploration with a world model 🌎🚀
How could we enable an agent to perform many tasks? Supervising for every new task is impractical.
We present Latent Explorer Achiever (LEXA) that explores by discovering goals far beyond the frontier and then achieves test tasks, specified via images, in a zero-shot manner.
Autograph turns Python if, while, assert, etc into the corresponding TensorFlow ops with a function decorator. This will make TensorFlow a lot faster to write and maintain, without sacrificing in-graph performance. Can't wait for the first stable release!
Proud to share our blog post on Dreamer, our latest scalable RL agent. Dreamer learns a world model from images & efficiently finds long-term behaviors by backprop through imagined states🚀
Post
Paper
Videos
RL shifts the question of what intelligent behavior is to finding a reward function. I think we should focus more on the choice of environment and reward function than on which RL algorithm to use. Is there theory for how properties of the env and reward affect the resulting behavior?
I've always felt that rewards/RL oversimplify behaviour. Maybe now there's a shift back to "Planning by Probabilistic Inference" (AISTATS, 2003):
Presents the simple idea that we condition actions on goals, with a desired return as a possible goal.
Planning from pixels using latent dynamics models (think sequential VAE). We solve cup catch, walker, and several other control tasks. Outperforms A3C and comparable to D4PG with 50x less experience.
#NeurIPS
Paper:
Website:
AI safety twitter: I asked for "mars rover cooking a meal in my apartment" but instead it's remodeling my whole apartment into a mars crater now
@images_ai
It's fantastic to see so many people interested in task-agnostic RL and making it to our workshop yesterday. Feels well worth the effort to organize, and like we actually did something good for the community :)
Recordings (starts at 23:30):
Updated my TensorFlow char-rnn to use a clean input pipeline. Also includes an interactive command line for generating text. See some text samples generated after training on ArXiv abstracts below.
The difference to Jupyter is that information flows only one way: from your script to the handout. No hidden state and no confusion about cell execution order. If this might be for you, please give it a try and let me know any feedback! 👉
@TalkRLPodcast
What I think is important are general inductive biases for learning from much less data, unbounded online improvement (beyond in context), latent goals for RL, compute efficient training, dealing with long sequences, sparse planning over relevant features, intrinsic exploration
Barista just handed me a cup with the words "I hope your day goes by fast." Funny how everyone assumes you don't like working. Makes me appreciate being a researcher and having found work that I love
After the A1 learned to walk in 1 hour, we started pushing the robot and applying external perturbations. Continuously learning in the real world, Dreamer adapts within 10 minutes to withstand pushes or quickly roll over and stand back up! No robots were harmed here
On a set of DMLab tasks, DreamerV3 exceeds IMPALA while using over 130x fewer environment steps. This demonstrates that the peak performance of DreamerV3 exceeds model-free algorithms, while reducing data requirements by two orders of magnitude. 🤖⚡
DreamerV3 is also the first algorithm to collect diamonds in Minecraft without human demonstrations or curricula, solving a big exploration challenge in AI. Here is the episode where it finds its first diamond, which happens at 30M env steps or 17 days of playtime. 🌴🏔️🛠️💎
Check out the paper with a lot of benchmarks!
Paper:
Website:
Code coming soon.
Big thanks to @jurgisp, @jimmybajimmyba, and @countzerozzz ✨
Happy to answer questions and go into details 🙋
DreamerV3 learns a world model 🌐 that predicts abstract outcomes of actions and uses it to train long-horizon behaviors in imagination. Predictions in symlog space and percentile return normalization enable successful learning across domains with fixed hyperparameters.
@vokaysh
@DeepMind
It's a hard exploration problem: There are sooo many possible sequences of button presses, but only very few are meaningful and accomplish all the necessary intermediate tasks. Hence, the diamond challenge has been an AI competition for several years
The key contribution of DreamerV3 is an algorithm that works out of the box on new application domains, without having to adjust hyperparameters. This reduces the need for expert knowledge and computational resources, making reinforcement learning broadly applicable. 📊📈
@dan_s_becker
Most RL algos are domain-specific and require a lot of data, limiting them to tasks where data is cheap. The text domain is quite broad and LLMs are already useful but might run into the same problem when we want them to read PDFs, browse the web, help us with research, etc
This was a super fun collaboration with @AleEscontrela and @philippswu, who both did a fantastic job! Thanks to @Ken_Goldberg and @pabbeel for supporting the project! ✨ To learn more, just ask or check out the links:
Website
Paper
@goodfellow_ian
Besides what's mentioned, it can help to initialize the output layer biases to the empirical class frequencies. Otherwise, it spends the first couple of epochs just learning those. I also like the idea of SkewFit by Pong et al. 2019
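One way to read the bias trick: for a softmax head, setting the output biases to the log of the empirical class frequencies makes the untrained model already output the class priors (the counts below are made up for illustration):

```python
import numpy as np

# Hypothetical label counts for an imbalanced 3-class problem.
counts = np.array([900, 90, 10], dtype=np.float64)
freqs = counts / counts.sum()

# Initialize output-layer biases to the log class frequencies.
biases = np.log(freqs)

# With zero-initialized weights, logits == biases, so the softmax
# output equals the empirical priors before any training step.
probs = np.exp(biases) / np.exp(biases).sum()
print(probs)  # matches freqs
```

This skips the first epochs the network would otherwise spend just learning the marginal label distribution.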
If you're using PlaNet or Dreamer and you have a slow GPU, you can often decrease the image resolution. The training curves for 64x64 and 32x32 look almost identical and the latter runs almost twice as fast. These two plots show eval performance and frames per second
Due to its robustness, we observe highly favorable scaling properties of DreamerV3. Increasing the model size directly translates not just to higher final performance but also improves data-efficiency! This gives us a path to scale up and solve harder problems. 📈🚀
To see if modern world models allow for fast robot learning, we train online on 4 robots. Starting on its back, the A1 quadruped learns to roll over, stand up, and walk in 1 hour without resets! Prior work required lots of simulation, footstep controllers, or reset policies
Releasing STEVE, our latest agent that learns both an uncertainty-aware dynamics model and a Q-function. Improves sample efficiency over DDPG by an order of magnitude while solving more difficult tasks. Feedback welcome!
Paper:
Code:
Making models like these useful for acting requires separating actions from outcomes, like in the world models we use in RL. Then the agent can be optimistic about the actions but neutral about their outcomes, rather than being optimistic about the outcomes
Deep reinforcement learning often needs too much trial and error to be practical on physical robots, which means one needs to train in simulation first. But simulators don't capture the complexity of the real world and the resulting policies don't adapt to changes in the world
Maybe I'm missing something? Despite being disappointed by the results, this is still great engineering and I'm excited for their future follow-ups. If you want to see an agent that achieves new goals zero-shot (although in the same env), check out LEXA
Thanks for having me on for the second time, Robin. Fun chat about DreamerV3 and the future of RL, including unsupervised approaches, hierarchical planning, and how these ideas will help the next generation of embodied agents and LLMs 🤖
Episode 42: @danijarh on the DreamerV3 agent and world models, the Director agent and hierarchical RL, realtime RL on robots with DayDreamer, and his framework for unsupervised agent design!
Only two days since releasing Python Handout to generate reports, as an alternative workflow to Jupyter notebooks. @TDTneuro has already updated their neuro analysis example gallery.
Rendered handouts:
Python scripts to generate them:
@hardmaru
@slashML
It could also be that coming up with these ideas is actually quite easy and implementing them at scale is what takes most of the effort
I made a video to summarize action and perception as divergence minimization!
The framework offers a unified perspective on many of the intrinsic objective functions used in deep RL and also connects them to the free energy principle.
Project DayDreamer applies Dreamer with default hyperparameters to learn on 4 physical robots, without simulators. No new algorithm --- we just added support for multiple input modalities and parallelized data collection and network updates to meet latency requirements
On two robot arms (UR5 and XArm) we learn to pick and place balls from sparse rewards. Dreamer needs to learn to localize the balls from images here. Within 8-10 hours, Dreamer approaches human performance. We found no previous RL method that succeeds here
The world model encodes sequences of sensory inputs, fusing them together into latent representations. It also predicts future representations and rewards given actions, which enables planning. We reconstruct the inputs as a rich learning signal and to allow human inspection
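A toy sketch of that encode → predict → plan loop. Everything here is illustrative (random linear "model", made-up sizes), not the actual Dreamer architecture, but it shows how predicted latents and rewards enable planning without decoding back to pixels:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear world model with random weights (purely illustrative).
enc = rng.normal(size=(16, 8))            # sensory input -> latent
dyn = rng.normal(size=(8 + 2, 8)) * 0.1   # (latent, action) -> next latent
rew = rng.normal(size=(8,))               # latent -> predicted reward

def imagine(latent, actions):
    # Roll the model forward in latent space and sum predicted rewards.
    total = 0.0
    for action in actions:
        latent = np.tanh(np.concatenate([latent, action]) @ dyn)
        total += float(latent @ rew)
    return total

obs = rng.normal(size=16)
latent = np.tanh(obs @ enc)  # fuse sensory input into a representation

# Plan by scoring candidate action sequences inside the model.
candidates = [rng.normal(size=(5, 2)) for _ in range(64)]
best = max(candidates, key=lambda seq: imagine(latent, seq))
print(imagine(latent, best))
```

The real agent replaces the random matrices with learned networks and the brute-force search with an actor critic, but the planning interface is the same: predict forward, score, act.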
For more videos and details, check out the paper and website. We'll also make training curves and code available soon.
Paper PDF:
Project website:
Happy to answer any questions ✨
Thanks a lot @pabbeel, @itfische, @kuanghueilee!
World models have many compelling properties for robot learning, e.g. sample-efficiency and multi-task learning. Recent world model agents like Dreamer learn video games from small amounts of experience. But it's unclear if they allow for fast learning on physical robots
@mattecapu
Infomax. Maximizing mutual information between the agent (parameters, representations, actions, options, etc) and env (past & future sensory inputs) leads to general agents that perform unsupervised representation learning, exploration, and control
Just had my toughest border interview so far on the way to
#ICML2019
. The officer was interested in AI research and wanted a full summary of our PlaNet paper! 😂
Storing knowledge about the user is super important for AI to become more useful. But that'll make it hard to switch models. An independent platform for storing (+editing) preferences etc would be really useful
Excited to finally announce @Letta_AI!
The next frontier in AI is in the stateful layer above the base models - the "memory layer", or "LLM OS".
Letta's mission is to build this layer in the open (say "no" 🙅 to privatized chain of thought).
Cool work on skill discovery!
VIC (left): Skills that are predictable given the end state correspond to moving to different locations.
RVIC (right): Skills that are predictable given start and end state but not given end state alone correspond to moving in different directions.
Happy to share that the preprint for Relative Variational Intrinsic Control, an unsupervised method for learning relative, composable, affordance-like skills, is on arXiv today and will be presented at AAAI in February 2021.
@VladMnih
@Zergylord
@dwf
Check out the paper for details & GitHub for more resources!
- Baseline agents code (Docker)
- Baseline scores (JSON)
- Plotting code
- Human expert trajectories
Paper:
GitHub:
Happy to answer any questions
Dreamer learns behaviors inside the model using an actor critic algorithm. It is trained on latent rollouts without decoding inputs, which allows for large batch sizes of e.g. 16K+ time steps on 1 GPU. As the predictions are purely on-policy, we need no importance correction etc
@Varunufi
@DeepMind
Thanks! Yes, the main point of the algorithm is that it works out of the box on new problems, without needing experts to fiddle with it. So it's a big step towards optimizing real-world processes
Excellent summary of biological concepts that can help AI by @SuryaGanguli:
- Local learning rules
- Temporal processing
- Modularity
- Unsupervised learning
- Curriculum design
- Causal world models
- Energy efficiency
@karpathy
@AnthonyLewayne
It's a bug not a feature. We don't put spaces between compound words. Arbeitsunterbrechungsangst is exactly work_interruption_fear. It's not a common phrase or dictionary word but everybody (who can guess the word boundaries) understands it
Today we're releasing Video Prediction Rewards (VIPER 🐍), a simple yet powerful method for extracting rewards from video prediction models!
VIPER learns reward functions from raw videos, and generalizes to entirely new domains for which no training data is available
🧵 thread
@tyrell_turing
I agree and also don't think a discussion at the political level would be productive given how uncertain even experts are about what the measures to mitigate xrisk should be
@tyrell_turing
I think it's pretty much all representation learning.
More precisely it's all about learning world models.
And the main issue with that is how to represent multimodality in the prediction (because the world is not entirely predictable).
Check how fast an @OpenAI Gym environment runs with 1 line of Python:
python -c "import gym,time;d=10000;e=gym.make('Ant-v1');s=time.time();e.reset();[e.reset() if e.step(e.action_space.sample())[2] else 0 for _ in range(d)];print(d/(time.time()-s),'FPS')"
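Unrolled, the one-liner is just this loop: step until the episode ends, reset, and divide step count by wall-clock time. Sketched here with a stub environment in place of gym so the timing logic runs anywhere (`DummyEnv` is hypothetical; with gym you'd call `env.step(env.action_space.sample())` and check the done flag):

```python
import time

class DummyEnv:
    # Stand-in for a gym environment so the benchmark runs anywhere.
    def reset(self):
        self.t = 0
    def step(self):
        self.t += 1
        return self.t % 100 == 0  # episode "done" every 100 steps

steps = 10000
env = DummyEnv()
env.reset()
start = time.time()
for _ in range(steps):
    if env.step():   # reset whenever an episode terminates
        env.reset()
fps = steps / (time.time() - start)
print(f'{fps:.0f} FPS')
```

Measuring raw environment FPS like this is a quick sanity check before blaming the learning algorithm for slow training.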
@ykilcher
@GoogleAI
@DeepMind
@UofT
@mo_norouzi
Well explained, thanks!
Two clarifications:
- KL balancing (prior vs posterior within the KL) is different from beta VAEs (reconstruction vs KL)
- The vectors of categoricals can in theory represent 32^32 different images so their capacity is quite large
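The capacity claim is easy to verify: with 32 categorical latents of 32 classes each (the DreamerV2 setup), the number of distinct discrete codes is 32^32. A quick numpy sketch, including what one sampled latent looks like:

```python
import numpy as np

# 32 categorical latents with 32 classes each, as in DreamerV2.
num_latents, num_classes = 32, 32

# Number of distinct discrete codes the latent vector can represent.
capacity = num_classes ** num_latents
print(capacity)  # 32**32, roughly 1.46e48 codes

# One sample: a one-hot vector per categorical, stacked into a matrix.
rng = np.random.default_rng(0)
indices = rng.integers(num_classes, size=num_latents)
sample = np.eye(num_classes)[indices]
print(sample.shape)  # (32, 32), each row summing to 1
```

So despite being discrete, the latent space is astronomically larger than the number of images any agent will ever see.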
Distill published a great introduction to Gaussian Processes! Explains the basics well and lets you play with interactive examples to build up intuition 🎚️💡