Karl Pertsch Profile
Karl Pertsch

@KarlPertsch

1,657 Followers
227 Following
70 Media
238 Statuses

Robot Foundation Models @ UC Berkeley & Stanford | Postdoc w/ Sergey Levine & Chelsea Finn | Prev. Intern @ Google Brain, Meta AI | PhD @ USC.

Joined July 2015
Pinned Tweet
@KarlPertsch
Karl Pertsch
2 months
Very excited to release OpenVLA today, a 7B parameter open-source vision-language-action model (VLA). 🦾 SoTA generalist policy (better than Octo & RT-2-X) ⚡️ Easy to run & fine-tune on 1 GPU with quantization and LoRA 💻 Open-source PyTorch codebase 🤗 Models on HuggingFace 1/
3
62
375
@KarlPertsch
Karl Pertsch
11 months
Very excited to release the Open X-Embodiment Dataset today — the largest robot dataset to date with 1M+ trajectories! Robotics needs more data & this is a big step! There’s lots to unpack here, so let’s do a deep dive into the dataset! 🧵1/15
8
91
452
@KarlPertsch
Karl Pertsch
8 months
3 mo. ago we released the Open X-Embodiment dataset; today we're taking the next step: introducing Octo 🐙, a generalist robot policy trained on 800k robot trajectories, stronger than RT-1X, with flexible observation + action spaces, fully open source! 💻: /🧵
10
90
373
@KarlPertsch
Karl Pertsch
5 months
Access to *diverse* training data is a major bottleneck in robot learning. We're releasing DROID, a large-scale in-the-wild manipulation dataset: 76k trajectories, 500+ scenes, multi-view stereo, language annotations, etc. Check it out & download today! 💻:
8
60
194
@KarlPertsch
Karl Pertsch
1 month
Our OpenVLA model has been downloaded more than 20k times in less than a month -- the most for any robotics model on the 🤗 hub by a long shot! Here is a little "cookbook" for people who want to get started using OpenVLA! 🧑‍🍳 1/🧵
2
16
163
@KarlPertsch
Karl Pertsch
3 months
Our OpenX paper won best paper at ICRA! Congrats to all my co-authors! 🎉🎉 This is an ongoing effort, we recently added new datasets from the community that double the size of the OpenX dataset -- keep 'em coming! :) Check datasets & how to contribute:
3
14
104
@KarlPertsch
Karl Pertsch
3 months
Octo has been accepted to RSS and we finally arxiv'd the paper! 🐙 Many small updates vs the December release: more ablations, new checkpoints, code fixes etc 👇
@_akhaliq
AK
3 months
Octo An Open-Source Generalist Robot Policy Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a
7
22
157
4
11
97
@KarlPertsch
Karl Pertsch
10 months
It was fun to present Open X-Embodiment & RT-X at CoRL today with @QuanVng ! We were very excited about the initial release of the Open X-Embodiment dataset, but it's just the start! We covered lots of open problems in the talk as well👇
1
7
73
@KarlPertsch
Karl Pertsch
2 years
Excited to present STAR, our work on cross-domain imitation @corl_conf ! Our goal: use demonstrations across domains, e.g. from robot in kitchen A to robot in kitchen B, or even from human to robot. With STAR I can teach a robot new tasks with videos recorded in my kitchen! 🧵👇
1
18
69
@KarlPertsch
Karl Pertsch
11 months
It's awesome to see the positive community response to our release! We're getting inquiries from around the world to contribute more data -- wheeled robots, drones, humanoids, etc! 🚀🚀🚀 Please keep them coming 🙂 open-x-embodiment@googlegroups.com
@KarlPertsch
Karl Pertsch
11 months
Very excited to release the Open X-Embodiment Dataset today — the largest robot dataset to date with 1M+ trajectories! Robotics needs more data & this is a big step! There’s lots to unpack here, so let’s do a deep dive into the dataset! 🧵1/15
8
91
452
0
7
57
@KarlPertsch
Karl Pertsch
3 months
Evaluation of robot foundation models is a huge challenge: imagine running robot rollouts across 100s of scenes + tasks + embodiments. How can we make eval keep up w/ model improvements? Introducing SIMPLER: sim eval envs for your favorite real robot foundation models! Short 🧵
@XuanlinLi2
Xuanlin Li (Simon)
3 months
Scalable, reproducible, and reliable robotic evaluation remains an open challenge, especially in the age of generalist robot foundation models. Can *simulation* effectively predict *real-world* robot policy performance & behavior? Presenting SIMPLER!👇
3
23
133
1
6
42
@KarlPertsch
Karl Pertsch
1 month
Excited to release our work on Embodied Chain-of-Thought Reasoning today! We can boost performance of vision-language-action models like OpenVLA by a large margin without any additional robot training data! The key: simply think before you act! 1/
@MiZawalski
Michał Zawalski
1 month
🤖Can robots think through complex tasks step-by-step like language models? We present Embodied Chain-of-Thought Reasoning (ECoT): enabling robots to reason about plans and actions for better performance🎯, interpretability🧐, and generalization🌎. See .
2
19
63
1
8
40
@KarlPertsch
Karl Pertsch
1 year
Robot learning needs data, but collecting it is expensive. How can we make the most of existing datasets? In SPRINT, we use LLMs to auto-augment language instructions on robot datasets. Our agents learn a lot more tasks during pre-training *for free*! See Jesse’s 🧵for details!👇
@Jesse_Y_Zhang
Jesse Zhang
1 year
Having humans annotate data to pre-train robots is expensive and time-consuming! Introducing SPRINT: A pre-training approach using LLMs and offline RL to equip robots w/ many language-annotated skills while minimizing human annotation effort! URL: 🧵👇
2
29
122
1
2
39
@KarlPertsch
Karl Pertsch
3 years
New paper on *Skill-based Learning with Demonstrations (SkiLD)*! While current imitation learning follows the _low-level actions_ in the demos, SkiLD follows the demonstrated _skills_. SkiLD enables efficient demo-guided RL & imitation learning on long-horizon tasks! 1/N
1
5
34
@KarlPertsch
Karl Pertsch
4 months
Shoutout to the folks at Rerun who built a visualizer for our DROID dataset -- looks very cool! Allows you to visualize the point cloud from our multi-view stereo cams as well! And should work for any new dataset collected on the DROID robot platform! Thanks @rerundotio :)
@rerundotio
Rerun
4 months
A Rerun Viewer for the DROID Dataset! DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset is a robot manipulation dataset by @SashaKhazatsky et al. with 76k demonstration trajectories or 350h of interaction data, collected across 564 scenes and 86 tasks.
2
22
124
1
2
32
@KarlPertsch
Karl Pertsch
2 years
New work on scaling robot learning from the team I work with at Google! Especially excited about RT1’s capability to ingest data from diverse sources, eg sim or even experience from other robots + demonstrate transfer -- very useful for scaling robotic dataset size & diversity!
@hausman_k
Karol Hausman
2 years
Introducing RT-1, a robotic model that can execute over 700 instructions in the real world at 97% success rate! Generalizes to new tasks✅ Robust to new environments and objects✅ Fast inference for real time control✅ Can absorb multi-robot data✅ Powers SayCan✅ 🧵👇
62
550
2K
1
0
30
@KarlPertsch
Karl Pertsch
4 years
Grateful to be awarded the best paper presentation award @corl_conf ! 🎉 Huge credit goes to all my lab mates @ CLVR lab, particularly to my co-author @YoungwoonLee , for all the tireless feedback that greatly improved the talk! :) Talk recording:
3
2
29
@KarlPertsch
Karl Pertsch
2 years
Data collection is a major bottleneck in robot learning: it’s mostly done w/ tedious & expensive human teleoperation. Can we use learning to make data collection itself more efficient? Introducing PATO, our approach for scalable robot data collection w/ learned assistive policies
@ShivinDass
Shivin Dass
2 years
Excited to present PATO: Policy Assisted TeleOperation, our recent work on scaling robot data collection! PATO uses a policy trained on prior data to assist the user during data collection, making teleop easier and even allows to teleop multiple robots simultaneously. 🧵👇
1
10
47
1
4
29
@KarlPertsch
Karl Pertsch
4 years
How can we use large offline datasets for accelerating the learning of new tasks? We can transfer skills! Check out our #CoRL2020 paper on efficient skill transfer with learned skill priors! 📄Paper: 💻Website & Code: Thread👇(1/8)
2
11
24
@KarlPertsch
Karl Pertsch
3 months
This looks awesome! Simulation can be a valuable tool for robot data scaling & eval, but the hard part is building diverse simulation envs AND datasets. Glad to see Soroush et al's sim data line of work expanded to more diverse envs! Excited to give this a try!
@snasiriany
Soroush Nasiriany
3 months
I’m excited to introduce RoboCasa, a large-scale simulation framework for everyday tasks. Scaling is the key driving force to unlocking generalist robots, and RoboCasa leverages simulation to take scaling to a whole new level. A short 🧵
10
50
259
2
3
24
@KarlPertsch
Karl Pertsch
11 months
If you want to browse through the Open X-Embodiment data, but don't like fiddling with Colabs, check out this neat website @its_dibya built that gives you a quick overview of all datasets!
@its_dibya
Dibya Ghosh
11 months
Got a chance to dig through the big robot X-embodiment dataset released last week, and hacked together a little website for others to look through the data. Check it out! There's some pretty random and diverse robot data in there
0
37
173
0
2
24
@KarlPertsch
Karl Pertsch
5 months
Check out Lucy's new project! Finally, every roboticist's favorite pastime, "yelling at your robot", can be useful for once! Bonus: lots of ALOHA trail mix in the lab! 😍
@lucy_x_shi
Lucy Shi
5 months
Introducing Yell At Your Robot (YAY Robot!) 🗣️- a fun collaboration b/w @Stanford and @UCBerkeley 🤖 We enable robots to improve on-the-fly from language corrections: robots rapidly adapt in real-time and continuously improve from human verbal feedback. YAY Robot enables
19
79
468
1
0
24
@KarlPertsch
Karl Pertsch
1 year
Glad to see RT-2 out! We show that VLM backbones are a great way to equip policies with robustness from internet-scale data. RT-2 strongly improves the generalization ability of existing skills (eg new scenes / objects) -- learning new low-level behaviors is the next frontier!
@hausman_k
Karol Hausman
1 year
PaLM-E or GPT-4 can speak in many languages and understand images. What if they could speak robot actions? Introducing RT-2: our new model that uses a VLM (up to 55B params) backbone and fine-tunes it to directly output robot actions!
19
117
600
1
2
21
@KarlPertsch
Karl Pertsch
3 months
Big FOMO! -- but you guys will rock the presentation :) If you're @ ICRA, check out Quan's presentation of our Open X-Embodiment project today, nominated for a best paper award 🎉 Room: CC-Main Hall Time: 10:30-12:00
@QuanVng
Quan Vuong
3 months
Wish @KarlPertsch was at ICRA for Open X-Embodiment 🥲
0
0
10
1
0
20
@KarlPertsch
Karl Pertsch
10 months
Check out @Jesse_Y_Zhang 's CoRL oral on LLM-guided skill learning. Simple recipe: start from a base set of skills → use LLM to guide exploration towards meaningful skill chains → expand the skill library w/ RL. We show that this "skill bootstrapping" phase helps downstream RL!
@Jesse_Y_Zhang
Jesse Zhang
10 months
How can our robots autonomously practice **new tasks** in **new environments**? Introducing BOSS: A reinforcement learning (RL) framework that trains agents to solve new tasks in new environments with LLM guidance! **CoRL 2023 Oral** 🧵👇
5
30
150
1
2
17
@KarlPertsch
Karl Pertsch
2 months
Cool use of a fine-tuned VLM for autonomous driving! Appreciate all the ablations in the paper + focus on speeding up inference on edge compute!
@zhaohang0124
Hang Zhao
2 months
Introducing 𝐃𝐫𝐢𝐯𝐞𝐕𝐋𝐌, VLM meets Autonomous Driving. We propose a dual system that drives a car autonomously in complex driving scenarios. - Slow system: VLM - Fast system: classical AD pipeline Enjoy our onboard demo! Project Page:
1
38
160
0
2
18
@KarlPertsch
Karl Pertsch
4 years
Excited to be presenting SPiRL as an oral talk at today's plenary session on RL @corl_conf ! Join to learn about skill priors for accelerated RL on new tasks! Oral: Wed (today), 8:15am PST Interactive: Wed, 12:30pm PST w/ @YoungwoonLee & @JosephLim_AI
@KarlPertsch
Karl Pertsch
4 years
How can we use large offline datasets for accelerating the learning of new tasks? We can transfer skills! Check out our #CoRL2020 paper on efficient skill transfer with learned skill priors! 📄Paper: 💻Website & Code: Thread👇(1/8)
2
11
24
1
4
18
@KarlPertsch
Karl Pertsch
7 months
@chris_j_paxton @_ericrosen Indeed existing x-embodiment models like RT-X/Octo don't align action spaces or condition on action space definition/URDF -- that's a major reason why they don't usually work 0-shot on new robot setups: they don't know what action space to use -- we're hoping to fix that soon! :)
3
3
17
@KarlPertsch
Karl Pertsch
6 months
Super cool work from Cheng et al! Robot data collection in the wild without the pain of moving robots around! Before we deploy robots at scale + in the wild, this can greatly increase diversity of robot data + help overcome activation energy for getting generalizable policies
@chichengcc
Cheng Chi
6 months
Can we collect robot data without any robots? Introducing Universal Manipulation Interface (UMI) An open-source $400 system from @Stanford designed to democratize robot data collection 0 teleop -> autonomously wash dishes (precise), toss (dynamic), and fold clothes (bimanual)
44
352
2K
1
1
17
@KarlPertsch
Karl Pertsch
3 years
Interested in large task-agnostic datasets in robotics? We show how to effectively combine them w/ demonstrations for sample efficient learning of new tasks! Presenting @corl_conf poster session 4 (Wed 11.30-12.30 GMT)! 📜: 💻:
@KarlPertsch
Karl Pertsch
3 years
New paper on *Skill-based Learning with Demonstrations (SkiLD)*! While current imitation learning follows the _low-level actions_ in the demos, SkiLD follows the demonstrated _skills_. SkiLD enables efficient demo-guided RL & imitation learning on long-horizon tasks! 1/N
1
5
34
2
3
17
@KarlPertsch
Karl Pertsch
4 years
Check out our new work on visual planning and control! Our model uses a divide-and-conquer strategy to break long-horizon planning problems into easier sub-problems, allowing us to solve tasks that require planning over hundreds of time steps!
@svlevine
Sergey Levine
4 years
Instead of predicting in sequence, we can predict hierarchically: midpoint b/w start&goal, midpoint between that, etc. This hierarchical approach is great for planning w/ images! @KarlPertsch , @_oleh , @febert8888 , @chelseabfinn , @dineshjayaraman
4
39
202
1
2
16
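A conceptual sketch of the hierarchical prediction idea described in the quoted thread above: instead of rolling a model forward step by step, recursively predict the midpoint between start and goal. `predict_midpoint` is a placeholder for the learned subgoal model, not code from the paper.

```python
def hierarchical_plan(start, goal, predict_midpoint, depth):
    """Return a coarse-to-fine subgoal sequence from start to goal."""
    if depth == 0:
        return []
    mid = predict_midpoint(start, goal)  # learned subgoal predictor (placeholder)
    left = hierarchical_plan(start, mid, predict_midpoint, depth - 1)
    right = hierarchical_plan(mid, goal, predict_midpoint, depth - 1)
    return left + [mid] + right

# Depth d yields 2**d - 1 subgoals; each prediction is conditioned on endpoints
# that are temporally much closer than in a flat, step-by-step rollout.
```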
@KarlPertsch
Karl Pertsch
1 month
This should be a great tutorial by Lerrel, @notmahi and @RussTedrake for anyone wanting to catch up on modern techniques for imitation learning! Lots of the practical tips should transfer to fine-tuning of large pre-trained models too! (see zoom link in Lerrel's thread)
@LerrelPinto
Lerrel Pinto
1 month
This #RSS2024 on July 19, we are organizing a tutorial on supervised policy learning for real world robots! Talks by @notmahi & @RussTedrake will cover the fundamentals of imitation, recent algorithms, walk-through code, and practical considerations.
4
23
123
0
0
15
@KarlPertsch
Karl Pertsch
2 years
Excited to present two papers w/ co-authors at ICLR this week! 1⃣ Task-Induced Representation Learning: We investigate representation learning in visually complex environments. Q: How can we learn to represent important info & ignore distractors? A: Use prior task experience!
1
2
14
@KarlPertsch
Karl Pertsch
2 years
Check out Lucy's and @YoungwoonLee 's cool work on combining learned skills and model-based RL! Enables more sample efficient learning than model-free skill-RL approaches like SPiRL! + first skill-based RL results on the new CALVIN benchmark! Lucy's first paper -- well done! :)
@lucy_x_shi
Lucy Shi
2 years
Can robots be farsighted? We introduce SkiMo (Skill + Model-based RL), which allows more accurate and efficient long-horizon planning through temporal abstraction. SkiMo learns temporally-extended, sparse-reward tasks with 5x fewer samples! 🧵👇
3
26
127
1
1
14
@KarlPertsch
Karl Pertsch
10 months
2D trajectories for task specification are more grounded than language, but easier to provide than goal images, eg by crowd workers / VLMs. + easy to relabel in hindsight + transfer nicely from human video! Very cool work @Jiayuan_Gu @xiao_ted et al!
@xiao_ted
Ted Xiao
10 months
Instead of just telling robots “what to do”, can we also guide robots by telling them “how to do” tasks? Unveiling RT-Trajectory, our new work which introduces trajectory conditioned robot policies. These coarse trajectory sketches help robots generalize to novel tasks! 🧵⬇️
3
48
253
0
2
13
@KarlPertsch
Karl Pertsch
5 years
(1/n) Check out our new work on keyframe-based video prediction for subgoal discovery! (joint work with @_oleh , in collaboration with @yjy0625 , @CSProfKGD , Joseph Lim, @KostasPenn , @drew_jaegle )
1
1
12
@KarlPertsch
Karl Pertsch
8 months
Out of the box, Octo can control multiple robots, use 3rd person + wrist cameras, language instructions & goal images. Key feature: Octo can be quickly finetuned to use new observation & action spaces! In <5 hours on a 24 GB VRAM GPU! 2/
1
1
12
@KarlPertsch
Karl Pertsch
2 years
By training on in-the-wild human videos, we can use demonstrations from *unseen* environments, e.g. 3 mins of video recorded in my kitchen substantially accelerates RL in a new robot env in our experiments.
1
4
11
@KarlPertsch
Karl Pertsch
11 months
To show that the data is useful for learning, we trained a series of large-scale policies (RT-1-X, RT-2-X) & found co-training with our data to improve performance substantially! We’re releasing model checkpoints too, check Quan’s tweets for details! 11/
@QuanVng
Quan Vuong
11 months
RT-X: generalist AI models lead to 50% improvement over RT-1 and 3x improvement over RT-2, our previous best models. 🔥🥳🧵 Project website:
7
144
619
1
2
9
@KarlPertsch
Karl Pertsch
5 years
We will present our work on keyframe-based video prediction in the workshop on Task-agnostic RL (TARL) tomorrow afternoon. If you're at ICLR, come see us at our poster! (joint work with @_oleh , @yjy0625 , @CSProfKGD , Joseph Lim, @KostasPenn , @drew_jaegle )
@KarlPertsch
Karl Pertsch
5 years
(1/n) Check out our new work on keyframe-based video prediction for subgoal discovery! (joint work with @_oleh , in collaboration with @yjy0625 , @CSProfKGD , Joseph Lim, @KostasPenn , @drew_jaegle )
1
1
12
1
6
10
@KarlPertsch
Karl Pertsch
11 months
Creating this dataset was a huge community effort (look at that author list 😀)! I led the dataset construction and had calls with countless labs & everybody was very excited to contribute data — there is a lot of momentum in the community towards sharing & reusing data 🙂 12/
1
0
8
@KarlPertsch
Karl Pertsch
4 years
@_oleh and I are presenting our work on hierarchical models for long-horizon prediction and planning at the #BIGICML workshop today, starting at 10:40 PT. Come join us to chat about predictive models and model-based RL!
@svlevine
Sergey Levine
4 years
Instead of predicting in sequence, we can predict hierarchically: midpoint b/w start&goal, midpoint between that, etc. This hierarchical approach is great for planning w/ images! @KarlPertsch , @_oleh , @febert8888 , @chelseabfinn , @dineshjayaraman
4
39
202
0
2
8
@KarlPertsch
Karl Pertsch
11 months
Here are the dataset resource links: ✅Colab (vis / download / data loaders):  ✅Overview Sheet (filtering):  All data is fully open-source under a commercially usable CC-BY 4.0 license! 10/
1
2
8
@KarlPertsch
Karl Pertsch
11 months
I’m very excited to see how the community will use this dataset! Let me know if you have any questions! 🙂 💻Project Website: 15/15
1
1
8
@KarlPertsch
Karl Pertsch
2 months
Great work! 💯 lots of room to improve on the vision side of VLMs — robotics could be a great test bed too! For VLA training (VLM+action) we found existing vision encoders need lots of fine-tuning to work well for robot control, though admittedly 🤖 eval isn’t straightforward 🥲
@sainingxie
Saining Xie
2 months
Introducing Cambrian-1, a fully open project from our group at NYU. The world doesn't need another MLLM to rival GPT-4V. Cambrian is unique as a vision-centric exploration & here's why I think it's time to shift focus from scaling LLMs to enhancing visual representations.🧵[1/n]
17
257
1K
0
1
9
@KarlPertsch
Karl Pertsch
11 months
We assembled the dataset by pooling *existing* robot datasets from our collaborators @ Google and many many academic labs (34!). In total we included 60 individual datasets with 22 different robot embodiments — many robot arms, bi-manual robots, quadrupeds, wheeled robots etc. 2/
1
2
7
@KarlPertsch
Karl Pertsch
11 months
The full dataset download is ~4.5 TB. We also provide a sheet that allows you to filter the data along many attributes, e.g. if you only want to download Franka robot data or only data with wrist cams, natural language instructions etc! Tailor the data to your use case! 9/
1
0
7
@KarlPertsch
Karl Pertsch
2 months
Check out Sidd's thread about OpenVLA and some key open questions for VLA research!
@siddkaramcheti
Siddharth Karamcheti
2 months
Thrilled to announce OpenVLA () – a vision-language-action policy for robotic control! Shout out to my co-leads @moo_jin_kim & @KarlPertsch ; see their threads for overviews of our work. Here though, I want to talk about observations & next steps! 🧵⬇️
2
12
66
0
0
8
@KarlPertsch
Karl Pertsch
2 months
How to use it? It’s all on HuggingFace — two lines to load the model, no code install needed. We also open-source our full PyTorch training code & data. Scales from fine-tuning on 1 GPU to training billion-parameter VLAs on distributed clusters! 5/
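Roughly what that looks like, as a minimal sketch: the prompt template, `unnorm_key`, and `predict_action` call follow my reading of the public README and should be treated as assumptions rather than verbatim API.

```python
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda:0")

image = Image.open("frame.png")  # current camera frame
prompt = "In: What action should the robot take to pick up the remote?\nOut:"
inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)

# predict_action is defined in the model's remote code; unnorm_key picks the
# dataset statistics used to un-normalize the predicted action (assumed name).
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
```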
1
1
8
@KarlPertsch
Karl Pertsch
8 months
This was a big team effort w/ collaborators from UC Berkeley, Stanford & CMU! I'm very grateful to all collaborators!! :) @its_dibya @HomerWalke @kvablack @oier_mees @SudeepDasari @JoeyHejna Tobias Kreiman, Charles Xu @jianlanluo You Liang Tan @DorsaSadigh @chelseabfinn @svlevine
2
0
6
@KarlPertsch
Karl Pertsch
4 years
Excited to present SPiRL in contributed talks at the Deep RL and Robot Learning workshops @NeurIPSConf ! Join us during the poster sessions to chat about all things skill learning & transfer! DRL Poster: Room F, A1 Robot Learning Poster: C3 w/ @YoungwoonLee & @JosephLim_AI
@KarlPertsch
Karl Pertsch
4 years
How can we use large offline datasets for accelerating the learning of new tasks? We can transfer skills! Check out our #CoRL2020 paper on efficient skill transfer with learned skill priors! 📄Paper: 💻Website & Code: Thread👇(1/8)
2
11
24
0
1
7
@KarlPertsch
Karl Pertsch
2 months
How does it work? We take a strong open-source VLM, Prismatic 7B, and fine-tune it to predict robot actions, using a curated dataset of 970k robot demonstrations. This recipe scales, and allows robotics to reuse pretrained models from the community (SigLIP, DinoV2, Llama2) 🚀 2/
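For intuition, here is a minimal sketch of the general VLA action-tokenization idea: discretize each continuous action dimension into bins the language model can predict as tokens. The 256-bin choice and the mapping below are illustrative assumptions; see the paper for the exact scheme.

```python
import numpy as np

N_BINS = 256  # per-dimension discretization (illustrative, RT-2-style)

def action_to_bins(action, low, high, n_bins=N_BINS):
    """Map a continuous action vector to per-dimension bin indices in [0, n_bins - 1]."""
    action = np.clip(action, low, high)
    norm = (action - low) / (high - low + 1e-8)
    return np.minimum((norm * n_bins).astype(int), n_bins - 1)

def bins_to_action(bins, low, high, n_bins=N_BINS):
    """Recover bin-center actions from predicted indices (the de-tokenization step)."""
    return low + (bins + 0.5) / n_bins * (high - low)
```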
2
0
7
@KarlPertsch
Karl Pertsch
8 months
Last but not least: Octo is your one-stop-shop for training on OpenX data! We’re releasing high-quality data loaders that work with PyTorch and JAX + a curated dataset split! 7/
@KarlPertsch
Karl Pertsch
11 months
Very excited to release the Open X-Embodiment Dataset today — the largest robot dataset to date with 1M+ trajectories! Robotics needs more data & this is a big step! There’s lots to unpack here, so let’s do a deep dive into the dataset! 🧵1/15
8
91
452
2
0
7
@KarlPertsch
Karl Pertsch
11 months
We analyzed the properties of the combined dataset! First, the number of datasets per robot embodiment: many academic labs use Franka robot arms, so we have many (smaller) Franka datasets and a long-tail of other robot embodiments! 3/
1
3
5
@KarlPertsch
Karl Pertsch
1 month
This is great work! 38 fine-tuning tasks for every eval 🤯 thanks for sharing many ablations @giffmana and team! Also confirms our finding that vision encoder fine-tuning is required for fine-grained spatial tasks like robot control! Any plans to release larger PaliGemma models? :)
@giffmana
Lucas Beyer (bl16)
1 month
✨PaliGemma report will hit arxiv tonight. We tried hard to make it interesting, and not "here model. sota results. kthxbye." So here's some of the many interesting ablations we did, check the paper tomorrow for more! 🧶
20
117
857
1
0
6
@KarlPertsch
Karl Pertsch
11 months
Using the data is easy! All data is stored in tfrecords & we made a colab for visualizing & downloading the data (w/ examples for efficient data loaders)! Each dataset stores observations/actions in its "native" format & resolution, but it's easy to align & mix them on the fly! 8/
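A minimal loading sketch with `tensorflow_datasets`; the GCS bucket path, dataset name, and version below are assumptions, and the official colab lists the exact builder directories (field names also vary per dataset).

```python
import tensorflow_datasets as tfds

builder = tfds.builder_from_directory(
    builder_dir="gs://gresearch/robotics/bridge/0.1.0"  # assumed path/version
)
ds = builder.as_dataset(split="train[:10]")

for episode in ds:
    for step in episode["steps"]:               # RLDS-style nested steps
        image = step["observation"]["image"]    # field names differ per dataset
        action = step["action"]
```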
1
0
5
@KarlPertsch
Karl Pertsch
10 months
We plan to expand the dataset over time and e.g. add more mobile manipulation and simulation data. If you have data that would be good to integrate, simulated or real, please fill out the form:
0
0
6
@KarlPertsch
Karl Pertsch
8 months
We’re fully open-sourcing model checkpoints, our pre-training and finetuning pipelines! Initially, Octo comes in two sizes: Octo-Small (27M params) and Octo-Base (93M params). All models are on HuggingFace, so loading an Octo model is as easy as this: 5/
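The snippet from the attached image is not reproduced here, but loading is along these lines; the class and method names follow my recollection of the Octo README and should be treated as assumptions.

```python
from octo.model.octo_model import OctoModel

model = OctoModel.load_pretrained("hf://rail-berkeley/octo-base")
task = model.create_tasks(texts=["pick up the spoon"])  # language-conditioned task
# actions = model.sample_actions(observations, task, rng=...)  # see the repo README
```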
1
0
5
@KarlPertsch
Karl Pertsch
3 months
Big shoutout to Xuanlin ( @XuanlinLi2 ), Kyle ( @kylehkhsu ) and Jiayuan ( @Jiayuan_Gu ) for leading this project in a UCSD x Stanford x Google collab! For more details about our approach and results, please check out Kyle’s thread below!
@kylehkhsu
Kyle Hsu
3 months
[1/14] Real robot rollouts are the gold standard for evaluating generalist manipulation policies, but is there a less painful way to get good signal for iterating on your design decisions? Let’s take a deep dive on SIMPLER 🧵👇 (or see quoted video)!
2
14
56
1
0
5
@KarlPertsch
Karl Pertsch
11 months
We're hoping to continue this momentum and keep growing the dataset 🚀! We're still figuring out the details, but if you or your lab have data you'd like to contribute feel free to shoot an email to open-x-embodiment@googlegroups.com and we will get back to you! :) 13/
1
1
4
@KarlPertsch
Karl Pertsch
4 years
Bonus: with slight tweaks to our model we can make it predict semantic bottlenecks between start and goal. In this case our model learns to predict the subgoal *and* its temporal placement, allowing for non-even splits of the long-horizon problem.
1
0
5
@KarlPertsch
Karl Pertsch
2 months
The HuggingFace integration also means that OpenVLA supports all 🤗 magic out of the box, like LoRA fine-tuning, quantized inference etc etc (see paper for detailed analysis of these)! This makes billion-param models much more accessible in robotics, just as they've become in NLP! 6/
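As one example of that 🤗 magic, 4-bit quantized loading goes through the standard transformers/bitsandbytes path; a hedged sketch (the exact configs evaluated in the paper may differ):

```python
import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

bnb_cfg = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    quantization_config=bnb_cfg,   # generic HF quantization, not an OpenVLA-specific API
    trust_remote_code=True,
)
```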
1
0
5
@KarlPertsch
Karl Pertsch
11 months
The distribution of objects is diverse & reflective of objects a robot would encounter "in the wild”, like common furniture pieces, food items, appliances etc. There is still a long way towards real world diversity, but we hope that this dataset can build a good foundation! 7/
1
0
4
@KarlPertsch
Karl Pertsch
2 months
Big shoutout to my co-leads @moo_jin_kim and @siddkaramcheti , and thanks to my advisors @chelseabfinn and @svlevine , and many others involved! Also thanks to @ToyotaResearch for providing the compute to enable this kind of open-source research! 9/9
1
0
5
@KarlPertsch
Karl Pertsch
5 months
In all seriousness though, being able to "program" *and* "debug" your robot in natural language will be tremendously useful when the job of teaching robots new skills is no longer done by machine learning experts in labs but end users in homes! Great job Lucy!! :)
1
1
5
@KarlPertsch
Karl Pertsch
2 months
Please check out Moo Jin’s thread for more details about OpenVLA — Moo Jin really carried the torch in this project, which was the first project in his PhD! Way to go Moo Jin! :)
@moo_jin_kim
Moo Jin Kim
2 months
✨ Introducing 𝐎𝐩𝐞𝐧𝐕𝐋𝐀 — an open-source vision-language-action model for robotics! 👐 - SOTA generalist policy - 7B params - outperforms Octo, RT-2-X on zero-shot evals 🦾 - trained on 970k episodes from OpenX dataset 🤖 - fully open: model/code/data all online 🤗 🧵👇
18
164
678
1
0
5
@KarlPertsch
Karl Pertsch
1 month
When collecting your fine-tuning data, start with little variation in terms of objects, positions, scenes, backgrounds, camera angles, etc. It's easier to catch bugs in your robot pipeline this way. But, for best policy generalization, collect more diverse demo data later! 3/
1
0
5
@KarlPertsch
Karl Pertsch
4 years
Jun will present our work on augmenting RL w/ motion planners at @corl_conf today. Our RL agents learn to use motion planners for solving challenging manipulation tasks w/ many obstacles! Interactive Session: today, 11.10am PST. Led jointly by Jun ( @junjungoal ) & @YoungwoonLee .
0
1
5
@KarlPertsch
Karl Pertsch
8 months
Octo is only the first step towards building generalist robot policies and we’re planning to improve the models over time — larger sizes, more robot morphologies, RL etc etc — really excited to see how folks will use Octo! :) 8/
1
0
5
@KarlPertsch
Karl Pertsch
3 years
Enjoyed reading this paper! Still lots of work to do to get sufficiently diverse task distributions in more realistic domains like robotics, but many of the ideas on ranking multi-task agents & automated task curricula seem generally applicable!
@maxjaderberg
Max Jaderberg
3 years
Very excited to release our new work: Open-Ended Learning Leads to Generally Capable Agents. tldr; algorithm that dynamically shapes task distributions to train agents on huge task space, resulting in surprisingly general behaviour Thread: (1/n)
10
216
874
0
1
5
@KarlPertsch
Karl Pertsch
4 years
Join us in today's 9am PT poster session @NeurIPSConf to chat about hierarchical planning w/ goal-conditioned prediction models!
@svlevine
Sergey Levine
4 years
Tmrw (Tue 9/8 at 9 am PT) check out HEDGE at @NeurIPSConf : hierarchical planning with learned tree-structured models enables planning complex behaviors one subgoal at a time. w/ @KarlPertsch , @_oleh , @febert8888 , @chelseabfinn , @dineshjayaraman more->
1
10
39
0
0
5
@KarlPertsch
Karl Pertsch
2 years
Joseph will talk about a lot of our skill-based learning works in the PRL workshop @corl_conf today! Starting at 11.30am NZ time (in ~30 mins) — join with this zoom link:
@JosephLim_AI
Joseph Lim
2 years
On my way to #CoRL2022 ! Poke me if you want to chat :) Also, drop by my talks today if you are interested in Skill-based Robot Learning!
2
1
21
0
0
5
@KarlPertsch
Karl Pertsch
4 years
We are presenting our work on keyframe-based prediction and planning at #L4DC today! We also made a video, so it will take you only 5 minutes to get the gist of the paper!👇 Website: Paper: Video:
1
0
5
@KarlPertsch
Karl Pertsch
1 month
Most importantly: 99% of applications will require *fine-tuning*, i.e. collecting a small dataset of <100 robot demos in your target domain & fine-tuning OpenVLA on it. Why? OpenVLA needs to learn your robot's action space, camera setup etc. More on 0-shot usage at the end! 2/
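A hedged sketch of what LoRA fine-tuning can look like with the generic PEFT API; the released repo ships its own fine-tuning script, and the rank, alpha, and target module names below are illustrative assumptions about the Llama-2 backbone.

```python
import torch
from transformers import AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b", torch_dtype=torch.bfloat16, trust_remote_code=True
)
lora_cfg = LoraConfig(
    r=32, lora_alpha=16, lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed module suffixes
)
vla = get_peft_model(vla, lora_cfg)
vla.print_trainable_parameters()
# ...then train with a standard next-token-prediction loss over your demo dataset.
```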
1
0
4
@KarlPertsch
Karl Pertsch
1 month
Final thought re 0-shot: we want models that can control your robot out-of-the-box, just like downloading an LLM checkpoint. Current limitation: models are not conditioned on robot URDF / action space definition etc, so they need to learn it implicitly via fine-tuning. 9/
1
0
4
@KarlPertsch
Karl Pertsch
8 months
@yuramel2000 @berkeley_ai You can run inference comfortably on standard consumer GPUs. A 4090 will run our biggest model at ~13 it/s, a 2080 Ti still at ~5 Hz I believe! The smaller model is even faster if you need a high control frequency.
0
0
3
@KarlPertsch
Karl Pertsch
11 months
Many authors were involved in this project! Special thanks to  @QuanVng for leading the overall project and managing everything masterfully! Tagging a few more co-authors @pannag_ @hausman_k @chelseabfinn @svlevine 14/
2
0
3
@KarlPertsch
Karl Pertsch
2 months
For more details about OpenVLA, please check out our paper, example scripts and codebase. And give the model a try :) Paper: Website & Code:
1
0
4
@KarlPertsch
Karl Pertsch
3 months
If you’re working on real robot foundation models, give SIMPLER a spin! Real robot datasets (OpenX): SIMPLER Paper: SIMPLER Website: 6/
1
0
4
@KarlPertsch
Karl Pertsch
11 months
Finally, we analyzed the distribution of skills & objects in the dataset based on the language annotations (~60% of the datasets have language instructions). While common pick-place tasks are most frequent, there is a long tail of interesting skills like wiping, assembling etc 6/
1
1
3
@KarlPertsch
Karl Pertsch
3 months
@chris_j_paxton Not sure these scaling laws are meaningful since they aggregate across very different tasks. "Curve goes up" is also not surprising since they filtered the raw data for "curve goes up". We need scaling laws, but a meta-study is likely not the way to do it w/o agreed-upon benchmarks?
1
0
4
@KarlPertsch
Karl Pertsch
2 months
How does it perform? Better than any previous generalist robot policy: we test OpenVLA for controlling multiple robots “out-of-the-box” & outperform our own Octo model across the board. We also match or outperform RT-2-X, a 55B closed VLA — the key: a larger robot dataset. 3/
1
0
4
@KarlPertsch
Karl Pertsch
7 months
@ericjang11 Great video! Specializing the generalist models to shorten finetuning + data iteration cycles is clever! One Q: is the data that finetunes the specialized model well also sufficient to upstream the skill into the generalist? Do you need to collect more of the same "type" of data?
0
0
4
@KarlPertsch
Karl Pertsch
1 month
Once trained, you can load your fine-tuned OpenVLA model for inference easily via 🤗 AutoModel. NVIDIA 4090s offer the best inference speed / $$$. We also provide code for serving fine-tuned OpenVLA models remotely & query via API, if your fastest GPU isn't near the robot! 6/
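Purely illustrative client sketch for the remote-serving setup; the route, port, and payload format here are hypothetical placeholders, not the repo's actual serving API.

```python
import numpy as np
import requests

payload = {
    "image": np.zeros((224, 224, 3), dtype=np.uint8).tolist(),  # placeholder camera frame
    "instruction": "put the carrot in the bowl",
}
resp = requests.post("http://<server-ip>:8000/act", json=payload, timeout=5.0)  # hypothetical route
action = np.array(resp.json()["action"])
```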
1
0
3
@KarlPertsch
Karl Pertsch
8 months
If we want to build truly “foundational” models for robotics we need to support the diversity of real robot setups! Despite the added flexibility, we find Octo's performance to be strong compared to RT-1X and even RT-2X + great during finetuning! 3/
1
0
3
@KarlPertsch
Karl Pertsch
11 months
@jeasinema Agreed! While the initial models we trained are 2D-only, the dataset actually has a lot of multi-view / depth cam data, so lots of opportunity for better policies! No tactile data yet, but would be great to include going forward! Maybe @LerrelPinto can help with that? 😉
2
0
3
@KarlPertsch
Karl Pertsch
11 months
👆We also estimated the # of visually distinct scenes per dataset & find that this metric is well distributed across robots, w/ many embodiments contributing a significant fraction of the scene diversity. Ultimately, scene diversity may be more important than trajectory count. 5/
1
1
2
@KarlPertsch
Karl Pertsch
3 months
Main technical delta vs December-Octo is that we use GPT to generate paraphrases of all language instructions in our OpenX data mix for better language grounding (thanks @oier_mees for leading implementation of this feature!) We also show Octo finetuning to bimanual ALOHA now!
1
0
2
@KarlPertsch
Karl Pertsch
3 months
@SOTA_kke @XuanlinLi2 @kylehkhsu @Jiayuan_Gu If there was bad correlation, it would likely be the other way around: the policy works in real but doesn't in SIMPLER bc of real-to-sim gap. SIMPLER worked well for the policies we tested, but may not always hold eg for policies that are sensitive to visual imperfections.
1
0
3
@KarlPertsch
Karl Pertsch
3 months
@RemiCadene 💯 -- Thanks for the shoutout Remi! Small correction: we don't need to modify the policy at all, we're using open-source checkpoints in our exp, some straight from HuggingFace 😉. We simply make sure that the sim is realistic enough so policies trained on real data work in it!
1
0
2
@KarlPertsch
Karl Pertsch
1 year
And @ZoeyC17 and @Vikashplus 's GenAug
@Vikashplus
Vikash Kumar
2 years
Lack of scale & diversity in robot datasets is demanding a change towards scalable alternatives- LLMs, Sim2Real, etc. GenAug presents a powerful recipe for using text2image generative models to demonstrate widespread generalization of robot behaviors to novel scenarios. 🧵(1/N)
2
45
195
1
0
3
@KarlPertsch
Karl Pertsch
1 month
So lots of potential for making these models easier to use! Thanks for sticking around! Check out the OpenVLA announcement below if you missed it & give the model a try! Huge thanks again to my co-leads @moo_jin_kim and @siddkaramcheti ❤️
@KarlPertsch
Karl Pertsch
2 months
Very excited to release OpenVLA today, a 7B parameter open-source vision-language-action model (VLA). 🦾 SoTA generalist policy (better than Octo & RT-2-X) ⚡️ Easy to run & fine-tune on 1 GPU with quantization and LoRA 💻 Open-source PyTorch codebase 🤗 Models on HuggingFace 1/
3
62
375
0
0
3
@KarlPertsch
Karl Pertsch
8 months
@GlenBerseth Yup! We use 128 TPUv4 chips for ~14 hours to pre-train Octo-Base, so depending on how many A100s you can use, it may take a bit longer, but it should definitely be possible.
0
0
1
@KarlPertsch
Karl Pertsch
3 months
@RemiCadene Key is that we only modify *the sim environment* to mitigate control&visual gap, the policy is 1:1 the same we are using on the real robot & was trained long before this project even started :) So in some sense this sim eval is "for free", but requires care in setting up the sim
1
0
3
@KarlPertsch
Karl Pertsch
1 month
Before you roll out OpenVLA on your robot, it's very helpful to feed in a few training images & make sure the predicted actions match the training data. This helps to catch silent bugs in your inference pipeline, one of the most common sources of error! 7/
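A minimal sketch of that sanity check; the sample schema and the `model_predict` callable are assumptions about your own pipeline.

```python
import numpy as np

def sanity_check(model_predict, train_samples, tol=0.05):
    """Compare predicted vs. recorded actions on a few training frames.

    model_predict: callable(image, instruction) -> action vector (your inference fn)
    train_samples: iterable of dicts with "image", "instruction", "action" (assumed schema)
    """
    errors = []
    for s in train_samples:
        pred = np.asarray(model_predict(s["image"], s["instruction"]))
        errors.append(np.abs(pred - np.asarray(s["action"])))
    mean_err = np.mean(errors, axis=0)
    print("mean |pred - gt| per action dim:", mean_err)
    # Large errors usually point to image preprocessing, action normalization, or
    # prompt-format bugs in the inference pipeline rather than a bad checkpoint.
    return bool(np.all(mean_err < tol))
```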
1
0
2
@KarlPertsch
Karl Pertsch
3 months
If you're interested in all the Octo details, see the release thread below. All models and the new OpenX language paraphrases are on HuggingFace :)
@KarlPertsch
Karl Pertsch
8 months
3 mo. ago we released the Open X-Embodiment dataset; today we're taking the next step: introducing Octo 🐙, a generalist robot policy trained on 800k robot trajectories, stronger than RT-1X, with flexible observation + action spaces, fully open source! 💻: /🧵
10
90
373
0
0
2
@KarlPertsch
Karl Pertsch
8 months
We’re releasing a tech report with lots of details on what worked and, importantly, what didn’t -- go check it out! 📜: 6/
1
0
2