Karl Pertsch Profile
Karl Pertsch

@KarlPertsch

1,657 Followers
227 Following
70 Media
238 Statuses

Robot Foundation Models @ UC Berkeley & Stanford | Postdoc w/ Sergey Levine & Chelsea Finn | Prev. Intern @ Google Brain, Meta AI | PhD @ USC.

Joined July 2015
Pinned Tweet
@KarlPertsch
Karl Pertsch
2 months
Very excited to release OpenVLA today, a 7B parameter open-source vision-language-action model (VLA). 🦾 SoTA generalist policy (better than Octo & RT-2-X) ⚡️ Easy to run & fine-tune on 1 GPU with quantization and LoRA 💻 Open-source PyTorch codebase 🤗 Models on HuggingFace 1/
3
62
375
@KarlPertsch
Karl Pertsch
11 months
Very excited to release the Open X-Embodiment Dataset today — the largest robot dataset to date with 1M+ trajectories! Robotics needs more data & this is a big step! There’s lots to unpack here, so let’s do a deep dive into the dataset! 🧵1/15
8
91
452
@KarlPertsch
Karl Pertsch
8 months
3 mo. ago we released the Open X-Embodiment dataset; today we're taking the next step: introducing Octo 🐙, a generalist robot policy trained on 800k robot trajectories, stronger than RT-1X, with flexible observation + action spaces, fully open source! 💻: /🧵
10
90
373
@KarlPertsch
Karl Pertsch
5 months
Access to *diverse* training data is a major bottleneck in robot learning. We're releasing DROID, a large-scale in-the-wild manipulation dataset: 76k trajectories, 500+ scenes, multi-view stereo, language annotations, etc. Check it out & download today! 💻:
8
60
194
@KarlPertsch
Karl Pertsch
1 month
Our OpenVLA model has been downloaded more than 20k times in less than a month -- the most for any robotics model on the 🤗 hub by a long shot! Here is a little "cookbook" for people who want to get started using OpenVLA! 🧑‍🍳 1/🧵
2
16
163
@KarlPertsch
Karl Pertsch
3 months
Our OpenX paper won best paper at ICRA! Congrats to all my co-authors! 🎉🎉 This is an ongoing effort, we recently added new datasets from the community that double the size of the OpenX dataset -- keep 'em coming! :) Check datasets & how to contribute:
3
14
104
@KarlPertsch
Karl Pertsch
3 months
Octo has been accepted to RSS and we finally arxiv'd the paper! 🐙 Many small updates vs the December release: more ablations, new checkpoints, code fixes etc 👇
@_akhaliq
AK
3 months
Octo An Open-Source Generalist Robot Policy Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a
7
22
157
4
11
97
@KarlPertsch
Karl Pertsch
10 months
It was fun to present Open X-Embodiment & RT-X at CoRL today with @QuanVng ! We were very excited about the initial release of the Open X-Embodiment dataset, but it's just the start! We covered lots of open problems in the talk as well👇
1
7
73
@KarlPertsch
Karl Pertsch
2 years
Excited to present STAR, our work on cross-domain imitation @corl_conf ! Our goal: use demonstrations across domains, e.g. from robot in kitchen A to robot in kitchen B, or even from human to robot. With STAR I can teach a robot new tasks with videos recorded in my kitchen! 🧵👇
1
18
69
@KarlPertsch
Karl Pertsch
11 months
It's awesome to see the positive community response to our release! We're getting inquiries from around the world to contribute more data -- wheeled robots, drones, humanoids, etc! 🚀🚀🚀 Please keep them coming 🙂 open-x-embodiment@googlegroups.com
@KarlPertsch
Karl Pertsch
11 months
Very excited to release the Open X-Embodiment Dataset today — the largest robot dataset to date with 1M+ trajectories! Robotics needs more data & this is a big step! There’s lots to unpack here, so let’s do a deep dive into the dataset! 🧵1/15
8
91
452
0
7
57
@KarlPertsch
Karl Pertsch
3 months
Evaluation of robot foundation models is a huge challenge: imagine running robot rollouts across 100s of scenes + tasks + embodiments. How can we make eval keep up w/ model improvements? Introducing SIMPLER: sim eval envs for your favorite real robot foundation models! Short 🧵
@XuanlinLi2
Xuanlin Li (Simon)
3 months
Scalable, reproducible, and reliable robotic evaluation remains an open challenge, especially in the age of generalist robot foundation models. Can *simulation* effectively predict *real-world* robot policy performance & behavior? Presenting SIMPLER!👇
3
23
133
1
6
42
@KarlPertsch
Karl Pertsch
1 month
Excited to release our work on Embodied Chain-of-Thought Reasoning today! We can boost performance of vision-language-action models like OpenVLA by a large margin without any additional robot training data! The key: simply think before you act! 1/
@MiZawalski
Michał Zawalski
1 month
🤖Can robots think through complex tasks step-by-step like language models? We present Embodied Chain-of-Thought Reasoning (ECoT): enabling robots to reason about plans and actions for better performance🎯, interpretability🧐, and generalization🌎. See .
2
19
63
1
8
40
@KarlPertsch
Karl Pertsch
1 year
Robot learning needs data, but collecting it is expensive. How can we make the most of existing datasets? In SPRINT, we use LLMs to auto-augment language instructions on robot datasets. Our agents learn a lot more tasks during pre-training *for free*! See Jesse’s 🧵for details!👇
@Jesse_Y_Zhang
Jesse Zhang
1 year
Having humans annotate data to pre-train robots is expensive and time-consuming! Introducing SPRINT: A pre-training approach using LLMs and offline RL to equip robots w/ many language-annotated skills while minimizing human annotation effort! URL: 🧵👇
2
29
122
1
2
39
@KarlPertsch
Karl Pertsch
3 years
New paper on *Skill-based Learning with Demonstrations (SkiLD)*! While current imitation learning follows the _low-level actions_ in the demos, SkiLD follows the demonstrated _skills_. SkiLD enables efficient demo-guided RL & imitation learning on long-horizon tasks! 1/N
1
5
34
@KarlPertsch
Karl Pertsch
4 months
Shoutout to the folks at Rerun who built a visualizer for our DROID dataset -- looks very cool! Allows you to visualize the point cloud from our multi-view stereo cams as well! And should work for any new dataset collected on the DROID robot platform! Thanks @rerundotio :)
@rerundotio
Rerun
4 months
A Rerun Viewer for the DROID Dataset! DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset is a robot manipulation dataset by @SashaKhazatsky et al. with 76k demonstration trajectories or 350h of interaction data, collected across 564 scenes and 86 tasks.
2
22
124
1
2
32
@KarlPertsch
Karl Pertsch
2 years
New work on scaling robot learning from the team I work with at Google! Especially excited about RT1’s capability to ingest data from diverse sources, eg sim or even experience from other robots + demonstrate transfer -- very useful for scaling robotic dataset size & diversity!
@hausman_k
Karol Hausman
2 years
Introducing RT-1, a robotic model that can execute over 700 instructions in the real world at 97% success rate! Generalizes to new tasks✅ Robust to new environments and objects✅ Fast inference for real time control✅ Can absorb multi-robot data✅ Powers SayCan✅ 🧵👇
62
550
2K
1
0
30
@KarlPertsch
Karl Pertsch
4 years
Grateful to be awarded the best paper presentation award @corl_conf ! 🎉 Huge credit goes to all my lab mates @ CLVR lab, particularly to my co-author @YoungwoonLee , for all the tireless feedback that greatly improved the talk! :) Talk recording:
3
2
29
@KarlPertsch
Karl Pertsch
2 years
Data collection is a major bottleneck in robot learning: it’s mostly done w/ tedious & expensive human teleoperation. Can we use learning to make data collection itself more efficient? Introducing PATO, our approach for scalable robot data collection w/ learned assistive policies
@ShivinDass
Shivin Dass
2 years
Excited to present PATO: Policy Assisted TeleOperation, our recent work on scaling robot data collection! PATO uses a policy trained on prior data to assist the user during data collection, making teleop easier and even allows to teleop multiple robots simultaneously. 🧵👇
1
10
47
1
4
29
@KarlPertsch
Karl Pertsch
4 years
How can we use large offline datasets for accelerating the learning of new tasks? We can transfer skills! Check out our #CoRL2020 paper on efficient skill transfer with learned skill priors! 📄Paper: 💻Website & Code: Thread👇(1/8)
2
11
24
@KarlPertsch
Karl Pertsch
3 months
This looks awesome! Simulation can be a valuable tool for robot data scaling & eval, but the hard part is building diverse simulation envs AND datasets. Glad to see Soroush et al's sim data line of work expanded to more diverse envs! Excited to give this a try!
@snasiriany
Soroush Nasiriany
3 months
I’m excited to introduce RoboCasa, a large-scale simulation framework for everyday tasks. Scaling is the key driving force to unlocking generalist robots, and RoboCasa leverages simulation to take scaling to a whole new level. A short 🧵
10
50
259
2
3
24
@KarlPertsch
Karl Pertsch
11 months
If you want to browse through the Open X-Embodiment data, but don't like fiddling with Colabs, check out this neat website @its_dibya built that gives you a quick overview of all datasets!
@its_dibya
Dibya Ghosh
11 months
Got a chance to dig through the big robot X-embodiment dataset released last week, and hacked together a little website for others to look through the data. Check it out! There's some pretty random and diverse robot data in there
0
37
173
0
2
24
@KarlPertsch
Karl Pertsch
5 months
Check out Lucy's new project! Finally, every roboticist's favorite pastime, "yelling at your robot", can be useful for once! Bonus: lots of ALOHA trail mix in the lab! 😍
@lucy_x_shi
Lucy Shi
5 months
Introducing Yell At Your Robot (YAY Robot!) 🗣️- a fun collaboration b/w @Stanford and @UCBerkeley 🤖 We enable robots to improve on-the-fly from language corrections: robots rapidly adapt in real-time and continuously improve from human verbal feedback. YAY Robot enables
19
79
468
1
0
24
@KarlPertsch
Karl Pertsch
1 year
Glad to see RT-2 out! We show that VLM backbones are a great way to equip policies with robustness from internet-scale data. RT-2 strongly improves the generalization ability of existing skills (eg new scenes / objects) -- learning new low-level behaviors is the next frontier!
@hausman_k
Karol Hausman
1 year
PaLM-E or GPT-4 can speak in many languages and understand images. What if they could speak robot actions? Introducing RT-2: our new model that uses a VLM (up to 55B params) backbone and fine-tunes it to directly output robot actions!
19
117
600
1
2
21
@KarlPertsch
Karl Pertsch
3 months
Big FOMO! -- but you guys will rock the presentation :) If you're @ ICRA, check out Quan's presentation of our Open X-Embodiment project today, nominated for a best paper award 🎉 Room: CC-Main Hall Time: 10:30-12:00
@QuanVng
Quan Vuong
3 months
Wish @KarlPertsch was at ICRA for Open X-Embodiment 🥲
0
0
10
1
0
20
@KarlPertsch
Karl Pertsch
10 months
Check out @Jesse_Y_Zhang 's CoRL oral on LLM-guided skill learning. Simple recipe: start from a base set of skills → use LLM to guide exploration towards meaningful skill chains → expand the skill library w/ RL. We show that this "skill bootstrapping" phase helps downstream RL!
@Jesse_Y_Zhang
Jesse Zhang
10 months
How can our robots autonomously practice **new tasks** in **new environments**? Introducing BOSS: A reinforcement learning (RL) framework that trains agents to solve new tasks in new environments with LLM guidance! **CoRL 2023 Oral** 🧵👇
5
30
150
1
2
17
@KarlPertsch
Karl Pertsch
2 months
Cool use of a fine-tuned VLM for autonomous driving! Appreciate all the ablations in the paper + focus on speeding up inference on edge compute!
@zhaohang0124
Hang Zhao
2 months
Introducing 𝐃𝐫𝐢𝐯𝐞𝐕𝐋𝐌, VLM meets Autonomous Driving. We propose a dual system that drives a car autonomously in complex driving scenarios. - Slow system: VLM - Fast system: classical AD pipeline Enjoy our onboard demo! Project Page:
1
38
160
0
2
18
@KarlPertsch
Karl Pertsch
4 years
Excited to be presenting SPiRL as an oral talk at today's plenary session on RL @corl_conf ! Join to learn about skill priors for accelerated RL on new tasks! Oral: Wed (today), 8:15am PST Interactive: Wed, 12:30pm PST w/ @YoungwoonLee & @JosephLim_AI
@KarlPertsch
Karl Pertsch
4 years
How can we use large offline datasets for accelerating the learning of new tasks? We can transfer skills! Check out our #CoRL2020 paper on efficient skill transfer with learned skill priors! 📄Paper: 💻Website & Code: Thread👇(1/8)
2
11
24
1
4
18
@KarlPertsch
Karl Pertsch
7 months
@chris_j_paxton @_ericrosen Indeed existing x-embodiment models like RT-X/Octo don't align action spaces or condition on action space definition/URDF -- that's a major reason why they don't usually work 0-shot on new robot setups: they don't know what action space to use -- we're hoping to fix that soon! :)
3
3
17
@KarlPertsch
Karl Pertsch
6 months
Super cool work from Cheng et al! Robot data collection in the wild without the pain of moving robots around! Before we deploy robots at scale + in the wild, this can greatly increase diversity of robot data + help overcome activation energy for getting generalizable policies
@chichengcc
Cheng Chi
6 months
Can we collect robot data without any robots? Introducing Universal Manipulation Interface (UMI) An open-source $400 system from @Stanford designed to democratize robot data collection 0 teleop -> autonomously wash dishes (precise), toss (dynamic), and fold clothes (bimanual)
44
352
2K
1
1
17
@KarlPertsch
Karl Pertsch
3 years
Interested in large task-agnostic datasets in robotics? We show how to effectively combine them w/ demonstrations for sample efficient learning of new tasks! Presenting @corl_conf poster session 4 (Wed 11.30-12.30 GMT)! 📜: 💻:
@KarlPertsch
Karl Pertsch
3 years
New paper on *Skill-based Learning with Demonstrations (SkiLD)*! While current imitation learning follows the _low-level actions_ in the demos, SkiLD follows the demonstrated _skills_. SkiLD enables efficient demo-guided RL & imitation learning on long-horizon tasks! 1/N
1
5
34
2
3
17
@KarlPertsch
Karl Pertsch
4 years
Check out our new work on visual planning and control! Our model uses a divide-and-conquer strategy to break long-horizon planning problems into easier sub-problems, allowing us to solve tasks that require planning over hundreds of time steps!
@svlevine
Sergey Levine
4 years
Instead of predicting in sequence, we can predict hierarchically: midpoint b/w start&goal, midpoint between that, etc. This hierarchical approach is great for planning w/ images! @KarlPertsch , @_oleh , @febert8888 , @chelseabfinn , @dineshjayaraman
4
39
202
1
2
16
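A conceptual sketch of the hierarchical prediction idea described in the quoted thread above: instead of rolling a model forward step by step, recursively predict the midpoint between start and goal. `predict_midpoint` is a placeholder for the learned subgoal model, not code from the paper.

```python
def hierarchical_plan(start, goal, predict_midpoint, depth):
    """Return a coarse-to-fine subgoal sequence from start to goal."""
    if depth == 0:
        return []
    mid = predict_midpoint(start, goal)  # learned subgoal predictor (placeholder)
    left = hierarchical_plan(start, mid, predict_midpoint, depth - 1)
    right = hierarchical_plan(mid, goal, predict_midpoint, depth - 1)
    return left + [mid] + right

# Depth d yields 2**d - 1 subgoals; each prediction is conditioned on endpoints
# that are temporally much closer than in a flat, step-by-step rollout.
```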
@KarlPertsch
Karl Pertsch
1 month
This should be a great tutorial by Lerrel, @notmahi and @RussTedrake for anyone wanting to catch up on modern techniques for imitation learning! Lots of the practical tips should transfer to fine-tuning of large pre-trained models too! (see zoom link in Lerrel's thread)
@LerrelPinto
Lerrel Pinto
1 month
This #RSS2024 on July 19, we are organizing a tutorial on supervised policy learning for real world robots! Talks by @notmahi & @RussTedrake will cover the fundamentals of imitation, recent algorithms, walk-through code, and practical considerations.
4
23
123
0
0
15
@KarlPertsch
Karl Pertsch
2 years
Excited to present two papers w/ co-authors at ICLR this week! 1⃣ Task-Induced Representation Learning: We investigate representation learning in visually complex environments. Q: How can we learn to represent important info & ignore distractors? A: Use prior task experience!
1
2
14
@KarlPertsch
Karl Pertsch
2 years
Check out Lucy's and @YoungwoonLee 's cool work on combining learned skills and model-based RL! Enables more sample efficient learning than model-free skill-RL approaches like SPiRL! + first skill-based RL results on the new CALVIN benchmark! Lucy's first paper -- well done! :)
@lucy_x_shi
Lucy Shi
2 years
Can robots be farsighted? We introduce SkiMo (Skill + Model-based RL), which allows more accurate and efficient long-horizon planning through temporal abstraction. SkiMo learns temporally-extended, sparse-reward tasks with 5x fewer samples! 🧵👇
3
26
127
1
1
14
@KarlPertsch
Karl Pertsch
10 months
2D trajectories for task specification are more grounded than language, but easier to provide than goal images, eg by crowd workers / VLMs. + easy to relabel in hindsight + transfer nicely from human video! Very cool work @Jiayuan_Gu @xiao_ted et al!
@xiao_ted
Ted Xiao
10 months
Instead of just telling robots “what to do”, can we also guide robots by telling them “how to do” tasks? Unveiling RT-Trajectory, our new work which introduces trajectory conditioned robot policies. These coarse trajectory sketches help robots generalize to novel tasks! 🧵⬇️
3
48
253
0
2
13
@KarlPertsch
Karl Pertsch
5 years
(1/n) Check out our new work on keyframe-based video prediction for subgoal discovery! (joint work with @_oleh , in collaboration with @yjy0625 , @CSProfKGD , Joseph Lim, @KostasPenn , @drew_jaegle )
1
1
12
@KarlPertsch
Karl Pertsch
8 months
Out of the box, Octo can control multiple robots, use 3rd person + wrist cameras, language instructions & goal images. Key feature: Octo can be quickly finetuned to use new observation & action spaces! In <5 hours on a 24 GB VRAM GPU! 2/
1
1
12
@KarlPertsch
Karl Pertsch
2 years
By training on in-the-wild human videos, we can use demonstrations from *unseen* environments, e.g. 3 mins of video recorded in my kitchen substantially accelerates RL in a new robot env in our experiments.
1
4
11
@KarlPertsch
Karl Pertsch
11 months
To show that the data is useful for learning, we trained a series of large-scale policies (RT-1-X, RT-2-X) & found co-training with our data to improve performance substantially! We’re releasing model checkpoints too, check Quan’s tweets for details! 11/
@QuanVng
Quan Vuong
11 months
RT-X: generalist AI models lead to 50% improvement over RT-1 and 3x improvement over RT-2, our previous best models. 🔥🥳🧵 Project website:
7
144
619
1
2
9
@KarlPertsch
Karl Pertsch
5 years
We will present our work on keyframe-based video prediction in the workshop on Task-agnostic RL (TARL) tomorrow afternoon. If you're at ICLR, come see us at our poster! (joint work with @_oleh , @yjy0625 , @CSProfKGD , Joseph Lim, @KostasPenn , @drew_jaegle )
@KarlPertsch
Karl Pertsch
5 years
(1/n) Check out our new work on keyframe-based video prediction for subgoal discovery! (joint work with @_oleh , in collaboration with @yjy0625 , @CSProfKGD , Joseph Lim, @KostasPenn , @drew_jaegle )
1
1
12
1
6
10
@KarlPertsch
Karl Pertsch
11 months
Creating this dataset was a huge community effort (look at that author list 😀)! I led the dataset construction and had calls with countless labs & everybody was very excited to contribute data — there is a lot of momentum in the community towards sharing & reusing data 🙂 12/
1
0
8
@KarlPertsch
Karl Pertsch
4 years
@_oleh and I are presenting our work on hierarchical models for long-horizon prediction and planning at the #BIGICML workshop today, starting at 10:40 PT. Come join us to chat about predictive models and model-based RL!
@svlevine
Sergey Levine
4 years
Instead of predicting in sequence, we can predict hierarchically: midpoint b/w start&goal, midpoint between that, etc. This hierarchical approach is great for planning w/ images! @KarlPertsch , @_oleh , @febert8888 , @chelseabfinn , @dineshjayaraman
4
39
202
0
2
8
@KarlPertsch
Karl Pertsch
11 months
Here are the dataset resource links: ✅Colab (vis / download / data loaders):  ✅Overview Sheet (filtering):  All data is fully open-source under a commercially usable CC-BY 4.0 license! 10/
1
2
8
@KarlPertsch
Karl Pertsch
11 months
I’m very excited to see how the community will use this dataset! Let me know if you have any questions! 🙂 💻Project Website: 15/15
1
1
8
@KarlPertsch
Karl Pertsch
2 months
Great work! 💯 lots of room to improve on the vision side of VLMs — robotics could be a great test bed too! For VLA training (VLM+action) we found existing vision encoders need lots of fine-tuning to work well for robot control, though admittedly 🤖 eval isn’t straightforward 🥲
@sainingxie
Saining Xie
2 months
Introducing Cambrian-1, a fully open project from our group at NYU. The world doesn't need another MLLM to rival GPT-4V. Cambrian is unique as a vision-centric exploration & here's why I think it's time to shift focus from scaling LLMs to enhancing visual representations.🧵[1/n]
17
257
1K
0
1
9
@KarlPertsch
Karl Pertsch
11 months
We assembled the dataset by pooling *existing* robot datasets from our collaborators @ Google and many many academic labs (34!). In total we included 60 individual datasets with 22 different robot embodiments — many robot arms, bi-manual robots, quadrupeds, wheeled robots etc. 2/
1
2
7
@KarlPertsch
Karl Pertsch
11 months
The full dataset download is ~4.5 TB. We also provide a sheet that allows you to filter the data along many attributes, e.g. if you only want to download Franka robot data or only data with wrist cams, natural language instructions etc! Tailor the data to your use case! 9/
1
0
7
@KarlPertsch
Karl Pertsch
2 months
Check out Sidd's thread about OpenVLA and some key open questions for VLA research!
@siddkaramcheti
Siddharth Karamcheti
2 months
Thrilled to announce OpenVLA () – a vision-language-action policy for robotic control! Shout out to my co-leads @moo_jin_kim & @KarlPertsch ; see their threads for overviews of our work. Here though, I want to talk about observations & next steps! 🧵⬇️
2
12
66
0
0
8
@KarlPertsch
Karl Pertsch
2 months
How to use it? It’s all on HuggingFace — two lines to load the model, no code install needed. We also open-source our full PyTorch training code & data. Scales from fine-tuning on 1 GPU to training billion-parameter VLAs on distributed clusters! 5/
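Roughly what that looks like, as a minimal sketch: the prompt template, `unnorm_key`, and `predict_action` call follow my reading of the public README and should be treated as assumptions rather than verbatim API.

```python
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda:0")

image = Image.open("frame.png")  # current camera frame
prompt = "In: What action should the robot take to pick up the remote?\nOut:"
inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)

# predict_action is defined in the model's remote code; unnorm_key picks the
# dataset statistics used to un-normalize the predicted action (assumed name).
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
```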
1
1
8
@KarlPertsch
Karl Pertsch
8 months
This was a big team effort w/ collaborators from UC Berkeley, Stanford & CMU! I'm very grateful to all collaborators!! :) @its_dibya @HomerWalke @kvablack @oier_mees @SudeepDasari @JoeyHejna Tobias Kreiman, Charles Xu @jianlanluo You Liang Tan @DorsaSadigh @chelseabfinn @svlevine
2
0
6
@KarlPertsch
Karl Pertsch
4 years
Excited to present SPiRL in contributed talks at the Deep RL and Robot Learning workshops @NeurIPSConf ! Join us during the poster sessions to chat about all things skill learning & transfer! DRL Poster: Room F, A1 Robot Learning Poster: C3 w/ @YoungwoonLee & @JosephLim_AI
@KarlPertsch
Karl Pertsch
4 years
How can we use large offline datasets for accelerating the learning of new tasks? We can transfer skills! Check out our #CoRL2020 paper on efficient skill transfer with learned skill priors! 📄Paper: 💻Website & Code: Thread👇(1/8)
2
11
24
0
1
7
@KarlPertsch
Karl Pertsch
2 months
How does it work? We take a strong open-source VLM, Prismatic 7B, and fine-tune it to predict robot actions, using a curated dataset of 970k robot demonstrations. This recipe scales, and allows robotics to reuse pretrained models from the community (SigLIP, DinoV2, Llama2) 🚀 2/
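For intuition, here is a minimal sketch of the general VLA action-tokenization idea: discretize each continuous action dimension into bins the language model can predict as tokens. The 256-bin choice and the mapping below are illustrative assumptions; see the paper for the exact scheme.

```python
import numpy as np

N_BINS = 256  # per-dimension discretization (illustrative, RT-2-style)

def action_to_bins(action, low, high, n_bins=N_BINS):
    """Map a continuous action vector to per-dimension bin indices in [0, n_bins - 1]."""
    action = np.clip(action, low, high)
    norm = (action - low) / (high - low + 1e-8)
    return np.minimum((norm * n_bins).astype(int), n_bins - 1)

def bins_to_action(bins, low, high, n_bins=N_BINS):
    """Recover bin-center actions from predicted indices (the de-tokenization step)."""
    return low + (bins + 0.5) / n_bins * (high - low)
```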
2
0
7
@KarlPertsch
Karl Pertsch
8 months
Last but not least: Octo is your one-stop-shop for training on OpenX data! We’re releasing high-quality data loaders that work with PyTorch and JAX + a curated dataset split! 7/
@KarlPertsch
Karl Pertsch
11 months
Very excited to release the Open X-Embodiment Dataset today — the largest robot dataset to date with 1M+ trajectories! Robotics needs more data & this is a big step! There’s lots to unpack here, so let’s do a deep dive into the dataset! 🧵1/15
8
91
452
2
0
7
@KarlPertsch
Karl Pertsch
11 months
We analyzed the properties of the combined dataset! First, the number of datasets per robot embodiment: many academic labs use Franka robot arms, so we have many (smaller) Franka datasets and a long-tail of other robot embodiments! 3/
1
3
5
@KarlPertsch
Karl Pertsch
1 month
This is great work! 38 fine-tuning tasks for every eval 🤯 thanks for sharing many ablations @giffmana and team! Also confirms our finding that vision encoder fine-tuning is required for fine-grained spatial tasks like robot control! Any plans to release larger PaliGemma models? :)
@giffmana
Lucas Beyer (bl16)
1 month
✨PaliGemma report will hit arxiv tonight. We tried hard to make it interesting, and not "here model. sota results. kthxbye." So here's some of the many interesting ablations we did, check the paper tomorrow for more! 🧶
20
117
857
1
0
6
@KarlPertsch
Karl Pertsch
11 months
Using the data is easy! All data is stored in tfrecords & we made a colab for visualizing & downloading the data (w/ examples for efficient data loaders)! Each dataset stores observations/actions in its "native" format & resolution, but it's easy to align & mix them on the fly! 8/
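A minimal loading sketch with `tensorflow_datasets`; the GCS bucket path, dataset name, and version below are assumptions, and the official colab lists the exact builder directories (field names also vary per dataset).

```python
import tensorflow_datasets as tfds

builder = tfds.builder_from_directory(
    builder_dir="gs://gresearch/robotics/bridge/0.1.0"  # assumed path/version
)
ds = builder.as_dataset(split="train[:10]")

for episode in ds:
    for step in episode["steps"]:               # RLDS-style nested steps
        image = step["observation"]["image"]    # field names differ per dataset
        action = step["action"]
```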
1
0
5
@KarlPertsch
Karl Pertsch
10 months
We plan to expand the dataset over time and e.g. add more mobile manipulation and simulation data. If you have data that would be good to integrate, simulated or real, please fill out the form:
0
0
6
@KarlPertsch
Karl Pertsch
8 months
We’re fully open-sourcing model checkpoints, our pre-training and finetuning pipelines! Initially, Octo comes in two sizes: Octo-Small (27M params) and Octo-Base (93M params). All models are on HuggingFace, so loading an Octo model is as easy as this: 5/
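The snippet from the attached image is not reproduced here, but loading is along these lines; the class and method names follow my recollection of the Octo README and should be treated as assumptions.

```python
from octo.model.octo_model import OctoModel

model = OctoModel.load_pretrained("hf://rail-berkeley/octo-base")
task = model.create_tasks(texts=["pick up the spoon"])  # language-conditioned task
# actions = model.sample_actions(observations, task, rng=...)  # see the repo README
```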
1
0
5
@KarlPertsch
Karl Pertsch
3 months
Big shoutout to Xuanlin ( @XuanlinLi2 ), Kyle ( @kylehkhsu ) and Jiayuan ( @Jiayuan_Gu ) for leading this project in a UCSD x Stanford x Google collab! For more details about our approach and results, please check out Kyle’s thread below!
@kylehkhsu
Kyle Hsu
3 months
[1/14] Real robot rollouts are the gold standard for evaluating generalist manipulation policies, but is there a less painful way to get good signal for iterating on your design decisions? Let’s take a deep dive on SIMPLER 🧵👇 (or see quoted video)!
2
14
56
1
0
5
@KarlPertsch
Karl Pertsch
11 months
We're hoping to continue this momentum and keep growing the dataset 🚀! We're still figuring out the details, but if you or your lab have data you'd like to contribute feel free to shoot an email to open-x-embodiment@googlegroups.com and we will get back to you! :) 13/
1
1
4
@KarlPertsch
Karl Pertsch
4 years
Bonus: with slight tweaks to our model we can make it predict semantic bottlenecks between start and goal. In this case our model learns to predict the subgoal *and* its temporal placement, allowing for non-even splits of the long-horizon problem.
1
0
5
@KarlPertsch
Karl Pertsch
2 months
The HuggingFace integration also means that OpenVLA supports all 🤗 magic out of the box, like LoRA fine-tuning, quantized inference etc etc (see paper for detailed analysis of these)! This makes billion-param models much more accessible in robotics, just as they've become in NLP! 6/
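As one example of that 🤗 magic, 4-bit quantized loading goes through the standard transformers/bitsandbytes path; a hedged sketch (the exact configs evaluated in the paper may differ):

```python
import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

bnb_cfg = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    quantization_config=bnb_cfg,   # generic HF quantization, not an OpenVLA-specific API
    trust_remote_code=True,
)
```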
1
0
5
@KarlPertsch
Karl Pertsch
11 months
The distribution of objects is diverse & reflective of objects a robot would encounter "in the wild”, like common furniture pieces, food items, appliances etc. There is still a long way towards real world diversity, but we hope that this dataset can build a good foundation! 7/
1
0
4
@KarlPertsch
Karl Pertsch
2 months
Big shoutout to my co-leads @moo_jin_kim and @siddkaramcheti , and thanks to my advisors @chelseabfinn and @svlevine , and many others involved! Also thanks to @ToyotaResearch for providing the compute to enable this kind of open-source research! 9/9
1
0
5
@KarlPertsch
Karl Pertsch
5 months
In all seriousness though, being able to "program" *and* "debug" your robot in natural language will be tremendously useful when the job of teaching robots new skills is no longer done by machine learning experts in labs but end users in homes! Great job Lucy!! :)
1
1
5
@KarlPertsch
Karl Pertsch
2 months
Please check out Moo Jin’s thread for more details about OpenVLA — Moo Jin really carried the torch in this project, which was the first project in his PhD! Way to go Moo Jin! :)
@moo_jin_kim
Moo Jin Kim
2 months
✨ Introducing 𝐎𝐩𝐞𝐧𝐕𝐋𝐀 — an open-source vision-language-action model for robotics! 👐 - SOTA generalist policy - 7B params - outperforms Octo, RT-2-X on zero-shot evals 🦾 - trained on 970k episodes from OpenX dataset 🤖 - fully open: model/code/data all online 🤗 🧵👇
18
164
678
1
0
5
@KarlPertsch
Karl Pertsch
1 month
When collecting your fine-tuning data, start with little variation in terms of objects, positions, scenes, backgrounds, camera angles, etc. It's easier to catch bugs in your robot pipeline this way. But, for best policy generalization, collect more diverse demo data later! 3/
1
0
5
@KarlPertsch
Karl Pertsch
4 years
Jun will present our work on augmenting RL w/ motion planners at @corl_conf today. Our RL agents learn to use motion planners for solving challenging manipulation tasks w/ many obstacles! Interactive Session: today, 11.10am PST. Led jointly by Jun ( @junjungoal ) & @YoungwoonLee .
0
1
5
@KarlPertsch
Karl Pertsch
8 months
Octo is only the first step towards building generalist robot policies and we’re planning to improve the models over time — larger sizes, more robot morphologies, RL etc etc — really excited to see how folks will use Octo! :) 8/
1
0
5
@KarlPertsch
Karl Pertsch
3 years
Enjoyed reading this paper! Still lots of work to do to get sufficiently diverse task distributions in more realistic domains like robotics, but many of the ideas on ranking multi-task agents & automated task curricula seem generally applicable!
@maxjaderberg
Max Jaderberg
3 years
Very excited to release our new work: Open-Ended Learning Leads to Generally Capable Agents. tldr; algorithm that dynamically shapes task distributions to train agents on huge task space, resulting in surprisingly general behaviour Thread: (1/n)
10
216
874
0
1
5
@KarlPertsch
Karl Pertsch
4 years
Join us in today's 9am PT poster session @NeurIPSConf to chat about hierarchical planning w/ goal-conditioned prediction models!
@svlevine
Sergey Levine
4 years
Tmrw (Tue 9/8 at 9 am PT) check out HEDGE at @NeurIPSConf : hierarchical planning with learned tree-structured models enables planning complex behaviors one subgoal at a time. w/ @KarlPertsch , @_oleh , @febert8888 , @chelseabfinn , @dineshjayaraman more->
1
10
39
0
0
5
@KarlPertsch
Karl Pertsch
2 years
Joseph will talk about a lot of our skill-based learning works in the PRL workshop @corl_conf today! Starting at 11.30am NZ time (in ~30 mins) — join with this zoom link:
@JosephLim_AI
Joseph Lim
2 years
On my way to #CoRL2022 ! Poke me if you want to chat :) Also, drop by my talks today if you are interested in Skill-based Robot Learning!
2
1
21
0
0
5
@KarlPertsch
Karl Pertsch
4 years
We are presenting our work on keyframe-based prediction and planning at #L4DC today! We also made a video, so it will take you only 5 minutes to get the gist of the paper!👇 Website: Paper: Video:
1
0
5
@KarlPertsch
Karl Pertsch
1 month
Most importantly: 99% of applications will require *fine-tuning*, i.e. collecting a small dataset of <100 robot demos in your target domain & fine-tuning OpenVLA on it. Why? OpenVLA needs to learn your robot's action space, camera setup etc. More on 0-shot usage at the end! 2/
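A hedged sketch of what LoRA fine-tuning can look like with the generic PEFT API; the released repo ships its own fine-tuning script, and the rank, alpha, and target module names below are illustrative assumptions about the Llama-2 backbone.

```python
import torch
from transformers import AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b", torch_dtype=torch.bfloat16, trust_remote_code=True
)
lora_cfg = LoraConfig(
    r=32, lora_alpha=16, lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed module suffixes
)
vla = get_peft_model(vla, lora_cfg)
vla.print_trainable_parameters()
# ...then train with a standard next-token-prediction loss over your demo dataset.
```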
1
0
4
@KarlPertsch
Karl Pertsch
1 month
Final thought re 0-shot: we want models that can control your robot out-of-the-box, just like downloading an LLM checkpoint. Current limitation: models are not conditioned on robot URDF / action space definition etc, so they need to learn it implicitly via fine-tuning. 9/
1
0
4
@KarlPertsch
Karl Pertsch
8 months
@yuramel2000 @berkeley_ai You can run inference comfortably on standard consumer GPUs. A 4090 will run our biggest model at ~13 it/s, a 2080 Ti still at ~5 Hz I believe! The smaller model is even faster if you need a high control frequency.
0
0
3
@KarlPertsch
Karl Pertsch
11 months
Many authors were involved in this project! Special thanks to  @QuanVng for leading the overall project and managing everything masterfully! Tagging a few more co-authors @pannag_ @hausman_k @chelseabfinn @svlevine 14/
2
0
3
@KarlPertsch
Karl Pertsch
2 months
For more details about OpenVLA, please check out our paper, example scripts and codebase. And give the model a try :) Paper: Website & Code:
1
0
4
@KarlPertsch
Karl Pertsch
3 months
If you’re working on real robot foundation models, give SIMPLER a spin! Real robot datasets (OpenX): SIMPLER Paper: SIMPLER Website: 6/
1
0
4
@KarlPertsch
Karl Pertsch
11 months
Finally, we analyzed the distribution of skills & objects in the dataset based on the language annotations (~60% of the datasets have language instructions). While common pick-place tasks are most frequent, there is a long tail of interesting skills like wiping, assembling etc 6/
1
1
3
@KarlPertsch
Karl Pertsch
3 months
@chris_j_paxton Not sure these scaling laws are meaningful since they aggregate across very different tasks. "Curve goes up" is also not surprising since they filtered the raw data for "curve goes up". We need scaling laws, but a meta-study is likely not the way to do it w/o agreed-upon benchmarks?
1
0
4
@KarlPertsch
Karl Pertsch
2 months
How does it perform? Better than any previous generalist robot policy: we test OpenVLA for controlling multiple robots “out-of-the-box” & outperform our own Octo model across the board. We also match or outperform RT-2-X, a 55B closed VLA — the key: a larger robot dataset. 3/
1
0
4
@KarlPertsch
Karl Pertsch
7 months
@ericjang11 Great video! Specializing the generalist models to shorten finetuning + data iteration cycles is clever! One Q: is the data that finetunes the specialized model well also sufficient to upstream the skill into the generalist? Do you need to collect more of the same "type" of data?
0
0
4
@KarlPertsch
Karl Pertsch
1 month
Once trained, you can load your fine-tuned OpenVLA model for inference easily via 🤗 AutoModel. NVIDIA 4090s offer the best inference speed / $$$. We also provide code for serving fine-tuned OpenVLA models remotely & query via API, if your fastest GPU isn't near the robot! 6/
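Purely illustrative client sketch for the remote-serving setup; the route, port, and payload format here are hypothetical placeholders, not the repo's actual serving API.

```python
import numpy as np
import requests

payload = {
    "image": np.zeros((224, 224, 3), dtype=np.uint8).tolist(),  # placeholder camera frame
    "instruction": "put the carrot in the bowl",
}
resp = requests.post("http://<server-ip>:8000/act", json=payload, timeout=5.0)  # hypothetical route
action = np.array(resp.json()["action"])
```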
1
0
3
@KarlPertsch
Karl Pertsch
8 months
If we want to build truly “foundational” models for robotics we need to support the diversity of real robot setups! Despite the added flexibility, we find Octo's performance to be strong compared to RT-1X and even RT-2X + great during finetuning! 3/
1
0
3
@KarlPertsch
Karl Pertsch
11 months
@jeasinema Agreed! While the initial models we trained are 2D-only, the dataset actually has a lot of multi-view / depth cam data, so lots of opportunity for better policies! No tactile data yet, but would be great to include going forward! Maybe @LerrelPinto can help with that? 😉
2
0
3
@KarlPertsch
Karl Pertsch
11 months
👆We also estimated the # of visually distinct scenes per dataset & find that this metric is well distributed across robots, w/ many embodiments contributing a significant fraction of the scene diversity. Ultimately, scene diversity may be more important than trajectory count. 5/
1
1
2
@KarlPertsch
Karl Pertsch
3 months
Main technical delta vs December-Octo is that we use GPT to generate paraphrases of all language instructions in our OpenX data mix for better language grounding (thanks @oier_mees for leading implementation of this feature!) We also show Octo finetuning to bimanual ALOHA now!
1
0
2
@KarlPertsch
Karl Pertsch
3 months
@SOTA_kke @XuanlinLi2 @kylehkhsu @Jiayuan_Gu If there was bad correlation, it would likely be the other way around: the policy works in real but doesn't in SIMPLER bc of real-to-sim gap. SIMPLER worked well for the policies we tested, but may not always hold eg for policies that are sensitive to visual imperfections.
1
0
3
@KarlPertsch
Karl Pertsch
3 months
@RemiCadene 💯 -- Thanks for the shoutout Remi! Small correction: we don't need to modify the policy at all, we're using open-source checkpoints in our exp, some straight from HuggingFace 😉. We simply make sure that the sim is realistic enough so policies trained on real data work in it!
1
0
2
@KarlPertsch
Karl Pertsch
1 year
And @ZoeyC17 and @Vikashplus 's GenAug
@Vikashplus
Vikash Kumar
2 years
Lack of scale & diversity in robot datasets is demanding a change towards scalable alternatives- LLMs, Sim2Real, etc. GenAug presents a powerful recipe for using text2image generative models to demonstrate widespread generalization of robot behaviors to novel scenarios. 🧵(1/N)
2
45
195
1
0
3
@KarlPertsch
Karl Pertsch
1 month
So lots of potential for making these models easier to use! Thanks for sticking around! Check out the OpenVLA announcement below if you missed it & give the model a try! Huge thanks again to my co-leads @moo_jin_kim and @siddkaramcheti ❤️
@KarlPertsch
Karl Pertsch
2 months
Very excited to release OpenVLA today, a 7B parameter open-source vision-language-action model (VLA). 🦾 SoTA generalist policy (better than Octo & RT-2-X) ⚡️ Easy to run & fine-tune on 1 GPU with quantization and LoRA 💻 Open-source PyTorch codebase 🤗 Models on HuggingFace 1/
3
62
375
0
0
3
@KarlPertsch
Karl Pertsch
8 months
@GlenBerseth Yup! We use 128 TPUv4 chips for ~14 hours to pre-train Octo-Base, so depending on how many A100s you can use, it may take a bit longer, but it should definitely be possible.
0
0
1
@KarlPertsch
Karl Pertsch
3 months
@RemiCadene Key is that we only modify *the sim environment* to mitigate control&visual gap, the policy is 1:1 the same we are using on the real robot & was trained long before this project even started :) So in some sense this sim eval is "for free", but requires care in setting up the sim
1
0
3
@KarlPertsch
Karl Pertsch
1 month
Before you roll out OpenVLA on your robot, it's very helpful to feed in a few training images & make sure the predicted actions match the training data. This helps to catch silent bugs in your inference pipeline, one of the most common sources of error! 7/
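A minimal sketch of that sanity check; the sample schema and the `model_predict` callable are assumptions about your own pipeline.

```python
import numpy as np

def sanity_check(model_predict, train_samples, tol=0.05):
    """Compare predicted vs. recorded actions on a few training frames.

    model_predict: callable(image, instruction) -> action vector (your inference fn)
    train_samples: iterable of dicts with "image", "instruction", "action" (assumed schema)
    """
    errors = []
    for s in train_samples:
        pred = np.asarray(model_predict(s["image"], s["instruction"]))
        errors.append(np.abs(pred - np.asarray(s["action"])))
    mean_err = np.mean(errors, axis=0)
    print("mean |pred - gt| per action dim:", mean_err)
    # Large errors usually point to image preprocessing, action normalization, or
    # prompt-format bugs in the inference pipeline rather than a bad checkpoint.
    return bool(np.all(mean_err < tol))
```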
1
0
2
@KarlPertsch
Karl Pertsch
3 months
If you're interested in all the Octo details, see the release thread below. All models and the new OpenX language paraphrases are on HuggingFace :)
@KarlPertsch
Karl Pertsch
8 months
3 mo. ago we released the Open X-Embodiment dataset; today we're taking the next step: introducing Octo 🐙, a generalist robot policy trained on 800k robot trajectories, stronger than RT-1X, with flexible observation + action spaces, fully open source! 💻: /🧵
10
90
373
0
0
2
@KarlPertsch
Karl Pertsch
8 months
We’re releasing a tech report with lots of details on what worked and, importantly, what didn’t -- go check it out! 📜: 6/
1
0
2