What state representation should robots have? 🤖 I’m thrilled to present an Any-point Trajectory Model (ATM), which models physical motions from videos without additional assumptions and shows significant positive transfer from cross-embodiment human and robot videos! 🧵👇
Excited to share my first paper at UC Berkeley! We identify key bottlenecks in learning from a pre-trained visual representation and show generalization to novel objects from only three instances.🧵⬇️
Website:
Excited to share our
#CoRL2020
paper, where we present SoftGym, the first benchmark for deformable object manipulation, and show that these tasks pose great challenges for RL.
Project page:
w/
@YufeiWang15
Jake Olkin,
@davheld
.
@corl_conf
For smoothing crumpled cloths, we found that a mesh-based graph neural network achieves better performance, generalizes to novel shapes and materials, and transfers easily to the real world!
To appear at
#CoRL2021
@CMU_Robotics
Website:
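For a concrete picture of the mesh-based GNN mentioned above, here is a minimal sketch of one round of edge-to-node message passing over a cloth mesh in PyTorch. The feature sizes (NODE_DIM, EDGE_DIM, HIDDEN) and layer choices are illustrative stand-ins, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical feature sizes; the paper's actual dimensions differ.
NODE_DIM, EDGE_DIM, HIDDEN = 32, 16, 64

class MeshMessagePassing(nn.Module):
    """One round of edge-to-node message passing over a cloth mesh."""
    def __init__(self):
        super().__init__()
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * NODE_DIM + EDGE_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN))
        self.node_mlp = nn.Sequential(
            nn.Linear(NODE_DIM + HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, NODE_DIM))

    def forward(self, nodes, edges, edge_index):
        # nodes: (N, NODE_DIM), edges: (E, EDGE_DIM), edge_index: (2, E) long
        src, dst = edge_index
        msg = self.edge_mlp(torch.cat([nodes[src], nodes[dst], edges], dim=-1))
        # Sum incoming messages at each destination node.
        agg = torch.zeros(nodes.size(0), HIDDEN).index_add_(0, dst, msg)
        # Residual node update.
        return nodes + self.node_mlp(torch.cat([nodes, agg], dim=-1))
```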
Object-centric representations and hierarchical reasoning are key to generalization. How can we manipulate deformables, where “objectness” changes over time? Our method finds a way and solves challenging real-world dough manipulation tasks!
#CoRL2022
Robotic manipulation of deformable objects like dough requires long-horizon reasoning over the use of different tools. Our method DiffSkill utilizes a differentiable simulator to learn and compose skills for these challenging tasks.
#ICLR2022
Website:
I’ll be in Atlanta for
#CoRL2023
and will present our recent works on SpawnNet (
#NeuRL4RM
workshop) and GELLO (
#TGR
) tomorrow.
Also excited to share that I’m on the job market, looking for tenure-track positions in AI and robotics. Would love to chat about potential fit!
2/2 papers accepted at
#RSS2024
🥳. Huge congratulations to my incredible collaborators!
Check out our work on trajectory modeling from videos:
and humanoid benchmark for whole-body control:
Predicting future point trajectories can serve as a "language" for actions, bridging different embodiments. It's exciting to see new work advancing in this direction!
If you are interested, also check out our prior ATM paper:
Track2Act: Our latest on training goal-conditioned policies for diverse manipulation in the real world. We train a model for embodiment-agnostic point track prediction from web videos, combined with embodiment-specific residual policy learning.
1/n
How should we combine multiple auxiliary tasks to accelerate RL? Check out our
#NeurIPS2019
paper that provides a principled method in this direction:
Paper:
Code:
@davheld
@HarjatinS
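As a rough illustration of what a principled auxiliary-task combination can look like, here is a sketch that downweights auxiliary gradients that disagree with the main RL gradient, measured by cosine similarity. This shows the flavor of gradient-based weighting; it is not a line-by-line reproduction of the paper's algorithm.

```python
import torch

def combined_grad(model, main_loss, aux_losses):
    """Weight each auxiliary gradient by its (clamped) cosine similarity
    with the main RL gradient, so conflicting tasks are downweighted."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_main = torch.autograd.grad(main_loss, params, retain_graph=True)
    flat_main = torch.cat([g.flatten() for g in g_main])
    total = [g.clone() for g in g_main]
    for aux_loss in aux_losses:
        g_aux = torch.autograd.grad(aux_loss, params, retain_graph=True)
        flat_aux = torch.cat([g.flatten() for g in g_aux])
        w = torch.clamp(
            torch.cosine_similarity(flat_main, flat_aux, dim=0), min=0.0)
        total = [t + w * g for t, g in zip(total, g_aux)]
    return total  # assign to p.grad for each param, then optimizer.step()
```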
Humanoids 🤖 will do anything humans can do. But are state-of-the-art algorithms up to the challenge?
Introducing HumanoidBench, the first-of-its-kind simulated humanoid benchmark with 27 distinct whole-body tasks requiring intricate long-horizon planning and coordination.
🧵👇
We have released the hardware files and instructions for GELLO 🦾! See the updated website. Here's a video of me assembling it start to finish in 30 min! GELLO is also at CoRL this week; see 👇 for details
🎉Excited to share a fun little hardware project we've been working on. GELLO is an intuitive and low-cost teleoperation device for robot arms that costs less than $300. We've seen the importance of data quality in imitation learning. Our goal is to make this more accessible.
1/n
4/5 Our work is enabled by recent advances in video tracking. We build on top of the great work on CoTracker (
@n_karaev
@chrirupp
) and Tracking-Any-Point by
@CarlDoersch
et al.
Self-occlusion is a challenging problem in cloth manipulation. Come and check out our recent paper, to be presented at
#RSS2022
tomorrow. Led by the wonderful
@ZixuanHuang15
How can we enable a robot to explicitly reason about occlusions for better cloth manipulation? Check out our
#RSS2022
paper, which proposes a self-supervised test-time finetuning method for reconstructing crumpled cloths.
w/
@davheld
,
@xingyu2017
Website:
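To give a flavor of test-time finetuning, here is a minimal sketch: a reconstruction network is adapted at test time with a self-supervised one-sided Chamfer loss, so its predicted cloth points stay consistent with the observed, self-occluded point cloud. The actual losses and model in the paper differ; `model` here is a hypothetical point-cloud reconstruction network.

```python
import torch

def one_sided_chamfer(observed, predicted):
    # Every observed point should be explained by some predicted point.
    d = torch.cdist(observed, predicted)  # (N_obs, N_pred) pairwise distances
    return d.min(dim=1).values.mean()

def test_time_finetune(model, observed_pts, steps=50, lr=1e-4):
    """Adapt a reconstruction network to one observed partial point cloud."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        pred_pts = model(observed_pts)  # hypothetical full-cloth prediction
        loss = one_sided_chamfer(observed_pts, pred_pts)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```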
1/5 Our goal is to improve policy learning from video data, a rich and scalable source. Since videos lack explicit actions, we focus on learning to predict the future trajectories of any set of particles based on their initial 2D positions, circumventing the need for actions.
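A minimal sketch of the interface this implies: given a frame and a set of initial 2D query points, predict each point's future track. The toy image encoder, layer sizes, and horizon below are stand-ins, not the actual ATM architecture.

```python
import torch
import torch.nn as nn

class AnyPointTrajectoryModel(nn.Module):
    """Toy stand-in: predict future 2D tracks for query points in a frame."""
    def __init__(self, horizon=16, d_model=128):
        super().__init__()
        self.horizon = horizon
        self.point_enc = nn.Linear(2, d_model)            # encode (x, y)
        self.frame_enc = nn.Linear(3 * 64 * 64, d_model)  # toy image encoder
        self.head = nn.Linear(2 * d_model, horizon * 2)

    def forward(self, frame, points):
        # frame: (B, 3, 64, 64); points: (B, N, 2) initial 2D positions
        f = self.frame_enc(frame.flatten(1)).unsqueeze(1)  # (B, 1, D)
        p = self.point_enc(points)                         # (B, N, D)
        h = torch.cat([p, f.expand(-1, points.size(1), -1)], dim=-1)
        # -> (B, N, horizon, 2): predicted future 2D position per point
        return self.head(h).view(*points.shape[:2], self.horizon, 2)
```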
Generating diverse tasks/scenes is always a time-consuming part of building simulation environments. Very excited to see generative models being used to scale up the diversity in simulation!
Can GPTs generate infinite and diverse data for robotics?
Introducing RoboGen, a generative robotic agent that keeps proposing new tasks, creating corresponding environments and acquiring novel skills autonomously!
code:
👇🧵
(better with audio)
3/5 By modeling the low-level particle trajectories, we find significant positive transfer from videos of humans or from a different robot! Our current model is trained on relatively in-domain videos. Stay tuned for developments on a more generalized model!
The Internet is too fast, I'm still crafting my catchy tweets, and word is already out😂 Well then, now you have it:
RoboNinja🥷: Learning an Adaptive Cutting Policy for Multi-Material Objects
🧵👇 for a few interesting details you might have missed
2/5 Once the trajectory model is trained, we learn trajectory-guided policies. We simply look at the trajectories of points from a fixed grid. We do not assume any calibration, and our model utilizes cameras from different viewpoints.
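A rough sketch of what a trajectory-guided policy can look like, assuming a frozen track predictor like the toy one above: query a fixed grid of points, predict their tracks, and feed the tracks to a small action head. For brevity the policy here consumes only the tracks; all shapes and names are illustrative.

```python
import torch
import torch.nn as nn

def fixed_grid(n=8):
    """An n x n grid of query points in normalized [0, 1] image coordinates."""
    xs = torch.linspace(0, 1, n)
    return torch.stack(torch.meshgrid(xs, xs, indexing="ij"), dim=-1).view(-1, 2)

class TrajectoryGuidedPolicy(nn.Module):
    def __init__(self, traj_model, horizon=16, act_dim=7):
        super().__init__()
        self.traj_model = traj_model.eval()  # frozen track predictor
        self.grid = fixed_grid()
        in_dim = self.grid.size(0) * horizon * 2
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, act_dim))

    def forward(self, frame):
        with torch.no_grad():  # tracks are an input, not trained here
            pts = self.grid.unsqueeze(0).expand(frame.size(0), -1, -1)
            tracks = self.traj_model(frame, pts)  # (B, N, horizon, 2)
        return self.mlp(tracks.flatten(1))        # predicted action
```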
Visual pre-training on internet-scale vision datasets has the potential to enable generalizable manipulation. But tasks in prior works have limited variations. A recent paper from
@ncklashansen
shows that simple data augmentation is competitive with SOTA visual pre-training.
The difficulty of a manipulation task is largely defined by the range of task variations the robot needs to handle, such as object geometries and poses. Glad to see new robot demos with more diverse objects!
1X’s mission is to create an abundant supply of physical labor through androids that work alongside humans. We're excited to share our latest progress on teaching EVEs general-purpose skills. The following is all autonomous, all 1X speed, all controlled with a single set of
@ncklashansen
In this project, we build a set of challenging tasks where policies are trained on a few instances and evaluated on held-out, novel objects from the same category.
Check out our work on learning closed-loop dough manipulation: we use a differentiable reset module to avoid local optima from gradient-based trajectory optimization!
#RAL2022
#IROS2022
w/ Xingyu Lin
@Xingyu2017
, and David Held
@davheld
.
@ncklashansen
We propose a novel architecture for learning from pre-trained networks that addresses the key bottleneck of a frozen pre-trained representation. Our method is very simple but shows significant and consistent improvements over prior works!
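One common way past a frozen representation's bottleneck, sketched below, is to tap the backbone's intermediate features with small trainable adapters. This illustrates the general idea under assumed sizes and a stock ResNet-18; it is not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torchvision

class FrozenBackboneWithAdapters(nn.Module):
    """Tap intermediate features of a frozen ResNet with trainable adapters."""
    def __init__(self, out_dim=256):
        super().__init__()
        self.backbone = torchvision.models.resnet18(weights=None).eval()
        for p in self.backbone.parameters():
            p.requires_grad = False  # the pre-trained weights stay frozen
        # One small trainable adapter per tapped stage (resnet18 channels).
        self.adapters = nn.ModuleList(
            nn.Conv2d(c, 32, kernel_size=1) for c in (64, 128, 256, 512))
        self.head = nn.Linear(4 * 32, out_dim)

    def forward(self, x):
        b = self.backbone
        with torch.no_grad():  # frozen forward pass, keep stage outputs
            x = b.maxpool(b.relu(b.bn1(b.conv1(x))))
            feats = []
            for stage in (b.layer1, b.layer2, b.layer3, b.layer4):
                x = stage(x)
                feats.append(x)
        # Trainable adapters read the frozen features.
        pooled = [a(f).mean(dim=(2, 3)) for a, f in zip(self.adapters, feats)]
        return self.head(torch.cat(pooled, dim=-1))
```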
1/7 Given some skill demonstration trajectories (obtained by differentiable trajectory optimization), we first learn goal-conditioned policies from these trajectories via BC+HER. Each skill uses one tool to manipulate the dough and needs to be chained to solve multi-stage tasks.
2/7 To solve long-horizon tasks, we reason over a spatial and temporal abstraction. We obtain a spatial abstraction by clustering the points into different components based on their proximity in space. We encode each component into a latent representation to obtain a latent set.
3/7 This latent set representation has two benefits: first, it effectively models changes in the number of components during an episode, e.g. a piece of dough being cut into two; second, it enables compositional generalization to more components at test time.
4/7 To chain skills, we also learn temporal abstraction modules: one feasibility predictor per skill (predicting the likelihood of reaching one state from another using the learned skill) and a cost function. Both modules take the set representations of the observation and the goal.
5/7 Given an observed and a target point cloud, we encode them into latent sets and then plan latent subgoals by optimizing a combination of the feasibility scores and the cost to chain the skills.
The resulting method is named PASTA: PlAnning with Spatial-Temporal Abstraction. It can reason over long-horizon tasks, and we transfer the planner to the real world without any fine-tuning.
7/7 Our method outperforms all the baselines in simulation and excels at tasks (CutRearrangeSpread, CRS-Twice) with more components and planning steps at test time.
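A minimal sketch of the planning step described in 4/7 and 5/7, using random shooting for clarity: sample skill sequences and latent subgoals, score them by log-feasibility minus cost, and keep the best plan. `feasibility` and `cost` stand in for the learned modules, and the paper's optimizer and exact objective may differ.

```python
import torch

def plan_subgoals(z0, z_goal, feasibility, cost, n_skills,
                  n_steps=3, n_samples=1024, z_dim=32):
    """Random-shooting search over skill sequences and latent subgoals."""
    best_score, best_plan = -float("inf"), None
    for _ in range(n_samples):
        skills = torch.randint(n_skills, (n_steps,))
        # Intermediate subgoals are sampled latents; the last one is the goal.
        subgoals = [torch.randn(z_dim) for _ in range(n_steps - 1)] + [z_goal]
        score, z = 0.0, z0
        for k, z_next in zip(skills, subgoals):
            # Reward feasible transitions, penalize cost toward the goal.
            score = score + torch.log(feasibility(k, z, z_next)) \
                          - cost(z_next, z_goal)
            z = z_next
        if score > best_score:
            best_score, best_plan = score, (skills, subgoals)
    return best_plan
```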
@servo_boyd
You are right. Given the limited video data we are training on, I do not expect the model to generalize across large viewpoint variation. Minor camera jittering might be fine.
@m0hitsharma
Hey Mohit, thanks for the pointer! Sim2real transfer is another benefit of using pre-trained networks, while our paper focuses more on categorical generalization.
1/5 It is not easy to manually define skills such as spreading or gathering dough using tools. As such, we run gradient-based trajectory optimization in a differentiable simulator to solve for trajectories that can reach short-horizon goals.
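A minimal sketch of this step: roll an action sequence through a differentiable simulator, backpropagate a goal loss to the actions, and take gradient steps. `sim_step` and `goal_loss` are hypothetical stand-ins for the actual differentiable physics and objective used in the paper.

```python
import torch

def optimize_trajectory(sim_step, goal_loss, init_state,
                        horizon=50, act_dim=6, iters=200, lr=1e-2):
    """Solve for an action sequence by backpropagating through the simulator."""
    actions = torch.zeros(horizon, act_dim, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(iters):
        state = init_state
        for t in range(horizon):
            state = sim_step(state, actions[t])  # differentiable dynamics
        loss = goal_loss(state)  # e.g., distance to the short-horizon goal
        opt.zero_grad()
        loss.backward()
        opt.step()
    return actions.detach()
```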