Prev PI of a Robot Learning Lab in London.
Postdoc
@UCBerkeley
w/
@pabbeel
.
PhD Imperial College London w/
@AjdDavison
. AI, Robotics, Machine Learning 🤖
Dyson Robot Learning Lab is Hiring full-timers and interns! 🤖
1x Research Scientist:
1x Data Engineer (Data Collection & ML Training):
3x PhD Internship:
Come join our lab of 12; located in London, UK 🇬🇧
We have a Fall PhD Internship opening at the Dyson Robot Learning Lab in London! 🇬🇧
Come and join our world-class team of robot learning gurus for 3-6 months!
Apply via form:
New paper! Not all reinforcement learning problems are suited for a Gaussian policy parameterization!🤯
Plan to use 3D rotation or 6D pose as part of an action space? Consider the Bingham distribution! 🧵1/5
Paper:
Code:
w/
@pabbeel
Following the success of Masked AutoEncoders (MAE), we all knew it was coming.... VideoMAE. The death of contrastive learning for video representation learning? I think so.
Props to the rapid pace of the authors.👏 MAE has only been out for ~3 months🤯
A year ago, I thought it obvious that meta-RL had a role to play in robotics... Now I’m no longer convinced!
Our large-scale study shows that multi-task pretraining followed by fine-tuning on novel tasks performs >= meta-RL!
Lead:
@ZhaoMandi
w/
@pabbeel
Sim-to-Real via Sim-to-Sim: we learn a generator that translates real-world images to a canonical simulation version to learn robot grasping with no real-world data!
w/ collaborators from
@GoogleAI
,
@Theteamatx
,
@DeepMindAI
We are happy to announce PyRep - a toolkit for rapid robot learning research! PyRep is a modification of V-REP that is ~10,000x faster than the previous remote client approach! Work with
@coppeliaRobotic
&
@AjdDavison
.
Report:
Code:
Coarse-to-fine Q-attention has been selected for an oral at
#CVPR2022
!🙀
The first *general* 6DoF RL-based manipulation algorithm that can train in the real world in minutes (not days/months). High-res images, and no shaped rewards!
🚨Important update from our Robot Learning Lab in London.
Following recent news, we’re moving on after a wonderful 2 years…
Today, we unveil 4 big pieces of research from our incredible team. Check out the compilation video and thread below to see our final work! 📽️👇
Render and Diffuse (R&D) uses a joint observation-action representation to learn low-level robot actions, using a learnt diffusion process that iteratively updates virtual renders of robot actions, leading to big sample-efficiency gains 💪
#RSS2024
Dyson RLL
Our new Sim2Real 6D Object Pose Estimation work!
1) Train pose estimation model on sim data.
2) Use model to generate poses on unlabelled real data.
3) Auto filter generated poses and update model.
4) Repeat 2-4.
Result: SOTA performance + robot demo! 🤖
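For anyone curious what that loop looks like in code, here's a minimal self-training sketch. All names (the model interface, datasets, confidence-based filtering) are hypothetical stand-ins, not our actual implementation:

```python
# Minimal sketch of the sim2real self-training loop above (steps 1-4).
# `model`, `sim_dataset`, and `real_images` are hypothetical stand-ins, and
# confidence thresholding is only one possible "auto filter" criterion.
import numpy as np

def self_training_loop(model, sim_dataset, real_images, rounds=3, conf_thresh=0.9):
    # 1) Train the pose estimation model on labelled simulation data.
    model.fit(sim_dataset.images, sim_dataset.poses)

    for _ in range(rounds):
        # 2) Use the current model to generate pseudo-labels on unlabelled real data.
        pseudo_poses, confidences = model.predict_with_confidence(real_images)

        # 3) Automatically filter out low-confidence pseudo-labels...
        keep = confidences > conf_thresh
        kept_images, kept_poses = real_images[keep], pseudo_poses[keep]

        # ...and update the model on sim data plus the kept real pseudo-labels.
        images = np.concatenate([sim_dataset.images, kept_images])
        poses = np.concatenate([sim_dataset.poses, kept_poses])
        model.fit(images, poses)
        # 4) Repeat.
    return model
```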
Hierarchical diffusion policy is another step along the journey of making hierarchical next-best pose agents more capable, through the introduction of a kinematically aware low-level diffusion planner.🤖
New work from the Dyson Robot Learning Lab.
CVPR 2024
Announcing the 1st "Workshop on Pre-training Robot Learning" at
@corl_conf
, Dec 15.
Fantastic lineup of speakers: Jitendra Malik, Chelsea Finn, Joseph Lim, Kristen Grauman, Abhinav Gupta, Raia Hadsell.
Submit your 4-page extended abstract by September 28.
New work! We envision coarse-to-fine Q-attention as a tree that can be expanded and used to accumulate value estimates across the top-k voxels at each Q-attention depth.
Allows for more robust sparse-reward, vision-based manipulation! 🧵👇
w/
@pabbeel
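Rough sketch of the tree expansion, with a stubbed Q-network standing in for the learned one (the real agent operates on voxelised point clouds, so treat this purely as an illustration of top-k expansion + value accumulation):

```python
# Simplified sketch of coarse-to-fine Q-attention as a tree: at each depth we
# voxelise the current region, score voxels with a (stubbed) Q-network, keep the
# top-k voxels, and recurse into each at a finer resolution while accumulating
# value estimates along the branch.
import numpy as np

def q_network(bounds, depth, grid=8):
    """Stand-in for a learned Q-network: returns a Q-value per voxel."""
    return np.random.rand(grid, grid, grid)

def expand(bounds, depth, max_depth, k=2, value=0.0):
    if depth == max_depth:
        return [(bounds, value)]
    q = q_network(bounds, depth)
    leaves = []
    top = np.argsort(q.ravel())[-k:]          # indices of the top-k voxels
    for idx in top:
        i, j, m = np.unravel_index(idx, q.shape)
        lo, hi = bounds
        size = (hi - lo) / q.shape[0]
        child_lo = lo + size * np.array([i, j, m])
        child = (child_lo, child_lo + size)
        # Accumulate the value estimate along this branch and recurse.
        leaves += expand(child, depth + 1, max_depth, k, value + q[i, j, m])
    return leaves

workspace = (np.zeros(3), np.ones(3))                 # 1 m^3 workspace
candidates = expand(workspace, 0, max_depth=3)
best_region, best_value = max(candidates, key=lambda x: x[1])
```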
2 life updates!
(1) Yesterday I passed my PhD viva! Thank you to my examiners Jens Kober and
@Petar_Kormushev
, and of course, my supervisor
@AjdDavison
!
(2) In June I begin a postdoc at
@pabbeel
's group at UC Berkeley!
Looking forward to continuing making robots see and do! 🤖
We are thrilled to announce RLBench: an ambitious large-scale benchmark and learning environment for vision-guided manipulation with 100 unique, hand-designed tasks!
Paper:
Video:
w/
@StephenLJames
, Z. Ma, D. Arrojo,
@AjdDavison
'basketball_in_hoop'; one of many new tasks joining the
#RLBench
family of 100+ tasks in V1.2. Coming early November! 🤖
RLBench is still the hardest manipulation sim-benchmark to date due to its large-scale focus on vision, sparse rewards, and multi-stage tasks.
New! Learned Path Ranking (LPR) takes a Q-attention next-best pose, and learns to rank a set of goal-reaching paths generated by path planning, Bezier curve sampling, and a learned policy.
We can now accomplish more RLBench and real-world tasks!
w/
@pabbeel
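For flavour, here's a minimal sketch of one of those path generators: cubic Bézier sampling between current and goal end-effector positions (positions only, with random control points; the real LPR candidates also handle orientation and come from several generators):

```python
# Minimal sketch of Bezier-curve path sampling between a start and goal position.
# Random control points give a diverse set of candidate paths that a ranking
# model (or any scoring function) could then choose between.
import numpy as np

def sample_bezier_path(start, goal, n_points=50, spread=0.3, rng=np.random):
    # Two random control points around the straight line from start to goal.
    c1 = start + (goal - start) / 3 + rng.uniform(-spread, spread, size=3)
    c2 = start + 2 * (goal - start) / 3 + rng.uniform(-spread, spread, size=3)
    t = np.linspace(0.0, 1.0, n_points)[:, None]
    # Cubic Bezier: B(t) = (1-t)^3 P0 + 3(1-t)^2 t C1 + 3(1-t) t^2 C2 + t^3 P3
    return ((1 - t) ** 3 * start + 3 * (1 - t) ** 2 * t * c1
            + 3 * (1 - t) * t ** 2 * c2 + t ** 3 * goal)

start, goal = np.array([0.3, 0.0, 0.2]), np.array([0.5, 0.2, 0.4])
candidate_paths = [sample_bezier_path(start, goal) for _ in range(16)]
```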
New work! We generate 3D shape + segmentation from depth, probabilistically sampling occluded region proposals, giving robots the means to intuitively reason about partially observed scenes and allowing grasps without causing stacks to fall! Lead:
@LandgrafZoe
Attention-driven Robot Manipulation (ARM)🦾 is our new learning algorithm that can do RLBench tasks where other baselines fail. The secret sauce is Q-attention🔎, which crops around interesting pixels before passing the crop to an actor-critic method. w/
@AjdDavison
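The crop step itself is simple: take the argmax of a per-pixel Q-map and cut a window around it. Toy sketch below, with random arrays standing in for the observation and the learned Q-map:

```python
# Toy sketch of the Q-attention "crop around interesting pixels" step:
# pick the pixel with the highest Q-value and cut a window around it.
import numpy as np

def q_attention_crop(image, q_map, crop=16):
    h, w = q_map.shape
    y, x = np.unravel_index(np.argmax(q_map), q_map.shape)   # most "interesting" pixel
    half = crop // 2
    y0, x0 = np.clip(y - half, 0, h - crop), np.clip(x - half, 0, w - crop)
    return image[y0:y0 + crop, x0:x0 + crop]

image = np.random.rand(128, 128, 3)    # stand-in RGB observation
q_map = np.random.rand(128, 128)       # stand-in for a learned per-pixel Q-map
crop = q_attention_crop(image, q_map)  # this crop is what the actor-critic sees
```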
High-level agent predicts next-best pose (e.g. C2F-ARM/PerAct/etc), which is then fed into a kinematically aware diffusion policy.
The advantage of this over C2F/PerAct is that it doesn't rely on motion planning; here we are essentially "learning" the motion planning component.
#CVPR2024
Hierarchical diffusion policy is another step along the journey of making hierarchical next-best pose agents more capable, through the introduction of a kinematically aware low-level diffusion planner.🤖
New work from the Dyson Robot Learning Lab.
CVPR 2024
🚨End-effector Redundancy (ER) Action Space for Robot Manipulation!
It has the *sample efficiency* of task-space (end-effector) control, but the versatility of joint control!
Now one of our de-facto action modes we use in the Dyson Robot Learning Lab!
Novel action spaces leveraging redundancy in 7 DoF arms enable efficient & precise learning in robotic manipulation 🤖 Current action spaces often fall short in human environments, where solving complex tasks requires avoiding obstacles & reaching confined spaces. 1/n
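For anyone unfamiliar with arm redundancy: the textbook way to expose it is to track an end-effector twist while projecting a secondary joint motion into the Jacobian's nullspace. Minimal sketch of that idea below (this illustrates the underlying concept, not the paper's exact action space):

```python
# Minimal sketch of exploiting redundancy in a 7-DoF arm: track an end-effector
# twist while a secondary joint velocity is projected into the nullspace of the
# Jacobian, so it moves the elbow without disturbing the end-effector.
import numpy as np

def redundant_joint_velocities(J, ee_twist, q_dot_secondary):
    """J: 6x7 Jacobian, ee_twist: desired 6D end-effector velocity,
    q_dot_secondary: preferred 7D joint velocity (e.g. an elbow posture motion)."""
    J_pinv = np.linalg.pinv(J)                    # 7x6 pseudo-inverse
    nullspace = np.eye(J.shape[1]) - J_pinv @ J   # 7x7 nullspace projector
    return J_pinv @ ee_twist + nullspace @ q_dot_secondary

J = np.random.rand(6, 7)                        # stand-in Jacobian
ee_twist = np.array([0.05, 0, 0, 0, 0, 0])      # move 5 cm/s along x
elbow_motion = np.array([0, 0, 0.2, 0, 0, 0, 0])
q_dot = redundant_joint_velocities(J, ee_twist, elbow_motion)
```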
My proudest PhD work was action-value next-best pose (Q-attention). Really impressed to see so much cool work use this! PerAct, RVT, Act3D. What next?
Fun fact: it was first rejected, then resubmitted unchanged and got an oral at CVPR 22. Don't give up on work you believe in.
great work! It is not well known outside of the robotic visual manipulation community, but these sorts of action-value maps tend to be the most performant when it comes to high success rates for object manipulation.
StereoPose! Our new stereo RGB framework for category-level object pose estimation that works incredibly well for transparent objects!
The magic sauce? Back-view normalized object coordinate space (NOCS)! More below ⬇️.
📜
Lead: Kai Chen,
@CUHKofficial
We are now in full hiring mode!
Research Scientist in Robot Learning:
Simulation Engineer (Unity Game Dev):
Fancy moving from the games industry to robotics? This SimEng role could be for you!
Robot/ML Eng applications closed!
Very excited to be invited to talk at the 6th International Workshop on Recovering 6D Object Pose at
#ECCV2020
.
I'll be talking about multi-object reasoning, end-to-end
#manipulation
, and
#sim2real
. See you Sunday at 10.10 (UTC+1).
MoreFusion
#CVPR2020
is our new real-time and incremental pose estimation system (achieves SOTA) that builds an object-level map describing the full geometry of objects in the scene, allowing precise pick and place of cluttered objects. Work led by
@wkentaro_
.
New! Patch-based Object-centric Video Transformer.
We use object-centric information (bounding boxes) as a compressed representation for videos, giving improved computational efficiency on long-horizon video prediction.
Lead: Wilson Yan w/ Ryo O,
@pabbeel
PhD internship applications are now OPEN from spring 2023 onwards at the Dyson Robot Learning Lab in London. They can be undertaken at any time during the year!
Apply through this form⬇️
Today at
#CVPR2019
we are presenting Sim-to-Real via Sim-to-Sim: Data-efficient Robotic Grasping via Randomized-to-Canonical Adaptation Networks (RCAN). Come and see us at Poster 204!
Today at 3pm (BST), I'll be talking about sample-efficient robot learning at ETH Zurich.
Topics include Q-attention, model-based RL, and pre-training for robot control.
Come join! No registration required; link to zoom can be found here:
In our oral at
#CoRL2018
tomorrow, I will present Task-Embedded Control Networks, which employ ideas from metric learning in order to create a task embedding that can be used by a robot to learn new tasks from one or more demonstrations.
#robotics
#MachineLearning
#AI
#phdlife
If you haven't already, please check out what I worked on during my
@Theteamatx
internship! Randomized-to-Canonical Adaptation Network (RCAN) is a real2sim image translator trained with domain randomization and achieves SoTA performance on robotic grasping!
Together with
@Theteamatx
and
@GoogleAI
, we have recently proposed the Randomized-to-Canonical Adaptation Network (RCAN): a real2sim image translator trained with domain randomization. It achieves SoTA performance on robotic grasping with no real data.
Woohoo! Super excited about this new work!
Multi-View Masked World Models (MV-MWM) gives super robust end-to-end manipulation with RGB input from a *hand-held camera*!
Looks like our camera operator had a few too many beers!🍻
@younggyoseo
thread below!
Excited to share Multi-View Masked World Models (MV-MWM) that learns multi-view representations and a world model for viewpoint-robust control!
MV-MWM enables zero-shot sim2real transfer with hand-held cameras, only using pixel observations 🚀
Our mission? Deploy advanced robots in household environments.
I'll be moving into my full-time Dyson role starting in August! Big shoutout to my advisors (
@pabbeel
and
@AjdDavison
) who have inspired me over the years!
Full Dyson Robotics video reveal:
Video-language models (VLMs) are often used as a means to solve sparse-reward tasks. The issue is, they kinda suck at giving meaningful rewards 💩.
In this work, we question whether VLMs are perhaps best-suited as a *pretraining* signal for RL.
LAMP💡:
Excited to share LAnguage Reward Modulation for Pretraining Reinforcement Learning!
LAMP💡pretrains a language-conditioned agent without human supervision using VLM rewards and unsupervised reinforcement learning
w/
@amberxie_
@carlo_sferrazza
@younggyoseo
@stepjamUK
@pabbeel
Insightful paper! They study the effect of different action spaces in deep RL. After evaluating multiple action spaces across three
#manipulation
tasks, they conclude that variable impedance control in end-effector space is the winner!
New work! Can Masked Autoencoders (MAE) be effective for visual model-based RL?
Yes! ✅
Our Masked World Model (MWM) decouples visual representations and dynamics by training a latent dynamics model on features from a pre-trained MAE.
Excited to share Masked World Models for Visual Control!
Inspired by MAE and World Models, we train an autoencoder with convolutional feature masking and reward prediction, then train a dynamics model in the latent space of the autoencoder.
Thread 🧵
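To make "convolutional feature masking" concrete, here's a rough PyTorch sketch: encode, randomly zero spatial positions in the feature map, reconstruct. Architecture, shapes, and the missing reward/dynamics heads are all placeholders, not the actual MWM model:

```python
# Rough sketch of convolutional feature masking: encode the image with a small
# conv stack, randomly zero out a fraction of spatial positions in the feature
# map, then reconstruct from the masked features. The real MWM additionally
# predicts rewards and trains a latent dynamics model on the learned features.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
                        nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU())
decoder = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                        nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1))

def masked_reconstruction(images, mask_ratio=0.75):
    feats = encoder(images)                               # (B, 64, H/4, W/4)
    b, c, h, w = feats.shape
    keep = (torch.rand(b, 1, h, w, device=feats.device) > mask_ratio).float()
    recon = decoder(feats * keep)                         # reconstruct from masked features
    return ((recon - images) ** 2).mean()                 # simple pixel reconstruction loss

loss = masked_reconstruction(torch.rand(8, 3, 64, 64))
loss.backward()
```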
Table-top instance grasping of objects using no real-world grasping data! The method consists of a shape prediction model that learns a domain-invariant 3D point cloud representation of objects, which is subsequently used for grasping (via a critic network).
#Robotics
Over the next few weeks, I'll periodically post various tasks/features from
#RLBench
. First up, a shape sorting task! This one would be a great task to try for a sim-to-real project! Both vision and proprioceptive feedback needed for this one.
Fantastic news from
@coppeliaRobotic
: it now supports MuJoCo (in addition to the already supported Bullet, ODE, Newton, Vortex).
A reminder that RLBench is built around CoppeliaSim; no better tool out there for fast robot learning research iteration 🙂
Happy to announce a new release of the
#RoboticsSimulator
#CoppeliaSim
. Some features:
-
#MuJoCo
now also supported with dyn. content that can be modified on-the-fly. Bringing soft bodies, cables, etc. to CoppeliaSim
-
#python
also supported for embedded scripts
- ROS2 Humble
#RLBench
has been accepted to
@icra2020
! 🎉
We also have an active and growing community with >230 stars on GitHub! Keep the issues and pull requests coming! 🤖
We are thrilled to announce RLBench: an ambitious large-scale benchmark and learning environment for vision-guided manipulation with 100 unique, hand-designed tasks!
Paper:
Video:
w/
@StephenLJames
, Z. Ma, D. Arrojo,
@AjdDavison
Papers like these are incredibly valuable to the
#robotics
community, at a time when end-to-end approaches are becoming more prominent. This paper compares ORB-SLAM to a learned system (what I like to call 'implicit
#SLAM
').
Paper by
@ducha_aiki
:
In our oral at
#CoRL2018
tomorrow, I will present Task-Embedded Control Networks, which employ ideas from metric learning in order to create a task embedding that can be used by a robot to learn new tasks from one or more demonstrations.
#robotics
#MachineLearning
#AI
#phdlife
New!
Auto-Lambda is a gradient-based meta learning framework that explores continuous, dynamic task relationships via task-specific weightings; achieving SOTA on CV and robotics tasks!
Honoured to be one of the first set of accepted papers at
@TmlrOrg
!
Led by
@liu_shikun
Finally, after more than a year of hard work, we are excited to release the most ambitious Sim2Real project to date:
End-to-end off-road autonomous driving, trained in sim, and transferred to the real world.
@corl_conf
Amazing work led by
@amberxie_
&
@johnrso_
.
One thing I've always found hacky with robot manipulation is the need to "disable" collision checking when interacting with objects (e.g. grasping, pushing, etc); now you don't have to!
We propose a new domain: Language-Conditioned Path Planning!
#CoRL2023
Excited to share our CoRL 2023 paper: Language-Conditioned Path Planning! ⚡️
Not all collisions are created equal! We address the limitations of traditional path planning by incorporating contact awareness into path planning.
w/
@YoungwoonLee
@pabbeel
@stepjamUK
Our new Sim2Real 6D Object Pose Estimation work!
1) Train pose estimation model on sim data.
2) Use model to generate poses on unlabelled real data.
3) Auto filter generated poses and update model.
4) Repeat 2-4.
Result: SOTA performance + robot demo! 🤖
Having robot learning blues after the end of
#CoRL2019
? Fear not! Our new paper takes sim2real further and explores how we can learn to one-shot imitate humans (using TecNets) without any real-world data during training!
New work!
1) Pre-train action-free video prediction from "wild" data (e.g. RLBench/youtube/etc).
2) Fine-tune action-conditioned video prediction to quickly learn vision-based RL in a new domain (e.g. MetaWorld).
Lead:
@younggyoseo
w/
@kimin_le2
,
@pabbeel
Can we leverage diverse out-of-domain videos for improving vision-based RL?
Yes! Excited to share APV that quickly learns world models 🌎 by fine-tuning a pre-trained action-free video prediction model.
Paper:
w/
@kimin_le2
@stepjamUK
@pabbeel
In 5 mins I will be talking about Next-best Pose Agents at the ICRA Workshop on 3D Visual Representations for Robot Manipulation.
I'll be covering a few pieces of work, including our recent CVPR work: Hierarchical Diffusion Policy 🤖
#ICRA2024
Hierarchical diffusion policy is another step along the journey of making hierarchical next-best pose agents more capable, through the introduction of a kinematically aware low-level diffusion planner.🤖
New work from the Dyson Robot Learning Lab.
CVPR 2024
Great talk by
@sainingxie
at
#CVPR2022
, reminding us that if you add in all the training bells & whistles that vision transformers get, then ConvNets can perform equally well or better.
#ICRA2023
may be over, but that doesn't mean we have to stop talking about
#robots
, right?
My first time in York today to talk about "Hyper-efficient End-to-end Manipulation" at the
@UniOfYork
Robotic Manipulation Seminar. 🤖
#RLBench
tasks/features (3/n). This bin emptying task is one of the 100 tasks of RLBench. The goal is to move the objects from the grey bin to one of the specified coloured bins. The robot must infer the goal from either the textual description of the task or from the demos.
Great to be here in Atlanta for
@corl_conf
!
Join Amber Xie on Wednesday 5.15pm, where she will be presenting language-conditioned path planning --- a new way to do intelligent and intuitive path planning!
#CoRL2023
One thing I've always found hacky with robot manipulation is the need to "disable" collision checking when interacting with objects (e.g. grasping, pushing, etc); now you don't have to!
We propose a new domain: Language-Conditioned Path Planning!
#CoRL2023
Robot infers action models by observing teacher demonstrations; the models are subsequently used for Monte Carlo tree search. Tasks include: placing a box inside a box and taking cereal out of a cabinet.
Paper:
Tim Welschehold, ...,
@wolfram_burgard
We present End-to-End Egospheric Spatial Memory (ESM) at
#ICLR2021
, today 9-11am PST!
Working with a wrist camera on a manipulator? ESM represents the sequence data as an ego-sphere 'memory' around the camera, outperforming LSTM and NTM on IL and RL tasks!
We are releasing Ivy!
A rich set of functions (compatible with TensorFlow, PyTorch, Jax, MXNet, and Numpy) that complements your research in vision, robotics, and more!
Ivy can significantly reduce LoC for rapid prototyping.
Code below is how I like to use Ivy!
We are excited to release Ivy, a new open source Deep Learning framework!
Ivy unifies the syntax and call signatures of existing frameworks.
Write your code once in Ivy, and support all frameworks simultaneously.
Links to Paper, Code and Docs here:
#RLBench
tasks/features (2/n). RLBench comes with support for domain randomisation of both visuals and dynamics with 1 line of code, allowing for rapid sim-to-real experiments!
Today we'll be presenting R&D at
#RSS2024
during the Imitation Learning session at 8:30 am -- come by and visit our poster afterwards if you are around! 🤖
We have released TECO! A video prediction model that excels in generating long, temporally consistent videos in complex 3D scenes!
Work led by
@WilsonYan8
📜
🌐
Excited to announce TECO, an efficient video prediction model that can generate long, temporally consistent video for complex datasets in 3D scenes such as DMLab, Minecraft, Habitat, and real-world video from Kinetics!
📜
🌐
(🧵)
Workshop on Pre-training Robot Learning final call for papers!
The second and final submission window closes today (Oct 26th) at 11:59PM UTC! Submit your 4-page extended abstract; virtual-only presentations also accepted!
Announcing the 1st "Workshop on Pre-training Robot Learning" at
@corl_conf
, Dec 15.
Fantastic lineup of speakers: Jitendra Malik, Chelsea Finn, Joseph Lim, Kristen Grauman, Abhinav Gupta, Raia Hadsell.
Submit your 4-page extended abstract by September 28.
This study took ~1 year to complete. We had no horse in this race, and tried to make it as fair as possible.
We hope it's useful to the community, and hope that it encourages future meta-RL papers to include a simple multi-task baseline; you may be surprised how well it works! 😉
Released the code of our paper, MoreFusion
#CVPR2020
! It contains all of the software for training/evaluation of the pose estimation network, joint pose refinement, and online demonstration with an RGB-D camera/robot.
The successor to RLBench?? This is a (very hard) robot learning benchmark for mobile (biped) bi-manual manipulation. Like RLBench, it comes with vision, demos and sparse rewards!
🚀 Looking for a benchmark for bi-manual mobile manipulation with nicely collected demonstrations? We are excited to release BiGym, a new benchmark with human-collected demos!
🌐 Website:
📄 Paper:
💻 Code:
Introducing CQN: Coarse-to-fine Q-Network, a value-based RL algorithm for continuous control🦾Initialized with 20~50 demonstrations, it learns to solve real-world robotic tasks within 10 mins of training, without any pre-training and shaped rewards! (1/4)
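The coarse-to-fine trick for continuous actions is easy to sketch: bin each action dimension, pick the best bin by Q-value, zoom in, repeat. Toy illustration below with a stubbed Q-function (CQN learns these values; everything here is a stand-in):

```python
# Toy sketch of coarse-to-fine action selection for continuous control:
# at each level, split the current interval of every action dimension into a
# few bins, pick the bin with the highest (stubbed) Q-value, and zoom in.
import numpy as np

def q_values(obs, centres, level):
    """Stand-in for a learned per-bin Q-function."""
    return np.random.rand(*centres.shape)

def coarse_to_fine_action(obs, action_dim=4, bins=5, levels=3, low=-1.0, high=1.0):
    lo = np.full(action_dim, low)
    hi = np.full(action_dim, high)
    for level in range(levels):
        edges = np.linspace(lo, hi, bins + 1, axis=-1)   # (action_dim, bins+1)
        centres = (edges[:, :-1] + edges[:, 1:]) / 2      # (action_dim, bins)
        best = np.argmax(q_values(obs, centres, level), axis=-1)
        # Zoom each dimension into its best bin for the next, finer level.
        lo = edges[np.arange(action_dim), best]
        hi = edges[np.arange(action_dim), best + 1]
    return (lo + hi) / 2   # final continuous action

action = coarse_to_fine_action(obs=None)
```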
The gist of our paper is to show *how* we can leverage the power of Bingham for RL! 💪
When evaluating our approach on the Wahba problem and a set of vision-based robot manipulation tasks from RLBench, we achieve superior performance over a Gaussian parameterization! 🧵5/5
Image-generation diffusion models can draw arbitrary visual-patterns. What if we finetune Stable Diffusion to 🖌️ draw joint actions 🦾 on RGB observations?
Introducing 𝗚𝗘𝗡𝗜𝗠𝗔
paper, videos, code, ckpts:
🧵Thread⬇️
Collect all your robot data with a green screen, and then apply Chroma Keying. Leads to policies that can generalize to ANY visually distinct location (scene)! The first example of a truly (scene) general robot learning policy?
🚀 We are excited to announce GreenAug (Green-screen Augmentation), a physical visual augmentation method for robot learning algorithms! GreenAug enables generalisation to unseen visually distinct locations (scenes). In collaboration with
@TinkerSumit
@yusufma555
@stepjamUK
(1/6)
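Chroma keying itself is classic green-screen compositing; a minimal OpenCV sketch of the idea (thresholds and file names are illustrative only, and GreenAug's actual masking pipeline is more careful):

```python
# Minimal sketch of chroma keying for visual augmentation: mask out the green
# screen in HSV space and composite a random background behind the robot/scene.
import cv2
import numpy as np

def chroma_key(frame_bgr, background_bgr,
               lower_green=(35, 80, 80), upper_green=(85, 255, 255)):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    green_mask = cv2.inRange(hsv, np.array(lower_green), np.array(upper_green))
    foreground_mask = cv2.bitwise_not(green_mask)
    fg = cv2.bitwise_and(frame_bgr, frame_bgr, mask=foreground_mask)
    bg = cv2.bitwise_and(background_bgr, background_bgr, mask=green_mask)
    return cv2.add(fg, bg)

frame = cv2.imread("greenscreen_frame.png")   # robot data collected on a green screen
background = cv2.imread("random_scene.png")   # any visually distinct scene
augmented = chroma_key(frame, cv2.resize(background, frame.shape[1::-1]))
```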
@AjdDavison
RLBench can facilitate research in: few-shot learning, reinforcement learning, imitation learning, multi-task learning, geometric computer vision, etc.
Observations: rgb, depth, and segmentation masks from an over-the-shoulder stereo camera and an eye-in-hand monocular camera.
Quaternions are often used as the output rotation representation when using deep networks, but due to their antipodal symmetry, sampling a quaternion from a Gaussian doesn't seem appropriate. Here comes the Bingham distribution to the rescue! 🧵2/5
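The nice property is easy to check numerically: the (unnormalised) Bingham density exp(qᵀAq) on the unit sphere assigns identical mass to q and -q, which represent the same rotation, whereas a Gaussian in R⁴ has no such symmetry. Quick sketch (the normalising constant, omitted here, is the hard part in practice):

```python
# Quick check of the property that makes the Bingham distribution a natural fit
# for quaternions: its (unnormalised) density exp(q^T A q) on the unit sphere is
# antipodally symmetric, so q and -q -- the same rotation -- get the same score.
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                    # any symmetric 4x4 parameter matrix

def bingham_unnormalised(q, A):
    q = q / np.linalg.norm(q)        # quaternions live on the unit 3-sphere
    return np.exp(q @ A @ q)

q = rng.standard_normal(4)
assert np.isclose(bingham_unnormalised(q, A), bingham_unnormalised(-q, A))
```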
Uber has announced Atari Zoo (), which is a collection of trained
#reinforcementlearning
agents, along with this awesome visualisation tool: . Super cool, worth checking out!
This is the first work (I'm aware of) that does sim2real via iterative self-training!
Great collaboration between UC Berkeley and the Chinese University of Hong Kong.
Lead: Kai Chen
My final PhD project from the Dyson Robotics Lab at Imperial College London.
Work with
@wkentaro_
, Tristan Laidlow, and
@AjdDavison
.
2 new papers extending Q-attention coming out very soon... 😉
Released our work on object extraction: SafePicking
#ICRA2022
! This robotic system finds and “safely” extracts target objects via known object mapping with an onboard camera. Check out the project page for paper/code:
Work with
@stepjamUK
@AjdDavison
(1/n)
Quadrupedal robot (
#ANYmal
) learns to walk and recover from falling in complex configurations in simulation, with zero-shot transfer to the real world! I went to Jemin's
#PhD
defence, and was very impressed by the physics simulation he had created.
Finally taken the plunge to Twitter, as I seem to be the only PhD student who does not use it. Will try to be fairly active, but no promises.
#PhD
#Robotics
#MachineLearning
Ivy was used to implement our recently accepted ICLR 2021 paper: End-to-End Egospheric Spatial Memory (ESM).
ESM encodes the scene into an egosphere, which travels with an agent.
Great work led by
@DanielLenton1
Site:
Video:
Another surprising result: we can solve novel vision-based, sparse-reward manipulation (test-time) tasks *without* any demonstrations, by pretraining on only 10 tasks!
This is a big win for robotics -- we may need demonstrations for pre-training, but not for fine-tuning!
I have quite a few qualms with MuJoCo, and this is certainly one of the main ones. Might be a good time for folks to check out
#RLBench
for benchmarking their manipulation algorithms. 😉 ()
Uses Bullet under the hood.
Glad to see some discussion about the rejection of the use of proprietary tech (MuJoCo, Unity etc) in public benchmarks, while open source alternatives such as PyBullet are available:
Indexing Datasets of 3D Indoor Objects: A blog post that discusses the strengths and limitations of various 3D datasets that have been released over the past 15 years.
Finally, after more than a year of hard work, we are excited to release the most ambitious Sim2Real project to date:
End-to-end off-road autonomous driving, trained in sim, and transferred to the real world.
@corl_conf
Amazing work led by
@amberxie_
&
@johnrso_
.