Suraj Nair Profile
Suraj Nair

@SurajNair_1

2,107 Followers · 449 Following · 18 Media · 243 Statuses

Founding team @physical_int building foundation models for robotics. Prev PhD @StanfordAILab

Palo Alto, CA
Joined September 2019
@SurajNair_1
Suraj Nair
1 year
Successfully defended my PhD today! So grateful to my advisors @chelseabfinn and @silviocinguetta , committee members @MacSchwager , @DorsaSadigh , and @leto__jean , as well as all the collaborators, mentors, and supporters who've been there every step of the way!
@chelseabfinn
Chelsea Finn
1 year
Congratulations to @SurajNair_1 who defended his PhD thesis today! 👏 His work includes: - the R3M model: - using generative models to plan subgoals for long-horizon tasks: - data collection for offline RL:
@SurajNair_1
Suraj Nair
2 years
Super excited to share my internship project @MetaAI ! We show that pre-training visual representations on diverse human videos and language enables efficient robotic manipulation. Check out the summary from @chelseabfinn below:
@chelseabfinn
Chelsea Finn
2 years
Excited to share new work on pre-training a single reusable representation for many robot manipulation domains & tasks R3M pre-trains on human videos & then learns w/ only ~10 min of demos of downstream task w @SurajNair_1 @aravindr93 @Vikashplus A Gupta
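A minimal sketch of the recipe this thread describes, assuming a frozen video-pretrained backbone (a plain torchvision ResNet-50 stands in here for R3M's encoder) and placeholder demonstration tensors; only a small behavior-cloning head is trained on the few minutes of demos:

```python
# Hedged sketch, not the R3M codebase: a frozen, video-pretrained visual
# encoder plus a small behavior-cloning head trained on a handful of demos.
import torch
import torch.nn as nn
from torchvision.models import resnet50

encoder = resnet50()          # stand-in for a video-pretrained backbone
encoder.fc = nn.Identity()    # expose the 2048-d feature vector
encoder.eval()
for p in encoder.parameters():
    p.requires_grad = False   # freeze: only the policy head is trained

policy = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(), nn.Linear(256, 7))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

demo_images = torch.randn(64, 3, 224, 224)  # placeholder demo frames
demo_actions = torch.randn(64, 7)           # placeholder 7-DoF actions

for _ in range(100):          # behavior cloning on frozen features
    with torch.no_grad():
        feats = encoder(demo_images)
    loss = nn.functional.mse_loss(policy(feats), demo_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
```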
@SurajNair_1
Suraj Nair
1 year
Can robots leverage *retrieval* from offline datasets to learn new tasks efficiently? In our new work, Behavior Retrieval, we find that they can! Led by @du_maximilian , w/ @DorsaSadigh and @chelseabfinn . 🧵👇
@SurajNair_1
Suraj Nair
4 months
Thrilled to be starting a new adventure at Physical Intelligence with some amazing colleagues and friends! Learn more:
@hausman_k
Karol Hausman
4 months
🚨 Big news 🚨 Together with a set of amazing folks we decided to start a company that tackles one of the hardest and most impactful problems - Physical Intelligence. In fact, we even named our company after that: or Pi (π) for short 🧵
@SurajNair_1
Suraj Nair
3 years
Excited to share my first blog post! If you're interested in how robots can learn rewards from videos of humans and language, check it out 👇
@StanfordAILab
Stanford AI Lab
3 years
Where do the rewards for robotic reinforcement learning come from? In this blog post we explore how using crowdsourced language annotations and videos of humans, we can learn reward functions and enable them to generalize more broadly.
@SurajNair_1
Suraj Nair
3 years
Thrilled to share our new work on learning language-conditioned visuomotor skills on real robots! Paper: In collaboration with @_eric_mitchell_ , Kevin Chen, @brian_ichter , @silviocinguetta , @chelseabfinn 👇1/9
@SurajNair_1
Suraj Nair
4 months
Super excited that the DROID dataset is out -- huge shoutout to the whole team especially leads @SashaKhazatsky and @KarlPertsch ! Can't wait to see what the community can build with this dataset 🤖
@SashaKhazatsky
Alexander Khazatsky
4 months
After two years, it is my pleasure to introduce “DROID: A Large-Scale In-the-Wild Robot Manipulation Dataset” DROID is the most diverse robotic interaction dataset ever released, including 385 hours of data collected across 564 diverse scenes in real-world households and offices
@SurajNair_1
Suraj Nair
3 years
Excited to share our new work on learning reward functions for robots from in-the-wild videos of humans! By training on diverse human videos and a small amount of robot data, the learned reward function is able to generalize to new environments and tasks.
@chelseabfinn
Chelsea Finn
3 years
How can robots generalize to new environments & tasks? We find that using in-the-wild videos of people can allow learned reward functions to do so! Paper: Led by @_anniechen_ , @SurajNair_1 🧵(1/5)
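One hedged reading of how such a reward might be trained, as a sketch only: a "same task?" classifier over pairs of video clips, where most training pairs come from diverse human video and only a few from the robot domain. Every module, shape, and tensor below is a placeholder assumption, not the paper's architecture:

```python
# Sketch: a same-task classifier trained on batches dominated by human video
# with a small slice of robot data, so the reward transfers to new scenes.
import torch
import torch.nn as nn

clip_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 64 * 64, 128))
same_task_head = nn.Linear(2 * 128, 1)  # do these two clips do the same task?
opt = torch.optim.Adam(
    [*clip_encoder.parameters(), *same_task_head.parameters()], lr=1e-4)

def sample_pairs(n):
    """Placeholder loader: (clip_a, clip_b, same-task label)."""
    return (torch.randn(n, 3, 16, 64, 64),
            torch.randn(n, 3, 16, 64, 64),
            torch.randint(0, 2, (n, 1)).float())

for _ in range(100):
    ha, hb, hy = sample_pairs(28)  # diverse human video dominates the batch
    ra, rb, ry = sample_pairs(4)   # small slice of in-domain robot data
    a, b, y = torch.cat([ha, ra]), torch.cat([hb, rb]), torch.cat([hy, ry])
    logits = same_task_head(torch.cat([clip_encoder(a), clip_encoder(b)], -1))
    loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```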
@SurajNair_1
Suraj Nair
3 years
I'll be presenting our work on learning language-conditioned visuomotor manipulation skills tomorrow @corl_conf (Wed 11:30-12:30 GMT)! Drop by the poster or check out the paper () to learn more!
@SurajNair_1
Suraj Nair
3 years
Thrilled to share our new work on learning language-conditioned visuomotor skills on real robots! Paper: In collaboration with @_eric_mitchell_ , Kevin Chen, @brian_ichter , @silviocinguetta , @chelseabfinn 👇1/9
@SurajNair_1
Suraj Nair
4 years
Cool new article from @VentureBeat on our work on time reversal as self-supervision, to appear at @icra2020 !
@Kyle_L_Wiggers
Kyle Wiggers
4 years
Google AI researchers want to teach robots tasks through self-supervised reverse engineering
@SurajNair_1
Suraj Nair
1 year
Thrilled to share our new work on language-driven representation learning for robotics! We train a single model with many downstream capabilities, from features for control to expression grounding and reward/intent inference. Led by the amazing @siddkaramcheti
@siddkaramcheti
Siddharth Karamcheti
1 year
How can we use language supervision to learn better visual representations for robotics? Introducing Voltron: Language-Driven Representation Learning for Robotics! Paper: Models: Evaluation: 🧵👇(1 / 12)
@SurajNair_1
Suraj Nair
4 years
Can we learn dynamics models that are conditioned on goals, and only model goal-relevant quantities? We explore this question in our new work Goal-Aware Prediction, to appear at #ICML2020 at 7 AM/6 PM PDT tomorrow w/ @chelseabfinn @silviocinguetta Paper:
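A loose sketch of one way to read "model only goal-relevant quantities": condition the model on the goal and train the decoder to predict the residual between the next state and the goal rather than the full next state, so detail that already matches the goal carries no training signal. All shapes and modules below are placeholder assumptions, not the paper's architecture:

```python
# Sketch: goal-conditioned latent dynamics whose decoder predicts only the
# goal residual (s_next - g) instead of reconstructing the full next state.
import torch
import torch.nn as nn

enc = nn.Linear(2 * 64, 32)   # encodes (state, goal) -> latent
dyn = nn.Linear(32 + 4, 32)   # latent dynamics, conditioned on the action
dec = nn.Linear(32, 64)       # decodes the predicted goal residual
params = [*enc.parameters(), *dyn.parameters(), *dec.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)

s, a = torch.randn(32, 64), torch.randn(32, 4)       # placeholder transitions
s_next, g = torch.randn(32, 64), torch.randn(32, 64)

for _ in range(100):
    z = enc(torch.cat([s, g], dim=-1))
    z_next = dyn(torch.cat([z, a], dim=-1))
    loss = nn.functional.mse_loss(dec(z_next), s_next - g)  # goal-relevant error
    opt.zero_grad()
    loss.backward()
    opt.step()
```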
@SurajNair_1
Suraj Nair
5 years
Our new work on hierarchical visual foresight is now available! It learns to break long-horizon visual manipulation tasks into subtasks using only self-supervision.
@chelseabfinn
Chelsea Finn
5 years
New paper: Hierarchical visual foresight learns to generate visual subgoals to break down a goal into smaller pieces. Accomplishes long-horizon vision-based tasks, without *any* supervision. w/ Suraj Nair @GoogleAI Paper: Code:
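A rough sketch of the subgoal search this describes, using the cross-entropy method: look for an intermediate subgoal that splits one hard start-to-goal problem into two easier segments. The quadratic `seg_cost` and the 16-d vectors are stand-ins for a learned cost and a generative model's latent space:

```python
# Sketch: CEM over candidate subgoals; the best subgoal minimizes the summed
# cost of reaching it from the start and reaching the goal from it.
import torch

def seg_cost(a, b):  # placeholder for a learned "effort to get from a to b"
    return ((a - b) ** 2).sum(-1)

start, goal = torch.randn(16), torch.randn(16)
mu, std = torch.zeros(16), torch.ones(16)
for _ in range(10):                                   # CEM iterations
    cands = mu + std * torch.randn(128, 16)           # candidate subgoals
    scores = seg_cost(start, cands) + seg_cost(cands, goal)
    elites = cands[scores.topk(16, largest=False).indices]
    mu, std = elites.mean(0), elites.std(0) + 1e-6    # refit the sampler
subgoal = mu  # plan start -> subgoal, then subgoal -> goal
```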
@SurajNair_1
Suraj Nair
3 years
Excited to share our new work on learning visuomotor skills which can be sequenced to complete long-horizon manipulation tasks! Led by @bohanwu_1 and to appear at #CoRL2021 .
@chelseabfinn
Chelsea Finn
3 years
RL allows robots to acquire skills, but are those skills good for the long-horizon tasks we ultimately care about? EMBR is an algorithm that learns & grounds skills, which can be sequenced to solve multi-stage tasks with @bohanwu_1 @SurajNair_1 @drfeifei
@SurajNair_1
Suraj Nair
4 years
Check out our new work led by @ashwinb96 and @brthananjeyan on safe reinforcement learning! Recovery RL combines offline learning and a decoupled task/recovery policy to enable safe learning from images on a real robot 🤖.
@ashwinb96
Ashwin Balakrishna
4 years
Safe learning is critical for using RL to learn tasks in the real world. We introduce Recovery RL: in collaboration with @brthananjeyan , @SurajNair_1 , Michael Luo, @krishpopdesu , @MinhoHwang6 , @mejoeyg , @julianibarz , @chelseabfinn , @ken_goldberg 👇(1/8)
@SurajNair_1
Suraj Nair
2 years
We're organizing this year's Deep RL workshop @NeurIPSConf ! We have an exciting program planned, including great speakers, Opinion Talks, and new Implementation Talks covering the tricks to get common RL algorithms working. Consider submitting your latest work (deadline Sept 22)!
@hausman_k
Karol Hausman
2 years
Super excited to announce this year’s Deep RL workshop @NeurIPSConf 🎉🎉🎓🎓 website: submission deadline: September 22nd, 11:59 PM PST Some exciting changes that we made this year in 🧵👇
@SurajNair_1
Suraj Nair
4 years
Check out our new paper on accelerating robotic exploration using weak human supervision!
@chelseabfinn
Chelsea Finn
4 years
Can robots learn to autonomously explore their environment? We introduce Batch Exploration with Examples (BEE) led by @anniee268 , Alex Nam, & @SurajNair_1 Thread👇 (1/8)
@SurajNair_1
Suraj Nair
5 months
As new VLMs get released every day, it's easy to get lost in the myriad of details and design choices each one adopts. With Prismatic VLMs we try to cut through the noise - presenting a modular VLM training codebase that enables rigorous investigation of the VLM design space.
@siddkaramcheti
Siddharth Karamcheti
5 months
What design choices matter when developing a visually-conditioned language model (VLM)? Check out our paper – Prismatic VLMs – and open-source training code, evaluation suite, and 42 pretrained VLMs at the 7B-13B scale! 📜 ⚙️ + 🤗
@SurajNair_1
Suraj Nair
3 years
Check out our new work led by Bohan Wu on large scale video prediction for robot manipulation! Greedy training enables training larger models, which yields more accurate predictions and better planning performance on a real robot 🤖. To appear at #CVPR2021
@chelseabfinn
Chelsea Finn
3 years
Predicting video is a powerful self-supervised approach for robots to learn about the world Greedy hierarchical VAEs *can train larger models with less memory *surprisingly outperform end-to-end training w Bohan Wu @SurajNair_1 @RobobertoMM , @drfeifei 🧵
@SurajNair_1
Suraj Nair
2 years
Huge thanks to my co-authors @aravindr93 @Vikashplus @chelseabfinn and Abhinav Gupta for their support and advising. This project has been a really fun collaboration!
@SurajNair_1
Suraj Nair
1 year
Lastly, we see that behavior retrieval can significantly improve the robustness of learned policies. Check out a policy trained on toy pickles working on a real pickle:
@SurajNair_1
Suraj Nair
2 years
Thanks @jeremyhsu and @newscientist for covering our work on audio-visual learning for robotic manipulation!
@jeremyhsu
Jeremy Hsu
2 years
My first for @newscientist is about a robot learning to harness both sound and vision while searching for keys in a bag. Necessary skills for a complex world. Thanks to @chelseabfinn @du_maximilian @SurajNair_1 @olivia_y_lee at @StanfordAILab
@SurajNair_1
Suraj Nair
2 years
Excited to share our #RSS2022 paper! We learn policies end-to-end from vision and audio (from a gripper mounted microphone) to complete tasks with occlusion, like extracting keys from a bag. Kudos to @du_maximilian and Olivia Lee who led the project. Check out the summary below:
@chelseabfinn
Chelsea Finn
2 years
Can robots deal with occlusion? We put a microphone on a robot's gripper & found that audio helps robots learn to solve tasks amidst occlusion. #RSS2022 paper: w/ @du_maximilian , Olivia Lee, @SurajNair_1 🧵(1/4)
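A hedged sketch of the kind of policy the paper describes: fuse features from a camera frame and a spectrogram of the gripper-mounted microphone, then regress actions. Architectures and shapes below are illustrative assumptions, not the paper's model:

```python
# Sketch: a two-stream audio-visual policy; each modality gets its own
# encoder and the concatenated features drive an action head.
import torch
import torch.nn as nn

class AudioVisualPolicy(nn.Module):
    def __init__(self, act_dim: int = 7):
        super().__init__()
        self.vision = nn.Sequential(          # encodes 64x64 RGB frames
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(128))
        self.audio = nn.Sequential(           # encodes a mel-spectrogram
            nn.Conv2d(1, 8, 3, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(64))
        self.head = nn.Sequential(
            nn.Linear(128 + 64, 128), nn.ReLU(), nn.Linear(128, act_dim))

    def forward(self, image, spectrogram):
        feats = torch.cat([self.vision(image), self.audio(spectrogram)], -1)
        return self.head(feats)

policy = AudioVisualPolicy()
action = policy(torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 32))
```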
@SurajNair_1
Suraj Nair
4 years
I'll be presenting our work on visual subgoal generation for long-horizon tasks @iclr_conf tomorrow at 10 AM and 1PM PT!
@chelseabfinn
Chelsea Finn
4 years
Check out @SurajNair_1 ’s work on hierarchical foresight @iclr_conf ! HVF tackles long-horizon vision-based tasks without any human supervision. 10 am & 1 pm PT Weds at #ICLR2020 Poster & Spotlight: Website:
@SurajNair_1
Suraj Nair
2 years
Interested in how robots can learn from large and diverse data sources? Join us at the Workshop on Learning from Diverse, Offline Data at RSS! Check out the agenda and CfP here:
@siddkaramcheti
Siddharth Karamcheti
2 years
Diverse, representative data is becoming increasingly important for building generalizable robotic systems. We're organizing the Workshop on Learning from Diverse, Offline Data (L-DOD) at RSS 2022 (NYC/hybrid) to come together and discuss this!
@SurajNair_1
Suraj Nair
3 years
Also big shout out to @ShibaniSan @jmschreiber91 @siddkaramcheti @ashwinb96 @_anniechen_ for invaluable feedback on the post.
@SurajNair_1
Suraj Nair
3 years
For more details, please see the Website: Paper: Code/Data: 9/9
@SurajNair_1
Suraj Nair
3 years
Lastly, we deploy our method on a real robot, with an offline dataset taken directly from the replay buffer of a different project and crowdsourced annotations. We find that the agent is able to complete a set of 5 language-specified visuomotor skills on the real robot. 8/9
@SurajNair_1
Suraj Nair
4 years
Check out our new work on goal-aware prediction tomorrow at 7AM/6PM PDT at #ICML2020 !
@chelseabfinn
Chelsea Finn
4 years
In model-based RL, learning a global model of *everything* is really hard. Can we learn to model only what matters? We introduce: Goal-Aware Prediction (GAP) with @SurajNair_1 @silviocinguetta @StanfordAILab Thread ⬇️ (1/5)
@SurajNair_1
Suraj Nair
1 year
@du_maximilian will be presenting Behavior Retrieval at #RSS2023 this week. Drop by Session 2 to learn more!
@SurajNair_1
Suraj Nair
1 year
Can robots leverage *retrieval* from offline datasets to learn new tasks efficiently? In our new work, Behavior Retrieval, we find that they can! Led by @du_maximilian , w/ @DorsaSadigh and @chelseabfinn . 🧵👇
@SurajNair_1
Suraj Nair
9 months
@oier_mees @ToyotaResearch Thanks for visiting us and for the great talk @oier_mees !
@SurajNair_1
Suraj Nair
3 years
To scalably learn language-conditioned behavior on robots, we leverage pre-existing (potentially highly sub-optimal) robot datasets, such as autonomous exploration data, or replay buffers of trained RL agents. We then crowdsource natural language descriptions of each episode. 3/9
@SurajNair_1
Suraj Nair
1 year
We find that behavior retrieval improves performance over other retrieval strategies and multi-task pre-training + fine-tuning approaches across a range of simulated and real robot manipulation tasks:
@SurajNair_1
Suraj Nair
1 year
Qualitatively, we see that behavior retrieval indeed retrieves the offline data relevant to the new task, while discarding data that is not useful:
@SurajNair_1
Suraj Nair
1 year
The key idea of behavior retrieval is that when learning a new task, task-specific data can inform what data from a broader offline dataset to query for learning. Behavior retrieval does this by pre-training a state-action embedding space:
@SurajNair_1
Suraj Nair
1 year
Then, when learning a new task, the agent can look up similar (s,a) tuples in the offline data to build a larger and more diverse dataset for learning. A policy can then be trained on the combined data:
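A minimal sketch of the retrieval step just described, assuming the (s,a) embedding is already pre-trained (a single linear layer stands in for it) and all datasets are placeholder tensors:

```python
# Sketch: embed (s, a) pairs, pull the offline tuples closest to the new
# task's data, and train behavior cloning on the union.
import torch
import torch.nn as nn

embed = nn.Linear(64 + 7, 32)             # stand-in pre-trained (s,a) embedder

offline_sa = torch.randn(10_000, 64 + 7)  # broad offline dataset
task_sa = torch.randn(50, 64 + 7)         # small target-task dataset

with torch.no_grad():
    z_off, z_task = embed(offline_sa), embed(task_sa)
    # distance from each offline tuple to its nearest task tuple
    dist = torch.cdist(z_off, z_task).min(dim=1).values
    keep = dist.topk(500, largest=False).indices  # most similar offline tuples

combined = torch.cat([task_sa, offline_sa[keep]])  # train BC on this union
```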
@SurajNair_1
Suraj Nair
3 years
We additionally observe that by virtue of using pre-trained language models, the method is able to generalize to unseen rephrasings of tasks, including natural language, while only being trained on procedurally generated language. 7/9
@SurajNair_1
Suraj Nair
1 year
Check out the details below: Paper: Website: Code:
@SurajNair_1
Suraj Nair
3 years
We can then combine the learned reward with a task-agnostic visual dynamics model to perform model predictive control to complete language specified tasks. 5/9
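A sketch of the planning loop this step describes: random-shooting model-predictive control that rolls candidate action sequences through a learned visual dynamics model and ranks them with the language-conditioned reward. Both models below are toy placeholders:

```python
# Sketch: sample action sequences, roll them through the dynamics model,
# score the outcomes with the language reward, execute the best first action.
import torch

def dynamics(z, a):                  # placeholder latent visual dynamics
    return z + 0.1 * a.sum(-1, keepdim=True)

def language_reward(z0, zT, instr):  # placeholder: does z0 -> zT do `instr`?
    return -((zT - instr) ** 2).sum(-1)

z0, instr = torch.randn(32), torch.randn(32)
actions = torch.randn(256, 10, 4)    # 256 candidate sequences, horizon 10
z = z0.expand(256, 32)
for t in range(actions.shape[1]):    # roll out every candidate in parallel
    z = dynamics(z, actions[:, t])
best = language_reward(z0, z, instr).argmax()
first_action = actions[best, 0]      # execute, observe, replan (MPC)
```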
@SurajNair_1
Suraj Nair
3 years
We find that this approach is able to learn from data collected by even a random policy, which often does not complete any task, outperforming both language-conditioned imitation techniques and goal-image-based task specification. 6/9
@SurajNair_1
Suraj Nair
2 years
We're extending the deadline for the Deep RL workshop @NeurIPSConf to **Oct 3**!
@hausman_k
Karol Hausman
2 years
Are you working on @iclr_conf deadline and you need a few more days to finish up the experiments for Deep RL workshop @NeurIPSConf ? We got you! We're extending the deadline to October 3rd! Good luck with your submissions!
@SurajNair_1
Suraj Nair
3 years
Since the actions in the data may not be good enough to imitate, we propose to learn a language-conditioned reward function in the form of a classifier which predicts if transitioning between images completes an instruction. 4/9
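A hedged sketch of the reward just described: a classifier takes an initial image, a later image, and an instruction embedding, and predicts whether the transition completes the instruction. Encoders, embedding sizes, and labels below are placeholder assumptions:

```python
# Sketch: train a language-conditioned reward as a binary classifier over
# (initial image, later image, instruction) triples.
import torch
import torch.nn as nn

img_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
txt_enc = nn.Linear(300, 128)  # stand-in for a sentence-embedding model
head = nn.Sequential(nn.Linear(3 * 128, 128), nn.ReLU(), nn.Linear(128, 1))
params = [*img_enc.parameters(), *txt_enc.parameters(), *head.parameters()]
opt = torch.optim.Adam(params, lr=1e-4)

for _ in range(100):
    s0 = torch.randn(32, 3, 64, 64)           # initial frames
    sT = torch.randn(32, 3, 64, 64)           # later frames
    instr = torch.randn(32, 300)              # instruction embeddings
    y = torch.randint(0, 2, (32, 1)).float()  # 1 iff s0 -> sT completes instr
    logit = head(torch.cat([img_enc(s0), img_enc(sT), txt_enc(instr)], -1))
    loss = nn.functional.binary_cross_entropy_with_logits(logit, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```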
@SurajNair_1
Suraj Nair
2 years
Did you submit to NeurIPS on topics related to RL Pre-training/learning control from offline datasets? Consider submitting your work to the RSS Workshop on Learning from Diverse, Offline Data 🤖. Deadline extended to **May 27**!
@siddkaramcheti
Siddharth Karamcheti
2 years
Diverse, representative data is becoming increasingly important for building generalizable robotic systems. We're organizing the Workshop on Learning from Diverse, Offline Data (L-DOD) at RSS 2022 (NYC/hybrid) to come together and discuss this!
@SurajNair_1
Suraj Nair
5 months
Thankful for a great team in @siddkaramcheti @ashwinb96 @percyliang @tkollar @DorsaSadigh -- it's been a fantastic collaboration between @StanfordAILab and @ToyotaResearch .
@SurajNair_1
Suraj Nair
3 years
Easily specifying tasks and acquiring rewards for learning can be challenging on real robots. Motivated by the flexibility and ease of natural language, we aim to learn language-conditioned rewards and skills on robots. 2/9
@SurajNair_1
Suraj Nair
5 months
We pair this training codebase with an evaluation suite () that compiles 11 established benchmarks into a common arena. Think we should add your benchmark? Send us a PR!
@SurajNair_1
Suraj Nair
2 years
@crypto292929 We haven't tried it yet 🙂, but I think it would be very interesting to see if pre-trained representations like R3M can be combined with planning algorithms to complete these sorts of long-horizon tasks
@SurajNair_1
Suraj Nair
5 months
We'd love to incorporate feedback and new features from the community into our codebases. I'm excited to see how the community can build on and leverage our work to further push the boundaries of VLM design!
@SurajNair_1
Suraj Nair
5 months
Finally, compiling our insights, we are able to train a family of capable VLMs - Prisms - at the 7-13B scale that outperform open VLMs like LLaVA v1.5 with *the same training data and less compute*.
@SurajNair_1
Suraj Nair
5 months
Leveraging our training and eval codebases allows us to conduct an investigation () into these different design choices, with several insights around multi-stage training, vision backbones, and more! We release *all 42 models* we trained in this study.
@SurajNair_1
Suraj Nair
5 months
Our codebase () allows probing questions around optimization procedure, visual representations, language models, and scaling properties in a controlled way.
@SurajNair_1
Suraj Nair
2 years
@infinitesimalo_ @chelseabfinn You definitely could try to use the language-alignment module zero-shot on another video dataset for classification/captioning. However, our goal is producing a good representation for robotic control, so we focus on that in our experiments.
@SurajNair_1
Suraj Nair
3 years
More info in Bohan's tweet below: