Wenlong Huang

@wenlong_huang

2,907
Followers
938
Following
22
Media
381
Statuses

PhD Student @StanfordSVL @StanfordAILab. Previously @Berkeley_AI @GoogleDeepMind. Robotics, Foundation Models.

Stanford, CA
Joined May 2019
Pinned Tweet
@wenlong_huang
Wenlong Huang
8 days
What structural task representation enables multi-stage, in-the-wild, bimanual, reactive manipulation? Introducing ReKep: LVM to label keypoints & VLM to write keypoint-based constraints, solve w/ optimization for diverse tasks, w/o task-specific training or env models. 🧵👇
12
87
448
@wenlong_huang
Wenlong Huang
1 year
How to harness foundation models for *generalization in the wild* in robot manipulation? Introducing VoxPoser: use LLM+VLM to label affordances and constraints directly in 3D perceptual space for zero-shot robot manipulation in the real world! 🌐 🧵👇
10
143
584
@wenlong_huang
Wenlong Huang
2 years
Large language models gathered tons of world knowledge by speaking human language. But can they ever speak “robot language”? Introducing “Grounded Decoding”: a scalable way to decode *grounded text* from LLM for robots. Website: 🧵👇
6
85
455
@wenlong_huang
Wenlong Huang
2 years
Thrilled to announce that I will join @Stanford for my PhD! Extremely grateful to @pathak2206 @IMordatch @pabbeel for years of amazing mentorship and Zhuowen Tu for introducing me to AI research. Looking forward to tackling interesting problems in robotics and AI @StanfordAILab !
18
7
299
@wenlong_huang
Wenlong Huang
4 months
Very well-written thread about LLM in robotics! My 2 cents is: robotics requires a full-stack approach - whether it's symbolic or LLM or hybrid planners, one has to think about the abstractions they operate in, especially pertaining to closely-tied perception-action loops. 1/N
@chris_j_paxton
Chris Paxton
4 months
One of the most interesting questions to me right now is: can LLMs plan, why/why not, and to what extent do we care about this, especially as it pertains to robotics?
20
33
261
1
17
84
@wenlong_huang
Wenlong Huang
3 years
Really enjoyed the interview with Yannic -- he had many interesting and insightful questions! I've also been a big fan of his channel! Project Website: w/ @pabbeel @pathak2206 @IMordatch
@ykilcher
Yannic Kilcher 🇸🇨
3 years
GPT-3 "knows" so much about the world, but how can we get that knowledge into a structured and usable form? Today's Video: Language Models as Zero-Shot Planners w/ first author Wenlong Huang ( @wenlong_huang )! Super interesting & many potential use cases 💪
Tweet media one
3
22
109
1
8
51
@wenlong_huang
Wenlong Huang
1 year
Thanks @_akhaliq for sharing! The full thread can be found here:
@_akhaliq
AK
1 year
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models paper page: Large language models (LLMs) are shown to possess a wealth of actionable knowledge that can be extracted for robot manipulation in the form of reasoning and
1
22
85
2
7
43
@wenlong_huang
Wenlong Huang
2 years
If we can debug our robots by reasoning, can we use LLMs to emulate such a process too? Following up on language planner () & SayCan (), we study how closed-loop feedback enables LLM to correct policy failures in long-horizon tasks🧵👇
@hausman_k
Karol Hausman
2 years
Have you ever “heard” yourself talk in your head? Turns out it's a useful tool for robots too! Introducing Inner Monologue: feeding continual textual feedback into LLMs allows robots to articulate a grounded “thought process” to execute long, abstract instructions 🧵👇
24
167
895
0
5
41
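To make the closed-loop feedback idea above concrete, here is a minimal sketch of the control loop in pseudocode-style Python. The callables `llm`, `policy`, `detect_success`, and `describe_scene` are hypothetical stand-ins for the actual models and robot stack; this is an illustration of the idea, not the Inner Monologue implementation.

```python
# A minimal sketch of closed-loop language feedback: textual feedback from
# success detectors and scene descriptions is appended to the LLM prompt so
# the planner can replan after failures. All helpers are assumed callables.
def run_episode(instruction, llm, policy, detect_success, describe_scene, max_steps=20):
    prompt = f"Task: {instruction}\n"
    for _ in range(max_steps):
        prompt += f"Scene: {describe_scene()}\n"
        action = llm(prompt + "Robot action:")        # e.g., "pick up the sponge"
        policy(action)                                # execute the low-level skill
        feedback = "Success." if detect_success(action) else "Failed, retrying."
        prompt += f"Robot action: {action}\nFeedback: {feedback}\n"
        if "done" in action.lower():                  # LLM signals task completion
            break
```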
@wenlong_huang
Wenlong Huang
8 months
Been thinking about this for a while:
- CLIP finds language-conditioned features but they're also bottlenecked by language
- DINO attends to rich visual details but lacks semantics
- An image is worth a thousand words. We need MLLMs w/ better vision.
Really great work - congrats!
@sainingxie
Saining Xie
8 months
Multimodal LLMs have been shown to err in complex, OOD, and edge-case scenarios. Yet, we have identified a systematic method for pinpointing visual errors in these models even when they are posed with *very basic* questions, using just common images from ImageNet and LAION. 🧵
Tweet media (four images)
6
70
368
1
2
42
@wenlong_huang
Wenlong Huang
1 year
Very excited to see Code as Policies got Outstanding Paper Award in Robot Learning #ICRA2023 !! Big congrats to Jacky!!
@jackyliang42
Jacky Liang
1 year
Super happy to share Code as Policies received the Outstanding Paper Award in Robot Learning #ICRA2023 @ieee_ras_icra !! Big thank yous to my collaborators @wenlong_huang @xf1280 @sippeyxp @hausman_k @brian_ichter @peteflorence @andyzengtweets y'all rock 🎉🎉
Tweet media one
11
16
178
2
3
38
@wenlong_huang
Wenlong Huang
1 year
Glad to see Inner Monologue () is running behind the scenes for every Bing query 😄. An LLM agent that not only engages in a dialogue between the user and the Internet, but also, importantly, itself!
@StudentInfosec
tuneworm (Joaquin Castellano)
1 year
After interpreting the message, Bing runs an internal command called #inner_monologue . In here it decides on the language for the message, and how to generate its response — whether it’s necessary to perform a web search, or if it should provide product ads
Tweet media one
1
4
59
0
3
34
@wenlong_huang
Wenlong Huang
2 years
Excited to see Inner Monologue is covered by @twominutepapers ! Using language as a common interface, we show how humans and different robot modules can talk to each other, enabling closed-loop planning. This was done during my internship with an amazing team at Google Robotics!
@twominutepapers
Two Minute Papers
2 years
New Video - Google’s New Robot: Your Personal Assistant! 🤖
1
2
16
1
4
25
@wenlong_huang
Wenlong Huang
2 years
Beyond task planning, can LLMs generate robot policy code that exhibits spatial-geometric reasoning ("draw 5cm hexagon around apple"), and leverages code logic ("go in a 1.5m square until you see a coke"), all given a language instruction and without any additional training? 🧵👇
@jackyliang42
Jacky Liang
2 years
How can robots perform a wide variety of novel tasks from natural language? Excited to present Code as Policies - using language models to directly write robot policy code from language instructions. See paper, colabs, blog, and demos at long 🧵👇
17
148
666
1
3
25
@wenlong_huang
Wenlong Huang
4 months
Incredible to see how capable VR-controlled robot hands can be. While there is a lot of debate on grippers vs hands, why not think of hands as just grippers that offer more redundancy and stability? Congrats on the great work!
@ToruO_O
Toru
4 months
Imitation learning works™ – but you need good data 🥹 How to get high-quality visuotactile demos from a bimanual robot with multifingered hands, and learn smooth policies? Check our new work “Learning Visuotactile Skills with Two Multifingered Hands”! 🙌
7
75
280
1
2
23
@wenlong_huang
Wenlong Huang
3 years
Excited to share the fun project I've been working on! We explore actionable knowledge contained in GPT-3/Codex. A super early but promising step towards realizing intelligent robots that perform complex human activities! Project Website:
@pathak2206
Deepak Pathak
3 years
LLMs like GPT-3 and Codex contain rich world knowledge. In this fun study, we ask if GPT like models can plan actions for embodied agents. Turns out, with apt sanity checks, even vanilla LLMs without any finetuning can generate good high-level plans given a low-level controller.
10
164
992
0
0
22
@wenlong_huang
Wenlong Huang
2 years
Very excited to see a further step in using LLMs for long-horizon planning with real-world robots! Our previous work in this direction:
@hausman_k
Karol Hausman
2 years
Super excited to introduce SayCan (): 1st publication of a large effort we've been working on for 1+ years Robots ground large language models in reality by acting as their eyes and hands while LLMs help robots execute long, abstract language instructions
19
287
1K
1
1
21
@wenlong_huang
Wenlong Huang
1 year
Data is key for generalization, but robot data is scarce and expensive. Instead of training policies on labeled data, VoxPoser uses LLM+VLM to compose 3D value maps using generated code. Then 6-DoF actions are synthesized by motion planners, all w/o any training or primitives.
Tweet media one
1
4
21
@wenlong_huang
Wenlong Huang
1 year
Given free-form instructions + RGB-D obs, LLM orchestrates perception calls to VLM and array operations to assign continuous values to voxel map, showing *where to act* and *how to act*. It also parametrizes rotation, velocity, and gripper actions for a complete SE(3) trajectory!
1
2
20
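As a rough illustration of the voxel value-map composition described above, here is a short NumPy sketch. The `detect` helper is a hypothetical stand-in for a VLM-backed perception call, and the voxel coordinates are made up; in the actual system the LLM writes this kind of code itself, so this is only a flavor of the mechanism, not the released VoxPoser code.

```python
import numpy as np

def detect(name):
    # Placeholder for a VLM-backed detector returning a voxel index (assumed values).
    return np.array({"drawer handle": [20, 50, 30], "vase": [60, 40, 25]}[name])

affordance = np.zeros((100, 100, 100))   # value map: where to act
avoidance = np.zeros_like(affordance)    # value map: what to stay away from

handle_vox = detect("drawer handle")
vase_vox = detect("vase")

affordance[tuple(handle_vox)] = 1.0      # high value at the drawer handle

x, y, z = np.indices(affordance.shape)
dist = np.linalg.norm(np.stack([x, y, z], axis=-1) - vase_vox, axis=-1)
avoidance += np.exp(-dist / 5.0)         # penalize voxels near the vase

# A motion planner then synthesizes 6-DoF waypoints that reach high-affordance
# voxels while keeping the avoidance cost low.
```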
@wenlong_huang
Wenlong Huang
2 years
Super exciting project led by @DannyDriess - a 562B embodied multimodal language model, trained to be grounded! Important signal that language is a universal generalization interface, across text, vision, and robot planning 📚🌁🤖 Check out the deep dive by @DannyDriess 🧵👇
@DannyDriess
Danny Driess
2 years
What happens when we train the largest vision-language model and add in robot experiences? The result is PaLM-E 🌴🤖, a 562-billion parameter, general-purpose, embodied visual-language generalist - across robotics, vision, and language. Website:
32
522
2K
0
0
18
@wenlong_huang
Wenlong Huang
1 year
Extremely impressive work pushing the boundary of what robot hands can achieve for human-like long-horizon tasks, especially cool when seeing it in the real world!
@chenwang_j
Chen Wang
1 year
How to chain multiple dexterous skills to tackle complex long-horizon manipulation tasks? Imagine retrieving a LEGO block from a pile, rotating it in-hand, and inserting it at the desired location to build a structure. Introducing our new work - Sequential Dexterity 🧵👇
26
91
470
1
3
18
@wenlong_huang
Wenlong Huang
2 years
Prior work SayCan () grounds LLM for robots using affordances. But instead of speaking with full vocab, LLM only ranks across pre-set skills. Imagine scratching your head through 700+ choices or O(billions) of natural language choices. How can we do better?
@hausman_k
Karol Hausman
2 years
Introducing RT-1, a robotic model that can execute over 700 instructions in the real world at 97% success rate! Generalizes to new tasks✅ Robust to new environments and objects✅ Fast inference for real time control✅ Can absorb multi-robot data✅ Powers SayCan✅ 🧵👇
62
549
2K
1
1
16
@wenlong_huang
Wenlong Huang
4 months
Simulation is a scalable data source that fuels progress in contact-rich manipulation, but sim2real is hard due to contact-modeling, perception, etc - great work by @YunfanJiang shows how this can be transformed seamlessly into an imitation learning problem to tackle them all!
@YunfanJiang
Yunfan Jiang
4 months
Does your sim2real robot falter at critical moments 🤯? Want to help but unsure how, all you can do is reward tuning in sim 😮‍💨? Introduce 𝐓𝐑𝐀𝐍𝐒𝐈𝐂 for manipulation sim2real. Robots learned in sim can accomplish complex tasks in real, such as furniture assembly. 🤿🧵
16
44
187
1
2
15
@wenlong_huang
Wenlong Huang
7 months
Reducing the need for in-the-wild robots for data collection is critical for breaking free of the chicken & egg dilemma *before* we can actually deploy general-purpose robots. Amazing work!
@chichengcc
Cheng Chi
7 months
Can we collect robot data without any robots? Introducing Universal Manipulation Interface (UMI) An open-source $400 system from @Stanford designed to democratize robot data collection 0 teleop -> autonomously wash dishes (precise), toss (dynamic), and fold clothes (bimanual)
44
353
2K
1
2
13
@wenlong_huang
Wenlong Huang
6 months
One thing I particularly like about using foundation models for robotics is the *in-the-wild generalization* they provide. Very exciting to see what VLMs can offer here!
@fangchenliu_
Fangchen Liu
6 months
Can we leverage VLMs for robot manipulation in the open world? Check out our new work MOKA, a simple and effective visual prompting method!
12
43
206
0
0
14
@wenlong_huang
Wenlong Huang
1 year
LLMs show emergent abilities at scale – same applies to VoxPoser, but on physical behaviors! It can conduct physics experiments, exhibit behavioral commonsense, listen to your fine-grained corrections, come up with multi-step visual programs, and more.
Tweet media one
2
2
14
@wenlong_huang
Wenlong Huang
1 year
We verified VoxPoser in everyday manipulation tasks in the wild, including articulated and deformable object manipulation. All the results here are synthesized with zero-shot execution.
2
3
13
@wenlong_huang
Wenlong Huang
6 months
An incredible project led by @chenwang_j that makes collecting robotic data in the wild as seamless as a breeze - not only for simple tasks but also for those everyday tasks requiring human-level dexterity, coordination, and precision!
@chenwang_j
Chen Wang
6 months
Can we use wearable devices to collect robot data without actual robots? Yes! With a pair of gloves🧤! Introducing DexCap, a portable hand motion capture system that collects 3D data (point cloud + finger motion) for training robots with dexterous hands Everything open-sourced
22
136
620
1
0
13
@wenlong_huang
Wenlong Huang
1 year
Just toss your objects too! VoxPoser is robust to disturbances because it replans actions in *real-time* with visual feedback. The 3D value maps are always updated with latest observations, allowing robot to recover from unexpected errors.
1
2
12
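A minimal sketch of the real-time replanning loop described above, assuming hypothetical callables (`observe`, `compose_value_maps`, `plan`, `execute_waypoint`, `done`); the released VoxPoser code differs in detail, but the receding-horizon structure is the point.

```python
import time

# Value maps are recomposed from the latest observation every control step,
# so the plan adapts when objects are moved or the robot is disturbed.
def replanning_loop(observe, compose_value_maps, plan, execute_waypoint, done, hz=5.0):
    while not done():
        obs = observe()                        # latest RGB-D observation
        value_maps = compose_value_maps(obs)   # re-ground affordances & constraints
        trajectory = plan(value_maps)          # motion planner over the voxel maps
        execute_waypoint(trajectory[0])        # execute only the first waypoint
        time.sleep(1.0 / hz)
```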
@wenlong_huang
Wenlong Huang
7 months
For robots to be actually useful for humans, we have to stress test them *in the wild* on real scenarios. Awesome work - congrats!
@Haoyu_Xiong_
Haoyu Xiong
7 months
Introducing Open-World Mobile Manipulation 🦾🌍 – A full-stack approach for operating articulated objects in open-ended unstructured environments: Unlocking doors with lever handles/ round knobs/ spring-loaded hinges 🔓🚪 Opening cabinets, drawers, and refrigerators 🗄️ 👇
30
105
775
0
1
10
@wenlong_huang
Wenlong Huang
3 months
Very interesting to see how easily humanoids can be teleoperated in the wild to perform many manipulation tasks. Bringing robots (neural networks) closer to human embodiment (abundant data sources) is one clear path forward for generalizable robot learning. Congrats!!
@zipengfu
Zipeng Fu
3 months
Introduce HumanPlus - Shadowing part Humanoids are born for using human data. We build a real-time shadowing system using a single RGB camera and a whole-body policy for cloning human motion. Examples: - boxing🥊 - playing the piano🎹/ping pong - tossing - typing Open-sourced!
17
166
770
0
2
11
@wenlong_huang
Wenlong Huang
2 years
PS: I will be joining the same team @GoogleAI as student researcher starting next week, co-hosted by @brian_ichter and @hausman_k . Thrilled for what’s lying ahead!
0
0
11
@wenlong_huang
Wenlong Huang
8 months
Extremely impressive mobile manipulation results! Huge congrats to @zipengfu @tonyzzhao
@tonyzzhao
Tony Z. Zhao
8 months
Introducing 𝐌𝐨𝐛𝐢𝐥𝐞 𝐀𝐋𝐎𝐇𝐀🏄 -- Hardware! A low-cost, open-source, mobile manipulator. One of the most high-effort projects in my past 5yrs! Not possible without co-lead @zipengfu and @chelseabfinn . At the end, what's better than cooking yourself a meal with the 🤖🧑‍🍳
236
1K
5K
1
3
11
@wenlong_huang
Wenlong Huang
2 months
Love the take of using free-form language as a conditioning variable to organize (not just to specify) visual understanding tasks, and that the key is to advance visual capabilities! This is a philosophy that I believe is useful for robotics too
@sainingxie
Saining Xie
2 months
Introducing Cambrian-1, a fully open project from our group at NYU. The world doesn't need another MLLM to rival GPT-4V. Cambrian is unique as a vision-centric exploration & here's why I think it's time to shift focus from scaling LLMs to enhancing visual representations.🧵[1/n]
Tweet media one
17
257
1K
0
0
11
@wenlong_huang
Wenlong Huang
2 years
This project & other recent projects I contributed to echo the "Bitter Lesson 2.0" by @hausman_k (). The goal is to offload the robot generalization burden to foundation models as much as possible!
@hausman_k
Karol Hausman
2 years
Bitter lesson by @RichardSSutton is one of the most insightful essays on AI development of the last decades. Recently, given our progress in robotics, I’ve been trying to predict what the next bitter lesson will be in robotics and how can we prevent it today. Let me explain 🧵
Tweet media one
11
46
368
1
1
9
@wenlong_huang
Wenlong Huang
2 years
Using affordance as a GM, the LLM can generate plans for "separating vowels from other letters" w/o being prompted with a list of present objects. We can also include safety and preference GMs, which allow robots to pack picnic boxes with snacks you like w/o accidentally touching a knife.
1
1
9
@wenlong_huang
Wenlong Huang
2 years
To let LLM speak “robot language” that scales & grounds, we look at its most basic functioning unit - tokens. Our formulation decodes likely tokens under both LLM & Grounded Models. GMs reward tokens that respect embodiment, while LLM provides world knowledge & coherent behaviors
Tweet media one
1
1
8
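A toy illustration of the token-level idea above: pick the next token that is likely under the LLM *and* scores well under grounded models such as affordance functions. This is a deliberate simplification for illustration, not the paper's exact objective, and the example scores are made up.

```python
import numpy as np

def grounded_decode_step(lm_logprobs, gm_logscores, beta=1.0):
    """lm_logprobs: dict token -> log p_LM(token | prefix)
    gm_logscores: dict token -> log grounding score (e.g., affordance)."""
    combined = {tok: lp + beta * gm_logscores.get(tok, -np.inf)
                for tok, lp in lm_logprobs.items()}
    return max(combined, key=combined.get)

# Example: the LLM prefers "strawberry", but the affordance model knows only
# a sponge is reachable, so the decoded token is "sponge".
next_tok = grounded_decode_step(
    {"strawberry": -0.5, "sponge": -1.2},
    {"sponge": -0.1, "strawberry": -5.0},
)
```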
@wenlong_huang
Wenlong Huang
2 years
LLM can also choose when/where it needs grounding! It can generate an open bracket when it's unsure, which loops in an object detector GM to the rescue. We show this through a grounded chain-of-thought that helps a kitchen robot handle ambiguous instructions.
1
2
6
@wenlong_huang
Wenlong Huang
2 years
Yet this is only one side of the picture. As LLMs increasingly speak better "robot language", how can we develop low-level policies that *fully* understand it and translate it into *physical actions*? Many challenges remain, but I'm optimistic about what will come next!
1
1
6
@wenlong_huang
Wenlong Huang
8 days
By sequencing multiple ReKeps, our framework organically integrates high-level task planning with dense low-level actions as a unified continuous optimization problem. With tracked keypoints, this enables rapid backtracking & replanning behaviors both within/across stages. 4/N
1
0
7
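To make the "unified continuous optimization" framing concrete, here is a minimal sketch of solving one stage's ReKep constraints with SciPy. The function names and signature are assumptions for illustration, not the released solver; it only shows the shape of the problem (sum of keypoint-based costs over an end-effector target).

```python
import numpy as np
from scipy.optimize import minimize

def solve_stage(constraints, keypoints, ee_init):
    """constraints: list of functions mapping (ee_pos, keypoints) -> scalar cost."""
    def objective(ee_pos):
        # total cost of all constraint functions for this stage
        return sum(c(ee_pos, keypoints) for c in constraints)
    result = minimize(objective, ee_init, method="Nelder-Mead")
    # The solution is passed to a low-level controller; as keypoints move or a
    # stage fails, the same problem is re-solved, enabling backtracking/replanning.
    return result.x
```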
@wenlong_huang
Wenlong Huang
8 days
ReKeps are Python functions mapping kp to costs w/ NumPy operations, specifying relations b/w robot, obj, and obj parts. While each kp has only (x,y,z), multiple kp can specify SO(3) rotations, vectors, surfaces, volumes to capture rich geometric structures in manipulation. 3/N
2
2
10
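Here is a minimal sketch of what such NumPy constraint functions could look like, using the pouring example from the thread. The signatures and keypoint indices are illustrative assumptions, not the released code; lower cost means the keypoint relation is closer to being satisfied.

```python
import numpy as np

def keep_spout_level(ee_pos, keypoints):
    """Keep the spout keypoint (idx 1) no higher than the handle keypoint
    (idx 0) while transporting, to avoid spilling."""
    handle, spout = keypoints[0], keypoints[1]
    return spout[2] - handle[2]           # penalize spout rising above handle

def align_spout_over_cup(ee_pos, keypoints):
    """Place the spout keypoint (idx 1) directly above the cup opening
    keypoint (idx 2)."""
    spout, cup = keypoints[1], keypoints[2]
    return float(np.linalg.norm(spout[:2] - cup[:2]))  # horizontal offset
```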
@wenlong_huang
Wenlong Huang
4 months
And it's not just about discrete symbols (e.g., language), given all the advances in multimodality and video modeling. (shameless plug) Our work VoxPoser from last year was an exploration in this direction: N/N
0
0
3
@wenlong_huang
Wenlong Huang
8 days
ReKep can also be fully automated w/ foundation models for in-the-wild task execution. We use large vision models (SAM+DINOv2) to identify keypoints, overlay on image, and prompt VLM (GPT-4o) to write a seq of ReKep constraints based on the task instruction. 5/N
Tweet media one
1
3
7
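A minimal sketch of that automated pipeline, with hypothetical helpers (`propose_keypoints`, `overlay`, `query_vlm`, `track`, `solve`); only the high-level flow is taken from the description above.

```python
# Large vision models propose keypoints, a VLM writes ReKep constraints from
# the annotated image, and an optimizer executes them stage by stage.
def run_rekep(rgbd, instruction, propose_keypoints, overlay, query_vlm, track, solve):
    keypoints = propose_keypoints(rgbd)          # e.g., SAM + DINOv2 features
    annotated = overlay(rgbd, keypoints)         # number keypoints on the image
    stages = query_vlm(annotated, instruction)   # e.g., GPT-4o writes a sequence
                                                 # of ReKep constraint functions
    for constraints in stages:
        solve(constraints, track())              # optimize & execute per stage
```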
@wenlong_huang
Wenlong Huang
4 months
While much of the field is pushing the limit of what robots can achieve autonomously, tackling the generalization brought by web-scale data remains underrated if we ever want to deploy robots in the wild. Cool work!
@mangahomanga
Homanga Bharadhwaj
4 months
Track2Act: Our latest on training goal-conditioned policies for diverse manipulation in the real-world. We train a model for embodiment-agnostic point track prediction from web videos combined with embodiment-specific residual policy learning 1/n
2
29
123
1
2
4
@wenlong_huang
Wenlong Huang
8 days
Relational Keypoint Constraints (ReKep) represent tasks as a seq of keypoint relations. E.g., in a pouring task: pull together gripper kp & handle kp -> keep handle & spout kp at the same height (avoid spillage) -> align spout & cup kp -> handle & spout kp form a tilting angle to pour. 2/N
Tweet media one
1
1
5
@wenlong_huang
Wenlong Huang
2 years
@pabbeel @Stanford @pathak2206 @IMordatch @StanfordAILab Thank you Pieter!! All of this would not have been possible without your support!
0
0
3
@wenlong_huang
Wenlong Huang
1 year
Thanks Anthony! This is a really great and super insightful perspective on the generalization capability of VoxPoser!
@anthonysimeono_
Anthony Simeonov
1 year
Love this and its connection to motion planning. Planning is so powerful in its generalization across tasks, but linking perception to "plannable" representations (and having *this* link generalize) is still hard. Here's one very compelling way to do it. Great job @wenlong_huang
0
1
11
0
1
3
@wenlong_huang
Wenlong Huang
2 years
Excited to see the Modular RL dataset we developed is being used! What’s even cooler is that modularity is again validated to be a key building block for a generalist agent with different embodiments (and it scales to so many domains!) See our work here:
@GoogleDeepMind
Google DeepMind
2 years
Gato🐈 - a scalable generalist agent that uses a single transformer with exactly the same weights to play Atari, follow text instructions, caption images, chat with people, control a real robot arm, and more: Paper: 1/
93
1K
5K
0
0
3
@wenlong_huang
Wenlong Huang
2 years
@AlperCanberk1 Good question! GD naturally allows input from any other modalities besides only text, so whenever it's difficult to succinctly describe everything needed as text, GD is likely a more natural way at inference time to ground LLM
0
0
2
@wenlong_huang
Wenlong Huang
8 days
ReKep is implemented on top of common packages like SciPy, and the code has also been open-sourced! It runs on BEHAVIOR (), a large-scale benchmark with diverse scenes and objects, so you can easily try ReKep without setting up a real robot. 7/N
@drfeifei
Fei-Fei Li
6 months
One year ago, we first introduced BEHAVIOR-1K, which we hope will be an important step towards human-centered robotics. After our year-long beta, we’re thrilled to announce its full release, which our team just presented at NVIDIA #GTC2024 . 1/n
7
142
703
1
1
6
@wenlong_huang
Wenlong Huang
2 years
@drfeifei @Stanford @pathak2206 @IMordatch @pabbeel @StanfordAILab @StanfordSVL Thank you Fei-Fei! Thrilled about meeting everyone at @StanfordSVL and working with you!
0
0
2
@wenlong_huang
Wenlong Huang
10 months
@haosu_twitr Huge congrats!!!
1
0
2
@wenlong_huang
Wenlong Huang
2 years
@micheli_vincent @hausman_k Cool work! We didn't know about this but will contextualize it in the new version 😀
0
0
2
@wenlong_huang
Wenlong Huang
4 months
For example, in what abstraction can these modules ground to low-level behaviors? Clearly the skill-level language abstraction (e.g., pick X & place on Y) is insufficient by definition, but are there other abstractions in which they can? 2/N
1
0
2
@wenlong_huang
Wenlong Huang
8 days
We test ReKep on two setups – in-the-wild and bimanual. It can perform diverse 6-12 DoF tasks w/ a perception-action loop at 10 Hz. It can also fold diff. clothes with diff. (human-like) strategies. And the entire pipeline does not require task-specific training or env models. 6/N
1
0
6
@wenlong_huang
Wenlong Huang
1 year
@adcock_brett Thank you Brett!
0
0
0
@wenlong_huang
Wenlong Huang
1 year
@sippeyxp This makes L2R amenable to optimizing actions through the simulator (MuJoCo) as a powerful model. However, the requirement is that real2sim system ID is needed for real-world tasks. It would be an exciting future direction to think about combining the strengths of both works! (2/2)
0
0
1
@wenlong_huang
Wenlong Huang
2 years
@IMordatch @Stanford @pathak2206 @pabbeel @StanfordAILab Thanks Igor! Hope we can collaborate more in the future!
0
0
1
@wenlong_huang
Wenlong Huang
2 years
@shaneguML @Stanford @pathak2206 @IMordatch @pabbeel @StanfordAILab Thanks Shane! Looking forward to seeing you too and hope we can collaborate at some point!
0
0
1
@wenlong_huang
Wenlong Huang
2 years
@pathak2206 @Stanford @IMordatch @pabbeel @StanfordAILab Thanks Deepak!! There are countless things I've learned from your close mentorship, and I really appreciate everything!
0
0
1
@wenlong_huang
Wenlong Huang
8 days
@YunfanJiang Thanks Yunfan!!
0
0
1
@wenlong_huang
Wenlong Huang
5 months
@tomssilver Congrats Tom!!!
0
0
1
@wenlong_huang
Wenlong Huang
8 days
@nishanthkumar23 Thanks Nishanth!!
0
0
1
@wenlong_huang
Wenlong Huang
3 years
@peterjansen_ai @ykilcher This is cool work! We didn't know about this, but will add it and contextualize the contributions. On a quick skim, some differences are 1) we look at existing knowledge in LMs w/o any fine-tuning, 2) tasks are not limited to pre-defined templates/categories (e.g. pick & place)
1
0
1
@wenlong_huang
Wenlong Huang
2 years
0
0
1
@wenlong_huang
Wenlong Huang
8 days
@Haoyu_Xiong_ Thanks Haoyu!!
0
0
1
@wenlong_huang
Wenlong Huang
1 year
@sippeyxp Language to Reward is cool work! Both works similarly extract knowledge from LLMs for real-time behavior synthesis. The key difference is that VoxPoser grounds LLMs in 3D obs space, while L2R defines rewards over known robot/object models. (1/2)
0
0
1
@wenlong_huang
Wenlong Huang
2 years
@DorsaSadigh @Stanford @pathak2206 @IMordatch @pabbeel @StanfordAILab Thanks Dorsa! Looking forward to working with you soon!
0
0
1
@wenlong_huang
Wenlong Huang
2 years
@MishaLaskin Thanks Misha!
0
0
1
@wenlong_huang
Wenlong Huang
2 years
Link to the previous thread 🧵 Website: Paper:
@wenlong_huang
Wenlong Huang
2 years
If we can debug our robots by reasoning, can we use LLMs to emulate such a process too? Following up on language planner () & SayCan (), we study how closed-loop feedback enables LLM to correct policy failures in long-horizon tasks🧵👇
0
5
41
0
0
1
@wenlong_huang
Wenlong Huang
3 years
@grad_ascent @ak92501 A model trained on datasets built specifically for these tasks can certainly do a lot better! But the difficulty is that it's hard to acquire a large dataset covering so many tasks humans do in daily lives, and we're interested in whether LLMs already have this knowledge from pretraining
0
0
1
@wenlong_huang
Wenlong Huang
2 years
@shaneguML @GoogleAI @OpenAI @johnschulman2 Congrats Shane! Very exciting time and looking forward to your work there!
0
0
1