Wenlong Huang

@wenlong_huang

2,907
Followers
938
Following
22
Media
381
Statuses

PhD Student @StanfordSVL @StanfordAILab. Previously @Berkeley_AI @GoogleDeepMind. Robotics, Foundation Models.

Stanford, CA
Joined May 2019
Pinned Tweet
@wenlong_huang
Wenlong Huang
8 days
What structural task representation enables multi-stage, in-the-wild, bimanual, reactive manipulation? Introducing ReKep: LVM to label keypoints & VLM to write keypoint-based constraints, solve w/ optimization for diverse tasks, w/o task-specific training or env models. 🧵👇
12
87
448
@wenlong_huang
Wenlong Huang
1 year
How to harness foundation models for *generalization in the wild* in robot manipulation? Introducing VoxPoser: use LLM+VLM to label affordances and constraints directly in 3D perceptual space for zero-shot robot manipulation in the real world! 🌐 🧵👇
10
143
584
@wenlong_huang
Wenlong Huang
2 years
Large language models gathered tons of world knowledge by speaking human language. But can they ever speak “robot language”? Introducing “Grounded Decoding”: a scalable way to decode *grounded text* from LLM for robots. Website: 🧵👇
6
85
455
@wenlong_huang
Wenlong Huang
2 years
Thrilled to announce that I will join @Stanford for my PhD! Extremely grateful to @pathak2206 @IMordatch @pabbeel for years of amazing mentorship and Zhuowen Tu for introducing me to AI research. Looking forward to tackling interesting problems in robotics and AI @StanfordAILab !
18
7
299
@wenlong_huang
Wenlong Huang
4 months
Very well-written thread about LLM in robotics! My 2 cents is: robotics requires a full-stack approach - whether it's symbolic or LLM or hybrid planners, one has to think about the abstractions they operate in, especially pertaining to closely-tied perception-action loops. 1/N
@chris_j_paxton
Chris Paxton
4 months
One of the most interesting questions to me right now is: can LLMs plan, why/why not, and to what extent do we care about this, especially as it pertains to robotics?
20
33
261
1
17
84
@wenlong_huang
Wenlong Huang
3 years
Really enjoyed the interview with Yannic -- he had many interesting and insightful questions! I've also been a big fan of his channel! Project Website: w/ @pabbeel @pathak2206 @IMordatch
@ykilcher
Yannic Kilcher 🇸🇨
3 years
GPT-3 "knows" so much about the world, but how can we get that knowledge into a structured and usable form? Today's Video: Language Models as Zero-Shot Planners w/ first author Wenlong Huang ( @wenlong_huang )! Super interesting & many potential use cases 💪
Tweet media one
3
22
109
1
8
51
@wenlong_huang
Wenlong Huang
1 year
Thanks @_akhaliq for sharing! The full thread can be found here:
@_akhaliq
AK
1 year
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models paper page: Large language models (LLMs) are shown to possess a wealth of actionable knowledge that can be extracted for robot manipulation in the form of reasoning and
1
22
85
2
7
43
@wenlong_huang
Wenlong Huang
2 years
If we can debug our robots by reasoning, can we use LLMs to emulate such a process too? Following up on language planner () & SayCan (), we study how closed-loop feedback enables LLM to correct policy failures in long-horizon tasks🧵👇
@hausman_k
Karol Hausman
2 years
Have you ever “heard” yourself talk in your head? Turns out it's a useful tool for robots too! Introducing Inner Monologue: feeding continual textual feedback into LLMs allows robots to articulate a grounded “thought process” to execute long, abstract instructions 🧵👇
24
167
895
0
5
41
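To make the closed-loop feedback idea above concrete, here is a minimal sketch of the control loop in pseudocode-style Python. The callables `llm`, `policy`, `detect_success`, and `describe_scene` are hypothetical stand-ins for the actual models and robot stack; this is an illustration of the idea, not the Inner Monologue implementation.

```python
# A minimal sketch of closed-loop language feedback: textual feedback from
# success detectors and scene descriptions is appended to the LLM prompt so
# the planner can replan after failures. All helpers are assumed callables.
def run_episode(instruction, llm, policy, detect_success, describe_scene, max_steps=20):
    prompt = f"Task: {instruction}\n"
    for _ in range(max_steps):
        prompt += f"Scene: {describe_scene()}\n"
        action = llm(prompt + "Robot action:")        # e.g., "pick up the sponge"
        policy(action)                                # execute the low-level skill
        feedback = "Success." if detect_success(action) else "Failed, retrying."
        prompt += f"Robot action: {action}\nFeedback: {feedback}\n"
        if "done" in action.lower():                  # LLM signals task completion
            break
```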
@wenlong_huang
Wenlong Huang
8 months
Been thinking about this for a while:
- CLIP finds language-conditioned features but they're also bottlenecked by language
- DINO attends to rich visual details but lacks semantics
- An image is worth a thousand words. We need MLLMs w/ better vision.
Really great work - congrats!
@sainingxie
Saining Xie
8 months
Multimodal LLMs have been shown to err in complex, OOD, and edge-case scenarios. Yet, we have identified a systematic method for pinpointing visual errors in these models even when they are posed with *very basic* questions, using just common images from ImageNet and LAION. 🧵
Tweet media (four images)
6
70
368
1
2
42
@wenlong_huang
Wenlong Huang
1 year
Very excited to see Code as Policies got Outstanding Paper Award in Robot Learning #ICRA2023 !! Big congrats to Jacky!!
@jackyliang42
Jacky Liang
1 year
Super happy to share Code as Policies received the Outstanding Paper Award in Robot Learning #ICRA2023 @ieee_ras_icra !! Big thank yous to my collaborators @wenlong_huang @xf1280 @sippeyxp @hausman_k @brian_ichter @peteflorence @andyzengtweets y'all rock 🎉🎉
Tweet media one
11
16
178
2
3
38
@wenlong_huang
Wenlong Huang
1 year
Glad to see Inner Monologue () is running behind the scenes for every Bing query 😄. An LLM agent that not only engages in a dialogue between the user and the Internet, but also, importantly, itself!
@StudentInfosec
tuneworm (Joaquin Castellano)
1 year
After interpreting the message, Bing runs an internal command called #inner_monologue . In here it decides on the language for the message, and how to generate its response — whether it’s necessary to perform a web search, or if it should provide product ads
Tweet media one
1
4
59
0
3
34
@wenlong_huang
Wenlong Huang
2 years
Excited to see Inner Monologue is covered by @twominutepapers ! Using language as a common interface, we show how humans and different robot modules can talk to each other, enabling closed-loop planning. This was done during my internship with an amazing team at Google Robotics!
@twominutepapers
Two Minute Papers
2 years
New Video - Google’s New Robot: Your Personal Assistant! 🤖
1
2
16
1
4
25
@wenlong_huang
Wenlong Huang
2 years
Beyond task planning, can LLMs generate robot policy code that exhibits spatial-geometric reasoning ("draw 5cm hexagon around apple"), and leverages code logic ("go in a 1.5m square until you see a coke"), all given a language instruction and without any additional training? 🧵👇
@jackyliang42
Jacky Liang
2 years
How can robots perform a wide variety of novel tasks from natural language? Excited to present Code as Policies - using language models to directly write robot policy code from language instructions. See paper, colabs, blog, and demos at long 🧵👇
17
148
666
1
3
25
@wenlong_huang
Wenlong Huang
4 months
Incredible to see how capable VR-controlled robot hands can be. While there is a lot of debate on grippers vs hands, why not think of hands as just grippers that offer more redundancy and stability? Congrats on the great work!
@ToruO_O
Toru
4 months
Imitation learning works™ – but you need good data 🥹 How to get high-quality visuotactile demos from a bimanual robot with multifingered hands, and learn smooth policies? Check our new work “Learning Visuotactile Skills with Two Multifingered Hands”! 🙌
7
75
280
1
2
23
@wenlong_huang
Wenlong Huang
3 years
Excited to share the fun project I've been working on! We explore actionable knowledge contained in GPT-3/Codex. A super early but promising step towards realizing intelligent robots that perform complex human activities! Project Website:
@pathak2206
Deepak Pathak
3 years
LLMs like GPT-3 and Codex contain rich world knowledge. In this fun study, we ask if GPT like models can plan actions for embodied agents. Turns out, with apt sanity checks, even vanilla LLMs without any finetuning can generate good high-level plans given a low-level controller.
10
164
992
0
0
22
@wenlong_huang
Wenlong Huang
2 years
Very excited to see a further step in using LLMs for long-horizon planning with real-world robots! Our previous work in this direction:
@hausman_k
Karol Hausman
2 years
Super excited to introduce SayCan (): 1st publication of a large effort we've been working on for 1+ years Robots ground large language models in reality by acting as their eyes and hands while LLMs help robots execute long, abstract language instructions
19
287
1K
1
1
21
@wenlong_huang
Wenlong Huang
1 year
Data is key for generalization, but robot data is scarce and expensive. Instead of training policies on labeled data, VoxPoser uses LLM+VLM to compose 3D value maps using generated code. Then 6-DoF actions are synthesized by motion planners, all w/o any training or primitives.
Tweet media one
1
4
21
@wenlong_huang
Wenlong Huang
1 year
Given free-form instructions + RGB-D obs, LLM orchestrates perception calls to VLM and array operations to assign continuous values to voxel map, showing *where to act* and *how to act*. It also parametrizes rotation, velocity, and gripper actions for a complete SE(3) trajectory!
1
2
20
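As a rough illustration of the voxel value-map composition described above, here is a short NumPy sketch. The `detect` helper is a hypothetical stand-in for a VLM-backed perception call, and the voxel coordinates are made up; in the actual system the LLM writes this kind of code itself, so this is only a flavor of the mechanism, not the released VoxPoser code.

```python
import numpy as np

def detect(name):
    # Placeholder for a VLM-backed detector returning a voxel index (assumed values).
    return np.array({"drawer handle": [20, 50, 30], "vase": [60, 40, 25]}[name])

affordance = np.zeros((100, 100, 100))   # value map: where to act
avoidance = np.zeros_like(affordance)    # value map: what to stay away from

handle_vox = detect("drawer handle")
vase_vox = detect("vase")

affordance[tuple(handle_vox)] = 1.0      # high value at the drawer handle

x, y, z = np.indices(affordance.shape)
dist = np.linalg.norm(np.stack([x, y, z], axis=-1) - vase_vox, axis=-1)
avoidance += np.exp(-dist / 5.0)         # penalize voxels near the vase

# A motion planner then synthesizes 6-DoF waypoints that reach high-affordance
# voxels while keeping the avoidance cost low.
```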
@wenlong_huang
Wenlong Huang
2 years
Super exciting project led by @DannyDriess - a 562B embodied multimodal language model, trained to be grounded! Important signal that language is a universal generalization interface, across text, vision, and robot planning 📚🌁🤖 Check out the deep dive by @DannyDriess 🧵👇
@DannyDriess
Danny Driess
2 years
What happens when we train the largest vision-language model and add in robot experiences? The result is PaLM-E 🌴🤖, a 562-billion parameter, general-purpose, embodied visual-language generalist - across robotics, vision, and language. Website:
32
522
2K
0
0
18
@wenlong_huang
Wenlong Huang
1 year
Extremely impressive work pushing the boundary of what robot hands can achieve for human-like long-horizon tasks, especially cool when seeing it in the real world!
@chenwang_j
Chen Wang
1 year
How to chain multiple dexterous skills to tackle complex long-horizon manipulation tasks? Imagine retrieving a LEGO block from a pile, rotating it in-hand, and inserting it at the desired location to build a structure. Introducing our new work - Sequential Dexterity 🧵👇
26
91
470
1
3
18
@wenlong_huang
Wenlong Huang
2 years
Prior work SayCan () grounds LLM for robots using affordances. But instead of speaking with full vocab, LLM only ranks across pre-set skills. Imagine scratching your head through 700+ choices or O(billions) of natural language choices. How can we do better?
@hausman_k
Karol Hausman
2 years
Introducing RT-1, a robotic model that can execute over 700 instructions in the real world at 97% success rate! Generalizes to new tasks✅ Robust to new environments and objects✅ Fast inference for real time control✅ Can absorb multi-robot data✅ Powers SayCan✅ 🧵👇
62
549
2K
1
1
16
@wenlong_huang
Wenlong Huang
4 months
Simulation is a scalable data source that fuels progress in contact-rich manipulation, but sim2real is hard due to contact-modeling, perception, etc - great work by @YunfanJiang shows how this can be transformed seamlessly into an imitation learning problem to tackle them all!
@YunfanJiang
Yunfan Jiang
4 months
Does your sim2real robot falter at critical moments 🤯? Want to help but unsure how, all you can do is reward tuning in sim 😮‍💨? Introduce 𝐓𝐑𝐀𝐍𝐒𝐈𝐂 for manipulation sim2real. Robots learned in sim can accomplish complex tasks in real, such as furniture assembly. 🤿🧵
16
44
187
1
2
15
@wenlong_huang
Wenlong Huang
7 months
Reducing the need for in-the-wild robots for data collection is critical for breaking free of the chicken & egg dilemma *before* we can actually deploy general-purpose robots. Amazing work!
@chichengcc
Cheng Chi
7 months
Can we collect robot data without any robots? Introducing Universal Manipulation Interface (UMI) An open-source $400 system from @Stanford designed to democratize robot data collection 0 teleop -> autonomously wash dishes (precise), toss (dynamic), and fold clothes (bimanual)
44
353
2K
1
2
13
@wenlong_huang
Wenlong Huang
6 months
One thing I particularly like about using foundation models for robotics is the *in-the-wild generalization* they provide. Very exciting to see what VLMs can offer here!
@fangchenliu_
Fangchen Liu
6 months
Can we leverage VLMs for robot manipulation in the open world? Check out our new work MOKA, a simple and effective visual prompting method!
12
43
206
0
0
14
@wenlong_huang
Wenlong Huang
1 year
LLMs show emergent abilities at scale – same applies to VoxPoser, but on physical behaviors! It can conduct physics experiments, exhibit behavioral commonsense, listen to your fine-grained corrections, come up with multi-step visual programs, and more.
Tweet media one
2
2
14
@wenlong_huang
Wenlong Huang
1 year
We verified VoxPoser in everyday manipulation tasks in the wild, including articulated and deformable object manipulation. All the results here are synthesized with zero-shot execution.
2
3
13
@wenlong_huang
Wenlong Huang
6 months
An incredible project led by @chenwang_j that makes collecting robotic data in the wild as seamless as a breeze - not only for simple tasks but also for those everyday tasks requiring human-level dexterity, coordination, and precision!
@chenwang_j
Chen Wang
6 months
Can we use wearable devices to collect robot data without actual robots? Yes! With a pair of gloves🧤! Introducing DexCap, a portable hand motion capture system that collects 3D data (point cloud + finger motion) for training robots with dexterous hands Everything open-sourced
22
136
620
1
0
13
@wenlong_huang
Wenlong Huang
1 year
Just toss your objects too! VoxPoser is robust to disturbances because it replans actions in *real-time* with visual feedback. The 3D value maps are always updated with latest observations, allowing robot to recover from unexpected errors.
1
2
12
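A minimal sketch of the real-time replanning loop described above, assuming hypothetical callables (`observe`, `compose_value_maps`, `plan`, `execute_waypoint`, `done`); the released VoxPoser code differs in detail, but the receding-horizon structure is the point.

```python
import time

# Value maps are recomposed from the latest observation every control step,
# so the plan adapts when objects are moved or the robot is disturbed.
def replanning_loop(observe, compose_value_maps, plan, execute_waypoint, done, hz=5.0):
    while not done():
        obs = observe()                        # latest RGB-D observation
        value_maps = compose_value_maps(obs)   # re-ground affordances & constraints
        trajectory = plan(value_maps)          # motion planner over the voxel maps
        execute_waypoint(trajectory[0])        # execute only the first waypoint
        time.sleep(1.0 / hz)
```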
@wenlong_huang
Wenlong Huang
7 months
For robots to be actually useful for humans, we have to stress test them *in the wild* on real scenarios. Awesome work - congrats!
@Haoyu_Xiong_
Haoyu Xiong
7 months
Introducing Open-World Mobile Manipulation 🦾🌍 – A full-stack approach for operating articulated objects in open-ended unstructured environments: Unlocking doors with lever handles/ round knobs/ spring-loaded hinges 🔓🚪 Opening cabinets, drawers, and refrigerators 🗄️ 👇
30
105
775
0
1
10
@wenlong_huang
Wenlong Huang
3 months
Very interesting to see how easily humanoids can be teleoperated in the wild to perform many manipulation tasks. Bringing robots (neural networks) closer to human embodiment (abundant data sources) is one clear path forward for generalizable robot learning. Congrats!!
@zipengfu
Zipeng Fu
3 months
Introduce HumanPlus - Shadowing part Humanoids are born for using human data. We build a real-time shadowing system using a single RGB camera and a whole-body policy for cloning human motion. Examples: - boxing🥊 - playing the piano🎹/ping pong - tossing - typing Open-sourced!
17
166
770
0
2
11
@wenlong_huang
Wenlong Huang
2 years
PS: I will be joining the same team @GoogleAI as student researcher starting next week, co-hosted by @brian_ichter and @hausman_k . Thrilled for what’s lying ahead!
0
0
11
@wenlong_huang
Wenlong Huang
8 months
Extremely impressive mobile manipulation results! Huge congrats to @zipengfu @tonyzzhao
@tonyzzhao
Tony Z. Zhao
8 months
Introducing 𝐌𝐨𝐛𝐢𝐥𝐞 𝐀𝐋𝐎𝐇𝐀🏄 -- Hardware! A low-cost, open-source, mobile manipulator. One of the most high-effort projects in my past 5yrs! Not possible without co-lead @zipengfu and @chelseabfinn . At the end, what's better than cooking yourself a meal with the 🤖🧑‍🍳
236
1K
5K
1
3
11
@wenlong_huang
Wenlong Huang
2 months
Love the take of using free-form language as a conditioning variable to organize (not just to specify) visual understanding tasks, and that the key is to advance visual capabilities! This is a philosophy that I believe is useful for robotics too
@sainingxie
Saining Xie
2 months
Introducing Cambrian-1, a fully open project from our group at NYU. The world doesn't need another MLLM to rival GPT-4V. Cambrian is unique as a vision-centric exploration & here's why I think it's time to shift focus from scaling LLMs to enhancing visual representations.🧵[1/n]
Tweet media one
17
257
1K
0
0
11
@wenlong_huang
Wenlong Huang
2 years
This project & other recent projects I contributed to echo the "Bitter Lesson 2.0" by @hausman_k (). The goal is to offload the robot generalization burden to foundation models as much as possible!
@hausman_k
Karol Hausman
2 years
Bitter lesson by @RichardSSutton is one of the most insightful essays on AI development of the last decades. Recently, given our progress in robotics, I’ve been trying to predict what the next bitter lesson will be in robotics and how can we prevent it today. Let me explain 🧵
Tweet media one
11
46
368
1
1
9
@wenlong_huang
Wenlong Huang
2 years
Using affordance as a GM, the LLM can generate plans for "separating vowels from other letters" w/o being prompted with a list of present objects. We can also include safety and preference GMs, which allow robots to pack picnic boxes with snacks you like w/o accidentally touching a knife.
1
1
9
@wenlong_huang
Wenlong Huang
2 years
To let LLM speak “robot language” that scales & grounds, we look at its most basic functioning unit - tokens. Our formulation decodes likely tokens under both LLM & Grounded Models. GMs reward tokens that respect embodiment, while LLM provides world knowledge & coherent behaviors
Tweet media one
1
1
8
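A toy illustration of the token-level idea above: pick the next token that is likely under the LLM *and* scores well under grounded models such as affordance functions. This is a deliberate simplification for illustration, not the paper's exact objective, and the example scores are made up.

```python
import numpy as np

def grounded_decode_step(lm_logprobs, gm_logscores, beta=1.0):
    """lm_logprobs: dict token -> log p_LM(token | prefix)
    gm_logscores: dict token -> log grounding score (e.g., affordance)."""
    combined = {tok: lp + beta * gm_logscores.get(tok, -np.inf)
                for tok, lp in lm_logprobs.items()}
    return max(combined, key=combined.get)

# Example: the LLM prefers "strawberry", but the affordance model knows only
# a sponge is reachable, so the decoded token is "sponge".
next_tok = grounded_decode_step(
    {"strawberry": -0.5, "sponge": -1.2},
    {"sponge": -0.1, "strawberry": -5.0},
)
```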
@wenlong_huang
Wenlong Huang
2 years
LLM can also choose when/where it needs grounding! It can generate an open bracket when it's unsure, which loops in an object detector GM to the rescue. We show this through a grounded chain-of-thought that helps a kitchen robot handle ambiguous instructions.
1
2
6
@wenlong_huang
Wenlong Huang
2 years
Yet this is only one side of the picture. As LLMs increasingly speak better "robot language", how can we develop low-level policies that *fully* understand it and translate it into *physical actions*? Many challenges remain, but I'm optimistic about what will come next!
1
1
6
@wenlong_huang
Wenlong Huang
8 days
By sequencing multiple ReKeps, our framework organically integrates high-level task planning with dense low-level actions as a unified continuous optimization problem. With tracked keypoints, this enables rapid backtracking & replanning behaviors both within/across stages. 4/N
1
0
7
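To make the "unified continuous optimization" framing concrete, here is a minimal sketch of solving one stage's ReKep constraints with SciPy. The function names and signature are assumptions for illustration, not the released solver; it only shows the shape of the problem (sum of keypoint-based costs over an end-effector target).

```python
import numpy as np
from scipy.optimize import minimize

def solve_stage(constraints, keypoints, ee_init):
    """constraints: list of functions mapping (ee_pos, keypoints) -> scalar cost."""
    def objective(ee_pos):
        # total cost of all constraint functions for this stage
        return sum(c(ee_pos, keypoints) for c in constraints)
    result = minimize(objective, ee_init, method="Nelder-Mead")
    # The solution is passed to a low-level controller; as keypoints move or a
    # stage fails, the same problem is re-solved, enabling backtracking/replanning.
    return result.x
```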
@wenlong_huang
Wenlong Huang
8 days
ReKeps are Python functions mapping kp to costs w/ NumPy operations, specifying relations b/w robot, obj, and obj parts. While each kp has only (x,y,z), multiple kp can specify SO(3) rotations, vectors, surfaces, volumes to capture rich geometric structures in manipulation. 3/N
2
2
10
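Here is a minimal sketch of what such NumPy constraint functions could look like, using the pouring example from the thread. The signatures and keypoint indices are illustrative assumptions, not the released code; lower cost means the keypoint relation is closer to being satisfied.

```python
import numpy as np

def keep_spout_level(ee_pos, keypoints):
    """Keep the spout keypoint (idx 1) no higher than the handle keypoint
    (idx 0) while transporting, to avoid spilling."""
    handle, spout = keypoints[0], keypoints[1]
    return spout[2] - handle[2]           # penalize spout rising above handle

def align_spout_over_cup(ee_pos, keypoints):
    """Place the spout keypoint (idx 1) directly above the cup opening
    keypoint (idx 2)."""
    spout, cup = keypoints[1], keypoints[2]
    return float(np.linalg.norm(spout[:2] - cup[:2]))  # horizontal offset
```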
@wenlong_huang
Wenlong Huang
4 months
And it's not just about discrete symbols (e.g., language), given all the advances in multimodality and video modeling. (shameless plug) Our work VoxPoser from last year was an exploration in this direction: N/N
0
0
3
@wenlong_huang
Wenlong Huang
8 days
ReKep can also be fully automated w/ foundation models for in-the-wild task execution. We use large vision models (SAM+DINOv2) to identify keypoints, overlay on image, and prompt VLM (GPT-4o) to write a seq of ReKep constraints based on the task instruction. 5/N
Tweet media one
1
3
7
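A minimal sketch of that automated pipeline, with hypothetical helpers (`propose_keypoints`, `overlay`, `query_vlm`, `track`, `solve`); only the high-level flow is taken from the description above.

```python
# Large vision models propose keypoints, a VLM writes ReKep constraints from
# the annotated image, and an optimizer executes them stage by stage.
def run_rekep(rgbd, instruction, propose_keypoints, overlay, query_vlm, track, solve):
    keypoints = propose_keypoints(rgbd)          # e.g., SAM + DINOv2 features
    annotated = overlay(rgbd, keypoints)         # number keypoints on the image
    stages = query_vlm(annotated, instruction)   # e.g., GPT-4o writes a sequence
                                                 # of ReKep constraint functions
    for constraints in stages:
        solve(constraints, track())              # optimize & execute per stage
```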
@wenlong_huang
Wenlong Huang
4 months
While much of the field is pushing the limit of what robots can achieve autonomously, tackling the generalization brought by web-scale data remains underrated if we ever want to deploy robots in the wild. Cool work!
@mangahomanga
Homanga Bharadhwaj
4 months
Track2Act: Our latest on training goal-conditioned policies for diverse manipulation in the real-world. We train a model for embodiment-agnostic point track prediction from web videos combined with embodiment-specific residual policy learning 1/n
2
29
123
1
2
4
@wenlong_huang
Wenlong Huang
8 days
Relational Keypoint Constraints (ReKep) represent tasks as a seq of keypoint relations. E.g., in a pouring task: pull together gripper kp & handle kp -> keep handle & spout kp at the same height (avoid spillage) -> align spout & cup kp -> handle & spout kp form a tilting angle to pour. 2/N
Tweet media one
1
1
5
@wenlong_huang
Wenlong Huang
2 years
@pabbeel @Stanford @pathak2206 @IMordatch @StanfordAILab Thank you Pieter!! All of this would not have been possible without your support!
0
0
3
@wenlong_huang
Wenlong Huang
1 year
Thanks Anthony! This is a really great and super insightful perspective on the generalization capability of VoxPoser!
@anthonysimeono_
Anthony Simeonov
1 year
Love this and its connection to motion planning. Planning is so powerful in its generalization across tasks, but linking perception to "plannable" representations (and having *this* link generalize) is still hard. Here's one very compelling way to do it. Great job @wenlong_huang
0
1
11
0
1
3
@wenlong_huang
Wenlong Huang
2 years
Excited to see the Modular RL dataset we developed is being used! What’s even cooler is that modularity is again validated to be a key building block for a generalist agent with different embodiments (and it scales to so many domains!) See our work here:
@GoogleDeepMind
Google DeepMind
2 years
Gato🐈 - a scalable generalist agent that uses a single transformer with exactly the same weights to play Atari, follow text instructions, caption images, chat with people, control a real robot arm, and more: Paper: 1/
93
1K
5K
0
0
3
@wenlong_huang
Wenlong Huang
2 years
@AlperCanberk1 Good question! GD naturally allows input from any other modalities besides only text, so whenever it's difficult to succinctly describe everything needed as text, GD is likely a more natural way at inference time to ground LLM
0
0
2
@wenlong_huang
Wenlong Huang
8 days
ReKep is implemented on top of common packages like SciPy, and the code has also been open-sourced! It runs on BEHAVIOR (), a large-scale benchmark with diverse scenes and objects, so you can easily try ReKep without setting up a real robot. 7/N
@drfeifei
Fei-Fei Li
6 months
One year ago, we first introduced BEHAVIOR-1K, which we hope will be an important step towards human-centered robotics. After our year-long beta, we’re thrilled to announce its full release, which our team just presented at NVIDIA #GTC2024 . 1/n
7
142
703
1
1
6
@wenlong_huang
Wenlong Huang
2 years
@drfeifei @Stanford @pathak2206 @IMordatch @pabbeel @StanfordAILab @StanfordSVL Thank you Fei-Fei! Thrilled about meeting everyone at @StanfordSVL and working with you!
0
0
2
@wenlong_huang
Wenlong Huang
10 months
@haosu_twitr Huge congrats!!!
1
0
2
@wenlong_huang
Wenlong Huang
2 years
@micheli_vincent @hausman_k Cool work! We didn't know about this but will contextualize it in the new version 😀
0
0
2
@wenlong_huang
Wenlong Huang
4 months
For example, in what abstraction can these modules ground to low-level behaviors? Clearly the skill-level language abstraction (e.g., pick X & place on Y) is insufficient by definition, but are there other abstractions in which they can? 2/N
1
0
2
@wenlong_huang
Wenlong Huang
8 days
We test ReKep on two setups – in-the-wild and bimanual. It can perform diverse 6-12 DoF tasks w/ a perception-action loop at 10 Hz. It can also fold diff. clothes with diff. (human-like) strategies. And the entire pipeline does not require task-specific training or env models. 6/N
1
0
6
@wenlong_huang
Wenlong Huang
1 year
@adcock_brett Thank you Brett!
0
0
0
@wenlong_huang
Wenlong Huang
1 year
@sippeyxp This makes L2R amenable to optimizing actions through the simulator (MuJoCo) as a powerful model. However, the requirement is that real2sim system ID is needed for real-world tasks. It would be an exciting future direction to think about combining the strengths of both works! (2/2)
0
0
1
@wenlong_huang
Wenlong Huang
2 years
@IMordatch @Stanford @pathak2206 @pabbeel @StanfordAILab Thanks Igor! Hope we can collaborate more in the future!
0
0
1
@wenlong_huang
Wenlong Huang
2 years
@shaneguML @Stanford @pathak2206 @IMordatch @pabbeel @StanfordAILab Thanks Shane! Looking forward to seeing you too and hope we can collaborate at some point!
0
0
1
@wenlong_huang
Wenlong Huang
2 years
@pathak2206 @Stanford @IMordatch @pabbeel @StanfordAILab Thanks Deepak!! There are countless things I've learned from your close mentorship, and I really appreciate everything!
0
0
1
@wenlong_huang
Wenlong Huang
8 days
@YunfanJiang Thanks Yunfan!!
0
0
1
@wenlong_huang
Wenlong Huang
5 months
@tomssilver Congrats Tom!!!
0
0
1
@wenlong_huang
Wenlong Huang
8 days
@nishanthkumar23 Thanks Nishanth!!
0
0
1
@wenlong_huang
Wenlong Huang
3 years
@peterjansen_ai @ykilcher This is cool work! We didn't know about this, but will add it and contextualize the contributions. On a quick skim, some differences are 1) we look at existing knowledge in LMs w/o any fine-tuning, 2) tasks are not limited to pre-defined templates/categories (e.g. pick & place)
1
0
1
@wenlong_huang
Wenlong Huang
2 years
0
0
1
@wenlong_huang
Wenlong Huang
8 days
@Haoyu_Xiong_ Thanks Haoyu!!
0
0
1
@wenlong_huang
Wenlong Huang
1 year
@sippeyxp Language to Reward is cool work! Both works similarly extract knowledge from LLMs for real-time behavior synthesis. The key difference is that VoxPoser grounds LLMs in 3D obs space, while L2R defines rewards over known robot/object models. (1/2)
0
0
1
@wenlong_huang
Wenlong Huang
2 years
@DorsaSadigh @Stanford @pathak2206 @IMordatch @pabbeel @StanfordAILab Thanks Dorsa! Looking forward to working with you soon!
0
0
1
@wenlong_huang
Wenlong Huang
2 years
@MishaLaskin Thanks Misha!
0
0
1
@wenlong_huang
Wenlong Huang
2 years
Link to the previous thread 🧵 Website: Paper:
@wenlong_huang
Wenlong Huang
2 years
If we can debug our robots by reasoning, can we use LLMs to emulate such a process too? Following up on language planner () & SayCan (), we study how closed-loop feedback enables LLM to correct policy failures in long-horizon tasks🧵👇
0
5
41
0
0
1
@wenlong_huang
Wenlong Huang
3 years
@grad_ascent @ak92501 A model trained on datasets built specifically for these tasks can certainly do a lot better! But the difficulty is that it's hard to acquire a large dataset covering so many tasks humans do in daily lives, and we're interested in whether LLMs already have this knowledge from pretraining
0
0
1
@wenlong_huang
Wenlong Huang
2 years
@shaneguML @GoogleAI @OpenAI @johnschulman2 Congrats Shane! Very exciting time and looking forward to your work there!
0
0
1