An-Chieh Cheng

@anjjei

491 Followers
314 Following
10 Media
177 Statuses

PhD student @UCSanDiego; prev intern: @AdobeResearch; I love 3D vision.

San Diego, CA
Joined October 2010
Pinned Tweet
@anjjei
An-Chieh Cheng
4 months
🌟Introducing "🤖SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model" SpatialRGPT is a powerful region-level VLM that can understand both 2D and 3D spatial arrangements. It can process any region proposal (e.g., boxes or masks) and provide
11
104
463
@anjjei
An-Chieh Cheng
5 months
I am attending #ICLR2024 in Vienna to present TUVF. Please feel free to DM me or come to the poster on Tuesday morning to chat about anything! Code: Project page:
Tweet media one
1
10
52
@anjjei
An-Chieh Cheng
4 months
Our demo was recorded with 🐰Bunny-VisionPro, a cool project built by @dngxngxng3 , @jiyuezh , @QinYuzhe . Controlling robots is as easy as playing a VR game with Vision Pro🥽! Check it out at
@anjjei
An-Chieh Cheng
4 months
SpatialRGPT can also serve as a region-aware dense reward annotator for robotics tasks. In a real-robot experiment, SpatialRGPT uses bounding boxes for the fingertip and a green cube to annotate rewards based on the distance between these regions. The results showed that the
2
4
15
1
4
21
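As a rough illustration of the region-based dense-reward idea in the thread above (not the actual SpatialRGPT code), a distance-based reward between two annotated regions could be computed from their bounding boxes as in the hypothetical sketch below; all names are illustrative.

```python
# Hypothetical sketch: dense reward from the distance between two annotated
# regions (e.g., a fingertip box and a target-object box). Names and the
# reward shaping are assumptions, not the SpatialRGPT implementation.
from typing import Tuple
import math

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

def box_center(box: Box) -> Tuple[float, float]:
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)

def dense_reward(fingertip_box: Box, target_box: Box, scale: float = 0.01) -> float:
    """Reward rises toward zero as the fingertip region approaches the target region."""
    fx, fy = box_center(fingertip_box)
    tx, ty = box_center(target_box)
    return -scale * math.hypot(fx - tx, fy - ty)

# Example: annotate one frame of a rollout.
print(dense_reward((100, 120, 130, 150), (300, 310, 360, 370)))
```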
@anjjei
An-Chieh Cheng
6 months
Honored to win the @Qualcomm Innovation Fellowship! Grateful to our team, including @jianglong_ye and our advisor @xiaolonw . Looking forward to a year of exciting work! :)
@jianglong_ye
Jianglong Ye
6 months
Excited to share that @anjjei and I have been awarded the 2024 Qualcomm Innovation Fellowship! 🎉🎉 Immense gratitude to our advisor @xiaolonw , for his invaluable guidance. A huge thank you to @Qualcomm for recognizing and supporting our research! 🚀
7
1
52
0
1
19
@anjjei
An-Chieh Cheng
3 months
Not only to see 👁️ but also to feel 🤏! Adding haptic feedback is a game-changer for teleoperation. It boosts confidence and immersion and will be very helpful in conditions like space teleop with high latency.
@dngxngxng3
Runyu Ding
3 months
Introducing Bunny-VisionPro: Our system delivers immersive robot control with both visual and haptic feedback. Using VisionPro and low-cost finger cots with vibration motors, operators can control robots intuitively and immersively, similar to VR gaming.
4
26
100
0
4
19
@anjjei
An-Chieh Cheng
4 months
Check out our project page: arxiv: 💖 Great thanks to my collaborators, Hongxu Yin @yin_hongxu , Yang Fu @yangfu21 , Qiushan Guo @QiushanGuo_HKU , Ruihan Yang @RchalYang , Jan Kautz @jankautz , Xiaolong Wang @xiaolonw , and Sifei
1
2
11
@anjjei
An-Chieh Cheng
5 months
Imagine this: robots helping with makeup removal or even taking off contact lenses in the future 😮 Sweet dreams ahead! Cool work led by @dngxngxng3 , @QinYuzhe , and Jiyue Zhu.
@xiaolonw
Xiaolong Wang
5 months
Tesla Optimus can arrange batteries in their factories, ours can do skincare (on @QinYuzhe )! We opensource Bunny-VisionPro, a teleoperation system for bimanual hand manipulation. The users can control the robot hands in real time using VisionPro, flexible like a bunny. 🐇
10
63
335
0
1
11
@anjjei
An-Chieh Cheng
5 months
Tweet media one
0
0
9
@anjjei
An-Chieh Cheng
4 months
SpatialRGPT enhances geometric reasoning by fusing RGB images with relative depth maps. We process depth maps using the same image encoder and incorporate a depth-to-language module, which is initialized from the RGB module and trained on spatial-related Q&A tasks. This approach
1
1
10
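A minimal PyTorch sketch of the RGB + relative-depth fusion described above, assuming a shared image encoder and a depth-to-language projector copied from the RGB one; module names, sizes, and the token-concatenation scheme are assumptions for illustration, not the released SpatialRGPT architecture.

```python
import copy
import torch
import torch.nn as nn

class RGBDepthFusion(nn.Module):
    def __init__(self, vision_encoder: nn.Module, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.vision_encoder = vision_encoder          # one image encoder shared by RGB and depth
        self.rgb_to_language = nn.Linear(vision_dim, llm_dim)
        # Depth-to-language projector initialized from the RGB projector's weights.
        self.depth_to_language = copy.deepcopy(self.rgb_to_language)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # Replicate the 1-channel relative depth map so the same encoder can consume it.
        rgb_tokens = self.vision_encoder(rgb)                         # (B, N, vision_dim)
        depth_tokens = self.vision_encoder(depth.repeat(1, 3, 1, 1))  # (B, N, vision_dim)
        fused = torch.cat(
            [self.rgb_to_language(rgb_tokens), self.depth_to_language(depth_tokens)], dim=1
        )
        return fused  # visual tokens handed to the language model

class ToyPatchEncoder(nn.Module):
    """Stand-in ViT-like encoder: 16x16 patches projected to vision_dim-sized tokens."""
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=16, stride=16)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x).flatten(2).transpose(1, 2)  # (B, N, dim)

model = RGBDepthFusion(ToyPatchEncoder())
tokens = model(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224))
print(tokens.shape)  # torch.Size([1, 392, 4096])
```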
@anjjei
An-Chieh Cheng
2 years
Foundation model as a Teacher 🧑‍🏫
@xiaolonw
Xiaolong Wang
2 years
🏗️ Policy Adaptation from Foundation Model Feedback #CVPR2023 Instead of using a foundation model as a pre-trained encoder (generator), we use it as a Teacher (discriminator) to tell where our policy went wrong and help it adapt to new envs and tasks.
3
26
123
0
0
8
@anjjei
An-Chieh Cheng
4 months
Our pipeline is fully automated and only requires RGB images. We collect our dataset using OpenImages, resulting in 8.7 million spatial concepts grounded in 5 million unique regions from 1 million images. (4/n)
Tweet media one
1
0
7
@anjjei
An-Chieh Cheng
6 months
The results look awesome! 🤙Great work by @JitengMu
@JitengMu
Jiteng Mu
6 months
We introduce🌟Editable Image Elements🥳, a new disentangled and controllable latent space for diffusion models, that allows for various image editing operations (e.g., move, resize,  de-occlusion, object removal, variations, composition) More details🧵👇
7
35
211
0
0
7
@anjjei
An-Chieh Cheng
2 years
True adventurer 🐾🐾
@xiaolonw
Xiaolong Wang
2 years
The robot climbs stairs🏯, steps over stones 🪨, and runs in the wild🏞️, all in one policy, without any remote control! Our #CVPR2023 Highlight paper achieves this by using RL + a 3D Neural Volumetric Memory (NVM) trained with view synthesis!
5
66
295
0
0
6
@anjjei
An-Chieh Cheng
4 months
We prepare our dataset by creating a 3D scene graph for each image using off-the-shelf vision foundation models. These scene graphs are then converted into region-aware spatial QAs using template-based and LLM-based approaches. Combining these two types of QAs enhances VLMs'
Tweet media one
1
0
6
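A toy sketch of the template-based half of this pipeline, assuming a simple scene-graph dict of labeled regions and pairwise relations; the schema and templates are illustrative stand-ins, not the paper's actual data-generation code.

```python
# Toy template-based spatial QA generation from a scene-graph-like structure.
import random

scene_graph = {
    "regions": {"<region0>": {"label": "chair"}, "<region1>": {"label": "table"}},
    "relations": [
        {"subject": "<region0>", "object": "<region1>", "predicate": "left of"},
        {"subject": "<region0>", "object": "<region1>", "predicate": "shorter than"},
    ],
}

TEMPLATES = [
    ("Is {s} {p} {o}?", "Yes, {s} ({sl}) is {p} {o} ({ol})."),
    ("What is the relation between {s} and {o}?", "{s} ({sl}) is {p} {o} ({ol})."),
]

def generate_qas(graph):
    qas = []
    for rel in graph["relations"]:
        s, o, p = rel["subject"], rel["object"], rel["predicate"]
        sl = graph["regions"][s]["label"]
        ol = graph["regions"][o]["label"]
        q_tpl, a_tpl = random.choice(TEMPLATES)
        qas.append({
            "question": q_tpl.format(s=s, o=o, p=p),
            "answer": a_tpl.format(s=s, o=o, p=p, sl=sl, ol=ol),
        })
    return qas

for qa in generate_qas(scene_graph):
    print(qa["question"], "->", qa["answer"])
```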
@anjjei
An-Chieh Cheng
6 months
Dynamic mesh from monocular videos! Unleash so many possibilities🐈🐶🫏 Fantastic work by @Isabella__Liu
@Isabella__Liu
Isabella Liu
6 months
Want to obtain a time-consistent dynamic mesh from monocular videos? Introducing: Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos We reconstruct meshes with flexible topology change and build the correspondence across meshes. 🧵(1/n)
8
48
190
1
0
6
@anjjei
An-Chieh Cheng
2 years
Who needs sight when you've got touch? 🤖 Rotating objects without any visual aids. 🤌🙈
@xiaolonw
Xiaolong Wang
2 years
Imagine if you have an object in hand, you can rotate the object by feeling without even looking. This is what we enable the robot to do now: Rotating without Seeing. Our multi-finger robot hand learns to rotate diverse objects using only touch sensing.
2
48
220
0
0
6
@anjjei
An-Chieh Cheng
7 months
Incredible retreat! So much fun🥳
@xiaolonw
Xiaolong Wang
7 months
Spring break group retreat again.
Tweet media (four images)
6
2
92
0
0
6
@anjjei
An-Chieh Cheng
2 years
🫠🫠
@_akhaliq
AK
2 years
Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields abs: project page:
98
628
4K
0
0
5
@anjjei
An-Chieh Cheng
4 months
SpatialRGPT excels in complex spatial reasoning! It demonstrates robust spatial knowledge and effectively generalizes this knowledge to enhance its language reasoning abilities. (6/n)
1
0
6
@anjjei
An-Chieh Cheng
2 years
3D pre-training for 🤖
@ZeYanjie
Yanjie Ze
2 years
How about 3D pre-training for motor control? We use a Video Autoencoder to learn *3D* self-supervised representations from large-scale videos for RL. Such 3D representations learn faster and transfer better sim-to-real than 2D pre-training such as MoCo.
3
19
68
0
0
4
@anjjei
An-Chieh Cheng
4 months
We propose a benchmark (SpatialRGPT-Bench) for 3D spatial cognition in VLMs.  SpatialRGPT-Bench is a VQA benchmark with ground-truth 3D annotations encompassing urban (nuScenes, KITTI), indoor (SUNRGBD, ARKitScenes), and simulated (Hypersim) environments, covering 88 distinct
Tweet media one
1
0
5
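For illustration only, here is a hypothetical schema for one benchmark record of this kind, i.e., a region-grounded VQA item paired with ground-truth 3D annotations; the field names and units are assumptions, not the released SpatialRGPT-Bench format.

```python
# Hypothetical record layout for a region-grounded spatial VQA benchmark.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Region3D:
    box_2d: List[float]        # 2D region prompt (x_min, y_min, x_max, y_max)
    center_xyz_m: List[float]  # ground-truth 3D center in meters
    size_xyz_m: List[float]    # ground-truth 3D extent in meters
    category: str

@dataclass
class SpatialVQASample:
    image_path: str
    source: str                # e.g. "nuScenes", "SUNRGBD", "Hypersim"
    regions: List[Region3D] = field(default_factory=list)
    question: str = ""
    answer: str = ""

sample = SpatialVQASample(
    image_path="scene_0001.jpg",
    source="SUNRGBD",
    regions=[Region3D([50, 60, 200, 300], [0.4, 0.1, 2.3], [0.5, 0.9, 0.5], "chair")],
    question="How far is <region0> from the camera?",
    answer="About 2.3 meters.",
)
print(sample.regions[0].category)
```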
@anjjei
An-Chieh Cheng
5 months
Love this use case
@OpenAI
OpenAI
5 months
@BeMyEyes with GPT-4o
92
764
5K
0
0
3
@anjjei
An-Chieh Cheng
4 months
@MuCai7 Thanks for pointing this out and apologies for missing the reference! We'll include a reference to your work in the updated version. Congrats!
0
0
2
@anjjei
An-Chieh Cheng
5 years
Awesome!
@hardmaru
hardmaru
5 years
Exploration via Flow-Based Intrinsic Rewards They incorporate optical flow estimation from computer vision into the RL domain and use the errors from optical flow estimation to evaluate the novelty of new observations. SuperMario demo:
0
29
126
0
0
2
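A minimal sketch of the idea in the quoted tweet: treat the error of a flow estimator (here, the photometric error after warping) as a novelty bonus. The toy flow network and error metric below are placeholder assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyFlowNet(nn.Module):
    """Toy network estimating 2-channel flow (in pixels) from two stacked grayscale frames."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(2, 2, kernel_size=3, padding=1)

    def forward(self, prev_frame: torch.Tensor, next_frame: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([prev_frame, next_frame], dim=1))

def warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `frame` by `flow` (in pixels) with a normalized sampling grid."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).float().unsqueeze(0) + flow.permute(0, 2, 3, 1)
    grid[..., 0] = 2.0 * grid[..., 0] / (w - 1) - 1.0  # normalize x to [-1, 1]
    grid[..., 1] = 2.0 * grid[..., 1] / (h - 1) - 1.0  # normalize y to [-1, 1]
    return F.grid_sample(frame, grid, align_corners=True)

def intrinsic_reward(flow_net: ToyFlowNet, prev_frame: torch.Tensor, next_frame: torch.Tensor) -> float:
    # Larger flow-estimation (reconstruction) error -> more novel observation -> bigger bonus.
    with torch.no_grad():
        flow = flow_net(prev_frame, next_frame)
        error = (warp(prev_frame, flow) - next_frame).pow(2).mean()
    return error.item()

flow_net = ToyFlowNet()
print(intrinsic_reward(flow_net, torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)))
```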
@anjjei
An-Chieh Cheng
4 months
@andrew_n_carr Thank you! Appreciate it.
0
0
2
@anjjei
An-Chieh Cheng
3 months
@JiaweiYang118 Congrats Jiawei! Amazing work. The visualizations are just as impressive as your past work!
1
0
2
@anjjei
An-Chieh Cheng
4 months
@ZevRekhter Yes, it only takes an RGB image as input.
1
0
1
@anjjei
An-Chieh Cheng
4 months
@YungHsuYang Thank you Roy!
0
0
1
@anjjei
An-Chieh Cheng
4 months
@laion_ai Thank you! Should be around late June.
1
0
1
@anjjei
An-Chieh Cheng
4 months
@pogosinho Yes, given an RGB image it can determine size down to the inch level.
1
0
1