An-Chieh Cheng

@anjjei

491 Followers
314 Following
10 Media
177 Statuses

PhD student @UCSanDiego; prev intern: @AdobeResearch; I love 3D vision.

San Diego, CA
Joined October 2010
Pinned Tweet
@anjjei
An-Chieh Cheng
4 months
🌟Introducing "🤖SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model" SpatialRGPT is a powerful region-level VLM that can understand both 2D and 3D spatial arrangements. It can process any region proposal (e.g., boxes or masks) and provide
11
104
463
@anjjei
An-Chieh Cheng
5 months
I am attending #ICLR2024 in Vienna to present TUVF. Please feel free to DM me or come to the poster on Tuesday morning to chat about anything! Code: Project page:
Tweet media one
1
10
52
@anjjei
An-Chieh Cheng
4 months
Our demo was recorded with 🐰Bunny-VisionPro, a cool project built by @dngxngxng3 , @jiyuezh , @QinYuzhe . Controlling robots is as easy as playing a VR game with Vision Pro🥽! Check it out at
@anjjei
An-Chieh Cheng
4 months
SpatialRGPT can also serve as a region-aware dense reward annotator for robotics tasks. In a real-robot experiment, SpatialRGPT uses bounding boxes for the fingertip and a green cube to annotate rewards based on the distance between these regions. The results showed that the
2
4
15
1
4
21
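As a rough illustration of the region-based dense-reward idea in the thread above (not the actual SpatialRGPT code), a distance-based reward between two annotated regions could be computed from their bounding boxes as in the hypothetical sketch below; all names are illustrative.

```python
# Hypothetical sketch: dense reward from the distance between two annotated
# regions (e.g., a fingertip box and a target-object box). Names and the
# reward shaping are assumptions, not the SpatialRGPT implementation.
from typing import Tuple
import math

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

def box_center(box: Box) -> Tuple[float, float]:
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)

def dense_reward(fingertip_box: Box, target_box: Box, scale: float = 0.01) -> float:
    """Reward rises toward zero as the fingertip region approaches the target region."""
    fx, fy = box_center(fingertip_box)
    tx, ty = box_center(target_box)
    return -scale * math.hypot(fx - tx, fy - ty)

# Example: annotate one frame of a rollout.
print(dense_reward((100, 120, 130, 150), (300, 310, 360, 370)))
```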
@anjjei
An-Chieh Cheng
6 months
Honored to win the @Qualcomm Innovation Fellowship! Grateful to our team, including @jianglong_ye and our advisor @xiaolonw . Looking forward to a year of exciting work! :)
@jianglong_ye
Jianglong Ye
6 months
Excited to share that @anjjei and I have been awarded the 2024 Qualcomm Innovation Fellowship! 🎉🎉 Immense gratitude to our advisor @xiaolonw , for his invaluable guidance. A huge thank you to @Qualcomm for recognizing and supporting our research! 🚀
7
1
52
0
1
19
@anjjei
An-Chieh Cheng
3 months
Not only to see 👁️ but also to feel 🤏! Adding haptic feedback is a game-changer for teleoperation. It boosts confidence and immersion and will be very helpful in conditions like space teleop with high latency.
@dngxngxng3
Runyu Ding
3 months
Introducing Bunny-VisionPro: Our system delivers immersive robot control with both visual and haptic feedback. Using VisionPro and low-cost finger cots with vibration motors, operators can control robots intuitively and immersively, similar to VR gaming.
4
26
100
0
4
19
@anjjei
An-Chieh Cheng
4 months
Check out our project page: arxiv: 💖 Great thanks to my collaborators, Hongxu Yin @yin_hongxu , Yang Fu @yangfu21 , Qiushan Guo @QiushanGuo_HKU , Ruihan Yang @RchalYang , Jan Kautz @jankautz , Xiaolong Wang @xiaolonw , and Sifei
1
2
11
@anjjei
An-Chieh Cheng
5 months
Imagine this: robots helping with makeup removal or even taking off contact lenses in the future 😮 Sweet dreams ahead! Cool work led by @dngxngxng3 , @QinYuzhe , and Jiyue Zhu.
@xiaolonw
Xiaolong Wang
5 months
Tesla Optimus can arrange batteries in their factories, ours can do skincare (on @QinYuzhe )! We opensource Bunny-VisionPro, a teleoperation system for bimanual hand manipulation. The users can control the robot hands in real time using VisionPro, flexible like a bunny. 🐇
10
63
335
0
1
11
@anjjei
An-Chieh Cheng
5 months
Tweet media one
0
0
9
@anjjei
An-Chieh Cheng
4 months
SpatialRGPT enhances geometric reasoning by fusing RGB images with relative depth maps. We process depth maps using the same image encoder and incorporate a depth-to-language module, which is initialized from the RGB module and trained on spatial-related Q&A tasks. This approach
1
1
10
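A minimal PyTorch sketch of the RGB + relative-depth fusion described above, assuming a shared image encoder and a depth-to-language projector copied from the RGB one; module names, sizes, and the token-concatenation scheme are assumptions for illustration, not the released SpatialRGPT architecture.

```python
import copy
import torch
import torch.nn as nn

class RGBDepthFusion(nn.Module):
    def __init__(self, vision_encoder: nn.Module, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.vision_encoder = vision_encoder          # one image encoder shared by RGB and depth
        self.rgb_to_language = nn.Linear(vision_dim, llm_dim)
        # Depth-to-language projector initialized from the RGB projector's weights.
        self.depth_to_language = copy.deepcopy(self.rgb_to_language)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # Replicate the 1-channel relative depth map so the same encoder can consume it.
        rgb_tokens = self.vision_encoder(rgb)                         # (B, N, vision_dim)
        depth_tokens = self.vision_encoder(depth.repeat(1, 3, 1, 1))  # (B, N, vision_dim)
        fused = torch.cat(
            [self.rgb_to_language(rgb_tokens), self.depth_to_language(depth_tokens)], dim=1
        )
        return fused  # visual tokens handed to the language model

class ToyPatchEncoder(nn.Module):
    """Stand-in ViT-like encoder: 16x16 patches projected to vision_dim-sized tokens."""
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=16, stride=16)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x).flatten(2).transpose(1, 2)  # (B, N, dim)

model = RGBDepthFusion(ToyPatchEncoder())
tokens = model(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224))
print(tokens.shape)  # torch.Size([1, 392, 4096])
```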
@anjjei
An-Chieh Cheng
2 years
Foundation model as a Teacher 🧑‍🏫
@xiaolonw
Xiaolong Wang
2 years
🏗️ Policy Adaptation from Foundation Model Feedback #CVPR2023 Instead of using a foundation model as a pre-trained encoder (generator), we use it as a Teacher (discriminator) to tell where our policy went wrong and help it adapt to new envs and tasks.
3
26
123
0
0
8
@anjjei
An-Chieh Cheng
4 months
Our pipeline is fully automated and only requires RGB images. We collect our dataset using OpenImages, resulting in 8.7 million spatial concepts grounded in 5 million unique regions from 1 million images. (4/n)
Tweet media one
1
0
7
@anjjei
An-Chieh Cheng
6 months
The results look awesome! 🤙Great work by @JitengMu
@JitengMu
Jiteng Mu
6 months
We introduce🌟Editable Image Elements🥳, a new disentangled and controllable latent space for diffusion models, that allows for various image editing operations (e.g., move, resize,  de-occlusion, object removal, variations, composition) More details🧵👇
7
35
211
0
0
7
@anjjei
An-Chieh Cheng
2 years
True adventurer 🐾🐾
@xiaolonw
Xiaolong Wang
2 years
The robot climbs stairs🏯, steps over stones 🪨, and runs in the wild🏞️, all in one policy, without any remote control! Our #CVPR2023 Highlight paper achieves this by using RL + a 3D Neural Volumetric Memory (NVM) trained with view synthesis!
5
66
295
0
0
6
@anjjei
An-Chieh Cheng
4 months
We prepare our dataset by creating a 3D scene graph for each image using off-the-shelf vision foundation models. These scene graphs are then converted into region-aware spatial QAs using template-based and LLM-based approaches. Combining these two types of QAs enhances VLMs'
Tweet media one
1
0
6
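A toy sketch of the template-based half of this pipeline, assuming a simple scene-graph dict of labeled regions and pairwise relations; the schema and templates are illustrative stand-ins, not the paper's actual data-generation code.

```python
# Toy template-based spatial QA generation from a scene-graph-like structure.
import random

scene_graph = {
    "regions": {"<region0>": {"label": "chair"}, "<region1>": {"label": "table"}},
    "relations": [
        {"subject": "<region0>", "object": "<region1>", "predicate": "left of"},
        {"subject": "<region0>", "object": "<region1>", "predicate": "shorter than"},
    ],
}

TEMPLATES = [
    ("Is {s} {p} {o}?", "Yes, {s} ({sl}) is {p} {o} ({ol})."),
    ("What is the relation between {s} and {o}?", "{s} ({sl}) is {p} {o} ({ol})."),
]

def generate_qas(graph):
    qas = []
    for rel in graph["relations"]:
        s, o, p = rel["subject"], rel["object"], rel["predicate"]
        sl = graph["regions"][s]["label"]
        ol = graph["regions"][o]["label"]
        q_tpl, a_tpl = random.choice(TEMPLATES)
        qas.append({
            "question": q_tpl.format(s=s, o=o, p=p),
            "answer": a_tpl.format(s=s, o=o, p=p, sl=sl, ol=ol),
        })
    return qas

for qa in generate_qas(scene_graph):
    print(qa["question"], "->", qa["answer"])
```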
@anjjei
An-Chieh Cheng
6 months
Dynamic mesh from monocular videos! Unleash so many possibilities🐈🐶🫏 Fantastic work by @Isabella__Liu
@Isabella__Liu
Isabella Liu
6 months
Want to obtain a time-consistent dynamic mesh from monocular videos? Introducing: Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos We reconstruct meshes with flexible topology change and build the correspondence across meshes. 🧵(1/n)
8
48
190
1
0
6
@anjjei
An-Chieh Cheng
2 years
Who needs sight when you've got touch? 🤖 Rotating objects without any visual aids. 🤌🙈
@xiaolonw
Xiaolong Wang
2 years
Imagine if you have an object in hand, you can rotate the object by feeling without even looking. This is what we enable the robot to do now: Rotating without Seeing. Our multi-finger robot hand learns to rotate diverse objects using only touch sensing.
2
48
220
0
0
6
@anjjei
An-Chieh Cheng
7 months
Incredible retreat! So much fun🥳
@xiaolonw
Xiaolong Wang
7 months
Spring break group retreat again.
Tweet media (four images)
6
2
92
0
0
6
@anjjei
An-Chieh Cheng
2 years
🫠🫠
@_akhaliq
AK
2 years
Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields abs: project page:
98
628
4K
0
0
5
@anjjei
An-Chieh Cheng
4 months
SpatialRGPT excels in complex spatial reasoning! It demonstrates robust spatial knowledge and effectively generalizes this knowledge to enhance its language reasoning abilities. (6/n)
1
0
6
@anjjei
An-Chieh Cheng
2 years
3D pre-training for 🤖
@ZeYanjie
Yanjie Ze
2 years
How about 3D pre-training for motor control? We use a Video Autoencoder to learn *3D* self-supervised representations from large-scale videos for RL. Such 3D representations learn faster and transfer better sim-to-real than 2D pre-training such as MoCo.
3
19
68
0
0
4
@anjjei
An-Chieh Cheng
4 months
We propose a benchmark (SpatialRGPT-Bench) for 3D spatial cognition in VLMs.  SpatialRGPT-Bench is a VQA benchmark with ground-truth 3D annotations encompassing urban (nuScenes, KITTI), indoor (SUNRGBD, ARKitScenes), and simulated (Hypersim) environments, covering 88 distinct
Tweet media one
1
0
5
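For illustration only, here is a hypothetical schema for one benchmark record of this kind, i.e., a region-grounded VQA item paired with ground-truth 3D annotations; the field names and units are assumptions, not the released SpatialRGPT-Bench format.

```python
# Hypothetical record layout for a region-grounded spatial VQA benchmark.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Region3D:
    box_2d: List[float]        # 2D region prompt (x_min, y_min, x_max, y_max)
    center_xyz_m: List[float]  # ground-truth 3D center in meters
    size_xyz_m: List[float]    # ground-truth 3D extent in meters
    category: str

@dataclass
class SpatialVQASample:
    image_path: str
    source: str                # e.g. "nuScenes", "SUNRGBD", "Hypersim"
    regions: List[Region3D] = field(default_factory=list)
    question: str = ""
    answer: str = ""

sample = SpatialVQASample(
    image_path="scene_0001.jpg",
    source="SUNRGBD",
    regions=[Region3D([50, 60, 200, 300], [0.4, 0.1, 2.3], [0.5, 0.9, 0.5], "chair")],
    question="How far is <region0> from the camera?",
    answer="About 2.3 meters.",
)
print(sample.regions[0].category)
```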
@anjjei
An-Chieh Cheng
5 months
Love this use case
@OpenAI
OpenAI
5 months
@BeMyEyes with GPT-4o
92
764
5K
0
0
3
@anjjei
An-Chieh Cheng
4 months
@MuCai7 Thanks for pointing this out and apologies for missing the reference! We'll include a reference to your work in the updated version. Congrats!
0
0
2
@anjjei
An-Chieh Cheng
5 years
Awesome!
@hardmaru
hardmaru
5 years
Exploration via Flow-Based Intrinsic Rewards They incorporate optical flow estimation from computer vision into the RL domain and use the errors from optical flow estimation to evaluate the novelty of new observations. SuperMario demo:
0
29
126
0
0
2
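A minimal sketch of the idea in the quoted tweet: treat the error of a flow estimator (here, the photometric error after warping) as a novelty bonus. The toy flow network and error metric below are placeholder assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyFlowNet(nn.Module):
    """Toy network estimating 2-channel flow (in pixels) from two stacked grayscale frames."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(2, 2, kernel_size=3, padding=1)

    def forward(self, prev_frame: torch.Tensor, next_frame: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([prev_frame, next_frame], dim=1))

def warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `frame` by `flow` (in pixels) with a normalized sampling grid."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).float().unsqueeze(0) + flow.permute(0, 2, 3, 1)
    grid[..., 0] = 2.0 * grid[..., 0] / (w - 1) - 1.0  # normalize x to [-1, 1]
    grid[..., 1] = 2.0 * grid[..., 1] / (h - 1) - 1.0  # normalize y to [-1, 1]
    return F.grid_sample(frame, grid, align_corners=True)

def intrinsic_reward(flow_net: ToyFlowNet, prev_frame: torch.Tensor, next_frame: torch.Tensor) -> float:
    # Larger flow-estimation (reconstruction) error -> more novel observation -> bigger bonus.
    with torch.no_grad():
        flow = flow_net(prev_frame, next_frame)
        error = (warp(prev_frame, flow) - next_frame).pow(2).mean()
    return error.item()

flow_net = ToyFlowNet()
print(intrinsic_reward(flow_net, torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)))
```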
@anjjei
An-Chieh Cheng
4 months
@andrew_n_carr Thank you! Appreciate it.
0
0
2
@anjjei
An-Chieh Cheng
3 months
@JiaweiYang118 Congrats Jiawei! Amazing work. The visualizations are just as impressive as your past work!
1
0
2
@anjjei
An-Chieh Cheng
4 months
@ZevRekhter Yes, it only takes an RGB image as input.
1
0
1
@anjjei
An-Chieh Cheng
4 months
@YungHsuYang Thank you Roy!
0
0
1
@anjjei
An-Chieh Cheng
4 months
@laion_ai Thank you! Should be around late June.
1
0
1
@anjjei
An-Chieh Cheng
4 months
@pogosinho Yes, given an RGB image it can determine size down to the inch level.
1
0
1