We are presenting **SAD** (Segment Any RGBD):
SAD is able to perform 3D segmentation (segment out any 3D object) with RGBD inputs (or rendered depth images only).
- Code:
- Demo @huggingface :
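Since SAD builds on SAM, the underlying idea can be conveyed in a minimal sketch: run the Segment Anything Model's automatic mask generator on both the RGB image and a rendered depth map, so segments can be proposed from appearance or geometry cues. This uses the official `segment-anything` package; the file names and checkpoint path are placeholders, and this is not the SAD pipeline itself.

```python
# A minimal sketch (not the SAD release itself): run SAM's automatic mask
# generator on an RGB image and on a rendered depth map, so segments can be
# proposed from appearance or geometry cues. Paths/checkpoint are placeholders.
import cv2
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

rgb = cv2.cvtColor(cv2.imread("scene_rgb.png"), cv2.COLOR_BGR2RGB)
# Depth rendered to a 3-channel color image so SAM can consume it directly.
depth_vis = cv2.cvtColor(cv2.imread("scene_depth_rendered.png"), cv2.COLOR_BGR2RGB)

rgb_masks = mask_generator.generate(rgb)          # appearance-driven masks
depth_masks = mask_generator.generate(depth_vis)  # geometry-driven masks
print(f"{len(rgb_masks)} RGB masks, {len(depth_masks)} depth masks")
```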
🔥MoCap Anybody🔥
#NeurIPS2023
We propose *SMPLer-X*, the first generalist foundation model for 3D/4D human motion capture from monocular inputs.
- Project:
- Paper:
- Code:
- Demo:
Check out **RAM** (Relate Anything Model)!
- We empower the Segment Anything Model (SAM) with the capability to recognize various visual relations between different visual concepts.
- Code:
- Demo @huggingface :
Thrilled to announce **Otter**, a multi-modal in-context learning model with instruction tuning:
1) Chatbot w/ image, video, 3D
2) Needs only 4x 3090 GPUs
3) Better than OpenFlamingo
- Code:
- Demo:
- Video:
🔥Open-Source Video Diffusion Transformer🔥
Similar to #Sora, our ☕️Latte☕️ model also adopts a *video diffusion transformer* architecture, with a thorough study of the *LDM + DiT* design space for video generation
- Project:
- Code:
Introducing Sora, our text-to-video model.
Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions.
Prompt: “Beautiful, snowy
We are releasing #OpenSelfSup, an open-source library for self-supervised learning:
- *All methods in one repository*, supporting PIRL, MoCo, SimCLR, etc.
- *All benchmarks in one repository*.
- *Efficiency*, supporting multi-GPU distributed training.
We have released DeepFashion-MultiModal, a large-scale, high-quality human dataset with rich multi-modal annotations:
- 40K high-resolution human images
- Human parsing, keypoints and DensePose annotations
- Attribute and textual description annotations
🔥🔥 We propose #DreamGaussian, a Generative Gaussian Splatting framework that produces high-quality textured meshes in just 2 minutes from a single-view image.
- Project:
- Paper:
- Code:
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation
paper page:
Recent advances in 3D content creation mostly leverage optimization-based 3D generation via score distillation sampling (SDS). Though promising results have been
🔥Text-to-3D Foundation Model🔥
We are excited to announce #3DTopia, a generalist 🧊text-to-3D🧊 foundation model, which produces **high-quality 3D assets within 5 minutes**
- Code:
- Video:
🔥Lumina-Next🔥 is a stronger and faster high-res text-to-image generation model. It also supports 1D (music) and 3D (point cloud) generation
- T2I Demo:
http://106.14.2.150:10020/
- Code:
- Report:
- Video:
📢Motion Capture from Any Video📢
The @Gradio demo for 🚀SMPLer-X🚀 (foundation model for monocular 3D human motion capture) is now online, thanks to @_akhaliq !
- Project:
- Code:
- Online Demo @huggingface :
#MotionDiffuse (Diffusion Model for Motion) now has both Colab and @huggingface demos. Feel free to play and generate your favorite animation clip using text :)
- Code:
- Colab Demo:
- @Gradio Demo:
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
abs:
project page:
propose MotionDiffuse, the first diffusion model-based text-driven motion generation framework
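For context, diffusion-based motion generation like MotionDiffuse rests on the standard DDPM machinery. The generic forward (noising) and reverse (denoising) processes, with a text condition c, are sketched below in illustrative notation, not the paper's exact equations:

```latex
% Generic DDPM forward (noising) and reverse (denoising) processes with a
% text condition c; illustrative notation, not MotionDiffuse's exact equations.
q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\big),
\qquad
p_\theta(x_{t-1} \mid x_t, c) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t, c),\ \Sigma_\theta(x_t, t)\big)
```

Here $x_t$ is the noised motion sequence at step $t$; sampling the reverse chain conditioned on $c$ gives the probabilistic text-to-motion mapping the tweet mentions.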
📢LLaVA-NeXT(-Video) Announced📢
* LLaVA-NeXT is one of the most competitive open-source VLMs today, approaching GPT-4V
* LLaVA-NeXT-Video extends this capability to long videos, outperforming all existing video LLMs
- Blog:
- Code:
LLaVA-NeXT is Expanded: Support Larger Models and Video Tasks
-🖼️Stronger LLMs Supercharge Multimodal Capabilities in the Wild
🗞️Blog 1:
-📽️A Strong Zero-shot Video Understanding Model
🗞️Blog 2:
-🌠Code:
🔥Unbounded 3D City Generation🔥
#CVPR2024
We propose 🏙️CityDreamer🏙️, a compositional generative model for synthesizing unbounded 3D cities @CVPR
- Project:
- Code:
- Demo @Gradio : , thanks to @_akhaliq !
CityDreamer: Compositional Generative Model of Unbounded 3D Cities
paper page:
In recent years, extensive research has focused on 3D natural scene generation, but the domain of 3D city generation has not received as much exploration. This is due to the
"Segment Any Point Cloud Sequences by Distilling Vision Foundation Models"
We introduce **Seal**, a novel framework that harnesses vision foundation models for segmenting diverse automotive point cloud sequences.
- Paper:
- Code:
We just launched a general video interaction platform based on LLMs, namely **Dolphin**.
Dolphin is a chatbot that can interact with videos, spanning from video understanding to generation/editing.
- Code:
- Demo @huggingface :
🔥Large-Vocabulary 3D Diffusion Model with Transformer🔥
#DiffTF generates massive categories of real-world 3D objects with a single feed-forward diffusion model
- Paper:
- Project:
- Code:
🤩Music to 3D Duet Dance Generation🤩
#ICLR2024
We propose 🕺Duolando💃, a GPT-based model that autoregressively predicts 3D motion for both the leader and the follower dancer @iclr_conf
- Project:
- Paper:
- Code:
"Learning without Forgetting for Vision-Language Models"
* We propose PROjectiOn Fusion (PROOF), which enables VLMs to learn without forgetting.
* Task-specific projections based on the frozen image/text encoders and multi-modal fusion are the key
- Paper:
The code of *SceneDreamer* has been open-sourced. Come and create your own consistent 3D world with only 2D image collections :)
- Project:
- Code:
- Demo @huggingface :
#ICLR2023
"Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction"
Our Voxurf is accepted to @iclr_conf as a **spotlight** presentation, achieving higher reconstruction quality with a 20x speedup.
- Paper:
- Code:
We propose MotionDiffuse, the first diffusion model-based text-driven motion generation framework, which enables 1) probabilistic mapping, 2) realistic synthesis and 3) multi-level manipulation.
- Code will be available at:
We have released an @OpenMMLab human pose and shape estimation toolbox, "MMHuman3D":
- Popular methods with a modular framework
- Various datasets with a unified data convention
- Versatile visualization toolbox
Welcome to use and contribute to #MMHuman3D
#CVPR2023
Our "Prompting in Vision" Tutorial was a huge success. Thanks so much to our amazing speakers and all the participants!
- The tutorial slides and recordings will be uploaded to our tutorial website:
Our #CVPR2020 **oral** paper "Self-Supervised Scene De-occlusion":
Paper:
Code:
Demo:
- A self-supervised framework that can recover occluded objects & their spatial orders from a single RGB image.
🚀🚀We present #HumanGaussian, an efficient **Text-to-3D Human** framework that generates high-quality 3D humans (geometry and texture) from text only
- Project:
- Paper:
- Code:
- Video:
Our work has been accepted to #CVPR2022:
- ViT generally has better OOD generalization ability than CNN under various distribution shifts.
- Incorporating DA techniques (e.g. adversarial learning, minimax entropy and SSL) into ViT further boosts its generalization ability.
"Delving Deep into the Generalization of Vision Transformers under Distribution Shifts":
Paper:
Code:
- We investigate the OOD generalization of vision transformers.
- We integrate domain adaptation techniques into transformers.
MIMIC-IT: Multi-Modal In-Context Instruction Tuning
paper page:
High-quality instructions and responses are essential for the zero-shot performance of large language models on interactive natural language tasks. For interactive vision-language tasks
🎞️Reenact Any Character in Movie🎞️
#NeurIPS2023
🔥SMPLer-X🔥 is the first foundation model for monocular 4D motion capture. Combine #SMPLerX and #Propainter to make your own *La La Land*!
- Code (SMPLer-X):
- Code (Propainter):
🔥🔥We propose #SEINE, a video diffusion model that focuses on generative transition and prediction.
#SEINE supports *video transition generation* and *image-to-video animation*
- Project:
- Paper:
- Code:
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
paper page:
Recently video generation has achieved substantial progress with realistic results. Nevertheless, existing AI-generated videos are usually very short clips
🔥Mamba with Longer Context🔥
We present 🐍LongMamba🐍, an early exploration of Mamba's **longer-context extrapolation ability**. Our #LongMamba retrieves *nearly perfectly* over a context window of 16384.
- Code:
- Model:
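A toy version of the passkey-retrieval probe behind the 16384-token claim can be sketched with Hugging Face `transformers` (Mamba support landed in v4.39). The checkpoint below is a placeholder base Mamba model, not the LongMamba weights:

```python
# Toy passkey-retrieval probe in the spirit of the 16384-token test above.
# The checkpoint is a placeholder base Mamba model, NOT the LongMamba weights;
# requires transformers>=4.39 for Mamba support.
import random
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "state-spaces/mamba-1.4b-hf"  # placeholder; swap in the released LongMamba model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

passkey = str(random.randint(10000, 99999))
filler = "The grass is green. The sky is blue. The sun is bright. " * 500  # pad toward ~16K tokens
prompt = (filler + f"The pass key is {passkey}. Remember it. " + filler
          + "What is the pass key? The pass key is")

inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=8)
print(tok.decode(out[0][inputs.input_ids.shape[1]:]))  # should contain the passkey
```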
🤩3D Generation Arena🤩
Lots of 3D generation models have come out recently. But which ones do people actually prefer?
Welcome to play with ~20 #3DGen models in our arena, with both text-to-3D and image-to-3D @_akhaliq
- 3DGen-Arena @huggingface :
📢📢Excited to release 3DGen-Arena, an open 3D benchmarking platform.
⚔️Two tracks: Text-to-3D & Image-to-3D.
🎯Nineteen models: 9 for Text & 13 for Image.
🏆The Leaderboard is waiting for your votes!
Let's play with 3D models and vote at !
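Arena-style leaderboards are typically ranked from pairwise votes; a standard Elo update, sketched below, illustrates the idea (3DGen-Arena's actual scoring scheme is an assumption here, not documented in this thread).

```python
# Hypothetical illustration: arena leaderboards are commonly ranked from
# pairwise votes with an Elo update. 3DGen-Arena's exact scoring scheme is an
# assumption here, not documented in this thread.
def elo_update(r_winner: float, r_loser: float, k: float = 32.0) -> tuple[float, float]:
    """Apply one Elo update after a single 'winner beats loser' vote."""
    expected = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))  # P(winner wins)
    return r_winner + k * (1.0 - expected), r_loser - k * (1.0 - expected)

ratings = {"model_a": 1000.0, "model_b": 1000.0}
ratings["model_a"], ratings["model_b"] = elo_update(ratings["model_a"], ratings["model_b"])
print(ratings)  # model_a gains exactly what model_b loses
```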
We have released a series of 3D/4D human datasets (including #GTAHuman and #HuMMan ) in *OpenXDLab*. Please feel free to check them out:
- GTA-Human:
- HuMMan:
- MMHuman3D @OpenMMLab :
DreamGaussian4D: Generative 4D Gaussian Splatting
paper page:
Remarkable progress has been made in 4D content generation recently. However, existing methods suffer from long optimization time, lack of motion controllability, and a low level of detail. In
We propose 🚀HyperHuman🚀, a hyper-realistic human image generation foundation model with better quality than Stable Diffusion XL.
- Project:
- Paper:
- Code:
- Demo:
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion
paper page:
Despite significant advances in large-scale text-to-image models, achieving hyper-realistic human image generation remains a desirable yet unsolved task. Existing
🔥Embodied vision-language agent that can program itself to play GTA🔥
🐙Octopus🐙 is an embodied VLM that plans intricate action sequences and generates executable code in complex environments.
- Project:
- Paper:
- Code:
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
paper page:
Large vision-language models (VLMs) have achieved substantial progress in multimodal perception and reasoning. Furthermore, when seamlessly integrated into an embodied
Our #CVPR2021 paper "Deep Animation Video Interpolation in the Wild":
Paper:
Code:
- An effective animation video interpolation framework as well as a large-scale animation triplet dataset (ATD-12K).
📢Text-to-3D Foundation Model📢
Our #3DTopia has major updates, with 1) a newly released technical report and 2) our own *refined captions* for the Objaverse quality set
- Code:
- Paper:
- Refined Objaverse:
🔥🔥MMBench: Is Your Multi-modal Model an All-around Player?
🧭MMBench🧭 is a systematically-designed benchmark for evaluating the various abilities of large multimodal models.
- Project:
- Paper:
- Code:
Thrilled to announce our 👨‍🎤Digital Life Project👩‍🎤
🔥🔥Autonomous 3D Characters with Social Intelligence
* All the stories, interactive dialogs, passive/active body motions in the demo video are generated by AI
- Project:
- Video:
🔥Large-Vocabulary 3D Diffusion Model🔥
#ICLR2024
🎯DiffTF🎯 generates massive (>200) categories of real-world 3D objects with a single feed-forward 3D diffusion model @iclr_conf
- Project:
- Paper:
- Code:
"Delving Deep into the Generalization of Vision Transformers under Distribution Shifts":
Paper:
Code:
- We investigate the OOD generalization of vision transformers.
- We integrated domain adaptation techniques into transformers.
🔥Interactive Text-to-Texture Synthesis🔥
We present #InTeX, an interactive framework for 3D text-to-texture synthesis, with *region repainting* and *real-time editing on a laptop*
- Project:
- Paper:
- Code:
#CVPR2023
Our F2-NeRF is accepted to @CVPR as a **highlight**. F2-NeRF 1) enables arbitrary input camera trajectories for novel view synthesis and 2) costs only a few minutes to train.
- More results:
- Code will be released at:
🔥Benchmarking #Sora Quantitatively🔥
We perform a *preliminary evaluation* of #Sora on our 📊VBench📊. #Sora undoubtedly outperforms all existing models, especially on the "video quality" and "dynamic" dimensions
- Code:
- Benchmark:
#CVPR2023
Our OmniObject3D is selected as an **award candidate** (top 0.1%, 12 out of 9155) @CVPR
Large-vocabulary high-quality real-scanned 3D objects for perception & generation.
- Project:
- Paper:
- Code:
🔥#PSG4D for Spatial Intelligence🔥
We introduce the 4D Panoptic Scene Graph (#PSG4D), which bridges dynamic 4D sensory data and high-level space-time understanding
* Nodes are entities, while edges are dynamic relations
- Paper:
- Code:
🔥Fine-Grained Text-to-Motion🔥
#NeurIPS2023
We present #FineMoGen, a diffusion-based and LLM-augmented framework that generates fine-grained motion from spatial-temporal prompts.
- Project:
- Paper:
- Code:
🔥Physics-based Text-to-Motion🔥
#NeurIPS2023
We present 👨‍🎤InsActor👩‍🎤, a generative framework that produces *diffusion policies* to synthesize motion for physics-based characters
- Project:
- Paper:
- Code:
InsActor: Instruction-driven Physics-based Characters
paper page:
Generating animation of physics-based characters with intuitive control has long been a desirable task with numerous applications. However, generating physically simulated animations that
#AvatarCLIP (Text2Avatar Model) generates and animates 3D avatars from natural-language descriptions of body shape, appearance and motion.
- Project Page:
- Demo Video:
- Code:
1. In 2022, text-to-image tech improved dramatically.
Heading into 2023, text-to-mesh, text-to-video, and text-to-audio models have all been demonstrated.
Today we play fortuneteller and explain how in 2023 you'll likely be able to create full 3D characters from text.
🧵
🚀🚀Fancy generating high-res (2K~4K) images using Stable Diffusion without an additional super-resolution module?
* Now, combining #FreeU with #ScaleCrafter, you can generate 4K images using SDXL for free!
- FreeU:
- ScaleCrafter:
FreeU: Free Lunch in Diffusion U-Net
paper page:
we uncover the untapped potential of diffusion U-Net, which serves as a "free lunch" that substantially improves the generation quality on the fly. We initially investigate the key contributions of the
🔥FreeU now has a major upgrade🔥
* By adding structure-aware scaling, FreeU excels at both structural coherence and detail preservation, greatly improving aesthetic quality over Stable Diffusion XL for free.
- Paper:
- Code:
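Because FreeU is training-free, it is a one-line switch at inference time. Below is a short sketch using the `enable_freeu` hook that `diffusers` ships for its pipelines; the scaling factors are illustrative, so check the FreeU repo for values tuned to each base model.

```python
# Sketch: FreeU is training-free, so it is a one-line switch at inference.
# `enable_freeu` is the hook diffusers ships for its pipelines; the scaling
# factors (b = backbone, s = skip) below are illustrative, not tuned values.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.enable_freeu(s1=0.9, s2=0.2, b1=1.3, b2=1.4)  # amplify backbone features, damp skips

image = pipe("an astronaut riding a horse, detailed, photorealistic").images[0]
image.save("freeu_sdxl.png")
```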
#SIGGRAPH2022
We present a text-driven controllable framework, Text2Human, for high-quality and diverse human generation from natural language.
Welcome to try our @siggraph work with a user interface at:
😼Move Anything in Your Picture😼
#CVPR2024
We propose 🏞️SceneDiffusion🏞️ to freely rearrange image layouts by layered scene diffusion @CVPR
* It supports a wide range of spatial editing operations, e.g., moving, resizing and layer-wise editing
- Paper:
We propose EVA3D, an unconditional 3D human generative model learned from 2D image collections.
#EVA3D can sample 3D humans with detailed geometry and render high-quality images.
- Project:
- Code:
- Video:
🔥Segment Any Point Cloud Sequences🔥
#NeurIPS2023
We introduce 🦭Seal🦭, a novel framework that harnesses vision foundation models for segmenting diverse automotive point cloud sequences @ldkong1205 @OpenMMLab
- Project:
- Code:
✨Consistent Video-to-Video Generation✨
#CVPR2024
We present 🎞️FRESCO🎞️ with *spatial-temporal correspondence* to produce high-quality coherent videos from text prompts @CVPR
- Project:
- Paper:
- Code:
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
The remarkable efficacy of text-to-image diffusion models has motivated extensive exploration of their potential application in video domains. Zero-shot methods seek to
We propose StyleFaceV to generate high-fidelity identity-preserving face videos with vivid movements.
- Our core insight is to decompose appearance/pose information and recompose them in StyleGAN3 to produce stable and dynamic results.
- Code:
📢 #ICLR2024 Welcome to check out our GenAI work @iclr_conf 📢
* Image Gen
- HyperHuman:
* Video Gen
- SEINE:
- FreeNoise:
* 3D Gen
- DreamGaussian:
- DiffTF:
Excited to see that our new 🦦Otter🦦 model "OTTER-Image-MPT7B" ranks 🔥top🔥 on several large multimodal model evaluation benchmarks.
- Code, demo and checkpoints:
🔥🔥We propose #VideoBooth to enable **customized video generation** with image prompts, which provide more accurate and direct content control beyond text prompts.
- Project:
- Code:
- Video:
VideoBooth: Diffusion-based Video Generation with Image Prompts
paper page:
Text-driven video generation witnesses rapid progress. However, merely using text prompts is not enough to depict the desired subject appearance that accurately aligns with
🔥PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds🔥
* We propose 🚹PointHPS🚺 for accurate 3D human pose and shape estimation from real-world point clouds
- Project:
- Paper:
- Code:
🤩Theme-Aware 3D Asset Generation🤩
#SIGGRAPH
We present ⛽️ThemeStation⛽️, which synthesizes customized 3D assets from a few exemplars that exhibit a shared theme @siggraph
- Project:
- Paper:
- Code:
ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars
Real-world applications often require a large gallery of 3D assets that share a consistent theme. While remarkable advances have been made in general 3D content creation from text or image, synthesizing
#SIGGRAPH2022
We propose AvatarCLIP, a zero-shot text-driven framework for 3D avatar generation and animation.
- It empowers layman users to customize a 3D avatar with the desired shape and texture, and drive the avatar with the described motions using solely natural languages.
AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars
abs:
project page:
github:
TL;DR: AvatarCLIP generates and animates avatars given descriptions of body shapes, appearances and motions.
#CVPR2023
"Collaborative Diffusion for Multi-Modal Face Generation and Editing"
Diffusion models collaborate to achieve multi-modal face generation without re-training @CVPR
- Project:
- Paper:
- Code:
⚡️Dynamic 4D Human Rendering⚡️
#CVPR2024
Our 🏄‍♂️SurMo🏄‍♂️ learns dynamic 4D human rendering from videos by surface-based modeling of temporal dynamics and human appearances @CVPR
- Project:
- Paper:
- Code:
We contribute GTA-Human, a large-scale 3D human dataset generated with the GTA-V game engine, featuring a diverse set of subjects, actions and scenarios.
- We discover that synthetic data provides critical complements to the real data.
- Data and models:
✨3D Human Diffusion Model✨
We present #StructLDM, a latent diffusion model with a high-dimensional structural latent space for 3D human generation
- Project:
- Paper:
- Code:
- Video:
Hate to say it, but AI girlfriends are definitely gonna be a thing.
StructLDM for instance lets you generate compositional and animatable humans by blending different body parts, identity swapping, local clothing editing, 3D virtual try-on, etc.
🔥Our multi-modal Otter has evolved🔥
#Otter now supports the newly released #Llama2
* We successfully trained a Flamingo-Llama2-Chat7B on CC3M in 5 hours using just 4 A100s
* The model showed promising zero-shot captioning skills
- Code and models:
🔥Video Generation with Image Prompts🔥
#CVPR2024
We propose *video generation with image prompts*, 📽️VideoBooth📽️, providing more direct content control beyond text prompts @CVPR
- Project:
- Paper:
- Code:
🔥Evaluating 3D Generation with GPT-4V🔥
With carefully designed instructions, GPT-4V serves as an automatic 3D generation evaluator that *strongly aligns with human preference*
- Project:
- Paper:
- Code:
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
paper page:
Despite recent advances in text-to-3D generative methods, there is a notable absence of reliable evaluation metrics. Existing metrics usually focus on a single criterion
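The general recipe can be hedged into a toy sketch: show GPT-4V two renders of the same prompt and ask for a pairwise preference via the OpenAI API. The paper's instructions and criteria are far more carefully designed than this; the model name, file names, and one-line prompt are placeholders.

```python
# Hedged sketch of the recipe: show GPT-4V two renders of the same prompt and
# ask for a pairwise preference. The paper's instructions/criteria are far more
# carefully designed; model name, file names and prompt here are placeholders.
import base64
from openai import OpenAI

client = OpenAI()

def b64(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Text prompt: 'a wooden chair'. Which 3D "
                                     "render matches it better, A or B? Reply with one letter."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64('render_a.png')}"}},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64('render_b.png')}"}},
        ],
    }],
)
print(resp.choices[0].message.content)  # aggregate many such votes into a ranking
```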
🔥Large Multi-View Gaussian Model (LGM)🔥
We introduce #LGM, a feed-forward foundation model for text-to-3D and image-to-3D, which generates high-res 3D content in 5s
- Project:
- Code:
- Demo @huggingface :
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
paper page:
3D content creation has achieved significant progress in terms of both quality and speed. Although current feed-forward models can produce 3D objects in seconds,
#NeurIPS2023
We propose 🔥PrimDiffusion🔥, a volumetric primitives diffusion model for 3D human generation, enabling explicit pose, view, and shape control with off-body topology.
- Project:
- Code:
- Video:
🔥Text-to-3D Foundation Model🔥
We present #3DTopia, a two-stage text-to-3D foundation model. The first stage quickly generates 3D candidates; the second stage refines the chosen 3D asset to high quality.
- Code:
- Demo @Gradio :
We propose **DeepFake-Adapter**, which effectively adapts a pre-trained ViT by letting high-level semantics from the ViT organically interact with global and local low-level forgery cues from the adapters.
- Paper:
- Code:
We are organizing the #OmniObject3D challenge @ICCVConference with two competition tracks:
1) Track 1: sparse-view 3D reconstruction
2) Track 2: 3D object generation
- Challenge period: Aug 1 - Sep 15, 2023
- Homepage:
- CodaLab:
#ICCV2023
We present #Text2Performer to generate high-resolution, vivid human videos with articulated motions from text prompts @ICCVConference.
- Project:
- Paper:
- Code:
- Demo:
🔥OmniObject3D Update🔥
We released the fine-grained textual descriptions for #OmniObject3D, which are manually annotated from 5 aspects: *summary*, *appearance*, *material*, *style* and *function*.
- Project:
- Code and data:
#ICCV2023
We present 🔥StyleGANEX🔥 @ICCVConference , a next-generation StyleGAN architecture that can render unaligned images/videos for in-the-wild editing, SR and stylization.
- Project:
- Paper:
- Code:
#ICCV2023
We propose 🔥SHERF🔥 @ICCVConference , the first *generalizable* Human NeRF model for recovering an *animatable* 3D human from a single image.
- Project:
- Paper:
- Code:
- Demo:
🔥Long Context from Language to Vision🔥
#LongVA can process 2000 frames or over 200K visual tokens, with SoTA performance on Video-MME among 7B models
- Paper:
- Code:
- Demo @Gradio : . Thanks to @_akhaliq !
🤩Long Video Assistant (LongVA): Breakthrough in long 🎥video understanding!
- Transfers long context capability from language to vision 🧠
- Only open-source model supporting 384 input frames🤩
- Handles 2000+ frames (200K+ visual tokens) 🤯
- SoTA on Video-MME among 7B models
-
🔥One-Stop Evaluation Suite of Large Multimodal Models (LMM)🔥
We present 📊lmms-eval📊, a one-command evaluation API for thorough evaluation of LMMs over 40 datasets.
- Code:
- Blog:
- Datasets @huggingface :
Accelerating the Development of Large Multimodal Models with LMMs-Eval
Repo:
Blog:
We are offering a one-command evaluation API for fast and thorough evaluation of LMMs over 39 datasets (and growing).
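A minimal sketch of the advertised one-command usage, launched from Python via `subprocess`; the flag names and the model/task identifiers mirror the project's README at the time, but treat them as assumptions to verify against the repo.

```python
# Illustrative "one command" lmms-eval run, launched from Python. Flag names
# and the model/task identifiers are assumptions based on the project's README;
# verify them against the repo before use.
import subprocess

subprocess.run(
    [
        "python", "-m", "lmms_eval",
        "--model", "llava",
        "--model_args", "pretrained=liuhaotian/llava-v1.5-7b",
        "--tasks", "mme,mmbench_en",
        "--batch_size", "1",
        "--output_path", "./logs/",
    ],
    check=True,
)
```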
#ICCV2023
We present 🔥SparseNeRF🔥 @ICCVConference , which synthesizes novel views given a few images.
SparseNeRF distills a local depth ranking prior from real-world depth observations.
- Project:
- Paper:
- Code: