Ziwei Liu Profile
Ziwei Liu

@liuziwei7

8,830 Followers · 1,090 Following · 192 Media · 1,146 Statuses

Assistant Professor @ NTU - Vision, Learning and Graphics.

Singapore
Joined January 2018
@liuziwei7
Ziwei Liu
8 months
🔥🔥We are excited to announce #Vchitect , an open-source project for video generative models @huggingface 📽️LaVie (Text2Video Model) - Code: - 📽️SEINE (Image2Video Model) - Code: -
35 replies · 343 reposts · 1K likes
@liuziwei7
Ziwei Liu
1 year
We are presenting **SAD** (Segment Any RGBD): SAD is able to perform 3D segmentation (segment out any 3D object) with RGBD inputs (or rendered depth images only). - Code: - Demo @huggingface :
19 replies · 277 reposts · 1K likes
@liuziwei7
Ziwei Liu
10 months
🔥MoCap Anybody🔥 #NeurIPS2023 We propose *SMPLer-X*, the first generalist foundation model for 3D/4D human motion capture from monocular inputs. - Project: - Paper: - Code: - Demo:
19 replies · 174 reposts · 716 likes
@liuziwei7
Ziwei Liu
1 year
Check out **RAM** (Relate Anything Model) ! - We empower Segment Anything Model (SAM) with the capability to recognize various visual relations between different visual concepts. - Code: - Demo @huggingface :
24 replies · 124 reposts · 610 likes
@liuziwei7
Ziwei Liu
1 year
Thrilled to announce **Otter**, a multi-modal in-context learning model with instruction tuning: 1) Chatbot w/ image, video, 3D 2) Only need 4x 3090 GPUs 3) Better than OpenFlamingo - Code: - Demo: - Video:
10 replies · 125 reposts · 553 likes
@liuziwei7
Ziwei Liu
5 months
🔥Open-Source Video Diffusion Transformer🔥 Similar to #Sora , our ☕️Latte☕️ model also has a *video diffusion transformer* architecture, with a thorough study of the design space of *LDM + DiT* for video generation - Project: - Code:
@OpenAI
OpenAI
5 months
Introducing Sora, our text-to-video model. Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. Prompt: “Beautiful, snowy
(quoted) 10K replies · 33K reposts · 139K likes
14 replies · 128 reposts · 521 likes
@liuziwei7
Ziwei Liu
5 months
🔥Generative Gaussian Splatting🔥 #ICLR2024 Our 🚀DreamGaussian🚀 is accepted to @iclr_conf as **oral presentation**, enabling high-quality 3D generation in 2 minutes - Project: - Code: - Demo @huggingface :
@_akhaliq
AK
10 months
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation paper page: Recent advances in 3D content creation mostly leverage optimization-based 3D generation via score distillation sampling (SDS). Though promising results have been
(quoted) 2 replies · 86 reposts · 449 likes
5 replies · 99 reposts · 425 likes
@liuziwei7
Ziwei Liu
1 year
Our F2-NeRF is open-sourced at: . Feel free to create your own free-trajectory NeRF within 10min :)
@_akhaliq
AK
1 year
F2-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories abs: project page:
(quoted) 3 replies · 127 reposts · 648 likes
2 replies · 79 reposts · 422 likes
@liuziwei7
Ziwei Liu
4 years
We are releasing #OpenSelfSup , an open-source library for self-supervised learning: - *All methods in one repository*, supporting PIRL, MoCo, SimCLR, etc. - *All benchmarks in one repository*. - *Efficiency*, supporting multi-GPU distributed training.
2 replies · 132 reposts · 421 likes
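For readers unfamiliar with the methods listed above: MoCo and SimCLR both optimize a contrastive InfoNCE objective over two augmented views of each image. A self-contained sketch written from the published formulation (not code taken from OpenSelfSup itself):

```python
import torch
import torch.nn.functional as F

def info_nce(query: torch.Tensor, key: torch.Tensor, temperature: float = 0.07):
    """query, key: (N, D) embeddings of two augmented views of the same N images."""
    q = F.normalize(query, dim=1)
    k = F.normalize(key, dim=1)
    logits = q @ k.t() / temperature                    # (N, N) cosine similarities
    labels = torch.arange(q.size(0), device=q.device)   # matching pairs lie on the diagonal
    return F.cross_entropy(logits, labels)              # pull positives, push negatives

# e.g. loss = info_nce(encoder(view1), momentum_encoder(view2))
```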
@liuziwei7
Ziwei Liu
2 years
We have released DeepFashion-MultiModal, a large-scale high-quality human dataset with rich multi-modal annotations: - 40K high-resolution human images - Human parsing, keypoints and DensePose annotations - Attribute and textual description annotations
@_akhaliq
AK
2 years
Text2Human: Text-Driven Controllable Human Image Generation abs: project page: github:
(quoted) 1 reply · 54 reposts · 256 likes
5 replies · 85 reposts · 418 likes
@liuziwei7
Ziwei Liu
10 months
🔥🔥 We propose #DreamGaussian , a Generative Gaussian Splatting framework that produces high-quality textured meshes in just 2 minutes from a single-view image. - Project: - Paper: - Code:
@_akhaliq
AK
10 months
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation paper page: Recent advances in 3D content creation mostly leverage optimization-based 3D generation via score distillation sampling (SDS). Though promising results have been
(quoted) 2 replies · 86 reposts · 449 likes
6 replies · 88 reposts · 413 likes
@liuziwei7
Ziwei Liu
1 year
Now you can create a 3D fantasy world by combining #SceneDreamer with #ControlNet :) - Code: - Demo @huggingface :
@_akhaliq
AK
1 year
SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections abs: project page:
(quoted) 12 replies · 181 reposts · 734 likes
6 replies · 93 reposts · 381 likes
@liuziwei7
Ziwei Liu
6 months
🔥Text-to-3D Foundation Model🔥 We are excited to announce #3DTopia , a generalist 🧊text-to-3D🧊 foundation model, which produces **high-quality 3D assets within 5 minutes** - Code: - Video:
9 replies · 101 reposts · 383 likes
@liuziwei7
Ziwei Liu
1 month
🔥Lumina-Next🔥 is a stronger and faster high-res text-to-image generation model. It also supports 1D (music) and 3D (point cloud) generation - T2I Demo: http://106.14.2.150:10020/ - Code: - Report: - Video:
5 replies · 91 reposts · 378 likes
@liuziwei7
Ziwei Liu
4 months
📢Motion Capture from Any Video📢 The @Gradio demo for 🚀SMPLer-X🚀 (foundation model for monocular 3D human motion capture) is now online thanks to @_akhaliq ! - Project: - Code: - Online Demo @huggingface :
@liuziwei7
Ziwei Liu
10 months
🔥MoCap Anybody🔥 #NeurIPS2023 We propose *SMPLer-X*, the first generalist foundation model for 3D/4D human motion capture from monocular inputs. - Project: - Paper: - Code: - Demo:
(quoted) 19 replies · 174 reposts · 716 likes
5 replies · 70 reposts · 377 likes
@liuziwei7
Ziwei Liu
2 years
#MotionDiffuse (Diffusion Model for Motion) now has both Colab and @huggingface demos. Feel free to play and generate your favorite animation clip using text :) - Code: - Colab Demo: - @Gradio Demo:
@_akhaliq
AK
2 years
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model abs: project page: propose MotionDiffuse, the first diffusion model-based text-driven motion generation framework
(quoted) 9 replies · 170 reposts · 770 likes
7 replies · 91 reposts · 349 likes
@liuziwei7
Ziwei Liu
2 months
📢LLaVA-NeXT(-Video) Announced📢 * LLaVA-NeXT is one of the most competitive open-source VLMs today, approaching GPT-4V * LLaVA-NeXT-Video extends this capability to long videos, outperforming all existing video LLMs - Blog: - Code:
@ChunyuanLi
Chunyuan Li
2 months
LLaVA-NeXT is Expanded: Support Larger Models and Video Tasks -🖼️Stronger LLMs Supercharge Multimodal Capabilities in the Wild 🗞️Blog 1: -📽️A Strong Zero-shot Video Understanding Model 🗞️Blog 2: -🌠Code:
(quoted) 3 replies · 54 reposts · 208 likes
1 reply · 99 reposts · 340 likes
@liuziwei7
Ziwei Liu
4 months
🔥Unbounded 3D City Generation🔥 #CVPR2024 We propose 🏙️CityDreamer🏙️, a compositional generative model for synthesizing unbounded 3D cities @CVPR - Project: - Code: - Demo @Gradio : , thanks to @_akhaliq !
@_akhaliq
AK
11 months
CityDreamer: Compositional Generative Model of Unbounded 3D Cities paper page: In recent years, extensive research has focused on 3D natural scene generation, but the domain of 3D city generation has not received as much exploration. This is due to the
(quoted) 6 replies · 131 reposts · 613 likes
4 replies · 79 reposts · 341 likes
@liuziwei7
Ziwei Liu
1 year
"Segment Any Point Cloud Sequences by Distilling Vision Foundation Models" We introduce **Seal**, a novel framework that harnesses vision foundation models for segmenting diverse automotive point cloud sequences. - Paper: - Code:
2 replies · 79 reposts · 310 likes
@liuziwei7
Ziwei Liu
1 year
We just launched a general video interaction platform based on LLMs, namely **Dolphin**. Dolphin is a chatbot that can interact with videos, spanning from video understanding to generation/editing. - Code: - Demo @huggingface :
13 replies · 89 reposts · 314 likes
@liuziwei7
Ziwei Liu
2 years
The code and models of #MotionDiffuse have been open-sourced at:
@_akhaliq
AK
2 years
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model abs: project page: propose MotionDiffuse, the first diffusion model-based text-driven motion generation framework
(quoted) 9 replies · 170 reposts · 770 likes
4 replies · 63 reposts · 310 likes
@liuziwei7
Ziwei Liu
10 months
🔥Large-Vocabulary 3D Diffusion Model with Transformer🔥 #DiffTF generates massive categories of real-world 3D objects with a single feed-forward diffusion model - Paper: - Project: - Code:
2 replies · 55 reposts · 277 likes
@liuziwei7
Ziwei Liu
4 months
🤩Music to 3D Duet Dance Generation🤩 #ICLR2024 We propose 🕺Duolando💃, a GPT-based model that autoregressively predicts 3D motion for both the leader and the follower dancers @iclr_conf - Project: - Paper: - Code:
7 replies · 54 reposts · 273 likes
@liuziwei7
Ziwei Liu
1 year
"Learning without Forgetting for Vision-Language Models" * We propose Project Fusion (PROOF) that enables VLMs to learn without forgetting. * Task-specific projections based on the frozen image/text encoders and multi-modal fusion are the keys - Paper:
1 reply · 68 reposts · 257 likes
@liuziwei7
Ziwei Liu
1 year
The code of *SceneDreamer* has been open-sourced. Come and create your own consistent 3D world with only 2D image collections :) - Project: - Code: - Demo @huggingface :
@_akhaliq
AK
1 year
SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections abs: project page:
(quoted) 12 replies · 181 reposts · 734 likes
1 reply · 74 reposts · 262 likes
@liuziwei7
Ziwei Liu
2 years
#Text2Light generates a VR environment (in the form of an HDR panorama) from a natural language description. - Project Page: - Demo Video: - Code:
@ScottieFoxTTV
ScottieFox
2 years
Stable Diffusion VR - MiDaS Test 6DoF Real-time immersive latent space -using depth maps. Experimenting with freedom of motion & depth. Tools used: #aiart #vr #stablediffusionart #touchdesigner #deforum
(quoted) 74 replies · 642 reposts · 2K likes
2 replies · 54 reposts · 258 likes
@liuziwei7
Ziwei Liu
1 year
#ICLR2023 "Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction" Our Voxurf is accepted to @iclr_conf as **spotlight** presentation, achieving higher reconstruction quality with 20x speedup. - Paper: - Code:
3 replies · 37 reposts · 252 likes
@liuziwei7
Ziwei Liu
3 months
🔥Motion Generation Foundation Model🔥 Our 🏃‍♂️Large Motion Model ( #LMM )🏃‍♀️ unifies mainstream motion generation tasks (e.g., text2motion, music2dance) into a generalist transformer - Project: - Paper: - Code:
4 replies · 49 reposts · 249 likes
@liuziwei7
Ziwei Liu
2 years
The motion generation code of our AvatarCLIP has been released: Feel free to generate your favorite avatar with a signature move :)
@_akhaliq
AK
2 years
AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars abs: project page: github: TL;DR: AvatarCLIP generate and animate avatars given descriptions of body shapes, appearances and motions.
(quoted) 2 replies · 52 reposts · 235 likes
1 reply · 37 reposts · 246 likes
@liuziwei7
Ziwei Liu
2 years
We propose MotionDiffuse, the first diffusion model-based text-driven motion generation framework, which enables 1) probabilistic mapping, 2) realistic synthesis and 3) multi-level manipulation. - Code will be available at:
@_akhaliq
AK
2 years
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model abs: project page: propose MotionDiffuse, the first diffusion model-based text-driven motion generation framework
(quoted) 9 replies · 170 reposts · 770 likes
4 replies · 34 reposts · 240 likes
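The "probabilistic mapping" above is the standard diffusion recipe applied to pose sequences: a motion tensor is denoised from Gaussian noise under text conditioning. A hypothetical sketch of the sampling loop (the `denoiser`/`scheduler` interface and the feature width are assumptions, not MotionDiffuse's code):

```python
import torch

@torch.no_grad()
def sample_motion(denoiser, scheduler, text_emb, n_frames=196, feat_dim=64):
    x = torch.randn(1, n_frames, feat_dim)   # start from pure noise over the whole clip
    for t in scheduler.timesteps:            # e.g. T=1000 down to 0
        eps = denoiser(x, t, text_emb)       # predict the noise, conditioned on text
        x = scheduler.step(eps, t, x)        # one reverse-diffusion update
    return x                                 # denoised per-frame pose features
```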
@liuziwei7
Ziwei Liu
3 years
We have released an @OpenMMLab human pose and shape estimation toolbox "MMHuman3D": - Popular methods with a modular framework - Various datasets with a unified data convention - Versatile visualization toolbox Welcome to use and contribute #MMHuman3D
0 replies · 49 reposts · 230 likes
@liuziwei7
Ziwei Liu
1 year
#CVPR2023 Our "Prompting in Vision" Tutorial was a huge success. Thanks so much to our amazing speakers and all the participants! - The tutorial slides and recordings will be uploaded to our tutorial website:
4 replies · 55 reposts · 222 likes
@liuziwei7
Ziwei Liu
4 years
Our #CVPR2020 **oral** paper "Self-Supervised Scene De-occlusion": Paper: Code: Demo: - A self-supervised framework that can recover occluded objects & their spatial orders from a single RGB image.
0 replies · 62 reposts · 216 likes
@liuziwei7
Ziwei Liu
7 months
🚀🚀We present #HumanGaussian , an efficient **Text-to-3D Human** framework that generates high-quality 3D humans (geometry and texture) from text only - Project: - Paper: - Code: - Video:
2 replies · 64 reposts · 214 likes
@liuziwei7
Ziwei Liu
2 years
Our work has been accepted to #CVPR2022 : - ViT generally has better OOD generalization ability than CNN under various distribution shifts. - Incorporating DA techniques (e.g. adversarial learning, minimax entropy and SSL) into ViT further boosts its generalization ability.
@liuziwei7
Ziwei Liu
3 years
"Delving Deep into the Generalization of Vision Transformers under Distribution Shifts": Paper: Code: - We investigate the OOD generalization of vision transformers. - We integrated domain adaptation techniques into transformers.
(quoted) 3 replies · 42 reposts · 175 likes
2 replies · 33 reposts · 210 likes
@liuziwei7
Ziwei Liu
1 year
Multi-Modal In-Context Instruction Tuning (MIMIC-IT) * 2.8 million multimodal instruction-response pairs from images and videos * Conversational contexts aimed at empowering VLMs - Project: - Code: - Video:
@_akhaliq
AK
1 year
MIMIC-IT: Multi-Modal In-Context Instruction Tuning paper page: High-quality instructions and responses are essential for the zero-shot performance of large language models on interactive natural language tasks. For interactive vision-language tasks
(quoted) 1 reply · 21 reposts · 95 likes
1 reply · 45 reposts · 202 likes
@liuziwei7
Ziwei Liu
7 months
🎞️Reenact Any Character in Movie🎞️ #NeurIPS2023 🔥SMPLer-X🔥 is the first foundation model for monocular 4D motion capture. Combining #SMPLerX and #Propainter to make your own *La La Land*! - Code (SMPLer-X): - Code (Propainter):
@liuziwei7
Ziwei Liu
10 months
🔥MoCap Anybody🔥 #NeurIPS2023 We propose *SMPLer-X*, the first generalist foundation model for 3D/4D human motion capture from monocular inputs. - Project: - Paper: - Code: - Demo:
(quoted) 19 replies · 174 reposts · 716 likes
3 replies · 51 reposts · 204 likes
@liuziwei7
Ziwei Liu
9 months
🔥🔥We propose #SEINE , a video diffusion model that focuses on generative transition and prediction. #SEINE supports *video transition generation* and *image-to-video animation* - Project: - Paper: - Code:
@_akhaliq
AK
9 months
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction paper page: Recently video generation has achieved substantial progress with realistic results. Nevertheless, existing AI-generated videos are usually very short clips
(quoted) 2 replies · 46 reposts · 167 likes
9 replies · 45 reposts · 198 likes
@liuziwei7
Ziwei Liu
5 months
🔥Mamba with Longer Context🔥 We present 🐍LongMamba🐍, an early exploration of Mamba's **longer context extrapolation ability**. Our #LongMamba retrieves *nearly perfectly* over a 16,384-token context window. - Code: - Model:
3 replies · 43 reposts · 203 likes
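The 16,384-token claim refers to a passkey-retrieval-style probe: hide a short secret inside a long stretch of filler and ask the model to recall it. Below is a toy version of such a harness, assuming a Hugging Face-style tokenizer; the prompt scheme is generic, not LongMamba's exact evaluation script:

```python
import random

def make_passkey_prompt(tokenizer, target_tokens=16384):
    passkey = str(random.randint(10000, 99999))
    filler = "The grass is green. The sky is blue. "   # distractor text, repeated
    needle = f" The passkey is {passkey}. Remember it. "
    prompt = filler
    while len(tokenizer(prompt).input_ids) < target_tokens:
        prompt += filler                               # grow context to the target length
    mid = len(prompt) // 2                             # bury the needle mid-context
    prompt = prompt[:mid] + needle + prompt[mid:]
    return prompt + " What is the passkey? The passkey is", passkey
```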
@liuziwei7
Ziwei Liu
2 months
🤩3D Generation Arena🤩 Lots of 3D generation models have come out recently. But which one do people actually prefer? ** Welcome to play with ~20 #3DGen models in our arena, with both text-to-3D and image-to-3D @_akhaliq - 3DGen-Arena @huggingface :
@YuhanZh89127485
Yuhan Zhang
2 months
📢📢Excited to release 3DGen-Arena, an open 3D benchmarking platform. ⚔️Two tracks: Text-to-3D & Image-to-3D. 🎯Nineteen models: 9 for Text & 13 for Image. 🏆The Leaderboard is waiting for your votes! Let's play with 3D models and vote at !
(quoted) 5 replies · 28 reposts · 96 likes
3 replies · 37 reposts · 196 likes
@liuziwei7
Ziwei Liu
2 years
We have released a series of 3D/4D human datasets (including #GTAHuman and #HuMMan ) in *OpenXDLab*. Please feel free to check them out: - GTA-Human: - HuMMan: - MMHuman3D @OpenMMLab :
1 reply · 33 reposts · 194 likes
@liuziwei7
Ziwei Liu
7 months
🔥Text/Image-to-4D Generation🔥 We introduce #DreamGaussian4D , a 4D generative model with 4D Gaussian Splatting representation, achieving high-quality 4D generation within 5min. - Project: - Paper: - Code:
@_akhaliq
AK
7 months
DreamGaussian4D: Generative 4D Gaussian Splatting paper page: Remarkable progress has been made in 4D content generation recently. However, existing methods suffer from long optimization time, lack of motion controllability, and a low level of detail. In
(quoted) 2 replies · 42 reposts · 203 likes
0 replies · 45 reposts · 187 likes
@liuziwei7
Ziwei Liu
9 months
We propose 🚀HyperHuman🚀, a hyper-realistic human image generation foundation model with better quality than Stable Diffusion XL. - Project: - Paper: - Code: - Demo:
@_akhaliq
AK
9 months
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion paper page: Despite significant advances in large-scale text-to-image models, achieving hyper-realistic human image generation remains a desirable yet unsolved task. Existing
(quoted) 12 replies · 80 reposts · 448 likes
11 replies · 41 reposts · 185 likes
@liuziwei7
Ziwei Liu
2 years
#CVPR2022 We propose DualStyleGAN, a high-resolution and data-efficient framework for portrait style transfer to various cartoon styles.
@_akhaliq
AK
2 years
Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer project page: github:
(quoted) 2 replies · 97 reposts · 466 likes
4 replies · 22 reposts · 188 likes
@liuziwei7
Ziwei Liu
2 years
#SIGGRAPHAsia2022 We propose VToonify @siggraph , a controllable high-resolution (1K+) portrait video style transfer framework. VToonify generates high-quality artistic portrait videos with flexible style controls. - Paper: - Code:
@_akhaliq
AK
2 years
VToonify: Controllable High-Resolution Portrait Video Style Transfer abs: project page: github:
(quoted) 11 replies · 259 reposts · 1K likes
0 replies · 46 reposts · 187 likes
@liuziwei7
Ziwei Liu
9 months
🔥Embodied vision-language agent that can program itself to play GTA🔥 🐙Octopus🐙 is an embodied VLM that plans intricate action sequences and generates executable code in complex environments. - Project: - Paper: - Code:
@_akhaliq
AK
9 months
Octopus: Embodied Vision-Language Programmer from Environmental Feedback paper page: Large vision-language models (VLMs) have achieved substantial progress in multimodal perception and reasoning. Furthermore, when seamlessly integrated into an embodied
(quoted) 7 replies · 57 reposts · 232 likes
5 replies · 42 reposts · 185 likes
@liuziwei7
Ziwei Liu
3 years
Our #CVPR2021 paper "Deep Animation Video Interpolation in the Wild": Paper: Code: - An effective animation video interpolation framework as well as a large-scale animation triplet dataset (ATD-12K).
3 replies · 36 reposts · 181 likes
@liuziwei7
Ziwei Liu
4 months
📢Text-to-3D Foundation Model📢 Our #3DTopia has major updates, with 1) newly released technical report, and 2) our own *refined captions* for the Objaverse quality set - Code: - Paper: - Refined Objaverse:
@_akhaliq
AK
5 months
3DTopia demo is out on Hugging Face demo: text-to-3D foundation model, which produces high-quality 3D assets within 5 minutes
(quoted) 7 replies · 64 reposts · 296 likes
0 replies · 45 reposts · 182 likes
@liuziwei7
Ziwei Liu
1 year
🔥🔥MMBench: Is Your Multi-modal Model an All-around Player? 🧭MMBench🧭 is a systematically-designed benchmark for evaluating the various abilities of large multimodal models. - Project: - Paper: - Code:
0 replies · 46 reposts · 180 likes
@liuziwei7
Ziwei Liu
7 months
Thrilled to announce our 👨‍🎤Digital Life Project👩‍🎤 🔥🔥Autonomous 3D Characters with Social Intelligence * All the stories, interactive dialogs, passive/active body motions in the demo video are generated by AI - Project: - Video:
2 replies · 40 reposts · 177 likes
@liuziwei7
Ziwei Liu
6 months
🔥Large-Vocabulary 3D Diffusion Model🔥 #ICLR2024 🎯DiffTF🎯 generates massive (>200) categories of real-world 3D objects with a single feed-forward 3D diffusion model @iclr_conf - Project: - Paper: - Code:
2 replies · 34 reposts · 176 likes
@liuziwei7
Ziwei Liu
3 years
"Delving Deep into the Generalization of Vision Transformers under Distribution Shifts": Paper: Code: - We investigate the OOD generalization of vision transformers. - We integrated domain adaptation techniques into transformers.
3 replies · 42 reposts · 175 likes
@liuziwei7
Ziwei Liu
4 months
🔥Interactive Text-to-Texture Synthesis🔥 We present #InTeX , an interactive framework for 3D text-to-texture synthesis, with *region repainting* and *real-time editing on laptop* - Project: - Paper: - Code:
2 replies · 37 reposts · 175 likes
@liuziwei7
Ziwei Liu
1 year
#CVPR2023 Our F2-NeRF is accepted to @CVPR as **highlight**. F2-NeRF 1) enables arbitrary input camera trajectories for novel view synthesis and 2) costs only a few minutes to train. - More results: - Code will be released at:
@_akhaliq
AK
1 year
F2-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories abs: project page:
(quoted) 3 replies · 127 reposts · 648 likes
3 replies · 26 reposts · 172 likes
@liuziwei7
Ziwei Liu
5 months
🔥Image-to-Video Foundation Model🔥 #ICLR2024 Our 🚀SEINE🚀 is accepted to @iclr_conf , which enables high-quality *image-to-video* and *video transition generation* - Project: - Code: - Demo @huggingface :
@_akhaliq
AK
9 months
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction paper page: Recently video generation has achieved substantial progress with realistic results. Nevertheless, existing AI-generated videos are usually very short clips
(quoted) 2 replies · 46 reposts · 167 likes
2 replies · 38 reposts · 174 likes
@liuziwei7
Ziwei Liu
5 months
🔥Benchmarking #Sora Quantitatively🔥 We perform a *preliminary evaluation* of #Sora on our 📊VBench📊. #Sora undoubtedly outperforms all existing models, especially on the "video quality" and "dynamic" dimensions - Code: - Benchmark:
@OpenAI
OpenAI
5 months
Introducing Sora, our text-to-video model. Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. Prompt: “Beautiful, snowy
(quoted) 10K replies · 33K reposts · 139K likes
7 replies · 39 reposts · 172 likes
@liuziwei7
Ziwei Liu
10 months
🔥🔥 We propose 🏙️CityDreamer🌆, a compositional generative model designed specifically for unbounded 3D cities. - Project: - Paper: - Code: - Demo:
@_akhaliq
AK
11 months
CityDreamer: Compositional Generative Model of Unbounded 3D Cities paper page: In recent years, extensive research has focused on 3D natural scene generation, but the domain of 3D city generation has not received as much exploration. This is due to the
(quoted) 6 replies · 131 reposts · 613 likes
3 replies · 27 reposts · 166 likes
@liuziwei7
Ziwei Liu
1 year
The training code of EVA3D is also released. Now you can train this high-quality 3D human GAN on your customized datasets: - Code and models:
@_akhaliq
AK
2 years
EVA3D: Compositional 3D Human Generation from 2D Image Collections abs: project page:
(quoted) 3 replies · 98 reposts · 501 likes
1 reply · 37 reposts · 164 likes
@liuziwei7
Ziwei Liu
1 year
#CVPR2023 Our OmniObject3D is selected as **award candidate** (top 0.1%, 12 out of 9155) @CVPR Large-vocabulary high-quality real-scanned 3D objects for perception & generation. - Project: - Paper: - Code:
@_akhaliq
AK
1 year
OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation abs: project page:
(quoted) 1 reply · 30 reposts · 210 likes
4 replies · 22 reposts · 161 likes
@liuziwei7
Ziwei Liu
2 months
🔥 #PSG4D for Spatial Intelligence🔥 We introduce 4D Panoptic Scene Graph ( #PSG4D ) that bridges dynamic 4D sensory data and high-level space-time understanding *Nodes are entities while edges are dynamic relations - Paper: - Code:
1 reply · 42 reposts · 163 likes
@liuziwei7
Ziwei Liu
6 months
🔥Fine-Grained Text-to-Motion🔥 #NeurIPS2023 We present #FineMoGen , a diffusion-based and LLM-augmented framework that generates fine-grained motion from spatial-temporal prompts. - Project: - Paper: - Code:
0 replies · 35 reposts · 162 likes
@liuziwei7
Ziwei Liu
6 months
🔥Physics-based Text-to-Motion🔥 #NeurIPS2023 We present 👨‍🎤InsActor👩‍🎤, a generative framework that produces *diffusion policies* to synthesize motion for physics-based characters - Project: - Paper: - Code:
@_akhaliq
AK
7 months
InsActor: Instruction-driven Physics-based Characters paper page: Generating animation of physics-based characters with intuitive control has long been a desirable task with numerous applications. However, generating physically simulated animations that
(quoted) 1 reply · 42 reposts · 216 likes
1 reply · 39 reposts · 164 likes
@liuziwei7
Ziwei Liu
2 years
#AvatarCLIP (Text2Avatar Model) generates and animates a 3D avatar from a natural language description of body shape, appearance and motion. - Project Page: - Demo Video: - Code:
@aifunhouse
aifunhouse
2 years
1. In 2022, text-to-image tech has improved dramatically. Heading into 2023, text-to-mesh, text-to-video, and text-to-audio models have all been demonstrated. Today we play fortuneteller and explain how in 2023 you'll likely be able to create full 3D characters from text. 🧵
(quoted) 6 replies · 24 reposts · 124 likes
6 replies · 41 reposts · 163 likes
@liuziwei7
Ziwei Liu
8 months
🔥When importing #CityDreamer into UE5🔥 - Project:
@_akhaliq
AK
11 months
CityDreamer: Compositional Generative Model of Unbounded 3D Cities paper page: In recent years, extensive research has focused on 3D natural scene generation, but the domain of 3D city generation has not received as much exploration. This is due to the
(quoted) 6 replies · 131 reposts · 613 likes
1 reply · 39 reposts · 155 likes
@liuziwei7
Ziwei Liu
8 months
🚀🚀Fancy generating high-res (2K~4K) images using Stable Diffusion without an additional super-resolution module? * Now combining #FreeU with #ScaleCrafter , you can generate 4K images using SDXL for free! - FreeU: - ScaleCrafter:
@_akhaliq
AK
10 months
FreeU: Free Lunch in Diffusion U-Net paper page: we uncover the untapped potential of diffusion U-Net, which serves as a "free lunch" that substantially improves the generation quality on the fly. We initially investigate the key contributions of the
(quoted) 9 replies · 159 reposts · 760 likes
1 reply · 42 reposts · 152 likes
@liuziwei7
Ziwei Liu
9 months
🔥FreeU now has a major upgrade🔥 * By adding structure-aware scaling, FreeU excels at both structural coherence and detail preservation, largely improving the aesthetic quality upon Stable Diffusion XL for free. - Paper: - Code:
@_akhaliq
AK
10 months
FreeU: Free Lunch in Diffusion U-Net paper page: we uncover the untapped potential of diffusion U-Net, which serves as a "free lunch" that substantially improves the generation quality on the fly. We initially investigate the key contributions of the
(quoted) 9 replies · 159 reposts · 760 likes
2 replies · 44 reposts · 154 likes
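The underlying FreeU trick, per the paper: at each U-Net decoder stage, amplify (part of) the backbone feature map and damp the low-frequency band of the skip connection. A minimal sketch with single `b`/`s` factors standing in for the paper's b1/b2 and s1/s2 knobs; placement and details in the official code may differ:

```python
import torch
import torch.fft as fft

def free_u(backbone: torch.Tensor, skip: torch.Tensor, b: float = 1.2, s: float = 0.9):
    """backbone, skip: (N, C, H, W) features entering a U-Net decoder stage."""
    half = backbone.shape[1] // 2
    backbone[:, :half] = backbone[:, :half] * b        # boost backbone (semantic) features

    spec = fft.fftshift(fft.fft2(skip.float()), dim=(-2, -1))
    H, W = spec.shape[-2:]
    mask = torch.ones_like(spec.real)
    mask[..., H // 2 - H // 8: H // 2 + H // 8,
              W // 2 - W // 8: W // 2 + W // 8] = s    # attenuate low frequencies of the skip
    skip = fft.ifft2(fft.ifftshift(spec * mask, dim=(-2, -1))).real
    return backbone, skip
```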
@liuziwei7
Ziwei Liu
2 years
#SIGGRAPH2022 We present a text-driven controllable framework, Text2Human, for high-quality and diverse human generation from natural language. Welcome to try our @siggraph work with the user interface at:
@_akhaliq
AK
2 years
Text2Human: Text-Driven Controllable Human Image Generation abs: project page: github:
(quoted) 1 reply · 54 reposts · 256 likes
2 replies · 41 reposts · 154 likes
@liuziwei7
Ziwei Liu
3 months
😼Move Anything in Your Picture😼 #CVPR2024 We propose 🏞️SceneDiffusion🏞️ to freely rearrange image layouts by layered scene diffusion @CVPR * It supports a wide range of spatial editing operations, e.g., moving, resizing and layer-wise editing - Paper:
3 replies · 33 reposts · 152 likes
@liuziwei7
Ziwei Liu
2 years
We propose EVA3D, an unconditional 3D human generative model learned from 2D image collections. #EVA3D can sample 3D humans with detailed geometry and render high-quality images. - Project: - Code: - Video:
@_akhaliq
AK
2 years
EVA3D: Compositional 3D Human Generation from 2D Image Collections abs: project page:
(quoted) 3 replies · 98 reposts · 501 likes
1 reply · 25 reposts · 150 likes
@liuziwei7
Ziwei Liu
7 months
🔥Segment Any Point Cloud Sequences🔥 #NeurIPS2023 We introduce 🦭Seal🦭, a novel framework that harnesses vision foundation models for segmenting diverse automotive point cloud sequences @ldkong1205 @OpenMMLab - Project: - Code:
0 replies · 32 reposts · 150 likes
@liuziwei7
Ziwei Liu
4 months
✨Consistent Video-to-Video Generation✨ #CVPR2024 We present 🎞️FRESCO🎞️ with *spatial-temporal correspondence* to produce high-quality coherent videos from text prompts @CVPR - Project: - Paper: - Code: 🎞️
@_akhaliq
AK
4 months
FRESCO Spatial-Temporal Correspondence for Zero-Shot Video Translation The remarkable efficacy of text-to-image diffusion models has motivated extensive exploration of their potential application in video domains. Zero-shot methods seek to
(quoted) 3 replies · 13 reposts · 86 likes
1 reply · 36 reposts · 145 likes
@liuziwei7
Ziwei Liu
2 years
We propose StyleFaceV to generate high-fidelity identity-preserving face videos with vivid movements. - Our core insight is to decompose appearance/pose information and recompose them in StyleGAN3 to produce stable and dynamic results. - Code:
@_akhaliq
AK
2 years
StyleFaceV: Face Video Generation via Decomposing and Recomposing Pretrained StyleGAN3 abs: project page: github:
(quoted) 0 replies · 12 reposts · 53 likes
1 reply · 23 reposts · 142 likes
@liuziwei7
Ziwei Liu
2 months
📢 #ICLR2024 Welcome to check out our GenAI work @iclr_conf 📢 * Image Gen - HyperHuman: * Video Gen - SEINE: - FreeNoise: * 3D Gen - DreamGaussian: - DiffTF:
1 reply · 21 reposts · 141 likes
@liuziwei7
Ziwei Liu
1 year
Excited to see that our new 🦦Otter🦦 model "OTTER-Image-MPT7B" ranks 🔥top🔥 on several large multimodal model evaluation benchmarks. - Code, demo and checkpoints:
@_akhaliq
AK
1 year
Otter: A Multi-Modal Model with In-Context Instruction Tuning abs: paper page: github:
(quoted) 3 replies · 71 reposts · 281 likes
3 replies · 31 reposts · 138 likes
@liuziwei7
Ziwei Liu
7 months
🔥🔥We propose #VideoBooth to enable **customized video generation** with image prompts, which provide more accurate and direct content control beyond the text prompts. - Project: - Code: - Video:
@_akhaliq
AK
7 months
VideoBooth: Diffusion-based Video Generation with Image Prompts paper page: Text-driven video generation witnesses rapid progress. However, merely using text prompts is not enough to depict the desired subject appearance that accurately aligns with
(quoted) 0 replies · 45 reposts · 214 likes
1 reply · 39 reposts · 137 likes
@liuziwei7
Ziwei Liu
8 months
@huggingface 🚀🚀We made an **AI trailer video** for the movie "The Wandering Earth 3" using #Vchitect ! - Project: - Code: - Demos @huggingface : - Videos:
4 replies · 30 reposts · 135 likes
@liuziwei7
Ziwei Liu
11 months
🔥PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds🔥 * We propose 🚹PointHPS🚺 for accurate 3D human pose and shape estimation from real-world point clouds - Project: - Paper: - Code:
0 replies · 27 reposts · 137 likes
@liuziwei7
Ziwei Liu
3 months
🤩Theme-Aware 3D Asset Generation🤩 #SIGGRAPH We present ⛽️ThemeStation⛽️, which synthesizes customized 3D assets based on a few exemplars that exhibit a shared theme @siggraph - Project: - Paper: - Code:
@_akhaliq
AK
4 months
ThemeStation Generating Theme-Aware 3D Assets from Few Exemplars Real-world applications often require a large gallery of 3D assets that share a consistent theme. While remarkable advances have been made in general 3D content creation from text or image, synthesizing
(quoted) 2 replies · 59 reposts · 238 likes
1 reply · 25 reposts · 135 likes
@liuziwei7
Ziwei Liu
2 years
#SIGGRAPH2022 We propose AvatarCLIP, a zero-shot text-driven framework for 3D avatar generation and animation. - It empowers layman users to customize a 3D avatar with the desired shape and texture, and to drive the avatar with the described motions, using solely natural language.
@_akhaliq
AK
2 years
AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars abs: project page: github: TL;DR: AvatarCLIP generate and animate avatars given descriptions of body shapes, appearances and motions.
(quoted) 2 replies · 52 reposts · 235 likes
2 replies · 29 reposts · 129 likes
@liuziwei7
Ziwei Liu
1 year
#CVPR2023 "Collaborative Diffusion for Multi-Modal Face Generation and Editing" Diffusion models collaborate to achieve multi-modal face generation without re-training @CVPR - Project: - Paper: - Code:
2 replies · 28 reposts · 130 likes
@liuziwei7
Ziwei Liu
3 months
⚡️Dynamic 4D Human Rendering⚡️ #CVPR2024 Our 🏄‍♂️SurMo🏄‍♂️ learns dynamic 4D human rendering from videos by surface-based modeling of temporal dynamics and human appearances @CVPR - Project: - Paper: - Code:
0 replies · 25 reposts · 130 likes
@liuziwei7
Ziwei Liu
2 years
We contribute GTA-Human, a large-scale 3D human dataset generated with the GTA-V game engine, featuring a diverse set of subjects, actions and scenarios. - We discover that synthetic data provides critical complements to the real data. - Data and models:
@_akhaliq
AK
2 years
Playing for 3D Human Recovery project page: abs: github:
(quoted) 1 reply · 16 reposts · 95 likes
1 reply · 27 reposts · 132 likes
@liuziwei7
Ziwei Liu
3 months
✨3D Human Diffusion Model✨ We present #StructLDM , a latent diffusion model with a high-dimensional structural latent space for 3D human generation - Project: - Paper: - Code: - Video:
@dreamingtulpa
Dreaming Tulpa 🥓👑
3 months
Hate to say it, but AI girlfriends are definitely gonna be a thing. StructLDM for instance lets you generate compositional and animatable humans by blending different body parts, identity swapping, local clothing editing, 3D virtual try-on, etc.
(quoted) 9 replies · 41 reposts · 246 likes
2 replies · 33 reposts · 129 likes
@liuziwei7
Ziwei Liu
1 year
🔥Our multi-modal Otter has evolved🔥 #Otter now supports the newly released #Llama2 * We successfully trained a Flamingo-Llama2-Chat7B on CC3M in 5 hours using just 4 A100s * The model showed promising zero-shot captioning skills - Code and models:
@_akhaliq
AK
1 year
Otter: A Multi-Modal Model with In-Context Instruction Tuning abs: paper page: github:
(quoted) 3 replies · 71 reposts · 281 likes
2 replies · 33 reposts · 125 likes
@liuziwei7
Ziwei Liu
4 months
🔥Video Generation with Image Prompts🔥 #CVPR2024 We propose *video generation with image prompts* 📽️VideoBooth📽️, providing more direct content control beyond text prompts @CVPR - Project: - Paper: - Code:
@_akhaliq
AK
7 months
VideoBooth: Diffusion-based Video Generation with Image Prompts paper page: Text-driven video generation witnesses rapid progress. However, merely using text prompts is not enough to depict the desired subject appearance that accurately aligns with
(quoted) 0 replies · 45 reposts · 214 likes
0 replies · 29 reposts · 126 likes
@liuziwei7
Ziwei Liu
6 months
🔥Evaluating 3D Generation with GPT-4V🔥 With carefully designed instructions, GPT-4V serves as an automatic 3D generation evaluator that *strongly aligns with human preference* - Project: - Paper: - Code:
@_akhaliq
AK
6 months
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation paper page: Despite recent advances in text-to-3D generative methods, there is a notable absence of reliable evaluation metrics. Existing metrics usually focus on a single criterion
(quoted) 0 replies · 31 reposts · 131 likes
0 replies · 15 reposts · 124 likes
@liuziwei7
Ziwei Liu
5 months
🔥Large Multi-View Gaussian Model (LGM)🔥 We introduce #LGM , a feed-forward foundation model for text-to-3D and image-to-3D, which generates high-res 3D content in 5s - Project: - Code: - Demo @huggingface :
@_akhaliq
AK
5 months
LGM Large Multi-View Gaussian Model for High-Resolution 3D Content Creation paper page: 3D content creation has achieved significant progress in terms of both quality and speed. Although current feed-forward models can produce 3D objects in seconds,
(quoted) 5 replies · 74 reposts · 321 likes
3 replies · 21 reposts · 124 likes
@liuziwei7
Ziwei Liu
7 months
#NeurIPS2023 We propose 🔥PrimDiffusion🔥, a volumetric primitives diffusion model for 3D human generation, enabling explicit pose, view, and shape control with off-body topology. - Project: - Code: - Video:
1 reply · 29 reposts · 123 likes
@liuziwei7
Ziwei Liu
5 months
🔥Text-to-3D Foundation Model🔥 We present #3DTopia , a two-stage text-to-3D foundation model. The first stage quickly generates 3D candidates; the second stage refines the chosen 3D asset with high quality. - Code: - Demo @Gradio :
@_akhaliq
AK
5 months
3DTopia demo is out on Hugging Face demo: text-to-3D foundation model, which produces high-quality 3D assets within 5 minutes
(quoted) 7 replies · 64 reposts · 296 likes
0 replies · 28 reposts · 122 likes
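Hypothetical glue code for the two-stage recipe described above (both stage interfaces are assumptions for illustration, not 3DTopia's API): a fast first stage proposes coarse candidates, and a slower second stage refines the pick.

```python
def text_to_3d(prompt, stage1, stage2, n_candidates=4):
    # Stage 1: quickly sample several coarse 3D candidates.
    candidates = [stage1.sample(prompt) for _ in range(n_candidates)]
    # Choose one (user pick or an automatic score), then refine it slowly.
    best = max(candidates, key=lambda asset: stage1.score(asset, prompt))
    return stage2.refine(best, prompt)   # high-quality, optimization-based polish
```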
@liuziwei7
Ziwei Liu
8 months
🚀🚀Happy to see #MotionDiffuse and #ReMoDiffuse are now integrated into *ComfyUI*. - MotionDiffuse: - ReMoDiffuse:
@taziku_co
田中義弘 | taziku CEO / AI × Creative
9 months
"ComfyUI-MotionDiff" is a repository developed to integrate MotionDiffuse into ComfyUI, letting you specify motion in natural language. I haven't been able to try it yet, but it can extract motion with OpenPose. Once this works, you could even create anime from it. #AI anime #GenerativeAI #ComfyUI
(quoted) 1 reply · 5 reposts · 25 likes
0 replies · 25 reposts · 122 likes
@liuziwei7
Ziwei Liu
1 year
We propose **DeepFake-Adapter**, which effectively adapts a pre-trained ViT by enabling high-level semantics from ViT to organically interact with global and local low-level forgeries from adapters. - Paper: - Code:
3 replies · 28 reposts · 114 likes
@liuziwei7
Ziwei Liu
1 year
We are organizing the #OmniObject3D challenge @ICCVConference with two competition tracks: 1) Track 1: sparse-view 3D reconstruction 2) Track 2: 3D object generation - Challenge period: Aug 1 - Sep 15, 2023 - Homepage: - CodaLab:
@_akhaliq
AK
1 year
OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation abs: project page:
(quoted) 1 reply · 30 reposts · 210 likes
0 replies · 28 reposts · 120 likes
@liuziwei7
Ziwei Liu
10 months
#ICCV2023 We present #Text2Performer to generate high-resolution vivid human videos with articulated motions from text prompts @ICCVConference . - Project: - Paper: - Code: - Demo:
@arankomatsuzaki
Aran Komatsuzaki
1 year
Text2Performer: Text-Driven Human Video Generation proj: repo: abs:
(quoted) 12 replies · 106 reposts · 405 likes
2 replies · 31 reposts · 119 likes
@liuziwei7
Ziwei Liu
9 months
🔥OmniObject3D Update🔥 We released fine-grained textual descriptions for #OmniObject3D , which are manually annotated from 5 aspects: *summary*, *appearance*, *material*, *style* and *function*. - Project: - Code and data:
@_akhaliq
AK
1 year
OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation abs: project page:
(quoted) 1 reply · 30 reposts · 210 likes
2 replies · 21 reposts · 117 likes
@liuziwei7
Ziwei Liu
1 year
#ICCV2023 We present 🔥StyleGANEX🔥 @ICCVConference , a next-generation StyleGAN architecture that can render unaligned images/videos for in-the-wild editing, SR and stylization. - Project: - Paper: - Code:
@_akhaliq
AK
1 year
StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces abs: project page: github:
(quoted) 0 replies · 45 reposts · 193 likes
0 replies · 23 reposts · 117 likes
@liuziwei7
Ziwei Liu
1 year
#CVPR2023 Thrilled to give the 🏆 award candidate talk for 🔥OmniObject3D🔥 @CVPR - All the data and code have been released at:
@_akhaliq
AK
1 year
OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation abs: project page:
(quoted) 1 reply · 30 reposts · 210 likes
0 replies · 18 reposts · 115 likes
@liuziwei7
Ziwei Liu
11 months
#ICCV2023 We propose 🔥SHERF @ICCVConference , the first *generalizable* Human NeRF model for recovering *animatable* 3D human from a single image. - Project: - Paper: - Code: - Demo:
0 replies · 30 reposts · 110 likes
@liuziwei7
Ziwei Liu
20 days
🔥Long Context from Language to Vision🔥 #LongVA can process 2000 frames or over 200K visual tokens, with SoTA performance on Video-MME among 7B models - Paper: - Code: - Demo @Gradio : . Thanks to @_akhaliq !
@Gradio
Gradio
21 days
🤩Long Video Assistant (LongVA): Breakthrough in long 🎥video understanding! - Transfers long context capability from language to vision 🧠 - Only open-source model supporting 384 input frames🤩 - Handles 2000+ frames (200K+ visual tokens) 🤯 - SoTA on Video-MME among 7B models -
(quoted) 3 replies · 43 reposts · 160 likes
1 reply · 37 reposts · 111 likes
@liuziwei7
Ziwei Liu
4 months
🔥One-Stop Evaluation Suite for Large Multimodal Models (LMMs)🔥 We present 📊lmms-eval📊, a one-command evaluation API for thorough evaluation of LMMs over 40 datasets. - Code: - Blog: - Datasets @huggingface :
@BoLi68567011
Li Bo
4 months
Accelerating the Development of Large Multimodal Models with LMMs-Eval Repo: Blog: We are offering a one-command evaluation API for fast and thorough evaluation of LMMs over 39 datasets (and growing).
(quoted) 1 reply · 24 reposts · 114 likes
3 replies · 31 reposts · 110 likes
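Conceptually, a "one command over dozens of datasets" evaluation reduces to the loop below, sketched as hypothetical Python; lmms-eval's real entry point is a CLI (roughly `python -m lmms_eval --model ... --tasks ...`), and its actual API differs:

```python
def evaluate(model, benchmarks):
    """benchmarks: {task_name: [(image, question, answer), ...]}."""
    scores = {}
    for task, items in benchmarks.items():
        correct = 0
        for image, question, answer in items:
            pred = model.generate(image, question)        # the LMM under test
            correct += int(pred.strip() == answer.strip())
        scores[task] = correct / len(items)               # per-task accuracy
    return scores
```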
@liuziwei7
Ziwei Liu
11 months
#ICCV2023 We present 🔥SparseNeRF @ICCVConference that synthesizes novel views given few images. SparseNeRF distills local depth ranking prior from real-world depth observations. - Project: - Paper: - Code:
@_akhaliq
AK
1 year
SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis abs:
(quoted) 0 replies · 16 reposts · 86 likes
0 replies · 21 reposts · 105 likes