Excited to present our new paper "LightGaussian": A Compact & Efficient Pipeline for Converting #3D #GaussianSplatting into a more compact format!
✍️15x compression rate
✍️200+FPS
Project page 👉:
Paper 📷:
Speeding up your view synthesis (<40s) with #InstantSplat!
Our large-scale, pose-free method trains in just 37 seconds from sparse views—no #COLMAP, no intrinsics needed.
Achieving nearly 30 dB test PSNR with just 12 images: a new standard in #NVS and in training efficiency.
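For reference, "nearly 30 dB PSNR" can be unpacked with the standard definition of the metric. This is a generic NumPy sketch, not InstantSplat's evaluation code:

```python
import numpy as np

def psnr(img, ref, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images scaled to [0, max_val]."""
    mse = np.mean((img - ref) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# 30 dB corresponds to a mean squared error of 1e-3 on a unit dynamic range,
# i.e. an RMS per-pixel error of about 3.2% of the full brightness range.
ref = np.zeros((64, 64))
img = ref + np.sqrt(1e-3)   # constant error with MSE = 1e-3
value = psnr(img, ref)      # -> 30.0 dB
```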
Our Feature 3DGS is selected as a #CVPR2024 highlight🥳 See you in Seattle.
Unlock the answer to 3D Gaussian Splatting + Large 2D Vision Models?
- Distills features from large-scale 2D models into 3D, enabling a semantic, editable, and promptable explicit (and, of course, real-time) 3D representation.
Construct your 3D model from just three views? See our "FSGS": A Few-Shot View Synthesis Framework based on #3D #GaussianSplatting
✍️: 2,000x faster than NeRFs
✍️: SSIM from 0.582 (3D-GS) to 0.682
Project page📸:
Paper📸:
Exciting update: our latest results using THREE training views!
Despite unknown poses and intrinsics, #InstantSplat achieves this in under 20 seconds—even faster with sparser views.
Excited to share our #CVPR2024 paper 'Lift3D'. Discover how to elevate 2D vision models to produce #3D consistent predictions and eliminate flickering after editing—all in a zero-shot manner.
Key Highlights:
✍️Applies to #DINO, #CLIP, style transfer, SR, and open vocabulary
Totally feeling this #DiT moment with #SORA in 3D Computer Vision! 🚀 The #dust3r is super simple and a game-changer. It flips the usual 3D reconstruction steps on their head, letting us get dense reconstructions in just one go.
See video below and more⬇️
Heading to ✈️ #CVPR2024 in #Seattle. Please DM me if you’d like to have a ☕️ chat, especially on efficient 3D modeling and 3D/4D generation.
Check out the video to see our CVPR presentations and recent work on “Reconstructing Semantic 3D from Unposed Images in Milliseconds”
Three of my papers have been accepted by #ECCV2024 (and another one by #IROS)!
While paper count isn’t everything, these works explore fascinating new areas: 3D reconstruction from sparse views, high-quality 3D multi-task learning, and large-scale 3D generation using
Another in-the-wild #InstantSplat test using just THREE training views, at a resolution of 1920x1080.
This demonstrates that proper initialization and disabling Adaptive Density Control effectively suppress excessive 3D Gaussians.
We're close to supporting arbitrary
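The effect of disabling Adaptive Density Control (ADC) can be illustrated with a toy loop. All names and the 5% growth rate here are illustrative only, not the official 3DGS training code: with ADC on, periodic densification keeps growing the Gaussian count; with ADC off, the count stays fixed at whatever the initialization provides.

```python
def train_gaussian_count(num_init, iters, use_adc, densify_every=100, growth=1.05):
    """Track only the Gaussian count over a mock training run."""
    count = num_init
    for it in range(1, iters + 1):
        # ...photometric optimization of the existing Gaussians happens here...
        if use_adc and it % densify_every == 0:
            # ADC clones/splits high-gradient Gaussians, growing the model.
            count = int(count * growth)
    return count

with_adc = train_gaussian_count(100_000, 2_000, use_adc=True)   # count keeps growing
no_adc = train_gaussian_count(100_000, 2_000, use_adc=False)    # count stays fixed
```

This is why a dense, well-placed initialization (e.g. from a learned stereo prior) lets you skip densification entirely and keep the model small.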
- Dive into our latest research on #Multimodal #SLAM: unposed camera images + inertial measurements -> real-time rendering + a dense 3D map.
- The new UT-MM dataset: a multi-modal dataset captured by a mobile robot with a camera and an inertial measurement unit.
👉Paper:
- Three training views on the #DL3DV-10K dataset
- Camera poses & intrinsics are unknown
- Rendering by interpolation
- Resolution: 1920x1080
I can tell that pseudo-views are needed to enhance the quality.
The LightGaussian GitHub code has been finalized! If your goal is to better preserve model accuracy, selecting a lower compression ratio is advisable. It can still yield a ~87% reduction in model size and improve FPS from 192 to 244 compared to the vanilla 3D #GaussianSplatting.
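For context, an N-x compression ratio and a percentage size reduction are two views of the same number: 15x removes about 93% of the model, while an ~87% reduction corresponds to roughly 7.7x. A quick arithmetic check:

```python
# Converting between "N-x compression" and "% size reduction" for a model.
# Pure arithmetic; the numbers match the 15x / ~87% figures mentioned above.

def reduction_fraction(ratio):
    """Fraction of the original model removed by an N-x compression ratio."""
    return 1.0 - 1.0 / ratio

def ratio_from_reduction(fraction):
    """Compression ratio implied by removing this fraction of the model."""
    return 1.0 / (1.0 - fraction)

removed_at_15x = reduction_fraction(15.0)     # ~0.933 -> about 93% smaller
ratio_at_87pct = ratio_from_reduction(0.87)   # ~7.7x compression
```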
Four years ago at Alibaba Group, I was part of the team that developed one of the first large-scale view synthesis products, called Holographic World (全息世界). This product had a significant impact during COVID-19 by helping to prevent cross-infection during visits and tours.
Here is a great example from @felixkit, who created an interactive art gallery with 3DGS. You can walk through the exhibition and get information about the art. I think this sells better than images.
The links to the writeup and to the interactive exhibition are below ⬇️
Thrilled to announce that three of our papers have been accepted by #CVPR2024: Feature 3DGS, Entropic Score Distillation, and Lift3D.
However, I'm even prouder that two of our papers on Efficient 3D Gaussian Splatting were rejected by CVPR2024: #LightGaussian, #FSGS.
The rendering trajectory is diverging🆘.
- THREE training views (randomly selected) on a custom scene.
- Just three images; no assumptions on poses or camera parameters.
- iPhone, 1428x1071 resolution
But keeping the rendering trajectory reasonable is really a bit hard.
Amazing slides for anyone wanting to learn GS from scratch. Thanks, Forrest! Special thanks for highlighting our #LightGaussian work, where visibility-based importance scoring prunes the 'least-important' Gaussians, which is crucial for rendering efficiency.
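The pruning idea can be sketched in a few lines of NumPy (hypothetical scores and names, not the actual LightGaussian implementation): accumulate a per-Gaussian significance score over the training views, then drop the lowest-scoring fraction.

```python
import numpy as np

def prune_by_importance(scores, prune_ratio):
    """Return a boolean keep-mask over Gaussians, keeping the top (1 - prune_ratio)
    fraction by importance score (e.g. accumulated ray contribution across views)."""
    n = scores.shape[0]
    keep = int(n * (1.0 - prune_ratio))     # number of Gaussians to keep
    keep_idx = np.argsort(scores)[-keep:]   # indices of the highest scores
    mask = np.zeros(n, dtype=bool)
    mask[keep_idx] = True
    return mask

# Toy example: 10 Gaussians, prune the 60% least important.
scores = np.array([0.9, 0.1, 0.5, 0.05, 0.8, 0.02, 0.7, 0.3, 0.01, 0.6])
mask = prune_by_importance(scores, prune_ratio=0.6)   # keeps indices 0, 4, 6, 9
```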
Elevate your #Sora experience to another dimension! 🌟 Dive into a 3D world with our virtual art gallery, where panorama images transform into 360-degree 3D Gaussian Splatting wonders 🎨✨ Explore art like never before - in real-time!
#3DWorld #ArtExploration #InnovateWithUs
FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework
Project:
Code:
Uses NeRFs to count any fruit type directly in 3D.
The code is a Nerfstudio extension!
Method ⬇️ 1 | 2
First day at Autonomous Vehicle Research @NVIDIA, in collaboration with @yuewang314 and @iamborisi. Eager to deliver impactful advancements in efficient #3D modeling/generation. 🤗🤗
Exploring the capabilities of #DepthAnything – it’s impressive how it enhances drone-view videos (OOD, I guess) by delivering reliable and accurate monocular depth values.
What else can this technology offer?
To think about optimization "freedom": better initialization -> easier optimization -> faster training -> more detailed rendering quality.
Great progress in enabling end-to-end, differentiable 3D vision! It's a privilege to witness and contribute to these exciting developments in the field.
Introducing “FlowMap”, the first self-supervised, differentiable structure-from-motion method that is competitive with conventional SfM like Colmap!
IMO this solves a major missing piece for internet-scale training of 3D Deep Learning methods.
1/n
Fortunately, today we have advanced tools like NeRF and 3DGS that allow us to efficiently capture posed RGB-D inputs and convert them into Holographic World without the need for a long and tedious pipeline.
By leveraging large-scale pretrained models (DUSt3R, Monocular Depth,
InstantSplat
Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds
While novel view synthesis (NVS) has made substantial progress in 3D computer vision, it typically requires an initial estimation of camera intrinsics and extrinsics from dense viewpoints. This
@janusch_patas @JeromeRevaud Hi, thanks for posting! Our paper will be online very soon.
tl;dr: proper pre-processing with DUSt3R + 3DGS -> 37 seconds for pose-free, intrinsics-free, sparse-view novel-view synthesis at ~30 dB PSNR.
Explore immersive Mars footage with the #Perseverance rover from @NASA.
We can recover the 3D from #Perseverance's stereo navigation cameras!
Detailed 3D reconstructions can be done in just ~20 seconds from scratch, without using the calibration from @NASA_Technology.
@xiaolonw @OpenAI Continuously inspired! Following the impressive in-the-wild test cases, we've experimented with InstantSplat (), training it on 12 views from the Sora video without pre-computing camera parameters from COLMAP.
Ah yes! We’ve finally finished the Gaussian Splatting turntable-capture workflow. No more walking around the object or getting a huge mess of clouds when doing Gaussian Splats on our Marc scanner.
This is a huge deal for us because it will enable us to scan, process, reconstruct
The number of AI patents is important, but it isn't everything... We've seen a lot of "breakthrough" research from European teams in recent years, including but not limited to "3D Gaussian Splatting" and "DUSt3R"...
(1/5)
3D Gaussian Splatting + AI Foundation Models = ? 🤔
Feature 3DGS🪄, distills feature fields from 2D foundation models, opening the door to a brand new semantic, editable, and promptable explicit 3D scene representation.
AI for 2D, now in 3D! 🚀
🔗
Happy Lunar New Year! 🎉 May the year ahead bring prosperity, joy, and good news, just like this inspiring image from a WeChat moment. Wishing everyone a year filled with happiness and success.
#LunarNewYear #GoodVibes
Check out our paper published at #3DV, which utilizes #DINO ViT features in a multi-stage classification for few-shot #6DoF object pose estimation:
✍️ Cas6D: Learning to Estimate 6DoF Pose from Limited Data: A Few-Shot, Generalizable Approach using RGB Images
(1/N) 🌐 Take your #Sora journey to new heights! 🚀 Unveil a universe where 3D panoramic #GaussianSplatting meets creativity - think "colossal, man-shaped cloud towering over the earth"! 🌩️ Dive into a real-time virtual realm and let your imagination soar! 🎨 #3DExperience
Discover the magic when #Starship meets #AIGC! 🚀 With our new text-to-3D tech, words craft #3D realms—envision Starship's #Mars landing soon.
@elonmusk, how about integrating a real Starship model into the virtual world? 😄
@janusch_patas Thanks for sharing🥳! We're thinking about the optimal representation for large-scale scene generation.
The ideal representation should manage unbounded scenarios without content duplication—a common issue with outpainting due to its lack of global awareness.
(5/N).
Few-shot Gaussian Splatting () was led by me, ZehaoZhu, and @YifanJiang17. This method achieves the best sparse-view NVS performance and runs over 2,000 times faster than NeRF-based methods.
@WayneINR @AjdDavison @LourdesAgapito Here are the slides of my recent invited talk about CroCo + DUSt3R + MASt3R.
Note: it's a PDF; I know the videos won't work, but there's really nothing hitherto unseen. I can point to where to find each individual video upon request.
An example of how DUSt3R can do "impossible matching": given two images without any shared visual content (my office, obviously never seen at training), it can output an accurate reconstruction (no intrinsics, no poses!) in seconds
(1/N).
Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields, which enables the distillation of 3D feature fields from any 2D foundation model, using only 18 training views for this complex real-world scene.
Website:
(7/N).
Acknowledgement:
Both REJECTED papers are open-sourced. Thanks to the reviewers for the valuable suggestions, and thanks to Hugging Face for inviting us to deploy models on their platform.
We will continue to revise our drafts and address all the reviewers' comments.
No more Image Correspondence -> Sparse Reconstruction -> Dense Reconstruction. Just one feedforward!
And it's crushing it with OOD data – just 12 drone pics and you've got your model. Mind blown! 🤯
(4/N).
The two rejected papers are even more interesting:
LightGaussian () was led by me, @KevinWang_111, and @KairunWen. Our method preserves rendering quality while achieving 15x model compression and ~50% higher rendering efficiency than #GaussianSplatting.
Seeking PMs' insights on the LPIPS curve's alignment with human visual perception. The figures illustrate how LightGaussian maintains rendering quality even when up to 60% of the Gaussians are pruned. A breakthrough in #GaussianSplatting. See the visual comparison below.
(3/N).
Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D, where we extend a single 2D vision operator into consistent 3D in a generalizable way.
Introducing Sora, our text-to-video model.
Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions.
Prompt: “Beautiful, snowy
R2: While the results are impressive, this is a simple combination of diffusion transformer (ICCV 2023) and latent diffusion model (CVPR 2022). Limited novelty. Weak reject.
@JeromeRevaud @janusch_patas @PMel3D That's an exciting direction! We're also advancing towards making 3D modeling as seamless and efficient as its 2D counterpart, enabling end-to-end capabilities.
📢Announcing the first GenAI4Health Workshop at #NeurIPS2024, where we invite speakers and participants from the #health, #AI_safety, and #AI_policy areas to discuss the entangled challenges of potential, trust, and policy compliance of GenAI4Health!
🌍Homepage:
Just tried SplaTAM and it's really cool! Took a moment to get the hang of capturing my desk with the iPhone accurately - it definitely requires some skill. But, so does navigating Open3D :). Check out their demo to get a peek at what the future holds!
@kwea123 Hi, thanks for sharing your concerns.
Regarding the question, I partially (about 20%) agree with your opinion:
1. "The difficulty of comparison of the quality": I understand this point, as multiple teams are working independently, and there's no standard benchmark for
I see some vocal objections: "Sora is not learning physics, it's just manipulating pixels in 2D".
I respectfully disagree with this reductionist view. It's similar to saying "GPT-4 doesn't learn coding, it's just sampling strings". Well, what transformers do is just manipulating
@GKopanas @PapantonakisP Cool work!
Pruning and SH assignment/VQ address both primitive redundancy and feature redundancy.
Our #LightGaussian shares a very similar high-level idea.
@JonClark55 @LuLing26466911 We primarily focus on high-resolution, large-scale scenes. I believe adapting our approach to object-level datasets should not be difficult, as demonstrated by our results on the MVImgNet dataset in our paper.
Much of the current research focus is on improving the quality of NeRF.
However, one important research question has been ignored in current NeRF development:
can we apply steganography to NeRF?🧐