Machine Learning Researcher at
@Apple
ML Research (MLR) based in NYC | ex-FAIRer | PhD from HKU | Research on generative AI across modalities. I also speak Japanese.
🚀Excited to introduce KaleidoDiffusion --
a new method that improves conditional diffusion model generation by incorporating autoregressive latent priors! This allows us to generate much more diverse outputs, even at high CFG, just like a kaleidoscope🔭!
(1/n)
Kaleido Diffusion
Improving Conditional Diffusion Models with Autoregressive Latent Modeling
Diffusion models have emerged as a powerful tool for generating high-quality images from textual descriptions. Despite their successes, these models often exhibit limited diversity in
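For intuition, here is a minimal runnable sketch of the idea in plain Python (all function names and numbers are illustrative stand-ins, not the paper's code): diversity comes from sampling discrete latents from an autoregressive prior and conditioning the denoiser on them, so even aggressive CFG on the text condition doesn't collapse samples to one mode.

```python
import random

random.seed(0)

# Toy stand-ins (illustrative only -- not the paper's architecture).
def ar_prior_sample(n_tokens=8, vocab=16):
    """Autoregressive prior over discrete latent tokens, collapsed to
    i.i.d. draws for this sketch."""
    return [random.randrange(vocab) for _ in range(n_tokens)]

def denoiser(x, t, text, z):
    """Toy epsilon-prediction: the latents z shift the prediction (this is
    what creates sample diversity); the text condition adds a fixed pull."""
    text_bias = 0.05 if text is not None else 0.0
    z_bias = sum(z) / len(z) * 0.01 if z is not None else 0.0
    return [0.1 * xi + text_bias + z_bias for xi in x]

def kaleido_sample(text, steps=10, cfg=7.5):
    z = ar_prior_sample()                        # stochastic latent "seed"
    x = [random.gauss(0, 1) for _ in range(16)]  # start from noise
    for t in reversed(range(steps)):
        eps_c = denoiser(x, t, text, z)          # conditional on text + z
        eps_u = denoiser(x, t, None, z)          # text condition dropped
        # classifier-free guidance on text; z is kept in BOTH branches
        eps = [u + cfg * (c - u) for c, u in zip(eps_c, eps_u)]
        x = [xi - 0.1 * ei for xi, ei in zip(x, eps)]
    return x

a = kaleido_sample("a cat")
b = kaleido_sample("a cat")  # same prompt, same CFG -> still diverse (new z)
```

The point of the sketch: the guidance scale only amplifies the text condition, while the per-sample latent `z` stays in both guidance branches, so it survives high CFG.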
📢 Introducing our latest research
@Apple
MLR for generating high-quality images & videos with a multi-resolution diffusion model -- Matryoshka Diffusion Models or MDM🪆 -- directly in pixel space (~1024px), without any VAEs or cascaded models. Code will be released soon! (1/n)
Matryoshka Diffusion Models
paper page:
Diffusion models are the de facto approach for generating high-quality images and videos, but learning high-dimensional models remains a formidable task due to computational and optimization challenges. Existing
I am super excited that the code of our recent ICLR2022 paper, "StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis", has been released! Please check
Paper:
Code:
Project page:
Life Update:
After four wonderful years at FAIR Labs, I've decided to move on to join Apple MLR led by Samy Bengio. I will continue working on representation learning and generative models for text, vision and multimodality. Feel free to reach out if you want to work together!
Happy New Year!! I am super excited to share our new pre-print “Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade”, joint work with
@XiangKong4
.
Please check out
(1/2)
Super excited to announce my first NeurIPS paper was accepted!
We propose the Levenshtein Transformer, which learns to insert and delete words iteratively for sequence generation and refinement tasks! Thanks to my reliable coauthors
@ChanghanWang
@JakeZzzzzzz
!
🪘🪘New pre-print!! I’m delighted to share our latest work
@Apple
MLR
“BOOT👢: Data-free Distillation of Denoising Diffusion Models with Bootstrapping.”
We explore a novel method that can distill your favorite diffusion models into ONE STEP without using training data!🔆 (1/6)
BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping
paper page:
Diffusion models have demonstrated excellent potential for generating diverse images. However, their performance often suffers from slow generation due to iterative
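A tiny toy of the bootstrapping objective (stand-in functions, not BOOT's actual parameterization): the student at step t-1 is trained to match one frozen-teacher denoising step applied to the student at step t, so no training images are needed, only noise and the teacher.

```python
# Toy sketch of data-free bootstrapped distillation. The "teacher step" and
# the student parameterization below are illustrative stand-ins.

def teacher_step(x, t):
    return 0.5 * x          # stub for one frozen-teacher denoising step

# student g(z, t) = w[t] * z with learnable per-step scalars w;
# w[0] defines the final ONE-STEP generator.
w = [1.0] * 5

def bootstrap_losses(z):
    losses = []
    for t in range(4, 0, -1):
        target = teacher_step(w[t] * z, t)     # teacher refines student(t)
        pred = w[t - 1] * z                    # student at t-1
        losses.append((pred - target) ** 2)
    return losses

# "Training": solve each bootstrap constraint exactly for this toy,
# propagating from the noisiest step down to t = 0.
for t in range(4, 0, -1):
    w[t - 1] = 0.5 * w[t]

print(w[0], sum(bootstrap_losses(1.0)))  # 0.0625 0.0
```

Once the bootstrap constraints hold at every step, `w[0] * z` alone reproduces what the full teacher chain would compute, which is the one-step distillation claim in miniature.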
We're releasing mBART, a new seq2seq multilingual pretraining system for machine translation across 25 languages. It gives significant improvements for document-level translation and low-resource languages. Read our paper to learn more:
I'll attend
#NeurIPS
in person next week, presenting our recent works:
PLANNER
Tue morning,
#1921
Diffusion without Attention
Fri all day, workshop on DM
I'm excited to see you soon and chat about multimodal & diffusion models!
I'm happy to share our latest work, “f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation.”
This is joint work with my Apple colleagues,
@zhaisf
@icaruszyz
@itsbautistam
@jsusskin
(1/6)
f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation
abs:
project page:
propose f-DM, an end-to-end non-cascaded diffusion model that allows progressive signal transformations along diffusion
Excited to share that our work, "NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion", was accepted at
#ICML2023
Huge thanks to amazing coauthors
@alextrevith
@kaien_lin
@jsusskin
@LingjieLiu1
and Ravi Ramamoorthi. See you in Honolulu!
It is huge fun working with Sasha! Please check out our recent work exploring better architectures for diffusion models: we replace all the attention with linear RNNs, which is much more efficient and needs no patchification.
Thanks
@NathanYan2012
@srush_nlp
@Apple
As with LMs, modern Diffusion models rely heavily on Attention. This improves quality but requires patching to scale. Working with Apple, we designed a model without attention that matches top imagenet accuracy and removes this resolution bottleneck.
Excited to be in person for
#ICML2023
in Hawai'i🌴🌊! I'll be presenting two posters (NerfDiff, σREPARAM) on Tuesday and giving a contributed talk (BOOT) on Friday. Please ping me if you want to chat about diffusion models, transformers, and 3D!!
Super excited to announce that our recent work "Cross-lingual Retrieval for Iterative Self-Supervised Training (CRISS)" has been accepted as *spotlight* presentation at NeurIPS2020!!
Congrats to all my amazing colleagues!
@mr_cheu
@xl_nlp
and Yuqing Tang at
@facebookai
Introducing our new work "Cross-lingual Retrieval for Iterative Self-Supervised Training" ()
Joint work with Yuqing Tang,
@xl_nlp
,
@thoma_gu
(
@facebookai
)
0/4
Just arrived in Vancouver for
#CVPR2023
from June 18-22. I'm thrilled about my first-time CVPR experience and eager to chat about generative models, 3D, and MLR! Please visit our poster on 3D-aware diffusion models at the 3DMV workshop () on June 19!
I will be attending
#ICCV2023
in person in Paris and presenting our poster on "Single-stage diffusion (SSD)-NeRF" on Wednesday 4th, 10:30 AM-12:30 PM! Looking forward to meeting people and talking about diffusion models and 3D!
Please check our recent
#ICCV2023
paper on SSD-NeRF () !! We proposed a unified view of 3D generation and reconstruction by learning a "single-stage" 3D diffusion model directly from 2D images.
Please check out our paper for more details!
📃 Paper:
Huge thanks to incredible collaborators
@Apple
MLR
@zhaisf
@YizheZhangNLP
@jsusskin
Navdeep Jaitly
for their amazing contributions!
📷 A special thanks to
@_akhaliq
for reposting our work! (6/n)
(1/3) Super excited to present our recent work: Neural Sparse Voxel Fields (NSVF): a hybrid neural scene representation for fast and high-quality free-viewpoint rendering.
Joint work with
@LingjieLiu1
(MPI), Zaw Lin (NUS), Tat-Seng Chua (NUS) and Christian Theobalt (MPI).
Super excited to announce that our recent work "Neural Sparse Voxel Fields (NSVF)"() has been accepted as *spotlight* presentation at NeurIPS2020!! Also the code and data have been released! Please checkout .
MDM is a single generative model that handles various high-resolution targets:
Images 🖼️
Text-to-Images 📜➡️🖼️
Text-to-Videos 📜➡️🎥
Distinct from existing works, MDM doesn't need a pre-trained VAE (e.g., SD) or multiple separately trained upscaling modules (e.g., Imagen). (2/n)
Sharing our recent
#NeurIPS2023
paper on latent diffusion for text generation. PLANNER is a diffusion model in the latent space, connected with an autoregressive language decoder, which can generate more diverse and coherent texts.
🎉 Thrilled to announce that our Planner paper has been accepted at
#NeurIPS2023
! 📚 If you're searching for a latent text diffusion approach that creates diverse and coherent text, check out our research! 😄
Code will be released soon!
#TextGeneration
#Diffusion
#NLG
How? We propose a diffusion process that denoises inputs at multiple resolutions jointly, using a NestedUNet architecture. Just like a "Matryoshka doll", our nested UNet embeds lower-resolution UNets inside the higher-resolution ones.🪆
We can do the same for both images & videos. (3/n)
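A 1-D toy of the nesting idea (stub "UNets" and illustrative names, not the MDM code): the coarse branch runs first and its features are injected into the finer branch, giving joint predictions at every resolution in a single forward pass.

```python
# Toy sketch of joint multi-resolution denoising with a nested structure.

def downsample(x):
    # average pairs of values (1-D stand-in for 2x2 pooling)
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]

def upsample(x):
    return [v for v in x for _ in range(2)]

def nested_unet(x_levels):
    """x_levels: noisy inputs at [low, mid, high] resolutions.
    Returns denoising predictions at every resolution jointly."""
    preds = []
    inner = None
    for x in x_levels:                           # coarse -> fine
        feat = [0.9 * v for v in x]              # stub "UNet" body
        if inner is not None:                    # inject coarser features
            up = upsample(inner)
            feat = [f + 0.1 * u for f, u in zip(feat, up)]
        inner = feat
        preds.append(feat)
    return preds

x_high = [float(i) for i in range(8)]
x_mid = downsample(x_high)
x_low = downsample(x_mid)
preds = nested_unet([x_low, x_mid, x_high])
# one forward pass yields predictions at all three resolutions
print([len(p) for p in preds])  # [2, 4, 8]
```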
Please check out our recent work with
@seayong08
@kchonyc
and Victor! We found that vanilla zero-shot NMT usually fails due to spurious correlations in the data, and we proposed simple approaches to fix it! Accepted by ACL2019. Thanks for your attention!
Attending
#ECCV2022
Oct 23-27 in Tel Aviv, my first in-person conference in three years!! Happy to chat about research happening at
@Apple
MLR.
We are also looking for interns who are interested in generative models for text, images, videos, 3D, and multimodal!
With these improvements, MDM can train a single pixel-space model at impressive resolutions (e.g., 1024x1024). To achieve these results, we only need a compact dataset like CC12M and a few days of training on just 3-4 nodes of 8 A100 GPUs each. 🔥🚀 (5/n)
Our paper 'Single-Stage Diffusion NeRF' will be presented at
#ICCV2023
. We merge 3D diffusion with NeRF into a holistic model, providing priors for both 3D generation and reconstruction (from an arbitrary number of views). Check it out here:
#NeRF
#AI
On my way to Kigali
#ICLR2023
in person to present our poster on diffusion models with signal transformations! It will be a long flight, arriving on April 30. Looking forward to seeing friends and chatting about generative models, 3D, and opportunities at
@Apple
MLR!
Please check out this incredible introduction video about our recent effort on "diffusion models without attention"!! Thanks so much,
@srush_nlp
, for making this! Also, thanks,
@NathanYan2012
, for your hard work.
We should always stay curious and think outside the box!
New Video: RNNs for Diffusion? Short technical overview of "Diffusion Models without Attention" a recent paper on long-range models for image generation.
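The core replacement can be pictured as a linear-recurrence "token mixer" (a toy stand-in for the SSM-style blocks, not the paper's code): a single O(L) scan mixes information across positions, with no attention and no patchification.

```python
# Toy linear-recurrence token mixer: h_t = d * h_{t-1} + (1 - d) * x_t.
# One sequential pass, cost linear in sequence length.

def linear_rnn_mix(xs, decay=0.5):
    h, out = 0.0, []
    for x in xs:
        h = decay * h + (1 - decay) * x
        out.append(h)
    return out

seq = [1.0, 0.0, 0.0, 0.0]
mixed = linear_rnn_mix(seq)
# information from position 0 propagates (with decay) to every later position
print(mixed)  # [0.5, 0.25, 0.125, 0.0625]
```

Contrast with attention: attention pays O(L^2) to mix all pairs, which is why resolution (sequence length) becomes the bottleneck that the recurrence avoids.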
Our submission "Universal Neural Machine Translation for Extremely Low Resource Languages" has been accepted as a full-paper oral presentation at NAACL-HLT 2018!! Please check it out. This is joint work with Hany and Jacob. Congrats!
@xutan_tx
and I will virtually give a tutorial about "Non-autoregressive Sequence Generation" this Sunday, May 22 at 14:30-18:00 Irish Standard Time.
#ACL2022NLP
#NLProc
Please come and check it out
More details
Besides, MDM isn't just innovative in its structure; we also propose a progressive training schedule that smoothly transitions from lower to higher resolutions, improving high-res generation noticeably.💡 (4/n)
i was curious if that new "mamba" layer could be used for image generation (tldr: ya prob)
i hacked together a quick test following ideas from the diffusion transformer (DiT) and made "DiM".
after an hour or so on my 4090 it seems that it's learning the oxford flowers dataset.
Thanks for checking out our new (in-progress) results! Joint work with
@ChanghanWang
@JakeZzzzzzz
, we hope to have a simple but efficient way of unifying sequence generation and refinement, by learning both insertion and deletion operations!
This is really cool - a transformer network that uses insertions and deletions as its primary operations. Roughly same performance, but up to 5x more efficient!
Levenshtein Transformer
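The insert/delete loop can be illustrated with a tiny rule-based toy (the policies here are oracles against a known target; the actual model *learns* both operations):

```python
# Toy iterative refinement with deletion + insertion, Levenshtein-style.

def delete_step(seq, target):
    # drop tokens that do not appear in the target (oracle deletion policy)
    return [tok for tok in seq if tok in target]

def insert_step(seq, target):
    # insert the first missing target token at its correct position
    for i, tok in enumerate(target):
        if i >= len(seq) or seq[i] != tok:
            return seq[:i] + [tok] + seq[i:]
    return seq

def refine(seq, target, max_iters=20):
    """Alternate delete/insert until the sequence converges."""
    for _ in range(max_iters):
        seq = delete_step(seq, target)
        seq = insert_step(seq, target)
        if seq == target:
            break
    return seq

draft = ["the", "cat", "barked", "loud"]
target = ["the", "cat", "meowed"]
print(refine(draft, target))  # ['the', 'cat', 'meowed']
```

Because edits are local, a draft that is already mostly right needs only a few iterations, which is where the claimed efficiency over left-to-right generation comes from.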
In this work, we propose StyleNeRF, a 3D-aware generative model for photo-realistic high-resolution image synthesis with high multi-view consistency, which can be trained on unstructured 2D images.
Joint work with
@LingjieLiu1
(MPII) Peng Wang (HKU) and Christian Theobalt (MPII)
We (
@melbayad
@MichaelAuli
@EXGRV
) also worked on a very similar approach that adaptively changes the decoding depth for MT, dating back to ICLR 2020
Current LLMs expend the same amount of computation on each token they generate.
But some predictions are much harder than others!
With CALM, the authors redirect computational resources to "hard" inferences for better perf (~50% speedup)
Here's how 👇
In this work, we combine approaches from four aspects -- data, model, loss function, and learning -- and finally close the performance gap between fully non-autoregressive machine translation and Transformers, while maintaining over a 16x speed-up at inference time.
(2/2)
Thanks for checking out our new (in-progress) results! By doing insertion-based decoding, we can essentially generate a sequence in an arbitrary order!🤔 We can also make it learn to generate in a good order adaptively.😦🤭
@dustinvtran
What are "inverse CDF-like" tricks? There are no details or references at all; I'm not even sure what the reasoning is here for why Yann is wrong. The following two tweets are also nonsense.
PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model
paper page:
propose PLANNER, a model that combines latent semantic diffusion with autoregressive generation, to generate fluent text while exercising global control over
New work! Humans appear to learn similarly for different modalities and so should machines! data2vec uses the same self-supervised algorithm to train models for vision, speech, and nlp.
Paper:
Blog:
Code:
(3/3) With the sparse voxel structure, our method is over 10 times faster than the state-of-the-art (NeRF) at inference time while achieving higher quality results.
Check out more at:
paper:
video:
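A 1-D toy of why the sparse voxel structure speeds up rendering (illustrative, not the NSVF code, which uses an octree of sparse 3-D voxels with per-voxel implicit fields): ray samples are only evaluated inside occupied voxels, so empty space costs nothing.

```python
# Toy ray march along one ray: skip field evaluations in empty voxels.

occupied = {3, 4, 9}        # indices of non-empty voxels along the ray
voxel_size = 1.0

def march(n_steps=100, t_far=12.0):
    dt = t_far / n_steps
    evaluated = 0
    for i in range(n_steps):
        t = i * dt
        if int(t // voxel_size) in occupied:  # skip empty space entirely
            evaluated += 1                    # would query the field here
    return evaluated

dense = 100                 # a dense marcher queries the field at every step
sparse = march()
print(sparse, dense)        # far fewer field evaluations with sparsity
```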
This feels like a slippery-slope argument. If a tool like an LLM can help us improve scientific writing, especially for non-English speakers, why should we ban it? What is the difference between an LLM and a dictionary? It is still the author's responsibility to check the facts.
With LLMs for science out there (
#Galactica
) we need new ethics rules for scientific publication. Existing rules regarding plagiarism, fraud, and authorship need to be rethought for LLMs to safeguard public trust in science. Long thread about trust, peer review, & LLMs. (1/23)
@emiel_hoogeboom
@JonathanHeek
@TimSalimans
In our recent ICLR paper, we also proposed a very similar noise-schedule adjustment for high-resolution and varying-resolution diffusion. Hope you may be interested!
@baaadas
hmm, I kind of disagree… even for the RS title, the training you get during a PhD is very useful and helpful in many ways… for example, you have more freedom to choose topics, to make mistakes, and to gain problem-solving skills, without worrying about being fired.
If you're interested in neural scene representations and neural rendering, feel free to join us in 40mins on the Q&A live session of our NeurIPS Spotlight paper: Neural Sparse Voxel Fields:
Q&A session on Dec 8th, 2020 @ 17:30 CET (8:30 AM PST)
Fully NAT significantly reduces inference latency, at the cost of a quality drop compared to AT. Can we close the performance gap while maintaining the latency advantage? Please check out our ACL Findings paper: Fully Non-autoregressive Neural Machine Translation:
Tricks of the Trade.
There is one more thing... Together with the tutorial slides, we have finally open-sourced the code of our last year's ACL2021 paper on fully non-autoregressive translation (NAT). Joint work with
@XiangKong4
Paper:
Code:
Please check out our paper for more details!
📃 Paper Link:
Huge thanks to my incredible collaborators
@zhaisf
@YizheZhangNLP
@LingjieLiu1
@jsusskin
for their amazing contributions! 👏
A special thanks to
@_akhaliq
for tweeting about our research! (6/6)
(2/3) NSVF defines a set of voxel-bounded implicit fields organized in sparse voxels. We progressively learn the underlying voxel structures with a differentiable ray-marching operation, from only a set of posed RGB images.
f-DM can produce high-quality samples on standard image generation benchmarks. Furthermore, we can readily manipulate the learned latent space and perform
conditional generation tasks (e.g., super-resolution) without additional training. (5/6)
We propose a generalized family of DMs, which is end-to-end non-cascaded, and allows progressive signal transformations along diffusion, including downsampling, blurring, and VAEs.
An interpolation-based formulation is used to smoothly bridge consecutive transformations. (3/6)
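As a rough sketch of what such an interpolation can look like (notation is illustrative here, not copied from the paper):

```latex
% Within stage k, the clean signal fed to the noising process interpolates
% between the outputs of consecutive transformations f_k and f_{k+1};
% \lambda_t runs from 0 to 1 across the stage, so the diffusion trajectory
% moves smoothly from one signal space to the next (up(.) matches shapes,
% e.g. upsampling after a downsampling transformation):
\tilde{x}_t \;=\; (1-\lambda_t)\, f_k(x_0) \;+\; \lambda_t \,\mathrm{up}\!\big(f_{k+1}(x_0)\big),
\qquad \lambda_t \in [0, 1]
```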
@PreetumNakkiran
I feel very upset seeing people post memes like this… I don't think research opinions should be correlated with IQ scores or used for showing off.
To tackle the modeling challenges, we also identify the importance of adjusting the noise levels whenever the signal is sub-sampled. A resolution-agnostic SNR is proposed as a practical guide. (4/6)
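A quick numerical illustration of why sub-sampling changes the effective noise level (a stdlib toy, not the paper's exact schedule):

```python
import math
import random

random.seed(0)

# Average-pooling k i.i.d.-noise pixels shrinks the noise std by sqrt(k),
# so the effective SNR grows by k. For a 2x spatial downsample (k = 4
# pixels per block), a resolution-agnostic schedule must compensate,
# e.g. by shifting log-SNR by -log(4), to keep the effective noise level
# comparable across resolutions.

def avg_pool(x, k):
    return [sum(x[i:i + k]) / k for i in range(0, len(x), k)]

n, k, sigma = 100_000, 4, 1.0
noise = [random.gauss(0, sigma) for _ in range(n)]
pooled = avg_pool(noise, k)
std_pooled = math.sqrt(sum(v * v for v in pooled) / len(pooled))
print(round(std_pooled, 2))   # ~0.5 == sigma / sqrt(k)
```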
@unixpickle
@Apple
Unfortunately, the naive version should still be slower than LDM, as it has to go through the high-res images anyway... But we can easily combine methods like our previous work () to progressively grow the resolution during inference and reduce the gap.
Despite the empirical success, diffusion models (DMs) are restricted to denoising in the ambient space. On the other hand, common generative models like VAEs employ a coarse-to-fine generation process. In this work, we are interested in combining the best of the two worlds. (2/6)
@zngu
Not sure what you wanted to say. Whenever we report a speed-up, we should always state the baseline model we are comparing against. By your logic, any neural system might potentially be slower than SMT.
@YiTayML
@MIT_CSAIL
@Saboo_Shubham_
They just came out around the same time, and BART did have a simpler formulation as an encoder-decoder model. I personally find the way you talk about things very annoying. Thanks.
@zngu
@odashi_t
@raphaelshu
@kchonyc
@jlibovicky
@jasonleeinf
Also, in my view, non-autoregressive approaches may or may not be useful in the end, as they have both potential and limitations. It is still a developing area; I am not sure we should limit ourselves by asking all papers to compare against the most highly optimized systems so far.
It will introduce some classical methods of non-autoregressive generation for machine translation, as well as its recent applications to various tasks, including GEC, ASR, TTS, and image generation!