Machine Learning Researcher at
@Apple
ML Research (MLR) based in NYC | ex-FAIRer | PhD from HKU | Research on generative AI across modalities. I also speak Japanese.
🚀Excited to introduce KaleidoDiffusion --
a new method that improves conditional diffusion model generation by incorporating autoregressive latent priors! This allows us to generate much more diverse outputs, even at high CFG, just like a kaleidoscope🔭!
(1/n)
Kaleido Diffusion
Improving Conditional Diffusion Models with Autoregressive Latent Modeling
Diffusion models have emerged as a powerful tool for generating high-quality images from textual descriptions. Despite their successes, these models often exhibit limited diversity in
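For intuition, here is a minimal runnable sketch of the idea in plain Python (all function names and numbers are illustrative stand-ins, not the paper's code): diversity comes from sampling discrete latents from an autoregressive prior and conditioning the denoiser on them, so even aggressive CFG on the text condition doesn't collapse samples to one mode.

```python
import random

random.seed(0)

# Toy stand-ins (illustrative only -- not the paper's architecture).
def ar_prior_sample(n_tokens=8, vocab=16):
    """Autoregressive prior over discrete latent tokens, collapsed to
    i.i.d. draws for this sketch."""
    return [random.randrange(vocab) for _ in range(n_tokens)]

def denoiser(x, t, text, z):
    """Toy epsilon-prediction: the latents z shift the prediction (this is
    what creates sample diversity); the text condition adds a fixed pull."""
    text_bias = 0.05 if text is not None else 0.0
    z_bias = sum(z) / len(z) * 0.01 if z is not None else 0.0
    return [0.1 * xi + text_bias + z_bias for xi in x]

def kaleido_sample(text, steps=10, cfg=7.5):
    z = ar_prior_sample()                        # stochastic latent "seed"
    x = [random.gauss(0, 1) for _ in range(16)]  # start from noise
    for t in reversed(range(steps)):
        eps_c = denoiser(x, t, text, z)          # conditional on text + z
        eps_u = denoiser(x, t, None, z)          # text condition dropped
        # classifier-free guidance on text; z is kept in BOTH branches
        eps = [u + cfg * (c - u) for c, u in zip(eps_c, eps_u)]
        x = [xi - 0.1 * ei for xi, ei in zip(x, eps)]
    return x

a = kaleido_sample("a cat")
b = kaleido_sample("a cat")  # same prompt, same CFG -> still diverse (new z)
```

The point of the sketch: the guidance scale only amplifies the text condition, while the per-sample latent `z` stays in both guidance branches, so it survives high CFG.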
📢 Introducing our latest research
@Apple
MLR for generating high-quality images & videos with a multi-resolution diffusion model -- Matryoshka Diffusion Models or MDM🪆 -- directly in pixel space (~1024px), without any VAEs or cascaded models. Code will be released soon! (1/n)
Matryoshka Diffusion Models
paper page:
Diffusion models are the de facto approach for generating high-quality images and videos, but learning high-dimensional models remains a formidable task due to computational and optimization challenges. Existing
I am super excited that the code of our recent ICLR2022 paper, "StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis", has been released! Please check
Paper:
Code:
Project page:
Life Update:
After four wonderful years at FAIR Labs, I've decided to move on to join Apple MLR led by Samy Bengio. I will continue working on representation learning and generative models for text, vision and multimodality. Feel free to reach out if you want to work together!
Happy New Year!! I am super excited to share our new pre-print “Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade”, joint work with
@XiangKong4
.
Please check out
(1/2)
Super excited to announce my first NeurIPS paper was accepted!
We propose the Levenshtein Transformer, which learns to insert and delete words iteratively for sequence generation and refinement tasks! Thanks to my reliable coauthors
@ChanghanWang
@JakeZzzzzzz
!
🪘🪘New pre-print!! I’m delighted to share our latest work
@Apple
MLR
“BOOT👢: Data-free Distillation of Denoising Diffusion Models with Bootstrapping.”
We explore a novel method that can distill your favorite diffusion models into ONE STEP without using training data!🔆 (1/6)
BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping
paper page:
Diffusion models have demonstrated excellent potential for generating diverse images. However, their performance often suffers from slow generation due to iterative
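A tiny toy of the bootstrapping objective (stand-in functions, not BOOT's actual parameterization): the student at step t-1 is trained to match one frozen-teacher denoising step applied to the student at step t, so no training images are needed, only noise and the teacher.

```python
# Toy sketch of data-free bootstrapped distillation. The "teacher step" and
# the student parameterization below are illustrative stand-ins.

def teacher_step(x, t):
    return 0.5 * x          # stub for one frozen-teacher denoising step

# student g(z, t) = w[t] * z with learnable per-step scalars w;
# w[0] defines the final ONE-STEP generator.
w = [1.0] * 5

def bootstrap_losses(z):
    losses = []
    for t in range(4, 0, -1):
        target = teacher_step(w[t] * z, t)     # teacher refines student(t)
        pred = w[t - 1] * z                    # student at t-1
        losses.append((pred - target) ** 2)
    return losses

# "Training": solve each bootstrap constraint exactly for this toy,
# propagating from the noisiest step down to t = 0.
for t in range(4, 0, -1):
    w[t - 1] = 0.5 * w[t]

print(w[0], sum(bootstrap_losses(1.0)))  # 0.0625 0.0
```

Once the bootstrap constraints hold at every step, `w[0] * z` alone reproduces what the full teacher chain would compute, which is the one-step distillation claim in miniature.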
We're releasing mBART, a new seq2seq multilingual pretraining system for machine translation across 25 languages. It gives significant improvements for document-level translation and low-resource languages. Read our paper to learn more:
I'll attend
#NeurIPS
in person next week, presenting our recent works:
PLANNER
Tue morning,
#1921
Diffusion without Attention
Fri all day, workshop on DM
I'm excited to see you soon and chat about multimodal & diffusion models!
I'm happy to share our latest work, “f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation.”
This is joint work with my Apple colleagues,
@zhaisf
@icaruszyz
@itsbautistam
@jsusskin
(1/6)
f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation
abs:
project page:
propose f-DM, an end-to-end non-cascaded diffusion model that allows progressive signal transformations along diffusion
Excited to share that our work, "NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion", was accepted at
#ICML2023
Huge thanks to amazing coauthors
@alextrevith
@kaien_lin
@jsusskin
@LingjieLiu1
and Ravi Ramamoorthi. See you in Honolulu!
It is huge fun working with Sasha! Please check out our recent work exploring better architectures for diffusion models: we replace all the attention with linear RNNs, which is much more efficient and needs no patchification.
Thanks
@NathanYan2012
@srush_nlp
@Apple
As with LMs, modern Diffusion models rely heavily on Attention. This improves quality but requires patching to scale. Working with Apple, we designed a model without attention that matches top imagenet accuracy and removes this resolution bottleneck.
Excited to be in person for
#ICML2023
in Hawai'i🌴🌊! I'll be presenting two posters (NerfDiff, σREPARAM) on Tuesday and giving a contributed talk (BOOT) on Friday. Please ping me if you want to chat about diffusion models, transformers, and 3D!!
Super excited to announce that our recent work "Cross-lingual Retrieval for Iterative Self-Supervised Training (CRISS)" has been accepted as *spotlight* presentation at NeurIPS2020!!
Congrats to all my amazing colleagues!
@mr_cheu
@xl_nlp
and Yuqing Tang at
@facebookai
Introducing our new work "Cross-lingual Retrieval for Iterative Self-Supervised Training" ()
Joint work with Yuqing Tang,
@xl_nlp
,
@thoma_gu
(
@facebookai
)
0/4
Just arrived in Vancouver for
#CVPR2023
from June 18-22. I'm thrilled about my first-time CVPR experience and eager to chat about generative models, 3D, and MLR! Please visit our poster on 3D-aware diffusion models at the 3DMV workshop () on June 19!
I will be attending
#ICCV2023
in person in Paris and presenting our poster on "Single-stage diffusion (SSD)-NeRF" on Wednesday 4th, 10:30 AM-12:30 PM! Looking forward to meeting people and talking about diffusion models and 3D!
Please check our recent
#ICCV2023
paper on SSD-NeRF () !! We proposed a unified view of 3D generation and reconstruction by learning a "single-stage" 3D diffusion model directly from 2D images.
Please check out our paper for more details!
📃 Paper:
Huge thanks to incredible collaborators
@Apple
MLR
@zhaisf
@YizheZhangNLP
@jsusskin
Navdeep Jaitly
for their amazing contributions!
📷 A special thanks to
@_akhaliq
for reposting our work! (6/n)
(1/3) Super excited to present our recent work: Neural Sparse Voxel Fields (NSVF): a hybrid neural scene representation for fast and high-quality free-viewpoint rendering.
Joint work with
@LingjieLiu1
(MPI), Zaw Lin (NUS), Tat-Seng Chua (NUS) and Christian Theobalt (MPI).
Super excited to announce that our recent work "Neural Sparse Voxel Fields (NSVF)"() has been accepted as *spotlight* presentation at NeurIPS2020!! Also the code and data have been released! Please checkout .
MDM is a single generative model that handles various high-resolution targets:
Images 🖼️
Text-to-Images 📜➡️🖼️
Text-to-Videos 📜➡️🎥
Distinct from existing works, MDM doesn't need a pre-trained VAE (e.g., SD) or multiple separately trained upscaling modules (e.g., Imagen). (2/n)
Sharing our recent
#NeurIPS2023
paper on latent diffusion for text generation. PLANNER is a diffusion model in the latent space, connected with an autoregressive language decoder, which can generate more diverse and coherent texts.
🎉 Thrilled to announce that our Planner paper has been accepted at
#NeurIPS2023
! 📚 If you're searching for a latent text diffusion approach that creates diverse and coherent text, check out our research! 😄
Code will be released soon!
#TextGeneration
#Diffusion
#NLG
How? We propose a diffusion process that denoises inputs at multiple resolutions jointly, using a NestedUNet architecture. Just like a "Matryoshka doll", our nested UNet embeds lower-resolution UNets inside the higher-resolution ones.🪆
We can do the same for both images & videos. (3/n)
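A 1-D toy of the nesting idea (stub "UNets" and illustrative names, not the MDM code): the coarse branch runs first and its features are injected into the finer branch, giving joint predictions at every resolution in a single forward pass.

```python
# Toy sketch of joint multi-resolution denoising with a nested structure.

def downsample(x):
    # average pairs of values (1-D stand-in for 2x2 pooling)
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]

def upsample(x):
    return [v for v in x for _ in range(2)]

def nested_unet(x_levels):
    """x_levels: noisy inputs at [low, mid, high] resolutions.
    Returns denoising predictions at every resolution jointly."""
    preds = []
    inner = None
    for x in x_levels:                           # coarse -> fine
        feat = [0.9 * v for v in x]              # stub "UNet" body
        if inner is not None:                    # inject coarser features
            up = upsample(inner)
            feat = [f + 0.1 * u for f, u in zip(feat, up)]
        inner = feat
        preds.append(feat)
    return preds

x_high = [float(i) for i in range(8)]
x_mid = downsample(x_high)
x_low = downsample(x_mid)
preds = nested_unet([x_low, x_mid, x_high])
# one forward pass yields predictions at all three resolutions
print([len(p) for p in preds])  # [2, 4, 8]
```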
Please check out our recent work with
@seayong08
@kchonyc
and Victor! We found that vanilla zero-shot NMT usually fails due to spurious correlations in the data, and we proposed simple approaches to fix it! Accepted by ACL2019. Thanks for your attention!
Attending
#ECCV2022
Oct 23-27 in Tel Aviv, my first in-person conference in three years!! Happy to chat about research happening at
@Apple
MLR.
We are also looking for interns who are interested in generative models for text, images, videos, 3D, and multimodal!
With these improvements, MDM can train a single pixel-space model at impressive resolutions (e.g., 1024x1024). To achieve these results, we only need a compact dataset like CC12M and a few days of training on just 3-4 nodes of 8 A100 GPUs each. 🔥🚀 (5/n)
Our paper 'Single-Stage Diffusion NeRF' will be presented at
#ICCV2023
. We merge 3D diffusion with NeRF into a holistic model, providing priors for both 3D generation and reconstruction (from an arbitrary number of views). Check it out here:
#NeRF
#AI
On my way to Kigali
#ICLR2023
in person to present our poster on diffusion models with signal transformations! It will be a long flight, arriving on April 30. Looking forward to seeing friends and chatting about generative models, 3D, and opportunities at
@Apple
MLR!
Please check out this incredible introduction video about our recent effort on "diffusion models without attention"!! Thanks so much,
@srush_nlp
, for making this! Also, thanks,
@NathanYan2012
, for your hard work.
We should always stay curious and think outside the box!
New Video: RNNs for Diffusion? Short technical overview of "Diffusion Models without Attention" a recent paper on long-range models for image generation.
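The core replacement can be pictured as a linear-recurrence "token mixer" (a toy stand-in for the SSM-style blocks, not the paper's code): a single O(L) scan mixes information across positions, with no attention and no patchification.

```python
# Toy linear-recurrence token mixer: h_t = d * h_{t-1} + (1 - d) * x_t.
# One sequential pass, cost linear in sequence length.

def linear_rnn_mix(xs, decay=0.5):
    h, out = 0.0, []
    for x in xs:
        h = decay * h + (1 - decay) * x
        out.append(h)
    return out

seq = [1.0, 0.0, 0.0, 0.0]
mixed = linear_rnn_mix(seq)
# information from position 0 propagates (with decay) to every later position
print(mixed)  # [0.5, 0.25, 0.125, 0.0625]
```

Contrast with attention: attention pays O(L^2) to mix all pairs, which is why resolution (sequence length) becomes the bottleneck that the recurrence avoids.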
Our submission "Universal Neural Machine Translation for Extremely Low Resource Languages" has been accepted as a full-paper oral presentation at NAACL-HLT 2018!! Please check it out. This is joint work with Hany and Jacob. Congrats!
@xutan_tx
and I will virtually give a tutorial about "Non-autoregressive Sequence Generation" this Sunday, May 22 at 14:30-18:00 Irish Standard Time.
#ACL2022NLP
#NLProc
Please come and check it out
More details
Besides, MDM isn't just innovative in its structure; we also propose a progressive training schedule that smoothly transitions from lower to higher resolutions, improving high-res generation noticeably.💡 (4/n)
i was curious if that new "mamba" layer could be used for image generation (tldr: ya prob)
i hacked together a quick test following ideas from the diffusion transformer (DiT) and made "DiM".
after an hour or so on my 4090 it seems that it's learning the oxford flowers dataset.
Thanks for checking out our new (in-progress) results! Joint work with
@ChanghanWang
@JakeZzzzzzz
, we hope to have a simple but efficient way of unifying sequence generation and refinement, by learning both insertion and deletion operations!
This is really cool - a transformer network that uses insertions and deletions as its primary operations. Roughly same performance, but up to 5x more efficient!
Levenshtein Transformer
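The insert/delete loop can be illustrated with a tiny rule-based toy (the policies here are oracles against a known target; the actual model *learns* both operations):

```python
# Toy iterative refinement with deletion + insertion, Levenshtein-style.

def delete_step(seq, target):
    # drop tokens that do not appear in the target (oracle deletion policy)
    return [tok for tok in seq if tok in target]

def insert_step(seq, target):
    # insert the first missing target token at its correct position
    for i, tok in enumerate(target):
        if i >= len(seq) or seq[i] != tok:
            return seq[:i] + [tok] + seq[i:]
    return seq

def refine(seq, target, max_iters=20):
    """Alternate delete/insert until the sequence converges."""
    for _ in range(max_iters):
        seq = delete_step(seq, target)
        seq = insert_step(seq, target)
        if seq == target:
            break
    return seq

draft = ["the", "cat", "barked", "loud"]
target = ["the", "cat", "meowed"]
print(refine(draft, target))  # ['the', 'cat', 'meowed']
```

Because edits are local, a draft that is already mostly right needs only a few iterations, which is where the claimed efficiency over left-to-right generation comes from.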
In this work, we propose StyleNeRF, a 3D-aware generative model for photo-realistic high-resolution image synthesis with high multi-view consistency, which can be trained on unstructured 2D images.
Joint work with
@LingjieLiu1
(MPII) Peng Wang (HKU) and Christian Theobalt (MPII)
We (
@melbayad
@MichaelAuli
@EXGRV
) also worked on a very similar approach that adaptively changes the decoding depth for MT, dating back to ICLR 2020
Current LLMs expend the same amount of computation on each token they generate.
But some predictions are much harder than others!
With CALM, the authors redirect computational resources to "hard" inferences for better perf (~50% speedup)
Here's how 👇
In this work, we combine approaches from four aspects -- data, model, loss function, and learning -- and finally close the performance gap between fully non-autoregressive machine translation and Transformers, while maintaining over a 16x speed-up at inference time.
(2/2)
Thanks for checking out our new (in-progress) results! By doing insertion-based decoding, we can essentially generate a sequence in an arbitrary order!🤔 We can also make it learn to generate in a good order adaptively.😦🤭
@dustinvtran
What are "inverse CDF-like" tricks? There are no details or references at all; I'm not even sure what the reasoning is here for why Yann is wrong. The following two tweets are also nonsense.
PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model
paper page:
propose PLANNER, a model that combines latent semantic diffusion with autoregressive generation, to generate fluent text while exercising global control over
New work! Humans appear to learn similarly for different modalities and so should machines! data2vec uses the same self-supervised algorithm to train models for vision, speech, and nlp.
Paper:
Blog:
Code:
(3/3) With the sparse voxel structure, our method is over 10 times faster than the state-of-the-art (NeRF) at inference time while achieving higher quality results.
Check out more at:
paper:
video:
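A 1-D toy of why the sparse voxel structure speeds up rendering (illustrative, not the NSVF code, which uses an octree of sparse 3-D voxels with per-voxel implicit fields): ray samples are only evaluated inside occupied voxels, so empty space costs nothing.

```python
# Toy ray march along one ray: skip field evaluations in empty voxels.

occupied = {3, 4, 9}        # indices of non-empty voxels along the ray
voxel_size = 1.0

def march(n_steps=100, t_far=12.0):
    dt = t_far / n_steps
    evaluated = 0
    for i in range(n_steps):
        t = i * dt
        if int(t // voxel_size) in occupied:  # skip empty space entirely
            evaluated += 1                    # would query the field here
    return evaluated

dense = 100                 # a dense marcher queries the field at every step
sparse = march()
print(sparse, dense)        # far fewer field evaluations with sparsity
```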
This feels like a slippery-slope argument. If a tool like an LLM can help us improve scientific writing, especially for non-English speakers, why should we ban it? What is the difference between an LLM and a dictionary? It is still the author's responsibility to check the facts.
With LLMs for science out there (
#Galactica
) we need new ethics rules for scientific publication. Existing rules regarding plagiarism, fraud, and authorship need to be rethought for LLMs to safeguard public trust in science. Long thread about trust, peer review, & LLMs. (1/23)
@emiel_hoogeboom
@JonathanHeek
@TimSalimans
In our recent ICLR paper, we also proposed a very similar noise-schedule adjustment for high-resolution and varying-resolution diffusion. Hope you may be interested!
@baaadas
hmm, I kind of disagree… even for the RS title, the training you get during a PhD is very useful and helpful in many ways… for example, you have more freedom to choose topics, to make mistakes, and to gain problem-solving skills, without worrying about being fired.
If you're interested in neural scene representations and neural rendering, feel free to join us in 40mins on the Q&A live session of our NeurIPS Spotlight paper: Neural Sparse Voxel Fields:
Q&A session on Dec 8th, 2020 @ 17:30 CET (8:30 AM PST)
Fully NAT significantly reduces inference latency, at the cost of a quality drop compared to AT. Can we close the performance gap while maintaining the latency advantage? Please check out our ACL Findings paper: Fully Non-autoregressive Neural Machine Translation:
Tricks of the Trade.
There is one more thing... Together with the tutorial slides, we have finally open-sourced the code of our last year's ACL2021 paper on fully non-autoregressive translation (NAT). Joint work with
@XiangKong4
Paper:
Code:
Please check out our paper for more details!
📃 Paper Link:
Huge thanks to my incredible collaborators
@zhaisf
@YizheZhangNLP
@LingjieLiu1
@jsusskin
for their amazing contributions! 👏
A special thanks to
@_akhaliq
for tweeting about our research! (6/6)
(2/3) NSVF defines a set of voxel-bounded implicit fields organized in sparse voxels. We progressively learn the underlying voxel structures with a differentiable ray-marching operation, from only a set of posed RGB images.
f-DM can produce high-quality samples on standard image generation benchmarks. Furthermore, we can readily manipulate the learned latent space and perform
conditional generation tasks (e.g., super-resolution) without additional training. (5/6)
We propose a generalized family of DMs, which is end-to-end non-cascaded, and allows progressive signal transformations along diffusion, including downsampling, blurring, and VAEs.
An interpolation-based formulation is used to smoothly bridge consecutive transformations. (3/6)
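As a rough sketch of what such an interpolation can look like (notation is illustrative here, not copied from the paper):

```latex
% Within stage k, the clean signal fed to the noising process interpolates
% between the outputs of consecutive transformations f_k and f_{k+1};
% \lambda_t runs from 0 to 1 across the stage, so the diffusion trajectory
% moves smoothly from one signal space to the next (up(.) matches shapes,
% e.g. upsampling after a downsampling transformation):
\tilde{x}_t \;=\; (1-\lambda_t)\, f_k(x_0) \;+\; \lambda_t \,\mathrm{up}\!\big(f_{k+1}(x_0)\big),
\qquad \lambda_t \in [0, 1]
```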
@PreetumNakkiran
I feel very upset seeing people post memes like this… I don't think research opinions should be correlated with IQ scores or used for showing off.
To tackle the modeling challenges, we also identify the importance of adjusting the noise levels whenever the signal is sub-sampled. A resolution-agnostic SNR is proposed as a practical guide. (4/6)
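A quick numerical illustration of why sub-sampling changes the effective noise level (a stdlib toy, not the paper's exact schedule):

```python
import math
import random

random.seed(0)

# Average-pooling k i.i.d.-noise pixels shrinks the noise std by sqrt(k),
# so the effective SNR grows by k. For a 2x spatial downsample (k = 4
# pixels per block), a resolution-agnostic schedule must compensate,
# e.g. by shifting log-SNR by -log(4), to keep the effective noise level
# comparable across resolutions.

def avg_pool(x, k):
    return [sum(x[i:i + k]) / k for i in range(0, len(x), k)]

n, k, sigma = 100_000, 4, 1.0
noise = [random.gauss(0, sigma) for _ in range(n)]
pooled = avg_pool(noise, k)
std_pooled = math.sqrt(sum(v * v for v in pooled) / len(pooled))
print(round(std_pooled, 2))   # ~0.5 == sigma / sqrt(k)
```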
@unixpickle
@Apple
Unfortunately, the naive version should still be slower than LDM, as it has to go through the high-res images anyway... But we can easily combine methods like our previous work () to progressively grow the resolution during inference and reduce the gap.
Despite the empirical success, diffusion models (DMs) are restricted to denoising in the ambient space. On the other hand, common generative models like VAEs employ a coarse-to-fine generation process. In this work, we are interested in combining the best of the two worlds. (2/6)
@zngu
Not sure what you wanted to say. Whenever we report a speed-up, we should always state the baseline model we are comparing against. By your logic, any neural system might potentially be slower than SMT.
@YiTayML
@MIT_CSAIL
@Saboo_Shubham_
They just came out around the same time, and BART did have a simpler formulation as an encoder-decoder model. I personally find the way you talk about things very annoying. Thanks.
@zngu
@odashi_t
@raphaelshu
@kchonyc
@jlibovicky
@jasonleeinf
Also, in my view, non-autoregressive approaches may or may not be useful in the end, as they have both potential and limitations. It is still a developing area; I am not sure we should limit ourselves by asking all papers to compare against the most highly optimized systems so far.
It will introduce some classical methods of non-autoregressive generation for machine translation, as well as its recent applications to various tasks, including GEC, ASR, TTS, and image generation!