Timur Garipov Profile
Timur Garipov

@tim_garipov

2,646
Followers
946
Following
25
Media
134
Statuses

Member of Technical Staff @OpenAI PhD @MITEECS @MIT_CSAIL

San Francisco, CA
Joined March 2018
Pinned Tweet
@tim_garipov
Timur Garipov
4 months
Yesterday, I successfully defended my PhD thesis "Guiding Deep Probabilistic Models". I have been so lucky to be advised by Tommi Jaakkola and to have the support of my thesis committee members: @samikaski and @phillip_isola . Defense slides:
15
8
158
@tim_garipov
Timur Garipov
2 months
I moved to SF and joined @OpenAI .
61
10
1K
@tim_garipov
Timur Garipov
11 months
Classifier guidance in diffusion enables generation conditioned on information that might not have been specified at training time. In our recent #NeurIPS2023 paper we show how this idea can be generalized to compose pre-trained diffusion models as well as GFlowNets. 🧵 1/N
Tweet media one
1
59
330
@tim_garipov
Timur Garipov
10 months
Looking forward to #NeurIPS2023 in New Orleans! Message me if you want to meet at the conference. I am on the job market! I am finishing my PhD at MIT by Fall 2024 and looking for Research Scientist roles in industry. My research interests are in the 🧵 below.
3
18
161
@tim_garipov
Timur Garipov
5 years
The first paper written at my new lab is out on arXiv! "The Benefits of Pairwise Discriminators for Adversarial Training". Joint work with @ShangyuanTong and Tommi Jaakkola. arXiv: Code:
Tweet media one
Tweet media two
2
18
75
@tim_garipov
Timur Garipov
5 years
In collaboration with @Pavel_Izmailov and Javier @ideami , we produced a high-resolution visualization of the training process of a mode-connecting curve in the DNN loss landscape. Blog post: Javier's website:
2
9
52
@tim_garipov
Timur Garipov
2 years
I am at #NeurIPS2022 this week! DM me if you would like to chat about deep learning and probabilistic models, including DL loss landscapes, domain adaptation, optimal transport, diffusion, and ML & control.
2
3
47
@tim_garipov
Timur Garipov
2 years
@ShangyuanTong and I will be presenting our ICLR 2022 spotlight paper "Adversarial Support Alignment" on Mon 04/25 1:30pm-3:30pm (EDT). Joint work with Yang Zhang, @CodeTerminator , and Tommi Jaakkola. Paper: Code: 1/8
Tweet media one
2
9
40
@tim_garipov
Timur Garipov
5 years
Now that I've finished my Master's, I am excited to announce that I will join @MITEECS and start pursuing a PhD in the fall. Huge thanks to my family, friends, collaborators, and mentors who supported me at the beginning of my academic journey. Thrilled to see what's coming next!
3
1
38
@tim_garipov
Timur Garipov
10 months
Presenting "Compositional Sculpting of Iterative Generative Processes" @ #NeurlPS2023 today! Poster #712 5pm-7pm Come by to chat about mathematical operations and algorithms for the composition of generative models. Links:
Tweet media one
0
5
31
@tim_garipov
Timur Garipov
1 year
Excited to start my summer research internship @Cruise ! Let me know if you are in the Bay Area and would like to catch up!
1
1
23
@tim_garipov
Timur Garipov
6 years
Our paper "Averaging Weights Leads to Wider Optima and Better Generalization" appears as Oral at UAI Joint work with @Pavel_Izmailov , Dmitrii Podoprikhin, Dmitry Vetrov, and @andrewgwils . arXiv: code:
0
7
23
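As a rough illustration of the averaging described in the tweet above, here is a minimal Python sketch of the SWA update: a running average of weight snapshots collected along the SGD trajectory. The snapshot values are toy numbers for illustration, not results from the paper.

```python
# Minimal sketch of the weight averaging behind SWA ("Averaging Weights Leads
# to Wider Optima and Better Generalization"): keep a running average of
# weight snapshots collected along the SGD trajectory. Toy example only.
import numpy as np

def swa_average(weight_snapshots):
    """Return the running average of a sequence of flat weight vectors."""
    w_swa, n = None, 0
    for w in weight_snapshots:
        w = np.asarray(w, dtype=float)
        w_swa = w if w_swa is None else (w_swa * n + w) / (n + 1)
        n += 1
    return w_swa

# Hypothetical snapshots taken at the ends of SGD learning-rate cycles.
snapshots = [np.array([0.9, 1.1]), np.array([1.1, 0.9]), np.array([1.0, 1.0])]
print(swa_average(snapshots))  # -> [1. 1.]
```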
@tim_garipov
Timur Garipov
6 years
Our paper "Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs" is accepted to #NIPS2018 as a spotlight! Joint work with my friends @Pavel_Izmailov and Dmitrii Podoprikhin under supervision of Dmitry Vetrov and @andrewgwils . arxiv:
0
4
23
@tim_garipov
Timur Garipov
6 years
Check out the short video summarizing our #NeurIPS2018 paper "Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs". Come to our presentation (spotlight at 04:20 PM, poster #162 , Tue Dec 4th). We would be happy to talk to you at the conference!
0
6
9
@tim_garipov
Timur Garipov
11 months
Diffusion models and GFlowNets are iterative processes. Naturally, their sampling distributions can be altered by interventions on sampling policies. However, such policy modifications can break the balance between the steps of the process, leading to uncontrollable results. 4/N
Tweet media one
Tweet media two
1
1
8
@tim_garipov
Timur Garipov
10 months
I am excited about understanding deep learning, probabilistic modeling, uncertainty estimation, and generative models. Would be thrilled to work on applied ML. See for my CV and publications.
0
0
9
@tim_garipov
Timur Garipov
11 months
Starting with a prior p(X), we can create a new distribution by introducing an observation Y w/ likelihood p(Y | X). Following Bayes’ rule, the posterior emphasizes points that agree with the observation. Repeating this procedure, we can “sculpt” the desired distribution. 6/N
Tweet media one
1
0
7
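A worked restatement of the Bayes step from the tweet above, as a sketch in my own notation and assuming (as in the thread) that the observations are conditionally independent given X:

\[
p(x \mid y) \;=\; \frac{p(y \mid x)\, p(x)}{\int p(y \mid x')\, p(x')\, \mathrm{d}x'} \;\propto\; p(y \mid x)\, p(x),
\qquad
p(x \mid y_1, y_2) \;\propto\; p(y_2 \mid x)\, p(y_1 \mid x)\, p(x).
\]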
@tim_garipov
Timur Garipov
11 months
🧪 We composed pre-trained GFlowNets: p1 generates molecules with high SEH (a proxy for binding to a specific enzyme); p2 generates molecules with high synthetic accessibility (SA). Harmonic mean ➡️ molecules w/ high SA and SEH. Contrast ➡️ high SA but low SEH, and vice versa. 13/N
Tweet media one
1
1
7
@tim_garipov
Timur Garipov
11 months
To summarize, we introduced a method for composing iterative generative processes. These compositions are based on observation conditioning and can be realized with classifier guidance. Please see our paper for more details and results and find our code on GitHub. N/N
0
1
6
@tim_garipov
Timur Garipov
11 months
Prior work has realized "product" and "negation" in diffusion models via score-function arithmetic. However, the sampling procedure requires MCMC correction, as direct sums of scores break the diffusion balance. We explore a different approach. 5/N
Tweet media one
1
0
5
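For reference, a sketch of the score-arithmetic issue mentioned above (my notation, not a quote from the paper): summing the per-model scores at a noise level t gives the score of the product of the diffused marginals, which in general is not the diffused product distribution, hence the need for MCMC correction.

\[
\nabla_{x_t}\log p_{1,t}(x_t) + \nabla_{x_t}\log p_{2,t}(x_t)
\;=\; \nabla_{x_t}\log\big[p_{1,t}(x_t)\, p_{2,t}(x_t)\big],
\quad\text{but in general}\quad
p_{1,t}\, p_{2,t} \;\neq\; (p_1 p_2)_t .
\]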
@tim_garipov
Timur Garipov
11 months
Compositional generation = operations on distributions. In EBMs, energy functions can be added/subtracted, resulting in new distributions called "product" and "negation". These operations, explored in prior work, do not require changes to the sampling algorithm. 3/N
Tweet media one
1
0
5
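A minimal sketch of the EBM operations referenced above, with p_i(x) ∝ exp(−E_i(x)); the temperature γ in the negation is my notation, not the thread's:

\[
p_{\text{prod}}(x) \;\propto\; e^{-(E_1(x) + E_2(x))} \;\propto\; p_1(x)\, p_2(x),
\qquad
p_{\text{neg}}(x) \;\propto\; e^{-(E_1(x) - \gamma E_2(x))} \;\propto\; \frac{p_1(x)}{p_2(x)^{\gamma}} .
\]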
@tim_garipov
Timur Garipov
5 years
@jm_alexia Regarding the model, we use +E[C(y)], since, motivated by our theory, we want to parameterize a symmetric discriminator. More work is needed to explain the practical success of expected critics in PairGAN/RaGAN. Maybe, the function space perspective could be useful for that.
1
2
3
@tim_garipov
Timur Garipov
11 months
To compose two distributions, we start with a mixture distribution. Naturally, we specify the likelihood Y | X as the probability that X has been generated from the specific base model. In this case, Y represents the index of the base model. 7/N
Tweet media one
1
0
4
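Concretely, for two base models the construction in the tweet above reads (a sketch in my notation):

\[
p(x) \;=\; \tfrac{1}{2}\big(p_1(x) + p_2(x)\big),
\qquad
p(Y = i \mid x) \;=\; \frac{p_i(x)}{p_1(x) + p_2(x)}, \quad i \in \{1, 2\}.
\]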
@tim_garipov
Timur Garipov
11 months
Observations are independent given a terminal state, so we train a terminal state classifier to predict them independently. For intermediate states, observations are not independent, so the non-terminal state classifier must predict them jointly. 12/N
Tweet media one
1
0
4
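In symbols, the factorization claim above can be sketched as follows (my notation: x denotes a terminal state, x_t an intermediate state of the process):

\[
p(y_1, y_2 \mid x) \;=\; p(y_1 \mid x)\, p(y_2 \mid x),
\qquad
p(y_1, y_2 \mid x_t) \;=\; \mathbb{E}\big[\, p(y_1 \mid X)\, p(y_2 \mid X) \,\big|\, x_t \,\big] \;\neq\; p(y_1 \mid x_t)\, p(y_2 \mid x_t)\ \text{in general}.
\]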
@tim_garipov
Timur Garipov
11 months
In diffusion models and GFlowNets, mixture and conditional policies can be realized given a classifier trained on samples from the models. With these components, we obtain the score function of the composite diffusion model and the forward policy of the composite GFlowNet. 11/N
Tweet media one
Tweet media two
1
0
4
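As a sketch of how the classifier enters in the diffusion case (standard classifier guidance applied to the mixture prior; my notation, not a quote from the paper): the mixture score is a classifier-weighted combination of the base scores, and the observations are folded in through the classifier's log-probability.

\[
\nabla_{x_t}\log p_{\mathrm{mix},t}(x_t) \;=\; \sum_{i} p(Y = i \mid x_t)\, \nabla_{x_t}\log p_{i,t}(x_t),
\qquad
\nabla_{x_t}\log p_t(x_t \mid y_1, y_2) \;=\; \nabla_{x_t}\log p_{\mathrm{mix},t}(x_t) + \nabla_{x_t}\log p(y_1, y_2 \mid x_t).
\]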
@tim_garipov
Timur Garipov
11 months
The density of this posterior is proportional to the harmonic mean of the densities of base distributions. We call this operation the “harmonic mean”. This distribution concentrates on points that are common in both p1 and p2. 9/N
Tweet media one
1
0
4
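The derivation behind this claim, using the uniform mixture prior and model-index likelihood from earlier in the thread (a sketch in my notation):

\[
p(x \mid y_1 = 1,\, y_2 = 2)
\;\propto\; \tfrac{1}{2}\big(p_1 + p_2\big)\cdot\frac{p_1}{p_1 + p_2}\cdot\frac{p_2}{p_1 + p_2}
\;=\; \frac{p_1(x)\, p_2(x)}{2\,\big(p_1(x) + p_2(x)\big)}
\;\propto\; \mathrm{HM}\big(p_1(x),\, p_2(x)\big).
\]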
@tim_garipov
Timur Garipov
11 months
Observations (1, 1) have a different effect. We call this operation the "contrast of p1 and p2", as it highlights regions that are common in p1 and uncommon in p2. Flipping the observations reverses the direction of the effect, and we obtain the "contrast of p2 and p1". 10/N
Tweet media one
1
0
4
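And the analogous calculation for the observations (1, 1) (again a sketch in my notation):

\[
p(x \mid y_1 = 1,\, y_2 = 1)
\;\propto\; \tfrac{1}{2}\big(p_1 + p_2\big)\cdot\Big(\frac{p_1}{p_1 + p_2}\Big)^{2}
\;=\; \frac{p_1(x)^{2}}{2\,\big(p_1(x) + p_2(x)\big)},
\]

which is large where \(p_1\) is high and \(p_2\) is low.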
@tim_garipov
Timur Garipov
2 years
Methodologically, distribution alignment can be implemented via adversarial training with a domain discriminator. We show that the Jensen–Shannon discriminator preserves support differences in the input space as support differences in its 1D output space. 6/8
Tweet media one
1
0
3
@tim_garipov
Timur Garipov
5 years
@jbloom22 Thank you for the positive feedback, Jon! Glad you liked it!
0
0
3
@tim_garipov
Timur Garipov
11 months
We show that with such observations, we can create composite distributions by repeatedly applying guidance to the mixture prior. For example, by specifying observations (1, 2) we can emphasize the points that have high likelihood under both distributions at the same time. 8/N
Tweet media one
1
0
3
@tim_garipov
Timur Garipov
10 months
@habibh Thanks!
0
0
0
@tim_garipov
Timur Garipov
2 years
Based on this result, we propose a practical method called Adversarial Support Alignment (ASA). In experiments on domain adaptation under label distribution shift, we show that ASA is more robust against these shifts compared to other alignment-based baselines. 7/8
Tweet media one
1
0
3
@tim_garipov
Timur Garipov
21 days
@andrewgwils Congratulations, Andrew!
1
0
2
@tim_garipov
Timur Garipov
2 years
For more details, please see our paper where we also draw the connections between different alignment approaches from the optimal transport perspective. 8/8
Tweet media one
0
0
2
@tim_garipov
Timur Garipov
7 years
@ngutten @SashaVNovikov Right now we don't know the exact answer to this question, but several results suggest that these surfaces may be connected. You may want to take a look at the paper, which provides a theoretical analysis of a related subject under some assumptions.
1
0
2
@tim_garipov
Timur Garipov
5 years
@jm_alexia The main focus of this paper is the theoretical analysis of pairwise objectives. We find that there is a generator objective that doesn't allow the discriminator to break the aligned generator. We believe this is a promising approach for addressing the instability of training.
1
0
1
@tim_garipov
Timur Garipov
3 years
@elmelis @hseas Congrats, David!
0
0
1
@tim_garipov
Timur Garipov
5 years
@jm_alexia BTW, we are big fans of your work on GANs. We are very much inspired by RGAN (and other papers) and experiments on CAT images 🐈
0
0
0
@tim_garipov
Timur Garipov
2 years
We study the problem of aligning the supports of distributions. Compared to works on distribution alignment, we do not require the densities to be matched. 5/8
Tweet media one
1
0
1
@tim_garipov
Timur Garipov
7 years
@ngutten @SashaVNovikov Hi Nicholas! First, let me check that I understood your question right: are you asking about a situation in which we won't be able to find a path of almost constant loss, or about a way to find such a path between two non-optimal points in the NN parameter space?
1
0
1
@tim_garipov
Timur Garipov
7 years
@ngutten @SashaVNovikov Following the paper's notation, I wrote down the corresponding objectives. Based on the paper, I would say that it should be easy to find such paths, but of course this has to be checked properly.
Tweet media one
1
0
1
@tim_garipov
Timur Garipov
7 years
@ngutten @SashaVNovikov A funny thing is that the objectives we used to connect local optima can be seen as the MAE objective (2) with v = 0.
1
0
1
@tim_garipov
Timur Garipov
2 years
Recent works in the domain adaptation literature show that exact alignment of distributions is not always desirable and can in fact be detrimental. The best-known scenario demonstrating this issue is label distribution shift. 4/8
Tweet media one
1
0
1
@tim_garipov
Timur Garipov
2 years
In the domain adaptation context, alignment-based methods (e.g. DANN) learn a feature extractor such that the extracted representations are 1) predictive of the class label in the source domain; 2) domain invariant: the source and target feature distributions must be aligned. 3/8
Tweet media one
1
0
1
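For context, a sketch of the kind of minimax objective DANN-style methods use (my notation, not the paper's: f is the feature extractor, h the label classifier, d the domain discriminator, P_S and P_T the source and target distributions, λ a trade-off weight):

\[
\min_{f,\,h}\;\max_{d}\;\;
\mathbb{E}_{(x,y)\sim P_S}\big[\ell\big(h(f(x)),\, y\big)\big]
\;+\; \lambda\Big(
\mathbb{E}_{x\sim P_S}\big[\log d(f(x))\big]
+ \mathbb{E}_{x\sim P_T}\big[\log\big(1 - d(f(x))\big)\big]
\Big).
\]

The discriminator tries to tell source features from target features, while the feature extractor tries to confuse it; at the discriminator's optimum, the adversarial term relates to the Jensen–Shannon divergence between the source and target feature distributions.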
@tim_garipov
Timur Garipov
2 years
Machine learning tasks often involve the alignment of distributions. The goal of distribution alignment is to bring the distributions closer by tuning the parameters. Optimality is reached when the distributions are aligned, i.e., the densities are matched everywhere. 2/8
Tweet media one
1
0
1
@tim_garipov
Timur Garipov
7 years
@ngutten @SashaVNovikov Thinking about the latter question, I came up with two loss functions for paths which may allow us to find such paths. Suppose the two networks we want to connect have loss values equal to v. Then we can minimize the MSE/MAE between the loss of a point on the path and v.
1
0
1
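In LaTeX, the two objectives described in this reply can be sketched as follows (my notation: \(\phi_\theta(t)\) is a parametric path with endpoints at the two networks, \(\mathcal{L}\) is the training loss, and v is the common endpoint loss value):

\[
\ell_{\mathrm{MSE}}(\theta) \;=\; \mathbb{E}_{t\sim U[0,1]}\Big[\big(\mathcal{L}(\phi_\theta(t)) - v\big)^{2}\Big],
\qquad
\ell_{\mathrm{MAE}}(\theta) \;=\; \mathbb{E}_{t\sim U[0,1]}\Big[\big|\mathcal{L}(\phi_\theta(t)) - v\big|\Big].
\]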