Timur Garipov Profile
Timur Garipov

@tim_garipov

2,646
Followers
946
Following
25
Media
134
Statuses

Member of Technical Staff @OpenAI PhD @MITEECS @MIT_CSAIL

San Francisco, CA
Joined March 2018
Pinned Tweet
@tim_garipov
Timur Garipov
4 months
Yesterday, I successfully defended my PhD thesis "Guiding Deep Probabilistic Models". I have been so lucky to be advised by Tommi Jaakkola and to have the support of my thesis committee members: @samikaski and @phillip_isola . Defense slides:
15
8
158
@tim_garipov
Timur Garipov
2 months
I moved to SF and joined @OpenAI .
61
10
1K
@tim_garipov
Timur Garipov
11 months
Classifier guidance in diffusion enables generation conditioned on information that might not have been specified at training time. In our recent #NeurIPS2023 paper we show how this idea can be generalized to compose pre-trained diffusion models as well as GFlowNets. 🧵 1/N
Tweet media one
1
59
330
@tim_garipov
Timur Garipov
10 months
Looking forward to #NeurIPS2023 in New Orleans! Message me if you want to meet at the conference. I am on the job market! I am finishing my PhD at MIT by Fall 2024 and looking for Research Scientist roles in industry. My research interests are in the 🧵 below.
3
18
161
@tim_garipov
Timur Garipov
5 years
The first paper written at my new lab is out on arXiv! "The Benefits of Pairwise Discriminators for Adversarial Training". Joint work with @ShangyuanTong and Tommi Jaakkola. arXiv: Code:
Tweet media one
Tweet media two
2
18
75
@tim_garipov
Timur Garipov
5 years
In collaboration with @Pavel_Izmailov and Javier @ideami , we produced a high-resolution visualization of the training process of a mode-connecting curve in the DNN loss landscape. Blog post: Javier's website:
2
9
52
@tim_garipov
Timur Garipov
2 years
I am at #NeurIPS2022 this week! DM me if you would like to chat about deep learning and probabilistic models, including DL loss landscapes, domain adaptation, optimal transport, diffusion, and ML & control.
2
3
47
@tim_garipov
Timur Garipov
2 years
@ShangyuanTong and I will be presenting our ICLR 2022 spotlight paper "Adversarial Support Alignment" on Mon 04/25 1:30pm-3:30pm (EDT). Joint work with Yang Zhang, @CodeTerminator , and Tommi Jaakkola. Paper: Code: 1/8
Tweet media one
2
9
40
@tim_garipov
Timur Garipov
5 years
Now that I've finished my Master's, I am excited to announce that I will join @MITEECS and start pursuing a PhD in the fall. Huge thanks to my family, friends, collaborators, and mentors who supported me at the beginning of my academic journey. Thrilled to see what's coming next!
3
1
38
@tim_garipov
Timur Garipov
10 months
Presenting "Compositional Sculpting of Iterative Generative Processes" @ #NeurlPS2023 today! Poster #712 5pm-7pm Come by to chat about mathematical operations and algorithms for the composition of generative models. Links:
Tweet media one
0
5
31
@tim_garipov
Timur Garipov
1 year
Excited to start my summer research internship @Cruise ! Let me know if you are in the Bay Area and would like to catch up!
1
1
23
@tim_garipov
Timur Garipov
6 years
Our paper "Averaging Weights Leads to Wider Optima and Better Generalization" appears as Oral at UAI Joint work with @Pavel_Izmailov , Dmitrii Podoprikhin, Dmitry Vetrov, and @andrewgwils . arXiv: code:
0
7
23
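As a rough illustration of the averaging described in the tweet above, here is a minimal Python sketch of the SWA update: a running average of weight snapshots collected along the SGD trajectory. The snapshot values are toy numbers for illustration, not results from the paper.

```python
# Minimal sketch of the weight averaging behind SWA ("Averaging Weights Leads
# to Wider Optima and Better Generalization"): keep a running average of
# weight snapshots collected along the SGD trajectory. Toy example only.
import numpy as np

def swa_average(weight_snapshots):
    """Return the running average of a sequence of flat weight vectors."""
    w_swa, n = None, 0
    for w in weight_snapshots:
        w = np.asarray(w, dtype=float)
        w_swa = w if w_swa is None else (w_swa * n + w) / (n + 1)
        n += 1
    return w_swa

# Hypothetical snapshots taken at the ends of SGD learning-rate cycles.
snapshots = [np.array([0.9, 1.1]), np.array([1.1, 0.9]), np.array([1.0, 1.0])]
print(swa_average(snapshots))  # -> [1. 1.]
```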
@tim_garipov
Timur Garipov
6 years
Our paper "Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs" is accepted to #NIPS2018 as a spotlight! Joint work with my friends @Pavel_Izmailov and Dmitrii Podoprikhin under supervision of Dmitry Vetrov and @andrewgwils . arxiv:
0
4
23
@tim_garipov
Timur Garipov
6 years
Check out the short video summarizing our #NeurIPS2018 paper "Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs". Come to our presentation (spotlight at 04:20 PM, poster #162 , Tue Dec 4th). We would be happy to talk to you at the conference!
0
6
9
@tim_garipov
Timur Garipov
11 months
Diffusion models and GFlowNets are iterative processes. Naturally, their sampling distributions can be altered by interventions on sampling policies. However, such policy modifications can break the balance between the steps of the process, leading to uncontrollable results. 4/N
Tweet media one
Tweet media two
1
1
8
@tim_garipov
Timur Garipov
10 months
I am excited about understanding deep learning, probabilistic modeling, uncertainty estimation, and generative models. Would be thrilled to work on applied ML. See for my CV and publications.
0
0
9
@tim_garipov
Timur Garipov
11 months
Starting with a prior p(X), we can create a new distribution by introducing an observation Y w/ likelihood p(Y | X). Following Bayes’ rule, the posterior emphasizes points that agree with the observation. Repeating this procedure, we can “sculpt” the desired distribution. 6/N
Tweet media one
1
0
7
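A worked restatement of the Bayes step from the tweet above, as a sketch in my own notation and assuming (as in the thread) that the observations are conditionally independent given X:

\[
p(x \mid y) \;=\; \frac{p(y \mid x)\, p(x)}{\int p(y \mid x')\, p(x')\, \mathrm{d}x'} \;\propto\; p(y \mid x)\, p(x),
\qquad
p(x \mid y_1, y_2) \;\propto\; p(y_2 \mid x)\, p(y_1 \mid x)\, p(x).
\]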
@tim_garipov
Timur Garipov
11 months
🧪 We composed pre-trained GFlowNets: p1 generates molecules with high SEH (a proxy for binding to a specific enzyme); p2 generates molecules with high synthetic accessibility (SA). Harmonic mean ➡️ molecules w/ high SA and SEH. Contrast ➡️ high SA but low SEH, and vice versa. 13/N
Tweet media one
1
1
7
@tim_garipov
Timur Garipov
11 months
To summarize, we introduced a method for composing iterative generative processes. These compositions are based on observation conditioning and can be realized with classifier guidance. Please see our paper for more details and results and find our code on GitHub. N/N
0
1
6
@tim_garipov
Timur Garipov
11 months
Prior work has realized "product" and "negation" in diffusion models via score-function arithmetic. However, the sampling procedure requires MCMC correction, as direct sums of scores break the diffusion balance. We explore a different approach. 5/N
Tweet media one
1
0
5
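For reference, a sketch of the score-arithmetic issue mentioned above (my notation, not a quote from the paper): summing the per-model scores at a noise level t gives the score of the product of the diffused marginals, which in general is not the diffused product distribution, hence the need for MCMC correction.

\[
\nabla_{x_t}\log p_{1,t}(x_t) + \nabla_{x_t}\log p_{2,t}(x_t)
\;=\; \nabla_{x_t}\log\big[p_{1,t}(x_t)\, p_{2,t}(x_t)\big],
\quad\text{but in general}\quad
p_{1,t}\, p_{2,t} \;\neq\; (p_1 p_2)_t .
\]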
@tim_garipov
Timur Garipov
11 months
Compositional generation = operations on distributions. In EBMs, energy functions can be added/subtracted, resulting in new distributions called "product" and "negation". These operations, explored in prior work, do not require changes to the sampling algorithm. 3/N
Tweet media one
1
0
5
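A minimal sketch of the EBM operations referenced above, with p_i(x) ∝ exp(−E_i(x)); the temperature γ in the negation is my notation, not the thread's:

\[
p_{\text{prod}}(x) \;\propto\; e^{-(E_1(x) + E_2(x))} \;\propto\; p_1(x)\, p_2(x),
\qquad
p_{\text{neg}}(x) \;\propto\; e^{-(E_1(x) - \gamma E_2(x))} \;\propto\; \frac{p_1(x)}{p_2(x)^{\gamma}} .
\]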
@tim_garipov
Timur Garipov
5 years
@jm_alexia Regarding the model, we use +E[C(y)], since, motivated by our theory, we want to parameterize a symmetric discriminator. More work is needed to explain the practical success of expected critics in PairGAN/RaGAN. Maybe, the function space perspective could be useful for that.
1
2
3
@tim_garipov
Timur Garipov
11 months
To compose two distributions, we start with a mixture distribution. Naturally, we specify the likelihood Y | X as the probability that X has been generated from the specific base model. In this case, Y represents the index of the base model. 7/N
Tweet media one
1
0
4
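Concretely, for two base models the construction in the tweet above reads (a sketch in my notation):

\[
p(x) \;=\; \tfrac{1}{2}\big(p_1(x) + p_2(x)\big),
\qquad
p(Y = i \mid x) \;=\; \frac{p_i(x)}{p_1(x) + p_2(x)}, \quad i \in \{1, 2\}.
\]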
@tim_garipov
Timur Garipov
11 months
Observations are independent given a terminal state, so we train a terminal state classifier to predict them independently. For intermediate states, observations are not independent, so the non-terminal state classifier must predict them jointly. 12/N
Tweet media one
1
0
4
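In symbols, the factorization claim above can be sketched as follows (my notation: x denotes a terminal state, x_t an intermediate state of the process):

\[
p(y_1, y_2 \mid x) \;=\; p(y_1 \mid x)\, p(y_2 \mid x),
\qquad
p(y_1, y_2 \mid x_t) \;=\; \mathbb{E}\big[\, p(y_1 \mid X)\, p(y_2 \mid X) \,\big|\, x_t \,\big] \;\neq\; p(y_1 \mid x_t)\, p(y_2 \mid x_t)\ \text{in general}.
\]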
@tim_garipov
Timur Garipov
11 months
In diffusion models and GFlowNets, mixture and conditional policies can be realized given a classifier trained on samples from the models. With these components, we obtain the score function of the composite diffusion model and the forward policy of the composite GFlowNet. 11/N
Tweet media one
Tweet media two
1
0
4
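As a sketch of how the classifier enters in the diffusion case (standard classifier guidance applied to the mixture prior; my notation, not a quote from the paper): the mixture score is a classifier-weighted combination of the base scores, and the observations are folded in through the classifier's log-probability.

\[
\nabla_{x_t}\log p_{\mathrm{mix},t}(x_t) \;=\; \sum_{i} p(Y = i \mid x_t)\, \nabla_{x_t}\log p_{i,t}(x_t),
\qquad
\nabla_{x_t}\log p_t(x_t \mid y_1, y_2) \;=\; \nabla_{x_t}\log p_{\mathrm{mix},t}(x_t) + \nabla_{x_t}\log p(y_1, y_2 \mid x_t).
\]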
@tim_garipov
Timur Garipov
11 months
The density of this posterior is proportional to the harmonic mean of the densities of base distributions. We call this operation the “harmonic mean”. This distribution concentrates on points that are common in both p1 and p2. 9/N
Tweet media one
1
0
4
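The derivation behind this claim, using the uniform mixture prior and model-index likelihood from earlier in the thread (a sketch in my notation):

\[
p(x \mid y_1 = 1,\, y_2 = 2)
\;\propto\; \tfrac{1}{2}\big(p_1 + p_2\big)\cdot\frac{p_1}{p_1 + p_2}\cdot\frac{p_2}{p_1 + p_2}
\;=\; \frac{p_1(x)\, p_2(x)}{2\,\big(p_1(x) + p_2(x)\big)}
\;\propto\; \mathrm{HM}\big(p_1(x),\, p_2(x)\big).
\]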
@tim_garipov
Timur Garipov
11 months
Observations (1, 1) have a different effect. We call this operation the "contrast of p1 and p2", as it highlights regions that are common in p1 and uncommon in p2. Flipping the observations reverses the direction of the effect, and we obtain the "contrast of p2 and p1". 10/N
Tweet media one
1
0
4
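And the analogous calculation for the observations (1, 1) (again a sketch in my notation):

\[
p(x \mid y_1 = 1,\, y_2 = 1)
\;\propto\; \tfrac{1}{2}\big(p_1 + p_2\big)\cdot\Big(\frac{p_1}{p_1 + p_2}\Big)^{2}
\;=\; \frac{p_1(x)^{2}}{2\,\big(p_1(x) + p_2(x)\big)},
\]

which is large where \(p_1\) is high and \(p_2\) is low.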
@tim_garipov
Timur Garipov
2 years
Methodologically, distribution alignment can be implemented via adversarial training with a domain discriminator. We show that the Jensen–Shannon discriminator preserves support differences in the input space as support differences in its 1D output space. 6/8
Tweet media one
1
0
3
@tim_garipov
Timur Garipov
5 years
@jbloom22 Thank you for the positive feedback, Jon! Glad you liked it!
0
0
3
@tim_garipov
Timur Garipov
11 months
We show that with such observations, we can create composite distributions by repeatedly applying guidance to the mixture prior. For example, by specifying observations (1, 2) we can emphasize the points that have high likelihood under both distributions at the same time. 8/N
Tweet media one
1
0
3
@tim_garipov
Timur Garipov
10 months
@habibh Thanks!
0
0
0
@tim_garipov
Timur Garipov
2 years
Based on this result, we propose a practical method called Adversarial Support Alignment (ASA). In experiments on domain adaptation under label distribution shift, we show that ASA is more robust against these shifts compared to other alignment-based baselines. 7/8
Tweet media one
1
0
3
@tim_garipov
Timur Garipov
21 days
@andrewgwils Congratulations, Andrew!
1
0
2
@tim_garipov
Timur Garipov
2 years
For more details, please see our paper where we also draw the connections between different alignment approaches from the optimal transport perspective. 8/8
Tweet media one
0
0
2
@tim_garipov
Timur Garipov
7 years
@ngutten @SashaVNovikov Right now we don't know the exact answer to this question, but several results suggest that these surfaces may be connected. You may want to take a look at the paper, which provides a theoretical analysis of a related subject under some assumptions.
1
0
2
@tim_garipov
Timur Garipov
5 years
@jm_alexia The main focus of this paper is the theoretical analysis of pairwise objectives. We find that there is a generator objective that doesn't allow the discriminator to break the aligned generator. We believe this is a promising approach for addressing the instability of training.
1
0
1
@tim_garipov
Timur Garipov
3 years
@elmelis @hseas Congrats, David!
0
0
1
@tim_garipov
Timur Garipov
5 years
@jm_alexia BTW, we are big fans of your work on GANs. We are very much inspired by RGAN (and other papers) and experiments on CAT images 🐈
0
0
0
@tim_garipov
Timur Garipov
2 years
We study the problem of aligning the supports of distributions. Compared to works on distribution alignment, we do not require the densities to be matched. 5/8
Tweet media one
1
0
1
@tim_garipov
Timur Garipov
7 years
@ngutten @SashaVNovikov Hi Nicholas! First, let me check that I understood your question right: are you asking about a situation in which we won't be able to find a path of almost constant loss, or about a way to find such a path between two non-optimal points in the NN parameter space?
1
0
1
@tim_garipov
Timur Garipov
7 years
@ngutten @SashaVNovikov Following the paper's notation, I wrote down the corresponding objectives. Based on the paper, I would say that it should be easy to find such paths, but of course this has to be checked properly.
Tweet media one
1
0
1
@tim_garipov
Timur Garipov
7 years
@ngutten @SashaVNovikov A funny thing is that the objectives we used to connect local optima can be seen as the MAE objective (2) with v = 0.
1
0
1
@tim_garipov
Timur Garipov
2 years
Recent works in the domain adaptation literature show that exact alignment of distributions is not always desirable and can in fact be detrimental. The best-known scenario demonstrating this issue is label distribution shift. 4/8
Tweet media one
1
0
1
@tim_garipov
Timur Garipov
2 years
In the domain adaptation context, alignment-based methods (e.g. DANN) learn a feature extractor such that the extracted representations are 1) predictive of the class label in the source domain; 2) domain invariant: the source and target feature distributions must be aligned. 3/8
Tweet media one
1
0
1
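For context, a sketch of the kind of minimax objective DANN-style methods use (my notation, not the paper's: f is the feature extractor, h the label classifier, d the domain discriminator, P_S and P_T the source and target distributions, λ a trade-off weight):

\[
\min_{f,\,h}\;\max_{d}\;\;
\mathbb{E}_{(x,y)\sim P_S}\big[\ell\big(h(f(x)),\, y\big)\big]
\;+\; \lambda\Big(
\mathbb{E}_{x\sim P_S}\big[\log d(f(x))\big]
+ \mathbb{E}_{x\sim P_T}\big[\log\big(1 - d(f(x))\big)\big]
\Big).
\]

The discriminator tries to tell source features from target features, while the feature extractor tries to confuse it; at the discriminator's optimum, the adversarial term relates to the Jensen–Shannon divergence between the source and target feature distributions.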
@tim_garipov
Timur Garipov
2 years
Machine learning tasks often involve the alignment of distributions. The goal of distribution alignment is to bring the distributions closer by tuning the parameters. Optimality is reached when the distributions are aligned, i.e., the densities are matched everywhere. 2/8
Tweet media one
1
0
1
@tim_garipov
Timur Garipov
7 years
@ngutten @SashaVNovikov Thinking about the latter question, I came up with two loss functions for paths which may allow us to find such paths. Suppose the two networks we want to connect have loss values equal to v. Then we can minimize the MSE/MAE between the loss of a point on the path and v.
1
0
1
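In LaTeX, the two objectives described in this reply can be sketched as follows (my notation: \(\phi_\theta(t)\) is a parametric path with endpoints at the two networks, \(\mathcal{L}\) is the training loss, and v is the common endpoint loss value):

\[
\ell_{\mathrm{MSE}}(\theta) \;=\; \mathbb{E}_{t\sim U[0,1]}\Big[\big(\mathcal{L}(\phi_\theta(t)) - v\big)^{2}\Big],
\qquad
\ell_{\mathrm{MAE}}(\theta) \;=\; \mathbb{E}_{t\sim U[0,1]}\Big[\big|\mathcal{L}(\phi_\theta(t)) - v\big|\Big].
\]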