Carl Vondrick

@cvondrick

6,122
Followers
584
Following
73
Media
740
Statuses

Associate Professor at @Columbia. Computer Vision and Machine Learning.

New York, NY
Joined October 2012
@cvondrick
Carl Vondrick
6 years
Our latest work shows that learning to colorize videos causes visual tracking to emerge automatically! Blog: Paper: @alirezafathi @kevskibombom @sguada @abhi2610
10
144
459
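The idea in this line of work is that a network trained to colorize a grayscale frame by *pointing* at colors in a reference frame learns correspondences for free: the attention map it uses to copy colors is itself a soft tracker. A minimal numpy sketch of that pointer mechanism (illustrative only; the array names, shapes, and temperature here are made up, not the paper's actual code):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def copy_colors(ref_feats, tgt_feats, ref_colors, temperature=1.0):
    """Predict target-frame colors by attending to reference-frame pixels.

    ref_feats:  (N, d) embeddings of reference-frame pixels
    tgt_feats:  (M, d) embeddings of target-frame pixels
    ref_colors: (N, c) colors of the reference pixels

    Returns the (M, c) predicted colors and the (M, N) attention map;
    the argmax of each attention row gives a correspondence, i.e. a track.
    """
    sim = tgt_feats @ ref_feats.T / temperature   # (M, N) similarities
    attn = softmax(sim, axis=1)                   # pointer distribution
    return attn @ ref_colors, attn
```

Training only supervises the copied colors, but at test time the attention map can be read out directly to propagate segmentation masks or keypoints.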
@cvondrick
Carl Vondrick
8 years
Fantastic conditional GAN results by Isola et al
Tweet media one
6
250
419
@cvondrick
Carl Vondrick
3 years
The future is hard to anticipate! In our latest #CVPR2021 paper, we introduce a framework for learning *what* is predictable in the future. Rather than committing up front to categories to predict, our approach learns how to hedge the bet.
9
44
283
@cvondrick
Carl Vondrick
8 years
Finding Tiny Faces -- had to zoom in quite a bit to parse how cool the results are!
Tweet media one
1
142
258
@cvondrick
Carl Vondrick
4 years
Learning unsupervised machine translation is easier if you open your eyes! Image distributions create transitive relations between languages. This creates incidental supervision for learning multilingual representations on 50 unpaired languages @Surisdi
4
43
242
@cvondrick
Carl Vondrick
6 years
Neural networks fooled by unusual poses
Tweet media one
7
83
226
@cvondrick
Carl Vondrick
8 years
Learning Features by Watching Objects Move by Pathak et al
Tweet media one
3
105
228
@cvondrick
Carl Vondrick
7 years
amazing generations of the video future! Red border means output, green is input.
1
96
215
@cvondrick
Carl Vondrick
8 years
SoundNet: Learning natural sound representations with convnets and 2 million unlabeled videos.
Tweet media one
2
72
175
@cvondrick
Carl Vondrick
8 years
Recognizing objects and scenes from sound only. Turn on your speakers! More visualizations:
1
95
168
@cvondrick
Carl Vondrick
4 years
What causes adversarial examples? Latest #ECCV2020 paper from @ChengzhiM and Amogh shows that deep networks are vulnerable partly because they are trained on too few tasks. Just by increasing tasks, we strengthen robustness for each task individually.
Tweet media one
4
37
150
@cvondrick
Carl Vondrick
7 years
Unsupervised Learning by Predicting Noise by Bojanowski and Joulin. Cool yet simple idea that works quite well!!
Tweet media one
3
60
149
@cvondrick
Carl Vondrick
4 years
Oops! Dave+Bo introduce a dataset of unconstrained videos showing unintentional action. We study self-supervised approaches for learning video representations of intentionality. #CVPR2020 Poster 93, Tue 10am PST Website: Paper:
2
24
119
@cvondrick
Carl Vondrick
3 years
Learning from Unlabeled Video ( #LUV ❤️‍🔥) starts today at 1:50pm EDT / 10:50am PDT! You will LUV the speaker lineup and the curated papers! 😍 featuring @pathak2206 @akanazawa @SongShuran and more #CVPR2021 #CVPR21 #CVPR
Tweet media one
2
15
111
@cvondrick
Carl Vondrick
7 years
predicting the future, with semantic segmentations! by Neverova and Luc
Tweet media one
1
43
99
@cvondrick
Carl Vondrick
8 years
Cross-Modal Scene Networks: learning aligned representations across several different modalities
Tweet media one
1
47
92
@cvondrick
Carl Vondrick
7 years
CVPR workshop on negative results!
0
64
85
@cvondrick
Carl Vondrick
3 years
Our predictive model is hyperbolic, which naturally encodes hierarchical structure. When the model is most confident, it will predict at a concrete level of the hierarchy. But when not confident, the *mean* solution automatically selects a higher level!
2
15
84
@cvondrick
Carl Vondrick
4 years
Learning from Unlabeled Video Workshop -- starting now! First up: Andrea Vedaldi (Oxford) on Learning Representations and Geometry from Unlabelled Videos.
Tweet media one
2
13
77
@cvondrick
Carl Vondrick
6 years
Got many replies. I don't believe the problem has to do with neural nets. The problem is the paradigm of supervised classification and closed datasets. We need models that learn from an open world, with self-supervision, never stop learning, and transfer between tasks.
5
18
73
@cvondrick
Carl Vondrick
7 years
See, hear, read: deep representations shared over 3 natural modalities. Units activate on objects in each modality.
Tweet media one
1
24
67
@cvondrick
Carl Vondrick
4 months
With just a few hours of experimentation in the physical world, a robot can learn on its own to design and throw paper airplanes further than a person, and even learn to build robot grippers out of cheap paper. No foundation models. No simulation. No language.
@ruoshi_liu
Ruoshi Liu
4 months
Humans can design tools to solve various real-world tasks, and so should embodied agents. We introduce PaperBot, a framework for learning to create and utilize paper-based tools directly in the real world.
7
29
171
0
2
60
@cvondrick
Carl Vondrick
4 years
Videos of the full workshop are now available on YouTube: . Thanks everyone, especially the speakers, for a great workshop!
0
9
64
@cvondrick
Carl Vondrick
4 years
Our new paper (w/ @Surisdi, Dave) shows Transformers can meta-learn a process for language acquisition from vision. At inference, the policy adapts to new words and generalizes better. #CVPR2020 Paper: Talk: Mon 11:40am PST
Tweet media one
0
16
62
@cvondrick
Carl Vondrick
3 years
I am so excited to be part of this dream team. We will be investigating the next generation of ML and predictive models for truly planetary scale problems. If you are passionate about cutting-edge ML coupled with societal impact, please apply to Columbia for various positions!
@Columbia
Columbia University
3 years
Hurricane Ida made one thing clear: we are not prepared for the extreme weather caused by #climatechange . A new climate modeling center is designed to improve climate projections and encourage societies to plan for the inevitable disruptions ahead. @NSF
Tweet media one
0
9
36
0
1
60
@cvondrick
Carl Vondrick
4 years
Sssshhh!! There is so much noise in cities today. Ruilin and Rundi introduce a new approach that removes ambient noise from audio, letting the speech come through loud and clear. Let's have a listen... 🔊Turn on your speakers! 🔗
3
9
59
@cvondrick
Carl Vondrick
7 years
great high-res image manipulation by interpolating in feature space -- simple, no GAN required (Upchurch et al)
Tweet media one
1
15
57
@cvondrick
Carl Vondrick
7 years
Network visualization, dissection, and interpretability by David Bau and Bolei Zhou at MIT! @zhoubolei
Tweet media one
0
25
57
@cvondrick
Carl Vondrick
5 years
Learn about Learning from Unlabeled Videos at #CVPR2019, Sunday in Room E, 9:00am. Fresh posters and keynotes: Antonio Torralba, Noah Snavely, Andrew Zisserman, Bill Freeman, Abhinav Gupta, Kristen Grauman
1
10
54
@cvondrick
Carl Vondrick
4 years
I had a joke about homography, but it was too plane.
1
0
53
@cvondrick
Carl Vondrick
5 years
Announcing the Workshop on Learning from Unlabeled Video at CVPR 2019. Come for dynamite speakers, and stay for the abstracts! Abstract deadline is March 4. Topics include self-supervised learning, sound and vision, visual anticipation, active vision, etc
0
10
47
@cvondrick
Carl Vondrick
5 years
Self-supervised learning is prediction, and unsupervised learning is compression (in my view)
@RandomlyWalking
Charles Sutton
5 years
@jmhessel “Self-supervised” is a rebranding for “unsupervised” to avoid confusing people who ask Qs like “how can LMs be unsupervised if you give them the next token to predict”? I dislike rebranding, but I dislike even more arguing about whether LMs are unsupervised. So,🤷‍♂️?
11
12
113
5
1
47
@cvondrick
Carl Vondrick
6 years
The "deep inversion" quiz by Oxford: how well do you understand neural network visualizations?
Tweet media one
2
21
47
@cvondrick
Carl Vondrick
3 years
Don’t want pi day to end? Come to the hyperbolic world! In hyperbolic space, pi has no upper bound. You can eat pie for the rest of the year.
1
4
45
@cvondrick
Carl Vondrick
8 years
learning to recognize objects with only a few examples -- exciting 'low data' paradigm
0
13
44
@cvondrick
Carl Vondrick
7 years
Important warning about non-peer-reviewed papers: the public can lose trust in science and research if too much low-quality work is posted.
@CSProfKGD
Kosta Derpanis
7 years
To preprint or not. This debate sounds strangely familiar #ComputerVision
1
4
8
2
11
43
@cvondrick
Carl Vondrick
3 years
Turn any container into a smart container — all you need is noise!
@Boyuan__Chen
Boyuan Chen
3 years
How can we tell "what is where" inside a container, after dropping something into it? Can we generate visual scenes from sound? Excited to share our latest work: The Boombox: Visual Reconstruction from Acoustic Vibrations. ()
1
2
31
2
2
43
@cvondrick
Carl Vondrick
3 years
@jbhuang0604 Conclusion is where you accidentally tell reviewers how to reject your paper!
1
1
42
@cvondrick
Carl Vondrick
7 years
"Do Good Research" by Fredo Durand
1
22
40
@cvondrick
Carl Vondrick
6 years
CVPR should be "Computer Vision, Prediction, and Robotics"
0
1
36
@cvondrick
Carl Vondrick
8 years
cool idea to interactively reconfigure pretrained CNNs in order to recognize unseen classes, by Krishnan and Ramanan
Tweet media one
0
17
35
@cvondrick
Carl Vondrick
5 years
The computer vision group at Columbia is looking for a postdoctoral fellow. Come wrangle pixels with us in the big city. More info:
Tweet media one
0
15
31
@cvondrick
Carl Vondrick
6 years
Excellent piece, but I disagree we should give up our datasets. To get commonsense and generalization, we should create rich & diverse multi-modal datasets that span a huge number of tasks. We probably need new data collection means, e.g. interaction and self-supervision (not MTurk)
@MelMitchell1
Melanie Mitchell
6 years
My opinion piece in the NY Times.
25
202
414
0
5
26
@cvondrick
Carl Vondrick
8 years
released 35 million video clips! stabilized, natural video. 1 year! fun dataset for generative video models
0
15
24
@cvondrick
Carl Vondrick
3 years
@rzhang88 I should see a doctor ASAP! AI is going to save my life!
Tweet media one
1
0
24
@cvondrick
Carl Vondrick
8 years
Learning camouflaged QR codes
@DmitryUlyanovML
Dmitry Ulyanov
8 years
A nice paper from our lab on learning visual codes, to appear at NIPS
1
18
48
0
7
23
@cvondrick
Carl Vondrick
7 years
Learning visual and auditory representations simultaneously from video!
@relja_work
Relja Arandjelović
7 years
My first paper at DeepMind: What can be learnt by looking at and listening to a large amount of unlabelled videos?
Tweet media one
4
105
266
0
2
21
@cvondrick
Carl Vondrick
7 years
2
2
20
@cvondrick
Carl Vondrick
8 years
Learning to find moving objects irrespective of camera motion, by Tokmakov et al.
Tweet media one
1
9
20
@cvondrick
Carl Vondrick
3 years
Most predictive models operate in Euclidean space. However, when there is uncertainty or multiple modes, the optimal solution is to regress the mean, which often lacks any interpretation. Our idea: Let’s make the mean mean something!
1
2
18
@cvondrick
Carl Vondrick
5 years
Map of emotional responses versus audible gasps:
0
3
19
@cvondrick
Carl Vondrick
7 years
0
4
19
@cvondrick
Carl Vondrick
3 years
Predictive models on physical robots learn rich features about their surroundings -- they learn about obstacles and even the policy of other robots. Latest paper with @BoyuanChen1 and @hodlipson , out today!
@CUSEAS
Columbia Engineering
3 years
Can a robot be empathetic? @MechCU Prof @hodlipson thinks so: his lab has created a robot that learns to visually predict how its partner robot will behave. @Columbia
Tweet media one
1
8
16
0
3
19
@cvondrick
Carl Vondrick
3 years
Hyperbolic geometry for machine learning and computer vision is a young and rapidly growing area. We are not the first to work with this geometry, and we will not be the last! Code, models, data, visuals, and links to tutorials are on our project website:
1
0
17
@cvondrick
Carl Vondrick
7 years
Photos with the smart phone removed
Tweet media one
0
5
15
@cvondrick
Carl Vondrick
8 years
what do we visualize when we visualize ConvNets? important question:
0
7
16
@cvondrick
Carl Vondrick
7 years
"Person analysis using cheap and large-scale synthetic data" in Learning from Synthetic Humans by Varol et al
Tweet media one
0
5
16
@cvondrick
Carl Vondrick
8 years
preprint for "Generating Videos with Scene Dynamics": adversarial nets for video generation & learning & prediction
1
7
14
@cvondrick
Carl Vondrick
7 years
Low-power vision mode that produces lower-quality image data suitable only for computer vision -- by Buckler et al
Tweet media one
0
2
15
@cvondrick
Carl Vondrick
6 years
@haldaume3 you can silently and randomly add questions that have a single, well-defined answer that you also know. then, discard all workers that fail those "quiz" questions
1
0
13
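The quality-control trick described in the reply above is easy to implement: secretly mix gold questions with known answers into the task, then drop any worker who misses one. A minimal sketch (the data layout and function name are illustrative, not from any specific crowdsourcing library):

```python
def filter_workers(answers, gold):
    """Keep only workers who answer every hidden gold question correctly.

    answers: {worker_id: {question_id: answer}} -- responses per worker
    gold:    {question_id: correct_answer}      -- quiz questions secretly
             mixed into the normal task stream
    """
    def passes(resp):
        # A worker fails if any gold question is wrong or unanswered.
        return all(resp.get(q) == a for q, a in gold.items())
    return {w: resp for w, resp in answers.items() if passes(resp)}
```

In practice you would also randomize which gold questions each worker sees, so answers can't be shared between workers.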
@cvondrick
Carl Vondrick
8 years
NIPS 2014: 3 reviews, 6k char rebut. 2015: 4 reviews, 5k char rebut. 2016: 6 reviews, 3k char rebut. 2017: 9 reviews, tweet rebuttal ?!
0
4
13
@cvondrick
Carl Vondrick
4 years
Last keynote: Alyosha Efros and Allan Jabri on learning space-time correspondences, starting in 5 min
Tweet media one
1
1
12
@cvondrick
Carl Vondrick
7 years
For example, a hidden unit automatically emerges for dogs. It activates on images of dogs, sentences about dogs, or sounds of barking
0
0
12
@cvondrick
Carl Vondrick
8 years
@cvondrick learns nice convolutional filters for raw waveforms, without ground truth labels
Tweet media one
3
2
12
@cvondrick
Carl Vondrick
6 years
Dear twitter, how do you take notes and jot down ideas for research? Do you use an app, pen/paper, memory?
11
2
11
@cvondrick
Carl Vondrick
4 years
The main idea: Natural audio will contain intervals of silence, which we can leverage as incidental supervision for learning to denoise. By learning to first detect these pauses, we can estimate a profile for the noise, and suppress it throughout the audio.
1
0
11
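The pipeline in the tweet above (detect pauses, estimate a noise profile from them, suppress it everywhere) is closely related to classic spectral subtraction, which can be sketched in a few lines of numpy. This is a simplified stand-in for the learned approach, with an assumed spectrogram layout and a made-up `floor` parameter:

```python
import numpy as np

def denoise(spec, silence_mask, floor=0.05):
    """Suppress stationary noise using detected silent frames.

    spec:         (T, F) magnitude spectrogram
    silence_mask: (T,) boolean, True where a detector found no speech

    The noise profile is the mean spectrum over the silent frames; it is
    subtracted from every frame, with a small floor so magnitudes never
    go negative (classic spectral subtraction).
    """
    noise_profile = spec[silence_mask].mean(axis=0)   # (F,) estimated noise
    cleaned = spec - noise_profile
    return np.maximum(cleaned, floor * spec)
```

The learned model replaces both stages (a network detects the pauses and performs the suppression), but the incidental-supervision signal is the same: silence reveals the noise.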
@cvondrick
Carl Vondrick
5 years
Saturday at #ICML2019 !
@avdnoord
Aäron van den Oord
5 years
Excited to announce our #ICML2019 Workshop on Self-Supervised Learning! Covering- Vision, NLP, Audio, Robotics, RL ... Submissions now open - deadline April 25! Speakers: @ylecun , @chelseabfinn , Andrew Zisserman, Alexei Efros, Jacob Devlin, Abhinav Gupta
Tweet media one
2
56
244
0
0
11
@cvondrick
Carl Vondrick
5 years
Andrew Zisserman on leveraging temporal coherence and sound to learn from video!
Tweet media one
1
2
9
@cvondrick
Carl Vondrick
3 years
Here’s an example. As the model observes more of the video, the future becomes more and more predictable. Our model makes increasingly specific forecasts of the future.
1
0
11
@cvondrick
Carl Vondrick
3 years
Just by changing predictive models to work in hyperbolic space instead of Euclidean space, the model automatically learns to select the right level of abstraction under uncertainty!
2
0
10
@cvondrick
Carl Vondrick
6 years
Learning to see in the dark: super cool results!
0
2
10
@cvondrick
Carl Vondrick
7 years
Fine-grained sound recognition, plus a fun dataset collection!
@CSProfKGD
Kosta Derpanis
7 years
Why an #ArtificialIntelligence firm is busy smashing thousands of windows
0
0
4
0
2
10
@cvondrick
Carl Vondrick
4 years
@dimadamen There’s also multiple modes, eg multimodal prediction
2
0
10
@cvondrick
Carl Vondrick
8 years
Great results from the Scene Parsing Challenge with Places Database
Tweet media one
0
1
9
@cvondrick
Carl Vondrick
2 years
Congratulations Dídac!! @Surisdi
@ColumbiaCompSci
ColumbiaCompSci
2 years
Dídac Suris (@Surisdi), one of our PhD students, won a Microsoft Research Fellowship (@MSFTResearch)! Learn more about him and his PhD experience here -
Tweet media one
2
4
51
0
0
9
@cvondrick
Carl Vondrick
5 years
Cool paper from Berkeley: learn 3D flow from unlabeled stereo videos
Tweet media one
0
1
9
@cvondrick
Carl Vondrick
21 days
Generative video models facing the physical world 👇 #CVPR2024
@ruoshi_liu
Ruoshi Liu
22 days
Recently released video generation models are amazing😍 How can we use them in robotics to learn generalizable visuomotor policies? Come find out in my talks at these 4 CVPR workshops next week, where I will talk about recent works in 3D, generative models, and robotics!
1
6
53
0
0
9
@cvondrick
Carl Vondrick
6 years
@farhanhubble @quantombone Videos are coming soon!!
1
0
8
@cvondrick
Carl Vondrick
4 years
While each language represents a bicycle with a different word, the underlying visual representation remains consistent. A bicycle has a similar appearance in the UK, France, Japan, and India. We leverage this natural property for translating unpaired languages.
Tweet media one
1
1
8
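The transitive idea in this thread — vision as the pivot between unpaired languages — can be illustrated with a toy scoring function: represent each word by embeddings of the images it co-occurs with, and score a candidate translation pair by how well the two image sets align. This is only a hand-rolled stand-in for the learned alignment, with invented names and a simple max-cosine score:

```python
import numpy as np

def translation_score(src_img_feats, tgt_img_feats):
    """Score a candidate word pair via their associated images.

    src_img_feats: (N, d) embeddings of images co-occurring with the
                   source-language word
    tgt_img_feats: (M, d) embeddings of images for the target word

    Returns the best cosine similarity over cross-language image pairs;
    a high score suggests both words name the same visual concept.
    """
    a = src_img_feats / np.linalg.norm(src_img_feats, axis=1, keepdims=True)
    b = tgt_img_feats / np.linalg.norm(tgt_img_feats, axis=1, keepdims=True)
    return float((a @ b.T).max())
```

Because the score never touches text in either language, it needs no paired corpora — exactly the incidental supervision the tweet describes.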
@cvondrick
Carl Vondrick
5 years
Antonio Torralba on multi-modal learning and self-supervised learning
Tweet media one
1
0
7
@cvondrick
Carl Vondrick
8 years
@cvondrick basic idea is: visual recognition networks teach networks for sound, enabling learning from tons of unlabeled video
Tweet media one
1
2
7
@cvondrick
Carl Vondrick
6 years
1
2
7
@cvondrick
Carl Vondrick
4 years
@dimadamen @fdellaert @Oxford_VGG Video should be on YouTube next week. Thanks everyone for attending and great questions, and especially Yale Song for leading the behind the scenes!
0
0
7
@cvondrick
Carl Vondrick
4 years
It learns how to translate individual words across 50 languages... even without paired language supervision
Tweet media one
1
0
6
@cvondrick
Carl Vondrick
5 years
@LakeBrenden @washingtonpost To be fair, I’m a human who also needs instructions to solve a Rubik's cube!
0
0
6
@cvondrick
Carl Vondrick
4 years
The approach finds very interesting transitive paths between languages via vision, which we show below. When there is a strong path, the final score is high (top row), and it's low when the path is not aligned well (bottom row)
Tweet media one
1
0
6
@cvondrick
Carl Vondrick
3 years
Since hyperbolic space is continuous, the hierarchy is actually continuous as well! This lets us work with hierarchies of any depth. Here’s 3 levels deep.
1
1
6
@cvondrick
Carl Vondrick
4 years
We show pairwise performance between source and target languages. As you might expect, languages within the same family are easier to translate between. But our approach is language agnostic, and makes no assumptions on grammar or vocab. The full dataset is available online!
Tweet media one
1
0
5
@cvondrick
Carl Vondrick
5 years
Abhinav Gupta on self-supervision from video and robotics! Room is too packed to get in
Tweet media one
1
0
5
@cvondrick
Carl Vondrick
5 years
...but we should have large *evaluation* sets to establish diversity and significance
@random_forests
Josh Gordon
5 years
You don't always need large datasets to do "real" research in DL (a common misconception). Take a look at CycleGan for a counter example. A beautiful paper, with a relatively small amount of data:
0
34
137
0
1
5
@cvondrick
Carl Vondrick
3 years
The predictions are initially near the origin of the space, which corresponds to predicting the “root” node of the hierarchy. But over time, the prediction moves closer to the boundary of the space, corresponding to more specific forecasts.
1
0
5
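The geometric reading in the tweet above — origin of the Poincaré ball as the "root", boundary as specific leaves — can be made concrete with a couple of helper functions. A toy sketch under assumed conventions (the `max_norm` clipping and the linear mapping from radius to level are illustrative choices, not the paper's):

```python
import numpy as np

def hyperbolic_radius(z):
    """Distance from the origin in the Poincaré ball: d(0, z) = 2 artanh(|z|).

    The distance blows up as |z| -> 1, which is what gives the boundary
    'room' for ever more specific predictions.
    """
    return 2.0 * np.arctanh(np.linalg.norm(z))

def specificity_level(z, n_levels=3, max_norm=0.95):
    """Map a prediction's position in the ball to a hierarchy level.

    Near the origin -> level 0 (root); near the boundary -> deepest level.
    """
    r = min(np.linalg.norm(z), max_norm)        # clip away the boundary
    return min(int(r / max_norm * n_levels), n_levels - 1)
```

Under this reading, an uncertain model that averages toward the origin is automatically predicting a coarser node, which is the "mean that means something" from earlier in the thread.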
@cvondrick
Carl Vondrick
4 years
We can also translate sentences, not just individual words! Of course, it works best on concrete visual concepts
Tweet media one
1
0
5
@cvondrick
Carl Vondrick
4 years
Starting now: Ivan Laptev from INRIA
Tweet media one
1
2
5
@cvondrick
Carl Vondrick
3 years
@cian_neuro Power efficiency.
0
0
5
@cvondrick
Carl Vondrick
4 years
@CSProfKGD @alfcnz I ended up using ScreenFlow, and I found it fantastic. It jointly records your screen, audio, and webcam. There is a simple UI to create different scenes.
1
0
5