The future is hard to anticipate! In our latest
#CVPR2021
paper, we introduce a framework for learning *what* is predictable in the future.
Rather than committing up front to categories to predict, our approach learns how to hedge its bets.
Learning unsupervised machine translation is easier if you open your eyes!
Image distributions create transitive relations between languages, providing incidental supervision for learning multilingual representations across 50 unpaired languages.
@Surisdi
What causes adversarial examples? Latest
#ECCV2020
paper from
@ChengzhiM
and Amogh shows that deep networks are vulnerable partly because they are trained on too few tasks. Just by increasing the number of tasks, we strengthen robustness on each task individually.
Oops! Dave+Bo introduce a dataset of unconstrained videos showing unintentional action. We study self-supervised approaches for learning video representations of intentionality.
#CVPR2020
Poster 93, Tue 10am PST
Website:
Paper:
Our predictive model is hyperbolic, which naturally encodes hierarchical structure.
When the model is most confident, it will predict at a concrete level of the hierarchy. But when not confident, the *mean* solution automatically selects a higher level!
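A minimal numpy sketch of this effect in the Poincaré ball (the coordinates and hierarchy labels here are hypothetical, and a plain Euclidean average stands in for the Fréchet mean): averaging two confident but conflicting predictions near the boundary yields a point closer to the origin, i.e., a more abstract node.

```python
import numpy as np

# Two hypothetical predictions in the Poincare ball (norm < 1).
# Both are confident (near the boundary) but point at different leaves,
# e.g. "dog" vs "cat".
p1 = np.array([0.90, 0.10])
p2 = np.array([0.10, 0.90])

# Euclidean average as a simple stand-in for the Frechet mean.
mean = (p1 + p2) / 2

# Distance from the origin acts as a proxy for depth in the hierarchy:
# near the boundary = specific leaf, near the origin = abstract ancestor.
print(np.linalg.norm(p1))    # ~0.906, deep / specific
print(np.linalg.norm(mean))  # ~0.707, shallower / more abstract ("animal")
```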
Learning from Unlabeled Video Workshop -- starting now!
First up: Andrea Vedaldi (Oxford) on Learning Representations and Geometry from Unlabelled Videos.
Got many replies. I don't believe the problem has to do with neural nets. The problem is the paradigm of supervised classification and closed datasets. We need models that learn from an open world with self-supervision, never stop learning, and transfer between tasks.
With just a few hours of experimentation in the physical world, a robot can learn on its own to design and throw paper airplanes farther than a person, and even learn to build robot grippers out of cheap paper.
No foundation models. No simulation. No language.
Humans can design tools to solve various real-world tasks, and so should embodied agents. We introduce PaperBot, a framework for learning to create and utilize paper-based tools directly in the real world.
Our new paper (w/
@Surisdi
,Dave) shows Transformers can meta-learn a process for language acquisition from vision. At inference, the policy adapts to new words and generalizes better.
#CVPR2020
Paper:
Talk: Mon 11:40am PST
I am so excited to be part of this dream team. We will be investigating the next generation of ML and predictive models for truly planetary scale problems. If you are passionate about cutting-edge ML coupled with societal impact, please apply to Columbia for various positions!
Hurricane Ida made one thing clear: we are not prepared for the extreme weather caused by
#climatechange
. A new climate modeling center is designed to improve climate projections and encourage societies to plan for the inevitable disruptions ahead.
@NSF
Sssshhh!! There is so much noise in cities today. Ruilin and Rundi introduce a new approach that removes ambient noise from audio, letting the speech come through loud and clear. Let's have a listen... 🔊Turn on your speakers!
🔗
Learn about Learning from Unlabeled Videos at
#CVPR2019
, Sunday in Room E, 9:00am
Fresh posters and keynotes: Antonio Torralba, Noah Snavely, Andrew Zisserman, Bill Freeman, Abhinav Gupta, Kristen Grauman
Announcing the Workshop on Learning from Unlabeled Video at CVPR 2019. Come for dynamite speakers, and stay for the abstracts! Abstract deadline is March 4. Topics include self-supervised learning, sound and vision, visual anticipation, active vision, and more.
@jmhessel
“Self-supervised” is a rebranding for “unsupervised” to avoid confusing people who ask Qs like “how can LMs be unsupervised if you give them the next token to predict”? I dislike rebranding, but I dislike even more arguing about whether LMs are unsupervised. So,🤷♂️?
How can we tell "what is where" inside a container, after dropping something into it? Can we generate visual scenes from sound?
Excited to share our latest work: The Boombox: Visual Reconstruction from Acoustic Vibrations. ()
Excellent piece, but I disagree we should give up our datasets. To get commonsense and generalization, we should create rich & diverse multi-modal datasets that span a huge number of tasks. We probably need new data collection means, e.g. interaction and self-supervision (not MTurk).
Congratulations to
@Surisdi
and Ruoshi Liu on their
#CVPR2021
paper!! Check out the video below for an hour-long talk with all the details and results!
Most predictive models operate in Euclidean space. However, when there is uncertainty or multiple modes, the optimal solution is to regress the mean, which often lacks any interpretation.
Our idea: Let’s make the mean mean something!
Predictive models on physical robots learn rich features about their surroundings -- they learn about obstacles and even the policy of other robots. Latest paper with
@BoyuanChen1
and
@hodlipson
, out today!
Can a robot be empathetic?
@MechCU
Prof
@hodlipson
thinks so: his lab has created a robot that learns to visually predict how its partner robot will behave.
@Columbia
Hyperbolic geometry for machine learning and computer vision is a young and rapidly growing area. We are not the first to work with this geometry, and we will not be the last!
Code, models, data, visuals, and links to tutorials are on our project website:
@haldaume3
You can silently and randomly add questions that have a single, well-defined answer that you also know. Then discard all workers who fail those "quiz" questions.
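A minimal sketch of that gold-question filter (the question IDs, answers, and field names below are all made up for illustration):

```python
# Hypothetical "gold" quiz questions that were silently mixed into the
# real annotation tasks, each with one known, well-defined answer.
GOLD_ANSWERS = {"q17": "cat", "q42": "bicycle"}

def reliable_workers(responses, min_accuracy=1.0):
    """Keep workers whose accuracy on the gold questions is high enough.

    responses: {worker_id: {question_id: answer}}
    """
    keep = []
    for worker, answers in responses.items():
        gold = [answers.get(q) == a for q, a in GOLD_ANSWERS.items()]
        if gold and sum(gold) / len(gold) >= min_accuracy:
            keep.append(worker)
    return keep
```

For example, a worker who answers "dog" on q17 would be discarded along with all of their other annotations.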
The main idea: Natural audio will contain intervals of silence, which we can leverage as incidental supervision for learning to denoise. By learning to first detect these pauses, we can estimate a profile for the noise, and suppress it throughout the audio.
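A hedged sketch of this idea using classical spectral subtraction. In the paper the silent interval is predicted by a learned model; here it is simply handed to the function, and the function name and signature are mine, not theirs:

```python
import numpy as np

def denoise(audio, sr, silent_interval, n_fft=512):
    """Suppress stationary noise using a known silent interval.

    silent_interval: (start_sec, end_sec) where only noise is present.
    """
    s, e = (int(t * sr) for t in silent_interval)
    # Estimate the noise magnitude profile from the silent segment.
    frames = np.lib.stride_tricks.sliding_window_view(audio[s:e], n_fft)
    noise_mag = np.abs(np.fft.rfft(frames[::n_fft // 2], axis=-1)).mean(axis=0)

    # Subtract the noise profile throughout the clip, frame by frame.
    out = np.zeros_like(audio)
    for i in range(0, len(audio) - n_fft, n_fft):
        spec = np.fft.rfft(audio[i:i + n_fft])
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        out[i:i + n_fft] = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n_fft)
    return out
```

Running this on a clip whose first 0.4 s contain only noise measurably reduces the residual energy in the noise-only region while keeping the speech band.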
Here’s an example. As the model observes more of the video, the future becomes more and more predictable.
Our model makes increasingly specific forecasts of the future.
Just by changing predictive models to work in hyperbolic space instead of Euclidean space, the model automatically learns to select the right level of abstraction under uncertainty!
Didac Suris (
@Surisdi
), one of our PhD students, won a Microsoft Research Fellowship (
@MSFTResearch
)! Learn more about him and his PhD experience here -
Recently released video generation models are amazing😍
How can we use them in robotics to learn generalizable visuomotor policies?
Come find out in my talks at these 4 CVPR workshops next week, where I will talk about recent works in 3D, generative models, and robotics!
While each language represents a bicycle with a different word, the underlying visual representation remains consistent. A bicycle has a similar appearance in the UK, France, Japan, and India. We leverage this natural property for translating unpaired languages.
@dimadamen
@fdellaert
@Oxford_VGG
Video should be on YouTube next week. Thanks, everyone, for attending and for the great questions, and a special thanks to Yale Song for leading things behind the scenes!
The approach finds very interesting transitive paths between languages via vision, which we show below. When there is a strong path, the final score is high (top row); when the path is not well aligned, the score is low (bottom row).
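A toy sketch of the path-scoring intuition, assuming we already have embeddings for words and images in a shared space (all vectors below are made up): the path score composes similarities along source word → bridging image → target word.

```python
import numpy as np

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def path_score(src_word, tgt_word, images):
    """Score a transitive translation path through the best bridging image."""
    return max(cos(src_word, img) * cos(img, tgt_word) for img in images)

# Hypothetical embeddings: "velo" (FR) and "bicycle" (EN) both align with
# a bike image, so the transitive path through vision scores highly.
bike_img   = np.array([1.00, 0.10])
velo_fr    = np.array([0.90, 0.20])
bicycle_en = np.array([0.95, 0.15])
cheese_en  = np.array([0.10, 1.00])

print(path_score(velo_fr, bicycle_en, [bike_img]))  # high: path well aligned
print(path_score(velo_fr, cheese_en, [bike_img]))   # low: path breaks down
```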
Since hyperbolic space is continuous, the hierarchy is actually continuous as well! This lets us work with hierarchies of any depth. Here's an example three levels deep.
We show pairwise performance between source and target languages. As you might expect, languages within the same family are easier to translate between. But our approach is language agnostic and makes no assumptions about grammar or vocabulary.
The full dataset is available online!
You don't always need large datasets to do "real" research in DL (a common misconception). Take a look at CycleGAN for a counterexample. A beautiful paper, with a relatively small amount of data:
The predictions are initially near the origin of the space, which corresponds to predicting the “root” node of the hierarchy. But over time, the prediction moves closer to the boundary of the space, corresponding to more specific forecasts.
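One way to picture the radius-to-abstraction mapping is a tiny decoder that buckets a prediction's distance from the origin into a discrete hierarchy level (the bucketing rule and level names here are hypothetical, just to make the geometry concrete):

```python
import numpy as np

def decode_level(prediction, num_levels=3):
    """Map a point in the Poincare ball to a discrete hierarchy level.

    Radius near 0 decodes to the root; radius near 1 decodes to a leaf.
    """
    r = np.linalg.norm(prediction)  # 0 <= r < 1 inside the ball
    return min(int(r * num_levels), num_levels - 1)

print(decode_level(np.array([0.05, 0.02])))  # 0 -> root ("entity")
print(decode_level(np.array([0.40, 0.30])))  # 1 -> mid-level ("animal")
print(decode_level(np.array([0.70, 0.60])))  # 2 -> leaf ("dog")
```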
@CSProfKGD
@alfcnz
I ended up using Screenflow, and I found it fantastic. It jointly records your screen, audio, and webcam. There is a simple UI to create different scenes.