We are releasing 4M-21 with a permissive license, including its source code and trained models. It's an effective multimodal model that solves tens of tasks & modalities. See the demo code, sample results, and the tokenizers of diverse modalities on the website.
We are releasing the 1st version of 4M, a framework for training multimodal foundation models across tens of modalities & tasks, based on scalable masked modeling. Joint effort by @EPFL_en & @Apple.
4M: Massively Multimodal Masked Modeling
🌐
🧵1/n
Salaries of PhD students, postdocs, and professors in Europe and Switzerland. Numbers are in Euros. Of special interest to future students and faculty 😜
Is it a good idea to train RL policies from raw pixels? Could visual priors about the world help RL? We just released the code of our Mid-Level Vision paper addressing these questions. Spoiler: using raw pixels doesn’t generalize! Play with the results at
We present MultiMAE at #ECCV2022 on Wed. MultiMAE is a general multi-modal & multi-task pre-training strategy based on masked autoencoders. It shows notable results in cross-modal representation learning and transfer learning.
1/5
Progress on #consistency & #multitask learning. Existing methods give inconsistent results across tasks, even when jointly trained. We developed a general method for learning with Cross-Task Consistency. It gave notable gains on everything we tried. Live #demo:
Happy to share that CLIPasso will receive one of the best paper awards at #SIGGRAPH 2022. Congrats to the entire team! CLIP turned out to be a powerful perceptual loss.
We released OMNIDATA: a pipeline for creating steerable vision datasets. It gives the user control over generating the desired dataset using real-world 3D scans. It bridges vision datasets (pre-recorded data) and simulators (online data generation). Demo:
Vision datasets (e.g. ImageNet) are usually collected once for a fixed task. But how do we know the choice of camera intrinsics, tasks, etc. is a good one?
Our ICCV paper on “steerable datasets” addresses this problem and gets 'human-level' surface normal predictions along the way (1/3)
Is it possible to adapt a neural network on the fly at test time to cope with distribution shifts? RNA does precisely that by creating a closed-loop feedback system. We will present it on Wed afternoon at @ICCVConference.
1/n
Next time someone tells you reaching "human-level" at task X is the holy grail in AI, show them this video. All it takes is making the task narrow enough and there is a way to brutally outperform humans already. Being as *broad* as humans/animals is the challenge.
I will hire again from the Summer@EPFL program this year. Several great projects came out of S@E interns in the past, eg CLIPasso (SIGGRAPH22 best paper), Omnidata (ICCV21). Apply if our interests align.
(this is for BS/MS interns. PhD visitors have another program)
The Summer@EPFL 2023 application site is now open! 🎊 To apply, please visit the Summer@EPFL website: . The application deadline for all students is the Sunday closest to the 1st of December (anywhere on earth).
OpenBot is a step in the right direction. Massively scalable robotic platforms are great. I dream of an army of little (harmless!) robots running around visually exploring and making sense of the world.
We'll present at NeurIPS, today at 5pm CST. Spotlight #1022.
Effectively bringing sensory modalities to large models is one way to make them more grounded, and ultimately have a more complete World Model. This is a step in that direction hopefully, and more will come.
4M exhibits a solidly learned cross-modal representation. We can use the various modalities to probe how 4M reconciles unusual inputs by manipulating one part of the input while keeping the remainder fixed.
(8/n)
Gibson Database of Spaces includes 572 buildings, 1447 floors, and >2 million ft². All real buildings, scanned and #3D reconstructed. Worth a few years of human visual experience. Browse the spaces via videos & 3D cuts:
#perception #robotics #dataset #vision
The point isn’t making big $$$ as a student/postdoc, but living comfortably enough to focus on research rather than financial preoccupation. Especially if supporting a family. I think the general picture of the table remains true after considering living expenses and variance.
Exactly 3 years ago we proposed to #CVPR with @ozansener. Today glad to see the @nature article on the importance of negative results. “one of the worst aspects of science today: its toxic definitions of success”.
Visual odometry is a basic function for embodied AI. At #CVPR23 we will present a multi-modal & modality-invariant visual odometry framework called Visual Odometry Transformer (VOT). Also, I'll give a talk on multi-modal learning across several projects at the Multiearth w/ on Mon.
🧵
Tomorrow at @CVPR, I'll give a talk about recent works on multi-modal and multi-task masked modeling for creating vision foundation models.
1:45 PM @ West 109 - 110
This gem never gets old. Great for a break from arxiv. It’s remarkable how much jargon education, and how little critical-thinking training, we receive in AI today. Watch the first minute and you’ll be sold. Science wisdom by @ProfFeynman.
We will present Task Discovery at #NeurIPS on Thur. Large NNs are known to fit any *training* labels. But learning from what labels would lead to *generalization*? Can we find such labels/tasks for an unlabeled dataset automatically? What would they mean?
What are the tasks that a neural net generalizes on? In our #NeurIPS2022 paper, we introduce a Task Discovery 🔎 framework to approach this question and automatically find such tasks. We show how such tasks look and what they reveal about NNs.
🌐
🧵1/9
Classical sampling-based planning algorithms in robotics (e.g. RRT, PRM) are efficient, performant & interpretable. Are they useful in learning-based frameworks?
PALMER (#NeurIPS22, #CoRL22 w) shows they can be effectively repurposed for learning-based frameworks & representations.
🧵
Gibson environment's ~600 building meshes rendered directly in the PyBullet physics engine! FPS >5000! Great work by @erwincoumans. Check here if you want to visit inside these buildings: . Erwin's PyBullet rendering:
There have been demos of “multimodal foundation model” results – but one with a demonstrable deep & broad understanding of the input like 4M’s is unprecedented. It’s not an image+text conversational model, but one that extracts a deeper understanding of the scene.
(2/n)
If you want to see more than #turtlebot and two-finger gripper arms, Jamie Paik @robotician gave a keynote talk with fun videos at #CoRL 2019 on soft robotics and intuitive interactions.
Tiny Images dataset (>1700 citations) was permanently taken down, due to the (unintended) inclusion of inappropriate language and images, found by Prabhu & Birhane. Clearly, everything we do (and did) in computer vision is now under much greater scrutiny!
We introduce a general approach for enforcing diversity in ensembles. It leads to notable improvements in #robustness on a wide range of tasks and datasets for #adversarial and non-adversarial shifts.
Joint work with @oguzhanthefatih and @zamir_ar
Website:
Via this objective, MultiMAE learns cross-modal predictive coding. The video showcases an example where we input only depth & two RGB patches. The hue of one patch is being changed. The model propagates the colors semantically and according to depth. More examples on the webpage. 3/5
New York Times @nytimes article on home robotics, failures of the past, and (not-so-low-hanging) potentials for the future. Covered our Gibson environment too. "What Comes After the Roomba?"
No interaction with the world yet, but clearly some nontrivial muscle control and behavior is present. Always interesting to contemplate how much cognitive and control bias we are born with, before any learning occurs.
Crowning a successful Nature of Robotics exhibition, EPFL Pavilions would like to invite you to a guided virtual tour with the exhibition's curator, Giulia Bini.
Join us today at 6 PM on Instagram: #virtualtour #natureofrobotics #epflpavilions
4M trains a single Transformer jointly on many diverse modalities. The key to making it scalable was relying on tokenization to remove modality-specific intricacies, then masking tokens from both the inputs and targets to encourage multimodal fusion & improve efficiency.
(3/n)
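The input/target masking above can be sketched as follows. This is a toy illustration of the idea, not the released 4M code; the function name, modality names, and budget parameters are all illustrative:

```python
import random

def sample_multimodal_mask(tokens_per_modality, input_budget, target_budget, seed=0):
    """Pool tokens from all modalities, then sample a small visible input set
    and a disjoint masked target set for the model to predict."""
    rng = random.Random(seed)
    pool = [(m, i) for m, n in tokens_per_modality.items() for i in range(n)]
    rng.shuffle(pool)
    inputs = pool[:input_budget]                            # visible to the model
    targets = pool[input_budget:input_budget + target_budget]  # to be predicted
    return inputs, targets

# Example: 3 modalities, keep 4 tokens as input, predict 6 as targets.
inputs, targets = sample_multimodal_mask({"rgb": 16, "depth": 16, "caption": 8},
                                         input_budget=4, target_budget=6)
```

Keeping both budgets small is what makes the objective cheap: the Transformer only ever sees and predicts a fraction of the full token set.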
4M can perform compositional generation by weighting different conditions by different amounts, even negatively. This allows the user to control precisely how strongly or weakly a generated output should follow each condition.
(9/n)
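One plausible reading of this weighting, in the spirit of classifier-free guidance (an assumption about the mechanism, not the paper's exact formulation): combine the unconditional prediction with per-condition deltas, each scaled by a user-chosen weight that may be negative to push away from a condition.

```python
def compose_logits(uncond, cond_logits, weights):
    """Guidance-style compositional weighting over a list of conditions."""
    combined = list(uncond)
    for logits, w in zip(cond_logits, weights):
        for i, (u, c) in enumerate(zip(uncond, logits)):
            combined[i] += w * (c - u)  # scaled delta toward (or away from) the condition
    return combined

# Two conditions: follow the first strongly, push away from the second.
out = compose_logits([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], weights=[2.0, -0.5])
# out[0] = 0 + 2*(1-0) + (-0.5)*0 = 2.0 ; out[1] = 0 + 2*0 + (-0.5)*(1-0) = -0.5
```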
@andrey_kurenkov ImageNet pretraining doesn't work well if the task isn't based on object semantics (e.g. monocular 3D) or the images aren't from internet users (i.e. Flickr, Instagram, etc. style). See the Taskonomy analysis & the works that apply ImageNet models to images coming from robot onboard cameras.
@docmilanfar I empathize. Such itemized recipes exist because they’re tempting (to both the speaker and the audience). We like them because following them would provide a tangible path to greatness. We don’t want to believe that often there isn’t any; and the lists are usually overgeneralizations.
Through controlled ablations, we found that increasing the number of pre-training tasks generally improves transfer performance, got insights into the masking strategy, and observed promising scaling trends in terms of dataset and model size.
(13/n)
MultiMAE has a simple and efficient pre-training objective: mask out a large number of patches from multiple input modalities, and learn to reconstruct them from the remaining information. 2/5
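A minimal sketch of such a masked reconstruction objective (illustrative, not the paper's implementation): the loss is computed only over the masked-out patches, since the visible ones carry no learning signal.

```python
def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error restricted to masked-out patches."""
    errors = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    return sum(errors) / len(errors)

# Patch 0 is visible (excluded); patches 1 and 2 were masked out.
loss = masked_reconstruction_loss([1.0, 2.0, 3.0], [1.0, 0.0, 5.0],
                                  mask=[False, True, True])
# ((2-0)^2 + (3-5)^2) / 2 = 4.0
```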
@colinraffel ImageNet performance is not a full representation of “learning from limited labeled data”, though. The trends on other tasks (eg single-image 3D) don’t quite hold up. There seems to be some ImageNet/object-classification overfitting in methodologies.
By adding global embeddings of models like DINOv2 or ImageBind to the set of 4M modalities, 4M gains multimodal retrieval capabilities that were not possible with the original networks. 4M effectively distilled the contrastive models using a more generative objective.
(11/n)
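Given such global embeddings, retrieval itself reduces to nearest-neighbor search. A toy sketch (the 2-D vectors and names are made up for illustration; real embeddings would come from the model):

```python
import math

def retrieve(query, gallery, top_k=2):
    """Rank gallery items by cosine similarity of their global embeddings."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    ranked = sorted(gallery.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Toy 2-D "global embeddings".
names = retrieve([1.0, 0.0], {"cat": [0.9, 0.1], "dog": [0.0, 1.0], "car": [1.0, 0.05]})
# → ["car", "cat"]
```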
We trained 4M on different kinds of image, semantic, and geometric metadata extracted from the pseudo labels, enabling a high degree of control over the generation process and strong potential for steerable data generation.
(10/n)
What?? According to the Supreme Court of the United States “Using copyrighted material in a dataset that is used to train a discriminative machine-learning algorithm is perfectly legal”
Besides the out-of-the-box capabilities, a 4M model can also be directly used as a ViT backbone. It exhibits strong transfer performance by outperforming MAE and MultiMAE on various standard vision benchmarks.
(12/n)
4M models can output any of the modalities conditioned on any other(s). To do that, we iteratively predict and sample tokens then add them back to the input. Once all tokens from a modality are predicted, we move on to the next modality.
(5/n)
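The chained decoding loop above can be sketched roughly like this; all names are illustrative, and the dummy sampler stands in for the actual Transformer prediction step:

```python
import random

def generate_modalities(input_tokens, modality_lengths, predict_token, seed=0):
    """Decode one modality at a time, feeding each sampled token back in."""
    rng = random.Random(seed)
    context = list(input_tokens)
    outputs = {}
    for modality, length in modality_lengths.items():
        generated = []
        for position in range(length):
            # predict & sample the next token, then add it back to the input
            token = predict_token(context, modality, position, rng)
            generated.append(token)
            context.append((modality, position, token))
        outputs[modality] = generated  # modality fully decoded; move on
    return outputs

# A dummy sampler drawing from a 1024-token vocabulary.
dummy_sampler = lambda ctx, m, p, rng: rng.randrange(1024)
out = generate_modalities([("rgb", 0, 7)], {"depth": 4, "semseg": 4}, dummy_sampler)
```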
We would need a large and diverse multimodal dataset to train such a model. Existing datasets are either too small or not diverse enough, so we instead start from image & text pairs then use off-the-shelf pseudo-labeling networks to generate the remaining modalities.
(4/n)
Though the unicorn of robotics might well be at a supermarket, construction site, or a warehouse, rather than a home. Related to the recent @nytimes article by @markoff, the piece on "The Hunt for Robot Unicorns" by @IEEESpectrum was a good read too.
@Michael_J_Black @docmilanfar Agreed. I often tell students we have letters because some critical information is lost in common metrics and standardized tests (GPA, paper/citation count, school name, etc). That’s their purpose and they should serve it however it makes sense. Good for avoiding survivorship bias.
4M’s any-to-any generation and in-painting capabilities enable fine-grained multimodal generation and editing tasks, such as performing semantic edits or grounding the generation in extracted intermediate modalities.
(7/n)
This approach makes it convenient to add new modalities from diverse formats (e.g. images, sequences, neural network feature maps, etc). We already trained models that can jointly operate on 20+ modalities/tasks and are adding more.
(6/n)
The method basically augments standard supervised learning objective w/ explicit cross-task consistency constraints. The constraints are learned from data; no need for differentiable or apriori known constraints. We start with a consistent "triangle" and extend to larger graphs.
MultiMAE is trained *entirely using pseudo labels*, making it applicable to any RGB dataset without any annotations. It can be flexibly transferred to tasks where more than just one modality is (optionally and arbitrarily) available, with notable performance benefits. 4/5
"I was not in that moment as a journalist or a woman going to put a headscarf on and somehow bind myself." CNN's
@amanpour
on refusing to wear a headscarf for her interview with Iran's president Ebrahim Raisi
@tsimonite @SergeBelongie @nisselson The conclusion that the simulation-to-reality gap is about to disappear is shortsighted, IMO. The biggest obstacle #sim2real faces is not photorealistic rendering, but matching the semantic complexity of the real world in simulation. Good luck creating a full messy bedroom in simulation.
Cross-Task Consistency is quite useful for standard single-task learning too, not just multitask. Simple conclusion: instead of training your network to do X→Y1, train it to do X→Y1→Y2. It will fit the data better with improved Y1 predictions. We extend this to larger configs.
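The X→Y1→Y2 idea above can be sketched as a loss (a toy illustration, not the paper's implementation): the usual supervised term on Y1, plus a penalty requiring the Y1→Y2 mapping of the prediction to agree with that of the ground truth.

```python
def cross_task_loss(y1_pred, y1_true, y2_from_pred, y2_from_true, alpha=1.0):
    """Supervised X→Y1 loss plus a cross-task consistency penalty."""
    # standard supervised term on Y1
    supervised = sum((a - b) ** 2 for a, b in zip(y1_pred, y1_true))
    # consistency term: X→Y1→Y2 should match the Y2 derived from ground truth
    consistency = sum((a - b) ** 2 for a, b in zip(y2_from_pred, y2_from_true))
    return supervised + alpha * consistency

loss = cross_task_loss([1.0, 2.0], [1.0, 1.0], [0.5], [0.0], alpha=2.0)
# 1.0 + 2 * 0.25 = 1.5
```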
@MattNiessner I see. Unsurprising. There is a disproportionate focus on fixing the diversity issue close to the end of the pipeline (PhD student level, postdoc level, faculty level). That's way too late and mostly fixes only the cosmetics. We need to start much earlier.
@AjdDavison Well, just like with many other things, scaling up is one big issue 🙂 In terms of both scene size and the required density of images. I won’t be surprised if scaling brings in some of the classic mechanisms that are written off now. But things are moving fast in this space, so...
@abigail_e_see @skynet_today 2. Clickbait titles/pictures: they are probably the fastest way to get traffic, but fast doesn’t mean good. Concise and descriptive > catchy and inaccurate. Be a responsible journalist/blogger/presenter, even if it costs you some attention in the short run.
Using a closed-loop formulation is common in control theory and robotics for solving (hard) problems. RNA uses a side controller network (h) to interpret a feedback signal and adapt a given pre-trained network (f). It is implemented by inserting FiLM layers into f.
2/n
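A FiLM layer itself is just a per-channel affine transform. A minimal sketch (illustrative, not the paper's code), where the scale and shift would be predicted by the side network h:

```python
def film(features, gamma, beta):
    """Feature-wise linear modulation: gamma * x + beta, per channel."""
    return [g * x + b for x, g, b in zip(features, gamma, beta)]

# gamma/beta would come from h; here they are hard-coded for illustration.
adapted = film([1.0, -2.0, 0.5], gamma=[2.0, 1.0, 0.0], beta=[0.0, 1.0, 3.0])
# → [2.0, -1.0, 3.0]
```

Because h emits gamma and beta in a single forward pass, adapting f needs no gradient steps at test time, which is where the speedup over test-time optimization comes from.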
We experimented with a set of signals that are practical for real-world use. However, those signals are also imperfect, so in the paper we also perform controlled experiments using ideal signals to isolate the actual performance of RNA.
5/n
The experiments are on several tasks, eg depth, semantic segmentation, 3D reconstruction, ImageNet, & on a range of distribution shifts. We also provide a discussion on the landscape of related formulations.
Joint w/ @aseretys, @oguzhanthefatih, Zahra
n/n
@zacharylipton @IBM What’s the “AI” in there? I read multiple articles (by @IBM & others) and this seems mostly like a database integration. The fact that they keep shoving the word “AI” into it to get attention and turn it into a PR campaign is extra alarming if this really benefits the less fortunate.
@igubins It was just a random 0.25% sample of the full training dataset. The goal was to evaluate whether the trends hold under a low data regime too. We didn't think about putting the sample indexes on Github. We could. I believe any iid random sample would do.
@colinraffel Talk titles are even more amazing!! "Learning Internal Reps From Multiple Tasks", "Identifying Relevant Tasks", "Where is Multitask Learning Useful?", "Combining supervised and unsupervised learning, where do we go from here?", "Continual Learning"
The side network h has ~5-20% of the number of parameters of f. It is trained to predict how f should be updated -- so it amortizes the optimization (takes only a feedforward pass), making it much (~30x) faster than performing test-time optimization using SGD (TTO).
3/n
@andrey_kurenkov @Bschulz5 @elonmusk @skynet_today Scaling is easier than inventing. If we knew how to make AGI, likely 2xAGI or 10xAGI would be quick, so @elonmusk might be right on that. But the missing piece rn is the G in AGI. And I suspect we're inconceivably far from it. Otherwise nX human-level would already be here for narrow tasks.
Those who found inaccuracies in the table according to your experience, consider directly reporting the error to the source to update their stats: administration@informatics-europe.org. I sent them an email inviting them to look at the reported inaccuracies in this thread.
@hardmaru @erwincoumans A (quantitative) answer to the generalization question through a study is brewing. Sneak peek: perception and dynamics aspects should be viewed and analyzed separately wrt generalization. Their generalization traits don't appear to correlate strongly. (opportunity or threat?)
@LauTor83 @ArnoutDevos brought that up, and I reported the error to the source a bit ago. My PhD students don't quite get that total amount either, but it seems the reported numbers for all countries are higher (eg comments about Germany). Some tax/employment-rate adjustment might be in play?
Introducing – Paragraphica! 📡📷
A camera that takes photos using location data. It describes the place you are at and then converts it into an AI-generated "photo".
See more here:
or try to take your own photo here: