I'm incredibly excited to announce our new company, @datologyai!
Training models is hard, and identifying the right data is the most important and difficult part -- our goal @datologyai is to make optimizing training data at scale easy and automatic across modalities.
Neural scaling laws are great for predictability, but power law scaling is slow, especially in the large data regime when 10x the data results in small gains. Can we do better? We show that exponential scaling is possible via intelligent data pruning.
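For concreteness, here's the contrast in functional forms (a schematic sketch only; the constants $a$, $b$, $c$ and the exponent $\alpha$ are illustrative placeholders, not the paper's fitted values):

```latex
% Power-law scaling: test error falls polynomially in dataset size N,
% so in the large-data regime each 10x of data buys only a small gain.
% With intelligent pruning, error can instead fall exponentially in the
% size of the kept (high-quality) subset.
\[
\underbrace{E(N) \approx a\,N^{-\alpha}}_{\text{power law, random data}}
\qquad\longrightarrow\qquad
\underbrace{E(N) \approx b\,e^{-cN}}_{\text{exponential, pruned data}}
\]
```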
Web-scale data has driven the incredible progress in AI, but do we really need all that data?
We introduce SemDeDup, an exceedingly simple method to remove semantic duplicates in web data which can reduce the LAION dataset (& train time) by 2x w/ minimal performance loss.
🧵👇
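For the curious, here's a minimal sketch of the semantic-dedup idea, assuming embeddings from any pretrained encoder (an illustration of the concept, not the paper's implementation; the 0.95 threshold is a made-up placeholder):

```python
import numpy as np

def semantic_dedup(embeddings: np.ndarray, threshold: float = 0.95) -> np.ndarray:
    """Greedy semantic dedup: keep an example only if its cosine similarity
    to every already-kept example is below `threshold`. (SemDeDup scales
    this up by first k-means clustering the embeddings and deduplicating
    within clusters; this brute-force version is O(n^2).)"""
    # Normalize rows so dot products are cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept = []
    for i, vec in enumerate(normed):
        if not kept or (normed[kept] @ vec).max() < threshold:
            kept.append(i)
    return np.array(kept)

# Toy usage: 1000 fake "image embeddings", half of them near-duplicates.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 128))
dupes = base + 0.01 * rng.normal(size=base.shape)  # semantic near-copies
print(len(semantic_dedup(np.vstack([base, dupes]))), "of 1000 examples kept")
```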
We know invariance is important for generalization, but what is the source of this invariance? Does it come from the architecture, augmentations, or the data itself?
In our #NeurIPS2021 paper led by @marksibrahim and @D_Bouchacourt, we aim to find out.
Most approaches to learning generalizable representations have focused on constraining the structure of the representation.
But what if you instead constrain *how representations can be manipulated*?
We introduce latent canonicalization to test this:
Are all negatives created equal in contrastive instance discrimination?
In new work led by Tiffany Cai, we show that only the hardest 5% of negatives per query are both necessary and largely sufficient for self-supervised learning.
Tweetprint time!
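A rough sketch of what "hardest 5% of negatives per query" means operationally (illustrative numpy, not the paper's training code; here "hard" simply means most similar to the query in embedding space):

```python
import numpy as np

def hardest_negatives(query: np.ndarray, negatives: np.ndarray,
                      frac: float = 0.05) -> np.ndarray:
    """Return indices of the hardest `frac` of negatives for one query,
    ranked by cosine similarity to the query embedding."""
    q = query / np.linalg.norm(query)
    n = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)
    sims = n @ q                         # cosine similarity per negative
    k = max(1, int(frac * len(negatives)))
    return np.argsort(sims)[-k:]         # most similar = hardest

rng = np.random.default_rng(0)
q, negs = rng.normal(size=128), rng.normal(size=(4096, 128))
print(f"kept {len(hardest_negatives(q, negs))} of {len(negs)} negatives")
```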
Repeat after me: data >>>>>> architecture.
Given enough quality data, many different architectures can achieve comparable performance. The secret sauce was, is, and remains the data, not the model.
Announcing RecurrentGemma!
- A 2B model with open weights based on Griffin
- Replaces transformer with mix of gated linear recurrences and local attention
- Competitive with Gemma-2B on downstream evals
- Higher throughput when sampling long sequences
Transformers are great, but I think their importance to the insane progress of the last few years has been massively overstated.
The key was and is larger, higher quality datasets.
Compute is all you need.
For a given amount of compute, ViTs and ConvNets perform the same.
Quote from this DeepMind article: "Although the success of ViTs in computer vision is extremely impressive, in our view there is no strong evidence to suggest that pre-trained ViTs…"
Excited to share our blog post on our @ICLR18 paper!
*Easy-to-interpret neurons are no more important than hard-to-interpret neurons
*Generalizing networks are more robust to neuron deletion than memorizing networks
Blog:
Paper:
Recent studies have suggested that the earliest iterations of DNN training are especially critical. In our #ICLR2020 paper with @jefrankle and @davidjschwab, we use the lottery ticket framework to rigorously examine this crucial phase of training.
Timely paper from @ShibaniSan, Dimitris Tsipras, @andrew_ilyas, and @aleks_madry providing some new insights into why batch norm works. They perform a number of clever experiments to work it out, finding that internal covariate shift is a red herring!
Just read through the @distillpub interpretability blog post from @ch402 and others. Stunning (and fun!) visualizations, but I wonder: what did these visualizations actually teach us about these networks? What do we know now that we didn't know before?
I'll be at #NeurIPS2022 this week! Will be presenting "Beyond neural scaling laws" (Outstanding Paper) Wed morning and at the @MetaAI booth, including for our AI Residency Q&A Wed at lunchtime. Excited to see old friends and new faces. DM me if you want to chat!
Thrilled to announce that @datologyai recently raised a $46M Series A!
We'll be using these funds to grow our team and compute to push the frontier of data research to make data curation easy for everyone. We're hiring across engineering and research!
Our mission at @datologyai is to enable anyone to train powerful AI models by making data curation and optimization easy. Hear more about our mission here:
Happy to (finally!) share our work on Generative Query Networks! In particular, I'm excited about our efforts to make sense of the representations these networks learn. Here are some of the most interesting findings:
Blog:
Paper:
Very excited to announce our #ICML2019 workshop on Identifying and Understanding Deep Learning Phenomena! We're looking for papers which rigorously test whether commonly held but unproven intuitions/beliefs about DNNs are actually true.
#DeepPhenomena
Why does pruning during training often *improve* generalization?
In our #NeurIPS2020 paper led by @bartoldson, we introduce the generalization-stability tradeoff, in which decreasing pruning-induced stability leads to better generalization.
Beautiful paper from @ukhndlwl performing a battery of experiments to evaluate the sensitivity of LSTMs' long-range dependencies to word order, part of speech, word frequency, and more. Would be awesome to see these tests become part of the standard evaluation of RNNs!
Excited to share our recent work showing that pooling is neither necessary nor sufficient for appropriate deformation stability in CNNs! Rather, filter smoothness most directly modulates deformation stability in networks both *with* and *without* pooling.
Finally had a careful read of the "Textbooks are all you need" paper (). Is anyone else surprised that there are not, in fact, any textbooks in the dataset?
How can we combine the best features of both CNNs and ViTs?
In our @icmlconf paper, led by @stephanedascoli, we introduce ConViT, a ViT with soft convolutional inductive biases which the model can learn to ignore.
Paper:
Blog:
Check out our latest preprint (with @maithra_raghu and Samy Bengio) on how representational similarity relates to generalization and optimization in CNNs, and to training and sequence dynamics in RNNs! 1/N
Blog:
Paper:
Thought-provoking work from @julius_adebayo, @goodfellow_ian, and others showing that saliency maps don't change much, even when *all* the weights are reinitialized! Do saliency maps tell us what the network cares about, or merely what the task demands?
Important work from @sarahookr, @doomie, @piekindermans, and @_beenkim on quantifying the extent to which various saliency methods *actually* find relevant portions of images.
Really happy to see more work towards bringing rigor into interpretability!
Excited and honored that our paper on beating power law scaling via data pruning earned an outstanding paper award at #NeurIPS2022! Congratulations to my amazing co-authors, Ben Sorscher, Robert Geirhos, @sshkhr16, and @SuryaGanguli!
Paper:
Excited to share our blog summarizing some of our recent work understanding the boundaries of the lottery ticket hypothesis! Do lottery tickets generalize across datasets? Is the phenomenon present in RL and NLP? Can we begin to explain it theoretically?
Very cool paper from @skornblith, Jon Shlens, and Quoc Le evaluating what factors lead to better feature transfer. I wonder if ResNets are best for transfer because their identity connections prevent task-irrelevant information from being thrown away...
Today we're sharing our work on PUG, new research from Meta AI on photorealistic, semantically controllable datasets using Unreal Engine for robust model evaluation.
More details & dataset downloads ➡️
Another excellent example that data >>> model, this time from neuroscience.
Two models trained on the same data will behave similarly despite architectural differences (given enough data), even when the comparison is between biological and artificial neural nets.
5/ Surprisingly, across both measures, we found that even major differences in model architecture (e.g. CNNs vs. Transformers) did not lead to significantly better or worse brain alignment…
While the lack of qualified reviewers is certainly a problem, I wish more authors viewed reviewer misunderstandings as a signal to invest more in the clarity of their writing and figures, rather than simply complaining about the reviewer.
How do 'map-less' agents navigate? They learn to build implicit maps of their environment in their hidden state!
We study 'blind' AI navigation agents and find the following 🧵
Thanks for sharing our work showing data curation can help us train models far faster to far better performance, @martin_casado!
Someone should figure out how to make it easy for companies who want to train their own models to use data curation at scale... 🤔🤔🤔
(stay tuned)
Most people engaged in the safety discussion don't have a good sense of how hard it is to get past power law scaling (error falls off only as a power of training set/model size).
The industry is fighting against rapidly diminishing marginal returns.
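To make the diminishing returns concrete, a worked example with an illustrative exponent (not a measured one): if error scales as $E(N) \propto N^{-\alpha}$ with $\alpha = 0.1$, then halving the error takes over 1000x more data:

```latex
\[
\frac{E(kN)}{E(N)} = k^{-\alpha} = \frac{1}{2}
\quad\Rightarrow\quad
k = 2^{1/\alpha} = 2^{10} = 1024 .
\]
```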
Do lottery ticket initializations generalize, or are they overfit to the precise conditions used to generate them? If you're at #NeurIPS2019, come see our poster #170 happening right now!
Paper:
Blog:
Though it seems amazing at first, universality isn't actually a particularly useful property. Learnability is what we actually care about. A solution which is possible but not learnable might as well not exist.
Also worth noting that while we only looked at class selective neurons, our recent @ICLR18 paper found that easily interpretable neurons were no more important than confusing neurons, which is at odds with the motivation behind many of these techniques.
It never made sense to me how startups could ever beat big tech.
After working in big tech for years and seeing the politics, it now makes perfect sense.
It all comes down to incentives.
When a startup is competing against a large competitor, they aren't competing with the *entire* company; they are likely competing with some PM focused on internal politics/career progression.
With this framing, it shouldn't be surprising to see startups win as often as they do.
A key part of bringing this cost down was @code_star and team's focus on increasing data quality to make training ~2x more token efficient.
Massive efficiency gains can be had through better data! And because models don't converge -- compute multipliers = quality multipliers.
Just $10M and two months to train a GPT-3.5/Llama-2-level model from scratch. For context, it probably cost OAI 10-20x more just a year ago!
The more we improve as a field thanks to open source, the cheaper & more efficient it gets!
All companies should now train their own
At @datologyai, we are pushing the frontier of data research to build products that make it easy for anyone to make the most of their data, automatically.
We're hiring for a number of roles across research and engineering. If you're excited about data, please join us!
Happy to release our review (with @dgtbarrett and @jakhmack) exploring ways in which the machine learning and neuroscience communities might interact to best advance analysis and understanding of neural networks, whether they're biological or artificial!
Given that data is the most important piece of training foundation models, I'd focus the Manhattan Project on that.
Coordinate with many experts to establish the most comprehensive, high-level database of human knowledge that is otherwise difficult to access.
Nice summary of the advantages and disadvantages of the increased interest in ML, especially with respect to incentive schemes, from @zacharylipton at the Critiquing Trends Workshop.
#NeurIPS2018
Introducing Meta Llama 3: the most capable openly available LLM to date.
Today we’re releasing 8B & 70B models that deliver on new capabilities such as improved reasoning and set a new state-of-the-art for models of their sizes.
Today's release includes the first two Llama 3 models.
This is why data curation can lead to big performance gains for the same training budget. As models learn, less and less of the data they see is useful, so the rate of learning slows. Eventually we just decide to stop training, but if models see better data, they'll learn faster!
On my way to #NeurIPS2018! Happy to chat, especially about ways to build a science of deep learning!
Also excited to present our work (joint with @maithra_raghu) on using CCA to understand DNNs on Wednesday morning!
Blog:
Paper:
How do we rigorously measure abstract reasoning capabilities in neural networks? Can we clearly define different types of generalization? With @santoroAI, @dgtbarrett, Felix Hill, and Tim Lillicrap, we introduce a new dataset in our @icmlconf paper to try!
Measuring abstract reasoning in neural networks - our latest #ICML2018 paper - takes inspiration from human IQ tests to explore abstract reasoning and generalisation in deep neural networks, by @dgtbarrett, Felix Hill, @santoroAI, @arimorcos, and Tim Lillicrap.
I wonder if all of these visualization methods are actually Rorschach ink blot tests for ML researchers. We see in them what we want to see, which may often just be the easiest to understand explanation.
Some personal news: yesterday was my first day at @datologyai! I will be working on what I consider to be the most interesting problem in data engineering: curating training datasets for machine learning models.
Inspired by word vector algebra, we also tested whether we could perform "scene algebra". Can you add and subtract the GQN representations for different scenes to create new objects? Yes, you can! This provides further evidence for factorized representations.
Big labs all use active learning data filtering techniques for training -- offering them as a service to enterprise users for fine-tuning seems like a good startup idea.
Great to see that our recent #NeurIPS2019 paper on generalizing lottery tickets () has been reproduced as part of the NeurIPS reproducibility challenge by @Deepak120199, @VarunGohil9, and Atishay Jain!
Report:
Really excited that this is finally out! In work led by @erikwijmans, we show that agents with no sensory input beyond ego-motion can effectively navigate novel environments and do so by storing maps in memory despite no prior for mapping.
📣 New paper: Emergence of Maps in the Memories of Blind Navigation Agents
Humans have the ability to navigate poorly lit spaces by relying on touch and memory. Our research shows that blind AI agents can learn to do the same.
Read the paper ➡️
IMO fairness and bias are the most critical near-term AI risks, yet unfortunately they often get swept under the rug in favor of x-risk discussions.
Fairness and bias ultimately come down to data. If your data is skewed relative to the real world, your model will be too!
I just wish we spent more time addressing the real risks with AI - e.g. fairness, bias, privacy, sustainability - instead of talking about science fiction doomsday scenarios.
"Motivating the Rules of the Game for Adversarial Example Research"
Really important perspective on adversarial examples from @jmgilmer, @goodfellow_ian, George Dahl, et al. asking if the security motivation for adversarial research actually makes sense.
Interested in the intersection between data curation, privacy, and fairness?
@kamalikac, Chuan Guo, and I are looking to hire a postdoc at @MetaAI (FAIR) to investigate these directions. We encourage candidates of all backgrounds to apply.
Very cool reproduction of world models showing that an untrained RNN is basically just as good! Perhaps we as a field should revisit reservoir computing. Or, alternatively, having an RNN may aid credit assignment if gradients can flow through...
Very true and also something I've been guilty of saying in the past.
Transformers have far weaker inductive biases than CNNs or RNNs, but they do still exist.
However, this weakness allows them to easily learn the appropriate inductive bias given enough (quality) training data.
There is a commonly held belief that Transformers have no inductive bias and that this bias is learned throughout the training process. This is not true. Transformers have very strong inductive biases.
At #ICML2019? Interested in understanding what's *actually* going on in our networks? Come to our workshop on #DeepPhenomena today in Hall B! We have an awesome lineup of speakers and papers!
We are excited to announce our investment in @datologyai! 🚀 Led by @arimorcos, Datology is tackling the crucial challenge of data curation, ensuring models are fed quality data for superior performance & efficiency.
@_RobToews shares why we invested in this week's #RadicalReads:
Very nice concurrent work complementing our findings in SemDeDup regarding high levels of duplication in web datasets like LAION.
Love the application to finding more copied images in generative models!
Extremely well-reasoned thread on why open-sourcing foundation models isn't actually problematic, but rather necessary. I particularly agree with the hubris point -- LLMs are not that complicated! Building them is within the resources of many companies, let alone state actors.
Does the lottery ticket hypothesis generalize to RL and NLP? Come check out our #ICLR2020 paper to find out! First poster session starting now!
ICLR:
Paper:
Blog post:
Are easy-to-interpret neurons helpful to performance in CNNs? In a new blog post, @leavittron and I summarize our work evaluating the causal impact of selective neurons, finding that easily interpretable neurons can actually be harmful to performance.
Hello world
I’m happy to share that I’m starting a new position as AI Resident at Meta AI (#Facebook AI).
A big THANK YOU to all the people I worked with/learned from till this point, especially during the last 18 months.
We are back! @arimorcos joins the podcast and talks about the role of different layers in a network. We also discuss Ari's journey from neuroscience research to ML.
This was a good one, and our last episode in season 2.
Time has flown, thanks for all the support!
This is a serious concern for the training of future models. We're polluting the Internet with low quality data. Synthetic data absolutely has its place, but it has to be targeted and curated.
My favorite part of the YOLOv3 paper: the section on things which didn't work. More and more papers should include these and reviewers should demand them! It would be amazing if such sections became as widespread as related work sections.
A good rule of thumb: high quality dataset work is probably more important and impactful long-term than whatever you're currently working on.
(Obviously this doesn't apply to *tweeting* which is well-known to be the highest leverage work in ML research 🤭)
@TechnologyPat @markchen90 If I ran OAI and believed what they claimed to, I would have faked GPT-3 not working. I would have done everything I could to avoid the scaling race that OpenAI has caused. I wouldn't have rolled out insecure GPT integrations. I wouldn't have given it internet access.
And related to the move towards confusing (or "mixed selectivity") neurons in neuroscience, with lots of recent results showing that they carry tons of information. E.g., from @MattiaRigotti and many others.
This has never made sense to me, but is common practice in big tech. The cost of ramping someone up is significant, and you have far more signal that current high performers will perform well in the future than you do from a one-day interview.
And yet this is standard.
MSFT stock grants
New hire: can’t code. has 30k twitter followers. $800k over 4 years.
Current eng: finds 500ms backdoor, saves world. $67k retention grant, vests over 5 years.
Tomorrow at 9:00 am!
How To Be A Good Citizen Of The CVPR Community, Ballroom E
Talks on how to write a paper, give a talk, review, do research, manage time, mentor, lead, be inclusive, do reproducible research, collaborate, ... I can't wait!
#CVPR18
When all that matters is compute and data, the highest leverage comes either from working on compute or data.
Personally, I don't know how to make GPUs go faster, so...
It's stunning that researchers think it's appropriate to try to predict "criminality" based on appearances. This is modern day phrenology and should have no place in our field. I have signed and I encourage others to do so as well.
Springer Nature plans to publish an article "A Deep Neural Network Model to Predict Criminality Using Image Processing" that revives long discredited physiognomist pseudoscience.
Sign this petition to urge @SpringerNature to refrain from publishing. RT!
@erikwijmans @ManolisSavva @stefmlee @irrfaan @DhruvBatraDB I'm especially pleased by this quote from the award committee regarding our paper:
“I hope that the demonstrated rigor in building up an argument towards answering questions about learned representations will inform future studies across the ICLR community.”
Couldn't agree with this more. If you write a paper such that a reader can easily understand the motivation for each experiment, the results will often seem "obvious" even if no one would have predicted them before reading the paper!
Food for thought for #acl2020nlp reviewers: if the work seems "trivial", "expected", or "straightforward", this isn't necessarily a bad thing. In fact, it may mean that the authors did a good and convincing job.
@pmphlt has a nice take on this:
Though it's not always valued as such, *communicating* science is just as important as doing science. If a paper is difficult to understand, fewer people will read it and fewer will build on it.
These are some really fantastic tips for increasing clarity in paper writing.
Sharing one idea I found useful for paper writing:
Do NOT ask people to solve correspondence problems.
Some Dos and Don'ts examples below:
*Figures*: Don't ask people to match (a), (b), (c) ... with the descriptions in the figure caption.
"Science aims to understand and explain." Couldn't agree more with Joelle Pineau's
#ICLR2018
talk. We need to view our advances skeptically, and make sure we understand *why* they work. Otherwise, our community will constantly trip trying to build on results which don't hold up.
Saliency, activation maximization, etc. give us the impression of understanding, but it's often extremely difficult to express the conclusion of this "understanding." Absent a falsifiable hypothesis and rigorous quantification, can we actually say we've learned anything?
Really enjoyed this conversation with @prateekvjoshi on the Infinite ML pod about the potential of fully automated data curation and our mission at @datologyai. Have a listen if you'd like to learn more!
The topic on Infinite ML pod today is algorithmic data curation.
We have @arimorcos to talk about it. He's the cofounder and CEO of @datologyai.
In this clip, he talks about the odds that the next data point in your training dataset is going to teach something new to your AI.
As someone who came up in neuroscience, where journal publication is all that matters, the fixed time from submission to publication for conferences is spectacular.
If machine learning research papers are meant for journals rather than for conferences, the original GAN paper might have been published sometime in 2016. DCGAN might have been published in 2018, if at all.
"Towards falsifiable interpretability research"
In this position paper, @leavittron and I discuss case studies to exemplify the importance of falsifiability in interpretability research. Intuition is important, but unverified intuition can be dangerous.
Fun blog post from @ericjang11 demonstrating that even "dumb" learning rate schedules (such as following the pixels of arbitrary images) work better than a fixed learning rate. Also a great example of how simple experiments can make a general point!
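As a flavor of what such a "dumb" schedule might look like, here is my own toy sketch (not the blog's actual code): flatten an arbitrary image's pixels in raster order and rescale them into a learning-rate range.

```python
import numpy as np

def pixel_lr_schedule(image: np.ndarray, num_steps: int,
                      lr_min: float = 1e-4, lr_max: float = 1e-1) -> np.ndarray:
    """Turn an arbitrary grayscale image into a per-step learning rate:
    read pixels in raster order, resample to `num_steps`, and rescale
    intensities into [lr_min, lr_max]."""
    pixels = image.astype(float).ravel()
    # Resample the pixel sequence to the desired number of training steps.
    steps = np.interp(np.linspace(0, len(pixels) - 1, num_steps),
                      np.arange(len(pixels)), pixels)
    # Rescale pixel intensities into the learning-rate range.
    lo, hi = steps.min(), steps.max()
    return lr_min + (steps - lo) / (hi - lo + 1e-12) * (lr_max - lr_min)

# Toy usage with a random "image" standing in for an arbitrary picture.
rng = np.random.default_rng(0)
schedule = pixel_lr_schedule(rng.integers(0, 256, size=(64, 64)), num_steps=1000)
print(schedule[:5])
```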
I literally cannot agree more with this sentiment. Well-controlled, rigorous toy experiments >> unclear large-scale experiments. Unfortunately, much of the field doesn't agree (e.g., criticism of the beautiful adversarial spheres paper for being too toy: )
Carefully designed, small-scale toy experiments can explain why your algorithm works better than the baseline in a controlled context, and they are often more useful than demonstrating incremental SOTA improvements on large-scale, compute-intensive tasks.
100% agree.
General purpose models make a ton of sense for consumer use cases, but enterprises can benefit tremendously from smaller, specialized models trained on a company's own data because tasks are far better constrained.
Fixing hallucinations in general LLMs is very hard.
Fixing hallucinations in enterprise LLMs is easy.
This is by design. Why?
⚪️ For the general LLM, you want it to perform all sorts of tasks, from 5th grade science questions to bedtime stories to every enterprise use case.