Ari Morcos Profile
Ari Morcos

@arimorcos

5,956 Followers · 1,566 Following · 100 Media · 1,148 Statuses

CEO and Co-founder @datologyai working to make it easy for anyone to make the most of their data. Former: RS @AIatMeta (FAIR), RS @DeepMind , PhD @PiN_Harvard .

Bay Area, CA
Joined April 2009
Pinned Tweet
@arimorcos
Ari Morcos
7 months
I'm incredibly excited to announce our new company, @datologyai ! Training models is hard and identifying the right data is the most important and difficult part -- our goal @datologyai is to make optimizing training data at scale easy and automatic across modalities.
53
32
466
@arimorcos
Ari Morcos
2 years
Neural scaling laws are great for predictability, but power law scaling is slow, especially in the large data regime when 10x the data results in small gains. Can we do better? We show that exponential scaling is possible via intelligent data pruning.
Tweet media one
4
54
305
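For readers unfamiliar with the claim, a rough illustration with generic constants (not the paper's exact formulation): under power-law scaling, test error falls only polynomially in dataset size, whereas the paper argues idealized data pruning can make it fall exponentially.

```latex
% Power-law scaling: error falls polynomially in dataset size N,
% so in the large-data regime each 10x of data buys only a small gain.
\mathcal{E}_{\mathrm{power}}(N) \propto N^{-\alpha}
% Claim of the paper: with idealized pruning that keeps only the most
% informative examples, error can instead decay exponentially.
\mathcal{E}_{\mathrm{pruned}}(N) \propto e^{-\beta N}
% \alpha, \beta > 0 are generic constants used here for illustration.
```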
@arimorcos
Ari Morcos
2 years
Web-scale data has driven the incredible progress in AI but do we really need all that data? We introduce SemDeDup, an exceedingly simple method to remove semantic duplicates in web data which can reduce the LAION dataset (& train time) by 2x w/ minimal performance loss. 🧵👇
Tweet media one
7
58
310
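A minimal sketch of semantic deduplication in the spirit of SemDeDup, assuming CLIP-style embeddings are already computed; the clustering choice, similarity threshold, and greedy keep-first rule are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.cluster import KMeans

def semantic_dedup(embeddings: np.ndarray, n_clusters: int = 100,
                   threshold: float = 0.95) -> np.ndarray:
    """Return indices of examples kept after removing semantic duplicates."""
    # Normalize so dot products are cosine similarities.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(emb)

    keep = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        sims = emb[idx] @ emb[idx].T  # pairwise cosine sims within cluster
        removed = np.zeros(len(idx), dtype=bool)
        for i in range(len(idx)):
            if removed[i]:
                continue
            keep.append(idx[i])  # keep one representative per duplicate group
            removed |= (sims[i] > threshold) & (np.arange(len(idx)) > i)
    return np.array(sorted(keep))
```

Clustering first is what keeps this tractable at web scale: pairwise similarity is only computed within a cluster, never across the full dataset.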
@arimorcos
Ari Morcos
3 years
We know invariance is important for generalization, but what is the source of this invariance? Does it come from the architecture, augmentations, or the data itself? In our #NeurIPS2021 paper led by @marksibrahim and @D_Bouchacourt , we aim to find out.
Tweet media one
5
65
270
@arimorcos
Ari Morcos
5 years
Most approaches to learning generalizable representations have focused on constraining the structure of the representation. But what if you instead constrain *how representations can be manipulated*? We introduce latent canonicalization to test this:
Tweet media one
6
70
271
@arimorcos
Ari Morcos
4 years
Are all negatives created equal in contrastive instance discrimination? In new work led by Tiffany Cai, we show that only the hardest 5% of negatives per query are both necessary and largely sufficient for self-supervised learning. Tweetprint time!
Tweet media one
2
55
246
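A hedged sketch of the idea, assuming L2-normalized embeddings and an InfoNCE-style objective; the 5% fraction mirrors the tweet's finding, while the names and temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def hard_negative_nce_loss(query, positive, negatives, frac=0.05, tau=0.1):
    """InfoNCE using only the hardest `frac` of negatives per query.

    query, positive: (B, D) L2-normalized embeddings; negatives: (N, D) bank.
    """
    pos_sim = (query * positive).sum(dim=1, keepdim=True) / tau  # (B, 1)
    neg_sim = query @ negatives.T / tau                          # (B, N)

    # "Hardest" negatives = most similar to the query.
    k = max(1, int(frac * negatives.shape[0]))
    hard_neg_sim, _ = neg_sim.topk(k, dim=1)                     # (B, k)

    logits = torch.cat([pos_sim, hard_neg_sim], dim=1)           # (B, 1+k)
    labels = torch.zeros(query.shape[0], dtype=torch.long,
                         device=query.device)  # positive sits at index 0
    return F.cross_entropy(logits, labels)
```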
@arimorcos
Ari Morcos
6 months
Repeat after me: data >>>>>> architecture. Given enough quality data, many different architectures can achieve comparable performance. The secret sauce was, is, and remains the data, not the model.
@SamuelMLSmith
Samuel L Smith
6 months
Announcing RecurrentGemma! - A 2B model with open weights based on Griffin - Replaces transformer with mix of gated linear recurrences and local attention - Competitive with Gemma-2B on downstream evals - Higher throughput when sampling long sequences
Tweet media one
9
69
278
20
25
247
@arimorcos
Ari Morcos
11 months
Transformers are great, but I think their importance to the insane progress of the last few years has been massively overstated. The key was and is larger, higher quality datasets.
@ylecun
Yann LeCun
11 months
Compute is all you need. For a given amount of compute, ViT and ConvNets perform the same. Quote from this DeepMind article: "Although the success of ViTs in computer vision is extremely impressive, in our view there is no strong evidence to suggest that pre-trained ViTs
81
314
2K
14
20
232
@arimorcos
Ari Morcos
7 years
Excited to share our blog post on our @ICLR18 paper!
*Easy-to-interpret neurons are no more important than hard-to-interpret neurons
*Generalizing networks are more robust to neuron deletion than memorizing networks
Blog: Paper:
2
76
228
@arimorcos
Ari Morcos
5 years
Recent studies have suggested that the earliest iterations of DNN training are especially critical. In our #ICLR2020 paper with @jefrankle and @davidjschwab , we use the lottery ticket framework to rigorously examine this crucial phase of training.
Tweet media one
1
42
206
@arimorcos
Ari Morcos
6 years
Timely paper from @ShibaniSan , Dimitris Tsipras, @andrew_ilyas , and @aleks_madry providing some new insights into why batch norm works. They perform a number of clever experiments to work it out, finding that internal covariate shift is a red herring!
Tweet media one
4
59
165
@arimorcos
Ari Morcos
7 years
Just read through the @distillpub interpretability blog post from @ch402 and others. Stunning (and fun!) visualizations, but I wonder: what did these visualizations actually teach us about these networks? What do we know now that we didn't know before?
1
36
137
@arimorcos
Ari Morcos
2 years
I'll be at #NeurIPS2022 this week! Will be presenting "Beyond neural scaling laws" (Outstanding Paper) Wed morning and at @MetaAI booth, including for our AI Residency Q&A Wed lunch time. Excited to see old friends and new faces. DM me if you want to chat!
4
6
126
@arimorcos
Ari Morcos
5 months
Thrilled to announce that @datologyai recently raised a $46M Series A! We'll be using these funds to grow our team and compute to push the frontier of data research to make data curation easy for everyone. We're hiring across engineering and research!
8
11
121
@arimorcos
Ari Morcos
6 years
Happy to (finally!) share our work on Generative Query Networks! In particular, I'm excited about our efforts to make sense of the representations these networks learn. Here are some of the most interesting findings: Blog: Paper:
Tweet media one
2
27
112
@arimorcos
Ari Morcos
6 years
Very excited to announce our #ICML2019 workshop on Identifying and Understanding Deep Learning Phenomena! We're looking for papers which rigorously test whether commonly-held, but unproven intuitions/beliefs about DNNs are actually true. #DeepPhenomena
1
18
103
@arimorcos
Ari Morcos
4 years
Why does pruning during training often *improve* generalization? In our #NeurIPS2020 paper led by @bartoldson , we introduce the generalization-stability tradeoff, in which decreasing pruning-induced stability leads to better generalization.
Tweet media one
1
15
96
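For context, a minimal sketch of the kind of in-training magnitude pruning the paper studies; the per-layer fraction and restriction to linear layers are assumptions for illustration, and the paper's contribution is the stability analysis, not this mechanism.

```python
import torch

def prune_smallest_weights(model: torch.nn.Module, frac: float = 0.2) -> None:
    """Zero out the smallest-magnitude `frac` of weights in each linear layer."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, torch.nn.Linear):
                w = module.weight
                k = int(frac * w.numel())
                if k == 0:
                    continue
                # kthvalue gives the magnitude below which weights are zeroed.
                cutoff = w.abs().flatten().kthvalue(k).values
                w.mul_((w.abs() > cutoff).float())
```

Calling this periodically during training produces the pruning "events" whose induced instability the paper relates to generalization.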
@arimorcos
Ari Morcos
6 years
Beautiful paper from @ukhndlwl performing a battery of experiments to evaluate the long-range dependencies of LSTMs to word order, part of speech, word frequency, and more. Would be awesome to see these tests become part of the standard evaluation of RNNs!
Tweet media one
2
37
94
@arimorcos
Ari Morcos
6 years
Excited to share our recent work showing that pooling is neither necessary nor sufficient for appropriate deformation stability in CNNs! Rather, filter smoothness most directly modulates deformation stability in networks both *with* and *without* pooling.
Tweet media one
2
24
95
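One hedged way to operationalize "filter smoothness" is a total-variation-style measure over the spatial kernel; the paper's exact metric may differ, and kernels are assumed larger than 1x1.

```python
import torch

def filter_smoothness(conv: torch.nn.Conv2d) -> torch.Tensor:
    """Mean absolute difference between adjacent filter taps; lower = smoother."""
    w = conv.weight  # (out_ch, in_ch, kH, kW); assumes kH, kW > 1
    dh = (w[..., 1:, :] - w[..., :-1, :]).abs().mean()
    dw = (w[..., :, 1:] - w[..., :, :-1]).abs().mean()
    return dh + dw
```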
@arimorcos
Ari Morcos
1 year
Finally had a careful read of the "Textbooks are all you need" paper (). Is anyone else surprised that there are not, in fact, any textbooks in the dataset?
4
12
95
@arimorcos
Ari Morcos
6 years
Check out our latest preprint (with @maithra_raghu and Samy Bengio) on the relationship between representational similarity, generalization, and optimization in CNNs, and training and sequence dynamics in RNNs! 1/N Blog: Paper:
5
17
93
@arimorcos
Ari Morcos
7 years
Thought-provoking work from @julius_adebayo , @goodfellow_ian and others showing that saliency maps don't change much, even when *all* the weights are reinitialized! Do saliency maps tell us what the network cares about, or merely what the task demands?
0
28
91
@arimorcos
Ari Morcos
6 years
Important work from @sarahookr , @doomie , @piekindermans , and @_beenkim on quantifying the extent to which various saliency methods *actually* find relevant portions of images. Really happy to see more work towards bringing rigor into interpretability!
Tweet media one
1
24
91
@arimorcos
Ari Morcos
2 years
Excited and honored that our paper on beating power law scaling via data pruning earned an outstanding paper award at #NeurIPS2022 ! Congratulations to my amazing co-authors, Ben Sorscher, Robert Geirhos, @sshkhr16 , and @SuryaGanguli ! Paper:
@NeurIPSConf
NeurIPS Conference
2 years
Happy to announce the award winners of #NeurIPS2022
3
87
399
3
9
84
@arimorcos
Ari Morcos
5 years
Excited to share our blog summarizing some of our recent work understanding the boundaries of the lottery ticket hypothesis! Do lottery tickets generalize across datasets? Is the phenomenon present in RL and NLP? Can we begin to explain it theoretically?
1
17
82
@arimorcos
Ari Morcos
4 years
For anyone else closely tracking the new votes coming in, this is incredible:
3
29
82
@arimorcos
Ari Morcos
6 years
Very cool paper from @skornblith , Jon Shlens, and Quoc Le evaluating what factors lead to better feature transfer. I wonder if ResNets are best for transfer because their identity connections prevent task-irrelevant information from being thrown away...
Tweet media one
0
22
80
@arimorcos
Ari Morcos
1 year
Really excited that this is finally out! Photorealistic, completely controllable data for better evaluation of models!
@AIatMeta
AI at Meta
1 year
Today we're sharing our work on PUG, new research from Meta AI on photorealistic, semantically controllable datasets using Unreal Engine for robust model evaluation. More details & dataset downloads ➡️
13
90
470
1
9
78
@arimorcos
Ari Morcos
1 year
Another excellent example that data >>> model, this time from neuroscience. Two models trained on the same data will have similar behavior despite architectural differences (given enough data), even when those differences are between biological and artificial neural nets.
@talia_konkle
talia konkle
1 year
5/ Surprisingly, across both measures, we found that even major differences in model architecture (e.g. CNNs vs. Transformers) did not significantly lead to better or worse brain alignment…
Tweet media one
2
10
59
1
19
68
@arimorcos
Ari Morcos
6 years
While the lack of qualified reviewers is certainly a problem, I wish more authors viewed reviewer misunderstandings as a signal to invest more in the clarity of their writing and figures, rather than simply complaining about the reviewer.
3
15
68
@arimorcos
Ari Morcos
2 years
Very excited that this work earned an Outstanding Paper Award at ICLR! Congratulations to @erikwijmans and my other incredible co-authors, @ManolisSavva , @stefmlee , @irrfaan , and @DhruvBatraDB !
@erikwijmans
erikwijmans
2 years
How do 'map-less' agents navigate? They learn to build implicit maps of their environment in their hidden state! We study 'blind' AI navigation agents and find the following 🧵
1
6
46
1
5
63
@arimorcos
Ari Morcos
11 months
Thanks for sharing our work showing data curation can help us train models far faster to far better performance, @martin_casado ! Someone should figure out how to make it easy for companies who want to train their own models to use data curation at scale... 🤔🤔🤔 (stay tuned)
@martin_casado
martin_casado
11 months
Most people engaged in the safety discussion don't have a good sense of how hard it is to get past power law scaling (error drops relative to the power of the training set/model size). The industry is fighting against rapidly diminishing marginal returns.
Tweet media one
12
45
217
1
7
61
@arimorcos
Ari Morcos
5 years
Do lottery ticket initializations generalize, or are they overfit to the precise conditions used to generate them? If you're at #NeurIPS2019 , come see our poster #170 happening right now! Paper: Blog:
Tweet media one
1
7
59
@arimorcos
Ari Morcos
11 months
"Please give us your most valuable commodity and moat for free so that we can make money off of it instead of you." - Sam Altman, probably
Tweet media one
4
5
55
@arimorcos
Ari Morcos
6 years
Though it seems amazing at first, universality isn't actually a particularly useful property. Learnability is what we actually care about. A solution which is possible but not learnable might as well not exist.
@gabrielpeyre
Gabriel Peyré
6 years
A 1 hidden layer perceptron can approximate arbitrary continuous functions using a large enough number of neurons (Cybenko's theorem).
7
123
326
1
11
48
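For reference, the theorem being quoted, stated informally. It illustrates exactly the gap the tweet points at: the theorem guarantees such a network exists, not that gradient descent will find it or that N is small.

```latex
% Cybenko (1989), informal: sums of sigmoids are dense in C([0,1]^n).
% For any continuous f and \varepsilon > 0 there exist N, a_i, w_i, b_i with
\left| f(x) - \sum_{i=1}^{N} a_i\, \sigma(w_i^\top x + b_i) \right| < \varepsilon
\quad \text{for all } x \in [0,1]^n .
```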
@arimorcos
Ari Morcos
7 years
Also worth noting that while we only looked at class selective neurons, our recent @ICLR18 paper found that easily interpretable neurons were no more important than confusing neurons, which is at odds with the motivation behind many of these techniques.
1
8
50
@arimorcos
Ari Morcos
7 months
It never made sense to me how startups could ever beat big tech. After working in big tech for years and seeing the politics, it now makes perfect sense. It all comes down to incentives.
@daltonc
Dalton Caldwell
7 months
When a startup is competing against a large competitor, they aren't competing with the *entire* company, they are likely competing with some PM focused on internal politics/career progression. With this framing, it shouldn't be surprising to see startups win as often as they do.
68
389
3K
0
3
46
@arimorcos
Ari Morcos
6 months
A key part of bringing this cost down was @code_star and team's focus on increasing data quality to make training ~2x more token efficient. Massive efficiency gains can be had through better data! And because models don't converge -- compute multipliers = quality multipliers.
@ClementDelangue
clem 🤗
6 months
Just $10M and two months to train from scratch a GPT3.5 - Llama2 level model. For context, it probably cost 10-20x more to OAI just a year ago! The more we improve as a field thanks to open-source, the cheaper & more efficient it gets! All companies should now train their own
Tweet media one
11
56
502
0
5
45
@arimorcos
Ari Morcos
7 months
At @datologyai , we are pushing the frontier of data research to build products that make it easy for anyone to make the most of their data, automatically. We're hiring for a number of roles across research and engineering. If you're excited about data, please join us!
1
5
41
@arimorcos
Ari Morcos
6 years
Happy to release our review (with @dgtbarrett and @jakhmack ) exploring ways in which the machine learning and neuroscience communities might interact to best advance analysis and understanding of neural networks, whether they're biological or artificial!
Tweet media one
1
13
43
@arimorcos
Ari Morcos
9 months
Nothing is actually all you need, but... "Good data are all you need" is probably the truest form of that statement.
@hausman_k
Karol Hausman
9 months
Given that we realize that data is the most important piece for training foundational models, I'd focus the Manhattan Project on that. Coordinate with many experts to establish the most comprehensive, high-level database of human knowledge that is otherwise difficult to access.
5
6
44
0
3
42
@arimorcos
Ari Morcos
6 years
Nice summary of the advantages and disadvantages of the increased interest in ML, especially with respect to incentive schemes from @zacharylipton at the Critiquing Trends Workshop. #NeurIPS2018
Tweet media one
2
8
41
@arimorcos
Ari Morcos
6 months
Congrats to all my friends @AIatMeta for this awesome release! As we see again and again, better data --> better models.
@AIatMeta
AI at Meta
6 months
Introducing Meta Llama 3: the most capable openly available LLM to date. Today we’re releasing 8B & 70B models that deliver on new capabilities such as improved reasoning and set a new state-of-the-art for models of their sizes. Today's release includes the first two Llama 3
350
1K
6K
0
2
42
@arimorcos
Ari Morcos
10 months
I'll be at #NeurIPS2023 next week. If you're interested in chatting about how we can make models better through better data, please reach out!
2
1
40
@arimorcos
Ari Morcos
1 year
This is why data curation can lead to big performance gains for the same training budget. As models learn, less and less of the data they see is useful, so the rate of learning slows. Eventually we just decide to stop training, but if models see better data, they'll learn faster!
@tomgoldsteincs
Tom Goldstein
1 year
The most intriguing line of the llama2 paper: "...the models still did not show any signs of saturation".
Tweet media one
14
32
302
0
3
39
@arimorcos
Ari Morcos
6 years
On my way to #NeurIPS2018 ! Happy to chat, especially about ways to build a science of deep learning! Also excited to present our work (joint with @maithra_raghu ) on using CCA to understand DNNs on Wednesday morning! Blog: Paper:
0
2
35
@arimorcos
Ari Morcos
6 years
How do we rigorously measure abstract reasoning capabilities in neural networks? Can we clearly define different types of generalization? With @santoroAI , @dgtbarrett , Felix Hill and Tim Lillicrap, we introduce a new dataset in our @icmlconf paper to try!
@GoogleDeepMind
Google DeepMind
6 years
Measuring abstract reasoning in neural networks - our latest #ICML2018 paper - takes inspiration from human IQ tests to explore abstract reasoning and generalisation in deep neural networks by @dgtbarrett , Felix Hill, @santoroAI , @arimorcos , Tim Lillicrap
Tweet media one
9
246
504
0
11
35
@arimorcos
Ari Morcos
7 years
I wonder if all of these visualization methods are actually Rorschach ink blot tests for ML researchers. We see in them what we want to see, which may often just be the easiest to understand explanation.
1
9
34
@arimorcos
Ari Morcos
6 months
Could not be more excited to welcome @josh_wills to the @datologyai team!
@josh_wills
JosH100
6 months
Some personal news: yesterday was my first day at @datologyai ! I will be working on what I consider to be the most interesting problem in data engineering: curating training datasets for machine learning models.
21
2
176
3
1
35
@arimorcos
Ari Morcos
6 years
Inspired by word vector algebra, we also tested whether we could perform "scene algebra". Can you add and subtract the GQN representations for different scenes to create new objects? Yes, you can! This provides further evidence for factorized representations.
Tweet media one
1
14
33
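The analogy, written out explicitly; symbols here are illustrative, with z(·) denoting the GQN scene representation.

```latex
% By analogy with word-vector arithmetic (king - man + woman \approx queen):
z_{\text{new}} = z(\text{scene with red sphere})
               - z(\text{red sphere alone})
               + z(\text{blue cube alone})
% Decoding z_new should render the scene with a blue cube in place of
% the red sphere -- evidence that the representation factorizes.
```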
@arimorcos
Ari Morcos
6 months
Come join us to do exactly this @datologyai !
@semiDL
Tapa Ghosh
6 months
Big labs all use active learning data filtering techniques for training- offering them as a service to enterprise users for fine tuning seems like a good startup idea
2
1
18
0
4
31
@arimorcos
Ari Morcos
5 years
Great to see that our recent #NeurIPS2019 paper on generalizing lottery tickets () has been reproduced as part of the NeurIPS reproducibility challenge by @Deepak120199 , @VarunGohil9 , and Atishay Jain! Report:
1
3
32
@arimorcos
Ari Morcos
2 years
Really excited that this is finally out! In work led by @erikwijmans , we show that agents with no sensory input beyond ego-motion can effectively navigate novel environments and do so by storing maps in memory despite no prior for mapping.
@AIatMeta
AI at Meta
2 years
📣 New paper: Emergence of Maps in the Memories of Blind Navigation Agents Humans have the ability to navigate poorly lit spaces by relying on touch and memory. Our research shows that blind AI agents can learn to do the same. Read the paper ➡️
6
106
511
1
4
32
@arimorcos
Ari Morcos
5 months
IMO fairness and bias are the most critical near-term AI risks, yet unfortunately they often get swept under the rug in favor of x-risk discussions. Fairness and bias ultimately come down to data. If your data is skewed relative to the real world, your model will be too!
@vikhyatk
vik
5 months
i just wish we spent more time addressing the real risks with AI - e.g. fairness, bias, privacy, sustainability - instead of talking about science fiction doomsday scenarios
5
18
109
2
6
32
@arimorcos
Ari Morcos
6 years
"Motivating the Rules of the Game for Adversarial Example Research" Really important perspective on adversarial examples from @jmgilmer , @goodfellow_ian , George Dahl et al asking if the security motivation for adversarial research actually makes sense.
Tweet media one
0
13
31
@arimorcos
Ari Morcos
2 years
Interested in the intersection between data curation, privacy, and fairness? @kamalikac , Chuan Guo, and I are looking to hire a postdoc at @MetaAI (FAIR) to investigate these directions. We encourage candidates of all backgrounds to apply.
3
7
28
@arimorcos
Ari Morcos
6 years
Very cool reproduction of world models showing that an untrained RNN is basically just as good! Perhaps we as a field should revisit reservoir computing. Or, alternatively, having an RNN may aid credit assignment if gradients can flow through...
1
8
28
@arimorcos
Ari Morcos
7 months
Very true and also something I've been guilty of saying in the past. Transformers have far weaker inductive biases than CNNs or RNNs, but they do still exist. However, this weakness allows them to easily learn the appropriate inductive bias given enough (quality) training data.
@ArmenAgha
Armen Aghajanyan
7 months
There is a commonly held belief that Transformers have no inductive bias and that this bias is learned throughout the training process. This is not true. Transformers have very strong inductive biases.
4
25
276
3
1
29
@arimorcos
Ari Morcos
6 years
Very nice write up on recent work on understanding generalization in deep learning, including a nice discussion of our recent @ICLR18 paper!
0
13
27
@arimorcos
Ari Morcos
5 years
At #ICML2019 ? Interested in understanding what's *actually* going on in our networks? Come to our workshop on #DeepPhenomena today in Hall B! We have an awesome lineup of speakers and papers!
0
2
26
@arimorcos
Ari Morcos
1 year
@code_star I guess "code labeled as educational by a random forest trained on GPT-4 outputs is all you need" isn't quite as catchy...
1
0
28
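The joke refers to the data-filtering recipe described in the "Textbooks Are All You Need" line of work: annotate a small sample with GPT-4, train a cheap classifier on embeddings, then filter the full corpus. A hedged sketch, with the feature and model choices as illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def filter_corpus(sample_embs, gpt4_labels, corpus_embs, min_prob=0.5):
    """Keep corpus examples a classifier (trained on GPT-4 labels) deems educational.

    gpt4_labels: binary labels (1 = educational) from GPT-4 annotation of a sample.
    """
    clf = RandomForestClassifier(n_estimators=200)
    clf.fit(sample_embs, gpt4_labels)
    probs = clf.predict_proba(corpus_embs)[:, 1]  # P(educational)
    return np.where(probs >= min_prob)[0]
```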
@arimorcos
Ari Morcos
8 months
Models are what they eat!
@_aidan_clark_
Aidan Clark
8 months
you know this whole data thing seems to matter more than I might have thought it did
3
4
54
0
2
28
@arimorcos
Ari Morcos
7 months
So happy to be working with @_RobToews and @radicalvcfund !
@radicalvcfund
Radical Ventures
7 months
We are excited to announce our investment in @datologyai !🚀Led by @arimorcos , Datology is tackling the crucial challenge of data curation, ensuring models are fed quality data for superior performance & efficiency. @_RobToews shares why we invested in this week's #RadicalReads :
Tweet media one
2
4
16
1
0
28
@arimorcos
Ari Morcos
6 years
On my way to @ICLR18 ! If you'll be in Vancouver and want to chat/hang out, hit me up! Also, come see our poster on Wednesday!
Tweet media one
1
2
27
@arimorcos
Ari Morcos
2 years
Very nice concurrent work complementing our findings in SemDeDup regarding high levels of duplication in web datasets like LAION. Love the application to finding more copied images in generative models!
@_akhaliq
AK
2 years
On the De-duplication of LAION-2B: approach demonstrates that roughly 700 million, or about a third, of LAION-2B's images are duplicates. abs: github:
Tweet media one
2
37
183
1
2
27
@arimorcos
Ari Morcos
1 year
Extremely well-reasoned thread on why open-sourcing foundation models isn't actually problematic, but rather necessary. I particularly agree with the hubris point -- LLMs are not that complicated! Building them is within the resources of many companies, let alone state actors.
@deliprao
Delip Rao e/σ
1 year
This is another one of those ill-thought, fear-mongering scientific disinformation about LLMs, and I will explain why in this long thread. 🧶
6
156
646
0
3
27
@arimorcos
Ari Morcos
4 years
Does the lottery ticket hypothesis generalize to RL and NLP? Come check out our #ICLR2020 paper to find out! First poster session starting now! ICLR: Paper: Blog post:
Tweet media one
1
2
26
@arimorcos
Ari Morcos
4 years
Are easy-to-interpret neurons helpful to performance in CNNs? In a new blog post, @leavittron and I summarize our work evaluating the causal impact of selective neurons, finding that easily interpretable neurons can actually be harmful to performance.
2
1
25
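For readers who want the quantity involved: one common class-selectivity index from this line of work (stated from memory, so treat the details as approximate) compares a neuron's mean activation on its most-activating class against its mean on all other classes.

```latex
% \mu_{\max}: mean activation over the neuron's most-activating class;
% \mu_{-\max}: mean activation over all remaining classes;
% \epsilon: small constant for numerical stability.
SI = \frac{\mu_{\max} - \mu_{-\max}}{\mu_{\max} + \mu_{-\max} + \epsilon}
% SI near 1: responds to a single class; SI near 0: class-agnostic.
```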
@arimorcos
Ari Morcos
2 years
Very excited to work with Amro during his AI residency at @MetaAI this year!
@amrokamal1997
Amro
2 years
Hello world I’m happy to share that I’m starting a new position as AI Resident at Meta AI ( #Facebook AI). A big THANK YOU to all the people I worked with/learned from till this point, especially during the last 18 months.
15
2
105
0
0
24
@arimorcos
Ari Morcos
4 years
Was really fun to appear on @underrated_ml last week! For my underrated paper: Are all layers created equal? from Chiyuan Zhang and collaborators:
@underrated_ml
underrated_ml
4 years
We are back! @arimorcos joins the podcast and talks about the role of different layers in a network. We also discuss Ari's journey from neuroscience research to ML. This was a good one, and our last episode in season 2. Time has flown, thanks for all the support!
Tweet media one
1
3
13
0
3
24
@arimorcos
Ari Morcos
8 months
This is a serious concern for the training of future models. We're polluting the Internet with low quality data. Synthetic data absolutely has its place, but it has to be targeted and curated.
@Grady_Booch
Grady Booch
8 months
22
90
775
1
1
24
@arimorcos
Ari Morcos
7 years
My favorite part of the YOLOv3 paper: the section on things which didn't work. More and more papers should include these and reviewers should demand them! It would be amazing if such sections became as widespread as related work sections.
Tweet media one
0
6
23
@arimorcos
Ari Morcos
7 months
This is exactly why we started @datologyai : to make creating high-quality training datasets easy and automatic for everyone.
@BlancheMinerva
Stella Biderman
7 months
A good rule of thumb: high quality dataset work is probably more important and impactful long-term than whatever you're currently working on. (Obviously this doesn't apply to *tweeting* which is well-known to be the highest leverage work in ML research 🤭)
5
23
203
1
0
23
@arimorcos
Ari Morcos
7 months
We could not be more excited to partner with @sarahcat21 , @dauber , and @AmplifyPartners to build @datologyai !
@AmplifyPartners
Amplify Partners
7 months
Announcing our investment in @datologyai ! Led by AI pioneers @arimorcos , @hurrycane & @leavittron , Datology is a data curation platform to reduce training costs & improve model performance. @sarahcat21 shares why the future of AI just got brighter:
1
4
28
2
0
23
@arimorcos
Ari Morcos
11 months
Launching ChatGPT plugins makes no sense if you're concerned about AI safety and should make people extremely skeptical about OpenAI's sincerity.
@BlancheMinerva
Stella Biderman
11 months
@TechnologyPat @markchen90 If I ran OAI and believed what they claimed to, I would have faked GPT-3 not working. I would have done everything i could to avoid the scaling race that OpenAI has caused. I wouldn't have rolled out insecure GPT integrations. I wouldn't have given it internet access.
4
7
73
2
2
23
@arimorcos
Ari Morcos
7 months
It's all about the data!
@aidan_mclau
Aidan McLau
7 months
can someone who knows more about model pretraining than i explain how grok is worse than mixtral at 7 times the size
35
9
250
0
1
23
@arimorcos
Ari Morcos
7 years
And related to the move towards confusing (or "mixed selectivity") neurons in neuroscience, with lots of recent results showing that they carry tons of information. E.g., from @MattiaRigotti , among many others.
1
1
22
@arimorcos
Ari Morcos
6 months
This has never made sense to me, but is common practice in big tech. The cost of ramping someone up is significant and you have far more signal that current high performers will perform well in the future than you do from a 1 day interview. And yet this is standard.
@mscccc
Mike Coutermarsh
6 months
MSFT stock grants New hire: can’t code. has 30k twitter followers. $800k over 4 years. Current eng: finds 500ms backdoor, saves world. $67k retention grant, vests over 5 years.
45
252
6K
1
0
22
@arimorcos
Ari Morcos
6 years
This is a really wonderful idea. It'd be neat to see similar workshops at NIPS and ICML.
@deviparikh
Devi Parikh
6 years
Tomorrow at 9:00 am! How To Be A Good Citizen Of The CVPR Community, Ballroom E Talks on how to write a paper, give a talk, review, do research, manage time, mentor, lead, be inclusive, do reproducible research, collaborate, ... I can't wait! #CVPR18
Tweet media one
1
12
70
0
4
21
@arimorcos
Ari Morcos
9 months
When all that matters is compute and data, the highest leverage comes either from working on compute or data. Personally, I don't know how to make GPUs go faster, so...
@mealreplacer
Julian
9 months
There is still a surprising amount of alpha in reading and actually internalizing the material discussed in this very short and clear essay
Tweet media one
17
32
507
0
1
21
@arimorcos
Ari Morcos
5 years
Excited to talk about our recent work on understanding the generalization properties of lottery tickets today at #ReWorkDL Montreal!
1
3
21
@arimorcos
Ari Morcos
4 years
It's stunning that researchers think it's appropriate to try to predict "criminality" based on appearances. This is modern day phrenology and should have no place in our field. I have signed and I encourage others to do so as well.
@Abebab
Abeba Birhane
4 years
Springer Nature plans to publish an article "A Deep Neural Network Model to Predict Criminality Using Image Processing" that revives long discredited physiognomist pseudoscience. Sign this petition to urge @SpringerNature to refrain from publishing. RT!
23
225
401
0
3
21
@arimorcos
Ari Morcos
1 year
I'll be at #ICML2023 next week from 7/24-7/29. If you're interested in understanding and improving data, please reach out and happy to chat!
2
0
21
@arimorcos
Ari Morcos
2 years
@erikwijmans @ManolisSavva @stefmlee @irrfaan @DhruvBatraDB I'm especially pleased by this quote from the award committee regarding our paper: “I hope that the demonstrated rigor in building up an argument towards answering questions about learned representations will inform future studies across the ICLR community.”
1
4
20
@arimorcos
Ari Morcos
5 years
Couldn't agree with this more. If you write a paper such that a reader can easily understand the motivation for each experiment, the results will often seem "obvious" even if no one would have predicted them before reading the paper!
@boknilev
Yonatan Belinkov
5 years
Food for thought for #acl2020nlp reviewers: if the work seems "trivial", "expected", or "straightforward", this isn't necessarily a bad thing. In fact, it may mean that the authors did a good and convincing job. @pmphlt has a nice take on this:
Tweet media one
3
14
99
0
0
20
@arimorcos
Ari Morcos
4 years
Though it's not always valued as such, *communicating* science is just as important as doing science. If a paper is difficult to understand, fewer people will read it and fewer will build on it. These are some really fantastic tips for increasing clarity in paper writing.
@jbhuang0604
Jia-Bin Huang
4 years
Sharing one idea I found useful for paper writing: Do NOT ask people to solve correspondence problems. Some Dos and Don'ts examples below: *Figures*: Don't ask people to match (a), (b), (c) ... with the descriptions in the figure caption.
Tweet media one
9
309
1K
1
1
20
@arimorcos
Ari Morcos
6 years
"Science aims to understand and explain." Couldn't agree more with Joelle Pineau's #ICLR2018 talk. We need to view our advances skeptically, and make sure we understand *why* they work. Otherwise, our community will constantly trip trying to build on results which don't hold up.
0
4
20
@arimorcos
Ari Morcos
7 years
Saliency, activation maximization, etc. give us the impression of understanding, but it's often extremely difficult to express the conclusion of this "understanding." Absent a falsifiable hypothesis and rigorous quantification, can we actually say we've learned anything?
2
0
18
@arimorcos
Ari Morcos
6 months
"Garbage in, garbage out" never stopped being true!
@mattturck
Matt Turck
6 months
2014, Big Data panels: "data is the new oil", "garbage in garbage out" 2024, AI panels: "data is the new oil", "garbage in garbage out"
15
20
116
0
0
19
@arimorcos
Ari Morcos
7 months
Really enjoyed this conversation with @prateekvjoshi on the Infinite ML pod about the potential of fully automated data curation and our mission @datologyai . Have a listen if you'd like to learn more!
@prateekvjoshi
Prateek Joshi
7 months
The topic on Infinite ML pod today is algorithmic data curation. We have @arimorcos to talk about it. He's the cofounder and CEO of @datologyai . In this clip, he talks about the odds that the next data point in your training dataset is going to teach something new to your AI
2
0
10
0
1
19
@arimorcos
Ari Morcos
6 years
As someone who came up in neuroscience, where journal publication is all that matters, the fixed time from submission to publication for conferences is spectacular.
@hardmaru
hardmaru
6 years
If machine learning research papers are meant for journals rather than for conferences, the original GAN paper might have been published sometime in 2016. DCGAN might have been published in 2018, if at all.
9
36
215
1
0
18
@arimorcos
Ari Morcos
4 years
"Towards falsifiable interpretability research" In this position paper, @leavittron and I discuss case studies to exemplify the importance of falsifiability in interpretability research. Intuition is important, but unverified intuition can be dangerous.
Tweet media one
0
1
19
@arimorcos
Ari Morcos
6 years
Fun blog post from @ericjang11 demonstrating that even "dumb" learning rate schedules (such as following the pixels of arbitrary images) work better than a fixed learning rate. Also a great example of how simple experiments can make a general point!
0
4
19
@arimorcos
Ari Morcos
6 years
I literally cannot agree more with this sentiment. Well-controlled, rigorous toy experiments >> unclear large-scale experiments. Unfortunately, much of the field doesn't agree (e.g., criticism of the beautiful adversarial spheres paper for being too toy: )
@hardmaru
hardmaru
6 years
Carefully designed, small-scale toy experiments can explain why your algorithm works better than the baseline in a controlled context, and they are often more useful than demonstrating incremental SOTA improvements on large-scale, compute-intensive tasks.
3
42
208
0
1
19
@arimorcos
Ari Morcos
7 months
100% agree. General purpose models make a ton of sense for consumer use cases, but enterprises can benefit tremendously from smaller, specialized models trained on a company's own data because tasks are far better constrained.
@realSharonZhou
Sharon Zhou
7 months
Fixing hallucinations in general LLMs is very hard. Fixing hallucinations in enterprise LLMs is easy. This is by design. Why? ⚪️ For the general LLM, you want it to perform all sorts of tasks from 5th grade science questions to bedtime stories to even all enterprise use cases.
12
19
135
1
0
18