The Simons Institute recently held a great workshop on LLMs -- lots of informative talks on a range of topics including studies of in-context learning, localization in LLMs, understanding emergent behavior, and many others.
Our ICML 2018 paper on training ultra-deep (10k+ layers) CNNs is now up, from work we've done at Google Brain: . We examine the relationship of trainability to signal propagation and Jacobian conditioning in networks with convolutional layers, ... (1/3)
In our preprint “The large learning rate phase of deep learning: the catapult mechanism”, we show that the choice of learning rate (LR) in (S)GD separates deep neural net dynamics into two sharply distinct types (or "phases", in the physics sense). (1/n)
The time is ripe for a move towards publishing notebooks as standard practice in scientific research. And indeed, both Mathematica (which still amazes me) and Jupyter are elegantly designed and satisfying tools to use.
Consider submitting to our ICML 2021 workshop:
Overparameterization: Pitfalls & Opportunities
focused specifically on the role of overparameterization in machine learning. Organized by @HanieSedghi, @QuanquanGu, @aminkarbasi, & myself.
Deadline: June 21
The schedule and papers for our ICML workshop "Theoretical Physics for Deep Learning" on Friday are now updated. We are looking forward to the discussions and hope you find the workshop fruitful!
Researchers may have caught a glimpse inside the black box of artificial neural networks by establishing their mathematical equivalence with older algorithms called kernel machines.
@Anilananth reports:
The Nobel prize for @giorgioparisi seems to be a perfect excuse to announce the summer school we organize in July 2022 in Les Houches, where statistical physics applied to understanding machine learning will be at the centre of attention.
"Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent." This seems to hold up surprisingly well for more complex, non-vanilla models studied so far. The dynamics is exact in the infinite width limit (in a certain regime) ...
Listen to a diverse line-up of speakers and panelists at the "Conceptual Understanding of Deep Learning" workshop (Google-organized) this Mon May 17: @rinapy
Will be live here:
#ConceptualDLWorkshop
Large Language Models and Transformers workshop this week!
Registration for in-person attendance is now closed, but you can still join us for the livestream:
Workshop schedule:
I highly recommend this curriculum for learning about recent work in mean-field theory / random matrix theory in DL. @vinayramasesh, @rileyfedmunds, and Piyush Patil did an exceptional job dissecting these papers and creating a guide that is both pedagogical and thorough.
We are excited to debut Resurrecting the Sigmoid! It’s our 8th curriculum and was part of the DFL Jane Street Fellowship, with fellows @vinayramasesh, @rileyfedmunds, and Piyush Patil. Check it out → .
(1/2) Looking forward to two exciting workshops tomorrow! I'll be on a panel at the Science Meets Engineering of DL workshop in the morning as well as speaking at the ML and Physical Sciences workshop in the afternoon ...
Welcome to the Deep Learning from the Perspective of Physics and Neuroscience (#deeplearning23) program at #KITP!
Nov 13, 2023 - Dec 22, 2023
Find more information at
Watch recorded talks at
Much appreciation to @zdeborova & @KrzakalaF for a brilliantly organized school! Many stimulating discussions during my time there & I enjoyed the other lecturers' stellar, insightful lectures. It was also personally rewarding & gratifying to participate as a lecturer & teach. 1/2
Our ICML workshop "Overparameterization: Pitfalls & Opportunities" will take place this Saturday. The updated schedule and accepted papers can be found at
@QuanquanGu @HanieSedghi @yasamanbb
The kind of sight that would be familiar from an experimental physics lab. IBM brought a prototype of their quantum computer to NIPS. (Most of what you see here is the cryogenics/dilution fridge.)
(1/3) Couldn't agree more. I get a lot more satisfaction from textbooks and have been on the lookout for great ones in new areas I want to learn since graduating.
A couple favorites from early coursework:
Intro Classical Mechanics -- Kleppner & Kolenkow
Discovering (paradoxically late in life) that I get more out of textbooks than books and that I don’t have to stop buying them just because I’m out of school. Good reading list pointers:
A spatially non-uniform kernel allows more modes of signal propagation in deep networks. Based on this, we suggest an initialization scheme which allows us to train plain CNNs with up to 10,000 layers. (3/3)
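Roughly, the scheme places all of each kernel's weight at its spatial center and makes the channel-mixing matrix there orthogonal. Below is a minimal numpy sketch of that kind of initializer; the function name, the square-channel simplification, and the defaults are mine for illustration, not the paper's code.

```python
import numpy as np

def delta_orthogonal_kernel(ksize, channels, gain=1.0, rng=None):
    """Conv kernel that is zero everywhere except its spatial center,
    which holds a (scaled) random orthogonal channel-mixing matrix."""
    rng = np.random.default_rng() if rng is None else rng
    a = rng.standard_normal((channels, channels))
    q, r = np.linalg.qr(a)            # QR of a Gaussian matrix gives an orthogonal Q
    q *= np.sign(np.diag(r))          # sign fix so Q is uniformly (Haar) distributed
    w = np.zeros((ksize, ksize, channels, channels))  # (height, width, in, out) layout
    w[ksize // 2, ksize // 2] = gain * q
    return w

w = delta_orthogonal_kernel(ksize=3, channels=64)
print(w.shape, np.allclose(w[1, 1].T @ w[1, 1], np.eye(64)))  # center slice is orthogonal
```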
An excellent point, and one of a number of reasons why I think it is both natural and valuable to have a part of machine learning theory be theory "in the style of theoretical physics." (Crucially, I don't mean restricted to or artificially connected to existing phenomena .....
Should we be trying to prove anything at all? While it seems much ML theory comes via physicists, *the physicist in the Deep Phenomena audience* characterizes physics as a "purely empirical" discipline and questions whether we should try to prove anything in ML.
#icml2019
Hear @yasamanbb talk next Wednesday at 12pm EDT about the effect of large learning rate on the training of wide neural networks! What is beyond the neural tangent kernel? Only at Physics ∩ ML!
Sign up for the mailing list here!
@shoyer Not an "intuitive" (first) explanation, but there's a way to cast it using Renormalization Group that I find pretty cool. (Roughly) Consider the N random variables as leaves of a tree and consider the iterative coarse-graining process of transforming pairs of leaves ....
I'm really looking forward to digging through this. I've always enjoyed the writing by these authors for the clarity of thought and focus on the important stuff.
For those who haven't seen:
The "test of time" award talk by Ali Rahimi was so refreshing. As a relative newcomer to ML, the alchemy has only added to my culture shock. Thank you to the awardees for giving that talk.
A kind thanks to the organizers for the opportunity to speak at DeepMath last week. It was a lot of fun to interact with and communicate to a new audience.
DeepMath 2019 was a great success. Many speakers told us that they enjoyed it and loved the fact that it brought together different theoretical approaches from physics, math, neuroscience, engineering, and computer science. DeepMath 2020 is already being planned. Stay tuned
@deepmath1
@shoyer (e.g. X1, X2) into random variables that are the sum (X1 + X2) and difference (X1 - X2), and marginalize out the difference. This generates a recursion relation / flow amongst distributions for the RV which is the sum, whose (~unique) fixed point is the Gaussian distribution.
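If a numerical check helps, here is a toy version of this flow (my own code; I've added the standard 1/√2 normalization of the sum, which the rough description above glosses over, so that the variance stays fixed):

```python
import numpy as np

rng = np.random.default_rng(0)

# Start far from Gaussian: 2**k i.i.d. uniform variables per draw (excess kurtosis -1.2).
k, n_draws = 6, 100_000
x = rng.uniform(-1.0, 1.0, size=(n_draws, 2**k))

# Iteratively coarse-grain: replace each pair (X1, X2) by the (normalized) sum
# and marginalize out the difference by simply dropping it.
for step in range(k):
    x = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2.0)
    kurt = np.mean(x**4) / np.mean(x**2) ** 2 - 3.0
    print(f"step {step + 1}: excess kurtosis = {kurt:+.4f}")
# The excess kurtosis flows toward 0, i.e. toward the Gaussian fixed point of the recursion.
```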
extending a line of earlier work using mean field theory. The spatial distribution of convolutional kernels appears to play an important role: CNNs initialized with spatially uniform conv kernels perform like fully-connected networks at large depths. ... (2/3)
Attending #NIPS2017 (we have 2 papers in Sat workshops) and would love to chat research -- ML and physics -- with other researchers; if so, please message me!
This paper looks at the early learning period in neural networks with some interesting experiments. They examine final test performance when corrupted training data is used in the early stages of learning (until epoch T) and ...
(2/3) Principles of Quantum Mechanics -- R. Shankar
Statistical Mechanics/Field Theory: books by Frederick Reif; Mehran Kardar (2x)
Principles of Math. Analysis: Rudin
Algebra: Michael Artin
Modern Condensed Matter Theory: A. Altland & B. Simons
while corruption in high-level statistics does not as much. (2) Using Gaussian noise as the input leads to a critical period but doesn't do as much damage as blurring! (3) Deeper networks experience greater damage. (See paper for comments re learning rate in experiments.)
the max LR for this.) We find that the best generalizing networks are often obtained when the LR is chosen to lie in the catapult phase. So, we think the existence & properties of the phases are consequential for generalization & LR tuning.
A historical look at the evolution of a subfield: . The naming "solid-state" brought in industrial physicists and applied questions; later "condensed matter" unified the field and emphasized the fundamentality of its scope. "Physics is what physicists decide it is..."
I think there is also a discussion to be had about the ways in which machine learning science might qualify as both a natural science and a design science, and how the latter would make such theory different from traditional theoretical physics.
An interesting thread. This was intended for another discipline (economics) but got me to reflect on how this manifests in other fields. In deep learning (specifically), it seems there is work on both types ("a model of X," by which I mean a single model which has widespread
A critical error that I see many grad students make: they try to estimate Frankenstein's model. Rather than viewing a model as answering a research question, they view a model as an arbitrary hodgepodge of models they learned about in their classes.
Among other things, the solvable models illustrate the dynamical mechanism underlying the catapult phase &, we hope, point towards what a more complete theory of NN dynamics would look like.
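If you'd like to see the mechanism without opening the paper, here's a rough numerical toy in the spirit of those solvable models: a two-layer linear net trained by GD on a single example. The width, data, and learning rates below are my own illustrative choices, not the paper's exact setup.

```python
import numpy as np

def run(lr, n=1000, steps=40, seed=0):
    """GD on a 2-layer linear net f = (v @ u) * x / sqrt(n), trained on one example (x, y)."""
    rng = np.random.default_rng(seed)
    x, y = 1.0, 1.0
    u, v = rng.standard_normal(n), rng.standard_normal(n)
    for t in range(steps):
        f = (v @ u) * x / np.sqrt(n)
        err = f - y
        kernel = x**2 * (u @ u + v @ v) / n   # empirical tangent kernel for this example
        if t % 5 == 0:
            print(f"lr={lr:.2f} step={t:2d} loss={0.5 * err**2:12.3f} kernel={kernel:.3f}")
        # simultaneous GD update of both layers
        u, v = u - lr * err * x * v / np.sqrt(n), v - lr * err * x * u / np.sqrt(n)

run(lr=0.5)  # small LR: loss decreases monotonically, kernel barely moves ("lazy")
run(lr=1.5)  # large LR: loss grows at first, then the kernel drops and the loss comes down ("catapult")
```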
(3/3) ...and *many* others. Better to pick one subject and learn it well than dabble in too many. Great textbooks teach you the things that you remember and build on for years.
1/2 How can physics and ML inform each other? We hope to find out at the Physics ∩ ML workshop @MSFTResearch, commencing tomorrow!
Feat. awesome folks like Fields medalist Mike Freedman, Rumelhart prize winner Paul Smolensky, Sackler prize winner Mike Douglas
@michael_nielsen One of his tour de force applications, imo, is when he uses it to solve the Kondo problem. (I believe it's here, though I don't have access!)
Indeed a unique set of lectures that was very thoughtfully curated by the organizers in and between disciplines. It's the most inter- & multi-disciplinary program I've been a part of, a challenge to accomplish well. Highly recommend!
This work is with Aitor Lewkowycz, @ethansdyer, @jaschasd, and @guygr. We predict these two types of dynamics (distinguished by choosing LR below or above a specific threshold) theoretically in a class of NNs with solvable dynamics. Empirically, the two phases are ...
We consider a classic framework used by theorists to study quantum systems & quantum materials — Hartree-Fock mean field theory — and we ask whether LLMs can work through the steps of such calculations for real research problems. Crucially, we draw these calculations from actual
@BlackHC We pointed this out in , see Fig. 1 plots on the diagonal where all curves have ~the same exponent. (Apologies that this involves advertising one's own paper!) Most large-scale vision / NLP models tend to not be in this regime where you get ...
leads to flatter minima because the curvature decreases by a large amount. The distinction between the two regimes -- lazy and catapult -- becomes sharper as the networks get wider. (At yet larger LRs beyond the catapult phase, GD diverges. Our theoretical model predicts ...
observable across many settings varying architecture and training protocols -- the phenomena here are rather universal. Our notions of "small", "large" (and "divergent") LR regimes are all based on a simple measurement at initialization (!). (cont)
The small LR phase (the "lazy phase") is related to existing wide network theory (the Neural Tangent Kernel result). The large LR phase needs a different starting point for its theoretical description. We termed this phase ...
the "catapult phase" because of the nature of the dynamics that occurs. Signatures of the catapult phase include a growth in the loss early in training, before decreasing again, paired with a simultaneous decrease in local curvature. Ultimately, training in the catapult phase ...
@skornblith I agree. I think perhaps a more natural way to get distributions over representations, rather than deterministic mappings, is to instead consider ensembles of networks, where for instance the ensemble is over the random initialization.
In the latter case, looking at deep linear networks disentangles the effect of expressivity from potential acceleration. Key observation: the dynamics induced on the end-to-end map of the whole network is gradient descent with a particular preconditioner.
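For concreteness, the statement I have in mind (writing it from memory, and under the usual "balanced initialization" assumption on the factors): for a depth-N linear net with end-to-end matrix $W_e = W_N \cdots W_1$, gradient flow on the individual factors induces on $W_e$ the preconditioned flow

```latex
\dot{W}_e \;=\; -\sum_{j=1}^{N}
\left[ W_e W_e^{\top} \right]^{\frac{j-1}{N}}
\, \nabla_{W_e} L(W_e) \,
\left[ W_e^{\top} W_e \right]^{\frac{N-j}{N}} ,
```

i.e. gradient descent on the end-to-end map, preconditioned by fractional powers of $W_e W_e^{\top}$ and $W_e^{\top} W_e$ (at $N = 1$ this reduces to plain gradient flow).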
This was a great collaboration between Google DeepMind, Google Research, Harvard Physics/SEAS, & Cornell Physics (in particular noting first author Haining Pan, Michael Brenner, & Eun-Ah Kim @eunahkim).
The small scale atmosphere made it possible to probe and engage with the material and speakers more deeply than I've experienced in any pure ML program before. (Only in physics programs prior.) Thanks to Umesh Vazirani for creating & facilitating this workshop atmosphere.
I interpret "slow science" to mean science which relies on some good thoughts (can be, but not necessarily, slow). [Let's say "good" involves some hard thinking or a decent amount of sweating.] "It deserves revival and needs protection."
training on uncorrupted data is resumed after time T. Depending on the nature of data corruption and how long it lasts, this can lead to "irreversible damage" in the final performance. Some findings: (1) corruption in low-level statistics does lead to these "critical" periods ...
Among interesting findings I’d highlight two in particular. (i) We can categorize calculation steps by whether the result appears in the paper explicitly (since intermediate steps are often missing). We find the performance is fairly insensitive to this
@KordingLab I recall this paper has some nice analysis on the origins of forgetting and the relationship between task overlap and overwriting of learned parameters in linear models. Re your question though: why should there be no forgetting in the NTK limit? ...
and so it's interesting to pinpoint the extent of applicability at finite width. This paper is related to the nice work "Neural Tangent Kernel," , which solved the dynamics of gradient descent in the same limit in function space. We inquire about the
@Reza_Zadeh They’re also great for putting human behavior in perspective :) From many years of watching primate documentaries, my by-far favorite way of archetyping human behavior is to find the analogy in primates. "Oh, you're *that* chimp...!"
research papers, on materials that are currently being intensely studied. This extraction process (from research papers to executable prompts) is itself non-trivial — requiring combined human-AI expertise through the design of templates — since many papers lack key parts of
dynamics in parameter space, which can be obtained through a linearization of the function at initialization. With @hoonkp, @Locchiu, @sschoenholz, @jaschasd, and Jeffrey Pennington.
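For reference, the linearization meant here is just the first-order Taylor expansion of the network function in its parameters around initialization (my shorthand below; see the paper for the precise statements and conditions):

```latex
f_{\mathrm{lin}}(x;\theta) \;=\; f(x;\theta_0) \;+\; \nabla_\theta f(x;\theta_0)^{\top} (\theta - \theta_0) ,
```

and under gradient descent on this linear model with squared loss, the function-space dynamics are governed by the empirical tangent kernel at initialization, $\hat{\Theta}_0(x, x') = \nabla_\theta f(x;\theta_0)^{\top} \nabla_\theta f(x';\theta_0)$; the claim is that sufficiently wide networks track these dynamics.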