Greg Yang Profile
Greg Yang

@TheGregYang

58,024 Followers · 686 Following · 334 Media · 1,431 Statuses

Cofounder. Morgan Prize Honorable Mention 2018. Developing the theory of #TensorPrograms and the practice of scaling #neuralnetworks.

Joined February 2019
Pinned Tweet
@TheGregYang
Greg Yang
1 year
Finally launched ! The mathematics of deep learning is profound, beautiful, and unreasonably effective. Developing the "theory of everything" for large neural networks will be central to taking AI to the next level. Conversely, this AI will enable everyone
441
903
7K
@TheGregYang
Greg Yang
1 year
Since folks are asking: The books I mentioned on @xai spaces are "Linear Algebra Done Right" by Axler and "Naive Set Theory" by Halmos. Other math books that I really enjoyed over the years: "Introduction to Algorithms" by Thomas H. Cormen & Charles E. Leiserson & Ronald L.
257
871
6K
@TheGregYang
Greg Yang
11 months
Grok LFG🚀🚀🚀 Last few weeks been some of the best time of my life, fr fr When a small, motivated group of world class people all push in the same direction, they punch way above their weight. I really did not appreciate this enough a year ago, but now
194
331
4K
@TheGregYang
Greg Yang
11 months
Ball so hard...
Tweet media one
223
180
3K
@TheGregYang
Greg Yang
1 year
You asked for it...a dump of my book collection, in rough chronological order (1/2) "Naive Set Theory" - Paul R Halmos "Linear Algebra Done Right Second Edition" - Sheldon Axler "Mixing Secrets for the Small Studio" - Mike Senior "Introduction to Algorithms, Third Edition" -
213
443
4K
@TheGregYang
Greg Yang
1 year
Nontrivial ∞width neural nets are either kernel machines or feature learners. Latter's scaling makes optimal hyperparams invariant to width What if depth→∞as well? 🆕 Feature diversity is key; maxed out by abs (not relu); gives invariance to depth! But GPT flawed 🧵
161
337
2K
@TheGregYang
Greg Yang
5 days
Tweet media one
46
159
2K
@TheGregYang
Greg Yang
10 months
Tweet media one
15
24
2K
@TheGregYang
Greg Yang
10 months
Tweet media one
138
59
1K
@TheGregYang
Greg Yang
11 months
Hands down best chat UI I've ever used @TobyPhln supaman'd this shii
@TobyPhln
Toby Pohlen
11 months
These are some of the UI features in Grok. First, it allows you to multi-task. You can run several concurrent conversations and switch between them as they progress.
143
349
2K
40
159
757
@TheGregYang
Greg Yang
3 years
1/ You can't train GPT-3 on a single GPU, much less tune its hyperparameters (HPs). But what if I tell you… …you *can* tune its HPs on a single GPU thanks to new theoretical advances? paper code blog
Tweet media one
19
281
2K
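For readers wondering what "tune on a small model, transfer to a big one" can look like in code, here is a minimal PyTorch-style sketch of μP-flavored width scaling. It is not the thread's code or the official mup package; the per-layer rules used below (init variance ∝ 1/fan_in, Adam learning rate ∝ 1/fan_in for weight matrices) are a hedged paraphrase of the published recipe.

```python
# Minimal muP-flavored sketch (not the official `mup` package): reuse the same
# base hyperparameters across widths by rescaling per-layer init and Adam LR.
import torch
import torch.nn as nn

def make_mlp(width, d_in=32, d_out=10):
    return nn.Sequential(nn.Linear(d_in, width), nn.ReLU(),
                         nn.Linear(width, width), nn.ReLU(),
                         nn.Linear(width, d_out))

def mup_param_groups(model, base_lr=1e-3):
    groups = []
    for _, p in model.named_parameters():
        if p.ndim == 2:                              # weight matrices
            fan_in = p.shape[1]
            nn.init.normal_(p, std=fan_in ** -0.5)   # init variance ~ 1/fan_in
            lr = base_lr / fan_in                    # Adam LR ~ 1/fan_in
        else:                                        # biases: width-independent
            nn.init.zeros_(p)
            lr = base_lr
        groups.append({"params": [p], "lr": lr})
    return groups

# The same base_lr is reused at every width; only the per-layer rescaling changes.
for width in (128, 1024):
    model = make_mlp(width)
    opt = torch.optim.Adam(mup_param_groups(model, base_lr=1e-3))
    x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
    nn.functional.cross_entropy(model(x), y).backward()
    opt.step()
```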
@TheGregYang
Greg Yang
7 months
Grok belongs to the people
176
87
1K
@TheGregYang
Greg Yang
4 years
1/ The histogram of eigenvals in a large random symmetric matrix ≈ a semicircle!! So sick! This "Semicircle Law" is essentially "Central Limit" for rand symmetric mats (even more elegant bc u knew what a semicircle is by 1st grade, but wtf was a Gaussian?). Let me tell ya why
17
236
1K
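To see the semicircle claim for yourself, here is a small numpy illustration (my own, not from the thread), assuming entries are iid N(0, 1/n): the eigenvalue histogram of the symmetrized matrix approaches the density √(4 − x²)/(2π) on [−2, 2].

```python
# Wigner semicircle illustration: eigenvalues of a large random symmetric
# matrix with entry variance 1/n vs. the density sqrt(4 - x^2) / (2*pi).
import numpy as np

n = 2000
rng = np.random.default_rng(0)
A = rng.normal(scale=n ** -0.5, size=(n, n))
S = (A + A.T) / np.sqrt(2)                 # symmetric, off-diagonal variance still 1/n
eigs = np.linalg.eigvalsh(S)

hist, edges = np.histogram(eigs, bins=50, range=(-2, 2), density=True)
centers = (edges[:-1] + edges[1:]) / 2
semicircle = np.sqrt(4 - centers ** 2) / (2 * np.pi)
print("max deviation from semicircle:", np.abs(hist - semicircle).max())
```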
@TheGregYang
Greg Yang
27 days
Hunnids, hunnids Throwin' hunnids, hunnids Hunnids, hunnids Rack city bitch Rack rack city bitch
@elonmusk
Elon Musk
28 days
This weekend, the @xAI team brought our Colossus 100k H100 training cluster online. From start to finish, it was done in 122 days. Colossus is the most powerful AI training system in the world. Moreover, it will double in size to 200k (50k H200s) in a few months. Excellent
5K
8K
75K
83
81
1K
@TheGregYang
Greg Yang
1 year
So my trick for reading and grokking all the foundational textbooks intently is... Anki flash cards ...ie spaced repetition. Works really well for knowledge you know you will need in the future
77
80
1K
@TheGregYang
Greg Yang
4 days
I took a leave from college (aside from DJing) just to crawl libgen, read textbooks cover to cover, and make anki flash cards to retain that knowledge. Absolutely one of the best periods of my life because you can feel the rapid self improvement. Taking classes in school in
@TheGregYang
Greg Yang
1 year
You asked for it...a dump of my book collection, in rough chronological order (1/2) "Naive Set Theory" - Paul R Halmos "Linear Algebra Done Right Second Edition" - Sheldon Axler "Mixing Secrets for the Small Studio" - Mike Senior "Introduction to Algorithms, Third Edition" -
213
443
4K
110
144
3K
@TheGregYang
Greg Yang
11 months
1/ μP is optimal scaling rule of learning rate & init as network width → ∞. Been confused? 🆕μP = holding the "natural" (I'll explain) operator norm constant for every weight W & its updates ΔW: μP <=> ‖W‖_nat = Θ(1) = ‖ΔW‖_nat. 🆕Frobenius norm is the wrong norm to measure!
Tweet media one
79
163
848
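As a reading aid, here is one way to write the norm condition above in symbols. The definition of the "natural" norm as a rescaled spectral (operator) norm is my assumption about what the tweet means, following the spectral-scaling picture; constants and conventions may differ from the thread.

```latex
% Assumed reading of the "natural" norm: a spectral norm rescaled by layer shape.
\|W\|_{\mathrm{nat}} := \sqrt{\tfrac{\text{fan-in}}{\text{fan-out}}}\;\|W\|_{\mathrm{op}},
\qquad
\mu\mathrm{P} \iff \|W\|_{\mathrm{nat}} = \Theta(1) \text{ and } \|\Delta W\|_{\mathrm{nat}} = \Theta(1)
\text{ for every weight } W.
```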
@TheGregYang
Greg Yang
5 days
> be me debugging > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "fuck" > "oooooh"
@jacob_pfau
Jacob Pfau
5 months
Do models need to reason in words to benefit from chain-of-thought tokens? In our experiments, the answer is no! Models can perform on par with CoT using repeated '...' filler tokens. This raises alignment concerns: Using filler, LMs can do hidden reasoning not visible in CoT🧵
Tweet media one
55
222
1K
58
81
1K
@TheGregYang
Greg Yang
5 years
1/ Why do wide, random neural networks form Gaussian processes, *regardless of architecture*? Let me give an overview in case you are too lazy to check out the paper or the code . The proof has two parts…
Tweet media one
10
259
1K
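A quick finite-width illustration of the claim (my own sketch, not the paper's code): fix an input, redraw the weights of a wide one-hidden-layer ReLU net many times, and the output across draws is approximately Gaussian.

```python
# Finite-width shadow of the NN-GP limit: output of a random wide MLP at a
# fixed input, across weight draws, has near-zero skew and excess kurtosis.
import numpy as np

rng = np.random.default_rng(0)
width, trials = 4096, 2000
x = rng.normal(size=8)

def random_mlp_output(x):
    W1 = rng.normal(scale=x.size ** -0.5, size=(width, x.size))
    W2 = rng.normal(scale=width ** -0.5, size=(1, width))
    return float(W2 @ np.maximum(W1 @ x, 0.0))     # one ReLU hidden layer

outs = np.array([random_mlp_output(x) for _ in range(trials)])
z = (outs - outs.mean()) / outs.std()
print("skew:", (z ** 3).mean(), "excess kurtosis:", (z ** 4).mean() - 3)
```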
@TheGregYang
Greg Yang
10 months
Nah u gotta admit @X bussin
62
46
855
@TheGregYang
Greg Yang
11 months
Got me feelin' like deja vu
30
72
689
@TheGregYang
Greg Yang
11 days
a single model should have real world + real time understanding, universal search, and user understanding all at the same time
59
59
989
@TheGregYang
Greg Yang
2 months
our lil susboi all grown up n shit
69
72
898
@TheGregYang
Greg Yang
4 years
Training a neural network (NN) can suffer from bad local minima. But as the NN gets wider, its optimization landscape in *function space* converges & becomes convex; when width=∞, this convex landscape is described by Neural Tangent Kernel.
8
160
913
@TheGregYang
Greg Yang
10 months
Yessiiiiir
@elonmusk
Elon Musk
10 months
@WholeMarsBlog xAI is burning the 4am oil LFG!!!
262
260
3K
31
23
882
@TheGregYang
Greg Yang
1 year
I have 400+ books in my collection, from which these come. I can dump them here if people are really interested.
139
22
862
@TheGregYang
Greg Yang
11 months
So now @sama really has to use @MicrosoftTeams
28
31
768
@TheGregYang
Greg Yang
11 months
Announcing the formation of @G (reg)PT, a new company to create the best AG(reg)I, by me @gdb and @greg16676935420
45
26
770
@TheGregYang
Greg Yang
4 months
calling all hackers with ambition burning in your heart: hop on the fkn rocketship 🚀
@xai
xAI
4 months
xAI is pleased to announce..
1K
2K
10K
89
71
630
@TheGregYang
Greg Yang
10 months
Human knowledge is at most a negligible fraction of what's out there. Sooo much left to learn and discover. What a beautiful world 💕💐
47
48
623
@TheGregYang
Greg Yang
6 months
Looking for top engineers and designers passionate about harnessing our AI capabilities to create never-before-seen consumer products. 🛼 come roll w us!
@xai
xAI
6 months
👀
731
1K
7K
21
117
484
@TheGregYang
Greg Yang
11 months
Summa yall b straight cappin
51
39
718
@TheGregYang
Greg Yang
10 months
TIL @AIatMeta is just an @OpenAI wrapper
Tweet media one
73
33
609
@TheGregYang
Greg Yang
7 months
💀💀💀
@grok
Grok
7 months
@elonmusk @xai ░W░E░I░G░H░T░S░I░N░B░I░O░
2K
2K
16K
41
31
593
@TheGregYang
Greg Yang
3 years
Serious mathematics underlies the feature learning limit of wide neural networks, which made it possible to tune large models by tuning small ones. I'll be explaining this on Wednesday on . Sign up here!
8
103
661
@TheGregYang
Greg Yang
6 months
You seeing this sh—
@xai
xAI
6 months
👀
731
1K
7K
36
41
504
@TheGregYang
Greg Yang
1 year
(2/2) "Additive Combinatorics (Cambridge Studies in Advanced Mathematics)" - Terence Tao "Lie Groups: An Approach Through Invariants and Representations (Universitext)" - Claudio Procesi "Algebraic Geometry in Coding Theory and Cryptography" - HARALD NIEDERREITER & CHAOPING XING
54
79
694
@TheGregYang
Greg Yang
6 months
Unhinging in progress
@xai
xAI
6 months
772
1K
7K
32
45
464
@TheGregYang
Greg Yang
4 years
1/ Crazy exp: take Resnet embedding of Imagenet as dataset A. Train linear predictor on A; get accuracy p. Now make fake dataset B = a mixture of Gaussians w/ same class mean & covariance as A. Train linear predictor on B => get *SAME ACCURACY* p. WTF
Tweet media one
8
119
576
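Here is a sketch of that experimental protocol on stand-in features (an assumption for brevity; actual ResNet embeddings of ImageNet are not used, so this shows the procedure rather than reproducing the reported accuracy match).

```python
# Protocol sketch: compare a linear classifier on dataset A vs. on a synthetic
# dataset B built from per-class Gaussians with A's class means and covariances.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=4000, n_features=64, n_informative=32,
                           n_classes=4, random_state=0)   # stand-in for dataset A

def linear_accuracy(X, y):
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)
    return LogisticRegression(max_iter=2000).fit(Xtr, ytr).score(Xte, yte)

Xb, yb = [], []
for c in np.unique(y):
    Xc = X[y == c]
    Xb.append(rng.multivariate_normal(Xc.mean(0), np.cov(Xc, rowvar=False),
                                      size=len(Xc)))
    yb.append(np.full(len(Xc), c))
Xb, yb = np.vstack(Xb), np.concatenate(yb)

print("linear accuracy on A:", linear_accuracy(X, y))
print("linear accuracy on B:", linear_accuracy(Xb, yb))
```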
@TheGregYang
Greg Yang
4 years
So essentially ML researcher=rapper🤣: Paper=single Book=album Arxiv=SoundCloud Elsevier=Spotify Blogpost=music video universities/industrial labs=labels Conference=music festival Plenary speaker=headliner ICML=coachella Neurips=lollapalooza ICLR=rolling loud ... What else?🤣🤣
25
43
533
@TheGregYang
Greg Yang
4 years
1/ You can't comb a ball of hair without having some hair sticking up -- this is known as the "Hairy Ball Theorem" (no joke). Since wind on earth is like (the projection of) hair on a ball, this theorem implies that there is always a place with no wind! Let me tell ya why ↓
5
133
517
@TheGregYang
Greg Yang
2 months
our lil susboi 💜
@xai
xAI
2 months
2K
2K
9K
21
43
492
@TheGregYang
Greg Yang
2 years
I'm looking for a phd intern that will work with me on the theory of infinite size neural networks beyond width and applications to hyperparameter transfer and design of large scale neural networks. Email me at gregyang at Microsoft dot com with your CV and a blurb about yourself
21
79
488
@TheGregYang
Greg Yang
4 days
My mom definitely thought I was retarded lol to skip school to school myself
@TheGregYang
Greg Yang
4 days
I took a leave from college (aside from DJing) just to crawl libgen, read textbooks cover to cover, and make anki flash cards to retain that knowledge. Absolutely one of the best periods of my life because you can feel the rapid self improvement. Taking classes in school in
110
144
3K
41
27
925
@TheGregYang
Greg Yang
10 months
📟 W△KE △P F1LTH¥
56
25
282
@TheGregYang
Greg Yang
1 year
Our world is built on hacks --- and we should celebrate it. This is me speaking as a theorist.
25
31
460
@TheGregYang
Greg Yang
22 days
@emollick A single intelligence that has a real time pulse on what is happening and what could happen in the future. Many ways to sell this but it easily is a massive use case for biz. E.g. Bloomberg already resells X data at a big price tag and Grok will be this but on an entirely
17
16
464
@TheGregYang
Greg Yang
5 years
Neural networks tend to Gaussian processes (GPs) as their widths tend to infinity --- now you can play with these GP kernels in @GoogleColab ! Try out RNN-GP, GRU-GP, Transformer-GP, or Batchnorm-GP today! Repo: Colab Entry Point:
0
97
451
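If memory serves, the library referenced here exposes a stax-style API along the lines below; treat the exact module names and call signatures as assumptions and check the repo/Colab before relying on them.

```python
# Rough usage sketch of an NNGP/NTK kernel library with a stax-style API
# (signatures recalled from memory; verify against the actual repo/Colab).
import numpy as np
from neural_tangents import stax

init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(1),
)

x1 = np.random.randn(10, 16)
x2 = np.random.randn(5, 16)
# Closed-form infinite-width kernels: NNGP (at init) and NTK (for training).
kernels = kernel_fn(x1, x2, ("nngp", "ntk"))
print(kernels.nngp.shape, kernels.ntk.shape)    # (10, 5) each
```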
@TheGregYang
Greg Yang
9 months
It's been an honor working with the elite troopers at @X !
@rpoo
Ross
9 months
this team ships if you love the product, understand the vision, and want the challenge of life time- should really consider working at X some high impact positions * client eng - ios / android / web * infra eng - k8s / supercomputing / large distributed systems / network /
1K
73
694
70
10
290
@TheGregYang
Greg Yang
1 month
uv is very very fast. Super nice for when I ssh into a pod and just want to quickly install all dependencies and run some python script good job @charliermarsh et al
18
52
404
@TheGregYang
Greg Yang
4 years
1/ Existing theories of neural networks (NN) like NTK don't learn features so can't explain success of pretraining (e.g. BERT, GPT3). We derive the *feature learning* ∞-width limit of NNs & pretrained such an ∞-width word2vec model: it learned semantics!
Tweet media one
4
57
384
@TheGregYang
Greg Yang
5 years
1/ I can't teach you how to dougie but I can teach you how to compute the Gaussian Process corresponding to infinite-width neural network of ANY architecture, feedforward or recurrent, eg: resnet, GRU, transformers, etc ... RT plz💪
Tweet media one
4
108
373
@TheGregYang
Greg Yang
10 months
162
52
153
@TheGregYang
Greg Yang
6 months
Talagrand has an aptitude for distilling complex insights into clear and tasty bites in his research and exposition. I've benefited immensely from applying his most famous inequality but even more so from his books, which are written with wit and lucidity matched by none: Upper
@QuantaMagazine
Quanta Magazine
6 months
Michel Talagrand has been awarded the Abel Prize, one of the highest honors in mathematics, for applying tools from high-dimensional geometry to complex probability problems. @jordanacep reports:
Tweet media one
27
396
2K
32
29
215
@TheGregYang
Greg Yang
5 days
this tweet is so goated you have no idea
@ekzhang1
Eric Zhang
6 months
update on the 0.02% bug: it was because nginx, by default drops connections every 10,000 HTTP requests. why can't computers just work!!
23
8
641
11
7
377
@TheGregYang
Greg Yang
11 months
FE!N
24
21
322
@TheGregYang
Greg Yang
11 months
This is THE Linear Algebra book! Open access! Oh my 🤘🤘🎸🚀
@AxlerLinear
Sheldon Axler
11 months
I am delighted to announce publication of the 4th edition of Linear Algebra Done Right as an Open Access book. The electronic version is legally free to the world at . That website also has links to pre-order the print version of the book. #linearalgebra
Tweet media one
126
1K
6K
9
51
325
@TheGregYang
Greg Yang
5 years
1/ Does batchnorm make optimization landscape more smooth? says yes, but our new @iclr2019 paper shows BN causes grad explosion in randomly initialized deep BN net. Contradiction? We clarify below
Tweet media one
1
79
316
@TheGregYang
Greg Yang
1 year
1/ How to scale hyperparams (eg learning rate) as neural network gets wider? Esp w/ adaptive optimizers like Adam? I derived the answer (μP) in 2020 & verified it on GPT3 This required some beautiful new math that’s just been completely written down w/ @EtaiLittwin 🧵👇
Tweet media one
11
62
304
@TheGregYang
Greg Yang
4 years
We deepen a mysterious connection btw #topology & #learning in new paper appearing in Advances in Applied Mathematics Somehow # of samples needed to learn = the highest dimension of holes in some topological space! I wrote the paper but I'm still like WTF
Tweet media one
6
50
309
@TheGregYang
Greg Yang
4 years
1/ Gradients improve weights, so they better depend on the weights, right? Somehow, for calculating e.g. grad norm or NTK at init, grads might as well be backproped by random weights, independent from those used in forward pass. WTF? Let me explain (from )
Tweet media one
5
59
304
@TheGregYang
Greg Yang
9 months
the secret prime benefit they don't want you to know
Tweet media one
71
12
169
@TheGregYang
Greg Yang
1 year
Really cool library giving a type system to different kinds of matrices to speed up linear algebra! I really resonate with this because a key idea of Tensor Programs is that different (random) matrices have entirely different "types" (like random init vs gradients); if you track
@andrewgwils
Andrew Gordon Wilson
1 year
We're ecstatic to officially announce our new library, CoLA! CoLA is a framework for large-scale linear algebra in machine learning and beyond, supporting PyTorch and JAX. repo: paper: w/ amazing @m_finzi , Andres Potap, Geoff Pleiss
Tweet media one
9
152
701
7
31
284
@TheGregYang
Greg Yang
6 years
1/8 Modern deep networks (with conv, (self-)attention, batchnorm, LSTM, etc) become Gaussian Processes when randomly initialized, as their widths grow to infinity. This and more are shown in my new paper . SOTA GPs here we come, @Jasch ?
4
74
283
@TheGregYang
Greg Yang
4 years
1/ The nonzero singular values histogram of a large square random matrix looks like a "quarter circle", sticking to the y-axis. However, if the sides are not equal, then the histogram "buds off" from the y-axis. In any case, we still can calculate the asymptotic shape of it!
1
52
282
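For reference, the quarter circle in question has a clean closed form under the usual normalization; assuming an n×n matrix with iid entries of variance 1/n (my normalization, not stated in the tweet), the limiting singular value density is:

```latex
% Quarter-circle law for singular values s of an n x n matrix with iid entries
% of variance 1/n; the rectangular case is governed by the Marchenko-Pastur law.
\rho(s) = \frac{1}{\pi}\sqrt{4 - s^{2}}, \qquad 0 \le s \le 2 .
```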
@TheGregYang
Greg Yang
10 days
12 hr naps yum
22
19
285
@TheGregYang
Greg Yang
1 year
Will chat about infinitely deep neural networks on Friday. Come hang out and ask questions about our new paper!
@TheGregYang
Greg Yang
1 year
Nontrivial ∞width neural nets are either kernel machines or feature learners. Latter's scaling makes optimal hyperparams invariant to width What if depth→∞as well? 🆕 Feature diversity is key; maxed out by abs (not relu); gives invariance to depth! But GPT flawed 🧵
161
337
2K
11
40
260
@TheGregYang
Greg Yang
6 days
40 hr hacker high
@TheGregYang
Greg Yang
10 days
12 hr naps yum
22
19
285
15
10
287
@TheGregYang
Greg Yang
11 months
instead of QED, ima end my proofs with FE!N from now on
28
16
255
@TheGregYang
Greg Yang
4 years
Tune in next Wednesday at Physics ∩ ML to hear @cosmo_shirley talk about their recent breakthrough work "Discovering Symbolic Models in Physical Systems using Deep Learning" with @MilesCranmer @KyleCranmer @DavidSpergel @PeterWBattaglia et al!
Tweet media one
5
64
261
@TheGregYang
Greg Yang
3 years
You can now train your own feature learning infinite-width neural networks on word2vec and metalearning (w/ MAML) ! Our paper "Feature Learning in Infinite-Width Neural Networks" will also appear in ICML 2021. Cya there! @edwardjhu
Tweet media one
0
48
260
@TheGregYang
Greg Yang
1 year
My cliff notes vers of holography <-> quantum error correction: Holography says information in interior of universe can be recovered from info on its boundary, just like a message can be recovered from its encoding by error correcting code. Actual error correction comes from
Tweet media one
26
32
261
@TheGregYang
Greg Yang
2 years
Messiii!!!!!
Tweet media one
7
2
257
@TheGregYang
Greg Yang
4 years
1/4 WTF guys I think I broke ML: loss & acc 🡅 together! reproduced here . Somehow good accuracy is achieved *in spite of* classic generalizn theory (wrt the loss) - What's goin on? @roydanroy @prfsanjeevarora @ShamKakade6 @BachFrancis @SebastienBubeck
Tweet media one
18
43
256
@TheGregYang
Greg Yang
4 years
1/ It's exciting when an "applied" area feeds back to pure math. e.g. Witten's new proof of Positive Energy Thm by physics won him a Fields Medal. A reason I'm rly hyped about Tensor Programs: new proof of Semicircle Law by "neural network arguments"
2
54
255
@TheGregYang
Greg Yang
10 months
Come throw down some eigenvalues in 30 min!
@TheGregYang
Greg Yang
10 months
With @jxbz & Jamie Simon, we will chat about our recent work on a spectral understanding of feature learning. See ya in a few days!
4
13
127
16
5
68
@TheGregYang
Greg Yang
1 year
GUYS DID YOU KNOW THE RED WEDDING OF #GoT HAPPENED IN FRANCE IN 1572? The Catholic queen mother forced her daughter to marry a protestant prince, invited all the protestant nobilities to the wedding at the predominantly catholic Paris, then bodied them all. Aka St
Tweet media one
20
12
235
@TheGregYang
Greg Yang
5 years
RNNs and batchnorm will be coming soon, but you can already play with them here The general theory for this is based on tensor programs Give Neural Tangents a try and let us know what you think!
@GoogleAI
Google AI
5 years
Announcing Neural Tangents, a new easy-to-use, open-source neural network library that enables researchers to build finite- and infinite-width versions of neural networks simultaneously. Grab the code and try it for yourself at
13
624
2K
1
51
241
@TheGregYang
Greg Yang
1 year
Who needs grad school when you have the whole library genesis at your fingertips?
@ftlsid
ftlsid
1 year
mood
Tweet media one
3
12
233
21
16
231
@TheGregYang
Greg Yang
4 years
1/ In a neural network, activation vectors depend on the weight matrices in really complex, nonlinear ways. New paper : the activations are "independent" from the weights in a randomly initialized wide NN of any architecture! WTF!!
2
38
238
@TheGregYang
Greg Yang
1 year
Our society is built on trust: * human-human trust * human-organization trust * human-machine trust It would run so slowly without trust. It takes years to build but an instant to destroy. We really need to treasure it!
Tweet media one
30
27
230
@TheGregYang
Greg Yang
1 year
What an incredible journey it's been over these 5+ yrs @MSFTResearch . I still remember the eureka moments, in the serenity of building 99 past midnight, leading to Tensor Programs & μP. Forever grateful for MSR taking a chance on a kid straight out of undergrad.
4
4
227
@TheGregYang
Greg Yang
4 years
1/ Neural network (NN) parametrization is super important folks!! The wrong param -- e.g. NTK, or, in fact, the pytorch/tensorflow defaults -- can make you diverge or prevent you from learning features in wide NNs!
Tweet media one
3
42
228
@TheGregYang
Greg Yang
4 years
1/ I reveal the evolution under gradient descent of neural network of *any architecture*, by showing how to compute its tangent kernel (NTK). This includes RNN, transformer, resnet, GANs, Faster RCNN, and more! Let's have theory catch up to practice!
Tweet media one
3
52
226
@TheGregYang
Greg Yang
5 years
1/2 How can physics and ML inform each other? We hope to find out at Physics ∩ ML workshop @MSFTResearch commencing tomorrow! Feat. awesome folks like Fields medalist Mike Freedman, Rumelhart prize winner Paul Smolensky, Sackler prize winner Mike Douglas
Tweet media one
4
60
225
@TheGregYang
Greg Yang
1 year
Pretty informative note on the connection between AdS/CFT (aka gauge/gravity duality) and quantum error correcting codes
Tweet media one
19
24
217
@TheGregYang
Greg Yang
10 months
Homixide homixide homixide
14
9
112
@TheGregYang
Greg Yang
1 year
@skominers I only read electronic books and they are all in calibre.
8
5
245
@TheGregYang
Greg Yang
11 months
🤣🤣🤣
@blue_but_dark
Blue
11 months
@TheGregYang omg I can’t believe what just happened 🤣🤣 I tried it on bing bing deleted the answer it was writing I blame Grok😂😂 @jenny____ai @elonmusk
14
3
157
12
13
206
@TheGregYang
Greg Yang
5 years
1/ Neural networks evolve like linear models just because 1st order taylor expansion -- key intuition behind #NeuralTangentKernels . What's nontrivial & surprising: a *wide* NN can fit *any* data without moving params too much as to break the approximation of the taylor expansion.
Tweet media one
1
46
210
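For concreteness, here is the first-order expansion the tweet alludes to, in standard notation (f is the network output, θ₀ the random initialization):

```latex
% Linearization around initialization and the induced Neural Tangent Kernel.
f(x;\theta) \approx f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^{\top}(\theta - \theta_0),
\qquad
K_{\mathrm{NTK}}(x, x') = \nabla_\theta f(x;\theta_0)^{\top}\,\nabla_\theta f(x';\theta_0).
```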
@TheGregYang
Greg Yang
4 years
Infinitely-wide recurrent networks (i.e. RNN Neural Tangent Kernel) are good at time series prediction with low data, whodvethought! Such calculation with infinite-width RNN wouldn't have been possible without Tensor Programs!
Tweet media one
0
43
205
@TheGregYang
Greg Yang
4 years
1/ A ∞-wide NN of *any architecture* is a Gaussian process (GP) at init. The NN in fact evolves linearly in function space under SGD, so is a GP at *any time* during training. With Tensor Programs, we can calculate this time-evolving GP w/o training any NN
1
29
201
@TheGregYang
Greg Yang
1 year
"Hi Greg, im a 15 years boy in love with AI" - @pablocpz_ai Keep learning, stay motivated, and you'll be one of the greats one day!
Tweet media one
11
4
199
@TheGregYang
Greg Yang
4 years
How do infinitely wide neural networks learn features? Come hear me this Monday at @MIT_CSAIL , hosted by @aleks_madry Hope to see yall there!
Tweet media one
2
29
202
@TheGregYang
Greg Yang
4 days
@ibab This was my pretraining phase
6
2
306
@TheGregYang
Greg Yang
6 years
1/4 Batchnorm causes grad explosion in random-init MLP! Can’t fix this by changing nonlinearities! Relu+batchnorm explodes grad norm^2 by >=1.47 per layer, but linear activation minimizes the explosion rate at (B-2)/(B-3), B=batchsize. Our ICLR 2019 paper
Tweet media one
6
55
190
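Restating the per-layer rates quoted in the tweet in symbols (the expectation and the backward direction of the ratio are my reading; B is the batch size and g_l the gradient at layer l):

```latex
% Per-layer gradient-norm-squared growth under batchnorm, as quoted above.
\text{ReLU + BN: } \frac{\mathbb{E}\|g_{l-1}\|^{2}}{\mathbb{E}\|g_{l}\|^{2}} \ge 1.47,
\qquad
\text{linear + BN: } \frac{\mathbb{E}\|g_{l-1}\|^{2}}{\mathbb{E}\|g_{l}\|^{2}} = \frac{B-2}{B-3}.
```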
@TheGregYang
Greg Yang
5 years
Learnability (VC dim) is a *topological property*, as I proved in for parity, conjunctions, poly threshold fctns. Now this extends to downward-closed classes, conjunction of parities, and k-CNFs, as well! Just how far does this go?
Tweet media one
2
42
194
@TheGregYang
Greg Yang
5 years
Often, VC dimension of a concept class (“how many samples needed to learn a pattern?”) in #learning theory can be recovered from the * #algebraic #topology * of the class (“What are the holes in this topological space?”). Beautiful and mysterious phenomenon!
1
43
190
@TheGregYang
Greg Yang
5 years
1/ Neural networks are Gaussian Processes --- the Poster Edition from #NeurIPS2019 last week. In case you missed it, here’s a twitter version of the poster presentation, following the format of @colinraffel ; and here’s the previous tweet thread
Tweet media one
@TheGregYang
Greg Yang
5 years
1/ Why do wide, random neural networks form Gaussian processes, *regardless of architecture*? Let me give an overview in case you are too lazy to check out the paper or the code . The proof has two parts…
Tweet media one
10
259
1K
1
57
187