milton @tensor_fusion Twitter profile

Pinned Tweet

milton

@tensor_fusion

4 months

gm gm. hope your weekends are going better than mine



milton

@tensor_fusion

2 months

I don't mean to sound rude but "you're probably smart enough" is a lie that +130 IQ tech workers who haven't deeply interacted with a <100 IQ person tell people out of niceness bc they never struggled with not being able to solve an AP calculus problem for more than 3 hours.

anpaure

@anpaure

2 months

he's probably smart enough, this is just learned helplessness


milton

@tensor_fusion

2 months

Russell-Norvig section 2 is all you need.

Stanislav Fort

@stanislavfort

2 months

A great reddit analysis of the AlphaGeometry paper and how it uses very little "modern" AI for the majority of the geometry IMO problems it solves. Super interesting that a repeated application of a heuristic + tree search is so good!


milton

@tensor_fusion

21 days

I don’t usually watch YT videos on AI/ML explainers (sans Karpathy/3blue1brown) but this has good production value. Nice overview of scaling laws (from Kaplan 2020 to more recent results). (Bonus points bc it cites a criminally little-known paper on intrinsic manifold dim.)
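For reference, the headline result in Kaplan et al. (2020) is a power law in non-embedding parameter count N. The constants below are quoted from memory of that paper and should be treated as approximate; the second line is just the arithmetic consequence of the power-law form.

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad \alpha_N \approx 0.076,\quad N_c \approx 8.8 \times 10^{13}

\frac{L(2N)}{L(N)} = 2^{-\alpha_N} \approx 0.949
```

So under this fit each doubling of model size trims loss by only about 5%.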


milton

@tensor_fusion

3 months

“RAND signed a contract with John von Neumann to produce a general theory of war, to be completed during a small slice of his time: that spent shaving. For his shaving thoughts, von Neumann received $200 a month” I first read that story in this book. Seriously, von Neumann was

Asterisk

@asteriskmgzn

3 months

RAND’s halcyon days lasted two decades, during which it produced some of the most influential developments in science and foreign policy. How did it become just another think tank? @PradyuPrasad and @jordanschnyc on when RAND made magic in Santa Monica


milton

@tensor_fusion

3 months

For those who love LLMs ✍️

Physics In History

@PhysInHistory

3 months

For those who love calculus ✍️


milton

@tensor_fusion

3 months

Friendly reminder that Noam Shazeer is extremely cracked and uncommonly obsessive and that’s why we got the transformer.


milton

@tensor_fusion

4 months

twitter this week:

comma

@comma_ai

4 months

We heard you guys like lossless compression challenges. Today, we're launching the comma compression challenge, along with . $500 for the best compression ratio by July 1st. LZMA gets 1.6x; can you beat it?
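For anyone wanting a baseline: compression ratio here means original size divided by compressed size. A minimal sketch using Python's standard-library `lzma` module; the `sample` data is a hypothetical stand-in, not the challenge dataset, so it will not reproduce the 1.6x figure.

```python
import lzma


def compression_ratio(data: bytes, preset: int = 9) -> float:
    """Return original_size / compressed_size using the stdlib LZMA (xz) codec."""
    compressed = lzma.compress(data, preset=preset)
    # The codec is lossless, so decompression must round-trip exactly.
    assert lzma.decompress(compressed) == data
    return len(data) / len(compressed)


if __name__ == "__main__":
    # Hypothetical, highly repetitive stand-in data; real sensor/video data compresses far less.
    sample = bytes(range(256)) * 4096
    print(f"LZMA ratio: {compression_ratio(sample):.1f}x")
```

Higher presets trade speed for ratio; beating LZMA on the real data usually means exploiting domain structure (e.g. temporal correlation) rather than tuning a generic codec.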


milton

@tensor_fusion

2 months

@justalexoki socialist dictatorship hope this helps


milton

@tensor_fusion

3 months

@DamiBlancoo dude you have smaller biceps than Nora Cortiñas, lift a weight, I'm begging you


milton

@tensor_fusion

4 months

"A picture may be worth a thousand words, a formula is worth a thousand pictures" -Dijkstra I think a nice exercise is to try to code a transformer purely from this paper algos/formulas. Probably like 3 ppl will join but I might livestream it on X at some point during the week.

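To give a flavour of coding straight from the formulas, here is a minimal pure-Python sketch of single-head scaled dot-product attention, softmax(QKᵀ/√d)V, with an optional causal mask. It is my own illustration of the standard formula, not code from the paper; a real implementation would use NumPy/JAX arrays and batch over heads.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive product of an (n x m) and an (m x p) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix, causal: bool = False) -> Matrix:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]
    if causal:
        # Query i may only attend to keys j <= i; masked scores become exp(-inf) = 0.
        scores = [[s if j <= i else -math.inf for j, s in enumerate(row)]
                  for i, row in enumerate(scores)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)
```

With identical keys every weight is equal, so the output is simply the mean of the value rows, which makes a handy sanity check.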

milton

@tensor_fusion

7 months

@AveAveces The president's like is imminent


milton

@tensor_fusion

2 months

@goth600 It’s a cosmic prison, holding back a horde of demons. Ancient and furious, their anger drives the storm. The hexagon is a lock designed to trap them. As eons pass, the storm grows more violent, and the prison weakens. One day they’ll be unleashed.


milton

@tensor_fusion

2 months

Reasoning, pattern recognition, short-term working memory, numerical problem-solving, processing speed, etc. all of these impact performance in high-complexity envs that fundamentally rely on cognitive ability.


milton

@tensor_fusion

3 months

I never found category theory useful, not even when doing Haskell type astronautics, so I'm a bit skeptical this can meaningfully help constrain/explore the design space of transformers and derive practical archs that scale, but I liked the nice-looking diagrams.


milton

@tensor_fusion

3 months

Aligning my chakra with the CUDA gods (basic transformer + mh attention kernel). Karpathy was right: there's quite a gap between the book and the voodoo ppl write in the wild (cuBLAS ops, stochastic rounding, block hierarchy indexing trickery, etc.). Still, so far I recommend it.
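Stochastic rounding, one of the tricks mentioned, rounds to one of the two neighbouring representable values at random, picking the upper one with probability equal to how far past the lower one the value sits, so the rounding error is zero in expectation. A toy sketch on a uniform grid of spacing `step` (an assumption for readability; GPU kernels typically do this on the bit pattern, e.g. when converting fp32 to bf16):

```python
import math
import random


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x to a multiple of `step`, rounding up with probability equal to the
    fractional distance past the lower grid point (unbiased in expectation)."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower  # in [0, 1); exactly 0 means x is already on the grid
    return (lower + (1 if rng.random() < frac else 0)) * step


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [stochastic_round(0.3, 1.0, rng) for _ in range(100_000)]
    # Round-to-nearest would always give 0.0; the stochastic mean stays near 0.3.
    print(sum(samples) / len(samples))
```

This is why it matters for low-precision training: many small updates that round-to-nearest would silently drop still accumulate correctly on average.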

milton

@tensor_fusion

3 months

On my way to becoming CUDA cracked.


milton

@tensor_fusion

28 days

@fluxtheorist Who are all these people jfc how many fucking retards does it take to make a payment processor.


milton

@tensor_fusion

2 months

I changed my mind I like xAI now.

Hieu Pham

@hyhieu226

2 months

If you want to learn to manipulate tensors, just use PyTorch, Jax, or NumPy. Want efficiency? Learn CUDA with CUTLASS & CuTe. Want much more? All while working on cool projects with inspiring friends? Join @xai ! Of course we are hiring: .


milton

@tensor_fusion

1 month

This seems big for DDP training. Data transferred per step reduced from 74.4 GB to just 86.8 MB (857x) w/o impacting convergence rate and/or loss. Don't throw away your old GPUs yet. We'll unlock the shoggoth on a budget.
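The headline ratio checks out, assuming decimal units (1 GB = 1000 MB); the binary-unit variant is shown only for comparison.

```python
# Per-step traffic figures quoted above; decimal units are an assumption.
before_mb = 74.4 * 1000   # 74.4 GB expressed in MB
after_mb = 86.8           # 86.8 MB
reduction = before_mb / after_mb
print(f"decimal units: {reduction:.0f}x")         # matches the quoted 857x

# If the figures were GiB/MiB instead, the same numbers would give a different ratio.
binary_reduction = 74.4 * 1024 / after_mb
print(f"binary units: {binary_reduction:.0f}x")
```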

Nous Research

@NousResearch

1 month

What if you could use all the computing power in the world to train a shared, open source AI model? Preliminary report: Nous Research is proud to release a preliminary report on DisTrO (Distributed Training Over-the-Internet) a family of


milton

@tensor_fusion

2 months

@justalexoki they do that's why their sons and daughters are rich

The luxurious life of Chávez's eldest daughter: a fortune of 3.2 billion and a passion for brands of...

The former president's firstborn is the richest woman in Venezuela.

www.libremercado.com


milton

@tensor_fusion

2 months

@alesitoide I'm firmly convinced that: > you didn't read the bill > inductive reasoning isn't your thing


milton

@tensor_fusion

2 months

@usdtermo dude, every time you make this image go viral I get a message saying "based" as if I were the only Milton in Argentina lmao


milton

@tensor_fusion

4 months

@AlejandraTuk what a drag, not knowing how much the World Cup one weighs and the prettiest girl having a mustache


milton

@tensor_fusion

3 months

@realjuanruocco unironically you're right on this one dude. the rich will get premium access to human doctors/tutors/etc. the poor, machines.


milton

@tensor_fusion

2 months

@justalexoki considering the avg salary for Venezuelans is like $100 a month, it's a lot


milton

@tensor_fusion

2 months

I picture myself telling von Neumann I can't solve two sum. How would the man react? Then, I move on.


milton

@tensor_fusion

4 months

@growing_daniel I don't think he's ugly actually just british


milton

@tensor_fusion

3 months

AI is entering its "can you hack into facebook" era. Told someone I study AI and they said "oh can you make a video of Milei fighting a giant rat?" How do I tell 'em that this is what I do all day.


milton

@tensor_fusion

4 months

@iamnotrya @CarlosMaslaton lmao he looks like the NPC who waits for you in the game to give you the quest


milton

@tensor_fusion

3 months

@cosmeluichito haha damn, I noticed the same thing, why do all Peronists have little estrogen arms?

milton

@tensor_fusion

3 months

@DamiBlancoo dude you have smaller biceps than Nora Cortiñas, lift a weight, I'm begging you


milton

@tensor_fusion

4 months

@FrancoAlva32 pov: you have 6 months to live


milton

@tensor_fusion

3 months

Moreover, apparently he had an eidetic memory, meaning he could recite books/articles verbatim from memory.


milton

@tensor_fusion

2 months

Which is not to say this person should give up ofc


milton

@tensor_fusion

4 months

@MichaelTrazzi you forgot he has great hair sir


milton

@tensor_fusion

8 months

@munman3p take responsibility for the serial haters your speeches create, minister munman


milton

@tensor_fusion

3 months

@fluxtheorist It's great, but in general I can't help feeling bittersweet when I read these kinds of books: it's daunting to realize I'm dumb as fuck compared to this kind of human, and the only thing left is to grind and consistently study/work hard to maybe achieve moderate competence at


milton

@tensor_fusion

3 months

Me presenting the results of my latest refactoring.


milton

@tensor_fusion

2 months

@gptcrosa web3 folks only find out when Google Slides goes down


milton

@tensor_fusion

1 month

[Clarification of this tweet bc of many misinterpretations] Distribution of learning ability not being uniform means skill acquisition efforts can vary significantly, but those to the right of the curve might fail to appreciate the proportional effort required by others, so the notion of

milton

@tensor_fusion

2 months

I don't mean to sound rude but "you're probably smart enough" is a lie that +130 IQ tech workers who haven't deeply interacted with a <100 IQ person tell people out of niceness bc they never struggled with not being able to solve an AP calculus problem for more than 3 hours.


milton

@tensor_fusion

8 months

@munman3p I can never quite figure out your ideology munman, you're this meme


milton

@tensor_fusion

5 months

me watching @sama do his usual trolling with these little drops instead of releasing GPT-5 already

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

5 months

It's back?! The saga continues! 👀 "im-a-good-gpt2-chatbot" is currently only available in battle mode at


milton

@tensor_fusion

3 months

@madorni you know you have 6 months left to live when the doctor's prescription looks like this


milton

@tensor_fusion

4 months

hi folks coming from the @ludwigABAP tweet :) in this account I mostly: > talk about ML papers > post interesting AI/CS history tidbits > recommend books > shitpost ofc > post codez > post short explainers on ML/math concepts e.g.

milton

@tensor_fusion

6 months

The transformer self-attention mechanism can be low-rank approximated by adding linear projections for key and value matrices reducing their dimensions from n x d to k x d. Time and space complexities are reduced from quadratic (pairwise dot-products of input tokens) to linear.
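A minimal pure-Python sketch of this Linformer-style projection. The projection matrices `e_proj` and `f_proj` (shape k x n) are fixed random matrices here purely for illustration; in Linformer they are learned. K and V are projected along the sequence axis first, so the score matrix is n x k instead of n x n.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive product of an (r x m) and an (m x c) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def linformer_attention(q: Matrix, keys: Matrix, values: Matrix,
                        e_proj: Matrix, f_proj: Matrix) -> Matrix:
    """softmax(Q (E K)^T / sqrt(d)) (F V).

    q, keys, values: n x d. e_proj, f_proj: k x n, shrinking the sequence axis
    from n to k. For fixed k, time and memory grow linearly in n.
    """
    d = len(q[0])
    keys_low = matmul(e_proj, keys)          # k x d
    values_low = matmul(f_proj, values)      # k x d
    scores = matmul(q, transpose(keys_low))  # n x k instead of n x n
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values_low)       # n x d


if __name__ == "__main__":
    rng = random.Random(0)
    n, d, k = 8, 4, 2

    def rand_matrix(rows: int, cols: int) -> Matrix:
        return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

    # Random fixed projections are an assumption; Linformer learns E and F.
    out = linformer_attention(rand_matrix(n, d), rand_matrix(n, d), rand_matrix(n, d),
                              rand_matrix(k, n), rand_matrix(k, n))
    print(len(out), len(out[0]))  # 8 4
```

With k = n and identity projections this reduces exactly to ordinary attention; the approximation only kicks in once k < n.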


milton

@tensor_fusion

4 months

you follow based on anime pfp I follow based on banner pic we are not the same


milton

@tensor_fusion

2 months

See this is why I'm bullish on energy-based models. Reframe attention as the gradient of some energy function and some statistical mechanics rigmarole later, efficiently scale decoding to longer contexts. And impl is a short JAX snippet <3
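The core identity behind "attention as the gradient of an energy" (as in the modern-Hopfield-network reading of attention) is that the gradient of the log-sum-exp energy lse(β, ξ) = β⁻¹ log Σᵢ exp(β xᵢ·ξ) with respect to the query ξ is Σᵢ softmax(β Xᵀξ)ᵢ xᵢ, i.e. an attention readout over the stored patterns xᵢ. A pure-Python finite-difference check of that identity (not the JAX snippet or decoding method the tweet refers to; all names are mine):

```python
import math

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def lse_energy(patterns: list[Vector], xi: Vector, beta: float) -> float:
    """beta^-1 * log(sum_i exp(beta * <x_i, xi>)), computed stably."""
    scores = [beta * dot(x, xi) for x in patterns]
    m = max(scores)
    return (m + math.log(sum(math.exp(s - m) for s in scores))) / beta


def attention_readout(patterns: list[Vector], xi: Vector, beta: float) -> Vector:
    """sum_i softmax(beta * <x_i, xi>)_i * x_i: attention with keys = values = patterns."""
    scores = [beta * dot(x, xi) for x in patterns]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * x[j] for w, x in zip(weights, patterns)) / total for j in range(len(xi))]


def numerical_grad(f, xi: Vector, h: float = 1e-6) -> Vector:
    """Central finite-difference gradient of a scalar function f at xi."""
    grad = []
    for j in range(len(xi)):
        plus, minus = list(xi), list(xi)
        plus[j] += h
        minus[j] -= h
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad


if __name__ == "__main__":
    patterns = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0], [-1.0, 1.0, 1.0]]  # arbitrary stored patterns
    xi, beta = [0.2, 0.1, -0.3], 2.0
    print(numerical_grad(lambda z: lse_energy(patterns, z, beta), xi))
    print(attention_readout(patterns, xi, beta))  # agrees with the line above to ~1e-9
```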


milton

@tensor_fusion

2 months

Wait so all these hours I spent solving Lagrangians for margin classifiers and proving convergence theorems were useless?

jack morris

@jxmnop

2 months

things you definitely do NOT need to understand to be an expert on LLMs: - linear regression - bias variance trade off - most probability distributions (Gaussian Bernoulli poisson etc.) - RNNs - LSTMs - CNNs - higher-order calculus (beyond first derivatives + chain rule) -


milton

@tensor_fusion

2 years

These @alfcnz's lecture slides are *colourful*!


milton

@tensor_fusion

4 months

Almost 2K likes? I have to assume tpot is really into: > JAX > ML papers > Accuracy plots > Junji Ito > All of the above combined

milton

@tensor_fusion

4 months

gm gm. hope your weekends are going better than mine


milton

@tensor_fusion

2 years

@EmilioRaiden @agustin_pistone A salary worthy of Porcel Jr intensifies


milton

@tensor_fusion

4 months

@petrogustavo


milton

@tensor_fusion

11 months

@hiperfalcon "no hay plata" ("there's no money")


milton

@tensor_fusion

6 months

@francoisfleuret Yeah graphics in Fukushima papers were something else. Also notice he casually introduces the ReLU back in 1969. Based.


milton

@tensor_fusion

10 months

@tselden lol the photos you posted are perfect. it was obvious the guy had already lost the negotiation with totinho before it even started


milton

@tensor_fusion

10 months

@ArrepentidosLLA NK took office with twin surpluses, preceded by: - Duhalde's brutal adjustment with a 300% devaluation. - R. Saa's debt default. Even so, 2 terms later, and with the best external conditions possible, they left with default, currency controls (cepo), and twin deficits. Please, pick up a book.


milton

@tensor_fusion

10 months

@zulemitamenem @JMilei you got me, it's cinema


milton

@tensor_fusion

11 months

@0xCroto @BetoMendeleiev_ It's a holiday that creates an XL long weekend for tourism in the middle of a runoff, with a clear intention of changing the vote by a statistically significant margin. And in a question it's spelled "por qué".


milton

@tensor_fusion

3 months

ludwig

@ludwigABAP

3 months

"It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration." "The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a


milton

@tensor_fusion

11 months

@PregoneroL "Forms on discrimination in football stadiums" lmao and this clown had the nerve to tell people to go get a job here more than once.


milton

@tensor_fusion

3 months

Shaping up to be a good day. (It's 5 °C outside I ain't getting out).


milton

@tensor_fusion

2 months

@radiomitre Why didn't he do it in 2019?


milton

@tensor_fusion

2 months

@cosmeluichito A whole country with daddy issues, what a disgrace.


milton

@tensor_fusion

11 months

@tomasrebord You're a jinx, baldy. No way in hell I'd get on a plane with you sitting next to me.


milton

@tensor_fusion

3 months

@elmerkst +10 likes, totally blocked and unblocked, prairie lynx


milton

@tensor_fusion

4 years

Alan Kay, Niklaus Wirth, Dijkstra, David Parnas, Fred Brooks, Michael Jackson, Tony Hoare, etc. What an amazing program and list of speakers. I don't know why I never came across this before but I'm definitely spending a few hours binge watching the videos.


milton

@tensor_fusion

2 months

@yacineMTB


milton

@tensor_fusion

4 months

@unhilo 3 Rappi couriers have already cancelled my order dude, I can't go on like this


milton

@tensor_fusion

1 year

@Solanopo You should never have left the photocopy shop at UBA Sociales.


milton

@tensor_fusion

2 years

@drmichaellevin The ghost seeing the defrosted samples.


milton

@tensor_fusion

5 months

@brancowitz


milton

@tensor_fusion

1 year

@Bracesco2023 Across many threads he forged an immortal keyboard / With experience, a thirsty ambition to bait / As a little Cebollita he dreamed of a viral tweet / And of making it big in the trends / Maybe by baiting he could / Tame all of Twitter 🎶


milton

@tensor_fusion

21 days

Wasn’t lying about the production value. Go praise this dude.


milton

@tensor_fusion

9 months

@miniapeur "The origins of the simplex method go back to one of two famous unsolved problems in mathematical statistics [...] which I mistakenly solved as a homework problem". Based.


milton

@tensor_fusion

1 month

This is also why ML illiterate doomer-coded muh compute regulations discourse is fundamentally unserious.


milton

@tensor_fusion

4 months

@fernandezpablo The best bubble signal is the CEOs/CTOs of Argentine unicorns in Silicon Valley who until 2 years ago had the NFT ape as their profile pic and now have "democratizing AI", "AI education" or similar fluff in their bio.


milton

@tensor_fusion

4 months

@ndvoskin what I don't understand, reading your comments, is how they ever gave you an economics degree


milton

@tensor_fusion

27 days

@_Mira___Mira_ This whole thing reeks of something sus, but I’m ready to be pleasantly surprised. I hope it’s not “overfitting to GSM8k for fun and profit”.


milton

@tensor_fusion

2 months

@justalexoki nothing has a literal 0% chance of happening if you're a bayesian


milton

@tensor_fusion

3 months

@SwannMarcus89 he's having a baby now? wow so happy for this fella. he made it


milton

@tensor_fusion

3 months

one more trip around the sun mood


milton

@tensor_fusion

4 months

@miniapeur Mikhail Belkin, Andrea Montanari (actually physicist), Michael Jordan, John Duchi. They usually publish in pure prob/stats/opt journals.


milton

@tensor_fusion

8 months

@Trumperizar pov: you're a 13,000-peso avocado toast with Nutella at a café in Palermo Soho


milton

@tensor_fusion

2 months

@fernandezpablo alberto saying goodbye after revealing that cristina killed néstor and nisman and destroying kirchnerism forever

0

5

61

milton

@tensor_fusion

7 months

@GordoDan_


milton

@tensor_fusion

2 months

@madorni I agree with the news I'm going to find out about on Monday. Cheers.


milton

@tensor_fusion

2 months

Re: Llama-3: Are we gonna get a Sonnet 3.5 level SOTA CodeGen at Groq inference costs without rate limits + smol local distilled models this week? If yes, I bow.


milton

@tensor_fusion

2 months

@muhammeddev1337 Not what I said at all. The broader point is that it wouldn't hurt to be aware of lottery-of-life limitations (IQ is in part genetic) bc that can probably help you devise better learning/studying strategies, etc.

milton

@tensor_fusion

2 months

Which is not to say this person should give up ofc


milton

@tensor_fusion

5 months

@superavitfiscal HEADS UP ‼️ ∆ M= iPR + CR - [SP + %ROx(k+i)] CAPUTO MASTERCLASS. The end.


milton

@tensor_fusion

4 months

@anpaure man remember when you could solve a leetcode medium design a pub-sub and get a 150k offer + equity


milton

@tensor_fusion

4 months

I can’t get over how elegantly autodiff can be implemented in Haskell with dual numbers via typeclasses. If you want to see how to build a tiny autodiff lib + a working feedforward net on top of it, all in pure Haskell, from scratch, I just released this:
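The trick: evaluate f on a dual number x + ε with ε² = 0, and the ε coefficient of the result is f′(x). A minimal Python transliteration that uses operator overloading in place of Haskell typeclass instances (my own sketch, not the released library; `Dual`, `derivative`, `sin` and `exp` are illustrative names):

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Dual:
    """Dual number val + der*eps with eps**2 == 0; `der` carries the derivative."""

    val: float
    der: float = 0.0

    @staticmethod
    def lift(x: Union[Dual, float]) -> Dual:
        # Plain constants have zero derivative.
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

    def __truediv__(self, other):
        o = Dual.lift(other)
        # Quotient rule: (u/v)' = (u'v - uv') / v^2
        return Dual(self.val / o.val, (self.der * o.val - self.val * o.der) / o.val ** 2)

    def __rtruediv__(self, other):
        return Dual.lift(other) / self


def sin(x: Dual) -> Dual:
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)  # chain rule


def exp(x: Dual) -> Dual:
    e = math.exp(x.val)
    return Dual(e, e * x.der)  # chain rule


def derivative(f: Callable[[Dual], Dual], x: float) -> float:
    """Forward-mode AD: seed der = 1 at x and read off f'(x)."""
    return f(Dual(x, 1.0)).der
```

For example, `derivative(lambda x: x * x * x + 2 * x, 2.0)` gives 3·2² + 2 = 14.0. Forward mode costs one pass per input direction, which is why reverse mode wins for nets with many parameters.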


milton

@tensor_fusion

3 months

> "We propose a novel dReLU function" * checks * It's ReLU applied twice. Still, yet another nice LLM inference opt paper from China, and promising results on sparsification. > 90% sparsity in each Mistral-7B FFN > 97% overall sparsity in Mixtral-47B (85% in each FFN)
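As I understand the paper, "ReLU applied twice" means ReLU on both the gate and the up projection of a gated (SwiGLU-style) FFN, so an intermediate neuron is zero whenever either branch is negative. A toy sketch measuring that activation sparsity with random Gaussian weights (sizes and weights are assumptions for illustration; the down projection is omitted):

```python
import random


def relu(v: list[float]) -> list[float]:
    return [max(0.0, x) for x in v]


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """Matrix-vector product; w has one row per output feature."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def drelu_hidden(x: list[float], w_gate: list[list[float]], w_up: list[list[float]]) -> list[float]:
    """Intermediate FFN activations with dReLU: ReLU(W_gate x) * ReLU(W_up x), elementwise."""
    return [g * u for g, u in zip(relu(matvec(w_gate, x)), relu(matvec(w_up, x)))]


def sparsity(h: list[float]) -> float:
    """Fraction of exactly-zero activations."""
    return sum(1 for v in h if v == 0.0) / len(h)


def rand_matrix(rows: int, cols: int, rng: random.Random) -> list[list[float]]:
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d_model, d_ff = 64, 2000  # toy sizes, not Mistral's
    w_gate, w_up = rand_matrix(d_ff, d_model, rng), rand_matrix(d_ff, d_model, rng)
    x = [rng.gauss(0, 1) for _ in range(d_model)]
    print(f"single ReLU gate sparsity: {sparsity(relu(matvec(w_gate, x))):.2f}")
    print(f"dReLU sparsity:            {sparsity(drelu_hidden(x, w_gate, w_up)):.2f}")
```

With independent random weights each branch is positive about half the time, so roughly 75% of entries come out zero versus about 50% for a single ReLU gate. The much higher figures quoted above come from trained models and are not reproduced by this toy.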


milton

@tensor_fusion

3 months

I believe an efficient way to get comfortable with a PL is to read the standard library code. The problem is that the Agda std lib looks like some Kardashev type 2 shit they found on the screens of the alien spaceship at Roswell.


milton

@tensor_fusion

12 days

Nucleus☕️

@EsotericCofe

13 days

what do people actually mean when they “learn ML”? are we talking about training mnist classifiers or proving chebyshev’s inequality?
