milton

@tensor_fusion

3,437 Followers
1,901 Following
619 Media
2,996 Statuses

AI skunk works

High-dimensional latent space
Joined January 2019
Pinned Tweet
@tensor_fusion
milton
4 months
gm gm. hope your weekends are going better than mine
Tweet media one
@tensor_fusion
milton
2 months
I don't mean to sound rude but "you're probably smart enough" is a lie that +130 IQ tech workers who haven't deeply interacted with a <100 IQ person tell people out of niceness bc they never struggled with not being able to solve an AP calculus problem for more than 3 hours.
@anpaure
anpaure
2 months
he's probably smart enough, this is just learned helplessness
Tweet media one
@tensor_fusion
milton
2 months
Russell-Norvig section 2 is all you need.
Tweet media one
@stanislavfort
Stanislav Fort
2 months
A great reddit analysis of the AlphaGeometry paper and how it uses very little "modern" AI for the majority of the geometry IMO problems it solves. Super interesting that a repeated application of a heuristic + tree search is so good!
@tensor_fusion
milton
21 days
I don’t usually watch AI/ML explainer videos on YT (Karpathy/3blue1brown aside), but this one has good production value. Nice overview of scaling laws, from Kaplan et al. 2020 to more recent results. (Bonus points bc it cites a criminally little-known paper on intrinsic manifold dimension.)
Tweet media one
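A minimal sketch of the parameter-count form of the Kaplan et al. (2020) scaling law, L(N) = (N_c / N)^α_N, where N is the non-embedding parameter count. The constants (α_N ≈ 0.076, N_c ≈ 8.8 × 10^13) are the fits as I recall them from that paper, not values taken from the video, so treat them as assumptions:

```python
# Power-law fit of test loss vs. non-embedding parameter count N,
# in the form L(N) = (N_c / N) ** alpha_N from Kaplan et al. (2020).
# ALPHA_N and N_C are assumed values recalled from that paper.
ALPHA_N = 0.076
N_C = 8.8e13


def loss_from_params(n_params: float) -> float:
    """Predicted cross-entropy loss (nats/token) for a model with n_params parameters."""
    return (N_C / n_params) ** ALPHA_N


if __name__ == "__main__":
    for n in (1e6, 1e8, 1e10):
        print(f"N = {n:.0e}: predicted loss = {loss_from_params(n):.3f}")
    # Doubling N multiplies the loss by 2 ** -ALPHA_N, roughly 0.949.
    print(loss_from_params(2e9) / loss_from_params(1e9))
```

The takeaway is the shape, not the constants: on a power law, each doubling of N buys the same multiplicative drop in loss.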
@tensor_fusion
milton
3 months
“RAND signed a contract with John von Neumann to produce a general theory of war, to be completed during a small slice of his time: that spent shaving. For his shaving thoughts, von Neumann received $200 a month” I first read that story in this book. Seriously, von Neumann was
Tweet media one
@asteriskmgzn
Asterisk
3 months
RAND’s halcyon days lasted two decades, during which it produced some of the most influential developments in science and foreign policy. How did it become just another think tank? @PradyuPrasad and @jordanschnyc on when RAND made magic in Santa Monica
@tensor_fusion
milton
3 months
For those who love LLMs ✍️
Tweet media one
@PhysInHistory
Physics In History
3 months
For those who love calculus ✍️
Tweet media one
@tensor_fusion
milton
3 months
Friendly reminder that Noam Shazeer is extremely cracked and uncommonly obsessive and that’s why we got the transformer.
@tensor_fusion
milton
4 months
twitter this week:
Tweet media one
@comma_ai
comma
4 months
We heard you guys like lossless compression challenges. Today, we're launching the comma compression challenge, along with . $500 for the best compression ratio by July 1st. LZMA gets 1.6x; can you beat it?
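For context on the "1.6x" figure: compression ratio is original size divided by compressed size. A sketch using Python's standard `lzma` module; the sample bytes are a stand-in, not the challenge's dataset, and `compression_ratio` is a hypothetical helper:

```python
import lzma


def compression_ratio(data: bytes) -> float:
    """Original size divided by LZMA-compressed size (higher is better)."""
    compressed = lzma.compress(data, preset=9)
    # Lossless means decompression must reproduce the input exactly.
    assert lzma.decompress(compressed) == data
    return len(data) / len(compressed)


if __name__ == "__main__":
    # Stand-in payload: highly repetitive bytes compress far better than 1.6x,
    # whereas real sensor/video data is much less redundant.
    sample = b"comma" * 10_000
    print(f"{compression_ratio(sample):.1f}x")
```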
@tensor_fusion
milton
2 months
@justalexoki socialist dictatorship hope this helps
@tensor_fusion
milton
3 months
@DamiBlancoo dude, you've got less biceps than Nora Cortiñas, lift a weight, I'm begging you
@tensor_fusion
milton
4 months
"A picture may be worth a thousand words, a formula is worth a thousand pictures." - Dijkstra. I think a nice exercise is to try to code a transformer purely from this paper's algos/formulas. Probably like 3 ppl will join, but I might livestream it on X at some point during the week.
Tweet media one
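For a taste of that exercise, here is single-head scaled dot-product attention, softmax(QKᵀ / √d_k) V, written straight from the formula in plain Python. It's a sketch: the helper names are hypothetical, and it follows the standard formula from Vaswani et al. (2017) rather than the exact notation of the paper pictured:

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for a single head, no masking."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)
```

Causal masking, multiple heads, and the learned projections W_Q, W_K, W_V would all layer on top of this function.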
@tensor_fusion
milton
7 months
@AveAveces The president's like is imminent
@tensor_fusion
milton
2 months
@goth600 It’s a cosmic prison, holding back a horde of demons. Ancient and furious, their anger drives the storm. The hexagon is a lock designed to trap them. As eons pass, the storm grows more violent, and the prison weakens. One day they’ll be unleashed.
Tweet media one
@tensor_fusion
milton
2 months
Reasoning, pattern recognition, short-term working memory, numerical problem-solving, processing speed, etc. all of these impact performance in high-complexity envs that fundamentally rely on cognitive ability.
@tensor_fusion
milton
3 months
I never found category theory useful, not even when doing Haskell type astronautics, so I'm a bit skeptical that this can meaningfully help constrain/explore the design space of transformers and derive practical archs that scale, but I liked the nice-looking diagrams.
Tweet media one
@tensor_fusion
milton
3 months
Aligning my chakra with the CUDA gods (basic transformer + multi-head attention kernel). Karpathy was right: there's quite a gap between the book and the voodoo ppl write in the wild (cuBLAS ops, stochastic rounding, block hierarchy indexing trickery, etc.). Still, so far I recommend it.
Tweet media one
@tensor_fusion
milton
3 months
On my way to becoming CUDA cracked.
Tweet media one
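Stochastic rounding, one of the tricks mentioned above, rounds a value up with probability equal to its distance from the grid point below, so the rounded result is unbiased in expectation. A minimal sketch on an integer grid; real kernels round to low-precision formats such as bf16 and use a GPU-side RNG, and `stochastic_round` is a hypothetical name:

```python
import math
import random


def stochastic_round(x: float, rng: random.Random) -> int:
    """Round x to floor(x) or floor(x) + 1, going up with probability x - floor(x)."""
    lower = math.floor(x)
    frac = x - lower
    return lower + (1 if rng.random() < frac else 0)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [stochastic_round(2.3, rng) for _ in range(100_000)]
    # Round-to-nearest always gives 2; the stochastic mean stays close to 2.3.
    print(sum(samples) / len(samples))
```

This matters in low-precision training because tiny updates that round-to-nearest would always discard still survive on average.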
@tensor_fusion
milton
28 days
@fluxtheorist Who are all these people jfc how many fucking retards does it take to make a payment processor.
@tensor_fusion
milton
2 months
I changed my mind I like xAI now.
Tweet media one
@hyhieu226
Hieu Pham
2 months
If you want to learn to manipulate tensors, just use PyTorch, Jax, or NumPy. Want efficiency? Learn CUDA with CUTLASS & CuTe. Want much more? All while working on cool projects with inspiring friends? Join @xai ! Of course we are hiring: .
@tensor_fusion
milton
1 month
This seems big for DDP training. Data transferred per step reduced from 74.4 GB to just 86.8 MB (857x) w/o impacting convergence rate and/or loss. Don't throw away your old GPUs yet. We'll unlock the shoggoth on a budget.
Tweet media one
@NousResearch
Nous Research
1 month
What if you could use all the computing power in the world to train a shared, open source AI model? Preliminary report: Nous Research is proud to release a preliminary report on DisTrO (Distributed Training Over-the-Internet) a family of
Tweet media one
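A quick check of the quoted reduction, assuming both figures use decimal units (1 GB = 1000 MB):

```python
# Per-step data transfer figures as quoted in the tweet.
BEFORE_GB = 74.4
AFTER_MB = 86.8

# Convert to a common unit (decimal MB) before dividing.
reduction = (BEFORE_GB * 1000) / AFTER_MB
print(f"{reduction:.0f}x")  # prints 857x
```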
@tensor_fusion
milton
2 months
@alesitoide I'm firmly convinced that: > you didn't read the proposal > inductive reasoning isn't your thing
@tensor_fusion
milton
2 months
@usdtermo dude, every time you make this image go viral I get a message saying "based" as if I were the only Milton in Argentina lmao
@tensor_fusion
milton
4 months
@AlejandraTuk what a bummer not knowing how much the World Cup weighs and the prettiest girl having a mustache
@tensor_fusion
milton
3 months
@realjuanruocco unironically you're right on this one, dude. The rich will get premium access to human doctors/tutors/etc. The poor, machines.
@tensor_fusion
milton
2 months
@justalexoki considering the avg salary for venezuelans is like $100 dollars a month, it's a lot
@tensor_fusion
milton
2 months
I picture myself telling von Neumann I can't solve two sum. How would the man react? Then, I move on.
@tensor_fusion
milton
4 months
@growing_daniel I don't think he's ugly actually just british
@tensor_fusion
milton
3 months
AI is entering its "can you hack into facebook" era. Told someone I study AI and they said "oh can you make a video of Milei fighting a giant rat?" How do I tell 'em that this is what I do all day?
Tweet media one
@tensor_fusion
milton
4 months
@iamnotrya @CarlosMaslaton lmao he looks like the guy who waits for you in the game to give you the quest
@tensor_fusion
milton
3 months
@cosmeluichito haha damn, I noticed the same thing, why do all Peronists have little estrogen arms?
@tensor_fusion
milton
3 months
@DamiBlancoo dude, you've got less biceps than Nora Cortiñas, lift a weight, I'm begging you
@tensor_fusion
milton
4 months
@FrancoAlva32 pov: you have 6 months to live
@tensor_fusion
milton
3 months
Moreover, apparently he had an eidetic memory meaning he could perfectly recall books/articles by heart.
@tensor_fusion
milton
2 months
Which is not to say this person should give up ofc
@tensor_fusion
milton
4 months
@MichaelTrazzi you forgot he has great hair sir
@tensor_fusion
milton
8 months
@munman3p take responsibility for the serial haters your speeches create, minister munman
@tensor_fusion
milton
3 months
@fluxtheorist It's great, but in general I can't help feeling bittersweet when I read these kinds of books. It's daunting to realize I'm dumb as fuck compared to this kind of human, and the only thing left is to grind and consistently study/work hard to maybe achieve moderate competence at
@tensor_fusion
milton
3 months
Me presenting the results of my latest refactoring.
Tweet media one
@tensor_fusion
milton
2 months
@gptcrosa the web3 crowd only finds out when Google Slides goes down
@tensor_fusion
milton
1 month
[Clarification of this tweet bc of many misinterpretations] The distribution of learning ability not being uniform means skill-acquisition effort can vary significantly, but those to the right of the curve might fail to appreciate the proportional effort required by others, so the notion of
@tensor_fusion
milton
2 months
I don't mean to sound rude but "you're probably smart enough" is a lie that +130 IQ tech workers who haven't deeply interacted with a <100 IQ person tell people out of niceness bc they never struggled with not being able to solve an AP calculus problem for more than 3 hours.
@tensor_fusion
milton
3 months
On my way to becoming CUDA cracked.
Tweet media one
@tensor_fusion
milton
8 months
@munman3p I can never quite figure out your ideology, munman, you're this meme
Tweet media one
@tensor_fusion
milton
5 months
me watching @sama do his usual trolling with these little drops instead of releasing GPT-5 already
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
5 months
It's back?! The saga continues! 👀 "im-a-good-gpt2-chatbot" is currently only available in battle mode at
Tweet media one
@tensor_fusion
milton
3 months
@madorni you know you have 6 months left to live when the doctor's prescription looks like this
@tensor_fusion
milton
4 months
hi folks coming from the @ludwigABAP tweet :) in this account I mostly: > talk about ML papers > post interesting AI/CS history tidbits > recommend books > shitpost ofc > post codez > post short explainers on ML/math concepts e.g.
@tensor_fusion
milton
6 months
The transformer self-attention mechanism can be low-rank approximated by adding linear projections for the key and value matrices, reducing their dimensions from n × d to k × d. Time and space complexity drop from quadratic in sequence length n (pairwise dot products of input tokens) to linear (for a fixed k).
Tweet media one
@tensor_fusion
milton
4 months
you follow based on anime pfp I follow based on banner pic we are not the same
Tweet media one
@tensor_fusion
milton
2 months
See this is why I'm bullish on energy-based models. Reframe attention as the gradient of some energy function and some statistical mechanics rigmarole later, efficiently scale decoding to longer contexts. And impl is a short JAX snippet <3
Tweet media one
Tweet media two
Tweet media three
Tweet media four
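One way to make "attention as the gradient of an energy function" concrete (my assumption about what the screenshots show, since the paper isn't named here): define F(ζ) = log Σ_a exp(q·k_a / √d + ζ·v_a). Its gradient with respect to the source ζ, evaluated at ζ = 0, is Σ_a softmax(q·K / √d)_a v_a, which is exactly the attention output. The sketch checks that identity numerically with finite differences; all function names are illustrative:

```python
import math
from typing import List

Vector = List[float]


def log_partition(q: Vector, keys: List[Vector], values: List[Vector], zeta: Vector) -> float:
    """F(zeta) = log sum_a exp(q.k_a / sqrt(d) + zeta.v_a), computed stably."""
    d = len(q)
    logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
              + sum(zi * vi for zi, vi in zip(zeta, v))
              for k, v in zip(keys, values)]
    peak = max(logits)
    return peak + math.log(sum(math.exp(x - peak) for x in logits))


def attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Standard softmax attention output for a single query."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def grad_at_zero(q: Vector, keys: List[Vector], values: List[Vector], eps: float = 1e-6) -> Vector:
    """Central finite-difference gradient of F with respect to zeta at zeta = 0."""
    dim = len(values[0])
    grad = []
    for j in range(dim):
        plus = [eps if i == j else 0.0 for i in range(dim)]
        minus = [-eps if i == j else 0.0 for i in range(dim)]
        grad.append((log_partition(q, keys, values, plus)
                     - log_partition(q, keys, values, minus)) / (2 * eps))
    return grad
```

Because F is a log-sum-exp, partial results computed over separate chunks of keys can be merged with another log-sum-exp, which is the kind of associativity that lets a long context be split across devices and reduced in a tree.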
@tensor_fusion
milton
2 months
Wait so all these hours I spent solving Lagrangians for margin classifiers and proving convergence theorems were useless?
Tweet media one
@jxmnop
jack morris
2 months
things you definitely do NOT need to understand to be an expert on LLMs: - linear regression - bias variance trade off - most probability distributions (Gaussian Bernoulli poisson etc.) - RNNs - LSTMs - CNNs - higher-order calculus (beyond first derivatives + chain rule) -
@tensor_fusion
milton
2 years
These lecture slides by @alfcnz are *colourful*!
Tweet media one
Tweet media two
Tweet media three
Tweet media four
@tensor_fusion
milton
4 months
Almost 2K likes? I have to assume tpot is really into: > JAX > ML papers > Accuracy plots > Junji Ito > All of the above combined
@tensor_fusion
milton
4 months
gm gm. hope your weekends are going better than mine
Tweet media one
@tensor_fusion
milton
2 years
@EmilioRaiden @agustin_pistone A salary worthy of Porcel Jr. intensifies
@tensor_fusion
milton
4 months
Tweet media one
@tensor_fusion
milton
11 months
@hiperfalcon "there's no money"
Tweet media one
@tensor_fusion
milton
6 months
@francoisfleuret Yeah graphics in Fukushima papers were something else. Also notice he casually introduces the ReLU back in 1969. Based.
Tweet media one
Tweet media two
@tensor_fusion
milton
10 months
@tselden haha the photos you posted are perfect. It was obvious the guy had already lost the negotiation with Totinho before it even started
@tensor_fusion
milton
10 months
@ArrepentidosLLA NK took office with twin surpluses, preceded by: - Duhalde's brutal adjustment with a 300% devaluation. - R. Saa's debt default. Even so, 2 terms later, and with the best external conditions possible, they left with default, currency controls (cepo), and twin deficits. Please, pick up a book.
@tensor_fusion
milton
10 months
@zulemitamenem @JMilei you got me, it's cinema
Tweet media one
@tensor_fusion
milton
11 months
@0xCroto @BetoMendeleiev_ It's a holiday that creates an XL long weekend for tourism right in the middle of a runoff election, with the clear intent of shifting the vote by a statistically significant amount. And it's spelled "por qué" in an interrogative sentence.
@tensor_fusion
milton
3 months
Tweet media one
@ludwigABAP
ludwig
3 months
"It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration." "The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a
Tweet media one
@tensor_fusion
milton
11 months
@PregoneroL "Forms on discrimination in football stadiums" lmao and this clown had the nerve to tell people to go get a job more than once here.
@tensor_fusion
milton
3 months
Shaping up to be a good day. (It's 5 °C outside I ain't getting out).
Tweet media one
@tensor_fusion
milton
2 months
@radiomitre Why didn't he do it in 2019?
@tensor_fusion
milton
2 months
@cosmeluichito A whole country with daddy issues, what a disgrace.
@tensor_fusion
milton
11 months
@tomasrebord You're a jinx, baldy. I wouldn't get on a plane with you next to me, no way in hell.
@tensor_fusion
milton
3 months
@elmerkst +10 likes for reco, blocked and unblocked, prairie lynx
@tensor_fusion
milton
4 years
Alan Kay, Niklaus Wirth, Dijkstra, David Parnas, Fred Brooks, Michael Jackson, Tony Hoare, etc. What an amazing program and list of speakers. I don't know why I never came across this before but I'm definitely spending a few hours binge watching the videos.
Tweet media one
Tweet media two
@tensor_fusion
milton
2 months
@tensor_fusion
milton
4 months
@unhilo 3 Rappi couriers have already canceled my order, dude, I can't go on like this
@tensor_fusion
milton
1 year
@Solanopo You should never have left the photocopy shop at UBA Social Sciences.
@tensor_fusion
milton
2 years
@drmichaellevin The ghost seeing the defrosted samples.
Tweet media one
@tensor_fusion
milton
5 months
Tweet media one
@tensor_fusion
milton
1 year
@Bracesco2023 In many threads he forged an immortal keyboard With experience, a thirsty ambition to bait As a Cebollitas kid he dreamed of a viral tweet And of making it to the trends Maybe by baiting he could Tame all of Twitter 🎶
@tensor_fusion
milton
21 days
Wasn’t lying about the production value. Go praise this dude.
Tweet media one
@tensor_fusion
milton
9 months
@miniapeur "The origins of the simplex method go back to one of two famous unsolved problems in mathematical statistics [...] which I mistakenly solved as a homework problem". Based.
Tweet media one
@tensor_fusion
milton
1 month
This is also why ML illiterate doomer-coded muh compute regulations discourse is fundamentally unserious.
@tensor_fusion
milton
6 months
The transformer self-attention mechanism can be low-rank approximated by adding linear projections for the key and value matrices, reducing their dimensions from n × d to k × d. Time and space complexity drop from quadratic in sequence length n (pairwise dot products of input tokens) to linear (for a fixed k).
Tweet media one
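A minimal sketch of that idea (the Linformer-style projection), assuming a single fixed k × n matrix `proj` shared by keys and values; in the actual method the projections are learned. The score matrix shrinks from n × n to n × k:

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def low_rank_attention(q: Matrix, k: Matrix, v: Matrix, proj: Matrix) -> Matrix:
    """Attention with keys/values compressed along the sequence axis.

    q, k, v: n x d.  proj: k_dim x n (hypothetical fixed projection).
    Scores are n x k_dim instead of n x n.
    """
    d = len(q[0])
    k_proj = matmul(proj, k)                 # k_dim x d
    v_proj = matmul(proj, v)                 # k_dim x d
    scores = matmul(q, transpose(k_proj))    # n x k_dim
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v_proj)           # n x d
```

For a fixed k, every step costs O(n·k·d), so doubling the sequence length roughly doubles the work instead of quadrupling it.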
@tensor_fusion
milton
4 months
@fernandezpablo The best sign of a bubble is the CEOs/CTOs of Argentine unicorns in Silicon Valley who until 2 years ago had the NFT ape as their profile pic and now have "democratizing AI", "AI education" or similar posturing in their bio.
@tensor_fusion
milton
4 months
@ndvoskin what I don't understand reading your comments is how they ever gave you an economics degree
@tensor_fusion
milton
27 days
@_Mira___Mira_ This whole thing reeks of something sus, but I’m ready to be pleasantly surprised. I hope it’s not “overfitting to GSM8k for fun and profit”.
@tensor_fusion
milton
2 months
@justalexoki nothing has a literal 0% chance of happening if you're a bayesian
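The formal version of that joke is usually called Cromwell's rule: Bayes' theorem multiplies the likelihood by the prior, so a prior of exactly 0 yields a posterior of exactly 0 whatever the evidence. A tiny sketch with made-up likelihoods:

```python
def posterior(prior: float, likelihood: float, likelihood_if_false: float) -> float:
    """P(H | E) via Bayes' theorem for a binary hypothesis H."""
    evidence = likelihood * prior + likelihood_if_false * (1 - prior)
    return likelihood * prior / evidence


if __name__ == "__main__":
    # Evidence ~1000x more likely under H than under not-H (illustrative numbers).
    print(posterior(0.01, 0.999, 0.000999))  # jumps to about 0.91
    print(posterior(0.0, 0.999, 0.000999))   # stays exactly 0.0
```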
@tensor_fusion
milton
3 months
@SwannMarcus89 he's having a baby now? wow so happy for this fella. he made it
@tensor_fusion
milton
3 months
one more trip around the sun mood
Tweet media one
@tensor_fusion
milton
4 months
@miniapeur Mikhail Belkin, Andrea Montanari (actually a physicist), Michael Jordan, John Duchi. They usually publish in pure prob/stats/opt journals.
@tensor_fusion
milton
8 months
@Trumperizar pov: you're a 13,000-peso avocado toast with Nutella at a café in Palermo Soho
@tensor_fusion
milton
2 months
@fernandezpablo alberto saying goodbye after revealing that cristina killed néstor and nisman and destroying kirchnerism forever
@tensor_fusion
milton
7 months
Tweet media one
@tensor_fusion
milton
2 months
@madorni I agree with the news I'm going to find out about on Monday. Hugs.
@tensor_fusion
milton
2 months
Re: Llama-3: Are we gonna get a Sonnet 3.5 level SOTA CodeGen at Groq inference costs without rate limits + smol local distilled models this week? If yes, I bow.
Tweet media one
@tensor_fusion
milton
2 months
@muhammeddev1337 Not what I said at all. Broader point is it wouldn't hurt to be aware of lottery of life limitations (IQ is in part genetic) bc that can probably help you devise better learning/studying strategies, etc.
@tensor_fusion
milton
2 months
Which is not to say this person should give up ofc
@tensor_fusion
milton
5 months
@superavitfiscal ATTENTION ‼️ ∆ M= iPR + CR - [SP + %ROx(k+i)] CAPUTO MASTERCLASS. The end.
@tensor_fusion
milton
1 month
Tweet media one
@tensor_fusion
milton
4 months
@anpaure man remember when you could solve a leetcode medium design a pub-sub and get a 150k offer + equity
Tweet media one
@tensor_fusion
milton
4 months
I can’t get over how elegantly autodiff can be implemented in Haskell with dual numbers via typeclasses. If you want to see how to build a tiny autodiff lib + a working feedforward net on top of it, all in pure Haskell, from scratch, I just released this:
Tweet media one
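For readers who don't write Haskell, the same trick sketched in Python: a dual number a + bε with ε² = 0 carries a value and a derivative, and overloading arithmetic (the role a `Num` instance plays in Haskell) gives forward-mode derivatives for free. This is an independent illustration, not code from the released library:

```python
import math
from dataclasses import dataclass
from typing import Callable, Union

Number = Union[int, float]


@dataclass(frozen=True)
class Dual:
    """a + b*eps with eps**2 == 0: `val` is f(x), `der` is f'(x)."""
    val: float
    der: float = 0.0

    @staticmethod
    def lift(x: Union["Dual", Number]) -> "Dual":
        """Treat plain numbers as constants (derivative 0)."""
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        # Product rule falls out of (a + b eps)(c + d eps) = ac + (ad + bc) eps.
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.val * o.der + self.der * o.val)

    __rmul__ = __mul__


def tanh(x: Dual) -> Dual:
    """Chain rule for tanh: d/dx tanh(x) = 1 - tanh(x)**2."""
    t = math.tanh(x.val)
    return Dual(t, (1 - t * t) * x.der)


def derivative(f: Callable[[Dual], Dual], x: float) -> float:
    """Seed der = 1 at x and read off f'(x)."""
    return f(Dual(x, 1.0)).der
```

Usage: `derivative(lambda w: tanh(3 * w + 1), 0.5)` gives the gradient of a one-weight tanh neuron. Building a feedforward net on top means adding more primitives with their derivative rules; the arithmetic operators never change.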
@tensor_fusion
milton
3 months
> "We propose a novel dReLU function" * checks * It's ReLU applied twice. Still, yet another nice LLM inference opt paper from China, and promising results on sparsification. > 90% sparsity in each Mistral-7B FFN > 97% overall sparsity in Mixtral-47B (85% in each FFN)
Tweet media one
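As far as I recall the Turbo Sparse paper (treat this as an assumption), "ReLU applied twice" means a gated FFN where both branches go through ReLU, dReLU(x) = ReLU(xW_gate) ⊙ ReLU(xW_up), instead of SwiGLU's SiLU(xW_gate) ⊙ xW_up. The product is zero whenever either branch is, so many intermediate units are exactly zero and the matching rows of W_down can be skipped at inference. A toy sketch that measures that sparsity, with random weights standing in for trained ones and hypothetical helper names:

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(x: float) -> float:
    return x if x > 0 else 0.0


def matvec(x: Vector, w: Matrix) -> Vector:
    """x (length d) times w (d x h) -> vector of length h."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def drelu_hidden(x: Vector, w_gate: Matrix, w_up: Matrix) -> Vector:
    """Intermediate FFN activations ReLU(x W_gate) * ReLU(x W_up)."""
    gate = matvec(x, w_gate)
    up = matvec(x, w_up)
    return [relu(g) * relu(u) for g, u in zip(gate, up)]


def sparsity(h: Vector) -> float:
    """Fraction of exactly-zero activations (rows of W_down that can be skipped)."""
    return sum(1 for v in h if v == 0.0) / len(h)


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Gaussian stand-in weights (untrained)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d, hidden = 16, 256
    w_gate, w_up = random_matrix(d, hidden, rng), random_matrix(d, hidden, rng)
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # With random weights each ReLU zeroes roughly half its inputs, so about
    # 75% of the products vanish; the tweet's trained-model figures are higher.
    print(f"sparsity: {sparsity(drelu_hidden(x, w_gate, w_up)):.2f}")
```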
@tensor_fusion
milton
3 months
I believe an efficient way to get comfortable with a PL is to read the standard library code. The problem is that the Agda std lib looks like some Kardashev type 2 shit they found on the screens of the alien spaceship at Roswell.
Tweet media one
@tensor_fusion
milton
12 days
Tweet media one
@EsotericCofe
Nucleus☕️
13 days
what do people actually mean when they “learn ML”? are we talking about training mnist classifiers or proving chebyshev’s inequality?