The existence of the Hamilton-Jacobi-Bellman equation is kind of a miracle. You are telling me that there is an equation that combines physics and dynamic programming?
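For reference (standard notation, not specific to the post): the finite-horizon HJB equation ties the value function of an optimal control problem to the system dynamics, which is exactly the physics-meets-dynamic-programming combination above.
```
% Finite-horizon HJB equation for dynamics \dot{x} = f(x, u), running cost \ell(x, u),
% terminal cost \phi, and value function V(x, t):
-\frac{\partial V}{\partial t}(x, t) = \min_{u} \Big[ \ell(x, u) + \nabla_x V(x, t) \cdot f(x, u) \Big],
\qquad V(x, T) = \phi(x).
```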
Entire fields (most of them?) such as machine learning, algebraic geometry, term rewriting, combinatorial optimization (including dynamic programming), and many others are just ways of finding suitable convolution inverses.
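To pin down the term (standard definition, my gloss rather than the post's): a convolution inverse of f is whatever g undoes f under convolution, with the Dirac delta playing the role of the identity.
```
% g is a convolution inverse of f when
(f * g)(x) = \int f(\tau)\, g(x - \tau)\, d\tau = \delta(x).
```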
Have been doing a lot of statistical mechanics lately. It’s actually really cool and a lot more satisfying as an interpretation of ML than some alternatives. I have a new result that I will publish soonish.
Go (the game) can be modeled via an Ising model. Not surprising: you have a grid with states at the intersections, and those states interact with each other.
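A toy sketch of what that could look like (my own illustration, not a calibrated model of Go): each intersection carries a spin-like state (+1 black, -1 white, 0 empty) and only nearest neighbours contribute to an Ising-style energy.
```
// Toy Ising-style energy over a 19x19 Go board (illustration only).
const N: usize = 19;

fn ising_energy(board: &[[i32; N]; N], coupling: f64) -> f64 {
    let mut energy = 0.0;
    for i in 0..N {
        for j in 0..N {
            // Only right and down neighbours, so each pair is counted once.
            if i + 1 < N {
                energy -= coupling * (board[i][j] * board[i + 1][j]) as f64;
            }
            if j + 1 < N {
                energy -= coupling * (board[i][j] * board[i][j + 1]) as f64;
            }
        }
    }
    energy
}

fn main() {
    let board = [[0i32; N]; N]; // empty board: zero interaction energy
    println!("E = {}", ising_energy(&board, 1.0));
}
```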
Over the last week, I have figured out how to do neural architecture search. Developers writing architectures by hand makes as much sense as writing assembly by hand. Even less, the suboptimality factor is orders of magnitude larger. Watch this space for more info.
I’m building a startup to make energy-based models viable. One gains interpretability, composability (larger models from smaller), and faster (cheaper) training and inference. I’m finalizing an investment round, but if you or someone you know want to invest, DM me. The math is crazy.
Shear is hands down the worst affine transformation, prove me wrong. Rotation? Awesome! Translation? Keep it coming! Reflection? You better believe it. Shear? Who? Who invited you?
Writing proofs using theorem provers is in some sense a return to the origins. Math started being written in sand, and it returns to being written in sand (silicon).
The sifting property is a crazy idea; it is, as far as I know, the only definition of function evaluation: f(x) amounts to convolving f with a Dirac delta shifted by x.
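In symbols, the standard statement of the sifting property:
```
f(x) = \int_{-\infty}^{\infty} f(\tau)\, \delta(x - \tau)\, d\tau = (f * \delta)(x)
```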
To clarify, they are both idempotent, i.e. P = P². The magic stems from the fact that things can be linear in the tropical sense while being nonlinear in the classical sense.
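One way to make the tropical point concrete (standard max-plus conventions; my example rather than the post's):
```
% Tropical (max-plus) semiring: a \oplus b = \max(a, b), \quad a \odot b = a + b.
% A "tropically linear" (affine) map such as
f(x) = (a \odot x) \oplus b = \max(a + x,\; b)
% is piecewise-linear, hence nonlinear classically; e.g. \mathrm{ReLU}(x) = \max(x, 0) = x \oplus 0.
```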
AGI is diagonalization of distributive laws. The distributive law, while seemingly humble, captures a bimonoidal structure which in turn captures behavior of a neural network.
Hopf algebra is the future of machine learning. Hopf algebra is a tensor and a cotensor at once. It is capable of "learning" by updating its inner state so that the middle path (counit->unit) is the convolution identity (Dirac delta) of the top path (comult->id->antipode->mult).
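For reference, the standard antipode axiom this seems to be describing: with convolution of maps defined by f * g = m ∘ (f ⊗ g) ∘ Δ, the antipode S is the convolution inverse of the identity, and η ∘ ε is the convolution identity.
```
% Antipode axiom of a Hopf algebra
% (m = mult, \Delta = comult, \eta = unit, \varepsilon = counit, S = antipode):
m \circ (S \otimes \mathrm{id}) \circ \Delta
  \;=\; \eta \circ \varepsilon
  \;=\; m \circ (\mathrm{id} \otimes S) \circ \Delta
```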
Superposition and polysemanticity in machine learning have a natural explanation via convolution. To quote Wikipedia: “wherever there is a linear system with a ‘superposition principle’, a convolution operation makes an appearance.”
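The statement behind that quote, in the standard linear-systems form (notation assumed):
```
% If T is linear and shift-invariant, its action is convolution
% with its impulse response h = T\delta:
(Tf)(x) = (f * h)(x) = \int f(\tau)\, h(x - \tau)\, d\tau
```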
Daily reminder that all of machine learning, math, physics, probability, and computation can be explained in the context of convolution. Ask away and I will try to explain a particular connection.
I have been writing @nu_shell lately; I feel liberated and in control of my machine. Mostly utilities I wanted but had a hard time writing in zsh, and Rust was too much. Before you say shell is for simple things: no, you are forced to write simple things because your shell is bad.
@SlobodanDmitrov
Is Bjarne holding you at gunpoint? Write a line of code with undefined behavior if yes. Write a line of code without undefined behavior if no.
Note that last week was a culmination of a longer term endeavor. I am constantly reminded of the importance of choosing appropriate formalisms. If you pick wrong, boy are you in for a bad time. This also explains why the current research into this topic is not quite there.
If you want to work for an ML startup which approaches AGI from an angle that has been ignored by mainstream ML, DM me.
Job Posting:
Also, financing is reaching final stages but there is still space left in the round. If you are interested, DM me.
Attention in transformer models is just weighted recurrence relations that are progressively pruned. As a result, one can view transformers as a term rewriting system.
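For reference, the standard scaled dot-product attention that this reads a recurrence/rewriting structure into:
```
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```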
HVM by @VictorTaelin will be the foundation of future ML.
@BRussellsimp
Check out Hopf algebras, integrable systems, matroids (the Coxeter Matroids book). Anything with Coxeter groups is good. Symplectic and Poisson geometry. Tropical geometry. Combinatorics. Check out this paper and tell me if you find it interesting.
It's crazy how over time I have slowly replaced all of my command line tools with Rust equivalents 🦀
- cat → bat
- pip → uv
- grep → ripgrep
- htop → zenith
- fswatch → watchexec
Any other good ones?
This thread will contain some Rust tips and tricks, 1 per tweet. I will gradually add more.
1.) `matches!` macro
```
match a {
    A::S1 | A::S2 => true,
    A::S3 | A::S4 => false,
}
```
into
```
matches!(a, A::S1 | A::S2)
```
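For context, a minimal self-contained version of the same tip (the enum `A` and its variants are assumed from the snippet above, not from any real codebase):
```
// Hypothetical enum mirroring the snippet above.
#[allow(dead_code)]
enum A {
    S1,
    S2,
    S3,
    S4,
}

fn is_early_state(a: &A) -> bool {
    // matches! expands to a match that returns true for the listed
    // patterns and false for everything else.
    matches!(a, A::S1 | A::S2)
}

fn main() {
    assert!(is_early_state(&A::S1));
    assert!(!is_early_state(&A::S4));
    println!("ok");
}
```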
This Persi Diaconis/Amy Pang paper provides a solid intuition for Hopf learning theory. Both diffusion models and transformer models are Markov chains, and the connection between Hopf algebras and Markov chains is well known.
"Being able to transform states from one representation to another by the Fourier transform is not only convenient but also the underlying reason of the Heisenberg uncertainty principle. "
People are surprised by 4chan and research cultures interacting. NEETs make the best researchers; case in point: Hennig Brandt.
> be me
> want to cook stinky peepee to make gold
> get myself a sugarmommy to support this
> make 400 peepee recipes
> discover phosphorus in the process
@litgenstein
It's the concepts that are interesting, not the applications. Renormalization is just adjoint functors. I've written a brain dump.
But fundamentally, renormalization, being adjoint functors, is just a thing that allows you to optimize.
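For reference, the hom-set form of an adjunction F ⊣ G that these posts invoke (standard definition, not the brain dump's notation):
```
\mathrm{Hom}_{\mathcal{D}}(F A,\, B) \;\cong\; \mathrm{Hom}_{\mathcal{C}}(A,\, G B)
\quad \text{naturally in } A \in \mathcal{C},\; B \in \mathcal{D}.
```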