The existence of the Hamilton-Jacobi-Bellman equation is kind of a miracle. You are telling me that there is an equation that combines physics and dynamic programming?
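For reference (standard notation, not specific to the post): the finite-horizon HJB equation ties the value function of an optimal control problem to the system dynamics, which is exactly the physics-meets-dynamic-programming combination above.
```
% Finite-horizon HJB equation for dynamics \dot{x} = f(x, u), running cost \ell(x, u),
% terminal cost \phi, and value function V(x, t):
-\frac{\partial V}{\partial t}(x, t) = \min_{u} \Big[ \ell(x, u) + \nabla_x V(x, t) \cdot f(x, u) \Big],
\qquad V(x, T) = \phi(x).
```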
Entire fields (most of them?) such as machine learning, algebraic geometry, term rewriting, combinatorial optimization (including dynamic programming), and many others are just ways of finding suitable convolution inverses.
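To pin down the term (standard definition, my gloss rather than the post's): a convolution inverse of f is whatever g undoes f under convolution, with the Dirac delta playing the role of the identity.
```
% g is a convolution inverse of f when
(f * g)(x) = \int f(\tau)\, g(x - \tau)\, d\tau = \delta(x).
```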
Have been doing a lot of statistical mechanics lately. It’s actually really cool and a lot more satisfying as an interpretation of ML than some alternatives. I have a new result that I will publish soonish.
Go (the game) can be modeled via an Ising model. Not surprising: you have a grid with states at the intersections, and those states interact with each other.
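A toy sketch of what that could look like (my own illustration, not a calibrated model of Go): each intersection carries a spin-like state (+1 black, -1 white, 0 empty) and only nearest neighbours contribute to an Ising-style energy.
```
// Toy Ising-style energy over a 19x19 Go board (illustration only).
const N: usize = 19;

fn ising_energy(board: &[[i32; N]; N], coupling: f64) -> f64 {
    let mut energy = 0.0;
    for i in 0..N {
        for j in 0..N {
            // Only right and down neighbours, so each pair is counted once.
            if i + 1 < N {
                energy -= coupling * (board[i][j] * board[i + 1][j]) as f64;
            }
            if j + 1 < N {
                energy -= coupling * (board[i][j] * board[i][j + 1]) as f64;
            }
        }
    }
    energy
}

fn main() {
    let board = [[0i32; N]; N]; // empty board: zero interaction energy
    println!("E = {}", ising_energy(&board, 1.0));
}
```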
Over the last week, I have figured out how to do neural architecture search. Developers writing architectures by hand makes as much sense as writing assembly by hand. Even less, the suboptimality factor is orders of magnitude larger. Watch this space for more info.
I’m building a startup to make energy-based models viable. One gains interpretability, composability (larger models from smaller), and faster (cheaper) training and inference. I’m finalizing an investment round, but if you or someone you know want to invest, DM me. The math is crazy.
Shear is hands down the worst affine transformation, prove me wrong. Rotation? Awesome! Translation? Keep it coming! Reflection? You better believe it. Shear? Who? Who invited you?
Writing proofs using theorem provers is in some sense a return to the origins. Math started being written in sand, and it returns to being written in sand (silicon).
The sifting property is a crazy idea; it is, as far as I know, the only definition of function evaluation: f(x) amounts to convolving f with a Dirac delta shifted by x.
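In symbols, the standard statement of the sifting property:
```
f(x) = \int_{-\infty}^{\infty} f(\tau)\, \delta(x - \tau)\, d\tau = (f * \delta)(x)
```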
To clarify, they are both idempotent, i.e. P = P². The magic stems from the fact that things can be linear in the tropical sense while being nonlinear in the classical sense.
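One way to make the tropical point concrete (standard max-plus conventions; my example rather than the post's):
```
% Tropical (max-plus) semiring: a \oplus b = \max(a, b), \quad a \odot b = a + b.
% A "tropically linear" (affine) map such as
f(x) = (a \odot x) \oplus b = \max(a + x,\; b)
% is piecewise-linear, hence nonlinear classically; e.g. \mathrm{ReLU}(x) = \max(x, 0) = x \oplus 0.
```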
AGI is diagonalization of distributive laws. The distributive law, while seemingly humble, captures a bimonoidal structure which in turn captures behavior of a neural network.
Hopf algebra is the future of machine learning. Hopf algebra is a tensor and a cotensor at once. It is capable of "learning" by updating its inner state so that the middle path (counit->unit) is the convolution identity (Dirac delta) of the top path (comult->id->antipode->mult).
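For reference, the standard antipode axiom this seems to be describing: with convolution of maps defined by f * g = m ∘ (f ⊗ g) ∘ Δ, the antipode S is the convolution inverse of the identity, and η ∘ ε is the convolution identity.
```
% Antipode axiom of a Hopf algebra
% (m = mult, \Delta = comult, \eta = unit, \varepsilon = counit, S = antipode):
m \circ (S \otimes \mathrm{id}) \circ \Delta
  \;=\; \eta \circ \varepsilon
  \;=\; m \circ (\mathrm{id} \otimes S) \circ \Delta
```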
Superposition and polysemanticity in machine learning have a natural explanation via convolution. To quote Wikipedia: “wherever there is a linear system with a ‘superposition principle’, a convolution operation makes an appearance.”
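The statement behind that quote, in the standard linear-systems form (notation assumed):
```
% If T is linear and shift-invariant, its action is convolution
% with its impulse response h = T\delta:
(Tf)(x) = (f * h)(x) = \int f(\tau)\, h(x - \tau)\, d\tau
```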
Daily reminder that all of machine learning, math, physics, probability, and computation can be explained in the context of convolution. Ask away and I will try to explain a particular connection.
I have been writing @nu_shell lately; I feel liberated and in control of my machine. Mostly utilities I wanted but had a hard time writing in zsh, and Rust was too much. Before you say shell is for simple things: no, you are forced to write simple things because your shell is bad.
@SlobodanDmitrov
Is Bjarne holding you at gunpoint? Write a line of code with undefined behavior if yes. Write a line of code without undefined behavior if no.
Note that last week was a culmination of a longer term endeavor. I am constantly reminded of the importance of choosing appropriate formalisms. If you pick wrong, boy are you in for a bad time. This also explains why the current research into this topic is not quite there.
If you want to work for an ML startup which approaches AGI from an angle that has been ignored by mainstream ML, DM me.
Job Posting:
Also, financing is reaching final stages but there is still space left in the round. If you are interested, DM me.
Attention in transformer models is just weighted recurrence relations that are progressively pruned. As a result, one can view transformers as a term rewriting system.
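For reference, the standard scaled dot-product attention that this reads a recurrence/rewriting structure into:
```
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```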
HVM by @VictorTaelin will be the foundation of future ML.
@BRussellsimp
Check out Hopf algebras, integrable systems, matroids (the Coxeter Matroids book). Anything with Coxeter groups is good. Symplectic and Poisson geometry. Tropical geometry. Combinatorics. Check out this paper and tell me if you find it interesting.
It's crazy how over time I have slowly replaced all of my command line tools with Rust equivalents 🦀
- cat → bat
- pip → uv
- grep → ripgrep
- htop → zenith
- fswatch → watchexec
Any other good ones?
This thread will contain some Rust tips and tricks, 1 per tweet. I will gradually add more.
1.) `matches!` macro
```
match a {
    A::S1 | A::S2 => true,
    A::S3 | A::S4 => false,
}
```
into
```
matches!(a, A::S1 | A::S2)
```
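For context, a minimal self-contained version of the same tip (the enum `A` and its variants are assumed from the snippet above, not from any real codebase):
```
// Hypothetical enum mirroring the snippet above.
#[allow(dead_code)]
enum A {
    S1,
    S2,
    S3,
    S4,
}

fn is_early_state(a: &A) -> bool {
    // matches! expands to a match that returns true for the listed
    // patterns and false for everything else.
    matches!(a, A::S1 | A::S2)
}

fn main() {
    assert!(is_early_state(&A::S1));
    assert!(!is_early_state(&A::S4));
    println!("ok");
}
```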
This Persi Diaconis/Amy Pang paper provides a solid intuition for Hopf learning theory. Both diffusion models and transformer models are Markov chains, and the connection between Hopf algebras and Markov chains is well known.
"Being able to transform states from one representation to another by the Fourier transform is not only convenient but also the underlying reason of the Heisenberg uncertainty principle. "
People are surprised by 4chan and research cultures interacting. NEETs make the best researchers; case in point: Hennig Brandt.
> be me
> want to cook stinky peepee to make gold
> get myself a sugarmommy to support this
> make 400 peepee recipes
> discover phosphorus in the process
@litgenstein
It's the concepts that are interesting, not the applications. Renormalization is just adjoint functors. I've written a brain dump.
But fundamentally, renormalization, being adjoint functors, is just a thing that allows you to optimize.
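For reference, the hom-set form of an adjunction F ⊣ G that these posts invoke (standard definition, not the brain dump's notation):
```
\mathrm{Hom}_{\mathcal{D}}(F A,\, B) \;\cong\; \mathrm{Hom}_{\mathcal{C}}(A,\, G B)
\quad \text{naturally in } A \in \mathcal{C},\; B \in \mathcal{D}.
```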