I think I just discovered a very strange text-davinci-002-render-sha artifact. If you type the letter A (in all caps) 373 times, with a space between each A, the model will produce a hyper-specific answer to a random question. Has anyone seen this before?
I'd place my bets that discretized (symbolic) systems have the greatest chance of being the first to demonstrate these properties. Extremely early progress at Symbolica already points in this direction. As scale increases, discrete systems may become the most capable models yet.
NEW: Vinod Khosla @vkhosla is betting on a former Tesla autopilot engineer @vr4300, who quit to found @symbolica, which will build small AI models that can reason: “We love people coming from left field,” Khosla said in an exclusive chat with @fortune.
@ylecun
I think this is approaching the problem from the wrong direction. If you can write a simulator, why not just build the model correctly from induction to begin with?
It's becoming increasingly clear to me that @VictorTaelin is onto something huge with HVM. This is going to be a massive deal for AI & computing in the upcoming years.
HVM is becoming the world's fastest λ-calculator! For perspective, let's perform a Radix Sort on a Scott Tree with millions of ints, vs. state-of-the-art runtimes:
JavaScript (V8): 29.081s
Haskell (GHC): 11.073s
Kind (HVM): 2.514s
How is that possible? See below ↓
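For readers who haven't seen the benchmark, here is a rough Python sketch of the tree-based radix sort idea being measured: integers are inserted into a binary trie keyed by their bits, and an in-order traversal yields the sorted sequence. This is only an illustration of the algorithm's shape, not the actual Kind/HVM code; the 24-bit key width and the nested-dict trie representation are assumptions made for the sketch.

```python
# Rough sketch of a bit-trie radix sort -- an illustration only, NOT the
# actual Kind/HVM benchmark code. Key width and trie representation assumed.

import random

BITS = 24  # assumed key width

def insert(trie, x, bit=BITS - 1):
    """Insert integer x into a binary trie keyed by its bits (MSB first)."""
    if bit < 0:
        trie["count"] = trie.get("count", 0) + 1  # leaf: count duplicates
        return
    child = trie.setdefault((x >> bit) & 1, {})
    insert(child, x, bit - 1)

def to_sorted(trie, prefix=0, bit=BITS - 1, out=None):
    """In-order traversal of the trie reconstructs the sorted values."""
    if out is None:
        out = []
    if bit < 0:
        out.extend([prefix] * trie["count"])
        return out
    for branch in (0, 1):  # 0-subtree before 1-subtree => ascending order
        if branch in trie:
            to_sorted(trie[branch], (prefix << 1) | branch, bit - 1, out)
    return out

if __name__ == "__main__":
    xs = [random.randrange(1 << BITS) for _ in range(10_000)]
    trie = {}
    for x in xs:
        insert(trie, x)
    assert to_sorted(trie) == sorted(xs)
    print("bit-trie radix sort matches sorted()")
```

The relevant property is that the two subtrees at every node are independent pieces of work, which is the kind of structure a runtime like HVM is designed to evaluate in parallel.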
@VictorTaelin
Seeding your brain with the correct abstractions to come to powerful conclusions is always the most time consuming part. It's not time wasted!
What's really going on in machine learning? Just finished a deep dive using (new) minimal models. Seems like ML is basically about fitting together lumps of computational irreducibility ... with important potential implications for science of ML, and future tech...
The crappiness of the Humane AI Pin reported here is a great example of the underappreciated capability-reliability distinction in gen AI. If AI could *reliably* do all the things it's *capable* of, it would truly be a sweeping economic transformation.
@lexfridman
Or, also like computer science, you know you've found the correct tweet, friendship, love, marriage, or meaning of existence when you can reduce the computational complexity to O(1).
Synthetic data is an incredibly silly concept. If you can build a synthetic data generation engine, then by extension it should be possible to directly build a model that understands the innate structure of the task, skipping the need for more training data to learn it.
In heated debate about dongle chaining:
"Build me a keyboard to display adapter." - me, confidently
"I think that's just a computer." -
@harlanhaskins
I am foiled again.
@souplovr23
Plots like these are shadows of beautiful high dimensional structure that we will never be able to visualize. We can pattern match with symbols, but imagine having the ability to see data like this in its true form.
Amazing how far @symbolica has come since I first met Tim at NeurIPS in 2022. I'm very excited for the upcoming MLST episode on categorical deep learning which will dive into the tech that we are building to pave the way towards structured symbolic cognition in machines!
Exciting times ahead for @symbolica - we caught up with Dr. Paul Lessard, a Principal Scientist on their founding team, about their neurosymbolic/category theory approach to taming deep learning as a prelude to our special edition we filmed in London. Cameo from @vr4300.
@bindureddy
This is total nonsense, honestly. If they had a reasoning engine, spending time and energy using it to generate data to train a larger non-reasoning model would be absurd.
It was inspiring to see much of what I spent 4.5 years of my life working on launched yesterday at the Tesla We, Robot event. I'm proud of my friends and coworkers who pushed it over the finish line. Tesla is building the future. It continues to inspire me to do the same.
To reiterate something that I can't stress enough: models capable of dynamic adaptation (true generalization) will necessarily need to define their own loss function. This is incompatible with gradient descent or any gradient-based learning method we know of today.
At this point I'm fairly confident that AGI will never be achieved by scaling a single system / architecture. It will most certainly involve bootstrapping many systems / architectures with themselves. The first architecture in this process hasn't yet been created. LLMs are not it.
Reasoning cannot magically emerge from data alone. You need the right architecture, the right data, and critically: the ability to interact with a mechanism that validates the result of your reasoning. Can you learn to write programs by looking only at code? No, you need to run the code.
Was a blast building the hardware and firmware for this project. If anyone is interested, I can post a thread on how we pulled it off! Go watch the video!
I finished making all 456 wireless explosive charges for MrBeast's Squid Game. Go watch it!! I'll send a board to a random person who retweets this (does not include explosive).
@ID_AA_Carmack
It's getting there, especially with embedded support improving. SPARK charges licenses for the compiler but also provides support. Rust is free but changes rapidly. The biggest roadblock to using it in safety-critical applications is the lack of good LLVM support for hardened architectures.
@nice_byte
Getting fearless concurrency / async threading working in C / C++ is a complete nightmare, for instance. Not to mention the crazy number of crates Rust has that you can just cargo add and be off to the races instead of dealing with compiler and linker errors for days.
There won't be *one* way to build AGI, similarly to how there isn't *one* way to derive the value of Pi. Instead, we will see many subtly different algorithms that yield AGI bootstrapping each other until we find an algorithm that optimizes for the hardware we have.
@mattmireles @OpenAI @ilyasut @sama
What are you talking about? If AGI had been achieved internally, OpenAI's financial issues would be solved and the board would not have any complaints.
@hi_tysam
Love the premise, but this is like saying "the reduction of entropy" in a zip file is a world model. I don't disagree insofar as this is the fundamental information-theoretic container of the world built by the model, but as with zip, this isn't actually useful.
@chipro
Building tools is like building a factory that builds a product: it's often overlooked entirely or underestimated in complexity. Shoutout to the tool developers! 😁
@akbirthko
This take is operating at the wrong level of abstraction. Of course the brain is not doing category theory. Think of category theory more like the description language / compiler that provides a definition of what the brain is doing. It's more like the language than the program.
Computer vision didn't work until convolution. Sequence modelling didn't work until recurrence. NLP didn't work until the transformer. But yeah, it's all just scale and data, right? And also, the transformer is definitely the last architecture.
> flow matching outperforms diffusion
> llama model outperforms DiTs
> scaling data and compute
does this mean data and scale are the moat, and model architecture doesn't matter that much?
With Yann's latest admission that LLMs aren't enough to get to AGI, it's surprising to see this kind of attitude. Why are these companies focused on copying what's already being done? A huge focus needs to be placed on figuring out what next-gen architectures will look like.
1/ In 2021, we shared next-gen language + conversation capabilities powered by our Language Model for Dialogue Applications (LaMDA). Coming soon: Bard, a new experimental conversational #GoogleAI service powered by LaMDA.
My niche symbolic AI twitter audience: what do you think about OpenAI's o1? I think we still have a long long way to go to true reasoning. What do you think?
There exists complete and absolute truth. The machines can tell us this absolute truth (a proof) with certainty of its correctness. A proof is a program that we can run. If we build this machine, we unlock the secrets of the computational universe. We must build the orb.
The basic idea is to bring out the SysAD bus and control signals and then commandeer the bus with the FPGA pictured on the right. This will let the FPGA change writes to the internals of the RCP on the fly. Let's see what happens!
NeurIPS 2022 was very insightful. It's interesting to see so much attention to "wide" as opposed to "deep" learning models. Hinton's new Forward-Forward Algorithm introduces a possible framework for learning over the forward pass, but still using gradients.
Re: the path forward to solve ARC-AGI...
If you are generating lots of programs, checking each one with a symbolic checker (e.g. running the actual code of the program and verifying the output), and selecting those that work, you are doing program synthesis (aka "discrete program search").
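To make that loop concrete, here is a minimal Python sketch of generate-and-check program synthesis over a toy DSL of grid transforms. The candidate set, the eval-based checker, and the example task are illustrative assumptions, not any particular ARC-AGI solver.

```python
# Minimal generate-and-check sketch: enumerate candidate programs, run each
# one (the "symbolic checker"), keep those consistent with the examples.
# Toy illustration only -- not an actual ARC solver.

# Candidate "programs": Python expressions over a list-of-lists grid `x`.
CANDIDATES = [
    "x",                                # identity
    "[row[::-1] for row in x]",         # mirror horizontally
    "x[::-1]",                          # mirror vertically
    "[list(col) for col in zip(*x)]",   # transpose
]

def run_program(src, grid):
    """The checker: actually run the candidate program on an input grid."""
    try:
        return eval(src, {"x": grid})  # toy illustration only; no sandboxing
    except Exception:
        return None

def synthesize(examples):
    """Keep every candidate consistent with all input/output example pairs."""
    return [
        src for src in CANDIDATES
        if all(run_program(src, inp) == out for inp, out in examples)
    ]

if __name__ == "__main__":
    examples = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]  # a horizontal-mirror task
    print(synthesize(examples))  # -> ['[row[::-1] for row in x]']
```

Real systems differ mainly in how candidates are generated (grammar enumeration, search, or LLM sampling); the select-what-passes-the-checker step is the same.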
@FutureJurvetson
We can't continue scaling compute and expecting reasoning to emerge. We must build a formal, rigorous study of architecture and then construct models that we know will exhibit the desired properties at scale. In civil engineering, you know the bridge you've designed won't collapse at scale.
The year is 2021.
A flagship product used by almost everyone who has anything to do with images, graphic design, or pixels gets *an option* to draw gradients that are not simply broken.
@VictorTaelin
This is sad but makes total sense. I think you will be very effective putting all your attention back into HVM! It will help HOC move along much faster. And that is all in direct favor of better symbolic AI algorithms too. Really excited to see the Mac Mini cluster come together!
I am willing to bet that all of the compute necessary to run a model in real time with emergence indistinguishable from human intelligence is contained in a single RTX 4090 GPU.
@ID_AA_Carmack
I feel that most good ML papers, and technical papers in general, could be presented in a 15-minute, 3Blue1Brown-style video that makes the actual-thing-that-helps extremely obvious. If the paper can't be presented in such a format, it is likely just fluff.
@ID_AA_Carmack
I think there should be an "ImageNet in the smallest possible self-contained executable" contest. Zip everything up into a single CPU-only executable and see who can achieve the best runtime vs. executable size ratio.
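A hypothetical scoring harness for such a contest could be as simple as the Python sketch below: time a contestant's self-contained, CPU-only binary and weigh the runtime against the packed executable size. The binary path, its CLI, and the combined score are made-up placeholders, not an existing benchmark.

```python
# Hypothetical contest scoring sketch -- all paths, CLI, and the scoring
# formula are placeholders, not an existing benchmark.

import os
import subprocess
import time

def score_entry(binary_path: str, val_dir: str) -> dict:
    size_bytes = os.path.getsize(binary_path)  # size of the packed executable
    start = time.perf_counter()
    # Assumed interface: the entry reads a directory of images and prints
    # one predicted label per line.
    subprocess.run([binary_path, val_dir], check=True, capture_output=True)
    runtime_s = time.perf_counter() - start
    return {
        "size_mib": size_bytes / 2**20,
        "runtime_s": runtime_s,
        "score": runtime_s * size_bytes,  # one way to combine the two; lower is better
    }

if __name__ == "__main__":
    # Hypothetical entry and validation-set paths.
    print(score_entry("./imagenet_entry", "./imagenet_val"))
```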
Today, I’m proud to announce @DayOneVC’s $150M fund III that brings our AUM to over $450M.
Our north star remains the same: we will keep betting on the most exceptional founders of our time working on the biggest ideas possible.
Diffusion is the rising tide that eventually submerges all frequencies, high and low 🌊
Diffusion is the gradual decomposition into feature scales, fine and coarse 🗼
Diffusion is just spectral autoregression 🤷🌈
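A small numpy sketch of the intuition: natural images have roughly power-law spectra while the Gaussian noise added during diffusion is spectrally flat, so as the noise level grows it submerges high frequencies before low ones, and denoising effectively proceeds coarse to fine. The synthetic 1/f image, the noise levels, and the crossover criterion below are illustrative assumptions, not anything from the original thread.

```python
# Illustrative sketch: as the (assumed) noise level sigma grows, the radial
# frequency at which flat Gaussian noise overtakes a ~1/f image moves lower.

import numpy as np

rng = np.random.default_rng(0)
N = 256

# Synthesize an image with an approximately 1/f amplitude spectrum.
fx = np.fft.fftfreq(N)[:, None]
fy = np.fft.fftfreq(N)[None, :]
freq = np.sqrt(fx**2 + fy**2)
amp = 1.0 / np.maximum(freq, 1.0 / N)  # ~1/f amplitude, capped at DC
spectrum = (rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))) * amp
image = np.fft.ifft2(spectrum).real
image /= image.std()

bins = np.rint(freq * N).astype(int)   # radial frequency bin for each FFT cell
counts = np.maximum(np.bincount(bins.ravel()), 1)

def radial_power(img):
    """Radially averaged power spectrum."""
    power = np.abs(np.fft.fft2(img)) ** 2
    return np.bincount(bins.ravel(), weights=power.ravel()) / counts

clean = radial_power(image)
for sigma in (0.5, 1.0, 4.0):  # increasing diffusion noise levels (assumed)
    noisy = radial_power(image + sigma * rng.normal(size=(N, N)))
    dominated = np.nonzero(noisy > 2 * clean)[0]  # bins where noise took over
    first = int(dominated[0]) if dominated.size else None
    # The crossover bin moves toward lower frequencies as sigma grows:
    # high frequencies drown first.
    print(f"sigma={sigma}: noise dominates from radial bin {first} upward")
```

Reversing the noise schedule then amounts to generating low frequencies first and conditioning the higher ones on them, which is (roughly) the sense in which diffusion behaves like autoregression over the spectrum.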
@killroy42 @mckaywrigley @Ciaran2493
Agreed. It's certainly impressive that it can extract the spatial information from the arrows, but its understanding of the hierarchy of the story almost certainly comes from the information about the movie in its text training set.