I love my team a lot and sometimes it’s stressful but life has never been so fulfilling. If you want to build AGI on a small team of people who care a lot with thousands of GPUs, please apply :)
We've raised $117M from
@natfriedman
and others to build an AI software engineer.
Code generation is both a product and a path to AGI, requiring new algorithms, lots of CUDA, frontier-scale training, RL, and a new UI.
We are hiring!
When I was 15, I decided I'd dedicate my career to building superhuman AI. Today, some of the coolest people on Earth funded this endeavor with *$28 million*!
Nat is a great sparring partner, coach and supporter. He has consistently pushed us to be even more ambitious while remaining practical. We are incredibly fortunate to have him as our major backer and now also as a board member at Magic.
has trained a groundbreaking model with many millions of tokens of context that performed far better in our evals than anything we've tried before.
They're using it to build an advanced AI programmer that can reason over your entire codebase and the
AI with long-term memory!
*A lot* of work left to do but happy to share a little more about what we've been up to.
It's been incredibly fulfilling to work with a wonderful team and the trust of our backers towards this milestone. Thank you for the opportunity <3
Meet LTM-1: LLM with *5,000,000 prompt tokens*
That's ~500k lines of code or ~5k files, enough to fully cover most repositories.
LTM-1 is a prototype of a neural network architecture we designed for giant context windows.
Nat and I are delighted to be investing $100m in , which is building a superhuman software engineer.
If a copilot generates $10b of revenue, how much is a colleague worth?
I believe in
#opensource
, and in making scientific results reproducible. Therefore, I am open-sourcing an implementation of DREAM, the state-of-the-art in Multi-agent model-free Deep RL.
GitHub:
#ai
#deeplearning
DREAM come true! :) Literally.
@polynoamial
,
@adamlerer
, and I developed an AI algorithm for multi-agent imperfect information games that's *100x more data-efficient* than the previous state-of-the-art. Check out the preprint (under NeurIPS review):
@polynoamial
@BRussellsimp
Yup, email - in high school, I got a DeepMind research scientist as a mentor through an enormously long email (multiple A4 pages) of the shape "here's what I want to do to beat your algo, can you talk to me every 2 weeks to tell me when I'm stupid so I can get smart?"
2 years ago I invested in
@magicailabs
as my first-ever seed-round investment. I felt comfortable doing it because I'd worked with
@EricSteinb
for years and believed he'd succeed. I'm so impressed with the progress they've made and can't wait to see what they do next!
Pretty damn cool that this worked out exactly as Nat hoped it would. Which other problems that traditional approaches are stuck on could be solved with crowdsourcing young AI talent + startup thinking + prize pool?
Ten months ago, we launched the Vesuvius Challenge to solve the ancient problem of the Herculaneum Papyri, a library of scrolls that were flash-fried by the eruption of Mount Vesuvius in 79 AD.
Today we are overjoyed to announce that our crazy project has succeeded. After 2000
Autism is a superpower, not a condition. Many of the most genuine, authentic, humble, and tenacious people I know are on the spectrum. Another one of my friends recently found out - loveliest dude ever, builds amazing things. Autism keeps you away from the prison of normalcy.
Introducing
@magicailabs
- what if AI was a colleague, not a tool? We're starting with an AI software engineer, aiming to help developers build, ship, and review code faster and more effectively.
@john_lam
It was just hard to make training stable and LTM-1 was the first time we succeeded at it. Had to make our own version of etc. Scaling should be less pain going forward now that we better understand the thing we made.
Paul Erdős (1500+ papers published) on your left. Terence Tao (10 years old in this photo, now a Fields medallist) on your right. Pondering a maths question together. This is adorable 😄
We're hiring across ML Eng, GPU kernel Eng, product & design, full-stack, infra - you name it, we'd like to pay you for it. If you're hungry to build something truly awesome, please ping eric@magic.dev
The coolest thing about it to me really isn't just that it happened (that's very very cool too!) but that it happened in exactly the way
@natfriedman
thought it would happen on a timeline that's roughly what I recall him saying it would be, like an item on a todo list
@SebastianDeRo
- thank you for leaving your previous CTO role to join us before there was any funding, and thank you for bringing your amazing team with you. The energy we have is incomparable <3
@GaryMarcus
@mustafasuleymn
Hey
@GaryMarcus
! I'd be happy to.
- Random half of prompts open
- other half secret & put out crypto hash
-
@goodside
or some other credible judge
Would need to define "LLM" as "any AI system"
Btw - love your contribution to the regulation discussion, seriously thank you.
@crazydonkey200
@arankomatsuzaki
Why did you choose to compare with Adafactor for LMs? AdamW, as used in most parts of the paper, is a more competitive baseline. Afaik nobody is using factorization to train their models these days.
@therealmjpoker
@bp22
Who coded it? ;) Jokes aside, well done! Such incredible improvement in bp's game - you 1000% deserve this win and I hope you had a ton of side bets
@jxmnop
@NeelNanda5
Getting AI wrong has an existential (i.e. millions+ of years, all people) downside case. A brilliant person like
@NeelNanda5
can singlehandedly move the needle on alignment, so this seems like a highly effective way for him to spend his time.
Other issues matter too and most
@jacobaustin132
@jeremyphoward
@ESYudkowsky
Hi! I should have been explicit, my bad. I don't have long tweets, so short here then long in next:
GANs have no signal to get complexity beyond Discriminator capability into the Generator. LLMs do during pretraining, and RLHF doesn't try to remove it.
More detail below
@migtissera
@magicailabs
We got more emails in the last 60 minutes than I would have thought we'd get in the first few days. We're targeting people who are creative about how to get tokens at low $ and effort. I think the $ is fair for many types of data sources.
@ylecun
To minimize perplexity on next-token prediction, it's helpful to predict further ahead, so it's likely that LLMs learn to do this internally as an emergent ability. How else would they guess the next word after "[Your paper] is", if not by thinking about what could come after?
@PrimordialAA
@saranormous
Yep, had 2 days worth of todos for the weekend and
@aranku_
and I did ~80% of those 2 days in one day today. I did check the news every now and then though, I have to admit!
@jacobaustin132
@jeremyphoward
@ESYudkowsky
- max learnable cognitive complexity for LLMs is a mix of the set of all forward generators and backward generators that generated the dataset (people, algorithms, randomness, ..), and the complexity of compressing them.
@SteverRobbins
@magicailabs
I believe automating work can shift societal incentives to focus on character over achievement. I know kind people with IQ 100 and a**holes with IQ 145. I'd like society to value kindness over achievement.
Rough plan in next tweet.
@mervyn_z
@polynoamial
It could, you are right! We just tested the pure form to see how well it learns from individual trajectories. Search is powerful and the direction Rebel takes is, in my opinion, likely closer to where the field is going to go than DREAM with no search.
@Tim_Dettmers
@VAribandi
You say k/v norms - do you mean q/k? It makes more sense to me that k/v norms could struggle, but neither q nor k contributes to the residual state, and you can get sharp attn with much lower logit values.
@LeeLeepenkman
@magicailabs
LTM-1 is just a PoC model for giant ctx. We're now working on making it smart too (= model scale). First needed to iron out the architecture.
@jacobaustin132
@jeremyphoward
@ESYudkowsky
- GANs don't ever learn beyond Discriminator's ability
- RLHF RM provides no signal beyond RM's ability (like GAN's D), yep.
HOWEVER: by the same reasoning, RM signal can't encourage the LLM to unlearn skills obtained in pretraining. And in practice, people actively try to keep them
@alexgraveley
"Bad people might use it" is the direction of the vector but the magnitude matters too. With encryption, risk seems predictably low. Would you say the same about nukes?
There are many more examples like this - capped downside, large upside - but this doesn't represent AGI.
@RichardSocher
@elonmusk
I see you shitting on other (awesome) companies a lot lately. Why do you do this? Just build something great and make it big. If you're the best, you'll win. Don't try to make others small, especially not those who do good things well.
@Govithinks
I wish CVs would work. But yes, I also rejected people with CVs so beautiful that they could be art pieces and took in some people who didn't send a CV at all. Experience matters, but CVs don't reliably predict character and intelligence - the two things I care about most.
@SteverRobbins
@magicailabs
In this order
1. build & align ASI
2. protect against misuse thru work with gov et al
3. broad access @ everyone
4. pray it's a stable equilibrium
Then:
- let ASI do science in health, energy, ..
- automate work. UBI; no poverty
- Explore universe
- Do whatever fulfills you
@kipperrii
Solo-funding a *new* charity/project to get its thing off the ground. Early stage charity fundraising and grant writing sucks if you don't have the network.
@GuillaumeLample
@GuillaumeLample
I suspect the LR schedule plays into "even after 1T tokens the 7B model was still improving". A 7B run on 500B tokens could give clarity.
@jacobaustin132
@jeremyphoward
@ESYudkowsky
Extremely good question. I currently think the answer is no for simple distillation and it would be capped at GPT4 smartness, but I'm not sure. This is not the case for humans
@BusterFranken
I think (almost) free is best. Maybe 5-10% of the average yearly income of a country and free for anyone who's in the bottom X% of earners. I observed that if it's completely free, people don't take it as seriously.
@BusterFranken
If data shows free is better, I'm for free! My reasoning was that people at free universities seem to take longer to complete their degrees (comparing Austria vs UK here)
@eladgil
Your music tweets are good - I remember you posted Ben Böhmer a while ago. Hadn't come across him before you posted and have been listening quite frequently since then
@RichardSocher
@elonmusk
And for what it's worth, character has tons of users:
Great team, a product people use a ton, they own the stack, they're moving fast, and a familiar interface for their audience.
@Govithinks
Alright, maybe that works. I haven't thought about this enough to know better, but precisely because of that, I'll remain humble in believing that Warren Buffett and the slightly younger giants outsmart me in investing. Cool that you found a way to get a margin though!