I have very exciting news to share.
@henshall_will
from TIME magazine wrote an amazing profile featuring yours truly!! 🤩 The link is in the next tweet.
I very very much encourage people to not publicly associate political positions with postures about AI.
We are possibly in the critical juncture where we decide whether this is going to be a problem we all face together or divided.
Do not let AI become party coded.
.@TrentonBricken
explains how we know LLMs are actually generalizing - aka they're not just stochastic parrots:
- Training models on code makes them better at reasoning in language.
- Models fine tuned on math problems become better at entity detection.
- We can just
We are excited to announce our new research organization: Epoch!
We are working on investigating AI developments and forecasting the development of Transformative AI. You can learn more in our announcement:
Summary below 🧵⬇️
My sense from talking to researchers doing AI safety-related work is that in the last two years there has been an update towards:
1. Shorter timelines
2. Slow takeoff
3. Less worrying about extinction and more about other catastrophic outcomes
Big personal announcement: I am taking a break from my PhD to work as a contractor for
@open_phil
, to research trends in Artificial Intelligence.
You can read more about what I have been up to lately in a post I've written:
At Epoch we have been publicly releasing compute estimates of major models such as GPT-4 and Claude 2.
Do you think we should keep doing this, even in cases where companies keep the compute deliberately secret?
Ok, I have changed my mind on moving compute thresholds. The EU AI Office does not have and does not plan for the capacity to update compute thresholds every six months. A dynamically moving threshold is a no-go.
I wrote an opinionated list of open research questions in AI forecasting, with some input from
@tamaybes
.
This will be useful if you are considering applying for a job at
@EpochAIResearch
, or want to build a portfolio to break into the field.
What a year. Epoch has gone from a small research group to a major research institute that governments are relying on. And it still is the best workplace in the world, thanks to my awesome colleagues!
2023 was a great year for Epoch!
We just published our annual impact report, listing our achievements in the past year and our plans for the coming year.
Here’s a summary 🧵:
AlphaGo Master and AlphaGo Zero were such massive outliers in scale. They single-handedly warp trends.
Analyses at Epoch need to be very deliberate on whether to include them!
A training run that cost $100,000 in early 2019 now costs about $700, a ~140x improvement.
@EpochAIResearch
's paper on algorithmic efficiency estimated a 3x/year improvement in efficiency, which would imply an expected 240x improvement over 5 years.
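The two multipliers quoted here can be checked against each other. A quick sketch using only the figures from the tweets:

```python
# Rough check of the cost-decline figures, assuming a 3x/year
# algorithmic efficiency improvement sustained over 5 years.
yearly_gain = 3.0
years = 5
implied_multiplier = yearly_gain ** years  # 3^5 = 243, i.e. ~240x

# The quoted per-run prices give a somewhat smaller multiplier.
cost_2019 = 100_000  # USD, from the tweet
cost_now = 700       # USD, from the tweet
observed_multiplier = cost_2019 / cost_now  # ~143x

print(f"implied: ~{implied_multiplier:.0f}x, observed: ~{observed_multiplier:.0f}x")
```

The observed ~140x sits below the central 3x/year estimate's ~240x, but comfortably within the same order of magnitude.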
@maurosicard
This information was never released but I'd expect it was a lot more. In terms of multipliers, let's say 3X from data and 2X from hardware utilization; in 2019 this was probably a V100 cluster (~100 fp16 TFLOPS) versus an H100 today (~1,000 TFLOPS), so that's ~10X. Very roughly, let's say ~100X.
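Combining the rough multipliers stated above (all of which the tweet flags as guesses, not released figures):

```python
# Product of the guessed multipliers: data, hardware utilization,
# and the V100 -> H100 fp16 throughput gap.
data_gain = 3             # ~3x from more data (assumption)
util_gain = 2             # ~2x from better hardware utilization (assumption)
hw_gain = 1000 / 100      # V100 ~100 fp16 TFLOPS vs H100 ~1,000 -> ~10x

combined = data_gain * util_gain * hw_gain  # = 60
print(f"~{combined:.0f}x, same order of magnitude as the quoted ~100X")
```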
**ML training compute has been doubling every 6 months since 2010!**
Our preprint "Compute Trends Across Three Eras of Machine Learning" is out.
🧵 Thread below ↓
1/
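A 6-month doubling time can be translated into more familiar growth rates; a minimal sketch:

```python
# A 6-month doubling time implies 4x/year growth,
# i.e. roughly six orders of magnitude per decade.
doubling_time_years = 0.5
per_year = 2 ** (1 / doubling_time_years)   # 4.0
per_decade = per_year ** 10                 # ~1.05e6
print(per_year, f"{per_decade:.2e}")
```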
Underappreciated fact: OpenAI is investing more compute in training than in inference.
GPT-4 has ~240B active parameters and was trained on a 25,000 A100 cluster. At 20% utilisation, this cluster serves 260B tokens/day. In Feb, OpenAI was serving 100B tokens/day.
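The serving number can be sanity-checked with a back-of-the-envelope calculation. The A100 peak throughput and the 2-FLOP-per-parameter-per-token inference cost below are my assumptions, not figures from the tweet:

```python
# Back-of-the-envelope check of the cluster's serving capacity.
n_gpus = 25_000
peak_flops = 312e12                 # A100 fp16 Tensor Core peak (assumption)
utilization = 0.20
active_params = 240e9
flop_per_token = 2 * active_params  # forward pass only (assumption)

cluster_flops = n_gpus * peak_flops * utilization
tokens_per_day = cluster_flops * 86_400 / flop_per_token
print(f"~{tokens_per_day / 1e9:.0f}B tokens/day")  # roughly the quoted 260B
```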
Anthropic cofounder
@samsamoa
states they will discontinue non-disparagement agreements and promises not to enforce existing agreements.
Is there confirmation from
@samsamoa
that this is indeed their account and Anthropic's official position?
After writing this article, we were invited to contribute to the national emergency plan of Argentina, which will make it the first country in the world with a national emergency plan for nuclear winter.
Also, check out a summary here!
New report from
@RiesgosGlobales
and
@ALLFEDALLIANCE
on "Food Security in Argentina in the event of an Abrupt Sunlight Reduction Scenario: A Strategic Proposal". Lots of useful ideas; counterpart reports for other countries would be great.
About two-thirds of performance improvements in language models can be attributed to scaling. The remaining one-third corresponds to innovations in model architecture and training.
This has profound implications.
Language models have come a long way since 2012, when recurrent networks struggled to form coherent sentences. Our new paper finds that the compute needed to achieve a set performance level has been halving every 5 to 14 months on average. (1/10)
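The 5-14 month halving times can be converted into effective annual efficiency multipliers; a quick sketch:

```python
# Effective annual efficiency gain implied by a given
# compute-halving time: 2 ** (12 / halving_months).
for halving_months in (5, 8, 14):
    annual_gain = 2 ** (12 / halving_months)
    print(f"halving every {halving_months} mo -> ~{annual_gain:.1f}x/year")
```

So the paper's range corresponds to roughly 1.8x-5.3x/year gains in effective compute from algorithms alone.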
A recent NYT article showcased
@EpochAIResearch
's data to push a China vs US narrative. Let’s set the record straight - the graph they made (reproduced below) is misleading. I explain why below 🧵
Short report by
@EpochAIResearch
!
We argue that we won’t see ML training runs over 1.22 years - longer runs will be outcompeted by runs that start later and use better hardware and algorithms.
Presenting a new Epoch double feature! Today we release an interactive model of AI timelines and an opinion piece by researcher
@MatthewJBar
explaining our approach to modeling the future of AI. 🧵
A recent paper assesses whether AI could cause explosive growth and suggests it won't.
It's good to have other economists seriously engage with the arguments that suggest that AI that substitutes for humans could accelerate growth, right?
Do you want to be paid for reading ML papers? At Epoch we are looking for contractors who can help annotate information from notable ML papers to inform our research and visualizations.
@EgeErdil2
I think people usually refer to an economic arrangement in which basic goods like food and clothing have a negligible cost compared to everyone's wealth, such that lowering their price would not increase demand.
What are the limits to the energy efficiency of CMOS microprocessors? In our new paper, published in the International Conference on Rebooting Computing, we propose a simple model to shed light on this question:
My experience with LW people is that they consistently underestimate how seriously other people will take the issue and overestimate how sudden AI developments will be
@StefanFSchubert
I believe that a share of why technical people are very pessimistic is the experience of banging their head against the problem with potential solutions and not succeeding.
I also believe that the underlying threat models for why an intelligent thing may be dangerous are more
How many large AI models are out there, who developed them, and for what applications? To answer this question, we present a new dataset tracking every AI model we could find trained with over 10^23 FLOP.
Highlights in thread 🧵
- It's expensive. Backprop costs twice as much compute as inference, so you would be tripling costs.
- You want to choose your target model size in advance depending on the training target, due to scaling laws.
- It's an attack vector. Remember Tai?
Why have continually learning agents not become a big thing yet?
It seems like it wouldn't be hard for OpenAI to build one from GPT-4, and it would massively change the pace of capabilities progress.
Over 2023,
@RiesgosGlobales
has become a fully-fledged science-policy organization.
I am incredibly proud of their work. It includes some major successes. I cover some highlights on thread 🧵
One of the most impactful pieces of work that can be done in the next couple of months on AI governance is developing frameworks for assessing risks from AI that governments could readily incorporate into their workflows.
Arrived in Bahamas! The place is absolutely amazing, and the people mind-blowing. I am very grateful to the FTX Foundation for organizing the fellowship!
This paper is the first comprehensive analysis of how the efficiency of language models has been improving over time. Its importance cannot be overstated!
1/ How quickly are state-of-the-art AI models growing?
The amount of compute used in AI training is a critical driver of progress in AI. Our analysis of over 300 machine learning systems reveals that the amount of compute used in training is consistently being scaled up at
I do predict it, because as a matter of fact this is something we have (mounting) evidence on.
2024 is not the year when AI hardware scaling will hit a wall -- both algorithms and compute will continue being important facets of AI development.
I don't, particularly, predict it, because the future is rarely that predictable -- but if 2024 is the year when AI hardware scaling seems to hit a temporary wall, and further progress past GPT-4 seems to be all about algorithms, this won't surprise me.
I can already guess that,
Some key tips if you want to talk about trends in compute:
1. Use logarithmic axes.
2. Do not fit your trends to only outliers.
3. Do not confuse FLOP and FLOPS.
I currently think this open letter is quite bad, and possibly net harmful. The proposed policy appears vague and misguided. I want to explain some of my thoughts. 🧵
@ATabarrok
Note that currently I would not trust Manifold Markets much more than a Twitter poll.
Metaculus has a track record, so I would put more trust there.
This old report gives 0.35% chance of full scale nuclear war.
Very thoughtful piece on the future of AI. I think the basic picture that we are going to be rushing through many OOMs of compute soon and that will unlock drastic capability increases is basically right.
Virtually nobody is pricing in what's coming in AI.
I wrote an essay series on the AGI strategic picture: from the trendlines in deep learning and counting the OOMs, to the international situation and The Project.
SITUATIONAL AWARENESS: The Decade Ahead
@alexandr_wang
25% chance before when?
This sentence is vacuous otherwise.
Anyway, if it's before 2050, Metaculus agrees with you that 25% is in the right ballpark. But for boring baseline reasons rather than any recent events.
Did you know that there is already a system falling within the purview of the recent AI Executive Order?
Learn more about this and biological ML models on
@EpochAIResearch
's new report!
The recently issued Executive Order requests regulatory oversight of AI models trained on primarily biological sequence data whose training compute exceeds 1e23 operations. Our report examines trends in training compute, data availability and points to potential regulatory gaps🧵
@3blue1brown
Disclaimer: I am a Bayesian
Having said that:
1) maliciously choosing a prior can allow you to infer whatever conclusions you want
2) Bayesian approaches are often computationally intractable
1/ This was an exciting article to write! We establish that compute growth is blazingly fast, doubling twice per year. I am particularly proud of how we expanded on previous work. I explain how below 🧵
@SashaMTL
FWIW this seems to me like a case of "you used an outdated model and so you got outdated results". Here is Midjourney v6 on the prompt "Mother Teresa fighting against poverty"
If you have recently received an email inviting you to the "First Latin American Conference on AI Safety" that claims that I am a confirmed participant, please be aware that this is false. I did not confirm attendance, nor do I endorse the organizing team.
This is such a clever short argument, with important implications for the AI progress to come.
I only recently learned of
@EgeErdil2
. And already I have learned a lot from his work.
Our paper "Power laws in Speedrunning and Machine Learning" is out now!
@EgeErdil2
and I develop a model for predicting record improvements in video game speedrunning 🎮 and apply it to predicting Machine Learning benchmarks 🤖. (1/6)
Epoch was born out of a project to systematically collect data about ML systems. I am elated to announce that the database keeps growing and becoming more useful by the moment!
Monkeypox in this week's Sentinel minutes: ~60% chance of a Public Health Emergency of International Concern (PHEIC) in the next 12 months; case fatality rate currently 3-5.5%, but probably extrapolates to 0.2% if it goes global; probably 1-5x as bad as seasonal flu if so.
After six months of working and teasing results on Twitter, our report on scaling constraints is finally out. One of the most ambitious
@EpochAIResearch
pieces to date.
1/ Can AI scaling continue through 2030?
We examine whether constraints on power, chip manufacturing, training data, or data center latencies might hinder AI growth. Our analysis suggests that AI scaling can likely continue its current trend through 2030.
The paper I wrote with
@Jess_Riedel
about forecasting timelines for quantum computing is now available on arXiv!
I also wrote a short explainer on Jess' blog if you want an overview of the results
I am coordinating a research effort to collate the biggest ever public dataset on parameters, compute and dataset size for landmark AI models. And we are looking for collaborators! (details in thread)
I am constantly moving countries and changing phone numbers. It is very tiresome that many of my apps are tied to mobile numbers, which I subsequently get locked out of.
What is a good solution to this?
This is a somewhat misleading picture.
AlphaZero and AlphaGoZero are outliers in terms of compute, and with more data the trend appears substantially slower, doubling every ~6 months.
Compute caps, if imperfectly enforced, can lead to a large compute overhang, plus have a large cost in preventing the development of useful AI.
I'd much rather we focused on improving auditing and threat detection, and addressing vulnerabilities as we scale AI systems.
I have written about a new forecasting aggregation method suggested by
@ericneyman
in a recent paper.
It is still too early to say with confidence, but I am moderately excited about their method. It performs well on
@metaculus
binary questions too!
I have inaugurated a new AI art exhibition — Spellbound.
Today I will reveal the first six exhibits.
Every day through November, I will show additional pieces from the collection.
See the gallery with the paintings released so far at
They seem somewhat uncalibrated on how much AI can grow in the coming years. Energy use for training has been going up 3.2x/year for the last few years. That's 1000x in six more years.
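The arithmetic behind the "1000x in six more years" figure:

```python
# Cumulative growth implied by a 3.2x/year increase
# in training energy use, compounded over six years.
annual_growth = 3.2
years = 6
total = annual_growth ** years
print(f"~{total:.0f}x")  # 3.2^6 ≈ 1074, i.e. ~1000x
```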
NEW SHIFT KEY:
We talked to Jonathan Koomey, one of the top researchers on the internet’s energy and environmental impact, about whether the AI boom will break the US electricity system.
His verdict: “Everyone needs to calm the heck down.”
.@EpochAIResearch
has released an interactive website as a supplement to the recent report from Tom Davidson about AI Takeoff Speeds.
We hope you will find it useful!
1/7 Is Claude 3.5 Sonnet actually better than GPT-4o on GPQA?
Benchmark results can be noisy due to randomness in model outputs, so we put Claude 3.5 Sonnet to a more rigorous test.
Here's what we found. 🧵
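Benchmark noise of this kind can be eyeballed with a simple binomial confidence interval. The question count and score below are illustrative assumptions, not Epoch's actual numbers:

```python
import math

# Treat each benchmark question as a Bernoulli trial and compute
# a normal-approximation 95% interval on the observed accuracy.
n_questions = 198   # GPQA Diamond size (assumption)
accuracy = 0.59     # hypothetical observed score

se = math.sqrt(accuracy * (1 - accuracy) / n_questions)
low, high = accuracy - 1.96 * se, accuracy + 1.96 * se
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

With only a couple hundred questions, the interval spans several percentage points, so small gaps between models can easily be noise.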
The majority of my followers think that inference compute will exceed training compute.
Interestingly, my colleague
@EgeErdil2
has a compelling argument that they will be roughly similar. Follow
@EpochAIResearch
to learn about it as soon as it comes out!
The Chinchilla scaling paper by Hoffmann et al. has been highly influential in the language modeling community. We tried to replicate a key part of their work and discovered discrepancies. Here's what we found. (1/9)
@DavidSKrueger
We shouldn't take people's stances from 2016 as overwhelming evidence of what they think now. The field of AI has changed enough that it would be strange if experts hadn't changed their minds on several key issues since.
Excited to release this new
@GovAI
report outlining the risks and benefits of open-sourcing highly capable AI systems and alternative methods for pursuing some open-source goals. (1/10)
Summary thread below 🧵
First, Riesgos Globales has advised the Spanish presidency of the EU Council on the regulation of foundation models. It's hard to know the counterfactual impact, but all our major recommendations were adopted in the EU AI Act.
I've read
@random_walker
's article three times by now and I just found it thought-provoking and a good summary of the current epistemic status of AI risk -- uncertain.
We received a paper review that points out we are missing an important reference to a suspiciously similar previous work - which happens to be the preprint version of our own paper.
How are we supposed to address that without breaking blind review?
#AcademicTwitter
I have been asked whether this overturns our previous result that training runs should not take longer than 14-15 months. The TL;DR is that I still think > 15-month training runs are unlikely.
New data insight: The training time for notable AI models is growing steadily.
Since 2010, we've seen a 1.2x increase per year in training duration for notable models (excluding those fine-tuned from base models). This trend has significant implications for AI development. 1/4
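Assuming the 1.2x/year rate held steadily over the roughly 14 years since 2010 (my assumption about the window), the cumulative growth works out as:

```python
# Cumulative increase in training duration implied by a
# steady 1.2x/year trend sustained for ~14 years.
annual = 1.2
years = 14
total = annual ** years
print(f"~{total:.1f}x longer training runs")  # 1.2^14 ≈ 12.8x
```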
@eigenrobot
@sama
can you confirm if the quote is correct or misleading?
Was it >$100M including salaries of devs, or just the cost of the compute?
And is it the cost of operating the cluster, or the cost of buying the hardware?
Does this factor in that the cluster can be reused?
Over 2022 and 2023, OpenPhil has pulled $350m in planned funding from GiveWell. This money could save about 70,000 lives today. That's the price of longtermism.