Yoshua Bengio:
'For most of these years, I did not think about the dual-use nature of science because our research results seemed so far from human capabilities and the work was only academic. It was a pure pursuit of knowledge, beautiful, but mostly detached from society until
This whole thread is kinda ridiculous tbh; y'all don't think others would've figured out RLHF by themselves? Or NVIDIA wouldn't have figured out GPUs are good for AI? Or you think it would've taken people (additional) *decades* to figure out scaling up just works? The amount of
❝ the longtermist/rationalist EA memes/ecosystem were very likely causally responsible for some of the worst capabilities externalities in the last decade
– Linch, longtermist grantmaker
I genuinely don't know what's going on here. Are some pause/stop (AI development) proponents so non-reflective/partisan at this point that they genuinely can't imagine how stopping AI could also *increase* x-risk?
spicy take: LM agents for automated safety research in the rough shape of will be the ultimate meta-approach to neglected safety approaches (); see appendix 'C. Progression of Generated Ideas' from for an
@GiadaPistilli
that's fine, but calling concerns about conscious AI / superintelligent machines 'sci-fi' without any arguments about why they're supposedly 'sci-fi' will make others (like me) want to engage with those [very wild-sounding] claims
@liron
@TheZvi
I find
@TheZvi
's arguments here quite far from 'what peak rationality looks like' and, tbh, (maybe uncharitably) motivated reasoning-flavored; i.e. I'd expect that on the vast majority of topics (including e.g. other x-risks), superforecasters predicting lower prob would probably
Instead of going to the effort of reading someone else's work and plagiarizing it, I recommend the other extreme of not bothering to read anyone else's work and just hoping you don't reinvent the wheel too many times.
Another great episode of the simulation sitcom we might be in: the money used to fund a big chunk of AI x-safety work can be causally traced to the success of the company planning to open-source AGI.
@ylecun
@tegmark
@RishiSunak
@vonderleyen
'Very few believe in the doomsday scenarios you have promoted.
You, Yoshua, Geoff, and Stuart are the singular-but-vocal exceptions.' -> this is obviously a ridiculously large falsehood, see e.g.
Policy changes from P(doom) crowd re AGI should be understood as *bargaining*, not honest updates on evidence.
Back when GPT-4 seemed alien tech, the policy was pausing GPT-4.5 – and preventing open weight release of Llama-2.
Let's accelerate them moving to next stages of grief.
The biggest effect of the effective altruists is that they radicalized a generation of AI researchers and raised billions of dollars for them.
DeepMind, OpenAI, Anthropic
This is a terrible framing and if you only give AI researchers those 2 choices, no wonder many of them will keep doing capabilities. The message we want looks more like 'AI researchers' skills are valuable and they can be applied productively to (especially prosaic) AI alignment
@teckwyn
@edavidds
@ylecun
Imagine spending most of your life building a technology so powerful, so good, that it will fix all the issues in the world. You're not just good at building this - you're one of the best. All of your fame and self-worth revolves around you being good at making this amazing
@AISafetyMemes
Haven't looked at that part of the system card, but it seems plausible that the model is (just) mistakenly simulating a conversation turn (the user's turn)
Spicy take: GPT-4-level open-source models (Llama 3?) will be a huge boon for AI safety research. Think of all the mech interp / activation steering on Sydney levels of 'craziness' / obvious misalignment.
@DavidSKrueger
Huh, this didn't feel at all like the consensus view in my interactions with the community. In particular, if
#2
is part of the consensus view, then the relative neglect of work on automated AI safety R&D and evals seems even wilder to me.
Also, having been at a forecasting
Nonsense QA with no prompting is not an interesting failure of large language models. Any vaguely sensible prompt (like the one from the Gopher paper) greatly reduces it, indicating it will not be a hard problem to completely solve with RL etc.
Yes, so many theoretical and empirical results for this, e.g. . Many more alignment researchers should spend more time reading on the science of DL and less time on rehashed MIRI-esque LW vagaries which have
@Simeon_Cps
@BogdanIonutCir2
@SharmakeFarah14
@QuintinPope5
Deep learning has a strong bias for shallow circuits that don't depend much on each other. It's hard to learn a deep circuit that only pays off once it's 100% complete. It's just like Darwinian evolution. But this also means doing any kind of long-range planning inside a forward
Hmmm, from what I see my colleagues in AI at Google London work bloody long hours and are extremely committed. This guy once came to London and told us to abandon Torch and use TensorFlow. That set the field of AI back by at least 6 months.
@JacquesThibs
@AkashWasil
@ESYudkowsky
those seem to me like pretty mild / reasonable-ish takes (though I wouldn't necessarily fully agree); and probably not even that far from what the current median alignment researcher believes
@liron
@ilyasut
Ilya doesn't seem to say anything false here; he only claims the possibility of aligned systems and his example is actually a proof of concept of the possibility; also, many of the claims in this thread don't seem to me to have a helpful tone, including about Paul
Can we please stop treating Eliezer like some oracle of Delphi? I'm pretty sure this was *not* about RLHF in any meaningful way. This isn't the first time either, e.g. the mental gymnastics performed by some to defend posts like .
(epistemic status: quick take)
Browsing through EAG London attendees' profiles and seeing what seems like way too many people / orgs doing (I assume DC) evals. I expect a huge 'market downturn' on this, since I can hardly see how there would be so much demand for DC evals in a
@ESYudkowsky
It seems true that LLMs don't seem to have been called out in advance by ~anyone, but unsupervised/self-supervised learning as "the way" had been called out in advance repeatedly by some of the biggest names in Deep Learning (including Bengio, Hinton and LeCun; including when RL
At these prices, you could e.g. filter all the estimated high-quality human text (50T tokens) for ~$2.5M. Could make conditional pretraining (from human preferences) even on all that text quite affordable
This is unreal. We're legitimately getting to a point where intelligence might be too cheap to meter
Gemini flash will soon cost ~$0.05/1m tokens
For reference, ~2 years ago gpt-3.5 was $0.06/1k tokens
In time we got 100x cheaper models that are 10x smarter
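The cost claims in the thread above can be checked with a quick back-of-the-envelope calculation (all figures are the tweet's own estimates, not verified pricing):

```python
# Back-of-the-envelope cost check for the thread's claims.
# All inputs are the tweets' estimates: ~50T tokens of high-quality
# human text, and a hypothetical Gemini Flash-tier price of $0.05/1M tokens.
tokens_total = 50e12           # ~50T tokens (estimate from the tweet)
price_per_million = 0.05       # USD per 1M tokens (assumed Flash-tier price)

cost = tokens_total / 1e6 * price_per_million
print(f"${cost:,.0f}")         # → $2,500,000
```

At these assumed prices, one pass over all that text indeed lands at ~$2.5M, consistent with the figure quoted in the thread.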
@carad0
nah, something like superalignment + (apparently) being open in discourse about both benefits and risks seem pretty good plans (and probably better than overcomplicated 4D chess pivotal acts); rats can be wildly overconfident about how their plans are supposed to be better than
I appreciate and am very grateful for Eliezer's historical contribution to early alignment field building and research. But maybe consider that communication might not be your strong point if you need to rely on lack of punctuation to communicate humor? Also, perhaps MIRI
@halomancer1
The screenshot is of a shitpost, good sir. Observe the lack of punctuation on the second sentence. Also, that it was screenshotted to strip the context.
Introducing The AI Scientist! 🧪🔬🔭 It creates research ideas & experiments, any necessary code, runs experiments, plots & analyzes data, writes an ENTIRE science manuscript, & performs peer review! Then builds on "published" discoveries. Fully automated. A new era in science? 🧵👇
@ylecun
i guess it depends on how you choose to operationalize 'terrified of AGI'; in my book, spending 1/2 of one's time on alignment (Sutskever), signing the FLI letter (Bengio) or saying AI takeover isn't inconceivable (Hinton) are signs of taking the risks seriously
You know AI alignment is going mainstream when Hinton discusses concerns of instrumental convergence and power-seeking:
'The scientist warned that this eventually might "create sub-goals like 'I need to get more power'".'
'Large language models (LLMs) can produce long, coherent passages of text, suggesting that LLMs, although trained on next-word prediction, must represent the latent structure that characterizes a document. Prior work has found that internal representations of LLMs encode one
We should make plans for how to use similar amounts of compute for automated AI safety R&D. This is, in my view, both kind of obvious and wildly neglected. E.g. it seems plausible to me that very large parts of interpretability work could be automated soon using LM agents:
Obviously it depends what it’s spent on, but £1.5 billion of compute is a significant amount. Even to private labs. GPT-4 was trained for around $100-150 million.
But how much of this goes to academia? The
@AISafetyInst
? British start ups? Research orgs like
@apolloaisafety
?
@littIeramblings
very plausible, and also seems not-that-dissimilar to Leopold's model (and from my median, fwiw); the good news, from my pov, would be that i expect that development model to be among the easier worlds, w.r.t. technical alignment
@ylecun
@tegmark
@RishiSunak
Yann, where is your confidence about scaling up LLMs certainly not leading to AGI/superintelligence coming from? Some theoretical results seem to me to be pointing the other way, e.g.
@davidad
hmm, can you say more about why you expect them to be 'more controllable, interpretable, generalizable within-task, and have fewer emergent abilities'?
Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval.
For background, this tests a model’s recall ability by inserting a target sentence (the "needle") into a corpus of
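A minimal sketch of how such a needle-in-a-haystack harness can be built (function name, filler text, and prompt wording are all illustrative, not Anthropic's actual eval code):

```python
# Sketch of a needle-in-a-haystack recall eval: plant a target sentence
# ("the needle") at a chosen depth inside a long filler corpus, then build
# the retrieval prompt that would be sent to the model under test.
def build_haystack_prompt(needle: str, filler: str, n_copies: int, depth: float) -> str:
    chunks = [filler] * n_copies                 # the distractor corpus
    insert_at = int(depth * len(chunks))         # depth in [0, 1]: where the needle goes
    chunks.insert(insert_at, needle)
    context = "\n".join(chunks)
    return (f"{context}\n\n"
            "What is the most important fact in the documents above? "
            "Answer with the exact sentence.")

needle = "The best thing to do in San Francisco is eat a sandwich in Dolores Park."
prompt = build_haystack_prompt(needle, "Lorem ipsum dolor sit amet.", 1000, depth=0.5)
assert needle in prompt
```

The harness would then score the model's answer by checking whether it reproduces the needle; sweeping `depth` and `n_copies` maps recall across context positions and lengths.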
@dwarkesh_sp
@ShaneLegg
timelines maybe (though he's been pretty public about that); DeepMind's alignment plans (hopefully in more details than what's public)
In 2-4 years, if we're still alive, anytime you see a video this beautiful, your first thought will be to wonder whether it's real or if the AI's prompt was "beautiful video of 15 different moth species flapping their wings, professional photography, 8k, trending on Twitter".
1) Character AI already has over 20 million people spending 2 HOURS A DAY talking to AIs (aka fake people)
2) Sama said AIs will soon be superhuman at persuasion
3) Those superhuman persuaders will soon outnumber us 10000 to 1. And be hot.
An AI takeover scenario:
You can’t
@Simeon_Cps
I don't get how I should interpret this post. You can likely find 'several top experts in AI safety & governance' with probabilities in pretty much any range (at least between 1-99%), so finding some with >75% p(doom) doesn't seem to me to warrant the (apparent implicit)
quick take:
@CRSegerie
's should be required reading for ~anyone starting on AI safety (e.g. in the AGISF curriculum), especially if they're considering any model internals work (and of course even more so if they're specifically considering mech interp)
I find it pretty wild that automating AI safety R&D, which seems to me like the best shot we currently have at solving the full superintelligence control/alignment problem, no longer seems to have any well-resourced, vocal, public backers (with the superalignment team disbanded).
@ylecun
@elonmusk
this seems very overconfident on how low AI x-risk would be; e.g. 'The Precipice' puts x-risk from asteroid impact in a century at ~ 1 in a million; I find it pretty implausible that even the x-risk of misusing AI (supposing alignment is solved) is clearly < 1 in a million
unsure how to feel about this, rationally seems like a positive update, but emotionally feels a bit like there's now a 4 year deadline to solving superintelligence alignment
We need new technical breakthroughs to steer and control AI systems much smarter than us.
Our new Superalignment team aims to solve this problem within 4 years, and we’re dedicating 20% of the compute we've secured to date towards this problem.
Join us!
With decent progress on and continued progress on Redwood Research's agenda (e.g. , ), I'd be at >99% that a ~human-level automated alignment researcher could be built that would be safe to use massively (for
I think with a concerted effort we can very likely (> 90% probability) build AI capable of automating ~all human-level alignment research while also being incapable of doing non-trivial consequentialist reasoning in a single forward pass: . Related, there's
This really highlights how the next 3 years might be very consequential w.r.t. AI risk and for humanity in general. We're probably gonna get 10000x FLOPe gains, and I wouldn't be too surprised if the gains were even larger, since I expect a lot of post-training automation
'I think the research that was done by the Superalignment team should continue to happen outside of OpenAI and, if governments have a lot of capital to allocate, they should figure out a way to provide compute to continue those efforts. Or maybe there's a better way forward. But I
I thought Superalignment was a positive bet by OpenAI, and I was happy when they committed to putting 20% of their compute towards it. I stopped thinking about that kind of approach because OAI already had competent people working on it. Several of them are now gone.
It seems
@Noahpinion
I appreciate most of your takes, but this one is really bad; please do better on educating yourself on AI and x-risk, start with e.g. who signed this
quick take: I'd give 80% probability of TAI-capable systems by 2030, conditional on the 2e29 FLOP training run from , combined with the ML (especially post-training) automation I expect from systems in the shape of
The forward passes of current architectures are just too weak and pretraining doesn't incentivize it enough. I predict this will keep being the case as long as pretraining is where most capabilities come from (and probably at least until the 5e28 FLOP training runs 'data wall')
Really glad people are working on situational awareness evals. I think it's interesting that this is plausibly a very distinct capability from general knowledge, since performance on SAD was only weakly correlated with MMLU.
I used to think that situational awareness would be a
Challenges with unsupervised LLM knowledge discovery
paper page:
show that existing unsupervised methods on large language model (LLM) activations do not discover knowledge -- instead they seem to discover whatever feature of the activations is most
things are accelerating. pretty much nothing needs to change course to achieve agi imo. worrying about timelines is idle anxiety, outside your control. you should be anxious about stupid mortal things instead. do your parents hate you? does your wife love you?
though the funding landscape seems to me to create artificial scarcity in terms of how many of those people actually get to be paid to work on AI safety
One reason I've reduced my p-doom the past ~2 years is the sheer number of people going into AI Safety.
(MATS Program = ML Alignment and Theory Scholars Program )
(Deadline for MATS is tomorrow BTW! )
I expect this to be < 1% likely to happen e.g. this decade: this is what you get when you combine the intractabilities of 'All AGI development outside MAGIC would be prohibited, via a global ban on AI development above a compute threshold.' and 'understandable and controllable AI
The AI Summit consensus is clear: it's time for international measures. Here is a concrete proposal.
In our recent paper,
@jasonhausenloy
, Claire Dennis and I propose an international institution to address extinction risk from AI: MAGIC, a Multinational AGI Consortium.
@MelMitchell1
this assumes the AI researchers don't know about human intelligence and doesn't point out that cognitive scientists might not know that much about machine intelligence
@Simeon_Cps
I'm very skeptical of this line of reasoning, especially the part about RL being safer than LLMs. Sandboxing and RL are orthogonal, it seems to me; same for removing different environment / training components, especially e.g. for RETRO-like architectures. Also, a very powerful
I don't necessarily agree with all the points here, but I still think many of these are unfairly neglected/discounted considerations within the AI x-safety community
Introducing AI Optimism: a philosophy of hope, freedom, and fairness for all.
We strive for a future where everyone is empowered by AIs under their own control.
In our first post, we argue AI is easy to control, and will get more controllable over time.
Automated Design of Agentic Systems
Presents Meta Agent Search to demonstrate that we can use agents to invent novel and powerful agent designs by programming in code
proj:
abs:
github:
(shortform: )
Contra both the 'doomers' and the 'optimists' on (not) pausing. Rephrased: RSPs (done right) seem right.
Contra 'doomers'. Oversimplified, 'doomers' (e.g. PauseAI, FLI's letter, Eliezer) ask(ed) for pausing now / even earlier - (e.g. the
I'm afraid we could easily have a version of this in AI safety too, if we are careless about our (AI governance-related) messages, especially since technical AI safety is much more preparadigmatic than climate change issues
I don't necessarily agree with the numbers here, but I do think the argument of 'more AI safety work is [much] more tractable than pausing' is very likely true
The amount of economic damage that an
#AIPause
would do is probably measured in trillions of dollars per year.
Right now we are spending about $10M-$100M per year on AI safety.
Economics thus suggests that we should first increase the AI safety spend by a factor of 10,000
Agree this is a surprising blindspot for many in AI safety, especially for some of the loudest Pause/Stop advocates (though I think 'easily-steerable and deeply non-rebellious form of intelligence' is an overstatement; steering models seems surprisingly easy for now, but things
The AI Scientist makes me think the time has come for the ML community to regularly hold a "Scientist Turing Test." Reviewers try to judge if papers are AI vs. human generated. Let the best science win! 🧑🏽🔬🧬🧫🥼🤖🦾