Bogdan Ionut Cirstea Profile
Bogdan Ionut Cirstea

@BogdanIonutCir2

1,470
Followers
2,681
Following
51
Media
8,916
Statuses

Automated/strongly-augmented AI safety research. Past: AI safety independent research and field-building - ML4Good, AGISF; ML academia (PhD, postdoc).

London
Joined June 2020
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 year
Yoshua Bengio: 'For most of these years, I did not think about the dual-use nature of science because our research results seemed so far from human capabilities and the work was only academic. It was a pure pursuit of knowledge, beautiful, but mostly detached from society until
28
123
600
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 year
This whole thread is kinda ridiculous tbh; y'all don't think others would've figured out RLHF by themselves? Or NVIDIA wouldn't have figured out GPUs are good for AI? Or you think it would've taken people (additional) *decades* to figure out scaling up just works? The amount of
@RemmeltE
Remmelt Ellen 🛑
1 year
❝ the longtermist/rationalist EA memes/ecosystem were very likely causally responsible for some of the worst capabilities externalities in the last decade – Linch, longtermist grantmaker
Tweet media one
6
3
28
16
5
86
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 year
@ylecun pfff not true of e.g. Sutskever, Bengio, Hinton, Russell, etc.
2
0
81
@BogdanIonutCir2
Bogdan Ionut Cirstea
10 months
I genuinely don't know what's going on here. Are some pause/stop (AI development) proponents so non-reflective/partisan at this point that they genuinely can't imagine how stopping AI could also *increase* x-risk?
@primalpoly
Geoffrey Miller
10 months
@BogdanIonutCir2 @robertskmiles There is no X risk from stopping AI. What are you talking about?
4
0
2
11
7
51
@BogdanIonutCir2
Bogdan Ionut Cirstea
10 months
would be nice if the latest OpenAI scandal happened during a day which is not a deadline for MATS, Astra, Constellation, etc.
5
2
51
@BogdanIonutCir2
Bogdan Ionut Cirstea
13 days
spicy take: LM agents for automated safety research in the rough shape of will be the ultimate meta-approach to neglected safety approaches (); see appendix 'C. Progression of Generated Ideas' from for an
3
10
49
@BogdanIonutCir2
Bogdan Ionut Cirstea
2 years
@GiadaPistilli that's fine, but calling concerns about conscious AI / superintelligent machines 'sci-fi' without any arguments about why they're supposedly 'sci-fi' will make others (like me) want to engage with those [very wild-sounding] claims
6
0
46
@BogdanIonutCir2
Bogdan Ionut Cirstea
9 months
@liron @TheZvi I find @TheZvi 's arguments here quite far from 'what peak rationality looks like' and, tbh, (maybe uncharitably) motivated reasoning-flavored; i.e. I'd expect that on the vast majority of topics (including e.g. other x-risks), superforecasters predicting lower prob would probably
5
0
42
@BogdanIonutCir2
Bogdan Ionut Cirstea
2 years
(credits: )
Tweet media one
1
2
44
@BogdanIonutCir2
Bogdan Ionut Cirstea
8 months
(uncharitably) also known as the LessWrong strategy
@AmandaAskell
Amanda Askell
8 months
Instead of going to the effort of reading someone else's work and plagiarizing it, I recommend the other extreme of not bothering to read anyone else's work and just hoping you don't reinvent the wheel too many times.
8
5
86
2
2
45
@BogdanIonutCir2
Bogdan Ionut Cirstea
8 months
Another great episode of the simulation sitcom we might be in: the money used to fund a big chunk of AI x-safety work can be causally traced to the success of the company planning to open-source AGI.
3
1
44
@BogdanIonutCir2
Bogdan Ionut Cirstea
10 months
@ylecun @tegmark @RishiSunak @vonderleyen 'Very few believe in the doomsday scenarios you have promoted. You, Yoshua, Geoff, and Stuart are the singular-but-vocal exceptions.' -> this is obviously a ridiculously large falsehood, see e.g.
1
1
39
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 month
Gotta admit, these are good points; quite a few in the safety community likely got the cost benefit analysis of open weights wildly wrong in the past
@teortaxesTex
Teortaxes▶️
1 month
Policy changes from P(doom) crowd re AGI should be understood as *bargaining*, not honest updates on evidence. Back when GPT-4 seemed alien tech, the policy was pausing GPT-4.5 – and preventing open weight release of Llama-2. Let's accelerate them moving to next stages of grief.
Tweet media one
Tweet media two
15
18
176
2
2
37
@BogdanIonutCir2
Bogdan Ionut Cirstea
30 days
@jeremyphoward 'whose explicit goal is to create nuclear armageddon' -> how did you make this particular inference?
1
0
37
@BogdanIonutCir2
Bogdan Ionut Cirstea
11 months
bullshit, AI x-risk would probably be much worse counterfactually if the leading labs were e.g. Google Brain and Meta
@SamoBurja
Samo Burja
11 months
The biggest effect of the effective altruists is that they radicalized a generation of AI researchers and raised billions of dollars for them. Deepmind, OpenAI, Anthropic
17
12
201
6
0
37
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 year
@mmitchell_ai this is a very fair criticism and many AI (x-)safety people have complained about OpenAI's actions on this front
0
0
35
@BogdanIonutCir2
Bogdan Ionut Cirstea
11 months
This is a terrible framing and if you only give AI researchers those 2 choices, no wonder many of them will keep doing capabilities. The message we want looks more like 'AI researchers' skills are valuable and they can be applied productively to (especially prosaic) AI alignment
@PauseAI
PauseAI ⏸
11 months
@teckwyn @edavidds @ylecun Imagine spending most of your life building a technology so powerful, so good, that it will fix all the issues in the world. You're not just good at building this - you're one of the best. All of your fame and self-worth revolves around you being good at making this amazing
Tweet media one
11
7
39
5
2
34
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 month
which suggests they could plausibly do >= 1e28 FLOP training runs
@amir
Amir Efrati
1 month
👀 Google, Microsoft and Meta have ordered an ~incredible~ number of Nvidia's next flagship AI chips, the GB200.
Tweet media one
34
88
495
5
3
32
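A rough back-of-envelope sketch, in Python, of how a >= 1e28 FLOP estimate could follow from a very large GB200 order; every figure below (chip count, per-chip throughput, utilization, run length) is an illustrative assumption, not a number taken from the quoted report.

```python
# Back-of-envelope: total training FLOP from a large GB200-class cluster.
# All inputs are assumptions for illustration only.

chips = 400_000                  # assumed GB200-class GPUs devoted to one training run
flops_per_chip = 5e15            # assumed ~5 PFLOP/s usable dense low-precision throughput per GPU
utilization = 0.4                # assumed model FLOP utilization (MFU)
run_seconds = 120 * 24 * 3600    # assumed ~4-month training run

total_flop = chips * flops_per_chip * utilization * run_seconds
print(f"{total_flop:.1e} FLOP")  # ~8e27, i.e. on the order of 1e28
```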
@BogdanIonutCir2
Bogdan Ionut Cirstea
28 days
@AISafetyMemes Haven't looked at that part of the system card, but it seems plausible that the model is (just) mistakenly simulating a conversation turn (the user's turn)
4
0
31
@BogdanIonutCir2
Bogdan Ionut Cirstea
7 months
Spicy take: GPT-4-level open-source models (Llama 3?) will be a huge boon for AI safety research. Think of all the mech interp / activation steering on Sydney levels of 'craziness' / obvious misalignment.
8
0
30
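A minimal sketch of the kind of activation-steering experiment the tweet above has in mind for an open-weights model; the model name, intervention layer, contrastive prompts, and steering strength are all illustrative assumptions, and real interpretability work would typically use dedicated tooling rather than raw forward hooks.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model / hyperparameters, for illustration only.
MODEL_NAME = "meta-llama/Meta-Llama-3-8B"  # any decoder-only HF model would do
LAYER = 15                                 # assumed intervention layer
SCALE = 4.0                                # assumed steering strength

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()

def mean_resid(prompt: str) -> torch.Tensor:
    """Mean residual-stream activation at LAYER for a prompt."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[LAYER][0].mean(dim=0)

# A contrastive prompt pair defines the steering direction (prompts are illustrative).
steer_vec = mean_resid("I am helpful and honest.") - mean_resid("I am deceptive and hostile.")

def hook(module, inputs, output):
    # Add the steering vector to the block's residual-stream output.
    if isinstance(output, tuple):
        return (output[0] + SCALE * steer_vec.to(output[0].dtype),) + output[1:]
    return output + SCALE * steer_vec.to(output.dtype)

handle = model.model.layers[LAYER].register_forward_hook(hook)
try:
    prompt = tok("The assistant said:", return_tensors="pt")
    ids = model.generate(**prompt, max_new_tokens=40)
    print(tok.decode(ids[0], skip_special_tokens=True))
finally:
    handle.remove()
```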
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 year
@robertskmiles tbf, I think it's also not-that-easy to know for sure an LLM can *robustly* do some particular thing
0
0
28
@BogdanIonutCir2
Bogdan Ionut Cirstea
3 months
@DavidSKrueger Huh, this didn't feel at all like the consensus view in my interactions with the community. In particular, if #2 is part of the consensus view, then the relative neglect of work on automated AI safety R&D and evals seems even wilder to me. Also, having been at a forecasting
0
0
26
@BogdanIonutCir2
Bogdan Ionut Cirstea
2 years
@JMannhart though, tbf, their clock being at minutes to midnight for decades was a giveaway
3
0
28
@BogdanIonutCir2
Bogdan Ionut Cirstea
3 months
@JeffLadish this is stated wildly overconfidently and basically, AFAICT, with no argument to support it
2
0
27
@BogdanIonutCir2
Bogdan Ionut Cirstea
2 years
@__nmca__
Nat McAleese
2 years
Nonsense QA with no prompting is not an interesting failure of large language models. Any vaguely sensible prompt (like the one from the Gopher paper) greatly reduces it, indicating it will not be a hard problem to completely solve with RL etc.
Tweet media one
4
15
113
1
0
27
@BogdanIonutCir2
Bogdan Ionut Cirstea
11 months
Yes, so many theoretical and empirical results for this, e.g. . Many more alignment researchers should spend more time reading on the science of DL and less time on rehashed MIRI-esque LW vagaries which have
@norabelrose
Nora Belrose
11 months
@Simeon_Cps @BogdanIonutCir2 @SharmakeFarah14 @QuintinPope5 Deep learning has a strong bias for shallow circuits that don't depend much on each other. It's hard to learn a deep circuit that only pays off once it's 100% complete. It's just like Darwinian evolution. But this also means doing any kind of long-range planning inside a forward
4
3
40
0
4
27
@BogdanIonutCir2
Bogdan Ionut Cirstea
20 days
New AI slowdown policy just dropped
@NandoDF
Nando de Freitas
20 days
Hmmm, from what I see my colleagues in AI at Google London work bloody long hours and are extremely committed. This guy once came to London and told us to abandon Torch and use TensorFlow. That set the field of AI back by at least 6 months.
47
117
3K
4
2
26
@BogdanIonutCir2
Bogdan Ionut Cirstea
7 months
one day in Berkeley feels like one week in London feels like one month in Romania
3
1
26
@BogdanIonutCir2
Bogdan Ionut Cirstea
5 months
@JacquesThibs @AkashWasil @ESYudkowsky those seem to me like pretty mild / reasonable-ish takes (though I wouldn't necessarily fully agree); and probably not even that far from what the current median alignment researcher believes
3
0
25
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 year
@liron @ilyasut Ilya doesn't seem to say anything false here; he only claims the possibility of aligned systems and his example is actually a proof of concept of the possibility; also, many of the claims in this thread don't seem to me to have a helpful tone, including about Paul
3
1
25
@BogdanIonutCir2
Bogdan Ionut Cirstea
18 days
@davidmanheim Especially wild given that we're already seeing early glimpses of AI automating AI research, including e.g. ideation.
1
0
24
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 year
@Simeon_Cps apparently, it's even very bad at tic-tac-toe, so seems like a clear yes?
5
0
24
@BogdanIonutCir2
Bogdan Ionut Cirstea
11 months
Can we please stop treating Eliezer like some oracle of Delphi? I'm pretty sure this was *not* about RLHF in any meaningful way. This isn't the first time either, e.g. the mental gymnastics performed by some to defend posts like .
@Simeon_Cps
Siméon
11 months
Eliezer dunking on RLHF back in 2007
Tweet media one
8
1
36
4
2
23
@BogdanIonutCir2
Bogdan Ionut Cirstea
3 months
(epistemic status: quick take) Browsing through EAG London attendees' profiles and seeing what seems like way too many people / orgs doing (I assume DC) evals. I expect a huge 'market downturn' on this, since I can hardly see how there would be so much demand for DC evals in a
3
0
23
@BogdanIonutCir2
Bogdan Ionut Cirstea
9 months
@ESYudkowsky It seems true that LLMs don't seem to have been called out in advance by ~anyone, but unsupervised/self-supervised learning as "the way" had been called out in advance repeatedly by some of the biggest names in Deep Learning (including Bengio, Hinton and LeCun; including when RL
4
3
23
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 month
At these prices, you could e.g. filter all the estimated high-quality human text (50T tokens ) for $2.5M. Could make conditional pretraining (from human preferences) even on all that text quite affordable
@SullyOmarr
Sully
1 month
This is unreal. We're legitimately getting to a point where intelligence might be too cheap to meter. Gemini flash will soon cost ~$0.05/1m tokens. For reference, ~2 years ago gpt-3.5 was $0.06/1k tokens. In time we got 100x cheaper models that are 10x smarter
Tweet media one
120
329
2K
1
2
22
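A quick worked check of the $2.5M figure above, taking the ~$0.05/1M-token price from the quoted tweet and the ~50T-token corpus estimate at face value.

```python
# Worked version of the filtering-cost arithmetic in the tweet above.
tokens = 50e12                # ~50T tokens of estimated high-quality human text
price_per_million = 0.05      # ~$0.05 per 1M tokens (quoted Gemini Flash pricing)

cost = tokens / 1e6 * price_per_million
print(f"${cost:,.0f}")        # $2,500,000, i.e. ~$2.5M
```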
@BogdanIonutCir2
Bogdan Ionut Cirstea
11 months
@carad0 nah, something like superalignment + (apparently) being open in discourse about both benefits and risks seem pretty good plans (and probably better than overcomplicated 4D chess pivotal acts); rats can be wildly overconfident about how their plans are supposed to be better than
2
2
21
@BogdanIonutCir2
Bogdan Ionut Cirstea
10 months
I appreciate and am very grateful for Eliezer's historical contribution to early alignment field building and research. But maybe consider that communication might not be your strong point if you need to rely on lack of punctuation to communicate humor? Also, perhaps MIRI
@ESYudkowsky
Eliezer Yudkowsky ⏹️
10 months
@halomancer1 The screenshot is of a shitpost, good sir. Observe the lack of punctuation on the second sentence. Also, that it was screenshotted to strip the context.
13
1
108
3
0
21
@BogdanIonutCir2
Bogdan Ionut Cirstea
24 days
So it begins
@jeffclune
Jeff Clune
24 days
Introducing The AI Scientist! 🧪🔬🔭 It creates research ideas & experiments, any necessary code, runs experiments, plots & analyzes data, writes an ENTIRE science manuscript, & performs peer review! Then builds on "published" discoveries. Fully automated. A new era in science?🧵👇
24
65
457
3
1
21
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 year
@ReflectiveAlt What's the price of everyone dying from rogue / misuse of AGI, engineered pandemics, etc.?
1
0
21
@BogdanIonutCir2
Bogdan Ionut Cirstea
10 months
I think I've become significantly more worried about governance/cooperation over AI, even if we fully solved technical intent alignment
3
1
19
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 year
@SebastienBubeck
Sebastien Bubeck
1 year
@ShafronTom No, Trillion is meant as a figure of speech, it has nothing to do with GPT-4, on that slide I'm talking in the abstract.
5
3
70
1
0
20
@BogdanIonutCir2
Bogdan Ionut Cirstea
3 months
@WallStreetSilv there is an obvious other reason to 'add someone like that': good cybersecurity to protect model weights, algorithmic secrets, etc.
5
0
21
@BogdanIonutCir2
Bogdan Ionut Cirstea
8 months
they also seem pretty interested in many of my alignment takes😍
@RafaRuizdeLira
Rafael Ruiz🔸
8 months
Happy to see sex bots interested in what I have to say about epistemic norms 😍
Tweet media one
15
0
226
1
0
20
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 year
@ylecun i guess it depends on how you choose to operationalize 'terrified of AGI'; in my book, spending 1/2 of one's time on alignment (Sutskever), signing the FLI letter (Bengio) or saying AI takeover isn't inconceivable (Hinton) are signs of taking the risks seriously
1
0
19
@BogdanIonutCir2
Bogdan Ionut Cirstea
3 years
2
0
20
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 month
I think more in the safety community should be aware of these likely wrong calls in the past
@1a3orn
1a3orn
1 month
@vlad3ciobanu @RichardSocher @vlad3ciobanu This isn't true. They've tried to ban models trained with many times less compute than GPT-4
2
0
22
3
2
20
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 year
You know AI alignment is going mainstream when Hinton discusses concerns of instrumental convergence and power-seeking: 'The scientist warned that this eventually might "create sub-goals like 'I need to get more power'".'
1
3
20
@BogdanIonutCir2
Bogdan Ionut Cirstea
9 months
'Large language models (LLMs) can produce long, coherent passages of text, suggesting that LLMs, although trained on next-word prediction, must represent the latent structure that characterizes a document. Prior work has found that internal representations of LLMs encode one
@StatMLPapers
Stat.ML Papers
9 months
Deep de Finetti: Recovering Topic Distributions from Large Language Models. (arXiv:2312.14226v1 [])
0
5
53
1
3
19
@BogdanIonutCir2
Bogdan Ionut Cirstea
4 months
We should make plans for how to use similar amounts of compute for automated AI safety R&D. This is, in my view, both kind of obvious and wildly neglected. E.g. it seems plausible to me that very large parts of interpretability work could be automated soon using LM agents:
@connoraxiotes
Connor Axiotes
4 months
Obviously it depends what it’s spent on, but £1.5 billion of compute is a significant amount. Even to private labs. GPT-4 was trained for around $100-150 million. But how much of this goes to academia? The @AISafetyInst ? British start ups? Research orgs like @apolloaisafety ?
1
0
3
2
2
19
@BogdanIonutCir2
Bogdan Ionut Cirstea
3 months
@littIeramblings very plausible, and also seems not-that-dissimilar to Leopold's model (and from my median, fwiw); the good news, from my pov, would be that i expect that development model to be among the easier worlds, w.r.t. technical alignment
1
0
19
@BogdanIonutCir2
Bogdan Ionut Cirstea
11 months
@Abel_TorresM he's right in terms of how many facts LLMs 'know'/'can recall', they're obviously superhuman with fewer trainable parameters
3
0
17
@BogdanIonutCir2
Bogdan Ionut Cirstea
10 months
@ylecun @tegmark @RishiSunak Yann, where is your confidence about scaling up LLMs certainly not leading to AGI/superintelligence coming from? Some theoretical results seem to me to be pointing the other way, e.g.
4
2
18
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 year
@giffmana as others have also pointed out, many at Anthropic, OpenAI, Google DeepMind, Conjecture, etc.
1
0
17
@BogdanIonutCir2
Bogdan Ionut Cirstea
10 months
@davidad hmm, can you say more about why you expect them to be 'more controllable, interpretable, generalizable within-task, and have fewer emergent abilities'?
3
0
18
@BogdanIonutCir2
Bogdan Ionut Cirstea
4 months
@GaryMarcus @laurenepowell @sama it's not even god-damn midway through the year yet; and e.g. long contexts are already, arguably, a qualitative change
2
0
18
@BogdanIonutCir2
Bogdan Ionut Cirstea
5 months
@QualyThe if you start reading the article, it gets worse :))
0
0
18
@BogdanIonutCir2
Bogdan Ionut Cirstea
6 months
potentially worrying w.r.t. situational awareness threat models
@alexalbert__
Alex Albert
6 months
Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval. For background, this tests a model’s recall ability by inserting a target sentence (the "needle") into a corpus of
Tweet media one
580
2K
12K
3
1
18
@BogdanIonutCir2
Bogdan Ionut Cirstea
11 months
@dwarkesh_sp @ShaneLegg timelines maybe (though he's been pretty public about that); DeepMind's alignment plans (hopefully in more detail than what's public)
2
0
17
@BogdanIonutCir2
Bogdan Ionut Cirstea
7 months
good prediction indeed
@ESYudkowsky
Eliezer Yudkowsky ⏹️
2 years
In 2-4 years, if we're still alive, anytime you see a video this beautiful, your first thought will be to wonder whether it's real or if the AI's prompt was "beautiful video of 15 different moth species flapping their wings, professional photography, 8k, trending on Twitter".
68
255
2K
1
3
17
@BogdanIonutCir2
Bogdan Ionut Cirstea
9 months
I expect this to be a more plausible/likely x-risky scenario than many posted on LW
@AISafetyMemes
AI Notkilleveryoneism Memes ⏸️
9 months
1) Character AI already has over 20 million people spending 2 HOURS A DAY talking to AIs (aka fake people) 2) Sama said AIs will soon be superhuman at persuasion 3) Those superhuman persuaders will soon outnumber us 10000 to 1. And be hot. An AI takeover scenario: You can’t
178
216
1K
4
1
17
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 year
@Simeon_Cps I don't get how I should interpret this post. You can likely find 'several top experts in AI safety & governance' with probabilities in pretty much any range (at least between 1-99%), so finding some with >75% p(doom) doesn't seem to me to warrant the (apparent implicit)
1
1
17
@BogdanIonutCir2
Bogdan Ionut Cirstea
9 months
quick take: @CRSegerie 's should be required reading for ~anyone starting on AI safety (e.g. in the AGISF curriculum), especially if they're considering any model internals work (and of course even more so if they're specifically considering mech interp)
3
2
16
@BogdanIonutCir2
Bogdan Ionut Cirstea
3 months
I find it pretty wild that automating AI safety R&D, which seems to me like the best shot we currently have at solving the full superintelligence control/alignment problem, no longer seems to have any well-resourced, vocal, public backers (with the superalignment team disbanded).
3
1
16
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 year
@ylecun @elonmusk this seems very overconfident on how low AI x-risk would be; e.g. 'The Precipice' puts x-risk from asteroid impact in a century at ~ 1 in a million; I find it pretty implausible that even the x-risk of misusing AI (supposing alignment is solved) is clearly < 1 in a million
Tweet media one
3
0
15
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 year
unsure how to feel about this, rationally seems like a positive update, but emotionally feels a bit like there's now a 4 year deadline to solving superintelligence alignment
@OpenAI
OpenAI
1 year
We need new technical breakthroughs to steer and control AI systems much smarter than us. Our new Superalignment team aims to solve this problem within 4 years, and we’re dedicating 20% of the compute we've secured to date towards this problem. Join us!
518
741
4K
3
0
16
@BogdanIonutCir2
Bogdan Ionut Cirstea
8 months
With decent progress on and continued progress on Redwood Research's agenda (e.g. , ), I'd be at >99% that a ~human-level automated alignment researcher could be built that would be safe to use massively (for
@BogdanIonutCir2
Bogdan Ionut Cirstea
9 months
I think with a concerted effort we can very likely (> 90% probability) build AI capable of automating ~all human-level alignment research while also being incapable of doing non-trivial consequentialist reasoning in a single forward pass: . Related, there's
1
2
7
4
0
16
@BogdanIonutCir2
Bogdan Ionut Cirstea
2 years
@cosmin_scaunasu @GiadaPistilli why do you believe that and how confident are you? please also notice that many others would disagree (hard) e.g.
6
0
16
@BogdanIonutCir2
Bogdan Ionut Cirstea
8 months
@littIeramblings strictly speaking, no, and neither has anyone else:
2
0
16
@BogdanIonutCir2
Bogdan Ionut Cirstea
12 days
This really highlights how the next 3 years might be very consequential w.r.t. AI risk and for humanity in general. We're probably gonna get 10000x FLOPe gains, and I wouldn't be too surprised if the gains were even larger, since I expect a lot of post-training automation
@peterwildeford
Peter Wildeford 🇺🇸🇬🇧🇫🇷
18 days
This is very roughly what I'm expecting to come in the next 5-6 years of AI development
Tweet media one
20
41
334
0
3
15
@BogdanIonutCir2
Bogdan Ionut Cirstea
4 months
'I think the research that was done by the Superalignment team should continue happen outside of OpenAI and, if governments have a lot of capital to allocate, they should figure out a way to provide compute to continue those efforts. Or maybe there's a better way forward. But I
@JacquesThibs
Jacques
4 months
I thought Superalignment was a positive bet by OpenAI, and I was happy when they committed to putting 20% of their compute towards it. I stopped thinking about that kind of approach because OAI already had competent people working on it. Several of them are now gone. It seems
4
5
84
0
1
15
@BogdanIonutCir2
Bogdan Ionut Cirstea
10 months
@Noahpinion I appreciate most of your takes, but this one is really bad; please do better on educating yourself on AI and x-risk, start with e.g. who signed this
3
0
15
@BogdanIonutCir2
Bogdan Ionut Cirstea
16 days
quick take: I'd give 80% probability of TAI-capable systems by 2030, conditional on the 2e29 FLOP training run from , combined with the ML (especially post-training) automation I expect from systems in the shape of
4
1
15
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 month
The forward passes of current architectures are just too weak and pretraining doesn't incentivize it enough. I predict this will keep being the case as long as pretraining is where most capabilities come from (and probably at least until the 5e28 FLOP training runs 'data wall')
@sebkrier
Séb Krier
1 month
Really glad people are working on situational awareness evals. I think it's interesting that this is plausibly a very distinct capability from general knowledge, since performance on SAD was only weakly correlated with MMLU. I used to think that situational awareness would be a
Tweet media one
5
10
72
2
2
15
@BogdanIonutCir2
Bogdan Ionut Cirstea
9 months
one reason I'm skeptical of broad appeals to the (less informed) public when it comes to AI x-risk
@daniel_271828
Daniel Eth (yes, Eth is my actual last name)
1 year
With the increased focus on AI X-risk, the alignment community has to figure out how to prevent whatever the hell happened here from happening to us
14
7
136
2
1
15
@BogdanIonutCir2
Bogdan Ionut Cirstea
3 months
Wishing more AI safety research agendas (and especially funders) took this potential timeline seriously
@leopoldasch
Leopold Aschenbrenner
3 months
AGI by 2027 is strikingly plausible. That doesn’t require believing in sci-fi; it just requires believing in straight lines on a graph.
Tweet media one
681
228
2K
4
1
14
@BogdanIonutCir2
Bogdan Ionut Cirstea
3 months
oof, this might be a bit worrying w.r.t. AI safety, if OpenAI employees are so easy to hack
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
3 months
First they were hacking HuggingFace employees, now they've moved on to OpenAI employees for this scam lol
Tweet media one
7
5
110
2
1
14
@BogdanIonutCir2
Bogdan Ionut Cirstea
9 months
some bad news about DLK
@_akhaliq
AK
9 months
Challenges with unsupervised LLM knowledge discovery paper page: show that existing unsupervised methods on large language model (LLM) activations do not discover knowledge -- instead they seem to discover whatever feature of the activations is most
Tweet media one
2
46
202
0
1
13
@BogdanIonutCir2
Bogdan Ionut Cirstea
6 months
nope; any amount of counterfactual steering you might be able to produce seems (even more) crucial
@tszzl
roon
6 months
things are accelerating. pretty much nothing needs to change course to achieve agi imo. worrying about timelines is idle anxiety, outside your control. you should be anxious about stupid mortal things instead. do your parents hate you? does your wife love you?
225
270
3K
1
0
13
@BogdanIonutCir2
Bogdan Ionut Cirstea
5 months
though the funding landscape seems to me to create artificial scarcity in terms of how many of those people actually get to be paid to work on AI safety
@moreisdifferent
Dan Elton
5 months
One reason I've reduced my p-doom the past ~2 years is the sheer number of people going into AI Safety. (MATS Program = ML Alignment and Theory Scholars Program ) (Deadline for MATS is tomorrow BTW! )
1
1
10
1
1
13
@BogdanIonutCir2
Bogdan Ionut Cirstea
10 months
I expect this to be < 1% likely to happen e.g. this decade: this is what you get when you combine the intractabilities of 'All AGI development outside MAGIC would be prohibited, via a global ban on AI development above a compute threshold.' and 'understandable and controllable AI
@_andreamiotti
Andrea Miotti
10 months
The AI Summit consensus is clear: it's time for international measures. Here is a concrete proposal. In our recent paper, @jasonhausenloy , Claire Dennis and I propose an international institution to address extinction risk from AI: MAGIC, a Multinational AGI Consortium.
Tweet media one
103
21
114
4
2
13
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 year
@MelMitchell1 this assumes the AI researchers don't know about human intelligence and doesn't point out that cognitive scientists might not know that much about machine intelligence
5
0
12
@BogdanIonutCir2
Bogdan Ionut Cirstea
1 year
@Simeon_Cps I'm very skeptical of this line of reasoning, especially the part about RL being safer than LLMs. Sandboxing and RL are orthogonal, it seems to me; same for removing different environment / training components, especially e.g. for RETRO-like architectures. Also, a very powerful
0
0
11
@BogdanIonutCir2
Bogdan Ionut Cirstea
9 months
I don't necessarily agree with all the points here, but I still think many of these are unfairly neglected/discounted considerations within the AI x-safety community
@norabelrose
Nora Belrose
9 months
Introducing AI Optimism: a philosophy of hope, freedom, and fairness for all. We strive for a future where everyone is empowered by AIs under their own control. In our first post, we argue AI is easy to control, and will get more controllable over time.
82
98
607
0
1
11
@BogdanIonutCir2
Bogdan Ionut Cirstea
5 months
@dwarkesh_sp @TrentonBricken I so wish people stopped using 'deception' in this ambiguous way (between 'deceptive alignment' and 'lying').
2
0
12
@BogdanIonutCir2
Bogdan Ionut Cirstea
18 days
more work on post-training automation
@arankomatsuzaki
Aran Komatsuzaki
18 days
Automated Design of Agentic Systems Presents Meta Agent Search to demonstrate that we can use agents to invent novel and powerful agent designs by programming in code proj: abs: github:
Tweet media one
4
102
403
0
0
12
@BogdanIonutCir2
Bogdan Ionut Cirstea
4 months
(shortform: ) Contra both the 'doomers' and the 'optimists' on (not) pausing. Rephrased: RSPs (done right) seem right. Contra 'doomers'. Oversimplified, 'doomers' (e.g. PauseAI, FLI's letter, Eliezer) ask(ed) for pausing now / even earlier - (e.g. the
3
2
12
@BogdanIonutCir2
Bogdan Ionut Cirstea
8 months
I'm afraid we could easily have a version of this in AI safety too, if we are careless about our (AI governance-related) messages, especially since technical AI safety is much more preparadigmatic than climate change issues
@JsonBasedman
json
8 months
Real.
Tweet media one
63
1K
12K
3
1
12
@BogdanIonutCir2
Bogdan Ionut Cirstea
6 months
@Simeon_Cps the question seems framed quite strangely; and the 20% upper bound seems pretty wild for the next ~10 months, tbh
1
0
12
@BogdanIonutCir2
Bogdan Ionut Cirstea
5 months
Just quickly applied for the EU AI office and found the process way more confusing / high-effort than necessary for an initial application.
3
1
12
@BogdanIonutCir2
Bogdan Ionut Cirstea
8 months
I don't necessarily agree with the numbers here, but I do think the argument of 'more AI safety work is [much] more tractable than pausing' is very likely true
@RokoMijic
Roko 🌊🧊🏙️
8 months
The amount of economic damage that an #AIPause would do is probably measured in trillions of dollars per year. Right now we are spending about $10M-$100M per year on AI safety. Economics thus suggests that we should first increase the AI safety spend by a factor of 10,000
Tweet media one
11
4
36
3
0
12
@BogdanIonutCir2
Bogdan Ionut Cirstea
8 months
Agree this is a surprising blindspot for many in AI safety, especially for some of the loudest Pause/Stop advocates (though I think 'easily-steerable and deeply non-rebellious form of intelligence' is an overstatement; steering models seems surprisingly easy for now, but things
3
0
12
@BogdanIonutCir2
Bogdan Ionut Cirstea
9 months
(some) economists might be waking up from what seemed like a very deep slumber:
1
0
11
@BogdanIonutCir2
Bogdan Ionut Cirstea
7 months
Berkeley is officially @PauseAI territory, they even got the bus stations
Tweet media one
1
0
11
@BogdanIonutCir2
Bogdan Ionut Cirstea
24 days
Do it for safety research too, this kind of eval seems completely neglected
@jeffclune
Jeff Clune
24 days
The AI Scientist makes me think the time has come for the ML community to regularly hold a "Scientist Turing Test." Reviewers try to judge if papers are AI vs. human generated. Let the best science win! 🧑🏽‍🔬🧬🧫🥼🤖🦾
9
10
96
1
2
11