Yoshua Bengio:
'For most of these years, I did not think about the dual-use nature of science because our research results seemed so far from human capabilities and the work was only academic. It was a pure pursuit of knowledge, beautiful, but mostly detached from society until
This whole thread is kinda ridiculous tbh; y'all don't think others would've figured out RLHF by themselves? Or NVIDIA wouldn't have figured out GPUs are good for AI? Or you think it would've taken people (additional) *decades* to figure out scaling up just works? The amount of
❝ the longtermist/rationalist EA memes/ecosystem were very likely causally responsible for some of the worst capabilities externalities in the last decade
– Linch, longtermist grantmaker
I genuinely don't know what's going on here. Are some pause/stop (AI development) proponents so non-reflective/partisan at this point that they genuinely can't imagine how stopping AI could also *increase* x-risk?
spicy take: LM agents for automated safety research in the rough shape of will be the ultimate meta-approach to neglected safety approaches (); see appendix 'C. Progression of Generated Ideas' from for an
@GiadaPistilli
that's fine, but calling concerns about conscious AI / superintelligent machines 'sci-fi' without any arguments about why they're supposedly 'sci-fi' will make others (like me) want to engage with those [very wild-sounding] claims
@liron
@TheZvi
I find
@TheZvi
's arguments here quite far from 'what peak rationality looks like' and, tbh, (maybe uncharitably) motivated reasoning-flavored; i.e. I'd expect that on the vast majority of topics (including e.g. other x-risks), superforecasters predicting lower prob would probably
Instead of going to the effort of reading someone else's work and plagiarizing it, I recommend the other extreme of not bothering to read anyone else's work and just hoping you don't reinvent the wheel too many times.
Another great episode of the simulation sitcom we might be in: the money used to fund a big chunk of AI x-safety work can be causally traced to the success of the company planning to open-source AGI.
@ylecun
@tegmark
@RishiSunak
@vonderleyen
'Very few believe in the doomsday scenarios you have promoted.
You, Yoshua, Geoff, and Stuart are the singular-but-vocal exceptions.' -> this is obviously a ridiculously large falsehood, see e.g.
Policy changes from P(doom) crowd re AGI should be understood as *bargaining*, not honest updates on evidence.
Back when GPT-4 seemed alien tech, the policy was pausing GPT-4.5 – and preventing open weight release of Llama-2.
Let's accelerate them moving to next stages of grief.
The biggest effect of the effective altruists is that they radicalized a generation of AI researchers and raised billions of dollars for them.
DeepMind, OpenAI, Anthropic
This is a terrible framing and if you only give AI researchers those 2 choices, no wonder many of them will keep doing capabilities. The message we want looks more like 'AI researchers' skills are valuable and they can be applied productively to (especially prosaic) AI alignment
@teckwyn
@edavidds
@ylecun
Imagine spending most of your life building a technology so powerful, so good, that it will fix all the issues in the world. You're not just good at building this - you're one of the best. All of your fame and self-worth revolves around you being good at making this amazing
@AISafetyMemes
Haven't looked at that part of the system card, but it seems plausible that the model is (just) mistakenly simulating a conversation turn (the user's turn)
Spicy take: GPT-4-level open-source models (Llama 3?) will be a huge boon for AI safety research. Think of all the mech interp / activation steering on Sydney levels of 'craziness' / obvious misalignment.
@DavidSKrueger
Huh, this didn't feel at all like the consensus view in my interactions with the community. In particular, if
#2
is part of the consensus view, then the relative neglect of work on automated AI safety R&D and evals seems even wilder to me.
Also, having been at a forecasting
Nonsense QA with no prompting is not an interesting failure of large language models. Any vaguely sensible prompt (like the one from the Gopher paper) greatly reduces it, indicating it will not be a hard problem to completely solve with RL etc.
Yes, so many theoretical and empirical results for this, e.g. . Many more alignment researchers should spend more time reading on the science of DL and less time on rehashed MIRI-esque LW vagaries which have
@Simeon_Cps
@BogdanIonutCir2
@SharmakeFarah14
@QuintinPope5
Deep learning has a strong bias for shallow circuits that don't depend much on each other. It's hard to learn a deep circuit that only pays off once it's 100% complete. It's just like Darwinian evolution. But this also means doing any kind of long-range planning inside a forward
Hmmm, from what I see my colleagues in AI at Google London work bloody long hours and are extremely committed. This guy once came to London and told us to abandon Torch and use TensorFlow. That set the field of AI back by at least 6 months.
@JacquesThibs
@AkashWasil
@ESYudkowsky
those seem to me like pretty mild / reasonable-ish takes (though I wouldn't necessarily fully agree); and probably not even that far from what the current median alignment researcher believes
@liron
@ilyasut
Ilya doesn't seem to say anything false here; he only claims the possibility of aligned systems and his example is actually a proof of concept of the possibility; also, many of the claims in this thread don't seem to me to have a helpful tone, including about Paul
Can we please stop treating Eliezer like some oracle of Delphi? I'm pretty sure this was *not* about RLHF in any meaningful way. This isn't the first time either, e.g. the mental gymnastics performed by some to defend posts like .
(epistemic status: quick take)
Browsing through EAG London attendees' profiles and seeing what seems like way too many people / orgs doing (I assume DC) evals. I expect a huge 'market downturn' on this, since I can hardly see how there would be so much demand for DC evals in a
@ESYudkowsky
It seems true that LLMs don't seem to have been called out in advance by ~anyone, but unsupervised/self-supervised learning as "the way" had been called out in advance repeatedly by some of the biggest names in Deep Learning (including Bengio, Hinton and LeCun; including when RL
At these prices, you could e.g. filter all the estimated high-quality human text (50T tokens) for ~$2.5M. Could make conditional pretraining (from human preferences) even on all that text quite affordable
This is unreal. We're legitimately getting to a point where intelligence might be too cheap to meter
Gemini flash will soon cost ~$0.05/1m tokens
For reference, ~2 years ago gpt-3.5 was $0.06/1k tokens
In time we got 100x cheaper models that are 10x smarter
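The cost claims in the thread above can be checked with a quick back-of-the-envelope calculation (all figures are the tweet's own estimates, not verified pricing):

```python
# Back-of-the-envelope cost check for the thread's claims.
# All inputs are the tweets' estimates: ~50T tokens of high-quality
# human text, and a hypothetical Gemini Flash-tier price of $0.05/1M tokens.
tokens_total = 50e12           # ~50T tokens (estimate from the tweet)
price_per_million = 0.05       # USD per 1M tokens (assumed Flash-tier price)

cost = tokens_total / 1e6 * price_per_million
print(f"${cost:,.0f}")         # → $2,500,000
```

At these assumed prices, one pass over all that text indeed lands at ~$2.5M, consistent with the figure quoted in the thread.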
@carad0
nah, something like superalignment + (apparently) being open in discourse about both benefits and risks seem pretty good plans (and probably better than overcomplicated 4D chess pivotal acts); rats can be wildly overconfident about how their plans are supposed to be better than
I appreciate and am very grateful for Eliezer's historical contribution to early alignment field building and research. But maybe consider that communication might not be your strong point if you need to rely on lack of punctuation to communicate humor? Also, perhaps MIRI
@halomancer1
The screenshot is of a shitpost, good sir. Observe the lack of punctuation on the second sentence. Also, that it was screenshotted to strip the context.
Introducing The AI Scientist! 🧪🔬🔭 It creates research ideas & experiments, any necessary code, runs experiments, plots & analyzes data, writes an ENTIRE science manuscript, & performs peer review! Then builds on "published" discoveries. Fully automated. A new era in science? 🧵👇
@ylecun
i guess it depends on how you choose to operationalize 'terrified of AGI'; in my book, spending 1/2 of one's time on alignment (Sutskever), signing the FLI letter (Bengio) or saying AI takeover isn't inconceivable (Hinton) are signs of taking the risks seriously
You know AI alignment is going mainstream when Hinton discusses concerns of instrumental convergence and power-seeking:
'The scientist warned that this eventually might "create sub-goals like 'I need to get more power'".'
'Large language models (LLMs) can produce long, coherent passages of text, suggesting that LLMs, although trained on next-word prediction, must represent the latent structure that characterizes a document. Prior work has found that internal representations of LLMs encode one
We should make plans for how to use similar amounts of compute for automated AI safety R&D. This is, in my view, both kind of obvious and wildly neglected. E.g. it seems plausible to me that very large parts of interpretability work could be automated soon using LM agents:
Obviously it depends what it’s spent on, but £1.5 billion of compute is a significant amount. Even to private labs. GPT-4 was trained for around $100-150 million.
But how much of this goes to academia? The
@AISafetyInst
? British start ups? Research orgs like
@apolloaisafety
?
@littIeramblings
very plausible, and also seems not-that-dissimilar to Leopold's model (and from my median, fwiw); the good news, from my pov, would be that i expect that development model to be among the easier worlds, w.r.t. technical alignment
@ylecun
@tegmark
@RishiSunak
Yann, where is your confidence about scaling up LLMs certainly not leading to AGI/superintelligence coming from? Some theoretical results seem to me to be pointing the other way, e.g.
@davidad
hmm, can you say more about why you expect them to be 'more controllable, interpretable, generalizable within-task, and have fewer emergent abilities'?
Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval.
For background, this tests a model’s recall ability by inserting a target sentence (the "needle") into a corpus of
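A minimal sketch of how such a needle-in-a-haystack harness can be built (function name, filler text, and prompt wording are all illustrative, not Anthropic's actual eval code):

```python
# Sketch of a needle-in-a-haystack recall eval: plant a target sentence
# ("the needle") at a chosen depth inside a long filler corpus, then build
# the retrieval prompt that would be sent to the model under test.
def build_haystack_prompt(needle: str, filler: str, n_copies: int, depth: float) -> str:
    chunks = [filler] * n_copies                 # the distractor corpus
    insert_at = int(depth * len(chunks))         # depth in [0, 1]: where the needle goes
    chunks.insert(insert_at, needle)
    context = "\n".join(chunks)
    return (f"{context}\n\n"
            "What is the most important fact in the documents above? "
            "Answer with the exact sentence.")

needle = "The best thing to do in San Francisco is eat a sandwich in Dolores Park."
prompt = build_haystack_prompt(needle, "Lorem ipsum dolor sit amet.", 1000, depth=0.5)
assert needle in prompt
```

The harness would then score the model's answer by checking whether it reproduces the needle; sweeping `depth` and `n_copies` maps recall across context positions and lengths.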
@dwarkesh_sp
@ShaneLegg
timelines maybe (though he's been pretty public about that); DeepMind's alignment plans (hopefully in more details than what's public)
In 2-4 years, if we're still alive, anytime you see a video this beautiful, your first thought will be to wonder whether it's real or if the AI's prompt was "beautiful video of 15 different moth species flapping their wings, professional photography, 8k, trending on Twitter".
1) Character AI already has over 20 million people spending 2 HOURS A DAY talking to AIs (aka fake people)
2) Sama said AIs will soon be superhuman at persuasion
3) Those superhuman persuaders will soon outnumber us 10000 to 1. And be hot.
An AI takeover scenario:
You can’t
@Simeon_Cps
I don't get how I should interpret this post. You can likely find 'several top experts in AI safety & governance' with probabilities in pretty much any range (at least between 1-99%), so finding some with >75% p(doom) doesn't seem to me to warrant the (apparent implicit)
quick take:
@CRSegerie
's should be required reading for ~anyone starting on AI safety (e.g. in the AGISF curriculum), especially if they're considering any model internals work (and of course even more so if they're specifically considering mech interp)
I find it pretty wild that automating AI safety R&D, which seems to me like the best shot we currently have at solving the full superintelligence control/alignment problem, no longer seems to have any well-resourced, vocal, public backers (with the superalignment team disbanded).
@ylecun
@elonmusk
this seems very overconfident on how low AI x-risk would be; e.g. 'The Precipice' puts x-risk from asteroid impact in a century at ~ 1 in a million; I find it pretty implausible that even the x-risk of misusing AI (supposing alignment is solved) is clearly < 1 in a million
unsure how to feel about this, rationally seems like a positive update, but emotionally feels a bit like there's now a 4 year deadline to solving superintelligence alignment
We need new technical breakthroughs to steer and control AI systems much smarter than us.
Our new Superalignment team aims to solve this problem within 4 years, and we’re dedicating 20% of the compute we've secured to date towards this problem.
Join us!
With decent progress on and continued progress on Redwood Research's agenda (e.g. , ), I'd be at >99% that a ~human-level automated alignment researcher could be built that would be safe to use massively (for
I think with a concerted effort we can very likely (> 90% probability) build AI capable of automating ~all human-level alignment research while also being incapable of doing non-trivial consequentialist reasoning in a single forward pass: . Related, there's
This really highlights how the next 3 years might be very consequential w.r.t. AI risk and for humanity in general. We're probably gonna get 10000x FLOPe gains, and I wouldn't be too surprised if the gains were even larger, since I expect a lot of post-training automation
'I think the research that was done by the Superalignment team should continue to happen outside of OpenAI and, if governments have a lot of capital to allocate, they should figure out a way to provide compute to continue those efforts. Or maybe there's a better way forward. But I
I thought Superalignment was a positive bet by OpenAI, and I was happy when they committed to putting 20% of their compute towards it. I stopped thinking about that kind of approach because OAI already had competent people working on it. Several of them are now gone.
It seems
@Noahpinion
I appreciate most of your takes, but this one is really bad; please do better on educating yourself on AI and x-risk, start with e.g. who signed this
quick take: I'd give 80% probability of TAI-capable systems by 2030, conditional on the 2e29 FLOP training run from , combined with the ML (especially post-training) automation I expect from systems in the shape of
The forward passes of current architectures are just too weak and pretraining doesn't incentivize it enough. I predict this will keep being the case as long as pretraining is where most capabilities come from (and probably at least until the 5e28 FLOP training runs 'data wall')
Really glad people are working on situational awareness evals. I think it's interesting that this is plausibly a very distinct capability from general knowledge, since performance on SAD was only weakly correlated with MMLU.
I used to think that situational awareness would be a
Challenges with unsupervised LLM knowledge discovery
paper page:
show that existing unsupervised methods on large language model (LLM) activations do not discover knowledge -- instead they seem to discover whatever feature of the activations is most
things are accelerating. pretty much nothing needs to change course to achieve agi imo. worrying about timelines is idle anxiety, outside your control. you should be anxious about stupid mortal things instead. do your parents hate you? does your wife love you?
though the funding landscape seems to me to create artificial scarcity in terms of how many of those people actually get to be paid to work on AI safety
One reason I've reduced my p-doom the past ~2 years is the sheer number of people going into AI Safety.
(MATS Program = ML Alignment and Theory Scholars Program )
(Deadline for MATS is tomorrow BTW! )
I expect this to be < 1% likely to happen e.g. this decade: this is what you get when you combine the intractabilities of 'All AGI development outside MAGIC would be prohibited, via a global ban on AI development above a compute threshold.' and 'understandable and controllable AI
The AI Summit consensus is clear: it's time for international measures. Here is a concrete proposal.
In our recent paper,
@jasonhausenloy
, Claire Dennis and I propose an international institution to address extinction risk from AI: MAGIC, a Multinational AGI Consortium.
@MelMitchell1
this assumes the AI researchers don't know about human intelligence and doesn't point out that cognitive scientists might not know that much about machine intelligence
@Simeon_Cps
I'm very skeptical of this line of reasoning, especially the part about RL being safer than LLMs. Sandboxing and RL are orthogonal, it seems to me; same for removing different environment / training components, especially e.g. for RETRO-like architectures. Also, a very powerful
I don't necessarily agree with all the points here, but I still think many of these are unfairly neglected/discounted considerations within the AI x-safety community
Introducing AI Optimism: a philosophy of hope, freedom, and fairness for all.
We strive for a future where everyone is empowered by AIs under their own control.
In our first post, we argue AI is easy to control, and will get more controllable over time.
Automated Design of Agentic Systems
Presents Meta Agent Search to demonstrate that we can use agents to invent novel and powerful agent designs by programming in code
proj:
abs:
github:
(shortform: )
Contra both the 'doomers' and the 'optimists' on (not) pausing. Rephrased: RSPs (done right) seem right.
Contra 'doomers'. Oversimplified, 'doomers' (e.g. PauseAI, FLI's letter, Eliezer) ask(ed) for pausing now / even earlier - (e.g. the
I'm afraid we could easily have a version of this in AI safety too, if we are careless about our (AI governance-related) messages, especially since technical AI safety is much more preparadigmatic than climate change issues
I don't necessarily agree with the numbers here, but I do think the argument of 'more AI safety work is [much] more tractable than pausing' is very likely true
The amount of economic damage that an
#AIPause
would do is probably measured in trillions of dollars per year.
Right now we are spending about $10M-$100M per year on AI safety.
Economics thus suggests that we should first increase the AI safety spend by a factor of 10,000
Agree this is a surprising blindspot for many in AI safety, especially for some of the loudest Pause/Stop advocates (though I think 'easily-steerable and deeply non-rebellious form of intelligence' is an overstatement; steering models seems surprisingly easy for now, but things
The AI Scientist makes me think the time has come for the ML community to regularly hold a "Scientist Turing Test." Reviewers try to judge if papers are AI vs. human generated. Let the best science win! 🧑🏽🔬🧬🧫🥼🤖🦾