As AI improves, humans will need more and more help to monitor and control it. So my team at OpenAI has trained an AI that helps humans evaluate AI! (1/5)
We need new technical breakthroughs to steer and control AI systems much smarter than us.
Our new Superalignment team aims to solve this problem within 4 years, and we're dedicating 20% of the compute we've secured to date to this effort.
Join us!
Working on a 280-billion-parameter language model has greatly reduced how long I think it will take to build AGI. Very excited we finally released the details of Gopher - awesome work from the team!
@drjwrae
@geoffreyirving
Cooking up something special! Can't wait to get a paper out so everyone can try it out.
An optimizer with no extra overhead, no additional parameters.
Stay tuned!
Nonsense QA with no prompting is not an interesting failure of large language models. Any vaguely sensible prompt (like the one from the Gopher paper) greatly reduces it, indicating it will not be a hard problem to completely solve with RL etc.
seeing o1 do this well on completely fresh international-competition level coding problems was an amazing moment.
If you don't agree that this is novel reasoning then your definition of novel reasoning is broken 😅
Evaluating o1 on the International Olympiad in Informatics was very personally meaningful to me. When I competed nine years ago, I never thought I'd be back so soon, competing with an AI.
To highlight how amazing this model is, we shared on Codeforces its best IOI submissions ⬇️
@robertskmiles
suppose scores are [100, 100, 1, 2, 3]: indices 0 and 1 are by far the best, i.e. in the closely tied group that is far better than the rest of the distribution
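A toy sketch of that selection in my own code (not from the thread), with an arbitrary relative-gap threshold standing in for "closely tied":

```python
# Pull out the "closely tied top group": everything within a chosen
# relative gap of the best score. The 5% threshold is illustrative.
def top_tied_group(scores, rel_gap=0.05):
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    best = scores[order[0]]
    group = []
    for i in order:
        if scores[i] < best * (1 - rel_gap):
            break  # the tie is broken: everything below is far worse
        group.append(i)
    return group

print(top_tied_group([100, 100, 1, 2, 3]))  # -> [0, 1]
```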
I'm excited to join
@AnthropicAI
to continue the superalignment mission!
My new team will work on scalable oversight, weak-to-strong generalization, and automated alignment research.
If you're interested in joining, my dms are open.
people keep saying that the rollout and adoption of AGI is gonna take a lot of thought, software engineering and intelligence.
oh boy do I have the technology for you!
Minerva and DeepNash are both surprising progress even against my short timelines. Much more so than DALL-E 2 was (having already seen GLIDE), but around that GLIDE level of omg. Imagining 2030 is getting really hard.
@percyliang
@NPCollapse
the area of "scalable oversight" focuses on precisely this - see the work of myself & Geoffrey Irving & Sam Bowman & Ethan Perez & Jeffrey Wu & Jan Leike & many others. Sam's latest paper is excellent:
We looked specifically at model-written code for almost all our evaluations, and we already see huge potential for GPT-4-class models to assist humans in RLHF labelling (2/5)
1/ Notable how three pioneers of deep learning (recognised with their shared 2018 Turing Award) have substantially diverged on how they assess risk from superintelligence:
Now is the time for progress on superintelligence alignment; this is why
@ilyasut
and
@janleike
are joining forces to lead the new super-effort. Join us!
Alternatively, if you have no idea why folks are talking so seriously about risks from rogue AI (but you have a science or engineering background) here’s a super-alignment reading list…
Believing that cars could change Earth's climate requires imagining hundreds of millions of them in circulation, an unrealistic scenario benefiting the auto industry's narrative. They're just pushing car hype!
Overall it was an exciting and huge collaboration, the results of which you can now read: . Huge thanks to everyone involved, but particularly to the rest of the joint-first authors who made the thing work!
@mia_glaese
@majatrebacz
@john_aslanides
(5/5)
I'm tremendously excited for the future of human-machine teams in evaluation and training. If you want to work on this technology, one of the best ways to do it is to work for
@mia_glaese
who runs human data here at OpenAI. They’re the best in the business! (4/5)
twenty years from now, i would bet ai x-risk will look a lot like y2k does now
- nothing cataclysmic happened, so the common view is that it was a fake concern all along
- the counterfactual risk was actually real though
- it was only averted through a lot of human effort
We started by RL fine-tuning models to be more helpful, but that made the resulting policies much more exploitable when you try to trick them into bad behaviour. We had to jointly train for usefulness and safety to get better at both! (2/5)
The description length of all models in the same transformer family is the same at initialisation, regardless of param count; it is this description length (& not the one after training) that is relevant to optimal compression. (1/n)
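One standard way to make that precise is the prequential (online) coding view; the sketch below is my own notation, not a formula from the thread:

\[
L(D) \;=\; \underbrace{L(\mathcal{A})}_{\text{architecture, init, training loop}} \;+\; \sum_{t=1}^{n} -\log_2 p_{\theta_t}\!\left(x_t \mid x_{<t}\right)
\]

where \(\theta_t\) are the weights after training on \(x_{<t}\). The first term is a small, essentially fixed amount of code regardless of parameter count, which is why the description length that matters is the one at initialisation rather than the size of the trained weights.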
The "scale is all you need" hype around ever-larger language models is a striking inversion of our (usual) preference for making models simpler. Occam's Big Ball of Mud
So happy to finally talk about Red Teaming! TL;DR even seemingly well behaved dialogue models fall down completely if you search hard enough for adversarial questions...
Language models (LMs) can generate harmful text. New research shows that generating test cases ("red teaming") using another LM can help find and fix undesirable behaviour before impacting users.
Read more: 1/
And the harm mitigations we used (rule classifiers and preference models) *don't* solve distributional bias problems - they just remove bad behaviours that you can see in a single sample (lots of detail in paper, 4/5)
i'd particularly like to recognize
@CollinBurns4
for today's generalization result. Collin came to openai excited to pursue this vision and helped get the rest of the team excited about it!
As LLMs become capable of superhuman reasoning we need methods that let us understand why and how they reached their conclusions. Unfortunately the chain of thought that gets the best performance might not be the easiest for humans to understand — the “legibility gap”. 2/n
I’m thrilled the team were able to formally define a notion of legibility that behaves sensibly when approximately optimized and that the results generalize to humans. Amazing work by the two lead authors
@cynnjjs
and
@janhkirchner
and the rest of the team! 3/n
> "Our generation too easily takes for granted that we live in peace and freedom. And those who herald the age of AGI in SF too often ignore the elephant in the room: superintelligence is a matter of national security, and the United States must win." (1/2)
fun story:
terry tao was on both my and my brother's committees.
he solved both our dissertation problems before we were done talking; each of us got "wouldn't it have been easier to...outline of entire proof" 🫠
The first author of this astrophysics paper found that if he gave o1-preview the methods section, it was able to reproduce, in 5 prompts, 10 months of coding work he did as a PhD student (a few caveats in the video)
Side note: all of your methods sections are becoming instruction manuals.
I feel like the image understanding capabilities of GPT-4 are currently underrated. API access or the evals paper are going to blow minds. (based only on the assumption that the paper examples are not cherry-picked, which was true of GPT-3)
Catchy quote, but Bernard Arnault just became the richest man on earth; are you sure you want to short fashion?
Like it or not, when people gain material abundance, they mostly spend it on status. The real question is whether we can design status games that are positive-sum.
we laughed at this originally but LLM @ int4 + inference hardware and batteries would fit, either now or in one generation's time. so we have evolved to the point of the internet in a box / hitchhiker's guide. exciting times.
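A quick back-of-envelope check, using my own illustrative numbers rather than anything from the tweet:

```python
# int4 stores each weight in 4 bits = 0.5 bytes, so even a fairly large
# model's weights fit in tens of gigabytes of storage.
params = 70e9                # hypothetical 70B-parameter model
bytes_per_param = 0.5        # 4 bits per weight at int4
print(f"~{params * bytes_per_param / 1e9:.0f} GB of weights")  # -> ~35 GB
```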
How do we learn what will be informative? It helps to separate aleatoric & epistemic uncertainty. Ian argues you can do this with the joint distribution of your labels - and has a key paper on it, introducing EpiNets (3/n)
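For reference, one common way to write that split (my notation, not necessarily the exact formulation in the epinet paper) decomposes total predictive uncertainty into an aleatoric and an epistemic term:

\[
\underbrace{H\!\left[\mathbb{E}_{\theta \sim p(\theta \mid D)}\, p(y \mid x, \theta)\right]}_{\text{total}}
\;=\;
\underbrace{\mathbb{E}_{\theta \sim p(\theta \mid D)}\, H\!\left[p(y \mid x, \theta)\right]}_{\text{aleatoric}}
\;+\;
\underbrace{I\left(y;\, \theta \mid x, D\right)}_{\text{epistemic}}
\]

The epistemic term vanishes exactly when all plausible models agree on the label distribution, which is the part that tells you whether a new label would be informative.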