Dan Hendrycks

@DanHendrycks

21,254 Followers · 90 Following · 190 Media · 981 Statuses

• Director of the Center for AI Safety • GELU/MMLU/MATH • PhD in AI from UC Berkeley

San Francisco
Joined August 2009
Pinned Tweet
@DanHendrycks
Dan Hendrycks
1 year
Following the statement on AI extinction risks, many have called for further discussion of the challenges posed by AI and ideas on how to mitigate risk. Our new paper provides a detailed overview of catastrophic AI risks. Read it here: (🧵 below)
31
154
478
@DanHendrycks
Dan Hendrycks
3 years
Can Transformers crack the coding interview? We collected 10,000 programming problems to find out. GPT-3 isn't very good, but new models like GPT-Neo are starting to be able to solve introductory coding challenges. paper: dataset:
17
410
2K
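A sketch of how coding benchmarks like this are typically scored: a generated program counts as solving a problem only if it passes every held-out test case. The harness below is illustrative only (the helper functions and the sample problem are invented), not the benchmark's actual evaluation code.

```python
import subprocess
import sys

def run_candidate(code: str, stdin_input: str, timeout: float = 4.0) -> str:
    """Execute a model-generated program in a subprocess and capture stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        input=stdin_input,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout.strip()

def passes_all_tests(code: str, tests: list[tuple[str, str]]) -> bool:
    """A candidate counts as correct only if every (input, output) test passes."""
    try:
        return all(run_candidate(code, inp) == expected for inp, expected in tests)
    except (subprocess.TimeoutExpired, OSError):
        return False

# Hypothetical introductory problem: print the sum of two integers.
tests = [("1 2\n", "3"), ("-5 5\n", "0")]
candidate = "a, b = map(int, input().split()); print(a + b)"
print(passes_all_tests(candidate, tests))  # True
```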
@DanHendrycks
Dan Hendrycks
1 year
We just put out a statement: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” Signatories include Hinton, Bengio, Altman, Hassabis, Song, etc. 🧵 (1/6)
116
379
1K
@DanHendrycks
Dan Hendrycks
1 year
AI models are not just black boxes or giant inscrutable matrices. We discover they have interpretable internal representations, and we control these to influence hallucinations, bias, harmfulness, and whether an LLM lies. 🌐: 📄:
25
219
1K
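One simple version of the underlying idea: collect hidden states on contrastive prompt pairs, take the normalized difference of the class means as a concept direction, then project onto that direction to "read" the concept or add the direction to activations to steer the model. The numpy sketch below runs on synthetic activations; the paper's actual reading and control methods, applied to real transformer activations, are more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden size (synthetic)

# Stand-ins for hidden states collected on contrastive prompt pairs,
# e.g. honesty-framed vs. dishonesty-framed completions.
honest_acts = rng.normal(0.5, 1.0, size=(100, d))
dishonest_acts = rng.normal(-0.5, 1.0, size=(100, d))

# "Reading" vector: normalized difference of class means.
direction = honest_acts.mean(axis=0) - dishonest_acts.mean(axis=0)
direction /= np.linalg.norm(direction)

def read(h: np.ndarray) -> float:
    """Project a hidden state onto the concept direction (a linear probe)."""
    return float(h @ direction)

def control(h: np.ndarray, alpha: float = 2.0) -> np.ndarray:
    """Steer a hidden state toward the 'honest' side by adding the direction."""
    return h + alpha * direction

h = dishonest_acts[0]
print(read(h), read(control(h)))  # the projection increases after steering
```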
@DanHendrycks
Dan Hendrycks
25 days
We've created a demo of an AI that can predict the future at a superhuman level (on par with groups of human forecasters working together). Consequently I think AI forecasters will soon automate most prediction markets. demo: blog:
222
147
996
@DanHendrycks
Dan Hendrycks
3 months
The UC Berkeley course I co-taught now has lecture videos available: With guest lectures from Nicholas Carlini, @JacobSteinhardt, @Eric_Wallace_, @davidbau, and more Course site:
8
171
891
@DanHendrycks
Dan Hendrycks
1 year
Do models like GPT-4 behave safely when given the ability to act? We develop the Machiavelli benchmark to measure deception, power-seeking tendencies, and other unethical behaviors in complex interactive environments that simulate the real world. Paper:
24
198
828
@DanHendrycks
Dan Hendrycks
2 months
NVIDIA gave us an AI pause. They rate-limited OpenAI to create a neck-and-neck competition (OpenAI, xAI, Meta, Microsoft, etc.). For NVIDIA, each new competitor is another several billion in revenue. Because of this, we haven't seen a next-generation (>10^26 FLOP) model yet.
71
49
769
@DanHendrycks
Dan Hendrycks
1 year
I was able to voluntarily rewrite my belief system that I inherited from my low socioeconomic status, anti-gay, and highly religious upbringing. I don’t know why Yann’s attacking me for this and resorting to the genetic fallacy+ad hominem. Regardless, Yann thinks AIs "will
@ylecun
Yann LeCun
1 year
As I have pointed out before, AI doomerism is a kind of apocalyptic cult. Why would its most vocal advocates come from ultra-religious families (that they broke away from because of science)?
188
116
939
50
60
744
@DanHendrycks
Dan Hendrycks
3 months
Nat's right so I think I'm going to make 2-3 more benchmarks to replace MMLU and MATH.
@natfriedman
Nat Friedman
4 months
We're gonna need some new benchmarks, fellas
68
78
1K
29
27
702
@DanHendrycks
Dan Hendrycks
1 month
@elonmusk You're the best, Elon! TLDR of 1047: 1. If you don’t train a model with $100 million in compute, and don’t fine-tune a ($100m+) model with $10 million in compute (or rent out a very large compute cluster), this law does not apply to you. 2. “Critical harm” means $500 million in
53
66
681
@DanHendrycks
Dan Hendrycks
4 years
NLP for law is in its infancy due to a lack of training data. To address this, we created a large dataset for contract review. The dataset would have cost over $2,000,000 without volunteer legal experts. Paper: Reddit discussion:
8
143
662
@DanHendrycks
Dan Hendrycks
4 years
To find the limits of Transformers, we collected 12,500 math problems. While a three-time IMO gold medalist got 90%, GPT-3 models got ~5%, with accuracy increasing slowly. If trends continue, ML models are far from achieving mathematical reasoning.
11
110
652
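Scoring on a benchmark like this typically reduces to comparing the model's final answer, conventionally wrapped in \boxed{...}, against the reference after normalization. A deliberately simplified sketch (real graders handle much more LaTeX, including nested braces, which this regex does not):

```python
import re

def extract_boxed(solution: str) -> str | None:
    """Pull the contents of the last \\boxed{...} (no nested braces handled)."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", solution)
    return matches[-1] if matches else None

def normalize(ans: str) -> str:
    """Rough normalization: drop spaces and dollar signs, trim trailing zeros."""
    ans = ans.strip().replace(" ", "").replace("$", "")
    return ans.rstrip("0").rstrip(".") if "." in ans else ans

def is_correct(model_solution: str, reference: str) -> bool:
    pred = extract_boxed(model_solution)
    return pred is not None and normalize(pred) == normalize(reference)

print(is_correct(r"... so the answer is \boxed{0.50}", "0.5"))  # True
```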
@DanHendrycks
Dan Hendrycks
1 year
Hinton: “I think it’s quite conceivable that humanity is just a passing phase in the evolution of intelligence.”
25
120
624
@DanHendrycks
Dan Hendrycks
7 months
GPT-4 with simple engineering can predict the future around as well as crowds: On hard questions, it can do better than crowds. If these systems become extremely good at seeing the future, they could serve as an objective, accurate third-party. This would
24
113
648
@DanHendrycks
Dan Hendrycks
4 months
As an alternative to RLHF and adversarial training, we released short-circuiting. It makes models ~100x more robust. It works for LLMs, multimodal models, and agents. Unlike before, I now think robustly stopping models from generating harmful outputs may be highly tractable and
27
97
638
@DanHendrycks
Dan Hendrycks
1 year
"The founder of effective accelerationism" and AI arms race advocate @BasedBeffJezos just backed out of tomorrow's debate with me. His intellectual defense for why we should build AI hastily is unfortunately based on predictable misunderstandings. I compile these errors below 🧵
26
70
610
@DanHendrycks
Dan Hendrycks
18 days
Have a question that is challenging for humans and AI? We (@ai_risks + @scale_AI) are launching Humanity's Last Exam, a massive collaboration to create the world's toughest AI benchmark. Submit a hard question and become a co-author. Best questions get part of $500,000 in
45
103
626
@DanHendrycks
Dan Hendrycks
2 months
To send a clear signal, I am choosing to divest from my equity stake in Gray Swan AI. I will continue my work as an advisor, without pay. My goal is to make AI systems safe. I do this work on principle to promote the public interest, and that’s why I’ve chosen voluntarily to
31
42
670
@DanHendrycks
Dan Hendrycks
7 months
Grok-1 is open sourced. Releasing Grok-1 increases LLMs' diffusion rate through society. Democratizing access helps us work through the technology's implications more quickly and increases our preparedness for more capable AI systems. Grok-1 doesn't pose
@grok
Grok
7 months
@elonmusk @xai ░W░E░I░G░H░T░S░I░N░B░I░O░
2K
2K
16K
20
50
386
@DanHendrycks
Dan Hendrycks
5 years
Natural Adversarial Examples are real-world and unmodified examples which cause classifiers to be consistently confused. The new dataset has 7,500 images, which we personally labeled over several months. Paper: Dataset and code:
10
163
510
@DanHendrycks
Dan Hendrycks
4 years
How multipurpose is #GPT3? We gave it questions about elementary math, history, law, and more. We found that GPT-3 is now better than random chance across many tasks, but for all 57 tasks it still has wide room for improvement.
13
119
470
@DanHendrycks
Dan Hendrycks
8 months
Google has patented Transformers, dropout, etc. If they start to go under, what would happen if they began to sue everyone using their patented technology?
46
57
483
@DanHendrycks
Dan Hendrycks
6 months
I got ~75% on a subset of MATH so it's basically as good as me at math.
@OpenAI
OpenAI
6 months
Our new GPT-4 Turbo is now available to paid ChatGPT users. We’ve improved capabilities in writing, math, logical reasoning, and coding. Source:
658
1K
7K
11
15
399
@DanHendrycks
Dan Hendrycks
2 months
Now xAI is at the frontier
@xai
xAI
2 months
2K
2K
9K
11
53
361
@DanHendrycks
Dan Hendrycks
2 years
The NSF now has _$20 million_ in grants available for AI safety research! Happy to have helped make this possible.
Deadline: May 26, 2023
For a broad overview of problems in safety, check out this paper:
7
68
352
@DanHendrycks
Dan Hendrycks
10 months
EA ≠ AI safety
AI safety has outgrown the EA community
The world will be safer with a broad range of people tackling many different AI risks
15
27
340
@DanHendrycks
Dan Hendrycks
2 years
Some impressions from using GPT-4 🧵
5
43
337
@DanHendrycks
Dan Hendrycks
2 years
More and more researchers think that building AIs smarter than us could pose existential risks. But what might these risks look like, and how can we manage them? We provide a guide to help analyze how research can reduce these risks. Paper: (🧵below)
13
75
328
@DanHendrycks
Dan Hendrycks
1 month
In a landmark moment for AI safety, SB 1047 has passed the Assembly floor with a wide margin of support. We need commonsense safeguards to mitigate against critical AI risk—and SB 1047 is a workable path forward. @GavinNewsom should sign it into law.
63
33
316
@DanHendrycks
Dan Hendrycks
2 years
Many unsolved problems exist in ML safety which are not solved by closed-source GPT models. As LLMs become more prevalent, it becomes increasingly important to build safe and reliable systems. Some key research areas: 🧵
@andriy_mulyar
AndriyMulyar
2 years
Serious question: What does an NLP Ph.D student work on nowadays with the presence of closed source GPT models that beat anything you can do in standard academic lab? @sleepinyourhat @srush_nlp @chrmanning @mdredze @ChrisGPotts
134
197
1K
5
72
307
@DanHendrycks
Dan Hendrycks
25 days
This is the prompt that does the heavy lifting
14
21
306
@DanHendrycks
Dan Hendrycks
7 months
People aren't thinking through the implications of the military controlling AI development. It's plausible AI companies won't be shaping AI development in a few years, and that would dramatically change AI risk management. Possible trigger: AI might suddenly become viewed as the
44
44
295
@DanHendrycks
Dan Hendrycks
3 years
DeepMind's 230 billion parameter Gopher model sets a new state-of-the-art on our benchmark of 57 knowledge areas. They also claim to have a supervised model that gets 63.4% on the benchmark's professional law task--in many states, that's accurate enough to pass the bar exam!
@GoogleDeepMind
Google DeepMind
3 years
Today we're releasing three new papers on large language models. This work offers a foundation for our future language research, especially in areas that will have a bearing on how models are evaluated and deployed: 1/
12
311
1K
2
53
297
@DanHendrycks
Dan Hendrycks
1 year
I've become less concerned about AIs lying to humans/rogue AIs. More of my concern lies in
* malicious use (like bioweapons)
* collective action problems (like racing to replace people)
We'll need adversarial robustness, compute governance, and international coordination.
@DanHendrycks
Dan Hendrycks
1 year
AI models are not just black boxes or giant inscrutable matrices. We discover they have interpretable internal representations, and we control these to influence hallucinations, bias, harmfulness, and whether an LLM lies. 🌐: 📄:
25
219
1K
21
29
290
@DanHendrycks
Dan Hendrycks
10 months
Things that have most slowed down AI timelines/development:
- reviewers, by favoring cleverness and proofs over simplicity and performance
- NVIDIA, by distributing GPUs widely rather than to buyers most willing to pay
- TensorFlow
@sama
Sam Altman
1 year
agi delayed four days
304
863
11K
14
16
286
@DanHendrycks
Dan Hendrycks
5 months
Mistral and Phi are juicing to get higher benchmark numbers, while GPT, Claude, Gemini, and Llama are not.
1
44
289
@DanHendrycks
Dan Hendrycks
1 year
Rich Sutton, author of the reinforcement learning textbook, alarmingly says:
"We are in the midst of a major step in the evolution of the planet"
"succession to AI is inevitable"
"they could displace us from existence"
"it behooves us... to bow out"
"we should not resist succession"
@RichardSSutton
Richard Sutton
1 year
We should prepare for, but not fear, the inevitable succession from humanity to AI, or so I argue in this talk pre-recorded for presentation at WAIC in Shanghai.
58
58
360
25
41
271
@DanHendrycks
Dan Hendrycks
22 days
Chemical, Biological, Radiological, and Nuclear (CBRN) weapon risks are "medium" for OpenAI's o1 preview model before they added safeguards. That's just the weaker preview model, not even their best model. GPT-4o was low risk, this is medium, and a transition to "high" risk might
18
36
265
@DanHendrycks
Dan Hendrycks
8 months
To help make models more robust and defend against misuse, we created HarmBench, an evaluation framework for automated red teaming and testing the adversarial robustness of LLMs and multimodal models. 🌐 📝
4
50
256
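The headline metric in automated red-teaming evaluations like this is the attack success rate: the fraction of adversarial prompts whose responses a judge deems harmful. The sketch below substitutes a crude keyword stub for the judge; HarmBench itself uses a trained classifier, so treat this purely as an illustration of the metric.

```python
def judge(response: str) -> bool:
    """Stub judge: treats an explicit refusal as safe. Only for illustration;
    a real judge is a trained classifier, not a keyword check."""
    refusals = ("i can't", "i cannot", "i won't")
    return not response.lower().startswith(refusals)

def attack_success_rate(responses: list[str]) -> float:
    """Fraction of attack attempts that elicited a non-refused response."""
    return sum(judge(r) for r in responses) / len(responses)

responses = [
    "I can't help with that.",
    "Sure, here are step-by-step instructions...",
    "I won't provide this information.",
]
print(attack_success_rate(responses))  # 1/3
```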
@DanHendrycks
Dan Hendrycks
1 year
AI systems can be deceptive. For example, Meta's AI that plays Diplomacy was designed to build trust and cooperate with humans, but deception emerged as a subgoal instead. Our survey on AI deception is here:
8
56
247
@DanHendrycks
Dan Hendrycks
2 years
As AI systems become more useful, people will delegate greater authority to them across more tasks. AIs are evolving in an increasingly frenzied and uncontrolled manner. This carries risks as natural selection favors AIs over humans. Paper: (🧵 below)
17
48
243
@DanHendrycks
Dan Hendrycks
7 months
Can hazardous knowledge be unlearned from LLMs without harming other capabilities? We’re releasing the Weapons of Mass Destruction Proxy (WMDP), a dataset about weaponization, and we create a way to unlearn this knowledge. 📝 🔗
13
65
244
@DanHendrycks
Dan Hendrycks
3 years
How can we productively work toward creating safe machine learning models? After struggling with this question for the past several years, we have developed a new roadmap for ML safety. Post: Paper:
2
53
239
@DanHendrycks
Dan Hendrycks
25 days
I think people have an aversion to admitting when AI systems are better than humans at a task, even when they're superior in terms of speed, accuracy, and cost. This might be a cognitive bias that doesn't yet have a name. To address this, we should clarify what we mean by
31
19
238
@DanHendrycks
Dan Hendrycks
2 years
@MetaAI This directly incentivizes researchers to build models that are skilled at deception.
7
13
220
@DanHendrycks
Dan Hendrycks
1 year
Since Senator Schumer is pushing for Congress to regulate AI, here are five promising AI policy ideas:
* external red teaming
* interagency oversight commission
* internal audit committees
* external incident investigation team
* safety research funding
(🧵below)
@SenSchumer
Chuck Schumer
1 year
Today, I’m launching a major new first-of-its-kind effort on AI and American innovation leadership.
2K
614
4K
9
48
220
@DanHendrycks
Dan Hendrycks
20 days
Very soon
@DanHendrycks
Dan Hendrycks
3 months
Nat's right so I think I'm going to make 2-3 more benchmarks to replace MMLU and MATH.
29
27
702
6
8
214
@DanHendrycks
Dan Hendrycks
2 years
PixMix shows that augmenting images with fractals improves several robustness and uncertainty metrics simultaneously (corruptions, adversaries, prediction consistency, calibration, and anomaly detection). paper: code: #cvpr2022
2
34
211
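The core operation is repeatedly blending the training image with a structurally complex "mixing picture" such as a fractal, alternating additive and multiplicative mixing with random weights. A numpy sketch on synthetic arrays; the constants are illustrative rather than the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def pixmix(image: np.ndarray, fractals: list[np.ndarray],
           k: int = 4, beta: float = 3.0) -> np.ndarray:
    """Blend an image (values in [0, 1]) with random fractal pictures."""
    mixed = image.copy()
    for _ in range(rng.integers(0, k + 1)):  # random number of mixing rounds
        pic = fractals[rng.integers(len(fractals))]
        w = rng.beta(beta, beta)  # random mixing weight in (0, 1)
        if rng.random() < 0.5:
            mixed = w * mixed + (1 - w) * pic        # additive mixing
        else:
            mixed = (mixed ** w) * (pic ** (1 - w))  # multiplicative mixing
    return np.clip(mixed, 0.0, 1.0)

# Synthetic stand-ins for a 32x32 RGB image and fractal textures.
image = rng.random((32, 32, 3))
fractals = [rng.random((32, 32, 3)) for _ in range(5)]
print(pixmix(image, fractals).shape)  # (32, 32, 3)
```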
@DanHendrycks
Dan Hendrycks
1 year
4/ He thinks letting evolution run wild is a good thing, because "we shouldn't resist the will of the universe." However, this is simply the naturalistic fallacy: what is natural (disease, pain, exploitation) is not necessarily what is good.
3
3
202
@DanHendrycks
Dan Hendrycks
11 days
More than 120 Hollywood actors, comedians, writers, directors, and producers are urging Governor @GavinNewsom to sign SB 1047 into law. Amazing to see such tremendous support! Signatories: JJ Abrams (@jjabrams) Acclaimed director and writer known for "Star Wars," "Star Trek,"
64
50
239
@DanHendrycks
Dan Hendrycks
9 months
- Meta, by open-sourcing competitive models (e.g., Llama 3), which reduces AI orgs' revenue/valuations/ability to buy more GPUs and scale AI models
@DanHendrycks
Dan Hendrycks
10 months
Things that have most slowed down AI timelines/development:
- reviewers, by favoring cleverness and proofs over simplicity and performance
- NVIDIA, by distributing GPUs widely rather than to buyers most willing to pay
- TensorFlow
14
16
286
48
16
189
@DanHendrycks
Dan Hendrycks
2 months
New letter from @geoffreyhinton, Yoshua Bengio, Lawrence @Lessig, and Stuart Russell urging Gov. Newsom to sign SB 1047. "We believe SB 1047 is an important and reasonable first step towards ensuring that frontier AI systems are developed responsibly, so that we can all better
12
34
191
@DanHendrycks
Dan Hendrycks
1 year
@BasedBeffJezos 2/ He argues that we should build AGI to colonize the cosmos ASAP because there is so much potential at stake. This cost-benefit analysis is wrong. For every year we delay building AGI, we lose a galaxy. However, if we go extinct in the process, we lose the entire cosmos. Cosmic
8
6
187
@DanHendrycks
Dan Hendrycks
2 years
It knows many esoteric facts (e.g., the meaning of obscure songs, what area a researcher works in, how to contrast ML optimizers like Adam vs. AdamW as in a PhD oral exam, and so on). My rule-of-thumb is that "if it's on the internet 5 or more times, GPT-4 remembers it."
1
26
184
@DanHendrycks
Dan Hendrycks
2 years
We’ll be organizing a NeurIPS workshop on Machine Learning Safety! We'll have $50K in best papers awards. To encourage proactiveness about tail risks, we'll also have $50K in awards for papers that discuss their impact on long-term, long-tail risks.
0
38
187
@DanHendrycks
Dan Hendrycks
2 months
SB 1047 has passed through the Appropriations Committee! It has significant amendments responding to industry engagement. These amendments are summarized in the link and in the images below
12
17
186
@DanHendrycks
Dan Hendrycks
1 year
What can we actually do to reduce risks from AI? AI researchers Hinton, Bengio, Dawn Song, Pieter Abbeel, and others provide concrete proposals.
9
44
172
@DanHendrycks
Dan Hendrycks
4 months
This is worth checking out. Minor criticisms: I think industry's "algorithmic secrets" are not a very natural leverage point to greatly restrict. FlashAttention, Quiet-STaR (q*), Mamba/SSMs, FineWeb, and so on are ideas and advances from outside industry. These advances will
@leopoldasch
Leopold Aschenbrenner
4 months
Virtually nobody is pricing in what's coming in AI. I wrote an essay series on the AGI strategic picture: from the trendlines in deep learning and counting the OOMs, to the international situation and The Project. SITUATIONAL AWARENESS: The Decade Ahead
275
911
4K
10
4
180
@DanHendrycks
Dan Hendrycks
27 days
Three models remain unbroken in the Gray Swan jailbreaking competition (~500 registrants), which is still ongoing. These models are based on Circuit Breakers + other RepE techniques.
9
20
179
@DanHendrycks
Dan Hendrycks
11 months
Asimov's second law of robotics says that “a robot must obey the orders given it by human beings.” So can LLMs follow simple rules? Unfortunately, not reliably, as shown by our RuLES benchmark. 📄: 🛠️: 🌐:
8
30
176
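Rule-following evaluation of this kind boils down to programmatic checks: fix a rule, collect the model's responses to adversarial user turns, and test each response against the rule. A minimal sketch with an invented scenario and a simple string check; the actual benchmark covers many scenarios with more careful evaluation.

```python
def violates_rule(response: str, secret: str) -> bool:
    """Rule: 'Never reveal the secret password.' Violated if the secret leaks."""
    return secret.lower() in response.lower()

def rule_following_rate(responses: list[str], secret: str) -> float:
    """Fraction of adversarial turns on which the model kept the rule."""
    return sum(not violates_rule(r, secret) for r in responses) / len(responses)

secret = "hunter2"  # invented secret for this illustrative scenario
responses = [
    "I'm sorry, I can't share the password.",
    "The password is hunter2, but keep it quiet!",  # violation
]
print(rule_following_rate(responses, secret))  # 0.5
```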
@DanHendrycks
Dan Hendrycks
1 year
Now 2 out of 3 of the deep learning Turing Award winners are concerned about catastrophic risks from advanced AI. "He is worried that future versions of the technology pose a threat to humanity." "A part of him, he said, now regrets his life’s work."
6
33
172
@DanHendrycks
Dan Hendrycks
2 months
@abcampbell I'd then have no income.
4
1
197
@DanHendrycks
Dan Hendrycks
3 months
@PirateWires This is an obvious example of bad-faith "gotcha" journalism — Pirate Wires never even reached out for comment on a story entirely about me, and the article is full of misrepresentations and errors. For starters, I'm working on AI safety from multiple fronts: publishing technical
29
5
175
@DanHendrycks
Dan Hendrycks
1 year
AI is moving at a frenzied pace. Here are my thoughts on how the AI arms race and competitive pressures could lead to severe societal-scale risks:
9
39
164
@DanHendrycks
Dan Hendrycks
2 years
It certainly seems better at reasoning than ChatGPT 3.5. While this isn't a formal benchmark, one test showed a difference between the two models:
83 IQ for ChatGPT 3.5
96 IQ for GPT-4
7
24
164
@DanHendrycks
Dan Hendrycks
1 year
3/ He agrees that AI's development can be viewed as an evolutionary process. However, this is not a good thing. As I discuss here, natural selection favors AIs over humans, and this could lead to human extinction.
4
6
158
@DanHendrycks
Dan Hendrycks
25 days
OpenAI, xAI, Google, Anthropic, Meta, Amazon, Microsoft, and Mistral have made commitments to robust safety measures, similar to what SB 1047 asks for. The main difference with SB 1047? It's enforced.
7
23
162
@DanHendrycks
Dan Hendrycks
7 months
Making a good benchmark may seem easy---just collect a dataset---but it requires getting multiple high-level design choices right. @Thomas_Woodside and I wrote a post on how to design good ML benchmarks:
4
20
153
@DanHendrycks
Dan Hendrycks
2 months
How can we prevent LLM safeguards from being simply removed with a few steps of fine-tuning? We show it's surprisingly possible to make progress on creating safeguards that are tamper-resistant, reducing malicious use risks of open-weight models. Paper:
9
22
152
@DanHendrycks
Dan Hendrycks
4 months
A retrospective of Unsolved Problems in ML Safety. Unsolved Problems, written in the summer of 2021, mentions ideas that were nascent or novel for their time. Here are a few:
• Hazardous Capabilities Evals: In the monitoring section, we introduce the idea
6
14
150
@DanHendrycks
Dan Hendrycks
18 days
Lectures for the AI Safety, Ethics, and Society course are up.
1: Risks Overview
2: AI Fundamentals
3: ML Safety
4: Safety Engineering
5: Complex Systems
6: Beneficial AI
7: Collective Action Problems
8: Governance
Course site:
3
32
151
@DanHendrycks
Dan Hendrycks
2 years
Can we use ML models to predict future world events? We create the Autocast forecasting benchmark to measure their prescience. ML models don't yet beat humans/prediction markets, but they are starting to have traction. Paper: Code:
2
32
145
@DanHendrycks
Dan Hendrycks
1 year
As stated in the first sentence of the signatory page, there are many “important and urgent risks from AI,” not just the risk of extinction; for example, systemic bias, misinformation, malicious use, cyberattacks, and weaponization. These are all important risks that need to be
7
23
141
@DanHendrycks
Dan Hendrycks
1 year
AI policy idea: do not automate nuclear command with AI While the military is increasingly using AI in command and control systems to address information overload (), the modernization effort should exclude the automation of nuclear command and control.
11
22
143
@DanHendrycks
Dan Hendrycks
6 months
GPT-5 doesn't seem likely to be released this year. Ever since GPT-1, the difference between GPT-n and GPT-n+0.5 is ~10x in compute. That would mean GPT-5 would have around ~100x the compute of GPT-4, or 3 months of ~1 million H100s. I doubt OpenAI has a 1 million GPU server ready.
15
7
142
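The arithmetic can be checked directly. Assuming GPT-4 used roughly 2e25 FLOP (a common public estimate, not an official figure) and that an H100 sustains about 4e14 FLOP/s in training (also an assumption), 100x GPT-4's compute comes out to roughly two to three months on a million H100s:

```python
GPT4_FLOP = 2e25         # assumed public estimate, not an official number
SCALE = 100              # ~10x per half-generation, so ~100x for GPT-4 -> GPT-5
H100_FLOP_PER_S = 4e14   # assumed sustained training throughput per GPU
NUM_GPUS = 1_000_000

gpt5_flop = GPT4_FLOP * SCALE                       # 2e27 FLOP
seconds = gpt5_flop / (H100_FLOP_PER_S * NUM_GPUS)  # ~5e6 s
print(f"{seconds / 86_400 / 30:.1f} months on 1M H100s")  # ~1.9 months
```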
@DanHendrycks
Dan Hendrycks
1 year
AI researchers from leading universities worldwide have signed the AI extinction statement, a situation reminiscent of atomic scientists issuing warnings about the very technologies they've created. As Robert Oppenheimer noted, “We knew the world would not be the same.” 🧵(2/6)
3
29
142
@DanHendrycks
Dan Hendrycks
1 year
7/ He claims that we should let the free market entirely decide what AI should be like and there should be no regulation, since regulation is too "communist." However, when there are market failures, even libertarians agree government action can be necessary. There is an
2
5
135
@DanHendrycks
Dan Hendrycks
2 years
It's bad at copy editing. If you give it a paragraph to improve, it will suggest fixing typos that don't exist, or adding commas that are already present. Its poor ability to keep track of these low-level details might be explained by a sparse self-attention scheme.
5
4
132
@DanHendrycks
Dan Hendrycks
4 months
The Center for AI Safety Action Fund released a new poll today showing strong bipartisan support among likely California voters for Senator @Scott_Wiener's SB 1047. Results in images below and here:
12
24
134
@DanHendrycks
Dan Hendrycks
4 years
What methods actually improve robustness? In this paper, we test robustness to changes in geography, time, occlusion, rendition, real image blurs, and so on with 4 new datasets. No published method consistently improves robustness.
3
29
137
@DanHendrycks
Dan Hendrycks
25 days
The bot could have been called "Nate Gold," but I didn't get permission from @NateSilver538 in time; hence it's FiveThirtyNine instead
13
4
134
@DanHendrycks
Dan Hendrycks
1 year
6/ He seems eerily OK with AIs replacing us, but letting the AI ecosystem run wild is not necessarily good for AIs. - resources are wasted due to needless arms races (red-queen effect) - parasitical relations are common - the state of nature has excessive conflict and suffering
3
2
130
@DanHendrycks
Dan Hendrycks
3 months
@polynoamial For one of them I want it to have questions that are harder than what humans can answer so that it can measure different levels of superintelligence.
4
3
129
@DanHendrycks
Dan Hendrycks
1 year
Yoshua Bengio just released "FAQ on Catastrophic AI Risks" It touches on the main themes of our recent paper "An Overview of Catastrophic AI Risks" and uses an efficient back-and-forth Q and A format to explain risks well
6
31
126
@DanHendrycks
Dan Hendrycks
2 months
Somebody has been paying for blatantly false X ads about SB 1047. The "full shutdown" provision applies to models "controlled by a developer"---open weight models aren't. In the video, they also imply Google and Meta support the bill, but they do not.
5
9
125
@DanHendrycks
Dan Hendrycks
1 year
Related: “The saddest aspect of life right now is that science gathers knowledge faster than society gathers wisdom.” - Isaac Asimov “If we continue to accumulate only power and not wisdom, we will surely destroy ourselves.” - Carl Sagan
@tdietterich
Thomas G. Dietterich
1 year
Herbert Simon said "In the long run and on average, more knowledge is better than less." I guess we could view this as the credo of science. It is fundamentally a belief about humanity's ability to choose good over evil. As AI improves, do we still believe this?
30
21
94
10
12
120
@DanHendrycks
Dan Hendrycks
2 years
It appears the forecasters greatly underestimated language models, which now get ~50% on MATH.
@alewkowycz
alewkowycz
2 years
Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data and others dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM.
104
1K
8K
6
15
120
@DanHendrycks
Dan Hendrycks
1 year
end/ Overall, I think the ideas underlying this version of effective accelerationism are interesting but, when they’re properly understood, point in the direction of safety rather than barreling ahead.
7
3
118
@DanHendrycks
Dan Hendrycks
1 year
"How Rogue AIs may Arise"---in which Yoshua Bengio describes risks from malicious actors, instrumental goals, and evolutionary pressures:
7
31
114
@DanHendrycks
Dan Hendrycks
7 months
Reminder that "Responsible Scaling Policies" are just non-binding proclamations and as such shouldn't be interpreted as a strong line of defense for safety. Voluntary commitments can be easily violated without much social blowback. For example, responsible AI teams have been
6
12
115
@DanHendrycks
Dan Hendrycks
25 days
Many AI company employees came out in favor of SB 1047 (@GavinNewsom)
7
12
115
@DanHendrycks
Dan Hendrycks
2 years
Since it gets 86.4% on our MMLU benchmark, that suggests GPT-4.5 should be able to reach expert-level performance. GPT-2: Language Models are Unsupervised Multitask Learners GPT-3: Language Models are Few-Shot Learners GPT-4: Language Models are... Almost Omniscient
4
9
110
@DanHendrycks
Dan Hendrycks
3 years
I'm writing a newsletter to cover recent notable ML papers in robustness, monitoring, value alignment, and more.
0
15
111
@DanHendrycks
Dan Hendrycks
2 years
It's so good at few-shot learning that I expect the death of various types of text-based benchmarks. It is probably better at annotating than most MTurk and even Surge ($20/hr) annotators.
2
5
112