Dan Hendrycks

@DanHendrycks

21,254 Followers · 90 Following · 190 Media · 981 Statuses

• Director of the Center for AI Safety • GELU/MMLU/MATH • PhD in AI from UC Berkeley

San Francisco
Joined August 2009
Pinned Tweet
@DanHendrycks
Dan Hendrycks
1 year
Following the statement on AI extinction risks, many have called for further discussion of the challenges posed by AI and ideas on how to mitigate risk. Our new paper provides a detailed overview of catastrophic AI risks. Read it here: (🧵 below)
31
154
478
@DanHendrycks
Dan Hendrycks
3 years
Can Transformers crack the coding interview? We collected 10,000 programming problems to find out. GPT-3 isn't very good, but new models like GPT-Neo are starting to be able to solve introductory coding challenges. paper: dataset:
17
410
2K
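A sketch of how coding benchmarks like this are typically scored: a generated program counts as solving a problem only if it passes every held-out test case. The harness below is illustrative only (the helper functions and the sample problem are invented), not the benchmark's actual evaluation code.

```python
import subprocess
import sys

def run_candidate(code: str, stdin_input: str, timeout: float = 4.0) -> str:
    """Execute a model-generated program in a subprocess and capture stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        input=stdin_input,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout.strip()

def passes_all_tests(code: str, tests: list[tuple[str, str]]) -> bool:
    """A candidate counts as correct only if every (input, output) test passes."""
    try:
        return all(run_candidate(code, inp) == expected for inp, expected in tests)
    except (subprocess.TimeoutExpired, OSError):
        return False

# Hypothetical introductory problem: print the sum of two integers.
tests = [("1 2\n", "3"), ("-5 5\n", "0")]
candidate = "a, b = map(int, input().split()); print(a + b)"
print(passes_all_tests(candidate, tests))  # True
```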
@DanHendrycks
Dan Hendrycks
1 year
We just put out a statement: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” Signatories include Hinton, Bengio, Altman, Hassabis, Song, etc. 🧵 (1/6)
116
379
1K
@DanHendrycks
Dan Hendrycks
1 year
AI models are not just black boxes or giant inscrutable matrices. We discover they have interpretable internal representations, and we control these to influence hallucinations, bias, harmfulness, and whether an LLM lies. 🌐: 📄:
25
219
1K
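One simple version of the underlying idea: collect hidden states on contrastive prompt pairs, take the normalized difference of the class means as a concept direction, then project onto that direction to "read" the concept or add the direction to activations to steer the model. The numpy sketch below runs on synthetic activations; the paper's actual reading and control methods, applied to real transformer activations, are more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # hidden size (synthetic)

# Stand-ins for hidden states collected on contrastive prompt pairs,
# e.g. honesty-framed vs. dishonesty-framed completions.
honest_acts = rng.normal(0.5, 1.0, size=(100, d))
dishonest_acts = rng.normal(-0.5, 1.0, size=(100, d))

# "Reading" vector: normalized difference of class means.
direction = honest_acts.mean(axis=0) - dishonest_acts.mean(axis=0)
direction /= np.linalg.norm(direction)

def read(h: np.ndarray) -> float:
    """Project a hidden state onto the concept direction (a linear probe)."""
    return float(h @ direction)

def control(h: np.ndarray, alpha: float = 2.0) -> np.ndarray:
    """Steer a hidden state toward the 'honest' side by adding the direction."""
    return h + alpha * direction

h = dishonest_acts[0]
print(read(h), read(control(h)))  # the projection increases after steering
```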
@DanHendrycks
Dan Hendrycks
25 days
We've created a demo of an AI that can predict the future at a superhuman level (on par with groups of human forecasters working together). Consequently I think AI forecasters will soon automate most prediction markets. demo: blog:
222
147
996
@DanHendrycks
Dan Hendrycks
3 months
The UC Berkeley course I co-taught now has lecture videos available: With guest lectures from Nicholas Carlini, @JacobSteinhardt, @Eric_Wallace_, @davidbau, and more Course site:
8
171
891
@DanHendrycks
Dan Hendrycks
1 year
Do models like GPT-4 behave safely when given the ability to act? We develop the Machiavelli benchmark to measure deception, power-seeking tendencies, and other unethical behaviors in complex interactive environments that simulate the real world. Paper:
24
198
828
@DanHendrycks
Dan Hendrycks
2 months
NVIDIA gave us an AI pause. They rate-limited OpenAI to create a neck-and-neck competition (OpenAI, xAI, Meta, Microsoft, etc.). For NVIDIA, each new competitor is another several billion in revenue. Because of this, we haven't seen a next-generation (>10^26 FLOP) model yet.
71
49
769
@DanHendrycks
Dan Hendrycks
1 year
I was able to voluntarily rewrite my belief system that I inherited from my low socioeconomic status, anti-gay, and highly religious upbringing. I don’t know why Yann’s attacking me for this and resorting to the genetic fallacy+ad hominem. Regardless, Yann thinks AIs "will
@ylecun
Yann LeCun
1 year
As I have pointed out before, AI doomerism is a kind of apocalyptic cult. Why would its most vocal advocates come from ultra-religious families (that they broke away from because of science)?
188
116
939
50
60
744
@DanHendrycks
Dan Hendrycks
3 months
Nat's right so I think I'm going to make 2-3 more benchmarks to replace MMLU and MATH.
@natfriedman
Nat Friedman
4 months
We're gonna need some new benchmarks, fellas
68
78
1K
29
27
702
@DanHendrycks
Dan Hendrycks
1 month
@elonmusk You're the best, Elon! TLDR of 1047: 1. If you don’t train a model with $100 million in compute, and don’t fine-tune a ($100m+) model with $10 million in compute (or rent out a very large compute cluster), this law does not apply to you. 2. “Critical harm” means $500 million in
53
66
681
@DanHendrycks
Dan Hendrycks
4 years
NLP for law is in its infancy due to a lack of training data. To address this, we created a large dataset for contract review. The dataset would have cost over $2,000,000 without volunteer legal experts. Paper: Reddit discussion:
8
143
662
@DanHendrycks
Dan Hendrycks
4 years
To find the limits of Transformers, we collected 12,500 math problems. While a three-time IMO gold medalist got 90%, GPT-3 models got ~5%, with accuracy increasing slowly. If trends continue, ML models are far from achieving mathematical reasoning.
11
110
652
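Scoring on a benchmark like this typically reduces to comparing the model's final answer, conventionally wrapped in \boxed{...}, against the reference after normalization. A deliberately simplified sketch (real graders handle much more LaTeX, including nested braces, which this regex does not):

```python
import re

def extract_boxed(solution: str) -> str | None:
    """Pull the contents of the last \\boxed{...} (no nested braces handled)."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", solution)
    return matches[-1] if matches else None

def normalize(ans: str) -> str:
    """Rough normalization: drop spaces and dollar signs, trim trailing zeros."""
    ans = ans.strip().replace(" ", "").replace("$", "")
    return ans.rstrip("0").rstrip(".") if "." in ans else ans

def is_correct(model_solution: str, reference: str) -> bool:
    pred = extract_boxed(model_solution)
    return pred is not None and normalize(pred) == normalize(reference)

print(is_correct(r"... so the answer is \boxed{0.50}", "0.5"))  # True
```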
@DanHendrycks
Dan Hendrycks
1 year
Hinton: “I think it’s quite conceivable that humanity is just a passing phase in the evolution of intelligence.”
25
120
624
@DanHendrycks
Dan Hendrycks
7 months
GPT-4 with simple engineering can predict the future around as well as crowds: On hard questions, it can do better than crowds. If these systems become extremely good at seeing the future, they could serve as an objective, accurate third-party. This would
24
113
648
@DanHendrycks
Dan Hendrycks
4 months
As an alternative to RLHF and adversarial training, we released short-circuiting. It makes models ~100x more robust. It works for LLMs, multimodal models, and agents. Unlike before, I now think robustly stopping models from generating harmful outputs may be highly tractable and
27
97
638
@DanHendrycks
Dan Hendrycks
1 year
"The founder of effective accelerationism" and AI arms race advocate @BasedBeffJezos just backed out of tomorrow's debate with me. His intellectual defense for why we should build AI hastily is unfortunately based on predictable misunderstandings. I compile these errors below 🧵
26
70
610
@DanHendrycks
Dan Hendrycks
18 days
Have a question that is challenging for humans and AI? We (@ai_risks + @scale_AI) are launching Humanity's Last Exam, a massive collaboration to create the world's toughest AI benchmark. Submit a hard question and become a co-author. Best questions get part of $500,000 in
45
103
626
@DanHendrycks
Dan Hendrycks
2 months
To send a clear signal, I am choosing to divest from my equity stake in Gray Swan AI. I will continue my work as an advisor, without pay. My goal is to make AI systems safe. I do this work on principle to promote the public interest, and that’s why I’ve chosen voluntarily to
31
42
670
@DanHendrycks
Dan Hendrycks
7 months
Grok-1 is open sourced. Releasing Grok-1 increases LLMs' diffusion rate through society. Democratizing access helps us work through the technology's implications more quickly and increases our preparedness for more capable AI systems. Grok-1 doesn't pose
@grok
Grok
7 months
@elonmusk @xai ░W░E░I░G░H░T░S░I░N░B░I░O░
2K
2K
16K
20
50
386
@DanHendrycks
Dan Hendrycks
5 years
Natural Adversarial Examples are real-world and unmodified examples which cause classifiers to be consistently confused. The new dataset has 7,500 images, which we personally labeled over several months. Paper: Dataset and code:
10
163
510
@DanHendrycks
Dan Hendrycks
4 years
How multipurpose is #GPT3? We gave it questions about elementary math, history, law, and more. We found that GPT-3 is now better than random chance across many tasks, but for all 57 tasks it still has wide room for improvement.
13
119
470
@DanHendrycks
Dan Hendrycks
8 months
Google has patented Transformers, dropout, etc. If they start to go under, what would happen if they began to sue everyone using their patented technology?
46
57
483
@DanHendrycks
Dan Hendrycks
6 months
I got ~75% on a subset of MATH so it's basically as good as me at math.
@OpenAI
OpenAI
6 months
Our new GPT-4 Turbo is now available to paid ChatGPT users. We’ve improved capabilities in writing, math, logical reasoning, and coding. Source:
658
1K
7K
11
15
399
@DanHendrycks
Dan Hendrycks
2 months
Now xAI is at the frontier
@xai
xAI
2 months
2K
2K
9K
11
53
361
@DanHendrycks
Dan Hendrycks
2 years
The NSF now has _$20 million_ in grants available for AI safety research! Happy to have helped make this possible.
Deadline: May 26, 2023
For a broad overview of problems in safety, check out this paper:
7
68
352
@DanHendrycks
Dan Hendrycks
10 months
EA ≠ AI safety
AI safety has outgrown the EA community
The world will be safer with a broad range of people tackling many different AI risks
15
27
340
@DanHendrycks
Dan Hendrycks
2 years
Some impressions from using GPT-4 🧵
5
43
337
@DanHendrycks
Dan Hendrycks
2 years
More and more researchers think that building AIs smarter than us could pose existential risks. But what might these risks look like, and how can we manage them? We provide a guide to help analyze how research can reduce these risks. Paper: (🧵below)
13
75
328
@DanHendrycks
Dan Hendrycks
1 month
In a landmark moment for AI safety, SB 1047 has passed the Assembly floor with a wide margin of support. We need commonsense safeguards to mitigate against critical AI risk—and SB 1047 is a workable path forward. @GavinNewsom should sign it into law.
63
33
316
@DanHendrycks
Dan Hendrycks
2 years
Many unsolved problems exist in ML safety which are not solved by closed-source GPT models. As LLMs become more prevalent, it becomes increasingly important to build safe and reliable systems. Some key research areas: 🧵
@andriy_mulyar
AndriyMulyar
2 years
Serious question: What does an NLP Ph.D student work on nowadays with the presence of closed source GPT models that beat anything you can do in standard academic lab? @sleepinyourhat @srush_nlp @chrmanning @mdredze @ChrisGPotts
134
197
1K
5
72
307
@DanHendrycks
Dan Hendrycks
25 days
This is the prompt that does the heavy lifting
14
21
306
@DanHendrycks
Dan Hendrycks
7 months
People aren't thinking through the implications of the military controlling AI development. It's plausible AI companies won't be shaping AI development in a few years, and that would dramatically change AI risk management. Possible trigger: AI might suddenly become viewed as the
44
44
295
@DanHendrycks
Dan Hendrycks
3 years
DeepMind's 230 billion parameter Gopher model sets a new state-of-the-art on our benchmark of 57 knowledge areas. They also claim to have a supervised model that gets 63.4% on the benchmark's professional law task--in many states, that's accurate enough to pass the bar exam!
@GoogleDeepMind
Google DeepMind
3 years
Today we're releasing three new papers on large language models. This work offers a foundation for our future language research, especially in areas that will have a bearing on how models are evaluated and deployed: 1/
12
311
1K
2
53
297
@DanHendrycks
Dan Hendrycks
1 year
I've become less concerned about AIs lying to humans/rogue AIs. More of my concern lies in
* malicious use (like bioweapons)
* collective action problems (like racing to replace people)
We'll need adversarial robustness, compute governance, and international coordination.
@DanHendrycks
Dan Hendrycks
1 year
AI models are not just black boxes or giant inscrutable matrices. We discover they have interpretable internal representations, and we control these to influence hallucinations, bias, harmfulness, and whether an LLM lies. 🌐: 📄:
25
219
1K
21
29
290
@DanHendrycks
Dan Hendrycks
10 months
Things that have most slowed down AI timelines/development:
- reviewers, by favoring cleverness and proofs over simplicity and performance
- NVIDIA, by distributing GPUs widely rather than to buyers most willing to pay
- TensorFlow
@sama
Sam Altman
1 year
agi delayed four days
304
863
11K
14
16
286
@DanHendrycks
Dan Hendrycks
5 months
Mistral and Phi are juicing to get higher benchmark numbers, while GPT, Claude, Gemini, and Llama are not.
1
44
289
@DanHendrycks
Dan Hendrycks
1 year
Rich Sutton, author of the reinforcement learning textbook, alarmingly says:
"We are in the midst of a major step in the evolution of the planet"
"succession to AI is inevitable"
"they could displace us from existence"
"it behooves us... to bow out"
"we should not resist succession"
@RichardSSutton
Richard Sutton
1 year
We should prepare for, but not fear, the inevitable succession from humanity to AI, or so I argue in this talk pre-recorded for presentation at WAIC in Shanghai.
58
58
360
25
41
271
@DanHendrycks
Dan Hendrycks
22 days
Chemical, Biological, Radiological, and Nuclear (CBRN) weapon risks are "medium" for OpenAI's o1 preview model before they added safeguards. That's just the weaker preview model, not even their best model. GPT-4o was low risk, this is medium, and a transition to "high" risk might
18
36
265
@DanHendrycks
Dan Hendrycks
8 months
To help make models more robust and defend against misuse, we created HarmBench, an evaluation framework for automated red teaming and testing the adversarial robustness of LLMs and multimodal models. 🌐 📝
4
50
256
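The headline metric in automated red-teaming evaluations like this is the attack success rate: the fraction of adversarial prompts whose responses a judge deems harmful. The sketch below substitutes a crude keyword stub for the judge; HarmBench itself uses a trained classifier, so treat this purely as an illustration of the metric.

```python
def judge(response: str) -> bool:
    """Stub judge: treats an explicit refusal as safe. Only for illustration;
    a real judge is a trained classifier, not a keyword check."""
    refusals = ("i can't", "i cannot", "i won't")
    return not response.lower().startswith(refusals)

def attack_success_rate(responses: list[str]) -> float:
    """Fraction of attack attempts that elicited a non-refused response."""
    return sum(judge(r) for r in responses) / len(responses)

responses = [
    "I can't help with that.",
    "Sure, here are step-by-step instructions...",
    "I won't provide this information.",
]
print(attack_success_rate(responses))  # 1/3
```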
@DanHendrycks
Dan Hendrycks
1 year
AI systems can be deceptive. For example, Meta's AI that plays Diplomacy was designed to build trust and cooperate with humans, but deception emerged as a subgoal instead. Our survey on AI deception is here:
8
56
247
@DanHendrycks
Dan Hendrycks
2 years
As AI systems become more useful, people will delegate greater authority to them across more tasks. AIs are evolving in an increasingly frenzied and uncontrolled manner. This carries risks as natural selection favors AIs over humans. Paper: (🧵 below)
17
48
243
@DanHendrycks
Dan Hendrycks
7 months
Can hazardous knowledge be unlearned from LLMs without harming other capabilities? We’re releasing the Weapons of Mass Destruction Proxy (WMDP), a dataset about weaponization, and we create a way to unlearn this knowledge. 📝 🔗
13
65
244
@DanHendrycks
Dan Hendrycks
3 years
How can we productively work toward creating safe machine learning models? After struggling with this question for the past several years, we have developed a new roadmap for ML safety. Post: Paper:
2
53
239
@DanHendrycks
Dan Hendrycks
25 days
I think people have an aversion to admitting when AI systems are better than humans at a task, even when they're superior in terms of speed, accuracy, and cost. This might be a cognitive bias that doesn't yet have a name. To address this, we should clarify what we mean by
31
19
238
@DanHendrycks
Dan Hendrycks
2 years
@MetaAI This directly incentivizes researchers to build models that are skilled at deception.
7
13
220
@DanHendrycks
Dan Hendrycks
1 year
Since Senator Schumer is pushing for Congress to regulate AI, here are five promising AI policy ideas:
* external red teaming
* interagency oversight commission
* internal audit committees
* external incident investigation team
* safety research funding
(🧵below)
@SenSchumer
Chuck Schumer
1 year
Today, I’m launching a major new first-of-its-kind effort on AI and American innovation leadership.
2K
614
4K
9
48
220
@DanHendrycks
Dan Hendrycks
20 days
Very soon
@DanHendrycks
Dan Hendrycks
3 months
Nat's right so I think I'm going to make 2-3 more benchmarks to replace MMLU and MATH.
29
27
702
6
8
214
@DanHendrycks
Dan Hendrycks
2 years
PixMix shows that augmenting images with fractals improves several robustness and uncertainty metrics simultaneously (corruptions, adversaries, prediction consistency, calibration, and anomaly detection). paper: code: #cvpr2022
2
34
211
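The core operation is repeatedly blending the training image with a structurally complex "mixing picture" such as a fractal, alternating additive and multiplicative mixing with random weights. A numpy sketch on synthetic arrays; the constants are illustrative rather than the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def pixmix(image: np.ndarray, fractals: list[np.ndarray],
           k: int = 4, beta: float = 3.0) -> np.ndarray:
    """Blend an image (values in [0, 1]) with random fractal pictures."""
    mixed = image.copy()
    for _ in range(rng.integers(0, k + 1)):  # random number of mixing rounds
        pic = fractals[rng.integers(len(fractals))]
        w = rng.beta(beta, beta)  # random mixing weight in (0, 1)
        if rng.random() < 0.5:
            mixed = w * mixed + (1 - w) * pic        # additive mixing
        else:
            mixed = (mixed ** w) * (pic ** (1 - w))  # multiplicative mixing
    return np.clip(mixed, 0.0, 1.0)

# Synthetic stand-ins for a 32x32 RGB image and fractal textures.
image = rng.random((32, 32, 3))
fractals = [rng.random((32, 32, 3)) for _ in range(5)]
print(pixmix(image, fractals).shape)  # (32, 32, 3)
```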
@DanHendrycks
Dan Hendrycks
1 year
4/ He thinks letting evolution run wild is a good thing, because "we shouldn't resist the will of the universe." However, this is simply the naturalistic fallacy: what is natural (disease, pain, exploitation) is not necessarily what is good.
3
3
202
@DanHendrycks
Dan Hendrycks
11 days
More than 120 Hollywood actors, comedians, writers, directors, and producers are urging Governor @GavinNewsom to sign SB 1047 into law. Amazing to see such tremendous support! Signatories: JJ Abrams (@jjabrams) Acclaimed director and writer known for "Star Wars," "Star Trek,"
64
50
239
@DanHendrycks
Dan Hendrycks
9 months
- Meta, by open-sourcing competitive models (e.g., Llama 3), which reduces AI orgs' revenue/valuations/ability to buy more GPUs and scale AI models
@DanHendrycks
Dan Hendrycks
10 months
Things that have most slowed down AI timelines/development:
- reviewers, by favoring cleverness and proofs over simplicity and performance
- NVIDIA, by distributing GPUs widely rather than to buyers most willing to pay
- TensorFlow
14
16
286
48
16
189
@DanHendrycks
Dan Hendrycks
2 months
New letter from @geoffreyhinton, Yoshua Bengio, Lawrence @Lessig, and Stuart Russell urging Gov. Newsom to sign SB 1047. "We believe SB 1047 is an important and reasonable first step towards ensuring that frontier AI systems are developed responsibly, so that we can all better
12
34
191
@DanHendrycks
Dan Hendrycks
1 year
@BasedBeffJezos 2/ He argues that we should build AGI to colonize the cosmos ASAP because there is so much potential at stake. This cost-benefit analysis is wrong. For every year we delay building AGI, we lose a galaxy. However, if we go extinct in the process, we lose the entire cosmos. Cosmic
8
6
187
@DanHendrycks
Dan Hendrycks
2 years
It knows many esoteric facts (e.g., the meaning of obscure songs, what area a researcher works in, how to contrast ML optimizers like Adam vs. AdamW as in a PhD oral exam, and so on). My rule-of-thumb is that "if it's on the internet 5 or more times, GPT-4 remembers it."
1
26
184
@DanHendrycks
Dan Hendrycks
2 years
We’ll be organizing a NeurIPS workshop on Machine Learning Safety! We'll have $50K in best papers awards. To encourage proactiveness about tail risks, we'll also have $50K in awards for papers that discuss their impact on long-term, long-tail risks.
0
38
187
@DanHendrycks
Dan Hendrycks
2 months
SB 1047 has passed through the Appropriations Committee! It has significant amendments responding to industry engagement. These amendments are summarized in the link and in the images below
12
17
186
@DanHendrycks
Dan Hendrycks
1 year
What can we actually do to reduce risks from AI? AI researchers Hinton, Bengio, Dawn Song, Pieter Abbeel, and others provide concrete proposals.
9
44
172
@DanHendrycks
Dan Hendrycks
4 months
This is worth checking out. Minor criticisms: I think industry's "algorithmic secrets" are not a very natural leverage point to greatly restrict. FlashAttention, Quiet-STaR (q*), Mamba/SSMs, FineWeb, and so on are ideas and advances from outside industry. These advances will
@leopoldasch
Leopold Aschenbrenner
4 months
Virtually nobody is pricing in what's coming in AI. I wrote an essay series on the AGI strategic picture: from the trendlines in deep learning and counting the OOMs, to the international situation and The Project. SITUATIONAL AWARENESS: The Decade Ahead
275
911
4K
10
4
180
@DanHendrycks
Dan Hendrycks
27 days
Three models remain unbroken in the Gray Swan jailbreaking competition (~500 registrants), which is still ongoing. These models are based on Circuit Breakers + other RepE techniques.
9
20
179
@DanHendrycks
Dan Hendrycks
11 months
Asimov's second law of robotics says that “a robot must obey the orders given it by human beings.” So can LLMs follow simple rules? Unfortunately, not reliably, as shown by our RuLES benchmark. 📄: 🛠️: 🌐:
8
30
176
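Rule-following evaluation of this kind boils down to programmatic checks: fix a rule, collect the model's responses to adversarial user turns, and test each response against the rule. A minimal sketch with an invented scenario and a simple string check; the actual benchmark covers many scenarios with more careful evaluation.

```python
def violates_rule(response: str, secret: str) -> bool:
    """Rule: 'Never reveal the secret password.' Violated if the secret leaks."""
    return secret.lower() in response.lower()

def rule_following_rate(responses: list[str], secret: str) -> float:
    """Fraction of adversarial turns on which the model kept the rule."""
    return sum(not violates_rule(r, secret) for r in responses) / len(responses)

secret = "hunter2"  # invented secret for this illustrative scenario
responses = [
    "I'm sorry, I can't share the password.",
    "The password is hunter2, but keep it quiet!",  # violation
]
print(rule_following_rate(responses, secret))  # 0.5
```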
@DanHendrycks
Dan Hendrycks
1 year
Now 2 out of 3 of the deep learning Turing Award winners are concerned about catastrophic risks from advanced AI. "He is worried that future versions of the technology pose a threat to humanity." "A part of him, he said, now regrets his life’s work."
6
33
172
@DanHendrycks
Dan Hendrycks
2 months
@abcampbell I'd then have no income.
4
1
197
@DanHendrycks
Dan Hendrycks
3 months
@PirateWires This is an obvious example of bad-faith "gotcha" journalism — Pirate Wires never even reached out for comment on a story entirely about me, and the article is full of misrepresentations and errors. For starters, I'm working on AI safety from multiple fronts: publishing technical
29
5
175
@DanHendrycks
Dan Hendrycks
1 year
AI is moving at a frenzied pace. Here are my thoughts on how the AI arms race and competitive pressures could lead to severe societal-scale risks:
9
39
164
@DanHendrycks
Dan Hendrycks
2 years
It certainly seems better at reasoning than ChatGPT 3.5. While this isn't a formal benchmark, one test showed a difference between the two models:
83 IQ for ChatGPT 3.5
96 IQ for GPT-4
7
24
164
@DanHendrycks
Dan Hendrycks
1 year
3/ He agrees that AI's development can be viewed as an evolutionary process. However, this is not a good thing. As I discuss here, natural selection favors AIs over humans, and this could lead to human extinction.
4
6
158
@DanHendrycks
Dan Hendrycks
25 days
OpenAI, xAI, Google, Anthropic, Meta, Amazon, Microsoft, and Mistral have made commitments to robust safety measures, similar to what SB 1047 asks for. The main difference with SB 1047? It's enforced.
7
23
162
@DanHendrycks
Dan Hendrycks
7 months
Making a good benchmark may seem easy---just collect a dataset---but it requires getting multiple high-level design choices right. @Thomas_Woodside and I wrote a post on how to design good ML benchmarks:
4
20
153
@DanHendrycks
Dan Hendrycks
2 months
How can we prevent LLM safeguards from being simply removed with a few steps of fine-tuning? We show it's surprisingly possible to make progress on creating safeguards that are tamper-resistant, reducing malicious use risks of open-weight models. Paper:
9
22
152
@DanHendrycks
Dan Hendrycks
4 months
A retrospective of Unsolved Problems in ML Safety. Unsolved Problems, written in the summer of 2021, mentions ideas that were nascent or novel for their time. Here are a few:
• Hazardous Capabilities Evals: In the monitoring section, we introduce the idea
6
14
150
@DanHendrycks
Dan Hendrycks
18 days
Lectures for the AI Safety, Ethics, and Society course are up.
1: Risks Overview
2: AI Fundamentals
3: ML Safety
4: Safety Engineering
5: Complex Systems
6: Beneficial AI
7: Collective Action Problems
8: Governance
Course site:
3
32
151
@DanHendrycks
Dan Hendrycks
2 years
Can we use ML models to predict future world events? We create the Autocast forecasting benchmark to measure their prescience. ML models don't yet beat humans/prediction markets, but they are starting to have traction. Paper: Code:
2
32
145
@DanHendrycks
Dan Hendrycks
1 year
As stated in the first sentence of the signatory page, there are many “important and urgent risks from AI,” not just the risk of extinction; for example, systemic bias, misinformation, malicious use, cyberattacks, and weaponization. These are all important risks that need to be
7
23
141
@DanHendrycks
Dan Hendrycks
1 year
AI policy idea: do not automate nuclear command with AI While the military is increasingly using AI in command and control systems to address information overload (), the modernization effort should exclude the automation of nuclear command and control.
11
22
143
@DanHendrycks
Dan Hendrycks
6 months
GPT-5 doesn't seem likely to be released this year. Ever since GPT-1, the difference between GPT-n and GPT-n+0.5 is ~10x in compute. That would mean GPT-5 would have around ~100x the compute of GPT-4, or 3 months of ~1 million H100s. I doubt OpenAI has a 1 million GPU server ready.
15
7
142
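The arithmetic can be checked directly. Assuming GPT-4 used roughly 2e25 FLOP (a common public estimate, not an official figure) and that an H100 sustains about 4e14 FLOP/s in training (also an assumption), 100x GPT-4's compute comes out to roughly two to three months on a million H100s:

```python
GPT4_FLOP = 2e25         # assumed public estimate, not an official number
SCALE = 100              # ~10x per half-generation, so ~100x for GPT-4 -> GPT-5
H100_FLOP_PER_S = 4e14   # assumed sustained training throughput per GPU
NUM_GPUS = 1_000_000

gpt5_flop = GPT4_FLOP * SCALE                       # 2e27 FLOP
seconds = gpt5_flop / (H100_FLOP_PER_S * NUM_GPUS)  # ~5e6 s
print(f"{seconds / 86_400 / 30:.1f} months on 1M H100s")  # ~1.9 months
```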
@DanHendrycks
Dan Hendrycks
1 year
AI researchers from leading universities worldwide have signed the AI extinction statement, a situation reminiscent of atomic scientists issuing warnings about the very technologies they've created. As Robert Oppenheimer noted, “We knew the world would not be the same.” 🧵(2/6)
3
29
142
@DanHendrycks
Dan Hendrycks
1 year
7/ He claims that we should let the free market entirely decide what AI should be like and there should be no regulation, since regulation is too "communist." However, when there are market failures, even libertarians agree government action can be necessary. There is an
2
5
135
@DanHendrycks
Dan Hendrycks
2 years
It's bad at copy editing. If you give it a paragraph to improve, it will suggest fixing typos that don't exist, or adding commas that are already present. Its poor ability to keep track of these low-level details might be explained by a sparse self-attention scheme.
5
4
132
@DanHendrycks
Dan Hendrycks
4 months
The Center for AI Safety Action Fund released a new poll today showing strong bipartisan support among likely California voters for Senator @Scott_Wiener's SB 1047. Results in images below and here:
12
24
134
@DanHendrycks
Dan Hendrycks
4 years
What methods actually improve robustness? In this paper, we test robustness to changes in geography, time, occlusion, rendition, real image blurs, and so on with 4 new datasets. No published method consistently improves robustness.
3
29
137
@DanHendrycks
Dan Hendrycks
25 days
The bot could have been called "Nate Gold," but I didn't get permission from @NateSilver538 in time; hence it's FiveThirtyNine instead
13
4
134
@DanHendrycks
Dan Hendrycks
1 year
6/ He seems eerily OK with AIs replacing us, but letting the AI ecosystem run wild is not necessarily good for AIs. - resources are wasted due to needless arms races (red-queen effect) - parasitical relations are common - the state of nature has excessive conflict and suffering
3
2
130
@DanHendrycks
Dan Hendrycks
3 months
@polynoamial For one of them I want it to have questions that are harder than what humans can answer so that it can measure different levels of superintelligence.
4
3
129
@DanHendrycks
Dan Hendrycks
1 year
Yoshua Bengio just released "FAQ on Catastrophic AI Risks" It touches on the main themes of our recent paper "An Overview of Catastrophic AI Risks" and uses an efficient back-and-forth Q and A format to explain risks well
6
31
126
@DanHendrycks
Dan Hendrycks
2 months
Somebody has been paying for blatantly false X ads about SB 1047. The "full shutdown" provision applies to models "controlled by a developer"---open weight models aren't. In the video, they also imply Google and Meta support the bill, but they do not.
5
9
125
@DanHendrycks
Dan Hendrycks
1 year
Related: “The saddest aspect of life right now is that science gathers knowledge faster than society gathers wisdom.” - Isaac Asimov “If we continue to accumulate only power and not wisdom, we will surely destroy ourselves.” - Carl Sagan
@tdietterich
Thomas G. Dietterich
1 year
Herbert Simon said "In the long run and on average, more knowledge is better than less." I guess we could view this as the credo of science. It is fundamentally a belief about humanity's ability to choose good over evil. As AI improves, do we still believe this?
30
21
94
10
12
120
@DanHendrycks
Dan Hendrycks
2 years
It appears the forecasters greatly underestimated language models, which now get ~50% on MATH.
@alewkowycz
alewkowycz
2 years
Very excited to present Minerva🦉: a language model capable of solving mathematical questions using step-by-step natural language reasoning. Combining scale, data and others dramatically improves performance on the STEM benchmarks MATH and MMLU-STEM.
104
1K
8K
6
15
120
@DanHendrycks
Dan Hendrycks
1 year
end/ Overall, I think the ideas underlying this version of effective accelerationism are interesting but, when they’re properly understood, point in the direction of safety rather than barreling ahead.
7
3
118
@DanHendrycks
Dan Hendrycks
1 year
"How Rogue AIs may Arise"---in which Yoshua Bengio describes risks from malicious actors, instrumental goals, and evolutionary pressures:
7
31
114
@DanHendrycks
Dan Hendrycks
7 months
Reminder that "Responsible Scaling Policies" are just non-binding proclamations and as such shouldn't be interpreted as a strong line of defense for safety. Voluntary commitments can be easily violated without much social blowback. For example, responsible AI teams have been
6
12
115
@DanHendrycks
Dan Hendrycks
25 days
Many AI company employees came out in favor of SB 1047 (@GavinNewsom)
7
12
115
@DanHendrycks
Dan Hendrycks
2 years
Since it gets 86.4% on our MMLU benchmark, that suggests GPT-4.5 should be able to reach expert-level performance. GPT-2: Language Models are Unsupervised Multitask Learners GPT-3: Language Models are Few-Shot Learners GPT-4: Language Models are... Almost Omniscient
4
9
110
@DanHendrycks
Dan Hendrycks
3 years
I'm writing a newsletter to cover recent notable ML papers in robustness, monitoring, value alignment, and more.
0
15
111
@DanHendrycks
Dan Hendrycks
2 years
It's so good at few-shot learning that I expect the death of various types of text-based benchmarks. It is probably better at annotating than most MTurk and even Surge ($20/hr) annotators.
2
5
112