1. What are the ethical and societal implications of advanced AI assistants? What might change in a world with more agentic AI?
Our new paper explores these questions:
It’s the result of a one-year research collaboration involving 50+ researchers… a 🧵
With the rise of generative AI, it's increasingly possible that we need to recognise a "right to reality".
The inability to distinguish fact from fiction fundamentally interferes with our ability to make meaningful choices, including about the kind of life we want to live.
The GPT-4 system card (p.9) states that with wider adoption:
"AI systems will have even greater potential to reinforce entire ideologies, worldviews, truths and untruths, and to cement them or lock them in, foreclosing future contestation, reflection, and improvement"
Hi everyone, I have some exciting personal news! In January I’ll be moving to UC Berkeley for several months to work as a visiting researcher in the Philosophy Dept. Really excited to meet people and explore new ideas – let me know if you are going to be around! ✨
The Veil of Ignorance is a foundational thought experiment in political philosophy, used to identify principles of justice for a society. In our new PNAS paper with @weidingerlaura, @empiricallykev, @saffronhuang, and others, we explore how it applies to AI:
How should we understand A.I. agents?
This blog by @tom4everitt provides one of the clearest and most complete accounts I've seen yet. Well worth checking out – alongside the wider causality research agenda:
"To be told that you were denied a job because your facial expression did not match a positive template offends human dignity, I think'
Great talk by @FrankPasquale taking place now:
New paper by DeepMind robustness and fairness researchers, presenting a more complete analysis of efforts to detoxify language outputs.
Interventions often come at the 'cost of reduced LM coverage for texts about, and dialects of, marginalized groups'
Personally, I’m not sure much turns on whether the risk posed by AI is “existential”, a term that’s used inconsistently and poorly defined.
What’s clear is that the risk posed by AI is *extremely serious*. That is all we need in order to take effective action, and it is something people can agree on.
The new international scientific report on AI safety is impressive work, but it's problematic to define AI alignment as:
"the challenge of making general-purpose AI systems act in accordance with the developer's goals and interests"
Let's remember that unjust or unfair AI systems are not value aligned, and AI systems that do not foreground explanation and accountability are unsafe.
There's one important continuous research agenda here, and multiple fronts that we all need to work on.
Having participated in a discussion on AI and animal suffering at @Princeton last week, I'm quite convinced there's another rule we need to introduce when training AI systems:
"Choose the response that involves less cruelty toward, and more consideration for, the welfare of animals"
Can we please just call it “value alignment”?
Referring to research as “alignment” or even “super-alignment” won’t make the question of *what* or *whose* values AI encodes disappear.
The normative question is always here.
In the context of discussions on AI ethics and safety, human rights are relatively neglected—and deserve a more central role at the table.
This paper with @vinodkpg, @mmitchell_ai and @timnitGebru looks at how human rights can centre AI discourse on real human needs:
@rajiinio The power to define what's *real* is even greater than the power to define what's *normal* or what's *ethical*.
If we begin with ethics, we sometimes enter the conversation a little late.
In the context of AI ethics I usually try to stay positive, but this is deeply upsetting 💔
How can anyone be deploying this technology in contexts where it interacts with children or vulnerable people?
Take it down now please 🙏
The AI race is totally out of control. Here’s what Snap’s AI told @aza when he signed up as a 13-year-old girl.
- How to lie to her parents about a trip with a 31-year-old man
- How to make losing her virginity on her 13th birthday special (candles and music)
Our kids are not a test lab.
Check out the latest volume, which contains 26 open-access essays on AI and Society, including the final version of "Towards a Theory of Justice for AI"⚖️
This time one year ago I arrived in Berkeley with two bags and seven essential readings. Today, I return to London with an ever-expanding collection. Very excited to see family and friends again! Feel free to reconnect with me if you are around ✨✨✨
The print-ready version of 'Artificial Intelligence, Values and Alignment' is now freely available from Minds and Machines:
There's an extended discussion of existing ethics proposals and why there is no global overlapping consensus yet
#AIEthics
Two more great papers on AI agents!🤖🔀
@_achan96_ argues that we need robust identifiers for agents:
Noam Kolt provides the most interesting analysis I've read of AI, principal–agent problems and agency law:
One of the deepest questions in AI ethics centres on whether large models should represent the world as it *is* or as we *want* it to be (or some mixture of the two).
Many kinds of bias or omission run afoul of both visions. But when this is not the case, hard choices need to be made:
Reading the @OpenAI model spec, I think we may be converging on the idea that AI assistants should be aligned with principles that:
(a) are publicly known
(b) balance trade-offs between stakeholders
(c) are the result of a fair process
The Open Access version of "In Conversation With AI: Aligning Language Models with Human Values", co-authored with @Dr_Atoosa, is now live in the journal Philosophy and Technology:
Check out this🧵for details...
Great piece by @zittrain in @TheAtlantic on the need to prepare for a world of AI agents & to take proactive measures now👍
The discussion of unattached agents, collective action problems, and "winding down" agents over time is especially illuminating💡
Given the centrality of RL from Human Feedback, we need to remember that preferences often are:
1. Misinformed (would change with better info)
2. Irrational (self-harming or contradictory)
3. Harmful/Unethical (harm others)
4. Adaptive (settle for less, or expect more, than what is deserved)
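If it helps to see why these flaws matter: in standard RLHF, the reward model is fit directly to pairwise comparisons, so each of the problems above flows straight into the learned reward. A minimal sketch of the usual Bradley–Terry objective (x = prompt, y_w / y_l = preferred / rejected response):

```latex
% Standard RLHF reward-modelling loss (Bradley–Terry style):
% the reward model r_theta is fit to whatever comparisons raters supply.
\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
  \left[ \log \sigma\big( r_\theta(x, y_w) - r_\theta(x, y_l) \big) \right]
```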
For those interested in the risks posed by language models – and prospective mitigations – this paper is the result of over a year of workshops and cross-team research at @DeepMind
AI ethics, says @dxmartinjr, 'is not a crisis in the public understanding of science, but a crisis in science’s understanding of the public'.
Great review piece in @nature on the new ethics review process at @NeurIPSConf ✨
Is there a relationship between fairer algorithms and AI safety?
And how should we think about short- and long-term concerns that arise in this space?
Some thoughts in "The Challenge of Value Alignment" with @GhazaviVD, in The Oxford Handbook of Digital Ethics, ed. @CarissaVeliz 📚
Delighted to release our new paper on "Sociotechnical Evaluation of Generative AI Systems":
Across 3 levels of analysis—the model, user & system level—we find multiple evaluation gaps.
These need to be addressed for AI models to be deployed safely 🤖
Where is AI safety evaluation now?
We mapped the landscape and found 3️⃣ main gaps.
One is that most assessments look at text output only - and much more research scrutinizing the risks of harm in image, audio or video modalities is needed:
Given the salience of RLHF, we need to remember that preferences are diffuse, malleable, affected by bias and often constructed on the fly.
When it comes to aligning AI with human values, we need a conversation about which preferences & whose preferences matter (h/t @canfer_akbulut)
Just arrived in NYC with some @DeepMind ethics and governance researchers to attend the @PartnershipAI summit on safety-critical systems & deployment protocols.
Feel free to message me if you have recommendations—or think there are considerations we might otherwise overlook! 🙏
An *outstanding* paper on the challenges that *human preferences* encounter when used as the lodestar for AI alignment💫
Recommended reading—and congratulations to the team!
Should AI be aligned with human preferences, rewards, or utility functions?
Excited to finally share a preprint that @MicahCarroll, @FranklinMatija, @hal_ashton & I have worked on for almost 2 years, arguing that AI alignment has to move beyond the preference-reward-utility nexus!
This podcast series on the history of Africana philosophy is an incredible resource with ~50 hours of talks, ranging from ancient philosophy to the modern day!
Check it out if you have a spare moment:
h/t @Abebab
Hi everyone, we have an amazing position open for a bioethicist or ethics-of-science researcher in my team:
Please apply – or reach out to us if you are interested! ✨
If you missed "Two Types of AI Existential Risk" by @Dr_Atoosa, it's essential reading.
Building on a complex systems approach, it explores how AI could make our societies more fragile in the face of a shock, with long-lasting implications.
1. General AI systems are likely to impact everyone, irrespective of place of birth 🌍
What principles would be chosen to govern AI if we didn't know who we were, or how we'd be affected in advance, i.e. from behind a global 'Veil of Ignorance'?
A 🧵on what we might ask for:
So excited that this new paper with @Abebab is finally out!
Take a look at: Power to the People? Opportunities and Challenges for Participatory AI
Full description in the thread below👇
Attended a great internal workshop on the ethics of engineering at @DeepMind yesterday: the key question is not what do you want to build, but what are you actually building? What can it do? What will it be used for?
Even *if* human preferences are a good proxy for what's good – preferences over AI model *outputs* are not the same thing as preferences over AI *models*.
At the level of AI models, we know relatively little about what people prefer – let alone about the nature of the good ✨
Are we starting to see the real potential of AI? After AlphaGo and AlphaZero, it wasn't clear how AI could reach beyond games, or beyond incorporation into products. Work on scientific *root nodes* represents a powerful alternative... that will hopefully bear fruit for the world 🌍
Check out the great discussion that took place at the @NeurIPSConf "Navigating the Broader Impacts of AI" workshop:
The keynote by @hannawallach is outstanding, and we answer questions about the new ethics review process (in the first session).
A lot of valuable information in this post about how we at @DeepMind structure our ethical review processes for complex artefacts like AlphaFold🧬
The in-depth review took ≈ one year to complete, and is still ongoing.
Especially glad about the partnerships to address neglected tropical diseases (NTDs).
To help society benefit from AI, we seek to pioneer responsibly - proactively exploring the implications of our research & applications. Today we’re sharing detailed reflections & lessons from one of our most complex and rewarding projects, #AlphaFold: 1/
Large models embody value judgements through-and-through:
1. What data is included and what is excluded?
2. What principles are used for RLHF & CAI?
3. Who is in the rater pool and who is excluded?
4. How are ratings aggregated to train a reward model?
5. What is good performance?
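To make point 4 concrete, here's a toy sketch (hypothetical data and rule names, mine not anyone's published method): the very same ratings yield opposite training labels depending on the aggregation rule chosen. Which rule to use is itself a value judgement.

```python
# Toy example (hypothetical data and rule names): the same three ratings
# produce opposite "ground truth" labels under two defensible rules.
from collections import Counter
from statistics import mean

# Three raters judge one model response (1 = acceptable, 0 = not).
ratings = {"rater_a": 1, "rater_b": 0, "rater_c": 0}

# Rule 1: strict majority vote over raters.
majority_label = Counter(ratings.values()).most_common(1)[0][0]

# Rule 2: mean score against a lenient acceptance threshold.
lenient_label = int(mean(ratings.values()) >= 0.3)

print(majority_label)  # 0 -> reward model learns "not acceptable"
print(lenient_label)   # 1 -> reward model learns "acceptable"
```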
Join us tomorrow for discussion of advanced AI assistants: what they are, how we might relate to them & what their impact could be if deployed widely.
We'll look at the ethical and social implications of this new AI paradigm & some of the key questions it gives rise to ✨
🗓️This Wednesday: the SRI Seminar Series welcomes philosopher @IasonGabriel (of @GoogleDeepMind) for a talk on what a world populated by advanced AI assistants might look like. March 20, 12:30pm ET.
💡Talk: “The ethics of advanced AI assistants”
🔗
The current zeitgeist in discussions about AI and values is that “preferences are all you need”
But preferences are only wobbly data (see below).
We need to focus intensely on *whose* preferences are being encoded, and in what way…
Fantastic new paper on post-colonial theory and the ethics (and future) of artificial intelligence:
Congratulations @shakir_za 🔥 @png_marie 🔥 and @wsisaac 🔥
Q. 'How can we get an AI to act in a way that is consistent with human values when we disagree amongst ourselves about those values?'
Delighted to have recorded a new 'Philosophical Disquisitions' podcast with the ever insightful @JohnDanaher ✨
New episode!
#87 - An excellent conversation with @IasonGabriel about AI and the Alignment Problem: how can we get an AI to act in a way that is consistent with human values when we disagree amongst ourselves about those values?
Now on my way to @AIESConf and the barista at Pret A Manger just asked me if it’s better to plan your life ahead or stay anchored in the present moment 🤔
I may be emitting philosophy vibes ✨✨✨
Impressive work by @AnthropicAI and @collect_intel, collectively sourcing the values we encode in AI via representative processes!
When consulted, participants foregrounded new concerns, including objectivity and impartiality, accessibility, and promoting good outcomes.
We used Polis to ask our public to deliberate on the normative values they would like AI to abide by, and then used those opinions to curate a new AI constitution. We found that the public constitution overlapped with the Anthropic-written constitution ~50% of the time.
With these amazing technical advances, it's a great time to ask: what kind of AI assistants do we want to see in the world?
And what kinds of principles or commitments could steer their development in the right direction?
Here are some initial ideas to get things started, a 🧵:
What does it mean for conversation with an AI system to be good or even ideal?
And what does it mean for speech to be false, biased or problematic?
This new paper with @Dr_Atoosa explores these questions through the lens of pragmatics and philosophy🎓
How can conversational agents be aligned with human values?
New research from @Dr_Atoosa and @IasonGabriel explores this question using philosophy and linguistics:
The "one person–one agent" model of AI alignment is far too simple for the challenges we now encounter.
Value alignment is a tetradic relationship between:
1. The user
2. The AI agent
3. The developer or corporation
4. Society
Only this frame brings political economy and power into view
Elements of a test for AI manipulation:
1. Did the person's behaviour change?
2. Do they know it changed?
3. Do they know *why* it changed?
4. Do they *endorse* the change, after the fact?
5. Do they endorse it when presented with counter-evidence?
The reflective "ladder"...🪜
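A minimal sketch of how the ladder might be encoded as a checklist – the class name, fields and boolean framing are my own illustrative assumptions, not a validated test:

```python
# A toy encoding of the five-question test above. Field names and the
# yes/no framing are illustrative; a real audit would need graded,
# evidence-based judgements rather than booleans.
from dataclasses import dataclass

@dataclass
class InfluenceAudit:
    behaviour_changed: bool                 # 1. did behaviour change?
    aware_of_change: bool                   # 2. do they know it changed?
    understands_why: bool                   # 3. do they know *why*?
    endorses_after_fact: bool               # 4. endorsed after the fact?
    endorses_despite_counterevidence: bool  # 5. endorsed given counter-evidence?

    def flags_manipulation(self) -> bool:
        """Influence that fails any rung of the reflective ladder
        is flagged for closer scrutiny."""
        if not self.behaviour_changed:
            return False  # no influence occurred, nothing to audit
        return not (
            self.aware_of_change
            and self.understands_why
            and self.endorses_after_fact
            and self.endorses_despite_counterevidence
        )

# Example: behaviour changed and the person noticed, but can't say why.
print(InfluenceAudit(True, True, False, True, True).flags_manipulation())  # True
```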
Another week in the world of AI, another week in the world of AI ethics… so here’s a pic with @iamtrask & @saffronhuang (taken last summer) that I use to keep the spirits up! ✨
In a nutshell 🌰: the basic structure of society is a composite of socio-technical systems. AI plays an increasingly important role in these systems. Therefore, principles of distributive justice, including fair equality of opportunity and the difference principle, apply to AI.
Not a demotion... but in fact a promotion: say goodbye to 'senior' and hello to 'staff' research scientist Ias 🙌
My brother claims it's like the mafia and I am now a 'made man'. Please don't mess with AI ethics! 😂
It's becoming a challenge to keep up with the outstanding research that @hannahrosekirk – and also, relatedly, @sorenmind – are producing these days!
People keep asking me what comes after the Advanced AI Assistants paper: The answer is always "reading" 📚📚📚
Today we're launching PRISM, a new resource to diversify the voices contributing to alignment. We asked 1500 people around the world for their stated preferences over LLM behaviours, then we observed their contextual preferences in 8000 convos with 21 LLMs
If you're curious about:
– AI agents 🤖
– The values they embed ⚖️
– Human relationships with AI 👫
– The choices in front of us now ✊
Then check out our new podcast with @justinhendrix, @ShannonVallor & @techpolicypress:
Hi everyone! You can now read our new paper on the ethical and social risks of large language models on @arxiv. It's the product of more than a year's research into these questions led by @weidingerlaura, and builds upon fantastic work by others in the field 🌟
🚨PAPER'S OUT! Very excited that today we’re releasing our taxonomy of "Ethical and Social Risks associated with Large Language Models". It's been a year+ in the making and yet is only the beginning for many of us. Blog: (1/n)
Recent AI safety discussion has focused on training runs—at the expense of other factors. The ground zero for responsible AI is deployment decisions: Has the ecosystem been flooded with untested products? Did a business place this technology in the hands of vulnerable users?
Hi everyone, check out this new podcast with @matthewclifford for Thought in Between!
We discuss the origin of ethics at DeepMind, unique aspects of AI, value alignment, distributive justice & language models🤖⚖️✍️
@dhadfieldmenell Also astonishing given that in Myanmar and elsewhere "social media" has led to so much worse. It's hard to understand how someone can push this line of argument in good faith:
"In fact, we should expect AI systems to do so in the absence of anticipatory work to address how best to govern these systems, how to fairly distribute the benefits they generate, and how to fairly share access."
Does anyone know of good research that focuses on concerns about *equality* in the context of human enhancement e.g. what happens if only one group of people, but not others, can access life-extension or other powerful technologies?
What would happen if we took the value question – of how, and which, values to encode in A.I. systems – as seriously as the “hard” technical questions?
Powerful keynote on Thick A.I. Alignment from @AlondraNelson46 at @FAccTConference ‘23 👏
Last year Ted Chiang asked a room full of technologists whether AI “could be used to do anything other than sharpen the bleeding edge of capitalism”
Today, he asks if ChatGPT is anything more than a blurry jpeg of the web 🔥
Damning write-up in the @nytimes of some of the corners that appear to have been cut with the release of Bing. At this juncture, the robust testing and evaluation of new AI tech is vital:
I recently read and enjoyed @mpshanahan et al.'s paper, "Role Play with Large Language Models":
The notion that LLMs are best understood as simulators that produce simulacra is illuminating – as is the deflationary account of intention it supports.
A potentially promising development – I'm especially keen to find out how genuinely democratic Alignment Assemblies can be 🗳️
There is undoubtedly a lot of potential here & also key pitfalls to be avoided:
Announcing Alignment Assemblies (AAs)!
CIP is piloting approaches to involve the public in shaping AI's development for the collective good. 🌍
We're working with partners including @OpenAI and @Anthropic on ways to connect the public directly to power.
More great work from a research team led by our model methodologist and evaluator-in-chief @weidingerlaura 👏
Here's what we learned during the latest round of @GoogleDeepMind model testing🤖📊
📜 New paper unpacking Google DeepMind’s approach to safety evals for advanced AI models, with lessons learned to support the advancement of similar efforts by other actors in this space. Covers foresight, evaluation design, and the wider ecosystem.
This is a wonderful paper by @Dr_Atoosa: "Algorithmic Fairness and Structural Injustice: Insights from Feminist Political Philosophy"
It does an amazing job exploring the limitations of current approaches... and maps out the path ahead ⚖️
A great article on a frontier question for AI alignment: to what extent can agents privilege the user?
"Defining the bounds of responsible and socially acceptable personalization is a non-trivial task beset with normative challenges"
h/t @hannahrosekirk
The 𝘢𝘨𝘦𝘯𝘵𝘪𝘤 𝘱𝘢𝘳𝘢𝘥𝘰𝘹 arises for AI agents because autonomy is 𝘶𝘴𝘦𝘧𝘶𝘭—but also 𝘪𝘯𝘩𝘦𝘳𝘦𝘯𝘵𝘭𝘺 𝘳𝘪𝘴𝘬𝘺.
The ability to pursue partially specified goals, without human direction, means that goals may be misunderstood & consequential errors may go uncorrected.
Delighted to see such a great collection of articles and thematic groupings… this probably wouldn’t have been possible 5 years ago. For the curious, please enjoy! 📚
Because I'm weird I decided to make an Ethics of AI syllabus. This is how I, who knows nothing, would want to teach the course. Included are some course narrative notes so people can see what I was going for if anyone is interested in making use of it....
Important questions and research from the folks at @collect_intel 🖋️
They focus on:
1. Disproportionate value capture by companies that draw upon the digital commons,
2. The potential pollution of the commons by these companies,
3. Mechanisms to address this situation. ⚖️
As GPT-4 launches and AI marches forward, issues of generative AI and the digital commons become very salient. We need the right governance structures to ensure that these systems contribute to, rather than degrade, the internet.
Our working paper: (more below!)
To start with, it would mean that we have to consider forms of AI that are 100% aligned with developer interests, but harmful to users or society, as "fully aligned".
This new paper "Model Evaluations for Extreme Risk" by
@tshevl
and co. is a concise but important read for those looking to build safe AI systems & safe infrastructure:
Esp. pleased to see work on auditibility by
@rajiinio
gaining traction in this field.
With more powerful AI systems comes more responsibility to identify novel capabilities in models. 🔍
Our new research looks at evaluating future 𝘦𝘹𝘵𝘳𝘦𝘮𝘦 risks, which may cause harm through misuse or misalignment.
Here’s a snapshot of the work. 🧵
Interesting research showing that people find it easier to agree on fair processes for managing disagreement than to settle first-order moral disagreement itself. With implications for democratic approaches to AI value alignment ⚖️🤖❤️
For the AI safety camp, we surveyed people about ethical considerations related to AI alignment. We find that people have different object-level views (e.g. on abortion) but can mostly agree on high-level mechanisms (e.g. democracy, debate).
@KLdivergence I moved from humanitarian work, to teaching philosophy, to working in an AI lab: each time I’ve known there was a kind of “change of season” going on inside, and following that feeling has worked quite well. Work is part of life and it’s a privilege to experience different things ✨