Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Some AI research areas and their relevance to existential safety, published by Andrew_Critch on LessWrong.
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
Followed by: What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs), which provides examples of multi-stakeholder/multi-agent interactions leading to extinction events.
Introduction
This post is an overview of a variety of AI research areas in terms of how much I think contributing to and/or learning from those areas might help reduce AI x-risk. By research areas I mean “AI research topics that already have groups of people working on them and writing up their results”, as opposed to research “directions” in which I’d like to see these areas “move”.
I formed these views mostly in the course of writing AI Research Considerations for Human Existential Safety (ARCHES). My hope is that the assessments in this post can be helpful to students and established AI researchers who are thinking about shifting into new research areas specifically with the goal of contributing to existential safety. In these assessments, I find it important to distinguish between the following considerations:
The helpfulness of the area to existential safety, which I think of as a function of what services are likely to be provided as a result of research contributions to the area, and whether those services will be helpful to existential safety, versus
The educational value of the area for thinking about existential safety, which I think of as a function of how much a researcher motivated by existential safety might become more effective through the process of familiarizing with or contributing to that area, usually by focusing on ways the area could be used in service of existential safety.
The neglect of the area at various times (2015, 2020, and a 2030 forecast), which reflects how little technical progress has been made in the area relative to how much I think is needed.
Importantly:
The helpfulness to existential safety scores do not assume that your contributions to this area would be used only for projects with existential safety as their mission. This can negatively impact the helpfulness of contributing to areas that are more likely to be used in ways that harm existential safety.
The educational value scores are not about the value of an existential-safety-motivated researcher teaching about the topic, but rather, learning about the topic.
The neglect scores are not measuring whether there is enough “buzz” around the topic, but rather, whether there has been adequate technical progress in it. Buzz can predict future technical progress, though, by causing people to work on it.
Below is a table of all the areas I considered for this post, along with the entirely subjective “scores” I’ve given them. The rest of this post can be viewed simply as an elaboration/explanation of this table:
| Existing Research Area | Social Application | Helpfulness to Existential Safety | Educational Value | 2015 Neglect | 2020 Neglect | 2030 Neglect |
|---|---|---|---|---|---|---|
| Out of Distribution Robustness | Zero/Single | 1/10 | 4/10 | 5/10 | 3/10 | 1/10 |
| Agent Foundations | Zero/Single | 3/10 | 8/10 | 9/10 | 8/10 | 7/10 |
| Multi-agent RL | Zero/Multi | 2/10 | 6/10 | 5/10 | 4/10 | 0/10 |
| Preference Learning | Single/Single | 1/10 | 4/10 | 5/10 | 1/10 | 0/10 |
| Side-effect Minimization | Single/Single | 4/10 | 4/10 | 6/10 | 5/10 | 4/10 |
| Human-Robot Interaction | Single/Single | 6/10 | 7/10 | 5/10 | 4/10 | 3/10 |
| Interpretability in ML | Single/Single | 8/10 | 6/10 | 8/10 | 6/10 | 2/10 |
| Fairness in ML | Multi/Single | 6/10 | 5/10 | 7/10 | 3/10 | 2/10 |
| Computational Social Choice | Multi/Single | 7/10 | 7/10 | 7/10 | 5/10 | 4/10 |
| Accountability in ML | Multi/Multi | 8/10 | 3/10 | 8/10 | 7/10 | 5/10 |
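For readers who want to re-sort or filter these scores themselves, here is a minimal sketch (not part of the original post) encoding the table above as a Python data structure; the class name, field names, and the sorting example at the end are illustrative assumptions, and the numbers are simply the subjective scores from the table.

```python
# Minimal sketch: the table's subjective scores as a Python data structure,
# so areas can be sorted or filtered by any column. Names are illustrative.
from dataclasses import dataclass

@dataclass
class AreaScores:
    social_application: str   # "stakeholders/agents", e.g. "Zero/Single"
    helpfulness: int          # helpfulness to existential safety, out of 10
    educational_value: int    # out of 10
    neglect_2015: int         # out of 10
    neglect_2020: int
    neglect_2030: int

AREAS = {
    "Out of Distribution Robustness": AreaScores("Zero/Single", 1, 4, 5, 3, 1),
    "Agent Foundations":              AreaScores("Zero/Single", 3, 8, 9, 8, 7),
    "Multi-agent RL":                 AreaScores("Zero/Multi", 2, 6, 5, 4, 0),
    "Preference Learning":            AreaScores("Single/Single", 1, 4, 5, 1, 0),
    "Side-effect Minimization":       AreaScores("Single/Single", 4, 4, 6, 5, 4),
    "Human-Robot Interaction":        AreaScores("Single/Single", 6, 7, 5, 4, 3),
    "Interpretability in ML":         AreaScores("Single/Single", 8, 6, 8, 6, 2),
    "Fairness in ML":                 AreaScores("Multi/Single", 6, 5, 7, 3, 2),
    "Computational Social Choice":    AreaScores("Multi/Single", 7, 7, 7, 5, 4),
    "Accountability in ML":           AreaScores("Multi/Multi", 8, 3, 8, 7, 5),
}

# Example: list areas by helpfulness to existential safety, highest first.
for name, s in sorted(AREAS.items(), key=lambda kv: kv[1].helpfulness, reverse=True):
    print(f"{s.helpfulness:>2}/10  {name}")
```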
The research areas are ordered from least-socially-complex to most-socially-complex. This roughly (though imperfectly) correlates with addressing existential safety problems of increa...