Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs), published by Andrew Critch on the AI Alignment Forum.
With: Thomas Krendl Gilbert, who provided comments, interdisciplinary feedback, and input on the RAAP concept. Thanks also for comments from Ramana Kumar.
Target audience: researchers and institutions who think about existential risk from artificial intelligence, especially AI researchers.
Preceded by: Some AI research areas and their relevance to existential safety, which emphasized the value of thinking about multi-stakeholder/multi-agent social applications, but without concrete extinction scenarios.
This post tells a few different stories in which humanity dies out as a result of AI technology, but where no single source of human or automated agency is the cause. Scenarios with multiple AI-enabled superpowers are often called “multipolar” scenarios in AI futurology jargon, as opposed to “unipolar” scenarios with just one superpower.
                   Unipolar take-offs      Multipolar take-offs
Slow take-offs     (not covered here)      Part 1 of this post
Fast take-offs     (not covered here)      Part 2 of this post
Part 1 covers a batch of stories that play out slowly (“slow take-offs”), and Part 2 covers stories that play out quickly (“fast take-offs”). However, in the end I don’t want you to be super focused on how fast the technology is taking off. Instead, I’d like you to focus on multi-agent processes with a robust tendency to play out irrespective of which agents execute which steps in the process. I’ll call such processes Robust Agent-Agnostic Processes (RAAPs).
A group walking toward a restaurant is a nice example of a RAAP (see the toy sketch after this list), because it exhibits:
Robustness: If you temporarily distract one of the walkers so that they wander off, the rest of the group will keep heading toward the restaurant, and the distracted member will take steps to rejoin the group.
Agent-agnosticism: Who’s at the front or back of the group might vary considerably during the walk. People at the front will tend to take more responsibility for knowing and choosing what path to take, and people at the back will tend to just follow. Thus, the execution of roles (“leader”, “follower”) is somewhat agnostic as to which agents execute them.
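To make these two properties concrete, here is a minimal toy simulation. It is purely illustrative and not from the original post: the one-dimensional “street”, the 0.5 step size, and the rule that the front-most walker leads are all assumptions made for this sketch.

```python
# Purely illustrative sketch of the restaurant-walk RAAP. The 1-D "street",
# the 0.5 step size, and the rule that the front-most walker leads are all
# assumptions made for this example.
import random

RESTAURANT = 10.0  # position of the restaurant on a one-dimensional street

class Walker:
    def __init__(self, name, position):
        self.name = name
        self.position = position

    def step(self, group):
        # Agent-agnosticism: the "leader" role goes to whoever is currently
        # closest to the restaurant, not to any fixed individual.
        leader = max(group, key=lambda w: w.position)
        target = RESTAURANT if self is leader else leader.position
        self.position += 0.5 if target > self.position else -0.5

def simulate(distract_at=None, steps=40, seed=0):
    random.seed(seed)
    group = [Walker(name, random.uniform(0.0, 2.0)) for name in "ABCD"]
    for t in range(steps):
        if t == distract_at:
            # Robustness: temporarily pull one walker away from the group.
            group[0].position -= 3.0
        for walker in group:
            walker.step(group)
    return {walker.name: round(walker.position, 1) for walker in group}

print(simulate())                # the whole group ends up near the restaurant
print(simulate(distract_at=10))  # ...and still does after one walker is distracted
```

In this sketch, which walker fills the “leader” role can change from step to step, but the group-level outcome of converging on the restaurant is robust to briefly perturbing any particular walker.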
Interestingly, if all you want to do is get one person in the group not to go to the restaurant, sometimes it’s actually easier to achieve that by convincing the entire group not to go there than by convincing just that one person. This example could be extended to lots of situations in which agents have settled on a fragile consensus for action, in which it is strategically easier to motivate a new interpretation of the prior consensus than to pressure one agent to deviate from it.
I think a similar fact may be true about some agent-agnostic processes leading to AI x-risk, in that agent-specific interventions (e.g., aligning or shutting down this or that AI system or company) will not be enough to avert the process, and might even be harder than trying to shift the structure of society as a whole. Moreover, I believe this is true in both “slow take-off” and “fast take-off” AI development scenarios.
This is because RAAPs can arise irrespective of the speed of the underlying “host” agents. RAAPs are made more or less likely to arise based on the “structure” of a given interaction. As such, the problem of avoiding the emergence of unsafe RAAPs, or ensuring the emergence of safe ones, is a problem of mechanism design (wiki/Mechanism_design). I recently learned that in sociology, the concept of a field (martin2003field, fligsteinmcadam2012fields) is roughly defined as a social space or arena in which the motivation and behavior of agents are explained through reference to surrounding processes or “structure” rather than freedom or chance. In my parlance, mechanisms cause fields, and fields cause RAAPs.
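As a companion sketch, again purely illustrative, the following toy model shows how the “structure” of an interaction, rather than any particular agent, can drive an aggregate trend, and why an agent-specific intervention barely perturbs it. The mechanism assumed here (the least competitive firm is replaced each round by a near-copy of the most competitive one) and all numbers are invented for this example, not drawn from the post.

```python
# Purely illustrative sketch: a hypothetical selection mechanism in which the
# least competitive firm is replaced each round by a near-copy of the most
# competitive firm. All names and numbers are invented for this example.
import random

def run_market(intervene_round=None, firms=6, rounds=30, seed=1):
    random.seed(seed)
    # Each firm has a "competitiveness" level between 0 and 1.
    level = [random.uniform(0.4, 0.6) for _ in range(firms)]
    for r in range(rounds):
        if r == intervene_round:
            # Agent-specific intervention: "align" the current front-runner
            # by resetting it to a safe, low level of competitiveness.
            leader = max(range(firms), key=lambda i: level[i])
            level[leader] = 0.1
        # The mechanism (the "structure" of the interaction): the laggard
        # exits and its niche is filled by a near-copy of the front-runner.
        worst = min(range(firms), key=lambda i: level[i])
        best = max(range(firms), key=lambda i: level[i])
        level[worst] = min(1.0, level[best] + random.uniform(0.0, 0.05))
    return round(sum(level) / firms, 2)

print(run_market())                    # mean competitiveness ratchets toward 1.0
print(run_market(intervene_round=10))  # ...and still does after "aligning" one firm
```

The ratchet toward maximal competitiveness is a property of the mechanism, not of any particular firm: changing the rule itself would alter the outcome far more than intervening on any single participant, which is the sense in which mechanisms cause fields, and fields cause RAAPs.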