Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Plausible cases for HRAD work, and locating the crux in the realism about rationality debate, published by Issa Rice on the AI Alignment Forum.
This post is my attempt to summarize and distill the major public debates about MIRI's highly reliable agent designs (HRAD) work (which includes work on decision theory), including the discussions in Realism about rationality and Daniel Dewey's My current thoughts on MIRI's "highly reliable agent design" work. Part of the difficulty with discussing the value of HRAD work is that it's not even clear what the disagreement is about, so my summary takes the form of multiple possible "worlds" we might be in; each world consists of a positive case for doing HRAD work, along with the potential objections to that case, which results in one or more cruxes.
I will talk about "being in a world" throughout this post. What I mean by this is the following: If we are "in world X", that means that the case for HRAD work outlined in world X is the one that most resonates with MIRI people as their motivation for doing HRAD work; and that when people disagree about the value of HRAD work, this is what the disagreement is about. When I say that "I think we are in this world", I don't mean that I agree with this case for HRAD work; it just means that this is what I think MIRI people think.
In this post, the pro-HRAD stance is something like "HRAD work is the most important kind of technical research in AI alignment; it is the overwhelming priority and we're pretty much screwed if we under-invest in this kind of research" and the anti-HRAD stance is something like "HRAD work seems significantly less promising than other technical AI alignment agendas, such as the approaches to directly align machine learning systems (e.g. iterated amplification)". There is a much weaker pro-HRAD stance, which is something like "HRAD work is interesting and doing more of it adds value, but it's not necessarily the most important kind of technical AI alignment research to be working on"; this post is not about this weaker stance.
Clarifying some terms
Before describing the various worlds, I want to present some distinctions that have come up in discussions about HRAD, which will be relevant when distinguishing between the worlds.
Levels of abstraction vs levels of indirection
The idea of levels of abstraction was introduced in the context of debate about HRAD work by Rohin Shah, and is described in this comment (start from "When groups of humans try to build complicated stuff"). For more background, see these articles on Wikipedia.
Later on, in this comment Rohin gave a somewhat different "levels" idea, which I've decided to call "levels of indirection". The idea is that there might not be a hierarchy of abstraction, but there's still multiple intermediate layers between the theory you have and the end-result you want. The relevant "levels of indirection" is the sequence HRAD → machine learning → AGI. Even though levels of indirection are different from levels of abstraction, the idea is that the same principle applies, where the more levels there are, the harder it becomes for a theory to apply to the final level.
Precise vs imprecise theory
A precise theory is one which can scale to 2+ levels of abstraction/indirection.
An imprecise theory is one which can scale to at most 1 level of abstraction/indirection.
More intuitively, a precise theory is more mathy, rigorous, and exact like pure math and physics, and an imprecise theory is less mathy, like economics and psychology.
Building agents from the ground up vs understanding the behavior of rational agents and predicting roughly what they will do
This distinction comes from Abram Demski's comment. However, I'm not confident I've understood this distinction in the way that Abram intended it, so what I describe below...
view more