The Nonlinear Library: Alignment Forum Top Posts

Education

Plausible cases for HRAD work, and locating the crux in the realism about rationality debate by Issa Rice

2021-12-04

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Plausible cases for HRAD work, and locating the crux in the realism about rationality debate, published by Issa Rice on the AI Alignment Forum.
This post is my attempt to summarize and distill the major public debates about MIRI's highly reliable agent designs (HRAD) work (which includes work on decision theory), including the discussions in Realism about rationality and Daniel Dewey's My current thoughts on MIRI's "highly reliable agent design" work. Part of the difficulty with discussing the value of HRAD work is that it's not even clear what the disagreement is about, so my summary takes the form of multiple possible "worlds" we might be in; each world consists of a positive case for doing HRAD work, along with the potential objections to that case, which results in one or more cruxes.
I will talk about "being in a world" throughout this post. What I mean by this is the following: If we are "in world X", that means that the case for HRAD work outlined in world X is the one that most resonates with MIRI people as their motivation for doing HRAD work; and that when people disagree about the value of HRAD work, this is what the disagreement is about. When I say that "I think we are in this world", I don't mean that I agree with this case for HRAD work; it just means that this is what I think MIRI people think.
In this post, the pro-HRAD stance is something like "HRAD work is the most important kind of technical research in AI alignment; it is the overwhelming priority and we're pretty much screwed if we under-invest in this kind of research" and the anti-HRAD stance is something like "HRAD work seems significantly less promising than other technical AI alignment agendas, such as the approaches to directly align machine learning systems (e.g. iterated amplification)".
There is a much weaker pro-HRAD stance, which is something like "HRAD work is interesting and doing more of it adds value, but it's not necessarily the most important kind of technical AI alignment research to be working on"; this post is not about this weaker stance.

Clarifying some terms
Before describing the various worlds, I want to present some distinctions that have come up in discussions about HRAD, which will be relevant when distinguishing between the worlds.

Levels of abstraction vs levels of indirection
The idea of levels of abstraction was introduced in the context of debate about HRAD work by Rohin Shah, and is described in this comment (start from "When groups of humans try to build complicated stuff"). For more background, see these articles on Wikipedia.
Later on, in this comment Rohin gave a somewhat different "levels" idea, which I've decided to call "levels of indirection". The idea is that there might not be a hierarchy of abstraction, but there are still multiple intermediate layers between the theory you have and the end result you want. The relevant "levels of indirection" is the sequence HRAD → machine learning → AGI. Even though levels of indirection are different from levels of abstraction, the idea is that the same principle applies: the more levels there are, the harder it becomes for a theory to apply to the final level.

Precise vs imprecise theory
A precise theory is one which can scale to 2+ levels of abstraction/indirection. An imprecise theory is one which can scale to at most 1 level of abstraction/indirection. More intuitively, a precise theory is more mathy, rigorous, and exact, like pure math and physics, and an imprecise theory is less mathy, like economics and psychology.

Building agents from the ground up vs understanding the behavior of rational agents and predicting roughly what they will do
This distinction comes from Abram Demski's comment.
However, I'm not confident I've understood this distinction in the way that Abram intended it, so what I describe below...