Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: The strategy-stealing assumption, published by Paul Christiano on the AI Alignment Forum.
Suppose that 1% of the world’s resources are controlled by unaligned AI, and 99% of the world’s resources are controlled by humans. We might hope that at least 99% of the universe’s resources end up being used for stuff-humans-like (in expectation).
Jessica Taylor argued for this conclusion in Strategies for Coalitions in Unit-Sum Games: if the humans divide into 99 groups, each of which acquires influence as effectively as the unaligned AI, then by symmetry each group should end up with as much influence as the AI, i.e. the groups should collectively end up with 99% of the influence.
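The symmetry argument can be sketched numerically. The toy model below (my own illustration, not from the post) treats influence as a unit-sum quantity split in proportion to how effectively each party pursues it; if 99 human groups each play a strategy exactly as effective as the unaligned AI's, each party gets an equal share:

```python
# Toy model of the unit-sum symmetry argument. All numbers are
# illustrative assumptions, not claims from the original post.

def influence_shares(effectiveness):
    """Split one unit of total influence in proportion to effectiveness."""
    total = sum(effectiveness)
    return [e / total for e in effectiveness]

# One unaligned AI plus 99 human groups, all equally effective.
parties = [1.0] * 100  # index 0 = unaligned AI, indices 1..99 = human groups
shares = influence_shares(parties)

ai_share = shares[0]
human_share = sum(shares[1:])
print(f"AI: {ai_share:.2%}, humans: {human_share:.2%}")  # AI: 1.00%, humans: 99.00%
```

The model only restates the symmetry premise; the substantive question, taken up below, is whether humans can in fact match the AI's effectiveness strategy-for-strategy.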
This argument rests on what I’ll call the strategy-stealing assumption: for any strategy an unaligned AI could use to influence the long-run future, there is an analogous strategy that a similarly-sized group of humans can use in order to capture a similar amount of flexible influence over the future. By “flexible” I mean that humans can decide later what to do with that influence — which is important since humans don’t yet know what we want in the long run.
Why might the strategy-stealing assumption be true?
Today there are a bunch of humans, with different preferences and different kinds of influence. Crudely speaking, the long-term outcome seems to be determined by some combination of {which preferences have how much influence?} and {what is the space of realizable outcomes?}.
I expect this to become more true over time — I expect groups of agents with diverse preferences to eventually approach efficient outcomes, since otherwise there are changes that every agent would prefer (though this is not obvious, especially in light of bargaining failures). Then the question is just about which of these efficient outcomes we pick.
I think that our actions don’t affect the space of realizable outcomes, because long-term realizability is mostly determined by facts about distant stars that we can’t yet influence. The obvious exception is that if we colonize space faster, we will have access to more resources. But quantitatively this doesn’t seem like a big consideration, because astronomical events occur over millions of millennia while our decisions only change colonization timelines by decades.
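The quantitative claim is easy to check with back-of-the-envelope arithmetic. Using illustrative numbers of my own choosing (a 30-year shift in colonization timelines against "millions of millennia", i.e. roughly a billion years, of astronomical resource availability), the fraction of resources at stake is tiny:

```python
# Rough arithmetic comparing a decades-scale colonization delay against
# astronomical timescales. Both figures are assumed for illustration.
delay_years = 30                         # assumed: decisions shift timelines by decades
astronomical_years = 1_000_000 * 1_000   # "millions of millennia" ~ 1e9 years

fraction_forgone = delay_years / astronomical_years
print(f"fraction of resources forgone: {fraction_forgone:.1e}")  # 3.0e-08
```

On these assumptions, a decades-scale delay forgoes only about three parts in a hundred million of the eventual resource base, which is why the post treats it as a small consideration.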
So I think our decisions mostly affect long-term outcomes by changing the relative weights of different possible preferences (or by causing extinction).
Today, one of the main ways that preferences have weight is because agents with those preferences control resources and other forms of influence. Strategy-stealing seems most possible for this kind of plan — an aligned AI can exactly copy the strategy of an unaligned AI, except the money goes into the aligned AI’s bank account instead. The same seems true for most kinds of resource gathering.
There are lots of strategies that give influence to other people instead of helping me. For example, I might preferentially collaborate with people who share my values. But I can still steal these strategies, as long as my values are just as common as the values of the person I’m trying to steal from. So a majority can steal strategies from a minority, but not the other way around.
There are plenty of strategies that don’t involve acquiring resources or flexible influence. For example, we could have a parliament with obscure rules in which I can make maneuvers that advantage one set of values or another in a way that can’t be stolen. Strategy-stealing may only be possible at the level of groups — you need to retain the option of setting up a different parliamentary system that doesn’t favor particular values. Even then, it’s unclear whether strategy-stealing is possible.
There isn’t a clean argument for strategy-stealing, but I think it seems plausible enough that it’s meaningful and productive...