Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Seeking Power is Often Convergently Instrumental in MDPs, published by TurnTrout and elriggs on LessWrong.
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
In 2008, Steve Omohundro's foundational paper The Basic AI Drives conjectured that superintelligent goal-directed AIs might be incentivized to gain significant amounts of power in order to better achieve their goals. Omohundro's conjecture is borne out in toy models, and the supporting philosophical arguments are intuitive. In 2019, the conjecture was even debated by well-known AI researchers.
Power-seeking behavior has been heuristically understood as an anticipated risk, but not as a formal phenomenon with a well-understood cause. The goal of this post (and the accompanying paper, Optimal Policies Tend to Seek Power) is to change that.
Motivation
It’s 2008, the ancient wild west of AI alignment. A few people have started thinking about questions like “if we gave an AI a utility function over world states, and it actually maximized that utility... what would it do?”
In particular, you might notice that wildly different utility functions seem to encourage similar strategies.
| Utility function | Resist shutdown? | Gain computational resources? | Prevent modification of utility function? |
|---|---|---|---|
| Paperclip utility | ✔️ | ✔️ | ✔️ |
| Blue webcam pixel utility | ✔️ | ✔️ | ✔️ |
| People-look-happy utility | ✔️ | ✔️ | ✔️ |
These strategies are unrelated to terminal preferences: the above utility functions do not award utility to e.g. resource gain in and of itself. Instead, these strategies are instrumental: they help the agent optimize its terminal utility. In particular, a wide range of utility functions incentivize these instrumental strategies. These strategies seem to be convergently instrumental.
But why?
I’m going to informally explain a formal theory which makes significant progress in answering this question. I don’t want this post to be Optimal Policies Tend to Seek Power with cuter illustrations, so please refer to the paper for the math. You can read the two concurrently.
We can formalize questions like “do ‘most’ utility maximizers resist shutdown?” as “Given some prior beliefs about the agent’s utility function, knowledge of the environment, and the fact that the agent acts optimally, with what probability do we expect it to be optimal to avoid shutdown?”
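To make that concrete, here is a minimal Monte Carlo sketch of my own (a toy illustration, not the paper's setup; the MDP and all names are hypothetical): sample reward functions uniformly at random over the states of a tiny MDP in which staying alive preserves two terminal choices while shutting down preserves one, and count how often avoiding shutdown is optimal.

```python
import numpy as np

# A tiny hypothetical MDP (for illustration only): from "start" the agent
# either shuts down (absorbing state S with reward r_s) or stays alive
# (reward r_alive), after which it picks one of two absorbing states A or B
# (rewards r_a, r_b). Staying alive preserves more options.
GAMMA = 0.99

def avoiding_shutdown_is_optimal(r_alive, r_a, r_b, r_s):
    # Value of an absorbing state with per-step reward r is r / (1 - GAMMA).
    v_avoid = GAMMA * (r_alive + GAMMA * max(r_a, r_b) / (1 - GAMMA))
    v_shutdown = GAMMA * r_s / (1 - GAMMA)
    return v_avoid > v_shutdown

rng = np.random.default_rng(0)
n = 100_000
hits = sum(avoiding_shutdown_is_optimal(*rng.uniform(0, 1, size=4)) for _ in range(n))
print(f"P(optimal to avoid shutdown) ~= {hits / n:.3f}")  # about 2/3 as GAMMA -> 1
```

In this toy, avoiding shutdown is optimal roughly two-thirds of the time as the discount factor approaches 1, because the "alive" branch keeps two candidate terminal states in play while shutdown keeps only one. This is the flavor of result the theory makes precise.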
The table’s convergently instrumental strategies are about maintaining, gaining, and exercising power over the future, in some sense. Therefore, this post will help answer:
What does it mean for an agent to “seek power”?
In what situations should we expect seeking power to be more probable under optimality than not seeking power?
This post won’t tell you when you should seek power for your own goals; this post illustrates a regularity in optimal action across different goals one might pursue.
Formalizing Convergent Instrumental Goals suggests that the vast majority of utility functions incentivize the agent to exert a lot of control over the future, assuming that these utility functions depend on “resources.” This is a big assumption: what are “resources”, and why must the AI’s utility function depend on them? We drop this assumption, requiring only an unstructured reward function over a finite Markov decision process (MDP), and show from first principles how power-seeking can often be optimal.
Formalizing the Environment
My theorems apply to finite MDPs; for the unfamiliar, I’ll illustrate with Pac-Man.
Full observability: You can see everything that’s going on; this information is packaged in the state s. In Pac-Man, the state is the game screen.
Markov transition function: the next state depends only on the choice of action a and the current state s. It doesn’t matter how we got into a situation.
Discounted reward: future rewards get geometrically discounted by a discount factor γ ∈ [0,1); a reward received t steps from now is worth γ^t times as much as the same reward received immediately.
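As a minimal, self-contained sketch of these three ingredients (the four-state transition table and rewards below are hypothetical, purely for illustration), value iteration computes the optimal state values under geometric discounting:

```python
import numpy as np

# A generic finite MDP: fully observable states, deterministic Markov
# transitions T[s, a] = s', state-based reward R[s], and discount gamma.
gamma = 0.9
T = np.array([[1, 2],   # from state 0: action 0 -> state 1, action 1 -> state 2
              [1, 3],   # from state 1: stay in 1, or move to 3
              [2, 2],   # state 2 is absorbing
              [3, 3]])  # state 3 is absorbing
R = np.array([0.0, 1.0, 0.1, 0.5])  # reward depends only on the current state

# Value iteration: V(s) <- R(s) + gamma * max_a V(T[s, a]).
V = np.zeros(len(R))
for _ in range(1000):
    V = R + gamma * V[T].max(axis=1)

policy = V[T].argmax(axis=1)  # optimal action in each state
print("V =", np.round(V, 2), " optimal actions =", policy)
```

Because the transitions are Markov, the update for V(s) needs only the current state, and the discounting guarantees the iteration converges geometrically.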