Background: The Plan, The Plan: 2022 Update. If you haven’t read those, don’t worry, we’re going to go through things from the top this year, and with moderately more detail than before.
1. What's Your Plan For AI Alignment?
Median happy trajectory:
- Sort out our fundamental confusions about agency and abstraction enough to do interpretability that works and generalizes robustly.
- Look through our AI's internal concepts for a good alignment target, then Retarget the Search [1].
- …
- Profit!
We’ll talk about some other (very different) trajectories shortly.
A side-note on how I think about plans: I’m not really optimizing to make the plan happen. Rather, I think about many different “plans” as possible trajectories, and my optimization efforts are aimed at robust bottlenecks - subproblems which are bottlenecks on lots of different trajectories. An example from the linked post:
For instance, if I wanted to build a solid-state amplifier in [...]
The original text contained 3 footnotes which were omitted from this narration.
---
First published: December 29th, 2023
Source: https://www.lesswrong.com/posts/HfqbjwpAEGep9mHhc/the-plan-2023-version
---
Narrated by TYPE III AUDIO.