Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On OpenAI's Preparedness Framework, published by Zvi on December 21, 2023 on LessWrong.
Previously: On RSPs.
Be Prepared
OpenAI introduces their Preparedness Framework for safety in frontier models.
A summary of the biggest takeaways, which I will repeat at the end:
I am very happy the preparedness framework exists at all.
I am very happy it is beta and open to revision.
It's very vague and needs fleshing out in several places.
The framework exceeded expectations, with many great features. I updated positively.
I am happy we can talk price, while noting our prices are often still far apart.
Critical thresholds seem too high; if you get this wrong, all could be lost. The High threshold for autonomy also seems too high.
The framework relies upon honoring its spirit and not gaming the metrics.
There is still a long way to go. But that is to be expected.
There is a lot of key detail that goes beyond that, as well.
Anthropic and OpenAI have now both offered us detailed documents that reflect real and costly commitments, and that reflect real consideration of important issues. Neither is complete or adequate in its current form, but neither claims to be.
I will start with the overview, then go into the details. Both are promising, if treated as foundations to build upon, and if the requirements and alarms are honored in spirit rather than treated as technical boxes to be checked.
The study of frontier AI risks has fallen far short of what is possible and where we need to be. To address this gap and systematize our safety thinking, we are adopting the initial version of our Preparedness Framework. It describes OpenAI's processes to track, evaluate, forecast, and protect against catastrophic risks posed by increasingly powerful models.
Very good to acknowledge up front that past efforts have been inadequate.
I also appreciate this distinction:
Three different tasks, in order, with different solutions:
Make current models well-behaved.
Guard against dangers from new frontier models.
Prepare for the endgame of superintelligent AI systems.
What works best on an earlier problem likely will not work on a later problem. What works on a later problem will sometimes but not always also solve an earlier problem.
I also appreciate that the framework is labeled as a Beta, and that it is named a Preparedness Framework rather than an RSP (Responsible Scaling Policy, the name Anthropic used that many including myself objected to as inaccurate).
Basic Principles
Their approach is, like many things at OpenAI, driven by iteration.
Preparedness should be driven by science and grounded in facts
We are investing in the design and execution of rigorous capability evaluations and forecasting to better detect emerging risks. In particular, we want to move the discussions of risks beyond hypothetical scenarios to concrete measurements and data-driven predictions. We also want to look beyond what's happening today to anticipate what's ahead. This is so critical to our mission that we are bringing our top technical talent to this work.
We bring a builder's mindset to safety
Our company is founded on tightly coupling science and engineering, and the Preparedness Framework brings that same approach to our work on safety. We learn from real-world deployment and use the lessons to mitigate emerging risks. For safety work to keep pace with the innovation ahead, we cannot simply do less; we need to continue learning through iterative deployment.
There are big advantages to this approach. The biggest danger is that it may fail to anticipate what lies ahead in exactly the most dangerous situations, where something discontinuous happens. Another danger is that if the safety requirements are treated as check boxes rather than honored in spirit, it is easy to optimize for technically passing the checks rather than for actual safety.