Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On OpenAI's Preparedness Framework, published by Zvi on December 21, 2023 on LessWrong.
Previously: On RSPs.
Be Prepared
OpenAI introduces their Preparedness Framework for safety in frontier models.
A summary of the biggest takeaways, which I will repeat at the end:
I am very happy the preparedness framework exists at all.
I am very happy it is beta and open to revision.
It's very vague and needs fleshing out in several places.
The framework exceeded expectations, with many great features. I updated positively.
I am happy we can talk price, while noting our prices are often still far apart.
Critical thresholds seem too high; if you get this wrong, all could be lost. The High threshold for autonomy also seems too high.
The framework relies upon honoring its spirit and not gaming the metrics.
There is still a long way to go. But that is to be expected.
There is a lot of key detail that goes beyond that, as well.
Anthropic and OpenAI have now both offered us detailed documents that reflect real and costly commitments, and that reflect real consideration of important issues. Neither is complete or adequate in its current form, but neither claims to be.
I will start with the overview, then go into the details. Both are promising, if treated as foundations to build upon, and if the requirements and alarms are honored in spirit rather than treated as technical boxes to be checked.
The study of frontier AI risks has fallen far short of what is possible and where we need to be. To address this gap and systematize our safety thinking, we are adopting the initial version of our Preparedness Framework. It describes OpenAI's processes to track, evaluate, forecast, and protect against catastrophic risks posed by increasingly powerful models.
Very good to acknowledge up front that past efforts have been inadequate.
I also appreciate this distinction:
Three different tasks, in order, with different solutions:
Make current models well-behaved.
Guard against dangers from new frontier models.
Prepare for the endgame of superintelligent AI systems.
What works best on an earlier problem likely will not work on a later problem. What works on a later problem will sometimes but not always also solve an earlier problem.
I also appreciate that the framework is labeled as a Beta, and that it is named a Preparedness Framework rather than an RSP (Responsible Scaling Policy, the name Anthropic used that many including myself objected to as inaccurate).
Basic Principles
Their approach is, like many things at OpenAI, driven by iteration.
Preparedness should be driven by science and grounded in facts
We are investing in the design and execution of rigorous capability evaluations and forecasting to better detect emerging risks. In particular, we want to move the discussions of risks beyond hypothetical scenarios to concrete measurements and data-driven predictions. We also want to look beyond what's happening today to anticipate what's ahead. This is so critical to our mission that we are bringing our top technical talent to this work.
We bring a builder's mindset to safety
Our company is founded on tightly coupling science and engineering, and the Preparedness Framework brings that same approach to our work on safety. We learn from real-world deployment and use the lessons to mitigate emerging risks. For safety work to keep pace with the innovation ahead, we cannot simply do less; we need to continue learning through iterative deployment.
There are big advantages to this approach. The biggest danger is that it may fail to anticipate what lies ahead in exactly the most dangerous situations, where something discontinuous happens. Another danger is that if the safety requirements are treated as check boxes rather than honored in spirit, it is easy to optimize for technically passing the checks rather than for actual safety.