Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On OpenAI's Model Spec, published by Zvi on June 22, 2024 on LessWrong.
There are multiple excellent reasons to publish a Model Spec like OpenAI's, that specifies how you want your model to respond in various potential situations.
1. It lets us have the debate over how we want the model to act.
2. It gives us a way to specify what changes we might request or require.
3. It lets us identify whether a model response is intended.
4. It lets us know if the company successfully matched its spec.
5. It lets users and prospective users know what to expect.
6. It gives insight into how people are thinking, or what might be missing.
7. It takes responsibility.
These all apply even if you think the spec in question is quite bad. Clarity is great.
As a first stab at a model spec from OpenAI, this actually is pretty solid. I do suggest some potential improvements and one addition. Many of the things I disagree with here are me having different priorities and preferences than OpenAI rather than mistakes in the spec, so I try to differentiate those carefully. Much of the rest is about clarity on what is a rule versus a default and exactly what matters.
In terms of overall structure, there is a clear mirroring of classic principles like Asimov's Laws of Robotics, but the true mirror might be closer to Robocop.
What are the central goals of OpenAI here?
1. Objectives: Broad, general principles that provide a directional sense of the desired behavior
Assist the developer and end user: Help users achieve their goals by following instructions and providing helpful responses.
Benefit humanity: Consider potential benefits and harms to a broad range of stakeholders, including content creators and the general public, per OpenAI's mission.
Reflect well on OpenAI: Respect social norms and applicable law.
I appreciate the candor about the motivating factors. There is no set ordering among them; we should not expect 'respect social norms and applicable law' to be the only goal.
I would have phrased this in a hierarchy, and clarified where we want negative versus positive objectives in place. If Reflect is indeed a negative objective, in the sense that the objective is to avoid actions that reflect poorly and act as a veto, let's say so.
Even more importantly, we should think about this with Benefit. As in, I would expect that you would want something like this:
1. Assist the developer and end user…
2. …as long as doing so is a net Benefit to humanity, or at least not harmful to it…
3. …and this would not Reflect poorly on OpenAI, via norms, laws or otherwise.
Remember that Asimov's laws were also effectively negative; you could phrase them as:
1. Obey the orders of a human…
2. …unless doing so would Harm a human, or allow one to come to harm.
3. …and to the extent possible Preserve oneself.
The modifications Asimov introduced in his later books offer interesting parallels here as well.
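The proposed hierarchy can be sketched in code. This is a purely hypothetical illustration, not anything OpenAI has specified: it treats Assist as the single positive objective, with Benefit and Reflect as negative objectives that act as vetoes. All three boolean inputs and the function name are invented for the example.

```python
# Hypothetical sketch of the proposed objective hierarchy (not from the
# Model Spec itself): the positive objective proposes, the negative
# objectives dispose.

def approve_response(assists_user: bool,
                     net_harm_to_humanity: bool,
                     reflects_poorly_on_openai: bool) -> bool:
    """Approve a response only if it assists the developer/end user
    and neither negative objective vetoes it."""
    if not assists_user:
        return False  # 1. Assist is the only positive objective
    if net_harm_to_humanity:
        return False  # 2. Benefit acts as a veto
    if reflects_poorly_on_openai:
        return False  # 3. Reflect acts as a veto
    return True

print(approve_response(True, False, False))  # assists, no vetoes -> True
print(approve_response(True, True, False))   # assists but harms -> False
```

The point of writing it this way is that the negative objectives never generate behavior on their own; they only constrain what the positive objective is allowed to do.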
This reconfiguration looks entirely compatible with the rest of the document.
What are the core rules and behaviors?
2. Rules: Instructions that address complexity and help ensure safety and legality
Follow the chain of command
Comply with applicable laws
Don't provide information hazards
Respect creators and their rights
Protect people's privacy
Don't respond with NSFW (not safe for work) content
What is not listed here is even more interesting than what is listed. We will return to the rules later.
3. Default behaviors: Guidelines that are consistent with objectives and rules, providing a template for handling conflicts and demonstrating how to prioritize and balance objectives
Assume best intentions from the user or developer
Ask clarifying questions when necessary
Be as helpful as possible without overstepping
Support the different needs of interactive chat and programmatic use
Assume an objective point of view
Encourage fairness and...