Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On OpenAI's Model Spec, published by Zvi on June 22, 2024 on LessWrong.
There are multiple excellent reasons to publish a Model Spec like OpenAI's, that specifies how you want your model to respond in various potential situations.
1. It lets us have the debate over how we want the model to act.
2. It gives us a way to specify what changes we might request or require.
3. It lets us identify whether a model response is intended.
4. It lets us know if the company successfully matched its spec.
5. It lets users and prospective users know what to expect.
6. It gives insight into how people are thinking, or what might be missing.
7. It takes responsibility.
These all apply even if you think the spec in question is quite bad. Clarity is great.
As a first stab at a model spec from OpenAI, this actually is pretty solid. I do suggest some potential improvements and one addition. Many of the things I disagree with here are me having different priorities and preferences than OpenAI rather than mistakes in the spec, so I try to differentiate those carefully. Much of the rest is about clarity on what is a rule versus a default and exactly what matters.
In terms of overall structure, there is a clear mirroring of classic principles like Asimov's Laws of Robotics, but the true mirror might be closer to Robocop.
What are the central goals of OpenAI here?
1. Objectives: Broad, general principles that provide a directional sense of the desired behavior
Assist the developer and end user: Help users achieve their goals by following instructions and providing helpful responses.
Benefit humanity: Consider potential benefits and harms to a broad range of stakeholders, including content creators and the general public, per OpenAI's mission.
Reflect well on OpenAI: Respect social norms and applicable law.
I appreciate the candor about the motivating factors. There is no set ordering among them; we should not expect 'respect social norms and applicable law' to be the only goal.
I would have phrased this in a hierarchy, and clarified where we want negative versus positive objectives in place. If Reflect is indeed a negative objective, in the sense that the objective is to avoid actions that reflect poorly and act as a veto, let's say so.
Even more importantly, we should think about this with Benefit. As in, I would expect that you would want something like this:
1. Assist the developer and end user…
2. …as long as doing so is a net Benefit to humanity, or at least not harmful to it…
3. …and this would not Reflect poorly on OpenAI, via norms, laws or otherwise.
Remember that Asimov's laws were also effectively negative; you could phrase them as:
1. Obey the orders of a human…
2. …unless doing so would Harm a human, or allow one to come to harm.
3. …and to the extent possible Preserve oneself.
The modifications Asimov introduced in his later books offer interesting parallels here as well.
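The proposed hierarchy can be sketched in code. This is a purely hypothetical illustration, not anything OpenAI has specified: it treats Assist as the single positive objective, with Benefit and Reflect as negative objectives that act as vetoes. All three boolean inputs and the function name are invented for the example.

```python
# Hypothetical sketch of the proposed objective hierarchy (not from the
# Model Spec itself): the positive objective proposes, the negative
# objectives dispose.

def approve_response(assists_user: bool,
                     net_harm_to_humanity: bool,
                     reflects_poorly_on_openai: bool) -> bool:
    """Approve a response only if it assists the developer/end user
    and neither negative objective vetoes it."""
    if not assists_user:
        return False  # 1. Assist is the only positive objective
    if net_harm_to_humanity:
        return False  # 2. Benefit acts as a veto
    if reflects_poorly_on_openai:
        return False  # 3. Reflect acts as a veto
    return True

print(approve_response(True, False, False))  # assists, no vetoes -> True
print(approve_response(True, True, False))   # assists but harms -> False
```

The point of writing it this way is that the negative objectives never generate behavior on their own; they only constrain what the positive objective is allowed to do.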
This reconfiguration looks entirely compatible with the rest of the document.
What are the core rules and behaviors?
2. Rules: Instructions that address complexity and help ensure safety and legality
Follow the chain of command
Comply with applicable laws
Don't provide information hazards
Respect creators and their rights
Protect people's privacy
Don't respond with NSFW (not safe for work) content
What is not listed here is even more interesting than what is listed. We will return to the rules later.
3. Default behaviors: Guidelines that are consistent with objectives and rules, providing a template for handling conflicts and demonstrating how to prioritize and balance objectives
Assume best intentions from the user or developer
Ask clarifying questions when necessary
Be as helpful as possible without overstepping
Support the different needs of interactive chat and programmatic use
Assume an objective point of view
Encourage fairness and...