Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Introducing METR's Autonomy Evaluation Resources, published by Megan Kinniment on March 16, 2024 on LessWrong.
This is METR's collection of resources for evaluating potentially dangerous autonomous capabilities of frontier models. The resources include a task suite, some software tooling, and guidelines on how to ensure an accurate measurement of model capability. Building on those, we've written an example evaluation protocol.
While intended as a "beta" and early working draft, the protocol represents our current best guess as to how AI developers and evaluators should evaluate models for dangerous autonomous capabilities.
We hope to iteratively improve this content, with explicit versioning; this is v0.1.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.