Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Hiring engineers and researchers to help align GPT-3, published by paulfchristiano on the LessWrong.
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
My team at OpenAI, which works on aligning GPT-3, is hiring ML engineers and researchers. Apply here for the ML engineer role and here for the ML researcher role.
GPT-3 is similar enough to "prosaic" AGI that we can work on key alignment problems without relying on conjecture or speculative analogies. And because GPT-3 is already being deployed in the OpenAI API, its misalignment matters to OpenAI’s bottom line — it would be much better if we had an API that was trying to help the user instead of trying to predict the next word of text from the internet.
I think this puts our team in a great place to have an impact:
If our research succeeds I think it will directly reduce existential risk from AI. This is not meant to be a warm-up problem, I think it’s the real thing.
We are working with state of the art systems that could pose an existential risk if scaled up, and our team’s success actually matters to the people deploying those systems.
We are working on the whole pipeline from “interesting idea” to “production-ready system,” building critical skills and getting empirical feedback on whether our ideas actually work.
We have the real-world problems to motivate alignment research, the financial support to hire more people, and a research vision to execute on. We are bottlenecked by excellent researchers and engineers who are excited to work on alignment.
What the team does
In the past Reflection focused on fine-tuning GPT-3 using a reward function learned from human feedback. Our most recent results are here, and had the unusual virtue of simultaneously being exciting enough to ML researchers to be accepted at NeurIPS while being described by Eliezer as “directly, straight-up relevant to real alignment problems.”
We’re currently working on three things:
[20%] Applying basic alignment approaches to the API, aiming to close the gap between theory and practice.
[60%] Extending existing approaches to tasks that are too hard for humans to evaluate; in particular, we are training models that summarize more text than human trainers have time to read. Our approach is to use weaker ML systems operating over shorter contexts to help oversee stronger ones over longer contexts. This is conceptually straightforward but still poses significant engineering and ML challenges.
[20%] Conceptual research on domains that no one knows how to oversee and empirical work on debates between humans (see our 2019 writeup). I think the biggest open problem is figuring out how and if human overseers can leverage “knowledge” the model acquired during training (see an example here).
If successful, ideas will eventually move up this list, from the conceptual stage to ML prototypes to real deployments. We’re viewing this as practice for integrating alignment into transformative AI deployed by OpenAI or another organization.
What you’d do
Most people on the team do a subset of these core tasks:
Design+build+maintain code for experimenting with novel training strategies for large language models. This infrastructure needs to support a diversity of experimental changes that are hard to anticipate in advance, work as a solid base to build on for 6-12 months, and handle the complexity of working with large language models. Most of our code is maintained by 1-3 people and consumed by 2-4 people (all on the team).
Oversee ML training. Evaluate how well models are learning, figure out why they are learning badly, and identify+prioritize+implement changes to make them learn better. Tune hyperparameters and manage computing resources. Process datasets for machine consumption; understand datasets and how they affect the model...
view more