Download - Thoughts on the Alignment Implications of Scaling Language Models by leogao

Discover

Podcast Features
Your all-in-one podcasting solution.

Blog to Podcast
Turn your blog into an engaging podcast.
Livestream
High-performing audio live, without limits.

Podcast Studio
Easy-to-use audio recorder app.
Podbean AI
AI-Enhanced Audio Quality and Content Generation.

Podcast App
The best podcast player & podcast app.

Ads Marketplace
Join Ads Marketplace to earn money
through sponsorship on your podcast.

PodAds
Manage your ads with dynamic ad insertion capability.
Apple Podcasts Subscriptions Integration
Effortlessly publish and manage exclusive episodes for your
Apple Podcasts subscribers directly from Podbean.
Live Streaming
Receive livestream rewards from your audience and earn
recurring income from your Fan Club membership.

All Arts Business Comedy Education
Fiction Government Health & Fitness History Kids & Family
Leisure Music News Religion & Spirituality Science
Society & Culture Sports Technology True Crime TV & Film
Live

How to Start a Podcast
How to Start a Live Podcast
How to Monetize a podcast
How to Promote Your Podcast
How to Use Group Recording

Log in
Start your podcast for free

Podcasting
Monetization
Advertisers
Enterprise
Pricing
Discover

The Nonlinear Library: Alignment Forum Top Posts

Education

Thoughts on the Alignment Implications of Scaling Language Models by leogao

2021-12-04

Download Right click and do "save link as"

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Thoughts on the Alignment Implications of Scaling Language Models, published by leogao on the AI Alignment Forum.
[Epistemic status: slightly rambly, mostly personal intuition and opinion that will probably be experimentally proven wrong within a year considering how fast stuff moves in this field]
This post is also available on my personal blog.
Thanks to Gwern Branwen, Steven Byrnes, Dan Hendrycks, Connor Leahy, Adam Shimi, Kyle and Laria for the insightful discussions and feedback.
Background
By now, most of you have probably heard about GPT-3 and what it does. There’s been a bunch of different opinions on what it means for alignment, and this post is yet another opinion from a slightly different perspective.
Some background: I'm a part of EleutherAI, a decentralized research collective (read: glorified discord server - come join us on Discord for ML, alignment, and dank memes). We're best known for our ongoing effort to create a GPT-3-like large language model, and so we have a lot of experience working with transformer models and looking at scaling laws, but we also take alignment very seriously and spend a lot of time thinking about it (see here for an explanation of why we believe releasing a large language model is good for safety). The inspiration for writing this document came out of the realization that there's a lot of tacit knowledge and intuitions about scaling and LMs that's being siloed in our minds that other alignment people might not know about, and so we should try to get that out there. (That being said, the contents of this post are of course only my personal intuitions at this particular moment in time and are definitely not representative of the views of all EleutherAI members.) I also want to lay out some potential topics for future research that might be fruitful.
By the way, I did consider that the scaling laws implications might be an infohazard, but I think that ship sailed the moment the GPT-3 paper went live, and since we’ve already been in a race for parameters for some time (see: Megatron-LM, Turing-NLG, Switch Transformer, PanGu-α/盘古α, HyperCLOVA, Wudao/悟道 2.0, among others), I don’t really think this post is causing any non-negligible amount of desire for scaling.
Why scaling LMs might lead to Transformative AI
Why natural language as a medium
First, we need to look at why a perfect LM could in theory be Transformative AI. Language is an extremely good medium for representing complex, abstract concepts compactly and with little noise. Natural language seems like a very efficient medium for this; images, for example, are much less compact and don’t have as strong an intrinsic bias towards the types of abstractions we tend to draw in the world. This is not to say that we shouldn’t include images at all, though, just that natural language should be the focus.
Since text is so flexible and good at being entangled with all sorts of things in the world, to be able to model text perfectly, it seems that you'd have to model all the processes in the world that are causally responsible for the text, to the “resolution” necessary for the model to be totally indistinguishable from the distribution of real text. For more intuition along this line, the excellent post Methods of prompt programming explores, among other ideas closely related to the ideas in this post, a bunch of ways that reality is entangled with the textual universe:
A novel may attempt to represent psychological states with arbitrarily fidelity, and scientific publications describe models of reality on all levels of abstraction. [...] A system which predicts the dynamics of language to arbitrary accuracy does require a theory of mind(s) and a theory of the worlds in which the minds are embedded. The dynamics of language do not float free from cultural, psychological, or physical...