Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Thoughts on the Alignment Implications of Scaling Language Models, published by leogao on the AI Alignment Forum.
[Epistemic status: slightly rambly, mostly personal intuition and opinion that will probably be experimentally proven wrong within a year considering how fast stuff moves in this field]
This post is also available on my personal blog.
Thanks to Gwern Branwen, Steven Byrnes, Dan Hendrycks, Connor Leahy, Adam Shimi, Kyle and Laria for the insightful discussions and feedback.
Background
By now, most of you have probably heard about GPT-3 and what it does. There have been a bunch of different opinions on what it means for alignment, and this post is yet another opinion from a slightly different perspective.
Some background: I'm a part of EleutherAI, a decentralized research collective (read: glorified Discord server - come join us on Discord for ML, alignment, and dank memes). We're best known for our ongoing effort to create a GPT-3-like large language model, and so we have a lot of experience working with transformer models and looking at scaling laws, but we also take alignment very seriously and spend a lot of time thinking about it (see here for an explanation of why we believe releasing a large language model is good for safety). The inspiration for writing this document came out of the realization that there are a lot of tacit knowledge and intuitions about scaling and LMs siloed in our minds that other alignment people might not know about, and so we should try to get them out there. (That being said, the contents of this post are of course only my personal intuitions at this particular moment in time and are definitely not representative of the views of all EleutherAI members.) I also want to lay out some potential topics for future research that might be fruitful.
By the way, I did consider that the scaling laws implications might be an infohazard, but I think that ship sailed the moment the GPT-3 paper went live, and since we’ve already been in a race for parameters for some time (see: Megatron-LM, Turing-NLG, Switch Transformer, PanGu-α/盘古α, HyperCLOVA, Wudao/悟道 2.0, among others), I don’t really think this post is causing any non-negligible amount of desire for scaling.
Why scaling LMs might lead to Transformative AI
Why natural language as a medium
First, we need to look at why a perfect LM could in theory be Transformative AI. Natural language is an extremely good medium for representing complex, abstract concepts compactly and with little noise; images, for example, are much less compact and don’t have as strong an intrinsic bias towards the types of abstractions we tend to draw in the world. This is not to say that we shouldn’t include images at all, though, just that natural language should be the focus.
Since text is so flexible and good at being entangled with all sorts of things in the world, to be able to model text perfectly, it seems that you'd have to model all the processes in the world that are causally responsible for the text, to the “resolution” necessary for the model to be totally indistinguishable from the distribution of real text. For more intuition along this line, the excellent post Methods of prompt programming explores, among other ideas closely related to the ideas in this post, a bunch of ways that reality is entangled with the textual universe:
A novel may attempt to represent psychological states with arbitrary fidelity, and scientific publications describe models of reality on all levels of abstraction. [...] A system which predicts the dynamics of language to arbitrary accuracy does require a theory of mind(s) and a theory of the worlds in which the minds are embedded. The dynamics of language do not float free from cultural, psychological, or physical...