Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: AI Alignment 2018-19 Review, published by rohinmshah on LessWrong.
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
Preamble
What this post is
This is a review post of public work in AI alignment over 2019, with some inclusions from 2018. It has this preamble (~700 words), a short version / summary (~1.6k words), and a long version (~8.3k words). It is available as a Google Doc here.
There are many areas of work that are relevant to AI alignment that I have barely touched on, such as interpretability, uncertainty estimation, adversarial examples, and assured autonomy, primarily because I have not been following these fields and wouldn’t be able to write a good summary of what has happened in them. I have also mostly focused on articles that provide some conceptual insight, and excluded or briefly linked to papers that primarily make quantitative improvements on important metrics. While such papers are obviously important (ultimately, our techniques need to work well), there isn’t much to say about them in a yearly review other than that the quantitative metric was improved.
Despite these exclusions, there was still a ton of work to select from, perhaps ~500 articles, of which over 300 are linked in this post. Many interesting articles that I really enjoyed get only a sentence of description, which ignores many of the points they make. Most have been summarized in the Alignment Newsletter, so if you’d like to learn more about any particular link, but don’t want to read the entire thing, just search for its title in the database.
What you should know about the structure of this post
I am not speaking for myself; by default I am trying to explain what has been said, in a way that the authors of the articles would agree with. Any extra opinion that I add will be in italics.
As a post, this is meant to be read sequentially, but the underlying structure is a graph (nodes are posts, edges connect posts that are very related). I arranged it in a sequence that highlights the most salient-to-me connections. This means that the order in which I present subtopics is very much not a reflection of what I think is most important in AI safety: in my presentation order, I focused on edges (connections) rather than nodes (subtopics).
Other minor details:
Any links from earlier than 2018 will have their year of publication right after the link (except for articles that were reposted as part of Alignment Forum sequences).
I typically link to blog posts; in several cases there is also an associated paper that I have not linked.
How to read this post
I have put the most effort into making the prose of the long version read smoothly. The hierarchical organization is comparatively less coherent; this is partly because I optimized the prose, and partly because AI safety work is hard to cluster. As a result, for those willing to put in the effort, I’d recommend reading the long version directly, without paying too much attention to the hierarchy. If you have less time, or are less interested in the minutiae of AI alignment research, the short version is for you.
Since I don’t name authors or organizations, you may want to take this as your opportunity to form beliefs about which arguments in AI alignment are important based on the ideas themselves (as opposed to trust in the authors of the posts).
People who keep up with AI alignment work might want to know which posts I’m referencing as they read, which is a bit hard since I don’t name the posts in the text. If this describes you, you should be reading this post on the Alignment Forum, where you can hover over most links to see what they link to. Alternatively, the references section in the Google Doc lists all links in the order that they appear in the post, ...