Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: The academic contribution to AI safety seems large, published by technicalities on the Effective Altruism Forum.
Summary: I model the contribution to AI safety by academics working in adjacent areas. I argue that this contribution is at least on the same order as the EA bet, and seek a lower bound. Guesstimates here and here. I focus on present levels of academic work, but the trend is even more important.
Confidence: high that the contribution is notable, low in the particular estimates. Lots of Fermi estimates.
A big reason for the EA focus on AI safety is its neglectedness:
...less than $50 million per year is devoted to the field of AI safety or work specifically targeting global catastrophic biorisks.
80,000 Hours (2019)
...we estimate fewer than 100 people in the world are working on how to make AI safe.
80,000 Hours (2017)
Grand total: $9.09m. [Footnote: this] doesn’t include anyone generally working on verification/control, auditing, transparency, etc. for other reasons.
Seb Farquhar (2018)
...what we are doing is less than a pittance. You go to some random city... Along the highway you see all these huge buildings for companies... Maybe they are designing a new publicity campaign for a razor blade. You drive past hundreds of these... Any one of those has more resources than the total that humanity is spending on [AI safety].
Nick Bostrom (2016)
Numbers like these helped convince me that AI safety is the best thing to work on. I now think that these are underestimates, because of non-EA lines of research which weren't counted.
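To make the shape of the argument concrete: the comparison the post is running is a Fermi estimate of "de facto academic safety work" against the EA totals quoted above (the full Guesstimate models are linked in the original post, not reproduced here). Below is a minimal Monte Carlo sketch of that kind of calculation in Python. The EA-side range uses the $9m and $50m figures quoted above; the academic spending range and the fraction of it that counts as de facto safety work are illustrative placeholder assumptions of mine, not numbers from the post.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# EA safety spending per year: log-uniform between the $9.09m (Farquhar 2018)
# and $50m (80,000 Hours 2019) figures quoted above.
ea_spend = np.exp(rng.uniform(np.log(9e6), np.log(50e6), N))

# Annual spending on adjacent academic subfields (robustness, transparency,
# verification, etc.) -- an illustrative placeholder range, not from the post.
academic_spend = np.exp(rng.uniform(np.log(100e6), np.log(1e9), N))

# Fraction of that adjacent work which counts as de facto safety work --
# also an illustrative placeholder.
relevance = rng.uniform(0.05, 0.3, N)

ratio = academic_spend * relevance / ea_spend
print(f"median ratio (academic / EA): {np.median(ratio):.1f}")
print(f"90% interval: {np.percentile(ratio, 5):.2f} to {np.percentile(ratio, 95):.1f}")
```

With made-up ranges like these the output means nothing in itself; the point of the sketch is the structure of the estimate, which is what the rest of the post argues over.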
Use "EA safety" for the whole umbrella of work done at organisations like FHI, MIRI, DeepMind and OpenAI’s safety teams, and by independent researchers. A lot of this - maybe a third - is conducted at universities; to avoid double counting I count it as EA and not academia.
The argument:
EA safety is small, even relative to a single academic subfield.
There is overlap between capabilities and short-term safety work.
There is overlap between short-term safety work and long-term safety work.
So AI safety is less neglected than the opening quotes imply.
Also, on present trends, there’s a good chance that academia will do more safety over time, eventually dwarfing the contribution of EA.
What’s ‘safety’?
EA safety is best read as being about "AGI alignment": work on ensuring that the actions of an extremely advanced system stay sufficiently close to human-friendly goals.
EA focusses on AGI because weaker AI systems aren't thought to be directly tied to existential risk. However, Critch and Krueger note that "prepotent" AI - unstoppably advanced, but not necessarily human-level - could still pose x-risks. The potential for this latter type is key to the argument that short-term work is relevant to us, since the scaling curves for some current systems seem to be holding up, and such systems might therefore reach prepotence.
“ML safety” could mean just making existing systems safe, or using existing systems as a proxy for aligning an AGI. The latter is sometimes called “mid-term safety”, and this is the key class of work for my purposes.
In the following, "AI safety" means anything that helps us solve the AGI control problem.
De facto AI safety work
The line between safety work and capabilities work is sometimes blurred. A classic example is ‘robustness’: it is both a safety problem and a capabilities problem if your system can be reliably broken by noise. Transparency (increasing direct human access to the goals and properties of learned systems) is the most obvious case of work relevant to capabilities, short-term safety, and AGI alignment. As well as being a huge academic fad, it's a core mechanism in 6 out of the 11 live AGI alignment proposals recently summarised by Hubinger.
More controversial is whether there’s significant overlap between short-term safety and AGI alignment.