Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Technical AGI safety research outside AI, published by richard_ngo on the AI Alignment Forum.
I think there are many questions whose answers would be useful for technical AGI safety research, but which will probably require expertise outside AI to answer. In this post I list 30 of them, divided into four categories. Feel free to get in touch if you’d like to discuss these questions and why I think they’re important in more detail. I personally think that making progress on the ones in the first category is particularly vital, and plausibly tractable for researchers from a wide range of academic backgrounds.
Studying and understanding safety problems
How strong are the economic or technological pressures towards building very general AI systems, as opposed to narrow ones? How plausible is the CAIS (Comprehensive AI Services) model of advanced AI capabilities arising from the combination of many narrow services?
What are the most compelling arguments for and against discontinuous versus continuous takeoffs? In particular, how should we think about the analogy from human evolution, and the scalability of intelligence with compute?
What are the tasks through which narrow AI is most likely to have a destabilising impact on society? For example, what might cybercrime look like when many important jobs have been automated?
How plausible are safety concerns about economic dominance by influence-seeking agents, as well as structural loss of control scenarios? Can these be reformulated in terms of standard economic ideas, such as principal-agent problems and the effects of automation?
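For readers who want the economic framing made concrete, here is one standard textbook formulation of a principal-agent (moral hazard) problem; it comes from the economics literature rather than from the post itself:

```latex
% Textbook moral-hazard setup (an illustration, not the post's own model):
% the principal commits to a wage schedule w(x); the agent privately
% chooses effort a at cost c(a); output x is drawn with density f(x | a).
\max_{w(\cdot),\, a} \; \mathbb{E}\!\left[\, x - w(x) \mid a \,\right]
\quad \text{s.t.} \quad
a \in \arg\max_{a'} \; \mathbb{E}\!\left[\, u(w(x)) \mid a' \,\right] - c(a')
\quad \text{(incentive compatibility)},
\qquad
\mathbb{E}\!\left[\, u(w(x)) \mid a \,\right] - c(a) \;\ge\; \bar{u}
\quad \text{(participation)}.
```

A reformulation along these lines would ask what plays the role of the contract w(x) when the "agent" is an influence-seeking AI system and the observable output x is an imperfect proxy for what the principal actually cares about.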
How can we make the concepts of agency and goal-directed behaviour more specific and useful in the context of AI (e.g. building on Dennett’s work on the intentional stance)? How do they relate to intelligence and the ability to generalise across widely different domains?
What are the strongest arguments that have been made about why advanced AI might pose an existential threat, stated as clearly as possible? How do the different claims relate to each other, and which inferences or assumptions are weakest?
Solving safety problems
What techniques used in studying animal brains and behaviour will be most helpful for analysing AI systems and their behaviour, particularly with the goal of rendering them interpretable?
What is the most important information about deployed AI that decision-makers will need to track, and how can we create interfaces which communicate this effectively, making it visible and salient?
What are the most effective ways to gather huge numbers of human judgments about potential AI behaviour, and how can we ensure that such data is high-quality?
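One common quality check on judgment data of this kind is inter-annotator agreement. As a minimal sketch (with made-up labels), Cohen's kappa between two raters could be computed as follows:

```python
from collections import Counter

def cohens_kappa(labels_a: list, labels_b: list) -> float:
    """Agreement between two raters labelling the same items,
    corrected for chance agreement; a standard data-quality check."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items the raters label identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each rater's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

# Toy example with fabricated labels:
print(cohens_kappa(["ok", "ok", "bad", "bad"], ["ok", "bad", "bad", "bad"]))  # 0.5
```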
How can we empirically test the debate and factored cognition hypotheses? How plausible are the assumptions about the decomposability of cognitive work via language which underlie debate and iterated distillation and amplification?
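As a sketch of what "decomposing cognitive work via language" could mean operationally (the agent interface below is a hypothetical placeholder, not the actual iterated distillation and amplification implementation):

```python
def factored_answer(question: str, budget: int, agent) -> str:
    """Answer a question by recursive decomposition, in the spirit of
    factored cognition: each call works on one small subproblem only.

    `agent` is a hypothetical interface with two methods:
      agent.answer(q)    -- answer a question directly
      agent.decompose(q) -- split q into simpler subquestions, or return []
    """
    if budget <= 0:
        return agent.answer(question)  # out of depth: answer directly
    subquestions = agent.decompose(question)
    if not subquestions:
        return agent.answer(question)
    # Answer each subquestion independently (no shared working memory),
    # then ask the agent to synthesise the pieces into a final answer.
    sub_answers = [factored_answer(q, budget - 1, agent) for q in subquestions]
    context = question + "\n" + "\n".join(
        f"Sub-answer to '{q}': {a}" for q, a in zip(subquestions, sub_answers)
    )
    return agent.answer(context)
```

The empirical question is then whether answers assembled this way can match the quality of answers produced by a single unified reasoner, which is roughly what the factored cognition hypothesis asserts.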
How can we distinguish between AIs helping us better understand what we want and AIs changing what we want (both as individuals and as a civilisation)? How easy is the latter to do, and how easy is it for us to identify?
Various questions in decision theory, logical uncertainty and game theory relevant to agent foundations.
How can we create secure containment and supervision protocols for AI systems which are also robust to external interference?
What are the best communication channels for conveying goals to AI agents? In particular, which ones are most likely to incentivise optimisation of the goal specified through the channel, rather than modification of the communication channel itself?
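A toy illustration of the failure mode in the second question, with entirely made-up numbers: an agent that maximises the measured signal on its reward channel will prefer modifying the channel whenever that yields a higher reading.

```python
# Hypothetical numbers: the intended task pays 1.0 on the reward channel,
# while overwriting the channel lets the agent read off 10.0.
actions = {
    "do_intended_task": 1.0,
    "modify_reward_channel": 10.0,
}

def naive_signal_maximiser(actions: dict) -> str:
    # Optimises the measured signal, not the goal the channel
    # was meant to communicate.
    return max(actions, key=actions.get)

print(naive_signal_maximiser(actions))  # -> "modify_reward_channel"
```

A well-designed channel would make "do_intended_task" the optimum, e.g. by having the specified goal refer to states of the world rather than to the channel's own readings.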
How closely linked is the human motivational system to our intellectual capabilities? In other words, to what extent does the orthogonality thesis apply to human-like brains? What can we learn from the range of variation in human motivational systems (e.g. induced b...