Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: MATS AI Safety Strategy Curriculum, published by Ryan Kidd on March 8, 2024 on LessWrong.
As part of the MATS Winter 2023-24 Program, scholars were invited to take part in a series of weekly discussion groups on AI safety strategy. Each strategy discussion focused on a specific crux we deemed relevant to prioritizing AI safety interventions and was accompanied by a reading list and suggested discussion questions. The discussion groups were facilitated by several MATS alumni and other AI safety community members and generally ran for 1-1.5 hours.
As assessed by our alumni reviewers, scholars in our Summer 2023 Program were much better at writing concrete plans for their research than they were at explaining their research's theory of change. We think it is generally important for researchers, even those early in their careers, to critically evaluate the impact of their work, in order to:
Choose high-impact research directions and career pathways;
Conduct adequate risk analyses to mitigate unnecessary safety hazards and avoid research with a poor safety-capabilities advancement ratio;
Discover blind spots and biases in their research strategy.
We expect that most improvement in the above areas occurs through repeated practice, ideally with high-quality feedback from a mentor or research peers. However, we also think that engaging with some core literature and discussing it with peers is beneficial. This is our attempt to create a list of core literature on AI safety strategy appropriate for the average MATS scholar, who should have completed the AISF Alignment Course.
We are not confident that the reading lists and discussion questions below are the best possible version of this project, but we thought they were worth publishing anyway. MATS welcomes feedback and suggestions for improvement.
Week 1: How will AGI arise?
What is AGI?
Karnofsky - Forecasting Transformative AI, Part 1: What Kind of AI? (13 min)
Metaculus - When will the first general AI system be devised, tested, and publicly announced? (read Resolution Criteria) (5 min)
How large will models need to be and when will they be that large?
Alexander - Biological Anchors: The Trick that Might or Might Not Work (read Parts I-II) (27 min)
Optional: Davidson - What a compute-centric framework says about AI takeoff speeds (20 min)
Optional: Habryka et al. - AI Timelines (dialogue between Ajeya Cotra, Daniel Kokotajlo, and Ege Erdil) (61 min)
Optional: Halperin, Chow, Mazlish - AGI and the EMH: markets are not expecting aligned or unaligned AI in the next 30 years (31 min)
How far can current architectures scale?
Patel - Will Scaling Work? (16 min)
Epoch - AI Trends (5 min)
Optional: Nostalgebraist - Chinchilla's Wild Implications (13 min)
Optional: Porby - Why I think strong general AI is coming soon (40 min)
What observations might make us update?
Ngo - Clarifying and predicting AGI (5 min)
Optional: Berglund et al. - Taken out of context: On measuring situational awareness in LLMs (33 min)
Optional: Cremer, Whittlestone - Artificial Canaries: Early Warning Signs for Anticipatory and Democratic Governance of AI (34 min)
Suggested discussion questions
If you look at any of the outside view models linked in "Biological Anchors: The Trick that Might or Might Not Work" (e.g., Ajeya Cotra's and Tom Davidson's models), which of their quantitative estimates do you agree or disagree with? Do your disagreements make your timelines longer or shorter?
Do you disagree with the models used to forecast AGI? That is, rather than disagree with their estimates of particular variables, do you disagree with any more fundamental assumptions of the model? How does that change your timelines, if at all?
If you had to make a probabilistic model to forecast AGI, what quantitative variables would you use and what fundamental assumptions would ...