The Nonlinear Library: EA Forum
Education
EA - Center on Long-Term Risk: Annual review and fundraiser 2023 by Center on Long-Term Risk
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Center on Long-Term Risk: Annual review and fundraiser 2023, published by Center on Long-Term Risk on December 13, 2023 on The Effective Altruism Forum.

Jesse Clifton

Crossposted to LessWrong here.

This is a brief overview of the Center on Long-Term Risk (CLR)'s activities in 2023 and our plans for 2024. We are hoping to fundraise $770,000 to fulfill our target budget in 2024.

About us

CLR works on addressing the worst-case risks from the development and deployment of advanced AI systems in order to reduce s-risks. Our research primarily involves thinking about how to reduce conflict and promote cooperation in interactions involving powerful AI systems. In addition to research, we do a range of activities aimed at building a community of people interested in s-risk reduction, and we support efforts that contribute to s-risk reduction via the CLR Fund.

Review of 2023

Research

Our research in 2023 primarily fell into a few buckets:

Commitment races and safe Pareto improvements deconfusion. Many researchers in the area consider commitment races a potentially important driver of conflict involving AI systems, but we have been missing a precise understanding of the mechanisms by which they could lead to conflict. We believe we made significant progress on this over the last year.
This includes progress on understanding the conditions under which an approach to bargaining called "safe Pareto improvements" (SPIs) can prevent catastrophic conflict. Most of this work is non-public, but public documents that came out of this line of work include Open-minded updatelessness, Responses to apparent rationalist confusions about game / decision theory, and a forthcoming paper (see draft) and post on SPIs for expected utility maximizers.

Paths to implementing surrogate goals. Surrogate goals are a special case of SPIs, and we consider them a promising route to reducing the downsides of conflict. We (along with CLR-external researchers Nathaniel Sauerberg and Caspar Oesterheld) thought about how implementing surrogate goals could be made both credible and counterfactual (i.e., not done by AIs by default), e.g., using compute monitoring schemes. CLR researchers, in collaboration with Caspar Oesterheld and Filip Sondej, are also working on a project to "implement" surrogate goals/SPIs in contemporary language models.

Conflict-prone dispositions. We thought about the kinds of dispositions that could exacerbate conflict, and how they might arise in AI systems. The primary motivation for this line of work is that, even if alignment does not fully succeed, we may be able to shape AI systems' dispositions in coarse-grained ways that reduce the risks of worse-than-extinction outcomes. See our post on making AIs less likely to be spiteful.

Evaluations of LLMs. We continued our earlier work on evaluating cooperation-relevant properties in LLMs. Part of this involved cheap exploratory work with GPT-4 and Claude (e.g., looking at behavior in scenarios from the Machiavelli dataset) to see if there were particularly interesting behaviors worth investing more time in. We also worked with external collaborators to develop "Welfare Diplomacy", a variant of the Diplomacy game environment designed to be better for facilitating Cooperative AI research.
We wrote a paper introducing the benchmark and using it to evaluate several LLMs.

Community building

Progress on s-risk community building was slow, due to the departures of our community-building staff and funding uncertainties that prevented us from immediately hiring another Community Manager.

We continued having career calls;
We ran our fourth Summer Research Fellowship, with 10 fellows;
We have now hired a new Community Manager, Winston Oswald-Drummond, who has just started.

Staff & leadership changes

We saw some substantial staff changes this year, with three staff m...
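As a stylized aside, the surrogate-goal mechanism mentioned in the Research section above can be sketched as a toy threat game. All payoff numbers and the `expected_true_loss` helper below are illustrative assumptions for exposition only, not taken from CLR's work:

```python
# Toy illustration (hypothetical numbers) of the surrogate-goal idea:
# a threatener demands a concession; the target either concedes or
# refuses, and refusal triggers the threat being carried out.

# Costs to the target under its TRUE utility function.
CONCEDE = -10            # cost of giving in to the demand
THREAT_EXECUTED = -100   # cost if a threat against the true goal is executed

# With a surrogate goal, the target commits to treat threats against a
# harmless surrogate exactly as it treats threats against its true goal.
# The threatener's incentives are unchanged (same chance of concession),
# so it targets the surrogate instead -- and an executed threat then
# destroys only the surrogate, not the true goal.
SURROGATE_EXECUTED = -1  # true-utility cost when only the surrogate is hit

def expected_true_loss(p_concede: float, executed_cost: float) -> float:
    """Expected loss to the target, given the chance it concedes."""
    return p_concede * CONCEDE + (1 - p_concede) * executed_cost

p = 0.5  # assume the target concedes half the time in either case
without_surrogate = expected_true_loss(p, THREAT_EXECUTED)   # -55.0
with_surrogate = expected_true_loss(p, SURROGATE_EXECUTED)   # -5.5
```

In this stylized setup the threatener's expected payoff is unchanged while the target's is strictly higher, which is the sense in which a surrogate goal is a (safe) Pareto improvement over the default threat game.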