Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Analyzing the moral value of unaligned AIs, published by Matthew Barnett on April 8, 2024 on The Effective Altruism Forum.
A crucial consideration in assessing the risks of advanced AI is the moral value we place on "unaligned" AIs - systems that do not share human preferences - which could emerge if we fail to make enough progress on technical alignment.
In this post I'll consider three potential moral perspectives, and analyze what each of them has to say about the normative value of the so-called "default" unaligned AIs that humans might eventually create:
Standard total utilitarianism combined with longtermism: the view that what matters most is making sure the cosmos is eventually filled with numerous happy beings.
Human species preservationism: the view that what matters most is making sure the human species continues to exist into the future, independently from impartial utilitarian imperatives.
Near-termism or present-person affecting views: the view that what matters most is improving the lives of those who currently exist, or will exist in the near future.
I argue that from the first perspective, unaligned AIs don't seem clearly bad in expectation relative to their alternatives, since total utilitarianism is impartial to whether AIs share human preferences or not. A key consideration here is whether unaligned AIs are less likely to be conscious, or less likely to bring about consciousness, compared to alternative aligned AIs. On this question, I argue that there are considerations both ways, and no clear answers.
Therefore, it tentatively appears that the normative value of alignment work is very uncertain, and plausibly neutral, from a total utilitarian perspective.
However, technical alignment work is much more clearly beneficial from the second and third perspectives. This is because AIs that share human preferences are likely to both preserve the human species and improve the lives of those who currently exist. That said, on the third perspective, pausing or slowing down AI is far less valuable than on the second, since it forces existing humans to forgo benefits from advanced AI, which I argue will likely be very large.
I personally find moral perspectives (1) and (3) most compelling, and by contrast find view (2) to be uncompelling as a moral view. Yet it is only from perspective (2) that significantly delaying advanced AI for alignment reasons seems clearly beneficial, in my opinion. This is a big reason why I'm not very sympathetic to pausing or slowing down AI as a policy proposal.
While these perspectives do not exhaust the scope of potential moral views, I think this analysis can help to sharpen what goals we intend to pursue by promoting particular forms of AI safety work.
Unaligned AIs from a total utilitarian point of view
Let's first consider the normative value of unaligned AIs from the first perspective. From a standard total utilitarian perspective, entities matter morally if they are conscious (under hedonistic utilitarianism) or if they have preferences (under preference utilitarianism). From this perspective, it doesn't actually matter much intrinsically if AIs don't share human preferences, so long as they are moral patients and have their preferences satisfied.
The following is a prima facie argument that utilitarians shouldn't care much about technical AI alignment work. Utilitarianism is typically not seen as partial to human preferences in particular. Therefore, efforts to align AI systems with human preferences - the core aim of technical alignment work - may be considered morally neutral from a utilitarian perspective.
The reasoning here is that changing the preferences of AIs to better align them with the preferences of humans doesn't by itself clearly seem to advance the aims of utilitarianism, in the sense of filling the cosmos with happy beings.