Artificial intelligence (AI) has the potential to greatly improve society, but as with any powerful technology, it comes with heightened risks and responsibilities. Current AI research lacks a systematic discussion of how to manage long-tail risks from AI systems, including speculative long-term risks. While AI may be integral to improving humanity’s long-term potential, there is some concern that building ever more intelligent and powerful AI systems could eventually yield systems more powerful than us; some liken this to playing with fire and speculate that it could create existential risks (x-risks). To add precision and ground these discussions, we review a collection of time-tested concepts from hazard analysis and systems safety, which have been designed to steer large processes in safer directions. We then discuss how AI researchers can realistically have long-term impacts on the safety of AI systems. Finally, we discuss how to robustly shape the processes that will affect the balance between safety and general capabilities.
2022: Dan Hendrycks, Mantas Mazeika
https://arxiv.org/pdf/2206.05862v7.pdf