Joe Carlsmith is a writer, researcher, and philosopher. He works as a senior research analyst at Open Philanthropy, where he focuses on existential risk from advanced artificial intelligence. He also writes independently about various topics in philosophy and futurism, and holds a doctorate in philosophy from the University of Oxford.
You can find links and a transcript at www.hearthisidea.com/episodes/carlsmith
In this episode we talked about a report Joe recently authored, titled ‘Scheming AIs: Will AIs fake alignment during training in order to get power?’. The report “examines whether advanced AIs that perform well in training will be doing so in order to gain power later”; a behaviour Carlsmith calls scheming.
We talk about:
You can get in touch through our website or on Twitter. Consider leaving us an honest review wherever you're listening to this — it's the best free way to support the show. Thanks for listening!
Create your
podcast in
minutes
It is Free