[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al.
This is a linkpost for https://arxiv.org/abs/2401.05566
Support ongoing human narrations of LessWrong's curated posts:
www.patreon.com/LWCurated
Source:
https://www.lesswrong.com/posts/ZAsJv7xijKTfZkMtr/sleeper-agents-training-deceptive-llms-that-persist-through
Narrated for LessWrong by Perrin Walker.
Share feedback on this narration.
[Curated Post] ✓
[125+ Karma Post] ✓