AF - Sycophancy to subterfuge: Investigating reward tampering in large language models by Evan Hubinger
AF - Analysing Adversarial Attacks with Linear Probing by Yoann Poupart
LW - OpenAI #8: The Right to Warn by Zvi
LW - Towards a Less Bullshit Model of Semantics by johnswentworth
EA - The social disincentives of warning about unlikely risks by Lucius Caviola
EA - Launching the Global Health Funding Circle by Joey
LW - (Appetitive, Consummatory) (RL, reflex) by Steven Byrnes
EA - Questionable Narratives of "Situational Awareness" by fergusq
EA - Advice for EA org staff and EA group organisers interacting with political campaigns by Catherine Low
AF - Degeneracies are sticky for SGD by Guillaume Corlouer
EA - Kaya Guides Pilot Results by RachelAbbott
EA - Mr Beast is now officially an EA! by Dave Cortright
AF - Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller by Henry Cai
EA - On the Dwarkesh/Chollet Podcast, and the cruxes of scaling to AGI by JWS
LW - CIV: a story by Richard Ngo
EA - Moral Misdirection (full post) by Richard Y Chappell
LW - MIRI's June 2024 Newsletter by Harlan
LW - Rational Animations' intro to mechanistic interpretability by Writer
LW - Shard Theory - is it true for humans? by Rishika
EA - What "Effective Altruism" Means to Me by Richard Y Chappell