The Pavlov Strategy by sarahconstantin

The Nonlinear Library: LessWrong Top Posts


2021-12-12


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: The Pavlov Strategy, published by sarahconstantin on LessWrong.
Epistemic Status: Common knowledge, just not to me
The Evolution of Trust is a deceptively friendly little interactive game. Near the end, there’s a “sandbox” evolutionary game theory simulator. It’s pretty flexible. You can do quick experiments in it without writing code. I highly recommend playing around.
One of the things that surprised me was a strategy the game calls Simpleton, also known in the literature as Pavlov. In certain conditions, it works pretty well — even better than tit-for-tat or tit-for-tat with forgiveness.
Let’s set the framework first. You have a Prisoner’s dilemma type game.
If both parties cooperate, they each get +2 points.
If one cooperates and the other defects, the defector gets +3 points and the cooperator gets -1 point.
If both defect, both get 0 points.
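As a minimal sketch, the payoffs above can be written as a lookup table. The names PAYOFFS and score_round are illustrative, not taken from the game itself.

```python
# Payoffs from the text, keyed by (my move, their move).
# Each value is (my points, their points). "C" = cooperate, "D" = defect.
PAYOFFS = {
    ("C", "C"): (2, 2),    # both cooperate
    ("C", "D"): (-1, 3),   # I cooperate, they defect
    ("D", "C"): (3, -1),   # I defect, they cooperate
    ("D", "D"): (0, 0),    # both defect
}


def score_round(mine, theirs):
    """Return (my points, their points) for a single round."""
    return PAYOFFS[(mine, theirs)]
```

For example, score_round("D", "C") gives the defector +3 and the cooperator -1.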
This game is iterated: you’re randomly assigned to a partner and you play many rounds. Longer matches (more rounds per partner) reward more cooperative strategies; shorter matches reward more defection.
It’s also evolutionary — you have a proportion of bots each playing their strategies, and after each round, the bots with the most points replicate and the bots with the least points die out. Successful strategies will tend to reproduce while unsuccessful ones die out. In other words, this is the Darwin Game.
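One way to picture the selection step is the rough sketch below. It assumes a fixed number k of bots is replaced each generation and that ties are broken by list order; both are assumptions for illustration, and the simulator's actual rules may differ.

```python
def next_generation(population, scores, k=1):
    """Drop the k lowest scorers and clone the k highest scorers.

    population: list of strategy names, one entry per bot.
    scores: list of point totals, parallel to population.
    k: how many bots are replaced (a hypothetical parameter).
    """
    # Bot indices sorted from lowest score to highest.
    ranked = sorted(range(len(population)), key=lambda i: scores[i])
    survivors = [population[i] for i in ranked[k:]]   # everyone but the bottom k
    clones = [population[i] for i in ranked[-k:]]     # copies of the top k
    return survivors + clones
```

The population size stays constant, so a strategy's share grows only when its bots outscore the others.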
Finally, it’s stochastic — there’s a small probability that any bot will make a mistake and cooperate or defect at random.
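A small sketch of that noise, assuming a mistake means the intended move is replaced by a uniformly random one. The function name and the 5% default are placeholders, not values from the game.

```python
import random


def noisy(move, error_rate=0.05, rng=random):
    """Return the intended move, or a random move with probability error_rate.

    error_rate is a hypothetical default; rng can be the random module
    or a seeded random.Random instance for reproducible runs.
    """
    if rng.random() < error_rate:
        return rng.choice("CD")  # mistake: cooperate or defect at random
    return move
```

Note that under this assumption a "mistake" sometimes lands on the intended move anyway, so the visible error rate is about half of error_rate.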
Now, how does Pavlov work?
Pavlov starts off cooperating. If the other player cooperates with Pavlov, Pavlov keeps doing whatever it’s doing, even if it was a mistake; if the other player defects, Pavlov switches its behavior, even if it was a mistake.
In other words, Pavlov:
cooperates when you cooperate with it, except by mistake
“pushes boundaries” and keeps defecting when you cooperate, until you retaliate
“concedes when punished” and cooperates after a defect/defect result
“retaliates against unprovoked aggression”, defecting if you defect on it while it cooperates.
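The four cases above collapse into one rule: stay if the other player cooperated, switch if they defected. Here is a minimal sketch, with a hypothetical function name and None standing in for "no previous round".

```python
def pavlov(my_last, their_last):
    """Pavlov's next move given the previous round ("C" or "D" each).

    Pass my_last=None on the first round.
    """
    if my_last is None:
        return "C"                       # starts off cooperating
    if their_last == "C":
        return my_last                   # they cooperated: keep doing the same thing
    return "D" if my_last == "C" else "C"  # they defected: switch


# The four cases from the list above:
# pavlov("C", "C") -> "C"   cooperates when you cooperate
# pavlov("D", "C") -> "D"   keeps defecting while you cooperate
# pavlov("D", "D") -> "C"   concedes when punished
# pavlov("C", "D") -> "D"   retaliates against unprovoked defection
```

This is why the strategy is often summarized as "win-stay, lose-shift".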
If there’s any randomness, Pavlov is better at cooperating with itself than Tit-For-Tat. One accidental defection and two Tit-For-Tats are stuck in an eternal defect cycle, while Pavlovs forgive each other and wind up back in a cooperate/cooperate pattern.
Moreover, Pavlov can exploit CooperateBot (if it defects by accident, it will keep greedily defecting against the hapless CooperateBot, while Tit-For-Tat will not) but still exerts some pressure against DefectBot (defecting against it half the time, compared to Tit-For-Tat’s consistent defection).
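Both behaviors can be traced by hand, or with a short noise-free sketch like the one below. The helper names are illustrative, and the accidental defection is modeled by simply forcing Pavlov's first move.

```python
def pavlov(my_last, their_last):
    """Stay if the other player cooperated, switch if they defected."""
    if their_last == "C":
        return my_last
    return "D" if my_last == "C" else "C"


def pavlov_moves(opponent_move, rounds, first_move="C"):
    """Pavlov's moves against a bot that always plays opponent_move."""
    moves = [first_move]
    for _ in range(rounds - 1):
        moves.append(pavlov(moves[-1], opponent_move))
    return moves


# Against CooperateBot, after one accidental defection, Pavlov never stops:
# pavlov_moves("C", 6, first_move="D") -> ["D", "D", "D", "D", "D", "D"]
# Against DefectBot, Pavlov alternates, so it defects half the time:
# pavlov_moves("D", 6) -> ["C", "D", "C", "D", "C", "D"]
```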
The interesting thing is that Pavlov can beat Tit-For-Tat or Tit-for-Tat-with-Forgiveness in a wide variety of scenarios.
If there are only Pavlov and Tit-For-Tat bots, Tit-For-Tat has to start out outnumbering Pavlov quite significantly in order to win. The same is true for a population of Pavlov and Tit-For-Tat-With-Forgiveness. It doesn’t change if we add in some Cooperators or Defectors either.
Why?
Compared to Tit-For-Tat, Pavlov cooperates better with itself. If two Tit-For-Tat bots are paired, and one of them accidentally defects, they’ll be stuck in a mutual defection equilibrium. However, if one Pavlov bot accidentally defects against its clone, we’ll see
C/D -> D/D -> C/C
which recovers a mutual-cooperation equilibrium and picks up more points.
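The recovery can be checked with a short sketch of two Pavlov bots after a single slip. This traces the Pavlov case only, with no further noise; the function names are illustrative, and each pair lists the bot that slipped first, so the first round appears as D/C here.

```python
def pavlov(my_last, their_last):
    """Stay if the other player cooperated, switch if they defected."""
    if their_last == "C":
        return my_last
    return "D" if my_last == "C" else "C"


def self_play_after_slip(rounds):
    """History of (bot A, bot B) moves, starting with A's accidental defection."""
    a, b = "D", "C"
    history = [(a, b)]
    for _ in range(rounds - 1):
        # Both bots react to the same previous round at once.
        a, b = pavlov(a, b), pavlov(b, a)
        history.append((a, b))
    return history


# self_play_after_slip(4)
# -> [("D", "C"), ("D", "D"), ("C", "C"), ("C", "C")]
```

Two rounds after the slip, both bots are cooperating again.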
Compared to Tit-For-Tat-With-Forgiveness, Pavlov cooperates worse with itself (it takes longer to recover from mistakes) but it “exploits” TFTWF’s patience better. If Pavlov accidentally defects against TFTWF, the result is
D/C -> D/C -> D/D -> C/D -> D/D -> C/C,
which leaves Pavlov with a net gain of 1 point per turn (over the first five turns, before a cooperative equilibrium), compared to TFTWF’s 1/5 point per turn.
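The per-turn figures follow from adding up the payoffs over the first five turns of the sequence as written above. Here is a small sketch of that arithmetic, where each pair is read as Pavlov's move / TFTWF's move; the names are illustrative.

```python
# Payoffs from the text: (my move, their move) -> (my points, their points).
PAYOFFS = {
    ("C", "C"): (2, 2),
    ("C", "D"): (-1, 3),
    ("D", "C"): (3, -1),
    ("D", "D"): (0, 0),
}

# The sequence quoted above, read as Pavlov's move / TFTWF's move.
SEQUENCE = "D/C D/C D/D C/D D/D C/C".split()


def per_turn_average(sequence, turns):
    """Average points per turn for (Pavlov, TFTWF) over the first `turns` pairs."""
    pavlov_total = tftwf_total = 0
    for pair in sequence[:turns]:
        pavlov_move, tftwf_move = pair.split("/")
        pavlov_points, tftwf_points = PAYOFFS[(pavlov_move, tftwf_move)]
        pavlov_total += pavlov_points
        tftwf_total += tftwf_points
    return pavlov_total / turns, tftwf_total / turns


# per_turn_average(SEQUENCE, 5) -> (1.0, 0.2)
# Pavlov: 3 + 3 + 0 - 1 + 0 = 5 points; TFTWF: -1 - 1 + 0 + 3 + 0 = 1 point.
```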
If TFTWF accidentally defects against Pavl...