Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Newcomb's Problem and Regret of Rationality, published by Eliezer Yudkowsky on the AI Alignment Forum.
The following may well be the most controversial dilemma in the history of decision theory:
A superintelligence from another galaxy, whom we shall call Omega, comes to Earth and sets about playing a strange little game. In this game, Omega selects a human being, sets down two boxes in front of them, and flies away.
Box A is transparent and contains a thousand dollars.
Box B is opaque, and contains either a million dollars, or nothing.
You can take both boxes, or take only box B.
And the twist is that Omega has put a million dollars in box B iff Omega has predicted that you will take only box B.
Omega has been correct on each of 100 observed occasions so far - everyone who took both boxes has found box B empty and received only a thousand dollars; everyone who took only box B has found B containing a million dollars. (We assume that box A vanishes in a puff of smoke if you take only box B; no one else can take box A afterward.)
Before you make your choice, Omega has flown off and moved on to its next game. Box B is already empty or already full.
Omega drops two boxes on the ground in front of you and flies off.
Do you take both boxes, or only box B?
And the standard philosophical conversation runs thusly:
One-boxer: "I take only box B, of course. I'd rather have a million than a thousand."
Two-boxer: "Omega has already left. Either box B is already full or already empty. If box B is already empty, then taking both boxes nets me $1000, taking only box B nets me $0. If box B is already full, then taking both boxes nets $1,001,000, taking only box B nets $1,000,000. In either case I do better by taking both boxes, and worse by leaving a thousand dollars on the table - so I will be rational, and take both boxes."
One-boxer: "If you're so rational, why ain'cha rich?"
Two-boxer: "It's not my fault Omega chooses to reward only people with irrational dispositions, but it's already too late for me to do anything about that."
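The two arguments in the dialogue can be made concrete with a little arithmetic. The Python sketch below is illustrative only. The payoffs come from the problem statement, but the prediction accuracy p is an assumption introduced here: Omega's record of 100 correct out of 100 suggests p close to 1 but does not fix it. The function dominance_gap holds the contents of box B fixed, as the two-boxer does. The function expected_payoff instead treats the contents of box B as correlated with the choice through Omega's prediction, which is the intuition behind the one-boxer's reply. Both function names are hypothetical.

```python
# Sketch of the two arguments in the dialogue. Payoffs in dollars come from
# the problem statement; the accuracy p is an assumed parameter.
BOX_A = 1_000           # transparent box A
BOX_B_FULL = 1_000_000  # opaque box B, when Omega has filled it


def payoff(choice: str, box_b_full: bool) -> int:
    """Dollars received for choice 'one' (box B only) or 'two' (both boxes)."""
    box_b = BOX_B_FULL if box_b_full else 0
    return box_b + BOX_A if choice == "two" else box_b


def dominance_gap() -> dict:
    """Two-boxer's view: hold box B fixed and compare the two choices."""
    return {full: payoff("two", full) - payoff("one", full) for full in (False, True)}


def expected_payoff(choice: str, p: float) -> float:
    """One-boxer's view: Omega predicts the choice correctly with probability p."""
    if choice == "one":
        # Box B is full exactly when Omega correctly predicted one-boxing.
        return p * payoff("one", True) + (1 - p) * payoff("one", False)
    # Box B is full only when Omega wrongly predicted one-boxing.
    return p * payoff("two", False) + (1 - p) * payoff("two", True)


if __name__ == "__main__":
    print(dominance_gap())  # two-boxing is ahead by 1000 in both states
    for p in (0.5, 0.9, 0.99):
        print(p, expected_payoff("one", p), expected_payoff("two", p))
```

With these payoffs, two-boxing comes out exactly $1,000 ahead in each fixed state of box B, while one-boxing has the higher expected payoff whenever p exceeds 0.5005, the point where 1,000,000 times p equals 1,001,000 minus 1,000,000 times p. The two-boxer's position is that once the boxes are set down, only the first comparison matters.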
There is a large literature on the topic of Newcomblike problems - especially if you consider the Prisoner's Dilemma as a special case, which it is generally held to be. "Paradoxes of Rationality and Cooperation" is an edited volume that includes Newcomb's original essay. For those who read only online material, this PhD thesis summarizes the major standard positions.
I'm not going to go into the whole literature, but the dominant consensus in modern decision theory is that one should two-box, and Omega is just rewarding agents with irrational dispositions. This dominant view goes by the name of "causal decision theory".
As you know, the primary reason I'm blogging is that I am an incredibly slow writer when I try to work in any other format. So I'm not going to try to present my own analysis here. Way too long a story, even by my standards.
But it is agreed even among causal decision theorists that if you have the power to precommit yourself to take one box in Newcomb's Problem, then you should do so. If you can precommit yourself before Omega examines you, then you are directly causing box B to be filled.
Now in my field - which, in case you have forgotten, is self-modifying AI - this works out to saying that if you build an AI that two-boxes on Newcomb's Problem, it will self-modify to one-box on Newcomb's Problem, if the AI considers in advance that it might face such a situation. Agents with free access to their own source code have access to a cheap method of precommitment.
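The precommitment point can be sketched in Python, under loudly stated assumptions. The post does not say how Omega predicts, so this sketch assumes a hypothetical Omega that simply runs the agent's decision procedure before setting down the boxes. The names play, two_boxer, one_boxer and self_modify are invented for illustration. Because the same code is consulted both for the prediction and for the later choice, the only way to change what is in box B is to change the code before Omega looks at it.

```python
from typing import Callable, List

# An agent is modeled (hypothetically) as a zero-argument decision procedure
# returning "one" (box B only) or "two" (both boxes).
Agent = Callable[[], str]

BOX_A = 1_000
BOX_B_FULL = 1_000_000


def play(agent: Agent) -> int:
    """Hypothetical Omega that predicts by running the agent's own code."""
    prediction = agent()  # done before the boxes are set down
    box_b = BOX_B_FULL if prediction == "one" else 0
    choice = agent()  # the actual choice; box B's contents are already fixed
    return box_b + BOX_A if choice == "two" else box_b


def two_boxer() -> str:
    return "two"


def one_boxer() -> str:
    return "one"


def self_modify(current: Agent, candidates: List[Agent]) -> Agent:
    """Before Omega arrives, adopt whichever policy scores best in a simulated game."""
    return max([current, *candidates], key=play)


if __name__ == "__main__":
    print(play(two_boxer))  # 1000
    print(play(one_boxer))  # 1000000
    upgraded = self_modify(two_boxer, [one_boxer])
    print(upgraded.__name__, play(upgraded))  # one_boxer 1000000
```

Here self_modify stands in for an AI that considers in advance that it might face the game: after simulating both policies, the two-boxing agent rewrites itself into the one-boxer, which is the cheap precommitment available to an agent with access to its own source code.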
What if you expect that you might, in general, face a Newcomblike problem, without knowing the exact form of the problem? Then you would have to modify yourself into a sort of agent whose disposition was such that it would generally receive high rewards on Newcomblike problems....