Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: The Solomonoff Prior is Malign, published by Mark Xu on LessWrong.
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
This argument came to my attention from this post by Paul Christiano. I also found this clarification helpful. I found these counter-arguments stimulating and have included some discussion of them.
Very little of this content is original. My contributions consist of fleshing out arguments and constructing examples.
Thank you to Beth Barnes and Thomas Kwa for helpful discussion and comments.
What is the Solomonoff prior?
The Solomonoff prior is intended to answer the question "what is the probability of X?" for any X, where X is a finite string over some finite alphabet. The Solomonoff prior is defined by taking the set of all Turing machines (TMs) which output strings when run with no input and weighting them proportional to 2^(-K), where K is the description length of the TM (informally, its size in bits).
The Solomonoff prior says the probability of a string is the sum over all the weights of all TMs that print that string.
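The weighting scheme can be illustrated with a toy, computable stand-in. The real prior sums over all Turing machines and is uncomputable; the machine list and description lengths below are invented for illustration, and for simplicity a machine contributes its weight to every prefix of its output:

```python
# Toy illustration of the Solomonoff prior's weighting scheme.
# Instead of all Turing machines (uncomputable), we use a tiny
# hand-picked family of "machines", each with a stipulated
# description length K in bits. A machine's weight is 2 ** -K.
from fractions import Fraction

# (name, description length K in bits, output string) -- all invented.
machines = [
    ("print 0101",   8, "0101"),
    ("alternate 01", 5, "01010101"),
    ("all zeros",    4, "00000000"),
    ("all ones",     4, "11111111"),
]

def prior(s):
    """Sum of 2^-K over machines whose output begins with s."""
    return sum(Fraction(1, 2 ** k)
               for _, k, out in machines if out.startswith(s))

print(prior("0101"))  # 2^-8 + 2^-5 = 9/256
print(prior("0000"))  # 2^-4 = 1/16
```

Note how the two short "all zeros" and "all ones" machines dominate the total mass: shorter descriptions get exponentially more weight, which is the property the rest of the argument leans on.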
One reason to care about the Solomonoff prior is that we can use it to do a form of idealized induction. If you have seen 0101 and want to predict the next bit, you can use the Solomonoff prior to get the probability of 01010 and 01011. Normalizing gives you the chances of seeing 1 versus 0, conditioned on seeing 0101. In general, any process that assigns probabilities to all strings in a consistent way can be used to do induction in this way.
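The normalization step above is simple arithmetic; here is a sketch with a made-up prior table standing in for the real (uncomputable) prior:

```python
# Toy Solomonoff-style induction: predict the bit after "0101" by
# normalizing the prior probabilities of the two continuations.
# The numbers in toy_prior are invented stand-ins, not values of
# the actual Solomonoff prior.
toy_prior = {
    "01010": 0.030,  # hypothetical prior mass on strings starting 01010
    "01011": 0.010,  # hypothetical prior mass on strings starting 01011
}

def predict_next_bit(prefix, prior):
    """P(next bit | prefix) by normalizing over the two extensions."""
    p0 = prior[prefix + "0"]
    p1 = prior[prefix + "1"]
    total = p0 + p1
    return {"0": p0 / total, "1": p1 / total}

probs = predict_next_bit("0101", toy_prior)
print(probs)  # {'0': 0.75, '1': 0.25}
```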
This post provides more information about Solomonoff Induction.
Why is it malign?
Imagine that you wrote a programming language called python^10 that works as follows: First, it takes every alphanumeric character that is not inside a string literal and checks whether it is repeated 10 times in a row. If it's not, it gets deleted. If it is, the run gets replaced by a single copy. Second, it runs this new program through a python interpreter.
Hello world in python^10:
pppppppppprrrrrrrrrriiiiiiiiiinnnnnnnnnntttttttttt('Hello, world!')
Luckily, python has an exec function that executes literals as code. This lets us write a shorter hello world:
eeeeeeeeeexxxxxxxxxxeeeeeeeeeecccccccccc("print('Hello, world!')")
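For concreteness, here is a sketch of a python^10 interpreter. It assumes string literals are simple '...' or "..." spans with no escapes or triple quotes, and generalizes the rule so that a run of 10·n copies collapses to n copies:

```python
# A sketch of the python^10 interpreter described above.
import re

LITERAL = re.compile(r"'[^']*'|\"[^\"]*\"")  # simple literals only

def collapse_python10(src):
    """Translate python^10 source into ordinary python."""
    out, i = [], 0
    while i < len(src):
        m = LITERAL.match(src, i)
        if m:                                     # literals pass through
            out.append(m.group())
            i = m.end()
        elif src[i].isalnum():                    # measure the run length
            j = i
            while j < len(src) and src[j] == src[i]:
                j += 1
            out.append(src[i] * ((j - i) // 10))  # 10 copies -> 1, short runs dropped
            i = j
        else:                                     # punctuation is unchanged
            out.append(src[i])
            i += 1
    return "".join(out)

def run_python10(src):
    exec(collapse_python10(src))

hello = ("pppppppppp" + "rrrrrrrrrr" + "iiiiiiiiii" + "nnnnnnnnnn"
         + "tttttttttt" + "('Hello, world!')")
run_python10(hello)  # prints: Hello, world!
```

Because `collapse_python10` copies literals through untouched, the exec trick works exactly as described: only the four characters of `exec` need to be repeated, and the payload rides along inside the string.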
It's probably easy to see that for nearly every program, the shortest way to write it in python^10 is to write it in python and run it with exec. If we didn't have exec, for sufficiently complicated programs, the shortest way to write them would be to specify an interpreter for a different language in python^10 and write it in that language instead.
As this example shows, the answer to "what's the shortest program that does X?" might involve using some roundabout method (in this case we used exec). If python^10 had some security property that python lacks, then the shortest program in python^10 that accomplishes any given task would not have this property, because it would pass through exec. In general, if you can access alternative 'modes' (in this case python), the shortest programs that output any given string might go through one of those modes, possibly introducing malign behavior.
Let's say that I'm trying to predict what a human types next using the Solomonoff prior. Many programs predict the human:
Simulate the human and their local surroundings. Run the simulation forward and check what gets typed.
Simulate the entire Earth. Run the simulation forward and check what that particular human types.
Simulate the entire universe from the beginning of time. Run the simulation forward and check what that particular human types.
Simulate an entirely different universe that has reason to simulate this universe. Output what the human types in the simulation of our universe.
Which one is the simplest? One property of the Solomonoff prior is ...