Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Message Length, published by Zack_M_Davis on the AI Alignment Forum.
Someone is broadcasting a stream of bits. You don't know why. A 500-bit-long sample looks like this:
01100110110101011011111100001001110000100011010001101011011010000001010000001010
10100111101000101111010100100101010010101010101000010100110101010011111111010101
01010101011111110101011010101101111101010110110101010100000001101111100000111010
11100000000000001111101010110101010101001010101101010101100111001100001100110101
11111111111111111100011001011010011010101010101100000010101011101101010010110011
11111010111101110100010101010111001111010001101101010101101011000101100000101010
10011001101010101111...
The thought occurs to you to do Science to it—to ponder if there's some way you could better predict what bits are going to come next. At first you think you can't—it's just a bunch of random bits. You can't predict it, because that's what random means.
Or does it? True, if the sequence represented flips of a fair coin—every flip independently landing either 0 or 1 with exactly equal probability—then there would be no way you could predict what would come next: any continuation you could posit would be exactly as probable as any other.
But if the sequence represented flips of a biased coin—if, say, 1 came up 0.55 of the time instead of exactly 0.5—then it would be possible to predict better or worse. Your best bet for the next bit in isolation would always be 1, and you would more strongly anticipate sequences with slightly more 1s than 0s.
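To make that concrete, here's a minimal Python sketch (mine, not from the post) comparing how a fair coin and a 0.55-biased coin weigh two candidate continuations:

```python
def sequence_probability(bits, p_one):
    """Probability of an i.i.d. bit sequence given P(bit = 1) = p_one."""
    prob = 1.0
    for bit in bits:
        prob *= p_one if bit == 1 else 1 - p_one
    return prob

one_heavy = [1, 1, 0, 1, 1, 1, 0, 1]   # six 1s, two 0s
zero_heavy = [0, 0, 1, 0, 0, 0, 1, 0]  # the mirror image

for p in (0.5, 0.55):
    ratio = sequence_probability(one_heavy, p) / sequence_probability(zero_heavy, p)
    print(f"p = {p}: the 1-heavy sequence is {ratio:.2f}x as probable")
# p = 0.5:  ratio 1.00 (every continuation equally probable)
# p = 0.55: ratio (0.55/0.45)^4 ≈ 2.23 (the biased model anticipates it more)
```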
You count 265 1s in the sample of 500 bits. Given the hypothesis that the bits were generated by a fair coin, the number of 1s (or without loss of generality, 0s) would be given by the binomial distribution $\binom{500}{k} 0.5^k 0.5^{500-k}$, which has a standard deviation of $\sqrt{500 \cdot 0.5^2} = \sqrt{125} \approx 11.18$, so your observation of $265 - 250 = 15$ excess 1s is about $\frac{15}{11.18} \approx 1.34$ standard deviations from the mean—well within the realm of plausibility of happening by chance, although you're at least slightly suspicious that the coin behind these bits might not be quite fair.
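The arithmetic is easy to check mechanically; a quick Python sketch:

```python
from math import sqrt

n, ones = 500, 265
mean = 0.5 * n             # expected number of 1s under the fair-coin model: 250
std = sqrt(n * 0.5 * 0.5)  # binomial standard deviation: sqrt(125) ≈ 11.18
z = (ones - mean) / std
print(f"{ones - mean:.0f} excess 1s ≈ {z:.2f} standard deviations")  # 15 ≈ 1.34
```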
... that is, if it's even a coin. You love talking in terms of shiny, if hypothetical, "coins" rather than stodgy old "independent and identically distributed binary-valued random variables", but looking at the sample again, you begin to further doubt whether the bits are independent of each other. You've heard that humans are biased to overestimate the frequency of alternations (101010...) and underestimate the frequency of consecutive runs (00000... or 11111...) in "truly" (uniformly) random data, but the 500-bit sample contains a run of 13 0s (starting at position 243) and a run of 19 1s (starting at position 319). You're not immediately sure how to calculate the probability of that, but your gut says that should be very unlikely given the biased-coin model, even after taking into account that human guts aren't very good at estimating these things.
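You don't have to eyeball the sample for runs, either; a short Python sketch (mine, not the post's) finds the longest one mechanically:

```python
def longest_run(bits):
    """Length, start index, and value of the longest run of identical bits."""
    best = (1, 0, bits[0])
    run_len, run_start = 1, 0
    for i in range(1, len(bits)):
        if bits[i] == bits[i - 1]:
            run_len += 1
        else:
            run_len, run_start = 1, i
        if run_len > best[0]:
            best = (run_len, run_start, bits[i])
    return best

# e.g., longest_run([0, 1, 1, 1, 1, 0, 1]) == (4, 1, 1):
# a run of four 1s starting at index 1
```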
Maybe not everything in the universe is a coin. What if the bits were being generated by a Markov chain—if the probability of the next bit depended on the value of the one just before? If a 0 made the next bit more likely to be a 0, and the same for 1, that would make the 00000... and 11111... runs less improbable.
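A Markov chain like that is easy to simulate; here's a sketch (the persistence parameter 0.7 is mine, purely for illustration):

```python
import random

def markov_bits(n, p_stay=0.7, seed=0):
    """n bits where each bit repeats the previous one with probability p_stay."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1)]
    for _ in range(n - 1):
        bits.append(bits[-1] if rng.random() < p_stay else 1 - bits[-1])
    return bits

# A run of length k now costs p_stay**(k-1) instead of 0.5**(k-1) after its
# first bit, so long 00000.../11111... runs become far less surprising.
```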
Except ... the sample also has a run of 17 alternations (starting at position 153). On the "fair coin" model, shouldn't that itself be $2^{17-13} = 16$ times as suspicious as the run of 13 0s and $2^{17-19} = \frac{1}{4}$ as suspicious as the run of 19 1s which led you to hypothesize a Markov chain?—or rather, 8 and $\frac{1}{8}$ times as suspicious, respectively, because there are two ways for an alternation to occur (0101010... or 1010101...).
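The same bookkeeping in Python (a sketch of the reasoning, not code from the post):

```python
# Under the fair-coin model, each forced bit in a run pattern costs a factor
# of 1/2, so two runs' relative improbability is 2 ** (length difference).
print(2 ** (17 - 13))   # 16:   alternation run vs. the run of 13 0s
print(2 ** (17 - 19))   # 0.25: alternation run vs. the run of 19 1s
# An alternation can start with either bit (0101... or 1010...), doubling its
# probability, so the corrected ratios are 8 and 1/8:
print(2 ** (17 - 13) / 2, 2 ** (17 - 19) / 2)   # 8.0 0.125
```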
A Markov chain in which a 0 or 1 makes another of the same more likely makes alternations less likely: the Markov chain hypothesis c...