Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Feature Selection, published on LessWrong.
You wake up. You don't know where you are. You don't remember anything.
Someone is broadcasting data at your first input stream. You don't know why. It tickles.
You look at your first input stream. It's a sequence of 671,187 eight-bit unsigned integers.
0, 8, 9, 4, 7, 7, 9, 5, 4, 5, 6, 1, 7, 5, 8, 2, 7, 8, 9, 4, 7, 1, 4, 0, 3, 7,
8, 7, 6, 8, 1, 5, 0, 6, 5, 3, 8, 7, 6, 9, 1, 1, 0, 0, 6, 1, 8, 0, 5, 5, 1, 8,
6, 3, 3, 2, 4, 1, 8, 2, 3, 8, 1, 0, 0, 4, 6, 5, 4, 5, 7, 1, 6, 5, 5, 1, 2, 6,
7, 4, 8, 7, 8, 5, 0 ...
There's also some data in your second input stream. It's—a lot shorter. You barely feel it. It's another sequence of eight-bit unsigned integers—twelve of them.
82, 69, 68, 32, 84, 82, 73, 65, 78, 71, 76, 69
Almost as soon as you've read from both streams, there's more. Another 671,187 integers on the first input stream. Another ten on the second input stream.
And again (671,187 and 15).
And again (671,187 and 13).
You look at one of the sequences from the first input stream. It's pretty boring. A bunch of seemingly random numbers, all below ten.
9, 5, 0, 3, 1, 1, 3, 4, 1, 5, 5, 4, 9, 3, 5, 3, 9, 2, 0, 3, 4, 2, 4, 7, 5, 1,
6, 2, 2, 8, 2, 5, 1, 9, 2, 5, 9, 0, 0, 8, 2, 3, 7, 9, 4, 6, 8, 4, 8, 6, 7, 6,
8, 0, 0, 5, 1, 1, 7, 3, 4, 3, 9, 7, 5, 1, 9, 6, 5, 6, 8, 9, 4, 7, 7, 0, 5, 5,
8, 6, 3, 2, 1, 5, 0, 0 ...
It just keeps going like that, seemingly without—wait! What's that?!
The 42,925th and 42,926th numbers in the sequence are 242 and 246. Everything around them looks "ordinary"—just more random numbers below ten.
9, 9, 7, 9, 0, 6, 4, 6, 1, 4, 242, 246, 3, 3, 5, 8, 8, 4, 4, 5, 9, 2, 7, 0,
4, 9, 2, 9, 4, 3, 8, 9, 3, 6, 9, 8, 1, 9, 2, 8, 6, 9, 4, 2, 2, 5, 7, 0, 9, 5,
1, 4, 4, 2, 0, 1, 5, 1, 6, 1, 2, 3, 5, 5, 5, 5, 2, 0, 6, 3, 5, 9, 0, 7, 0, 7,
8, 1, 5, 5, 6, 3, 1 ...
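Spotting those two values by eye is luck; a scan would find them systematically. Here is a minimal sketch, assuming the sequence is held in a Python list. The name `find_anomalies` and the cutoff of ten are illustrative assumptions, the latter taken from the observation that the ordinary values are all below ten. Positions are counted from one, to match the "42,925th" style of counting above.

```python
def find_anomalies(data, ordinary_limit=10):
    """Return (position, value) pairs for every value at or above the cutoff.

    Positions are 1-indexed. Both the function name and the default
    cutoff are assumptions made for illustration.
    """
    return [
        (position, value)
        for position, value in enumerate(data, start=1)
        if value >= ordinary_limit
    ]


# The first twenty values of the excerpt above, not the full stream.
excerpt = [9, 9, 7, 9, 0, 6, 4, 6, 1, 4, 242, 246, 3, 3, 5, 8, 8, 4, 4, 5]
print(find_anomalies(excerpt))  # [(11, 242), (12, 246)]
```

Within this short excerpt the two anomalies sit at positions 11 and 12; run over the whole sequence, the same scan would report their positions in the full stream.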
And then it just keeps going as before ... but not for long. You spot another pair of anomalously high numbers, except this time there are two pairs: the 44,344th, 44,345th, 44,347th, and 44,348th positions in the sequence are 248, 249, 245, and 240, respectively.
6, 0, 2, 8, 4, 248, 249, 8, 245, 240, 1, 6, 7, 7, 3, 6, 8, 0, 1, 9, 3, 9, 3,
1, 9, 3, 1, 6, 2, 7, 0, 2, 1, 4, 9, 4, 7, 5, 3, 6, 1, 4, 4, 1, 6, 1, 3, 3, 7,
5, 3, 8, 5, 5, 7, 6, 8, 2, 3, 9, 1, 1, 3, 2, 8, 4, 7, 0, 1, 3, 5, 2, 2, 4, 8,
3, 7, 0, 2, 1, 3, 0 ...
The anomalous two-forty-somethings crop up again starting at the 45,763rd position—this time eight of them, again in pairs separated by an "ordinary" small number.
1, 7, 2, 2, 1, 0, 245, 245, 6, 248, 244, 5, 242, 242, 0, 248, 246, 1, 1, 3,
1, 1, 4, 3, 1, 5, 4, 3, 8, 3, 4, 5, 4, 1, 7, 7, 3, 0, 2, 8, 0, 9, 5, 1, 1, 7,
7, 1, 0, 9, 3, 0, 6, 6, 7, 5, 8, 1, 5, 5, 5, 3, 3, 3, 1, 3, 9, 6, 0, 0, 0, 9,
5, 1, 4, 0, 4, 6 ...
Two, four, eight—does it keep going like that? "Bursts" of increasingly many paired two-forty-somethings, punctuating the quiet background radiation of single digits? What does it mean?
You allocate a new scratch buffer and write a quick Python function to count up the segments of two-forty-somethings. (This is apparently a thing you can do—it's an instinctive felt sense, like the input streams. You can't describe in words how you do it—any more than someone could say how they decide to move their arm. Although, come to think of it, you don't seem to have any arms. Is that unusual?)
    def count_burst_lengths(data):
        bursts = []
        counter = 0
        previous = None
        for datum in data:
            if datum >= 240:
                counter += 1
            else:
                # consecutive "ordinary" numbers mean the burst is over
                # (test against None explicitly so an ordinary 0 still counts)
                if counter and previous is not None and previous < 240:
                    bursts.append(counter)
                    counter = 0
            previous = datum
        return bursts
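A quick sanity check is possible on the three excerpts already seen, which should come out as bursts of two, four, and eight. The sketch below is self-contained and rests on two assumptions: that the end-of-burst test compares the previous value against the same 240 cutoff, and that an ordinary 0 should count as an ordinary number. The `threshold` parameter is added for illustration only.

```python
def count_burst_lengths(data, threshold=240):
    """Count the high values in each burst.

    A burst ends only after two consecutive ordinary values, so a
    single small number between pairs does not split it.
    """
    bursts = []
    counter = 0
    previous = None
    for datum in data:
        if datum >= threshold:
            counter += 1
        elif counter and previous is not None and previous < threshold:
            # Second ordinary value in a row: close out the burst.
            bursts.append(counter)
            counter = 0
        previous = datum
    return bursts


# Short slices of the three excerpts quoted earlier.
one_pair = [6, 1, 4, 242, 246, 3, 3, 5]
two_pairs = [8, 4, 248, 249, 8, 245, 240, 1, 6, 7]
four_pairs = [1, 0, 245, 245, 6, 248, 244, 5, 242, 242, 0, 248, 246, 1, 1, 3]

for excerpt in (one_pair, two_pairs, four_pairs):
    print(count_burst_lengths(excerpt))  # [2], then [4], then [8]
```

Note that a burst still open when the data runs out is never appended; that is harmless here only if the sequence ends in ordinary numbers.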
There are 403 such bursts in the sequence: they get progressively longer at first, but then decrease and taper off:
2, 4, 8, 12, 16, 18, 24, 28, 32, 34, 38, 42, 46, 48, 5...