Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: How long does it take to become Gaussian?, published by Maxwell Peterson on the AI Alignment Forum.
The central limit theorems all say that if you convolve stuff enough, and that stuff is sufficiently nice, the result will be a Gaussian distribution. How much is enough, and how nice is sufficient?
Identically-distributed distributions converge quickly
For many distributions d, the repeated convolution d ∗ d ∗ ⋯ ∗ d looks Gaussian. The number of convolutions you need to look Gaussian depends on the shape of d. This is the easiest variant of the central limit theorem: identically-distributed distributions.
The uniform distribution converges real quick:
The result of uniform(1, 2) ∗ uniform(1, 2) ∗ ... ∗ uniform(1, 2), with 30 distributions total. This plot is an animated version of the plots in the previous post. The black curve is the Gaussian distribution with the same mean and variance as the red distribution. The more similar red is to black, the more Gaussian the result of the convolutions is.
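The uniform case is easy to reproduce numerically. Here's a minimal sketch of mine (not the post's code) that self-convolves a discretized uniform(1, 2) density 29 times with NumPy and compares the result against a Gaussian with matched mean and variance; the grid spacing `dx` is an arbitrary choice:

```python
import numpy as np

dx = 0.01
x = np.arange(1, 2 + dx, dx)
d = np.ones_like(x)
d /= d.sum() * dx                # normalize so the density integrates to 1

result = d
for _ in range(29):              # 30 distributions total = 29 convolutions
    result = np.convolve(result, d) * dx   # * dx keeps the result a density

# The 30-fold convolution is supported on [30, 60].
grid = 30 + dx * np.arange(len(result))
mean = np.sum(grid * result) * dx                 # near 30 * 1.5 = 45
var = np.sum((grid - mean) ** 2 * result) * dx    # near 30 * (1/12) = 2.5

# Gaussian with the same mean and variance, like the black curve in the plot
gaussian = np.exp(-((grid - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
max_gap = np.max(np.abs(result - gaussian))       # small when result is near-Gaussian
```

The maximum pointwise gap between the red and black curves comes out tiny, matching how quickly the animation converges.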
The numbers on the x axis are increasing because the mean of f ∗ g is the sum of the means of f and g, so if we start with positive means, repeated convolutions shoot off into higher numbers. Similar for the variance - notice how the width starts as the difference between 1 and 2, but ends with differences in the tens. You can keep the location stationary under convolution by starting with a distribution centered at 0, but you can't keep the variance from increasing, because you can't have a variance of 0 (except in the limiting case).
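As a quick sanity check (my own sketch, not from the post), the additivity of means and variances under convolution is easy to see with samples, since the distribution of a sum of independent draws is the convolution of their distributions:

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.uniform(1, 2, size=100_000)   # draws from uniform(1, 2)
g = rng.uniform(1, 2, size=100_000)
s = f + g          # distributed as the convolution uniform(1, 2) * uniform(1, 2)

# The mean of the sum is the sum of the means (exactly), and for
# independent draws the variances add too (up to sampling noise).
mean_sum = s.mean()                   # near 1.5 + 1.5 = 3.0
var_sum = s.var()                     # near 1/12 + 1/12 ~= 0.167
```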
Here's a more skewed distribution: beta(50, 1). beta(50, 1) is the probability distribution that represents knowing that a lake has bass and carp, but not how many of each, and then catching 49 bass in a row. It's fairly skewed! This time, after 30 convolutions, we're not quite Gaussian - the skew is still hanging around. But for a lot of real applications, I'd call the result "Gaussian enough".
beta(50, 1) convolved with itself 30 times.
A similar skew in the opposite direction, from the exponential distribution:
exp(20)
I was surprised to see the exponential distribution go into a Gaussian, because Wikipedia says that an exponential distribution with parameter θ goes into a gamma distribution with parameters gamma(n, θ) when you convolve it with itself n times. But it turns out gamma(n, θ) looks more and more Gaussian as n goes up.
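This is consistent with the gamma's moments: the excess kurtosis of a gamma distribution with shape n is 6/n, which falls toward the Gaussian value of 0 as n grows. A quick check of mine using SciPy (note that SciPy's `stats` reports excess kurtosis, i.e. kurtosis minus 3):

```python
from scipy.stats import gamma

# Sum of n iid exponentials with rate theta is gamma(shape=n, scale=1/theta).
# theta = 20 here, assuming exp(20) names the rate; the scale doesn't
# affect kurtosis either way.
theta = 20
excess = {n: float(gamma(a=n, scale=1 / theta).stats(moments="k"))
          for n in (1, 5, 30)}
# excess[n] is the *excess* kurtosis (Gaussian = 0); in theory it equals 6 / n
```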
How about our ugly bimodal-uniform distribution?
It starts out rough and jagged, but already by 30 convolutions it's Gaussian.
And here's what it looks like to start with a Gaussian:
The red curve starts out exactly the same as the black curve, and then nothing happens, because Gaussians stay Gaussian under self-convolution.
An easier way to measure Gaussianness (Gaussianity?)
We're going to want to look at many more distributions under n convolutions and see how close they are to Gaussian, and these animations take a lot of space. We need a more compact way. So let's measure the kurtosis of the distributions, instead. The kurtosis is the fourth standardized moment of a probability distribution; it describes the shape of the tails. All Gaussian distributions have kurtosis 3. There are other distributions with kurtosis 3, too, but they're not likely to be the result of a series of convolutions. So to check how close a distribution is to Gaussian, we can just check how far from 3 its kurtosis is.
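One way to compute this (a sketch of mine, not the post's code) is to take moments of the gridded density directly. Here it is for repeated self-convolutions of uniform(0, 1), whose n-fold sum has kurtosis 3 − 1.2/n:

```python
import numpy as np

def kurtosis(grid, p, dx):
    """Fourth standardized moment of a density sampled on a grid."""
    m = np.sum(grid * p) * dx
    var = np.sum((grid - m) ** 2 * p) * dx
    m4 = np.sum((grid - m) ** 4 * p) * dx
    return m4 / var**2

dx = 0.01
d = np.ones(101)
d /= d.sum() * dx                 # uniform(0, 1) density on a grid

p, kurts = d, []
for n in range(2, 31):            # n distributions convolved together
    p = np.convolve(p, d) * dx
    grid = dx * np.arange(len(p))
    kurts.append(kurtosis(grid, p, dx))
# kurts climbs from 2.4 (n = 2) toward the Gaussian value 3
```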
We can chart the kurtosis as a function of how many convolutions have been done so far, for each of the five distributions above:
We see our conclusions from the animations repeated: the exp(20), being very skewed, is the furthest from Gaussian after 30 convolutions. beta(50, 1), also skewed, is also relatively far (though close in absolute terms). The bimodal and uniform got to Gaussian much faster in the animations, and we see that reflected here.