Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer, published by johnswentworth on April 18, 2024 on LessWrong.
Yesterday Adam Shai put up a cool post which… well, take a look at the visual:
Yup, it sure looks like that fractal is very noisily embedded in the residual activations of a neural net trained on a toy problem. Linearly embedded, no less.
I (John) initially misunderstood what was going on in that post, but some back-and-forth with Adam convinced me that it really is as cool as that visual makes it look, and arguably even cooler. So David and I wrote up this post / some code, partly as an explainer for why on earth that fractal would show up, and partly as an explainer for the possibilities this work potentially opens up for interpretability.
One sentence summary: when tracking the hidden state of a hidden Markov model, a Bayesian's beliefs follow a chaos game (with the observations randomly selecting the update at each time), so the set of such beliefs naturally forms a fractal structure. By the end of the post, hopefully that will all sound straightforward and simple.
Background: Fractals and Symmetry
Let's start with the famous Sierpinski Triangle:
Looks qualitatively a lot like Shai's theoretically-predicted fractal, right? That's not a coincidence; we'll see that the two fractals can be generated by very similar mechanisms.
The key defining feature of the Sierpinski triangle is that it consists of three copies of itself, each shrunken and moved to a particular spot:
Mathematically: we can think of the Sierpinski triangle as a set of points in two dimensions (i.e. the blue points in the image). Call that set S. Then "the Sierpinski triangle consists of three copies of itself, each shrunken and moved to a particular spot" can be written algebraically as
$$S = f_1(S) \cup f_2(S) \cup f_3(S)$$
where $f_1, f_2, f_3$ are the three functions which "shrink and position" the three copies. (Conveniently, they are affine functions, i.e. linear transformations for the shrinking plus a constant vector for the positioning.)
That equation, $S = f_1(S) \cup f_2(S) \cup f_3(S)$, expresses the set of points in the Sierpinski triangle as a function of that same set - in other words, the Sierpinski triangle is a fixed point of that equation. That suggests a way to (approximately) compute the triangle: to find a fixed point of a function, start with some ~arbitrary input, then apply the function over and over again. And indeed, we can use that technique to generate the Sierpinski triangle.
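To make the "shrink and position" functions concrete, here is one common way to write them down. This is an illustrative assumption rather than a formula from the post: it takes the triangle's three corners to be vectors $v_1, v_2, v_3$, and each map moves a point halfway toward one corner.

$$f_i(x) = \tfrac{1}{2}\,x + \tfrac{1}{2}\,v_i, \qquad i = 1, 2, 3$$

Under that assumption, the linear part of each map is the shrinking by a factor of $\tfrac{1}{2}$ and the constant vector is $\tfrac{1}{2} v_i$. The fixed-point iteration then reads

$$S_{k+1} = f_1(S_k) \cup f_2(S_k) \cup f_3(S_k),$$

starting from some ~arbitrary set $S_0$.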
Here's one standard visual way to generate the triangle:
Notice that this is a special case of repeatedly applying $S \mapsto f_1(S) \cup f_2(S) \cup f_3(S)$! We start with the set of all the points in the initial triangle, then at each step we make three copies, shrink and position them according to the three functions, take the union of the copies, and then pass that set onwards to the next iteration.
… but we don't need to start with a triangle. As is typically the case when finding a fixed point via iteration, the initial set can be pretty arbitrary. For instance, we could just as easily start with a square:
… or even just some random points. They'll all converge to the same triangle.
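The iteration described above can be sketched in a few lines of Python. This is a minimal sketch, not the code that accompanies the post: the corner coordinates in `CORNERS`, the "move halfway toward a corner" form of the three maps, and the helper names (`shrink_toward`, `step`, `iterate`, `one_sided_gap`) are all assumptions chosen for illustration. It applies the three-copies map to a finite set of points and measures how close the results from two different starting sets end up.

```python
import math
import random

# Triangle corners: an illustrative choice, not taken from the post.
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]


def shrink_toward(corner, point):
    """Affine map: move `point` halfway toward `corner`."""
    return ((point[0] + corner[0]) / 2, (point[1] + corner[1]) / 2)


def step(points):
    """Apply the three-copies map once to a finite set of points."""
    return {shrink_toward(corner, p) for corner in CORNERS for p in points}


def iterate(points, n_steps):
    """Apply `step` n_steps times, starting from an arbitrary point set."""
    current = set(points)
    for _ in range(n_steps):
        current = step(current)
    return current


def one_sided_gap(a_points, b_points):
    """Largest distance from any point of A to its nearest point in B."""
    return max(min(math.dist(a, b) for b in b_points) for a in a_points)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two ~arbitrary starting sets: a few random points, and a single corner.
    random_start = {(rng.random(), rng.random()) for _ in range(3)}
    corner_start = {(0.0, 0.0)}
    for n in (1, 3, 5):
        from_random = iterate(random_start, n)
        from_corner = iterate(corner_start, n)
        # Print the step count, the number of tracked points, and the gap.
        print(n, len(from_corner), round(one_sided_gap(from_random, from_corner), 4))
```

Because each map halves distances, any two starting points are carried to within 1/2^n of their original separation after n steps, which is why the choice of starting set stops mattering. The printed point counts also show how quickly the tracked set grows under this approach.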
Point is: it's mainly the symmetry relationship $S = f_1(S) \cup f_2(S) \cup f_3(S)$ which specifies the Sierpinski triangle. Other symmetries typically generate other fractals; for instance, this one generates a fern-like shape:
Once we know the symmetry, we can generate the fractal by iterating from some ~arbitrary starting point.
Background: Chaos Games
There's one big problem with computationally generating fractals via the iterative approach in the previous section: the number of points explodes exponentially. For the Sierpinski triangle, we need to make three copies each iteration, so after n timesteps we'll be tracking 3^n times...