Download - LW - Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer by johnswentworth

Discover

Podcast Features
Your all-in-one podcasting solution.

Podcast Studio
Easy-to-use audio recorder app.
Livestream
High-performing audio live, without limits.

Podcast App
The best podcast player & podcast app.
Podbean AI
AI-Enhanced Audio Quality and Content Generation.

Ads Marketplace
Join Ads Marketplace to earn money
through sponsorship on your podcast.

PodAds
Manage your ads with dynamic ad insertion capability.
Patron & Paid Content
The seamless way for fans to support you directly
from your podcast.
Apple Podcasts Subscriptions Integration
Effortlessly publish and manage exclusive episodes for your
Apple Podcasts subscribers directly from Podbean.

All Arts Business Comedy Education
Fiction Government Health & Fitness History Kids & Family
Leisure Music News Religion & Spirituality Science
Society & Culture Sports Technology True Crime TV & Film
Live

How to Start a Podcast
How to Start a Live Podcast
How to Monetize a podcast
How to Promote Your Podcast
How to Use Group Recording

Log in
Start your podcast for free

Podcasting
Monetization
Enterprise
Pricing
Discover

The Nonlinear Library: LessWrong

Education

LW - Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer by johnswentworth

2024-04-18

Download Right click and do "save link as"

Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer, published by johnswentworth on April 18, 2024 on LessWrong. Yesterday Adam Shai put up a cool post which… well, take a look at the visual: Yup, it sure looks like that fractal is very noisily embedded in the residual activations of a neural net trained on a toy problem. Linearly embedded, no less. I (John) initially misunderstood what was going on in that post, but some back-and-forth with Adam convinced me that it really is as cool as that visual makes it look, and arguably even cooler. So David and I wrote up this post / some code, partly as an explainer for why on earth that fractal would show up, and partly as an explainer for the possibilities this work potentially opens up for interpretability. One sentence summary: when tracking the hidden state of a hidden Markov model, a Bayesian's beliefs follow a chaos game (with the observations randomly selecting the update at each time), so the set of such beliefs naturally forms a fractal structure. By the end of the post, hopefully that will all sound straightforward and simple. Background: Fractals and Symmetry Let's start with the famous Sierpinski Triangle: Looks qualitatively a lot like Shai's theoretically-predicted fractal, right? That's not a coincidence; we'll see that the two fractals can be generated by very similar mechanisms. The key defining feature of the Sierpinski triangle is that it consists of three copies of itself, each shrunken and moved to a particular spot: Mathematically: we can think of the Sierpinski triangle as a set of points in two dimensions (i.e. the blue points in the image). Call that set S. Then "the Sierpinski triangle consists of three copies of itself, each shrunken and moved to a particular spot" can be written algebraically as S=f1(S)f2(S)f3(S) where f1,f2,f3 are the three functions which "shrink and position" the three copies. (Conveniently, they are affine functions, i.e. linear transformations for the shrinking plus a constant vector for the positioning.) That equation, S=f1(S)f2(S)f3(S), expresses the set of points in the Sierpinski triangle as a function of that same set - in other words, the Sierpinski triangle is a fixed point of that equation. That suggests a way to (approximately) compute the triangle: to find a fixed point of a function, start with some ~arbitrary input, then apply the function over and over again. And indeed, we can use that technique to generate the Sierpinski triangle. Here's one standard visual way to generate the triangle: Notice that this is a special case of repeatedly applying Sf1(S)f2(S)f3(S)! We start with the set of all the points in the initial triangle, then at each step we make three copies, shrink and position them according to the three functions, take the union of the copies, and then pass that set onwards to the next iteration. … but we don't need to start with a triangle. As is typically the case when finding a fixed point via iteration, the initial set can be pretty arbitrary. For instance, we could just as easily start with a square: … or even just some random points. They'll all converge to the same triangle. Point is: it's mainly the symmetry relationship S=f1(S)f2(S)f3(S) which specifies the Sierpinski triangle. Other symmetries typically generate other fractals; for instance, this one generates a fern-like shape: Once we know the symmetry, we can generate the fractal by iterating from some ~arbitrary starting point. Background: Chaos Games There's one big problem with computationally generating fractals via the iterative approach in the previous section: the number of points explodes exponentially. For the Sierpinski triangle, we need to make three copies each iteration, so after n timesteps we'll be tracking 3^n times...