Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Testing The Natural Abstraction Hypothesis: Project Update, published by johnswentworth on the AI Alignment Forum.
I set myself six months to focus primarily on the Natural Abstraction Hypothesis project before stopping to re-evaluate. It’s been about six months since then. So, how has it gone?
This will be a “storytelling” post, where I talk more about my research process and reasoning than about the results themselves. Be warned: this means I'm going to spout some technical stuff without explanation here and there, and in some cases I haven't even written a good explanation yet - this is a picture of my own thoughts. For more background on the results, the three main posts are:
The intro post for the overarching project, which I recommend reading.
Information At A Distance Is Mediated By Deterministic Constraints, which I also recommend reading.
Generalizing Koopman-Pitman-Darmois, which I do not recommend reading unless you want dense math.
Recap: The Original Plan
The Project Intro broke the Natural Abstraction Hypothesis into three sub-hypotheses:
Abstractability: for most physical systems, the information relevant “far away” can be represented by a summary much lower-dimensional than the system itself.
Human-Compatibility: these summaries are the abstractions used by humans in day-to-day thought/language.
Convergence: a wide variety of cognitive architectures learn and use approximately-the-same summaries.
That post suggested three types of experiments to test these:
Abstractability: does reality abstract well? Corresponding experiment type: run a reasonably-detailed low-level simulation of something realistic; see if info-at-a-distance is low-dimensional.
Human-Compatibility: do these match human abstractions? Corresponding experiment type: run a reasonably-detailed low-level simulation of something realistic; see if info-at-a-distance recovers human-recognizable abstractions.
Convergence: are these abstractions learned/used by a wide variety of cognitive architectures? Corresponding experiment type: train a predictor/agent against a simulated environment with known abstractions; look for a learned abstract model.
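The abstractability experiment above can be caricatured in a few lines. This is a toy sketch, not the project's actual code: all names and numbers here are hypothetical. The idea is that if two distant patches of a system interact only through a low-dimensional summary, then the cross-covariance between them should be approximately low-rank, which we can check with an SVD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "low-level system": two distant patches X and Y, each 20-dimensional,
# coupled only through a 2-dimensional latent summary Z (the "abstraction").
n_samples, dim_patch, dim_latent = 10_000, 20, 2
Z = rng.normal(size=(n_samples, dim_latent))
A = rng.normal(size=(dim_latent, dim_patch))
B = rng.normal(size=(dim_latent, dim_patch))
X = Z @ A + 0.1 * rng.normal(size=(n_samples, dim_patch))
Y = Z @ B + 0.1 * rng.normal(size=(n_samples, dim_patch))

# If the information X carries about Y is low-dimensional, the empirical
# cross-covariance matrix Cov(X, Y) should be approximately low-rank.
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
cross_cov = Xc.T @ Yc / n_samples
singular_values = np.linalg.svd(cross_cov, compute_uv=False)
effective_rank = int((singular_values > 0.05 * singular_values[0]).sum())
print(effective_rank)  # 2: only the latent summary carries info at a distance
```

Here the low-dimensional summary is planted by construction; the interesting empirical question is whether realistic simulations exhibit the same low-rank structure without it being planted.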
Alas, in order to run these sorts of experiments, we first need to solve some tough algorithmic problems. Computing information-at-a-distance in reasonably-complex simulated environments is a necessary step for all of these, and the “naive” brute-force method for this is very-not-tractable. It requires evaluating high-dimensional integrals over “noise” variables - a #P-complete problem in general. (#P-complete is sort of like NP-complete, but Harder.) Even just representing abstractions efficiently is hard - we’re talking about e.g. the state-distribution of a bunch of little patches of wood in some chunk of a chair given the state-distribution of some other little patches of wood in some other chunk of the chair. Explicitly writing out that whole distribution would take an amount of space exponential in the number of variables involved; that would be a data structure of size roughly O((# of states for a patch of wood)^(# of patches)).
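To make the exponential blow-up concrete, here is the back-of-the-envelope arithmetic for the chair example, with hypothetical numbers plugged in (10 discretized states per patch, 30 patches):

```python
# Explicitly tabulating "distribution of patches in chunk A given patches in
# chunk B" needs one entry per joint configuration - exponential in the
# number of patches. Numbers below are purely illustrative.
states_per_patch = 10   # hypothetical discretization of one patch of wood
n_patches = 30          # hypothetical number of patches in the chunk
table_size = states_per_patch ** n_patches
print(f"{table_size:.3e} entries")  # 1.000e+30 - far beyond any memory
```

Even these modest toy numbers give a table with 10^30 entries, which is why any workable approach has to represent the distribution implicitly rather than writing it out.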
My main goal for the past 6 months was to develop tools to make the experiments tractable - i.e. theorems, algorithms, working code, and proofs-of-concept to solve the efficiency problems.
When this 6 month subproject started out, I had a working proof-of-concept for linear systems. I was hoping that I could push that to somewhat more complex systems via linear approximations, figure out some useful principles empirically, and generally get a nice engineering-experiment-theory feedback loop going. That’s the fast way to make progress.
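The post doesn't show the linear proof-of-concept itself, but a rough illustration of why linear systems are tractable at all: in a linear-Gaussian system everything stays Gaussian, so "information at a distance" reduces to covariance algebra - polynomial cost in the state dimension, rather than exponential enumeration of states. The dynamics and numbers below are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
d, t = 30, 15  # state dimension, time separation ("distance")

# Hypothetical linear-Gaussian system: x_{k+1} = M @ x_k + Gaussian noise.
M = rng.normal(size=(d, d)) / np.sqrt(2 * d)  # stable random dynamics
Q = 0.1 * np.eye(d)                           # noise covariance

# All distributions stay Gaussian, so covariances are all we need.
sigma0 = np.eye(d)          # Cov(x_0)
sigma_t = sigma0.copy()
for _ in range(t):
    sigma_t = M @ sigma_t @ M.T + Q           # Cov(x_t), propagated step by step
cross = np.linalg.matrix_power(M, t) @ sigma0  # Cov(x_t, x_0)

# How many directions of x_0 still carry non-negligible information about x_t?
sv = np.linalg.svd(cross, compute_uv=False)
count = int((sv > 0.01 * sv[0]).sum())
print(count)
```

Each step here is ordinary matrix algebra in dimension d, which is what made a linear proof-of-concept feasible; the brick wall described next is that realistic (chaotic) systems don't stay within this regime.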
Turns Out Chaos Is Not Linear
The whole “start with linear approximations and get a nice engineering-experiment-theory feedback loop going” plan ran straight into a brick wall. Not enti...