Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Testing The Natural Abstraction Hypothesis: Project Intro, published by johnswentworth on the AI Alignment Forum.
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
The natural abstraction hypothesis says that
Our physical world abstracts well: for most systems, the information relevant “far away” from the system (in various senses) is much lower-dimensional than the system itself. These low-dimensional summaries are exactly the high-level abstract objects/concepts typically used by humans.
These abstractions are “natural”: a wide variety of cognitive architectures will learn to use approximately the same high-level abstract objects/concepts to reason about the world.
If true, the natural abstraction hypothesis would dramatically simplify AI and AI alignment in particular. It would mean that a wide variety of cognitive architectures will reliably learn approximately-the-same concepts as humans use, and that these concepts can be precisely and unambiguously specified.
Ultimately, the natural abstraction hypothesis is an empirical claim, and will need to be tested empirically. At this point, however, we lack even the tools required to test it. This post is an intro to a project to build those tools and, ultimately, test the natural abstraction hypothesis in the real world.
Background & Motivation
One of the major conceptual challenges of designing human-aligned AI is the fact that human values are a function of humans’ latent variables: humans care about abstract objects/concepts like trees, cars, or other humans, not about low-level quantum world-states directly. This leads to conceptual problems of defining “what we want” in physical, reductive terms. More generally, it leads to conceptual problems in translating between human concepts and concepts learned by other systems - e.g. ML systems or biological systems.
If true, the natural abstraction hypothesis provides a framework for translating between high-level human concepts, low-level physical systems, and high-level concepts used by non-human systems.
The foundations of the framework have been sketched out in previous posts.
What is Abstraction? introduces the mathematical formulation of the framework and provides several examples. Briefly: the high-dimensional internal details of far-apart subsystems are independent given their low-dimensional “abstract” summaries. For instance, the Lumped Circuit Abstraction abstracts away all the details of molecule positions or wire shapes in an electronic circuit, and represents the circuit as components each summarized by some low-dimensional behavior - like V = IR for a resistor. This works because the low-level molecular motions in a resistor are independent of the low-level molecular motions in some far-off part of the circuit, given the high-level summary. All the rest of the low-level information is “wiped out” by noise in low-level variables “in between” the far-apart components.
In the causal graph of some low-level system, X is separated from Y by a bunch of noisy variables Z. For instance, X might be a resistor, Y might be a capacitor, and Z might be the wires (and air) between them. Noise in Z wipes out most of the low-level info about X, so that only a low-dimensional summary f(X) is relevant to predicting the state of Y.
Chaos Induces Abstractions explains one major reason why we expect low-level details to be independent (given high-level summaries) for typical physical systems. If I have a bunch of balls bouncing around perfectly elastically in a box, then the total energy, number of balls, and volume of the box are all conserved, but chaos wipes out all other information about the exact positions and velocities of the balls. My “high-level summary” is then the energy, number of balls, and volume of the box; all other low-lev...
view more