Download - LW - Simple versus Short: Higher-order degeneracy and error-correction by Daniel Murfet

Discover

Podcast Features
Your all-in-one podcasting solution.

Blog to Podcast
Turn your blog into an engaging podcast.
Livestream
High-performing audio live, without limits.

Podcast Studio
Easy-to-use audio recorder app.
Podbean AI
AI-Enhanced Audio Quality and Content Generation.

Podcast App
The best podcast player & podcast app.

Ads Marketplace
Join Ads Marketplace to earn money
through sponsorship on your podcast.

PodAds
Manage your ads with dynamic ad insertion capability.
Apple Podcasts Subscriptions Integration
Effortlessly publish and manage exclusive episodes for your
Apple Podcasts subscribers directly from Podbean.
Live Streaming
Receive livestream rewards from your audience and earn
recurring income from your Fan Club membership.

All Arts Business Comedy Education
Fiction Government Health & Fitness History Kids & Family
Leisure Music News Religion & Spirituality Science
Society & Culture Sports Technology True Crime TV & Film
Live

How to Start a Podcast
How to Start a Live Podcast
How to Monetize a podcast
How to Promote Your Podcast
How to Use Group Recording

Log in
Start your podcast for free

Podcasting
Monetization
Advertisers
Enterprise
Pricing
Discover

The Nonlinear Library: LessWrong

Education

LW - Simple versus Short: Higher-order degeneracy and error-correction by Daniel Murfet

2024-03-11

Download Right click and do "save link as"

Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Simple versus Short: Higher-order degeneracy and error-correction, published by Daniel Murfet on March 11, 2024 on LessWrong.
TLDR: The simplicity bias in Bayesian statistics is not just a bias towards short description length.
The folklore relating the simplicity bias in Bayesian statistics to description length is incomplete: while it is true that the fewer parameters you use the better, the true complexity measure which appears in the mathematical theory of Bayesian statistics (that is, singular learning theory) is more exotic. The content of this complexity measure remains quite mysterious, but in this note we point out that in a particular setting it includes a bias towards runtime error-correction.
This suggests caution when reasoning about the role of inductive biases in neural network training.
Acknowledgements. Thanks to Jesse Hoogland, Liam Carroll, Rumi Salazar and Simon Pepin Lehalleur for comments.
1. Background
1.1 Relevance to Deep Learning
Consider the problem of solving an ordinary differential equation. A constructive proof involves actually writing down a solution, or an algorithm that in finite time will produce a solution. The Picard-Lindelöf theorem proves that a solution to a broad class of initial value problems exists, but the proof is not constructive: it sets up a contraction mapping on a complete metric space and appeals to the Banach fixed point theorem.
While the Picard-Lindelöf theorem uniquely characterises the solution as the fixed point of a contraction mapping, and gives an iterative process for approximating the solution, it does not construct the solution. However a construction is not necessary for many of the applications of Picard-Lindelöf (in differential geometry, topology and many parts of analysis).
This mode of reasoning about mathematical objects, where it suffices to have characterised[1] them by (universal) properties, is pervasive in modern mathematics (in the above example, the characterising property is the differential equation, or its associated contraction mapping).
However this may seem quite alien to a computer scientist or programmer, who for historical reasons tend to think that there is only one mode of reasoning about mathematical objects, and that is centred on the study of a construction.
In an era where programs are increasingly the product of gradient descent rather than human construction, this attitude is untenable. We may have to accept a mode of reasoning about learned programs, based on understanding the nature of the problems to which they are a solution and the iterative processes that produce them. To understand the implicit algorithms learned by neural networks, it may be necessary from this perspective to understand
the computational structures latent in the data distribution, and
the inductive biases of neural network training.
We do not currently have a good understanding of these matters. If we understood these inductive biases better, it could conceivably help us in the context of AI alignment to answer questions like "how likely is deceptive alignment", "how likely is consequentialism", and "what goals are instrumentally convergent"?
This note is about the inductive biases of the Bayesian learning process (conditioned on more samples, the posterior increasingly localises around true parameters). Since Bayesian statistics is both fundamental and theoretically tractable, this seems potentially useful for understanding the inductive biases of neural network training. However it is worth noting that the relation between these is not understood at present.
1.2 Singular Learning Theory
The asymptotic expansion of the Bayesian free energy, or "free energy formula'', proven by Watanabe in Singular Learning Theory (SLT) introduces the learning coefficient λ as a measure of complexity that balances ...