A starting point for making sense of task structure (in machine learning), published by Kaarel on February 25, 2024 on LessWrong.
ML models can perform a range of tasks and subtasks, some of which are more closely related to one another than others. In this post, we set out two initial starting points. First, we motivate reverse engineering models' task decompositions; we think this can be helpful for interpretability and for understanding generalization. Second, we provide an initial (and potentially non-exhaustive) list of techniques that could be used to quantify the 'distance' between two tasks or inputs.
We hope these distances might help us identify the task decomposition of a particular model. We close by briefly considering analogues in humans and by suggesting a toy model.
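To give a flavor of what quantifying task distance could look like (an illustrative sketch, not necessarily one of the techniques on our list), one could compare the directions of the loss gradients a model produces on batches drawn from two tasks, on the intuition that tasks whose gradients align plausibly share computation. A minimal PyTorch sketch, where the model, the task batches, and the loss function are all stand-ins:

```python
import torch

def task_gradient(model, batch, loss_fn):
    """Flattened loss gradient over all parameters, on one task's batch."""
    model.zero_grad()
    inputs, targets = batch
    loss_fn(model(inputs), targets).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters() if p.grad is not None])

def task_distance(model, batch_a, batch_b, loss_fn):
    """Cosine distance between the two tasks' gradients (0 = identical direction)."""
    g_a = task_gradient(model, batch_a, loss_fn)
    g_b = task_gradient(model, batch_b, loss_fn)
    return 1.0 - torch.nn.functional.cosine_similarity(g_a, g_b, dim=0).item()
```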
Epistemic status: We didn't spend much time writing this post. Please let us know in the comments if you have other ideas for measuring task distance, or if we are duplicating existing work.
Introduction
It might be useful to think of computation in neural networks (and in LMs specifically) on sufficiently complex tasks as a combination of (a) simple algorithms or circuits for specific tasks[1] and (b) a classifier, or family of classifiers, that determines which simple circuits are to be run on a given input.
(Think: an algorithm that captures (some of) how GPT-2 identifies indirect objects in certain cases, combined with a method of identifying that indirect object identification is a thing that should be done.[2]) More concretely, some pairs of tasks might overlap in that they are computed together much more than other pairs are, and we might want to build a taxonomic tree of tasks performed by the model, in which tree distance between tasks measures how much computation they share.[3] For example, a particularly simple (but unlikely) task structure could be a tree of depth 1: the neural network has one algorithm for classifying tasks, which is run on all inputs; a single simple task is then identified, and the corresponding algorithm is run.
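To make the tree picture concrete: given pairwise task distances (however obtained), one standard way to build such a taxonomy is agglomerative clustering. A sketch with an invented distance matrix over made-up task names; the numbers carry no empirical content:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical pairwise task distances (symmetric, zero diagonal).
tasks = ["ioi", "greater_than", "docstring", "acronyms"]
D = np.array([
    [0.0, 0.3, 0.8, 0.7],
    [0.3, 0.0, 0.9, 0.8],
    [0.8, 0.9, 0.0, 0.2],
    [0.7, 0.8, 0.2, 0.0],
])

# Average-linkage clustering builds a binary tree over the tasks; the height
# at which two tasks first share an ancestor is one proxy for how little
# computation they share.
tree = linkage(squareform(D), method="average")
# Each row records one merge: [cluster_i, cluster_j, merge_distance, leaf_count].
print(tree)
```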
Why understanding task structure could be useful
Interpretability
We might hope to interpret a model by (1) identifying the task decomposition, and (2) reverse-engineering both the circuit the model implements for each individual task and the way the model computes this task decomposition. Crucially, (1) is valuable for understanding the internals and behavior of neural networks even without (2), and techniques for making progress on it could look quite different from standard interpretability methods.
It could directly make the rest of mechanistic interpretability easier by giving us access to some ground truth about the model's computation - we might insist that the reverse engineering of the computation respects the task decomposition, or we might be able to use task distance metrics to identify tasks that we want to understand mechanistically.
Further, by arranging tasks into a hierarchy, we might be able to choose, for different applications, different levels of resolution at which to attempt to understand the behavior of a model.
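One way to picture this, continuing the toy clustering sketch above: cutting the tree at different heights yields coarser or finer groupings of tasks, and one can pick the cut appropriate to the application. Again, the distance matrix is invented:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Same invented distance matrix as in the earlier sketch.
D = np.array([
    [0.0, 0.3, 0.8, 0.7],
    [0.3, 0.0, 0.9, 0.8],
    [0.8, 0.9, 0.0, 0.2],
    [0.7, 0.8, 0.2, 0.0],
])
tree = linkage(squareform(D), method="average")

# Lower thresholds give a fine-grained decomposition (many small task
# clusters); higher thresholds give a coarse one (a few broad clusters).
for threshold in (0.25, 0.5, 1.0):
    print(threshold, fcluster(tree, t=threshold, criterion="distance"))
```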
Learning the abstractions
Task decomposition can give direct access to the abstractions learned by the model. Ambitiously, it may even turn out that task decomposition is 'all you need' - that the hard part of language modeling is learning which atomic concepts to keep track of and how they are related to each other.
In this case, it might be possible to achieve many of the benefits of full reverse engineering, in the sense of understanding how to implement an algorithm similar to GPT-4's, without needing good methods for identifying the particular way circuits are implemented in any particular language model. Realistically, a good method for measuring task similarity won't be sufficient for this, but it could be a ...