Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Inferring the model dimension of API-protected LLMs, published by Ege Erdil on March 19, 2024 on LessWrong.
A new paper by Finlayson et al. describes how to exploit the softmax bottleneck in large language models to infer the model dimension of closed-source LLMs served to the public via an API. I'll briefly explain the method they use to achieve this and provide a toy model of the phenomenon, though the full paper has many practical details I will elide in the interest of simplicity. I recommend reading the whole paper if this post sounds interesting to you.
Background
First, some background: large language models have a model dimension, which is the size of the vector that represents each token of the input. Knowing this dimension dmodel and the number of layers nlayers of a dense model allows one to make a fairly rough estimate, about 10 · nlayers · dmodel^2, of the number of parameters of the model; this works because the parameters in each layer are grouped into a few square matrices whose dimensions are Θ(dmodel).[1]
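As a quick sanity check on this estimate, here is a minimal Python sketch (my own illustration, not from the paper) using the publicly reported figures for Llama 2 7B; the factor of 10 is only a heuristic, and the exact count depends on details like the feedforward width and vocabulary size.

```python
# Rough dense-transformer parameter estimate: ~10 * n_layers * d_model^2.
# The constant 10 is a heuristic, not an exact accounting of every matrix.

def rough_param_count(n_layers: int, d_model: int) -> int:
    return 10 * n_layers * d_model**2

# Llama 2 7B: 32 layers, d_model = 4096.
print(rough_param_count(32, 4096))  # ~5.4e9, the right ballpark for a "7B" model
```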
Labs have become more reluctant to share information about their model architectures as part of a turn towards increased secrecy in recent years. While it was once standard for researchers to report the exact architecture they used in a paper, now even rough figures such as how many parameters a model has and how much data it saw during training are often kept confidential. The model dimension gets the same treatment.
However, there is some inevitable amount of information that leaks once a model is made available to the public for use, especially when users are given extra information such as token probabilities and the ability to bias the probability distribution to favor certain tokens during text completion.
The method of attack
The key architectural detail exploited by Finlayson et al. is the softmax bottleneck. To understand what this is about, it's important to first understand a simple point about dimensionality.
Because the internal representation of a language model has dmodel dimensions per token, the outputs of the model cannot have more than dmodel dimensions in some sense. Even if the model upscales its outputs to a higher dimension doutput > dmodel, there will still only be "essentially" dmodel directions of variation in the output.
There are ways to make these claims more precise but I avoid this to keep this explanation simple: the intuition is just that the model cannot "create" information that's not already there in the input.
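To make the intuition concrete, here is a toy numpy sketch (my own illustration, not code from the paper): project random dmodel-dimensional hidden states up to a much larger output dimension and check how many directions the outputs actually vary in.

```python
# Toy illustration of the dimensionality claim: if hidden states live in
# d_model dimensions, then no matter how large the output dimension is,
# the stacked outputs have rank at most d_model.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_vocab, n_samples = 64, 1000, 500

hidden = rng.normal(size=(n_samples, d_model))  # d_model-dimensional hidden states
W_out = rng.normal(size=(d_model, n_vocab))     # output projection ("unembedding")
outputs = hidden @ W_out                        # n_samples x n_vocab outputs

print(np.linalg.matrix_rank(outputs))  # 64 == d_model, not 1000
```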
Another fact about language models is that their vocabulary size is often much larger than their model dimension. For instance, Llama 2 7B has a vocabulary size of nvocab = 32,000 tokens but a model dimension of only dmodel = 4096.
Because an autoregressive language model is trained on the task of next-token prediction, its final output is a probability distribution over all of the possible tokens, which is (nvocab − 1)-dimensional (we lose one dimension because of the constraint that a probability distribution must sum to 1). However, we know that in some sense the "true" dimension of the output of a language model cannot exceed dmodel.
As a result, when nvocab ≫ dmodel, it's possible to count the number of "true" directions of variation in the (nvocab − 1)-dimensional next-token probability distribution given by a language model to determine the unknown value of dmodel. This is achieved by inverting the softmax transformation that's placed at the end of language models to ensure their output is a legitimate probability distribution, and looking at how many directions the resulting nvocab-dimensional vector varies in.[2]
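Here is a schematic numpy sketch of this counting idea. It is my own toy simulation, not the authors' code: it fakes the "API" with a random model rather than querying a real one, and it skips the logit-bias tricks the paper uses to recover full-vocabulary probabilities from a real API.

```python
# Schematic sketch of the rank-counting idea on a simulated model.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_vocab, n_prompts = 64, 1000, 2000

# Simulate a model: hidden states -> logits -> softmax probabilities.
W_unembed = rng.normal(size=(d_model, n_vocab))
hidden = rng.normal(size=(n_prompts, d_model))
logits = hidden @ W_unembed
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)

# Invert the softmax up to an additive constant by taking logs, then look at
# how many directions the resulting vectors vary in via their singular values.
log_probs = np.log(probs)
singular_values = np.linalg.svd(log_probs - log_probs.mean(axis=0),
                                compute_uv=False)

# Count directions of variation above a (somewhat arbitrary) numerical threshold.
estimated = int((singular_values > 1e-6 * singular_values[0]).sum())
print(estimated)  # ~d_model (here 64 or 65: the softmax log-normalizer can add one direction)
```

The key point is that the number of non-negligible singular values reveals dmodel (up to one extra direction contributed by the softmax normalization constant), even though each probability vector nominally lives in an (nvocab − 1)-dimensional space.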
Results
Doing the analysis described above leads to the following results:
Informally, what the authors are doing here is to order all the directions of variation in the probability vector produced by t...