Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Community Notes by X, published by NicholasKees on March 18, 2024 on LessWrong.
I did an exploration into how Community Notes (formerly Birdwatch) from X (formerly Twitter) works, and how its algorithm decides which notes get displayed to the wider community. In this post, I'll share and explain what I found, as well as offer some comments.
Community Notes is a fact-checking tool available to US-based users of X/Twitter which allows readers to attach notes to posts to give them clarifying context. It uses an open-source bridging-based ranking algorithm intended to promote notes which receive cross-partisan support, and demote notes with a strong partisan lean. The tool seems to be pretty popular overall, and most of the criticism aimed toward it seems to be about how Community Notes fails to be a sufficient replacement for other, more top-down moderation systems.[1]
This seems interesting to me as an experiment in social technology that aims to improve group epistemics, and understanding how it works seems like a good place to start before trying to design other group epistemics algorithms.
How does the ranking algorithm work?
The full algorithm, while open-source, is quite complicated, and I don't fully understand every facet of it, but I've done a once-over read of the original Birdwatch paper, gone through the Community Notes documentation, and read this summary/commentary by Vitalik Buterin. Here's a summary of the "core algorithm" as I understand it (to which much extra logic gets attached):
Users are the people who have permission to rate community notes. To get permission, a person needs to have had an account on X for more than 6 months, be verified, and have committed no violations of X's rules. The rollout of community notes is slow, however, and so eligible account holders are only added to the Community Notes user pool periodically, and at random.
New users don't immediately get permission to write their own notes; they first have to earn a "rating impact" by rating existing notes (I'll explain this later).
Notes are short comments written by permitted users on posts they feel need clarification. These are not immediately made publicly visible on X; a note first needs to be certified as "helpful" by the ranking algorithm, which aggregates ratings from other Community Notes users.
Users are invited to rate notes as either "not helpful," "somewhat helpful," or "helpful." The results of all user-note pairs are recorded in a matrix $r$, where each element $r_{un} \in \{0, 0.5, 1, \text{null}\}$ corresponds to how user $u$ rated note $n$. Users only rate a small fraction of notes, so most elements in the matrix are "null." Non-null elements are called "observed" ratings, and values of 0, 0.5, and 1 correspond to the qualitative ratings of "not helpful," "somewhat helpful," and "helpful" respectively.
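As a concrete sketch of this data structure (the names here are my own, not from the Community Notes codebase), the mostly-null rating matrix can be stored sparsely as a map from observed (user, note) pairs to rating values:

```python
# Hypothetical sketch: store only the observed ratings, since most
# entries of the full user-note matrix are null.
RATING_VALUES = {"not helpful": 0.0, "somewhat helpful": 0.5, "helpful": 1.0}

# Sparse matrix r: keys are (user_id, note_id), values are 0, 0.5, or 1.
ratings = {
    ("user_a", "note_1"): RATING_VALUES["helpful"],
    ("user_b", "note_1"): RATING_VALUES["not helpful"],
    ("user_a", "note_2"): RATING_VALUES["somewhat helpful"],
}

# A missing key plays the role of "null": that user never rated that note.
assert ("user_b", "note_2") not in ratings
```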
This rating matrix is then used by their algorithm to compute a helpfulness score for each note. It does this by learning a model of the ratings matrix which explains each observed rating as a sum of four terms:
$$\hat{r}_{un} = \mu + i_u + i_n + f_u \cdot f_n$$
Where:
$\mu$: Global intercept (shared across all ratings)
$i_u$: User intercept (shared across all ratings by user $u$)
$i_n$: Note intercept (shared across all ratings of note $n$). This is the term which will eventually determine a note's "helpfulness."
$f_u$, $f_n$: Factor vectors for $u$ and $n$. The dot product of these vectors is intended to describe the "ideological agreement" between a user and a note. These vectors are currently one-dimensional, though the algorithm is in principle agnostic to the number of dimensions.
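Putting the four terms together, a prediction for a single user-note pair looks like this (a minimal sketch with made-up names, not the actual implementation):

```python
def predict_rating(mu, i_u, i_n, f_u, f_n):
    """Predicted rating: r_hat = mu + i_u + i_n + (f_u . f_n).

    mu:       global intercept (float)
    i_u, i_n: user and note intercepts (floats)
    f_u, f_n: factor vectors (lists of floats; 1-D in current practice)
    """
    dot = sum(a * b for a, b in zip(f_u, f_n))
    return mu + i_u + i_n + dot

# A note with a high intercept can still be predicted fairly helpful
# even for a user whose factor disagrees with the note's factor:
r_hat = predict_rating(0.4, 0.0, 0.3, [-0.8], [0.5])  # dot term is -0.4
```

This decomposition is the heart of the "bridging" idea: ratings explained by ideological agreement ($f_u \cdot f_n$) don't count toward helpfulness; only the note intercept $i_n$ does.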
For $U$ users and $N$ notes, that gets us $1 + 2U + 2N$ free parameters making up this model (with one-dimensional factors). These parameters are re-estimated via gradient descent every hour, minimizing the following squared-error loss function (computed over observed ratings only):
$$\sum_{r_{un} \neq \text{null}} \left(r_{un} - \hat{r}_{un}\right)^2 + \lambda_i \left(i_u^2 + i_n^2 + \mu^2\right) + \lambda_f \left(\|f_u\|^2 + \|f_n\|^2\right)$$
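The fitting step can be sketched as plain gradient descent over the observed ratings. This is illustrative only; the real implementation has its own optimizer, hyperparameters, and much extra logic, and all names here are my own:

```python
import random

def fit(ratings, n_epochs=200, lr=0.05, lam_i=0.15, lam_f=0.03, dim=1):
    """Fit mu, per-user/per-note intercepts, and factor vectors by
    descending the regularized squared-error loss over observed ratings."""
    users = {u for u, _ in ratings}
    notes = {n for _, n in ratings}
    rng = random.Random(0)
    mu = 0.0
    i_u = {u: 0.0 for u in users}
    i_n = {n: 0.0 for n in notes}
    f_u = {u: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for u in users}
    f_n = {n: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for n in notes}
    for _ in range(n_epochs):
        for (u, n), r in ratings.items():
            dot = sum(a * b for a, b in zip(f_u[u], f_n[n]))
            err = (mu + i_u[u] + i_n[n] + dot) - r
            # Gradient step on squared error plus the L2 penalties.
            mu -= lr * (err + lam_i * mu)
            i_u[u] -= lr * (err + lam_i * i_u[u])
            i_n[n] -= lr * (err + lam_i * i_n[n])
            for d in range(dim):
                gu = err * f_n[n][d] + lam_f * f_u[u][d]
                gn = err * f_u[u][d] + lam_f * f_n[n][d]
                f_u[u][d] -= lr * gu
                f_n[n][d] -= lr * gn
    return mu, i_u, i_n, f_u, f_n

# Tiny toy example: note_1 is rated helpful by everyone, note_2 is not.
toy = {("a", "note_1"): 1.0, ("b", "note_1"): 1.0, ("a", "note_2"): 0.0}
mu, i_u, i_n, f_u, f_n = fit(toy)
```

On the toy data, the fitted note intercept for note_1 should come out higher than for note_2, which is exactly the quantity that would later be compared against a helpfulness threshold.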