Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is: Data on forecasting accuracy across different time horizons and levels of forecaster experience, published by Charles Dillon on the Effective Altruism Forum.
Key Points
Forecasting well is a valuable skill for many purposes and people, including for EA organisations aiming to identify which areas they should focus on and to anticipate the outcomes of various initiatives.
There is a limited public record of people making scored forecasts over time horizons greater than about one year. Here I use data from PredictionBook and Metaculus to study the performance of predictions over different time horizons. I also compared performance across users with different levels of forecasting practice.
Looking at individual predictors, a very common failure mode among newer or less dedicated predictors seems to be overconfidence, which is more prevalent than underconfidence across most subgroups.
Across both PredictionBook and Metaculus, there seemed to be a significant bias towards overestimating the chances of positive resolution. This effect seemed to get stronger as time to resolution increased.
At least in the PredictionBook data, there was some weak evidence to suggest prediction performance improves with making more predictions, but there were too many confounding factors here to draw any confident conclusion.
The conclusions I was able to draw from this were limited, and working to improve this by expanding the amount and quality of data available for analysis like this seems worth doing.
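To make the overconfidence finding above concrete, here is a minimal sketch of the kind of calibration check involved. This is not the post's actual analysis code, and the data, function names, and bucket boundaries are all illustrative assumptions: it groups resolved binary forecasts by stated probability and compares the mean stated probability with the realized base rate in each bucket.

```python
# Hypothetical sketch, not the author's code: detect overconfidence in
# resolved binary forecasts, where each record is (stated_probability, outcome).
from collections import defaultdict

def calibration_table(forecasts, n_buckets=5):
    """Group forecasts into probability buckets and compare the mean
    stated probability with the realized base rate in each bucket."""
    buckets = defaultdict(list)
    for p, outcome in forecasts:
        idx = min(int(p * n_buckets), n_buckets - 1)
        buckets[idx].append((p, outcome))
    table = []
    for idx in sorted(buckets):
        items = buckets[idx]
        mean_p = sum(p for p, _ in items) / len(items)
        base_rate = sum(o for _, o in items) / len(items)
        # Overconfidence shows up as mean_p sitting further from 0.5
        # than the realized base rate does.
        table.append((mean_p, base_rate, len(items)))
    return table

# Toy example: forecasts at 90% that only resolve positively 60% of the time
# would show up as (0.9, 0.6, 10) — a clearly overconfident bucket.
toy_data = [(0.9, 1)] * 6 + [(0.9, 0)] * 4
print(calibration_table(toy_data))
```

A plot of mean stated probability against base rate per bucket is the usual calibration curve; the post's figures are effectively richer versions of this table.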
This post draws a lot on niplav's Range and Forecasting Accuracy, not least for much of the code used to extract the PredictionBook forecasts, and for identifying the most promising sources of useful data.
I think this post is probably most useful for making individual forecasters aware of common failure modes so they can attempt to learn from them, and for informing decision makers about these failure modes, rather than for providing those looking to use forecasts with, e.g., a transform they should apply to long term forecasts.
Background
There has been a great deal of interest in forecasting in the EA community in recent years, particularly with the prominence of longtermist thinking. It is clearly of great interest that we be well equipped to make predictions about future events, and to understand the accuracy and failure modes of such predictions. Additionally, many of the questions we care most about have long time horizons; therefore, any evidence that helps us become better at making long term predictions in particular could be quite valuable (see also Muehlhauser, 2019).
Some potential tools for making longer term forecasts include:
extrapolating from shorter term forecasts expected to be correlated with the long term question - this transforms the question into a different problem: forecasting which intermediate milestones might usefully predict our ultimate questions, and how well.
assuming that those who are well calibrated in the short term will also do well in the long term, and using their forecasts.
The second point here is probably to a certain extent unavoidable, as most forecasters will get few totally independent iterations of making long term forecasts in their lifetimes. One could potentially make hundreds of 10 year predictions now, but lessons cannot be drawn directly from these for 10 years, and if the wrong lessons are learned, it could take another 10 year iteration to realise that. In addition, these hundreds of forecasts may not be independent. I think many forecasts made over longer time horizons will be subject to errors from the same sources, due to society wide effects such as, for example, rates of economic and technological development, or a more or less peaceful climate for international relations.
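The non-independence argument above can be illustrated with a toy simulation. Nothing here is from the post: the numbers, the single "society wide shock", and the function names are all assumptions. The point is just that when one shared shock shifts the outcome of every question at once, the realized hit rate across 100 forecasts varies far more between "worlds" than the independent-questions calculation would suggest, so 100 such forecasts carry much less evidence than 100 independent ones.

```python
# Hypothetical illustration: correlated long-term forecasts via one shared shock.
import random

def simulate_hit_rate(n_questions, shared_shock, seed):
    """Each question resolves positively with base probability 0.7,
    shifted up or down for ALL questions by one shared shock draw."""
    rng = random.Random(seed)
    shock = rng.choice([-shared_shock, shared_shock])  # one draw per 'world'
    p = min(max(0.7 + shock, 0.0), 1.0)
    return sum(rng.random() < p for _ in range(n_questions)) / n_questions

def hit_rate_variance(shared_shock, worlds=2000, n_questions=100):
    """Variance of the realized hit rate across many simulated worlds."""
    rates = [simulate_hit_rate(n_questions, shared_shock, s) for s in range(worlds)]
    mean = sum(rates) / worlds
    return sum((r - mean) ** 2 for r in rates) / worlds

# Independent case: variance is roughly p*(1-p)/n = 0.7*0.3/100 = 0.0021.
print(hit_rate_variance(0.0))
# Shared shock of 0.2: the shock term dominates, and the variance is
# roughly 20x larger, as if far fewer than 100 questions had been asked.
print(hit_rate_variance(0.2))
```

In this toy setup the shock contributes a between-worlds variance of about 0.04 on top of the ~0.002 binomial term, which is why scoring a forecaster on one decade's worth of correlated questions can be so noisy.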