One foundational concept in forecasting is base rates. Base rates are the observed past frequency of an outcome of interest. For example, if we want to forecast the probability of tomorrow's 2:00 flight to Los Angeles arriving on time, we might look at several past frequencies: how frequently the 2:00 flight to Los Angeles has arrived on time over the last year, whether it arrived on time today specifically, how frequently the airline's flights in general arrive on time, etc. We don't blindly adopt one of these probabilities, or simply average them; rather, we use them as an input, ideally a starting input, in coming to our answer.
Lenses
One way of thinking about base rates is that when we choose to use them, we are making a philosophical decision. We are stating that the present situation is not unique. We are stating that the present situation has not been planned to generate a particular outcome. And if we end up forecasting a probability at or beyond the extremes of the base rates we are considering (for example, if we give a 20% probability to a third-party candidate winning the next US presidential election), we are making an extraordinary claim and are obligated to provide a well-thought-out rationale explaining why this particular case is so different.
Another way of looking at base rates is that they harness one of the most common psychological biases: anchoring. Anchoring bias is the observation that people primed with large numbers will make higher-magnitude guesses than people primed with small numbers, even when the priming numbers are irrelevant to the question being asked. By starting from a base rate, a forecaster makes a calculated decision that it will be helpful to be anchored, even while independently analyzing the problem.
Priors
We can also think of base rates as an attempt to artificially construct priors. Bayes's Rule gives a formula for combining new information with what was known prior to that information (though a forecaster does not have to be so formulaic).
Say that a medical condition is very rare: it occurs in 0.1% of the population. Say that there is a test that correctly identifies everyone who has the condition, but 0.5% of people who don't have the condition get incorrectly identified as having it (false positives). In this example, the prior is so low that even a positive test doesn't indicate that an individual probably has the condition [1].
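The footnote's arithmetic can be checked in a few lines of Python, using the numbers from the text (0.1% prevalence, 100% sensitivity, 0.5% false-positive rate):

```python
prevalence = 0.001       # p(condition)
sensitivity = 1.0        # p(positive test, given condition)
false_positive = 0.005   # p(positive test, given no condition)

# Total probability of a positive test, across both groups
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)

# Bayes's Rule: p(condition, given positive test)
posterior = sensitivity * prevalence / p_positive

print(f"p(positive test) = {p_positive:.2%}")            # ~0.60%
print(f"p(condition | positive) = {posterior:.1%}")      # ~16.7%
```

Despite a positive result, the chance of actually having the condition is only about one in six, because the false positives among the 99.9% of healthy people outnumber the true positives.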
It would be nice to have similarly canonical priors for questions like “will the UK Prime Minister’s party win the next election”, but these don’t exist. We can conceptually do something similar by gathering base rates and then considering what information has emerged about this specific political scenario.
Art of base rates
Obviously base rates can be used when we are talking directly about “history repeating itself” such as forecasting the probability of an earthquake of at least a certain magnitude impacting San Francisco. But they also add value when estimating the probability of other events. Consider forecasting the chance of a particular leader leaving office. Life expectancy is a sort of base rate, and we can input the particular leader’s age and country to have a more relevant base rate to work with; we can evaluate the historical base rate of coups in “similar” countries; if the country has elections, we can evaluate how frequently the party in power loses elections in “similarly democratic” countries.
A forecaster should therefore be prepared to creatively identify applicable base rates and (formally or informally) weight them as part of the forecasting process. As an exercise, consider the probability of a particular NFL kicker making a game-winning 51-yard field goal. We might have the historical frequency of this particular kicker making game-winning field goals, 50+ yard field goals, 50+ yard field goals in this particular season, field goals in this particular game, field goals in this particular season, the historical frequency of all kickers making 50+ yard field goals…a lot to consider.
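One informal way to combine several base rates is a weighted average, with weights reflecting how relevant and well-sampled each rate seems. The frequencies and weights below are invented purely to illustrate the mechanics, not real NFL statistics:

```python
# Hypothetical base rates for the kicker exercise.
# Each entry: (description, observed frequency, relevance weight)
base_rates = [
    ("this kicker, game-winning attempts", 0.70, 1.0),  # small sample
    ("this kicker, 50+ yard attempts",     0.55, 3.0),  # most specific to distance
    ("all kickers, 50+ yard attempts",     0.60, 2.0),  # large sample, less specific
]

total_weight = sum(w for _, _, w in base_rates)
starting_estimate = sum(p * w for _, p, w in base_rates) / total_weight

print(f"weighted starting estimate: {starting_estimate:.1%}")  # ~59.2%
```

The output is only a starting point; the forecaster would then adjust for case-specific information (weather, pressure, the kicker's recent form) before settling on a final number.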
Sometimes, in addition, an intuitively obvious base rate is inapplicable. In its 2021-2022 term, the U.S. Supreme Court decided 58 cases and 16 of them were unanimous (not including per curiam opinions), for a base rate of ~28%. But a newsworthy Supreme Court case where a forecast is demanded is likely an atypically politically controversial one, so this base rate would be a terrible forecast. In other words, when you know which category the thing you are forecasting falls into, make sure the overall base rate doesn't mask large variation across categories.
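The hazard can be made concrete with a hypothetical split of those 58 cases into controversial and routine dockets. The split below is invented for illustration (only the 58/16 totals come from the text):

```python
# Hypothetical category split; only the overall totals (58 cases,
# 16 unanimous) are from the text.
cases = {
    # category: (total cases, unanimous cases)
    "controversial": (18, 1),
    "routine":       (40, 15),
}

overall_total = sum(t for t, _ in cases.values())
overall_unanimous = sum(u for _, u in cases.values())
print(f"overall: {overall_unanimous / overall_total:.0%} unanimous")  # ~28%

# The overall rate hides very different conditional base rates:
for category, (total, unanimous) in cases.items():
    print(f"{category}: {unanimous / total:.0%} unanimous")
```

If the case you are forecasting is known to be controversial, the ~28% figure is the wrong anchor; the conditional rate for its category is what matters.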
Use in dialogue with other forecasters
A final benefit of thinking in base rates is that it helps you understand disagreement. We've seen that there are judgment calls to be made; if another forecaster disagrees with you, and the disagreement can largely be explained by their framing the problem differently and identifying different base rates, that is useful progress. The two of you can discuss which base rates apply and which don't, or you can agree to disagree [2].
[1] p(condition, given positive test) = p(positive test, given condition) × p(condition) / p(positive test). Since p(positive test) = 100% × 0.1% + 0.5% × 99.9% ≈ 0.6%: p(condition, given positive test) = 100% × 0.1% / 0.6% ≈ 16.7%
[2] People who are concerned about AI existential risk know that the base rate of “technological innovations causing human extinction” is 0%, so they feel comfortable disagreeing with a forecaster who relies on that base rate.