Time will tell whether or not Brexit is a disaster the UK, but in any case it is hardly one for prediction markets.
Bremain was certainly the favorite of the bookmakers all along, while polls were inconclusive or wildly fluctuating, with loads of undecideds. The day before the poll, Hypermind gave Bremain a probability of 75%, and Brexit only 25%. In view of the result, some are questioning the reliability and relevance of forecasts from prediction markets. Fair enough.
So let’s take advantage of what Americans call a “teachable moment” to explain again what prediction market forecasts are, what they are not, and why Hypermind’s are particularly reliable.
Probabilities vs certainties
It can’t be said that Hypermind was “right” on Brexit. But to argue that it was “wrong” requires a total disregard for what probabilities mean. In fact, the very idea that a probabilistic forecast – 25% chance – can be proved right or wrong with a single observation is absurd. At the end of an interview in French weekly Le Point just two days before the vote, I was asked the question “If the Brexit wins, what conclusions you will draw?” Here’s my answer :
Hypermind’s forecasts are accurate probabilities, not certainties. Of all the events that we believe to have “only” 25% chances of happening, like Brexit today, we can guarantee that about one in four will happen, even if it was not the most likely outcome. Maybe Brexit will be that one … but there are three in four chances that it won’t be.
Well, Brexit was that one … there was a one in four chances. Only those who make the mistake of confusing 25% (unlikely) with 0% (not a chance) could blame Hypermind.
In fact, we probabilistic forecasters must live under a particularly ironic curse: we know full well that whenever an unlikely event happens – and it must, eventually, otherwise probabilities would be meaningless – we will be loudly (but wrongly) criticized.
How to assess the reliability of probabilistic forecasts
But then how do we know if the probability of 25% for Brexit was correctly estimated? Ideally, we would be able to re-run the referendum dozens of times and observe the frequency of Brexit outcomes: if it won about 1 in 4 times, the prediction of 25% likelihood would be validated. Conversely, if the results deviated too much from that 1/4 proportion of Brexit outcomes, we could conclude that the prediction was wrong. The correspondance between the predicted event probability and the actual event frequency of occurence is what is called “calibration”. The better calibrated a forecasting system is, the more its probabilities can be trusted.
Unfortunately, of course, we can’t ever re-run the referendum, nor any other even predicted by Hypermind. Each one is unique. So how can we measure the reliability of our forecasts? The accepted way of doing this is the next best thing : consider as a group all the questions ever addressed by Hypermind over the past two years, including Brexit. The market forecasted 181 political, geopolitical and macroeconomic questions, with 472 possible outcomes. Some were naturally more difficult to forecast than others, but none was trivial, as each question was sponsored by at least one government, bank, or media facing some strategic uncertainty.
The calibration results are illustrated by the graph below. The closer the data points are to the diagonal, the more calibrated the forecasts are. The probabilities generated by Hypermind are generally quite reliable: events that are given about 25% chances of happening do happen about 20-25% of the time. Events estimated at 50% occur about half the time. Events assigned a probability of 90% occur about nine times out of ten, and one in ten also fails to occur… The correlation is not perfect, but it is quite remarkable. It’s hard to do much better.
You will notice that the data also exhibit the so-called “favorite-longshot bias”, a slight S-curve pattern which results from overestimating improbable events and underestimating the more probable ones. Calibration would be better without this systematic distortion at the extremes. It is perhaps a bit ironic to note that the data from the Brexit question went against this pattern and thus helped slightly improve Hypermind’s overall calibration (from .007 to .006). It is as if the occurence of an unlikely event was long overdue in order to better match predicted probabilities to observed outcomes.
What does not kill you makes you stronger
A final lesson is that every confrontation with reality makes the system more reliable, whatever the outcome, because it learns. For every bettor that took a position against Brexit, there was necessarily at least another that bet on it. Everyone who lost that bet will now have less influence on the odds for future forecasts, since he or she will have less money to bet with. Conversely, the forecasts of those who bet correctly will henceforth weigh more on the consensus, because they have more money than ever to move the market prices. Thus the quality of future collective forecasts continuously improves.