Lessons from Brexit

 


Time will tell whether or not Brexit is a disaster for the UK, but in any case it is hardly one for prediction markets.

Bremain was certainly the bookmakers’ favorite all along, while the polls were inconclusive or wildly fluctuating, with loads of undecideds. The day before the vote, Hypermind gave Bremain a probability of 75%, and Brexit only 25%. In view of the result, some are questioning the reliability and relevance of forecasts from prediction markets. Fair enough.

[Figure: Probabilities of Brexit and Bremain on Hypermind from June 16 (just before Jo Cox’s murder) to the announcement of the results on June 24. Just before election day (June 23), the probability of Brexit was hovering around 25%.]

So let’s take advantage of what Americans call a “teachable moment” to explain again what prediction market forecasts are, what they are not, and why Hypermind’s are particularly reliable.

Probabilities vs certainties

It can’t be said that Hypermind was “right” on Brexit. But to argue that it was “wrong” requires a total disregard for what probabilities mean. In fact, the very idea that a probabilistic forecast – a 25% chance – can be proved right or wrong by a single observation is absurd. At the end of an interview in the French weekly Le Point, just two days before the vote, I was asked: “If Brexit wins, what conclusions will you draw?” Here’s my answer:

Hypermind’s forecasts are accurate probabilities, not certainties. Of all the events that we believe have “only” a 25% chance of happening, like Brexit today, we can guarantee that about one in four will happen, even if it was not the most likely outcome. Maybe Brexit will be that one… but there is a three-in-four chance that it won’t be.

Well, Brexit was that one… there was a one-in-four chance. Only those who make the mistake of confusing 25% (unlikely) with 0% (not a chance) could blame Hypermind.
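
To see why a single miss proves nothing, here is a minimal simulation sketch (in Python, with made-up events, not Hypermind data): if a forecaster is well calibrated, then out of a large batch of independent events each given a 25% probability, roughly one in four should occur.

```python
import random

random.seed(42)  # for reproducibility

# Simulate 10,000 independent events, each forecast at 25%.
# If the 25% probability is well calibrated, about one in four
# should occur, even though each one individually is unlikely.
N = 10_000
occurred = sum(random.random() < 0.25 for _ in range(N))

print(f"{occurred / N:.3f} of the 25%-probability events occurred")
# Prints a value very close to 0.250
```

Each of those ~2,500 occurrences looks like a “failed” forecast in isolation; only the batch as a whole can tell us whether 25% was the right number.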

The Curse

In fact, we probabilistic forecasters must live under a particularly ironic curse: we know full well that whenever an unlikely event happens – and it must, eventually, otherwise probabilities would be meaningless – we will be loudly (but wrongly) criticized.

How to assess the reliability of probabilistic forecasts

But then how do we know if the probability of 25% for Brexit was correctly estimated? Ideally, we would re-run the referendum dozens of times and observe the frequency of Brexit outcomes: if it won about 1 time in 4, the predicted likelihood of 25% would be validated. Conversely, if the results deviated too much from that 1/4 proportion of Brexit outcomes, we could conclude that the prediction was wrong. The correspondence between the predicted probability of an event and its actual frequency of occurrence is what is called “calibration”. The better calibrated a forecasting system is, the more its probabilities can be trusted.

Unfortunately, of course, we can’t ever re-run the referendum, nor any other event predicted by Hypermind. Each one is unique. So how can we measure the reliability of our forecasts? The accepted way of doing this is the next best thing: consider as a group all the questions ever addressed by Hypermind over the past two years, including Brexit. The market forecast 181 political, geopolitical, and macroeconomic questions, with 472 possible outcomes. Some were naturally more difficult to forecast than others, but none was trivial, as each question was sponsored by at least one government, bank, or media organization facing some strategic uncertainty.
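
As a concrete sketch of the computation (with hypothetical data and a hypothetical bin width, not Hypermind’s actual code), one can pool every (forecast, outcome) pair, group them into probability bins, and compare each bin’s average predicted probability with the frequency at which those events actually occurred:

```python
from collections import defaultdict

def calibration_table(forecasts, bin_width=0.1):
    """Group (predicted probability, outcome) pairs into probability
    bins, then compare the mean prediction in each bin with the
    observed frequency. `outcome` is 1 if the event happened, else 0."""
    n_bins = round(1 / bin_width)
    bins = defaultdict(list)
    for p, outcome in forecasts:
        bins[min(int(p / bin_width), n_bins - 1)].append((p, outcome))
    table = []
    for b in sorted(bins):
        pairs = bins[b]
        mean_p = sum(p for p, _ in pairs) / len(pairs)
        freq = sum(o for _, o in pairs) / len(pairs)
        table.append((mean_p, freq, len(pairs)))
    return table

# Hypothetical forecast history: (predicted probability, did it happen?)
history = [(0.25, 0), (0.25, 1), (0.25, 0), (0.25, 0),
           (0.50, 1), (0.50, 0), (0.90, 1), (0.90, 1)]
for mean_p, freq, n in calibration_table(history):
    print(f"predicted {mean_p:.0%}, observed {freq:.0%} (n={n})")
```

Plotting observed frequency against mean predicted probability, one point per bin, produces exactly the kind of calibration chart shown below.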

The calibration results are illustrated in the graph below. The closer the data points lie to the diagonal, the better calibrated the forecasts. The probabilities generated by Hypermind are generally quite reliable: events given about a 25% chance of happening do happen about 20-25% of the time. Events estimated at 50% occur about half the time. Events assigned a probability of 90% occur about nine times out of ten, and about one in ten fails to occur… The correspondence is not perfect, but it is quite remarkable. It’s hard to do much better.

[Figure: Hypermind forecast calibration over 2 years, on 181 questions and 472 possible event outcomes. Every day at noon, the estimated probability of each outcome was recorded. Once all the questions were settled, we could compare, at each level of probability, the percentage of events predicted to occur with the percentage that actually occurred. The size of each data point indicates the number of forecasts recorded at that level of probability.]

You will notice that the data also exhibit the so-called “favorite-longshot bias”, a slight S-curve pattern that results from overestimating improbable events and underestimating the more probable ones. Calibration would be better without this systematic distortion at the extremes. It is perhaps a bit ironic to note that the data from the Brexit question went against this pattern, and thus helped slightly improve Hypermind’s overall calibration score (from .007 to .006). It is as if the occurrence of an unlikely event was long overdue in order to better match predicted probabilities to observed outcomes.
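
How the .007 and .006 figures are computed is not spelled out here; as an assumption, a standard single-number summary of a calibration plot is the reliability term of the Brier-score decomposition: the count-weighted mean squared gap between each bin’s predicted probability and its observed frequency, where 0 means perfect calibration. Reusing the `calibration_table` sketch above:

```python
def calibration_score(table):
    """Count-weighted mean squared gap between mean predicted
    probability and observed frequency across bins (the reliability
    term of the Brier-score decomposition). 0 = perfect calibration;
    lower is better."""
    total = sum(n for _, _, n in table)
    return sum(n * (mean_p - freq) ** 2 for mean_p, freq, n in table) / total
```

Under such a measure, the Brexit result plausibly helps: since events in the ~25% bin had been occurring slightly less than 25% of the time, one more occurrence nudges that bin’s observed frequency up toward its predicted value, which is how a “missed” Brexit call can shrink the gap and improve the overall score.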

What does not kill you makes you stronger

A final lesson is that every confrontation with reality makes the system more reliable, whatever the outcome, because it learns. For every bettor who took a position against Brexit, there was necessarily at least one other who bet on it. Everyone who lost that bet will now have less influence on the odds in future forecasts, since they have less money to bet with. Conversely, those who bet correctly will henceforth weigh more on the consensus, because they have more money than ever to move the market prices. Thus the quality of future collective forecasts continuously improves.
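
Here is a toy illustration of that dynamic, under simplifying assumptions that are not Hypermind’s actual market mechanics: treat the consensus as the wealth-weighted average of the traders’ beliefs, and settle a bet by shifting money from losers to winners.

```python
def consensus(traders):
    """Wealth-weighted average belief: richer traders move the price more."""
    total_wealth = sum(wealth for wealth, _ in traders)
    return sum(wealth * belief for wealth, belief in traders) / total_wealth

# Each trader: (bankroll, subjective probability that the event occurs)
traders = [(100.0, 0.20), (100.0, 0.30), (100.0, 0.70)]
print(f"consensus before settlement: {consensus(traders):.2f}")  # 0.40

# The unlikely event occurs: those who bet on it gain, the rest lose.
# (Toy payoff rule, for illustration only.)
event_occurred = True
traders = [(wealth * (1.3 if (belief > 0.5) == event_occurred else 0.7), belief)
           for wealth, belief in traders]
print(f"consensus after settlement:  {consensus(traders):.2f}")  # ~0.47
```

The one trader who anticipated the event now holds a larger share of the total bankroll, so the next consensus leans further toward those who have been right so far.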

Comments

  1. S-shape of empirical distribution shows conservative bias in estimates (overestimating unlikely events, underestimating likely ones)


  2. It could be that the predictor population was socio-economically or demographically biased in this case. I think it would be very interesting to look at the composition of the predictor population in terms of these factors (such as ethnicity, birthplace, citizenship, income, age, education and gender) to see if a sample of UK referendum voters selected to match the predictor profile voted 25% to 75% in favor of BREMAIN.

