Lessons from Brexit



Time will tell whether or not Brexit is a disaster for the UK, but in any case it is hardly one for prediction markets.

Bremain was certainly the favorite of the bookmakers all along, while polls were inconclusive or wildly fluctuating, with loads of undecideds. The day before the poll, Hypermind gave Bremain a probability of 75%, and Brexit only 25%. In view of the result, some are questioning the reliability and relevance of forecasts from prediction markets. Fair enough.


Probabilities of Brexit and Bremain on Hypermind from June 16th (just before Jo Cox’s murder) to the announcement of the results on June 24. Just before election day (June 23), the probability of Brexit was hovering around 25%.

So let’s take advantage of what Americans call a “teachable moment” to explain again what prediction market forecasts are, what they are not, and why Hypermind’s are particularly reliable.

Probabilities vs certainties

It can’t be said that Hypermind was “right” on Brexit. But to argue that it was “wrong” requires a total disregard for what probabilities mean. In fact, the very idea that a probabilistic forecast – a 25% chance – can be proved right or wrong with a single observation is absurd. At the end of an interview in the French weekly Le Point just two days before the vote, I was asked the question “If Brexit wins, what conclusions will you draw?” Here’s my answer:

Hypermind’s forecasts are accurate probabilities, not certainties. Of all the events that we believe to have “only” 25% chances of happening, like Brexit today, we can guarantee that about one in four will happen, even if it was not the most likely outcome. Maybe Brexit will be that one … but there are three in four chances that it won’t be.

Well, Brexit was that one … there was a one-in-four chance. Only those who make the mistake of confusing 25% (unlikely) with 0% (not a chance) could blame Hypermind.

The Curse

In fact, we probabilistic forecasters must live under a particularly ironic curse: we know full well that whenever an unlikely event happens – and it must, eventually, otherwise probabilities would be meaningless – we will be loudly (but wrongly) criticized.

How to assess the reliability of probabilistic forecasts

But then how do we know if the probability of 25% for Brexit was correctly estimated? Ideally, we would be able to re-run the referendum dozens of times and observe the frequency of Brexit outcomes: if it won about 1 in 4 times, the prediction of 25% likelihood would be validated. Conversely, if the results deviated too much from that 1/4 proportion of Brexit outcomes, we could conclude that the prediction was wrong. The correspondence between the predicted probability of an event and its actual frequency of occurrence is what is called “calibration”. The better calibrated a forecasting system is, the more its probabilities can be trusted.
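The thought experiment of re-running the referendum can be mimicked with a small simulation. This is only an illustrative sketch: the function name `simulate_frequency`, the fixed random seed, and the run counts are assumptions made for the example, not anything Hypermind uses. It shows that a single run says almost nothing about a 25% forecast, while many runs converge on the predicted frequency.

```python
import random


def simulate_frequency(p_event: float, n_runs: int, seed: int = 0) -> float:
    """Fraction of n_runs hypothetical re-runs in which the event occurs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = sum(rng.random() < p_event for _ in range(n_runs))
    return hits / n_runs


# One run can only ever yield 0% or 100%; many runs approach the true 25%.
for n_runs in (1, 40, 10_000):
    print(n_runs, simulate_frequency(0.25, n_runs))
```

With a single run the observed frequency is necessarily 0 or 1, which is why one referendum cannot validate or invalidate the forecast.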

Unfortunately, of course, we can’t ever re-run the referendum, nor any other event predicted by Hypermind. Each one is unique. So how can we measure the reliability of our forecasts? The accepted way of doing this is the next best thing: consider as a group all the questions ever addressed by Hypermind over the past two years, including Brexit. The market forecasted 181 political, geopolitical and macroeconomic questions, with 472 possible outcomes. Some were naturally more difficult to forecast than others, but none was trivial, as each question was sponsored by at least one government, bank, or media outlet facing some strategic uncertainty.

The calibration results are illustrated by the graph below. The closer the data points are to the diagonal, the better calibrated the forecasts are. The probabilities generated by Hypermind are generally quite reliable: events that are given about a 25% chance of happening do happen about 20-25% of the time. Events estimated at 50% occur about half the time. Events assigned a probability of 90% occur about nine times out of ten, and one in ten fails to occur… The correlation is not perfect, but it is quite remarkable. It’s hard to do much better.

Calib 181 Brexit

Hypermind forecast calibration over 2 years on 181 questions and 472 possible event outcomes. Every day at noon, the estimated probability of each outcome was recorded. Once all the questions are settled, we can compare, at each level of probability, the percentage of events predicted to occur and the percentage that actually occurred. The size of the data points indicates the number of forecasts recorded at each level of probability.

You will notice that the data also exhibit the so-called “favorite-longshot bias”, a slight S-curve pattern which results from overestimating improbable events and underestimating the more probable ones. Calibration would be better without this systematic distortion at the extremes. It is perhaps a bit ironic to note that the data from the Brexit question went against this pattern and thus helped slightly improve Hypermind’s overall calibration (from .007 to .006). It is as if the occurrence of an unlikely event was long overdue in order to better match predicted probabilities to observed outcomes.

What does not kill you makes you stronger

A final lesson is that every confrontation with reality makes the system more reliable, whatever the outcome, because it learns. For every bettor that took a position against Brexit, there was necessarily at least another that bet on it. Everyone who lost that bet will now have less influence on the odds for future forecasts, since he or she will have less money to bet with. Conversely, the forecasts of those who bet correctly will henceforth weigh more on the consensus, because they have more money than ever to move the market prices. Thus the quality of future collective forecasts continuously improves.
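The reweighting described above can be illustrated with a deliberately simplified toy model. Everything in it is an assumption made for the sake of the example: two hypothetical traders, even-odds bets, a fixed `stake_fraction`, and influence measured as each trader's share of total wealth. It is not a description of Hypermind's actual trading mechanism.

```python
def settle(wealth, backed_event, occurred, stake_fraction=0.5):
    """Return each trader's wealth after an even-odds bet is settled.

    wealth:        dict of trader name -> money before the bet
    backed_event:  dict of trader name -> True if the trader bet on the event
    occurred:      True if the event happened
    """
    new_wealth = {}
    for name, money in wealth.items():
        stake = money * stake_fraction
        won = backed_event[name] == occurred
        new_wealth[name] = money + stake if won else money - stake
    return new_wealth


def influence(wealth):
    """Share of total money held by each trader (toy proxy for price influence)."""
    total = sum(wealth.values())
    return {name: money / total for name, money in wealth.items()}


# Hypothetical traders starting with equal wealth and opposite positions.
before = {"alice": 100.0, "bob": 100.0}
bets = {"alice": True, "bob": False}
after = settle(before, bets, occurred=True)
print(influence(before))
print(influence(after))
```

In this toy setup the trader who bet correctly ends up holding a larger share of the money, and therefore weighs more on the next forecast.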

The lessons of Brexit

Time will tell whether or not Brexit is a catastrophe for the English, but in any case it is not one for prediction markets.

Granted, Bremain was the bookmakers’ clear favorite, while the polls were contradictory, with at least 10% of voters still undecided. On the eve of the vote, Hypermind gave Bremain a 75% probability and Brexit a 25% probability. In view of the result, some are questioning the reliability and relevance of prediction market forecasts. That is legitimate.

So I will take advantage of what Americans call a “teachable moment” to explain once again what prediction market forecasts are, what they are not, and why Hypermind’s are reliable.

Probabilities, not certainties

It cannot be said that Hypermind was “right” about Brexit. But one must understand nothing about probabilities to claim, conversely, that Hypermind was “wrong”. In fact, the very idea that a probabilistic forecast – a 25% chance – can be validated or invalidated by a single observation is absurd. At the end of the interview in Le Point, I answered precisely the question “If Brexit wins, what conclusions will you draw?” as follows:

Hypermind’s forecasts are reliable probabilities, not certainties. Of all the events that we estimate to have “only” a 25% chance of happening, like Brexit today, we can guarantee that about one in four will happen, even if it was not the “favorite”. Maybe Brexit will be that one… but there are three chances in four that it won’t be.

So Brexit was that one… there was a one-in-four chance. If there is an error, it lies not so much with Hypermind as with those who fail to distinguish between 25% (“unlikely”) and 0% (“no chance”).


Indeed, it is the particular curse of the probabilistic forecaster to know full well that every time an unlikely event occurs – and some must, otherwise the probability would be meaningless – he will be loudly blamed for it. Wrongly.

How to assess the reliability of forecasts

Fine, you might say, but then how do we know whether the 25% probability was correctly estimated? Ideally, we would observe the outcome not of a single referendum but of several dozen: would about one in four hand victory to Brexit, and three in four to Bremain? If so, the 25% forecast would be verified. But if the results deviated too far from those proportions, we could say that the forecast was poor. The match between the percentage of events predicted and the percentage of events that actually occur is what is called the “calibration” of forecasts. The better calibrated the forecaster, the more reliable its probabilities.

Unfortunately, there will be no other identical referendums, and every event handled by Hypermind is unique. So how can the calibration and reliability of the forecasts be assessed? The best we can do is consider all the questions handled by Hypermind over the past two years, Brexit included: there were 181 of them, on political, geopolitical, and macroeconomic topics, with 472 possible answers. The level of difficulty varied, naturally, but none was trivial, since each was commissioned by at least one sponsor (government, bank, media outlet, etc.) facing some strategic uncertainty.

The results are illustrated by the graph below. The less the data deviate from the diagonal, the better calibrated the forecasts are. We can see that the probabilities generated by Hypermind are reliable overall: events given a 25% chance occur about one time in four or five. Events given a 50% chance occur one time in two. Events estimated at a 90% chance occur nine times out of ten, and fail to occur one time in ten, and so on. The correlation is not perfect, but it is quite remarkable. It is hard to do much better.

Calib 181 Brexit FR

Calibration of Hypermind’s forecasts on 181 questions with 472 possible answers (events) over a two-year period. Every day at noon, the estimated probabilities for all the answers are recorded. Once the results are known, we can compare, at each probability level, how well the percentage of events predicted matches the percentage of events observed. The size of the points indicates the number of forecasts recorded at each probability level.

It is perhaps a little ironic, and certainly counterintuitive, to realize that the results of the Brexit question slightly improved, rather than degraded, Hypermind’s overall calibration. It is as if the system had long been waiting for an improbable event to occur in order to better calibrate its probabilities!

What does not kill you makes you stronger

A final lesson is that every confrontation with reality makes the system more reliable, whatever the outcome, because it learns. For every bettor who took a position against Brexit, there is at least one other who bet on it. Each loser’s voice is diminished, since he or she will have less money to bet on subsequent questions, and therefore less influence on the odds. Conversely, the opinions of those who got it right gain influence, since they now have more money to stake on their forecasts (which are presumably better informed than those of others). The quality of future collective forecasts is thereby sharpened.

Hypermind wins the 2016 Republican nomination race


Last week the Associated Press reported that Donald Trump had finally acquired enough delegates to lock in the GOP nomination. But he is not the only winner of this extraordinary primary season: of all the leading prediction markets, Hypermind was the most accurate by far. It outperformed Betfair, the Iowa Electronic Markets (IEM), and PredictIt, respectively the largest prediction market in the world (based in the UK), the longest-running and the newest US-based political markets.

Figure 1 below details the forecasts of each prediction market starting from January 25, a week before the Iowa caucuses, and ending on May 3, 2016, on the eve of the Indiana primary which proved fatal to Trump’s last two rivals. (No data is available for the IEM before January 25, so this is also the longest period over which we can compare the performance of all four markets.)


Figure 1 – Probability of winning the GOP presidential nomination for Trump, Cruz, Rubio, or somebody else (Other), according to the four prediction markets, from January 25 to May 3, 2016.

On his way to victory, Trump crushed the hopes of 16 other candidates, and defied the expert forecasts of countless political pundits. However, as Figure 1 shows, even before the first ballot was cast in Iowa, the markets had already anointed Trump the favorite. Then, except for a short week between his Iowa stumble and his New Hampshire comeback in early February, he remained the favorite throughout the campaign until his last rivals finally quit.

Figure 1 also shows that Hypermind was systematically more bullish on Trump than the other markets were, and much less likely to lose confidence and overreact when he stumbled. The contrast is especially vivid in April, when the establishment-fueled fantasy of denying Trump the nomination at a contested convention got a lot of traction in all the markets, but much less so in Hypermind.

For a quantitative measure of accuracy it is customary to use the Brier score, which sums the squared errors between the predictions and the true outcomes. The smaller the Brier score, the better the prediction: in a 4-way prediction like this one, a perfect prediction has a Brier score of 0, a chance prediction (i.e., 25% for each option) scores 0.75, while a totally wrong prediction scores 2.

To get a sense of how accurate the markets were throughout the comparison period, we compute each market’s Brier score on a daily basis. Then we average those daily Brier scores into a mean daily Brier score for each market. The results are plotted in Figure 2: Hypermind was 35% more accurate than Betfair, and 40% more accurate than IEM and PredictIt.


Figure 2 – Mean daily Brier score for each prediction market from January 25 to May 3, 2016. Lower scores mean better accuracy.
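The Brier score and the daily averaging described above can be sketched in a few lines. The function names are hypothetical, the two forecast histories are made-up numbers rather than the actual market data, and reading “X% more accurate” as a relative reduction in mean daily Brier score is an assumption, since the post does not spell out the formula.

```python
def brier_score(probabilities, true_index):
    """Sum of squared errors between a forecast and the realized outcome."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(
        (p - (1.0 if i == true_index else 0.0)) ** 2
        for i, p in enumerate(probabilities)
    )


def mean_daily_brier(daily_forecasts, true_index):
    """Average of the per-day Brier scores over the comparison period."""
    scores = [brier_score(day, true_index) for day in daily_forecasts]
    return sum(scores) / len(scores)


def relative_improvement(score, baseline):
    """Fractional reduction in mean Brier score versus a baseline.

    Assumed reading of 'X% more accurate'; not confirmed by the source.
    """
    return (baseline - score) / baseline


# Reference points for a 4-way question whose true outcome is option 0.
print(brier_score([1.0, 0.0, 0.0, 0.0], 0))      # perfect forecast
print(brier_score([0.25, 0.25, 0.25, 0.25], 0))  # chance forecast
print(brier_score([0.0, 1.0, 0.0, 0.0], 0))      # totally wrong forecast

# Made-up two-day forecast histories, for illustration only.
market_a = [[0.6, 0.2, 0.1, 0.1], [0.8, 0.1, 0.05, 0.05]]
market_b = [[0.4, 0.3, 0.2, 0.1], [0.6, 0.2, 0.1, 0.1]]
a = mean_daily_brier(market_a, 0)
b = mean_daily_brier(market_b, 0)
print(a, b, relative_improvement(a, b))
```

The three reference calls reproduce the benchmark values quoted in the text for a 4-way question: 0 for a perfect forecast, 0.75 for a chance forecast, and 2 for a totally wrong one.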

It is remarkable that a play-money market like Hypermind could significantly outperform the leading real-money markets on a question that made daily front-page news all over the world for many months. But it is not overly surprising. Consider this:

  1. It isn’t the first time that Hypermind has forecast U.S. elections more accurately than more often-quoted outfits. It also did so in the 2014 midterm elections (Servan-Schreiber & Atanasov, 2015).
  2. The idea that prediction markets work better when traders must “put their money where their mouth is” is a hard-to-kill cliché that has no basis in fact, as Servan-Schreiber et al. (2004) proved more than a decade ago. Hard currency need not be involved as long as traders risk something that is valuable to them: reputation, status and self-satisfaction will do just fine for many, especially among the smartest. One particular advantage of play-money markets over their real-money counterparts is that they can better match influence with past success: everyone starts at the same level of wealth, and the only way to amass more play money than others, and thus weigh more on the market prices, is to bet successfully. There is less dumb money than in real-money markets.
  3. Hypermind is much more than just a play-money version of Betfair, IEM or PredictIt. Spawned from Lumenogic’s multi-year collaboration with the Good Judgment Project, winner of the IARPA ACE forecasting competition, Hypermind’s sole purpose is to make the best possible predictions, rather than enriching a bookmaker, conducting academic research, or providing entertainment. Its few thousand traders are carefully selected and rewarded (with cash prizes) solely based on actual performance. Good forecasters thrive, while poor forecasters wither and drop out. In this competitive environment, there are no second chances, which makes the Hypermind community an elite bunch, not just any crowd.



Hypermind accuracy over its first 18 months

Hypermind was launched in May 2014. The chart below plots the accuracy of its predictions over the 151 questions and 389 outcomes that have expired as of this writing. All the predictions so far have been about politics, geopolitics, macroeconomics, business issues, and some current events. No sports.

To generate this chart, we proceeded as follows. The data was collected daily: every day at Noon we recorded the latest transaction price on each traded outcome and treated it as a probability for this outcome. These observations were then grouped in 20 probability bins: 1-5%, 6-10%, 11-15%, …, 96-99%. Then, we just plotted the average of the probabilities in each bin against the percentage of the outcomes represented in the bin that actually occurred.

The market is accurate to the extent that the two numbers are well calibrated, i.e., that the data points are aligned with the chart’s diagonal. In our case the measure of calibration is .001, meaning that the average difference between the percentage of events actually coming true and the forecast at each level of probability is only about 3.3%. If we did not know better, we might conclude that reality aligns itself with Hypermind’s predictions.
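The binning procedure described above can be sketched as follows. The 20 five-point bins follow the text; the calibration measure, however, is an assumption: the post does not give its formula, so the sketch uses one common definition, the count-weighted mean squared gap between each bin's average forecast and its observed frequency. Function names and the sample observations are hypothetical.

```python
from collections import defaultdict


def calibration_table(observations):
    """Group daily (price, occurred) observations into 5-point bins.

    price is an integer from 1 to 99; bins are 1-5, 6-10, ..., 96-99.
    Returns one (mean forecast, observed frequency, count) tuple per
    non-empty bin, in ascending order of probability.
    """
    bins = defaultdict(list)
    for price, occurred in observations:
        if not 1 <= price <= 99:
            raise ValueError("price must be between 1 and 99")
        bins[(price - 1) // 5].append((price / 100.0, 1.0 if occurred else 0.0))
    table = []
    for index in sorted(bins):
        rows = bins[index]
        count = len(rows)
        mean_forecast = sum(p for p, _ in rows) / count
        observed = sum(o for _, o in rows) / count
        table.append((mean_forecast, observed, count))
    return table


def calibration_index(table):
    """Count-weighted mean squared gap between forecast and frequency (assumed definition)."""
    total = sum(count for _, _, count in table)
    return sum(count * (f - o) ** 2 for f, o, count in table) / total


# Made-up observations: perfectly calibrated at 25% and at 90%.
sample = [(25, True)] + [(25, False)] * 3 + [(90, True)] * 9 + [(90, False)]
table = calibration_table(sample)
print(table)
print(calibration_index(table))
```

Under this definition, the square root of an index of .001 is roughly 3%, which is in line with the average gap quoted in the text, though the exact formula behind the published figure may differ.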

calibration 151x5 171215


Polls are dead, long live markets


The polling fiasco in the 2015 UK general election is just the latest in a string of high-profile failures over the last few months. This contrasts with the good performance of prediction markets, and Hypermind in particular.

Let’s start with the referendum on Scottish independence in September 2014. In the final weeks before the referendum, the polls consistently announced a cliffhanger with Yes and No tied within the margin of error. Yet the actual results gave “No” a large majority of 55%, 10 points ahead of “Yes” (45%).

The betting markets on the other hand clearly favored the “No” vote throughout. Witness for instance how the “Yes” vote on Hypermind always stayed below the 50% likelihood threshold, and was given a low probability just before the referendum took place on September 18th.


Then came the midterm congressional elections in the U.S., in November 2014. The big question then was whether the Republicans would recapture control of the Senate, which they did. The polls mostly saw this coming, but were much more timid in their forecasts than the betting markets.

In fact, as discussed earlier in this blog, Hypermind out-predicted all the poll-aggregation models operated by the biggest U.S. media, as well as Nate Silver’s FiveThirtyEight. (Only the Washington Post model ended up out-predicting Hypermind at the very end, but its prediction was all over the place beforehand, as can be seen in the chart below.)


The Israeli elections in March 2015 again stumped the polls and the pundits. The closer we got to election day, the more Benjamin Netanyahu was given up for dead, politically. The latest polls even predicted his Likud party would be 4 seats behind his leftist rival, and considered how difficult it would be for him to assemble a 61-seat majority coalition in the Knesset. Instead, Likud scored 6 more seats than its closest rival, and Bibi was able to remain prime minister for a 4th term.

What about the betting markets? On the day before the election, while noting that the election was a rare instance of an “actual tossup”, the New York Times also noted that Hypermind was giving Netanyahu a 55% chance of staying prime minister. In fact, Hypermind had clearly kept Netanyahu in the favorite seat all through the campaign.


Which now brings us to the 2015 UK general election. It concluded yesterday with a big win for David Cameron’s Conservative Party, a hair’s breadth away from an absolute majority in parliament. This was in contrast to all the polling data which had Labour tied with the Conservatives, both very far from a majority. Based on the poll projections of a hung parliament, the pundits could not see how Cameron could gather a governing coalition, even when adding up UKIP and the LibDems. Everyone gave Labour’s Miliband a much better chance of forming a government, with tacit support from the Scottish National Party. In fact, the polls gave Labour+SNP a clear majority in the House…

The story was different in the betting markets. At worst, Cameron’s chances of forming the next government remained close to 50%, tied with Labour’s Miliband’s, a far cry from the large Labour advantage everyone assumed from the parliamentary arithmetic based on poll projections. On Hypermind, a Cameron rebound even occurred just before election day.


It will take some time to understand why election polls, which had served the media so well for so long, seem to be suddenly experiencing a global meltdown. Perhaps the simple, powerful idea of the “representative panel” just no longer works well when individualism is pushed to the extreme in modern societies…

What is encouraging, though, is that betting markets – an approach that preexisted polls by decades – are proving more reliable, especially when the going gets tough. This is probably related to the idea, explored earlier in this blog, that predicting human affairs is in general better left to human brains than to algorithms and statistics.

How accurate is the Hypermind prediction market?

Hypermind sells predictions, so the first question that comes up is usually: “how accurate are they?”. We have now accumulated enough data to be able to take a deep look, and the results are very good.

But before we dive in, let’s be clear about what we mean by “accuracy”. Market predictions are typically expressed as probabilities: the market won’t say “Event E will occur”; it will say instead: “There is a 70% chance that event E will occur”. Implicit in that statement is that there also is a 30% chance that event E won’t occur… Which means that any single prediction like this cannot be considered right or wrong, whatever happens.

However, over many predictions, accuracy can be measured as a product of both calibration and discrimination:

Calibration – Predictions are said to be well calibrated when the events deemed more probable do occur more often, and those deemed less probable in fact occur less often. For example, if we consider all the events to which the market ascribed 30% probability, we should observe that 30% of them actually do occur. Similarly, if we consider all the events to which the market ascribed 80% probability, we should observe that 80% of them actually do occur. And so on.

Discrimination – This is a measure of how extreme the predictions are. The closer they are to 0% (absolutely unlikely) or 100% (absolutely likely), the more discriminating they are said to be. Decision makers like predictions that are discriminating because they are more actionable.


Only God’s predictions could be both perfectly calibrated and perfectly discriminating: events would always be predicted to be 0% likely or 100% likely, and the prediction would always be correct. Barring such perfection, calibration is preferable to discrimination: a fuzzy but generally correct forecast is better than a categorical but misleading forecast.


Let us now turn to Hypermind’s data. The prediction market has been operating since May 16th, 2014 with a panel of a few hundred traders recruited and rewarded based on performance.(1)

At this point, 75 questions of political, geopolitical, economic and business nature have been settled: questions about elections in Europe, the U.S., Brazil, Afghanistan and elsewhere, the P5+1 negotiations with Iran over its nuclear program, the war in Ukraine, the GE takeover of Alstom, the ECB stress test, the price of oil, and a whole lot more. The time horizon for the predictions in this data set was in the range of a few days to a few months. All in all, 41,442 trades have been conducted on 196 possible outcomes.

As the chart below illustrates, Hypermind’s predictions are well calibrated. The chart plots the percentage of events that occur at each price level between 1 and 99H (the market’s virtual money). It shows that the prices at which various outcomes are traded on the market can readily be interpreted as realistic probabilities for those outcomes, give or take a few percentage points.

To generate this chart, we recorded the price of each traded outcome every day at 12 Noon, grouped all outcomes traded at the same price and computed the percentage of them that actually occurred. The closer the data points are to the diagonal in the chart, the more the market’s prices predict true probabilities in the real world. Some data points are larger than others to indicate the relative number of outcomes traded at each price level. (The colors, however, are just for show!)

To assess discrimination, it is visually useful to plot the same data at a coarser level by clustering prices in ten intervals of 10H each. As the larger data points include more observations, we can see that most trades occur at price points closer to the extremes, where predictions are more certain, than towards the middle, around 50H, where uncertainty is at its peak.


By this measure, Hypermind is also usefully discriminating: for instance, on a daily basis, nearly two thirds of its predictions (64%) indicate outcome probabilities below 20% (very unlikely) or above 80% (very likely). Similarly, 80% of its predictions are either unlikely (below 30%) or likely (above 70%).
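The discrimination figures above are simple shares of predictions falling outside a middle band. A minimal sketch, using a hypothetical helper `share_extreme` and made-up probabilities rather than Hypermind's data:

```python
def share_extreme(probabilities, low, high):
    """Fraction of predictions strictly below `low` or strictly above `high`."""
    values = list(probabilities)
    if not values:
        raise ValueError("at least one prediction is required")
    return sum(p < low or p > high for p in values) / len(values)


# Made-up daily predictions, for illustration only.
predictions = [0.05, 0.50, 0.95, 0.25]
print(share_extreme(predictions, 0.20, 0.80))  # very unlikely or very likely
print(share_extreme(predictions, 0.30, 0.70))  # unlikely or likely
```

Widening the middle band from 30-70% to 20-80% can only lower the share, which is why the 64% figure is smaller than the 80% one.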


This analysis shows that Hypermind’s predictions are both accurate and actionable, but it tells us little about the intrinsic difficulty of the questions, nor about how well other forecasting methods might have done in comparison on those same questions.

Unfortunately for this purpose, only a few of the questions addressed by Hypermind so far have also been systematically forecasted by other methods or venues. That is partly by design, because the value of Hypermind predictions depends as much on their exclusivity as on their accuracy. We would rather focus on important questions that only few – but the right few – care about, than on entertaining issues that everybody else is already forecasting.

A particularly interesting point of comparison is with the Good Judgment Project, a multi-million dollar research project sponsored by the U.S. government’s Intelligence Advanced Research Projects Activity.(2) Since August 2014, Hypermind has been allowed to forecast several dozen of the same geopolitical questions submitted to the Good Judgment forecasters. Based on the score of questions that have closed so far, Hypermind seems to be performing very well. However, there isn’t enough data yet to draw firm conclusions, so this is an issue we will revisit at a later date when more questions have closed.

In the meantime, events like political elections are both important and entertaining, and are widely forecasted. In an earlier post, we documented how Hypermind outperformed all the big-data statistical poll-aggregation models (aka Nate Silver and friends) when predicting the results of the 2014 U.S. midterm elections.

Although the comparative data is still sparse, it clearly suggests that Hypermind exhibits excellent accuracy not so much because the predictions are easy, but because it performs at a best-in-class level.


(1) The first few hundred Hypermind traders were recruited based on remarkable performance in various prediction markets operated by NewsFutures and Lumenogic between 2000 and 2014.

(2) Full disclosure: Lumenogic, one of the firms backing Hypermind, has also been a member of the Good Judgment Project research team since 2012. Indeed, some of the prediction market technology used by Hypermind was originally developed by Lumenogic for this purpose.

Hypermind correctly predicted no deal with Iran on nuclear centrifuges

The year-long negotiations with Iran over its nuclear program have failed to reach an agreement by the November 24 deadline. At issue, in particular, was the number of centrifuges that Iran would be allowed to operate to enrich its uranium into weapons-grade material. It currently operates about 10,000, while the P5+1 countries initially aimed to bring that number below 4,000.

Starting in mid-September, Hypermind featured a prediction market on this question, as part of a geopolitical contest featuring questions formulated by the Intelligence Advanced Research Projects Activity (IARPA) ACE project.

Iran centrifuges

As the chart shows, Hypermind’s forecast was correctly dire throughout the negotiations, predicting that no deal would be reached on that critical issue. Only briefly did it dip from around 80% probability of “no agreement” to 50/50 uncertainty. The initial dip was caused by reports that the P5+1 countries, growing desperate for a deal, might allow Iran to operate 5,000 or 6,000 centrifuges… But the Hypermind traders quickly concluded that this wouldn’t save the negotiations.