Graceful degradation in election forecasting

The Washington Post recently published a review of the relative performance of various forecasting methods in the recent US presidential election. Amusingly titled “Which election forecast was the most accurate, or rather: the ‘least wrong’?”, the article nicely complements our own post-mortem published earlier. Like us, it finds that:

  1. Hypermind was one of the “least wrong” forecasters of the lot;
  2. Crowd-based methods fared better than statistical poll aggregators.

One take-away is that when all systems failed, human collective foresight failed less than the alternatives. You might call it “graceful degradation”.

Skeptics of crowd wisdom gleefully seize on 2016’s Brexit and Trump forecasting armageddons to argue that our kind can’t predict the future and that it is a hopeless quest at best, a cynical con at worst. That criticism entirely misses the point. Prediction markets have never claimed magical powers to predict everything all the time. That’s just impossible, and the world is better for it. However, the record shows that prediction markets tend to perform better, or fail less, than the alternatives. In that, they help push back the inherent limits of forecasting. That’s all, but it’s remarkable nonetheless.

In France, François Fillon’s stunning upset surprised everyone but Hypermind

While the latest polls in France’s primary “of the right and center” indicated a tight three-way race, the prediction market correctly anticipated a stunning upset and the order in which the contenders would finish.

The primary “of the Right and Center” in France was a two-round election designed to select the candidate of the right’s “Les Republicains” party for the 2017 presidential election. Seven candidates competed in the first round, held on November 20; only the top two would advance to the second round held a week later.

For months, the favorite in the polls had been Alain Juppé, who served as prime minister under Jacques Chirac and as foreign minister under Nicolas Sarkozy, with Sarkozy himself polling just behind him. As all other contenders were far behind in the polls, the two were widely expected to face off in the second round.

But on election day, Sarkozy’s ex-prime minister, François Fillon, pulled a stunning upset by coming in first (44.1%), far ahead of Juppé (28.5%), and leaving third-place Sarkozy in the dust (20.6%).

The last couple weeks of polling had seen Fillon catching up to the two leaders, and the final polls, two days before the vote, had the three of them tied around 30%. Poll-based expectations were thus for a very tight race, a far cry from Fillon’s overwhelming victory, or Sarkozy’s humiliating defeat.

primdroite-polls-vs-hm

The story told by Hypermind was quite different. The market focused not on vote share in the first round, but on who would win the two-round election, which Fillon ended up winning in a landslide. Like the polls, it had Juppé as the overwhelming favorite until the final days. But all the while it also had Fillon tied with Sarkozy in probability of winning. When Fillon started rising in the polls in the last week of the campaign, his chances of winning the nomination rose too, while Sarkozy, still second to Juppé and ahead of Fillon in the polling, fell to third in the prediction market. When the last polls showed the three contenders tied, the market instead showed Fillon tied with Juppé for the lead, with Sarkozy far behind. Over the last couple of days before the vote, Fillon became the clear market leader while Juppé’s chances collapsed (though they remained above Sarkozy’s).

Did Donald Trump the Wisdom of Crowds?

Should the stunning result of the 2016 election, coming only a few months after the equally surprising Brexit, lead us to question the usefulness of prediction markets and, perhaps, the very idea of the wisdom of crowds? Let’s look at the facts to try to understand what happened.

Prediction markets failed to call President Trump

Trump’s victory indeed confounded all the prediction markets. Never, at any point in the race, was he the favorite to win the White House. On election day, as recorded here, Hypermind gave him at most a 25% probability (1 chance in 4), which is more than any other crowd-based forecasting system gave him, including major UK bookmakers, leading real-money prediction markets Betfair and PredictIt, as well as various other wisdom-of-crowds methods. This is failure across the board, as with Brexit. The only silver lining is that Hypermind failed slightly less than others, as with Brexit.

hypermind-vs-crowds

Notes: Betfair, based in the U.K., is the largest real-money prediction market in the world, but doesn’t allow the participation of U.S. residents. PredictIt is the largest U.S.-based real-money prediction market. Pivit and Almanis, like Hypermind, are play-money markets offering prizes. Good Judgment (aka GJOpen) is not a market but a sophisticated “prediction poll”. PredictWise is primarily an aggregator of prediction-market data from various sources.

Prediction markets did not fail as badly as most poll aggregators

Another popular approach to election forecasting is the statistical aggregation of vast amounts of polling data at both the national and state levels. This approach, supposedly more data-driven and objective than crowd-based forecasts, did not perform any better. In fact, except for FiveThirtyEight, which on election day still gave Trump less than a 30% chance of winning, all the poll-based models were more pessimistic about Trump’s chances than the four leading crowd-based models. The variance among poll-based models was also much larger, which may indicate that, despite the sophisticated statistics, poll aggregation remains closer to art than to science and is heavily influenced by the subjective choices of the modeler. If Nate Silver’s FiveThirtyEight failed less than the others in the general election, it is because it became extra cautious after being blindsided by Trump in the GOP primaries.

poll-vs-crowd

It’s a near miss, not an epic fail

Favoring Clinton wasn’t completely off-base: she decisively won the popular vote, by more than 2 million votes and a margin larger not only than Al Gore’s in 2000, but also than John Kennedy’s in 1960 and Richard Nixon’s in 1968. Besides Gore, only three other candidates have won the popular vote but not the presidency, and they all lived in the 19th century: Andrew Jackson in 1824, Samuel Tilden in 1876, and Grover Cleveland in 1888. On election day, both Hypermind and the Iowa Electronic Markets (IEM) gave Clinton about an 80% chance of winning the popular vote.

Furthermore, Trump’s win in the Electoral College was razor thin: just about 100,000 votes in Pennsylvania, Wisconsin, and Michigan combined apparently decided the election. That’s less than 0.8% of the more than 13.6 million votes cast in those states, and less than 0.1% of the 120 million votes cast nationwide. What is now touted as the inevitable outcome of widespread voter discontent could easily have gone the other way, as Nate Silver brilliantly explains.

Polls aren’t the reliable beacons of public opinion that they used to be

The post-mortem consensus is that the Trump surprise was mostly due to large polling errors in the Midwestern states mentioned above, which caused everyone to think they were safe for the Democratic candidate (everyone except the astute Trump campaign, evidently).

Poll aggregation models depend directly on the quality of the polling data, and if these data aren’t good, a model’s forecast can’t be either, no matter how sophisticated its mathematics are. In turn, the most reputable pollsters and poll-aggregators are a major source of information for prediction market traders before they place their bids, so bad polling data damages the crowd’s consensus forecast.

Prediction markets weren’t just following the polls

However closely Hypermind and some of the other prediction markets watched the polls, the fact that they performed better than most poll aggregators shows that they weren’t just following polls. Throughout the campaign, prediction traders had to balance two incompatible narratives and data sets. On the one hand, there were the cold numbers emerging from scientific polling by reputable institutes and their careful analysis by statistical whizzes; those data favored Clinton until the very end. On the other hand, there was the unusual size and enthusiasm of the crowds at Trump’s rallies, which never showed any sign of faltering.

Are traders not sufficiently representative of the voting population?

No good explanation has yet been given for the failure of polls to detect the true level of support for Trump. But in the case of prediction markets, it is tempting to think that Trump voters and enthusiasts were under-represented in the population of prediction traders. That would nicely fit the populist narrative of Trump voters being invisible to an out-of-touch elite. However, there are several problems with this story:

Firstly, ever since the Iowa Electronic Markets started trading national US elections in 1988, it has been shown empirically that a trading population not at all representative of the U.S. population can predict elections better than careful polls of representative U.S. population samples.

Secondly, the U.K.’s betting shops and prediction markets were full of regular-folk punters, and they were blindsided by Brexit just the same.

Thirdly, prediction traders, at least on Hypermind, were in fact keenly aware of the point of view and arguments of those who backed Trump; after all, for every trader who bet on Clinton, there had to be a Trump-backing counterpart. Several participated actively in the forum discussions and explained their thinking very clearly. Yet, stripped of the anti-Clinton “lock her up” rants, their case rested essentially on paying more attention to the size of Trump’s rally crowds and long waiting lines than to his poll numbers. Those who bet on Clinton are now accused of living in a bubble, but before the vote it wasn’t obvious that Trump backers weren’t the ones deliberately ignoring inconvenient polling data to focus on their own reality.

In the end, more traders evidently gave more credence to the polling data – which turned out to be deeply flawed – but the direct exposure to the Trump-backing viewpoint, as well as the still-fresh memory of Brexit, prevented the markets from going all-in on Clinton, unlike most of the poll-based models.

Brexit did not obviously imply a Trump win

There was a lot of talk about a possible Brexit-like surprise towards the end of the campaign: “Brexit plus”, Trump liked to call it. With 20/20 hindsight, the trans-Atlantic populist parallel now seems obvious to everyone, but it wasn’t so obvious before the vote, when even those who had correctly bet on Brexit collectively gave Trump only a 30% chance of winning. Furthermore, the Good Judgment researchers found that the predictions of those Brexit champions tended to be sub-par on most other geopolitical questions. On Hypermind, only 47% of those who bet on Brexit also bet on President Trump. So having called Brexit correctly did not automatically give one any special insight into this US election, nor does it make one a superior forecaster in general.

The accuracy record of prediction markets is still amazing

Prediction markets like Hypermind occasionally fail to assign the better odds to winning outcomes. That’s a feature, not a bug, of probabilistic forecasting. That it happened on two of the highest-impact, most closely watched events of 2016, Brexit and President Trump, is deplorable publicity, but it doesn’t invalidate the method. We only claim to provide accurate probabilities, not black-or-white predictions, and the accuracy of probabilistic forecasts can only be judged over many outcomes. A couple of misses, no matter how high-profile, are not enough to invalidate the long track record of accuracy that prediction markets have accumulated over more than a quarter century.

For instance, Hypermind’s performance over 2.5 years, 213 questions, and 561 outcomes (including Brexit and President Trump) shows remarkable calibration with reality over a broad range of electoral, geopolitical and economic forecasts. Besides prediction markets and related crowd-forecasting methods, nothing else and no-one else today can claim a better accuracy record.

calibration-213

Hypermind forecast calibration over 2.5 years, 213 electoral, geopolitical, and economic questions, and 561 possible event outcomes. Every day at noon, the estimated probability of each outcome was recorded. Once all the questions are settled, we compare, at each level of probability, the percentage of events predicted to occur to the percentage that actually occurred. The size of the data points indicates the relative number of daily forecasts recorded at each level of probability, out of a total of 56,949.


Brexit pundits embarrass themselves with misguided attacks on prediction markets


Brexit and the rise of Donald Trump are both high-impact anti-establishment events that have shattered powerful ambitions and squandered fortunes. Dazed and confused, the establishment is looking for scapegoats, and some of its wrath is currently focused on prediction markets.

In two sternly worded post-Brexit articles, pundits at The Economist (1) and the Financial Times (2) accused prediction markets of having “spectacularly failed” and of being “wildly wrong” about both Trump and Brexit. Coming from such highly regarded publications, this hurts.

However, as a long-time prediction market operator, I see no reason to feel “most embarrassed”, as the FT’s John Authers suggests I should be. On the contrary, it seems to me that the articles’ authors are embarrassing themselves by revealing their own confusion about the empirical data, the rules of forecasting, and the meaning of probability.

Firstly, there is little basis for the assertion that prediction markets were wrong about Trump’s victorious march to the GOP nomination. As I have documented in a previous post, and as Figure 1 shows, Hypermind and the leading real-money prediction markets (Iowa Electronic Markets, PredictIt, and Betfair) had already anointed Trump the favorite before the first ballot was cast in Iowa. From then on, except for a short week between his Iowa stumble and his New Hampshire comeback in early February, he remained the favorite throughout the campaign until his last rivals finally quit in May. The markets “got” Trump.

panel-markets

FIGURE 1 – Probability of winning the GOP presidential nomination according to leading prediction markets, from January 25, 2016 (one week before the Iowa Caucuses) to May 3, 2016 (eve of the last contested primary in Indiana). Except for a short week between Iowa and New Hampshire, the markets always favored Trump.


What about Brexit? It is true that, on the eve of the vote, UK bookmakers and prediction markets all favored the losing “remain” outcome. Our own Hypermind gave it a probability of 75%. With the benefit of hindsight, the pundits judge that forecast harshly. They write that the polling data argued “strongly” against such confidence, and more specifically:

  • that polls always suggested that the referendum was on a “knife edge”, that just before the vote the polling average showed a “dead heat”, and that our traders succumbed to the dreaded confirmation bias: placing more weight on recently released polls favouring “remain” than on the similar number of surveys backing “leave”;
  • that our traders were “fooled by the trend in referendums in Scotland and Quebec for voters to move towards the status quo as voting approached.”

But the facts are wrong and the criticism is misguided.

Yes, the polls were close, but there was a clear trend towards “remain” in the last few days. For instance, the final update of the FT’s own poll tracker, including all polls published up to June 22, just before the vote, showed a 2% advantage for “remain” over “leave”.

Ft poll of polls final

Furthermore, of the six polls published June 22, four favored “remain”. Of the twelve polls taken after Jo Cox’s murder on June 16, seven favored “remain” while only four favored “leave”, a dramatic trend reversal from the previous period when nine of the last twelve polls favored “leave”. Given this data, prediction traders could be forgiven for thinking that polls showed a small but firm lead for “remain” in the final stretch, where it counts most. Confirmation bias it was not.

The number of undecideds was still relatively high, but based on historical precedent in Scotland (2014) and Quebec (1995) these voters were expected to mostly choose the status-quo “remain” (Figure 2). Even TNS, the only firm whose polls consistently favored “leave”, pointedly noted just before the vote that this historical trend could swing the vote to “remain”. (3) Rather than being “fooled”, traders smartly followed the advice of Nobel Prize winner Daniel Kahneman to consider the “outside view” by taking into account not only the specifics of the Brexit referendum but also previous outcomes in similar consultations. This is simply a best practice in forecasting.

DB Scotland Quebec

FIGURE 2 – A graphic published by Deutsche Bank before the Brexit vote shows that in both the Quebec referendum of 1995 and the Scottish referendum of 2014, the final opinion polls underestimated the status-quo “stay” vote by 3.5% and 3% respectively.

Another reason to believe “remain” had an advantage is that the voters themselves apparently expected it to prevail: an Opinium poll published just before the vote revealed that 46% predicted a “remain” victory, while just 27% expected “leave” to win. Recent research has shown that voter expectations usually forecast an election’s results better than voter intentions. (4)

Prediction traders are not just poll followers. The best of them factor in as much relevant information as they can find before making their most-informed guess about the residual uncertainty. In view of all the evidence available, the consensus probability for “leave” on the eve of the vote ended up around 25% on Hypermind (and a bit lower on other prediction markets and UK bookmakers).

With 20/20 hindsight, the pundits think that probability should have been a lot higher. Their implicit indictment is that because “leave” won, prediction markets should have given it a probability not only higher than 25% but also higher than “remain”, and that failing to do so is some sort of epic fail.

In the comments section of John Authers’ column, several astute readers tried to make the point that the fault lay perhaps not so much with the 25%-chance prediction as with an erroneous interpretation of that probability to mean that “leave” would not happen, hence the surprise. In fact, it had one chance in four, which in absolute terms is hardly improbable. As a reader named Kraken elegantly put it: “It’s a feature, not a bug, of prediction markets that apparently unlikely events occur with some probability.”

This is exactly what I argued in an earlier post on the lessons of Brexit. But if it seems obvious that one should not be astounded every time a 25%-chance outcome occurs, it apparently isn’t to the FT columnist. In his reply to Kraken, he insists that “it’s very unusual for prediction markets not to put more than a 50% chance on the winning outcome in a two-way chance.” It was, therefore, “plainly a failure of prediction markets, and a very unusual one.”

But in fact, it isn’t unusual at all. The record shows that market predictions in general, and Hypermind’s in particular, are well calibrated, meaning that the estimated probabilities accurately predict the proportion of events you can expect to actually occur. From the point of view of a well-calibrated market, some share of unexpected events has to happen, and some share of expected events has to fail to occur; the exact proportion depends on the probability assigned to each event. So it isn’t unusual for a well-calibrated prediction market like Hypermind to assign a 25% probability to an event that eventually occurs. In fact, it happens about 25% of the time. The failure lies instead with those who wrongly extrapolate from 25% (unlikely) to 0% (not a chance).
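The point generalizes: for a calibrated forecaster, roughly one in four of its 25%-probability calls must come true. A quick simulation sketch (plain Python, with a hypothetical number of events and a fixed seed for reproducibility, not Hypermind data) illustrates it:

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical setup: a perfectly calibrated forecaster assigns a 25%
# probability to each of many independent events, and each event really
# does occur with that probability.
n_events = 10_000
p = 0.25
occurred = sum(1 for _ in range(n_events) if random.random() < p)
rate = occurred / n_events

# Roughly one in four of these "unlikely" events still happens.
print(f"{occurred} of {n_events} 25%-probability events occurred ({rate:.1%})")
```

Being astounded by any single one of those occurrences is the reader’s error, not the forecaster’s: only the long-run rate is testable.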

Or perhaps the pundit is really saying that prediction markets should always be expected to favor events that eventually happen, and never to favor one that doesn’t? That would make them perfectly prescient, which is absurd. It is the future we are talking about, and nobody but God himself (or herself) could be expected to be right every time. The best mere-mortal markets can offer is fine probability calibration, and some solace in Margaret Thatcher’s immortal insight that “in politics, the unexpected happens.” (5)



(1) Polls versus prediction markets: Who said Brexit was a surprise? – by Dan Rosenheck, The Economist, June 24th, 2016

(2) Brexit shows no greater loser than political and market experts – by John Authers, Financial Times, July 1, 2016

(3) In this Daily Express article, dated June 22, Luke Taylor, Head of Social and Political Attitudes at TNS UK, is quoted as saying: “Our latest poll suggests that Leave is in a stronger position than Remain but it should be noted that in the Scottish Independence Referendum and the 1995 Quebec Independence Referendum there was a late swing to the status quo and it is possible that the same will happen here.”

(4) Forecasting Elections: Voter Intentions versus Expectations – by David Rothschild & Justin Wolfers, 2013

(5) Thanks to FT reader PeterE for reminding us of this memorable quote.

Lessons from Brexit


brexit-andrieu-wide

Time will tell whether or not Brexit is a disaster for the UK, but in any case it is hardly one for prediction markets.

Bremain was certainly the favorite of the bookmakers all along, while polls were inconclusive or wildly fluctuating, with loads of undecideds. The day before the poll, Hypermind gave Bremain a probability of 75%, and Brexit only 25%. In view of the result, some are questioning the reliability and relevance of forecasts from prediction markets. Fair enough.

brexit-bremain

Probabilities of Brexit and Bremain on Hypermind from June 16th (just before Jo Cox’s murder) to the announcement of the results on June 24. Just before election day (June 23), the probability of Brexit was hovering around 25%.

So let’s take advantage of what Americans call a “teachable moment” to explain again what prediction market forecasts are, what they are not, and why Hypermind’s are particularly reliable.

Probabilities vs certainties

It can’t be said that Hypermind was “right” on Brexit. But to argue that it was “wrong” requires a total disregard for what probabilities mean. In fact, the very idea that a probabilistic forecast, a 25% chance, can be proved right or wrong by a single observation is absurd. At the end of an interview in the French weekly Le Point, just two days before the vote, I was asked: “If Brexit wins, what conclusions will you draw?” Here’s my answer:

Hypermind’s forecasts are accurate probabilities, not certainties. Of all the events that we believe to have “only” a 25% chance of happening, like Brexit today, we can guarantee that about one in four will happen, even if it was not the most likely outcome. Maybe Brexit will be that one… but there are three chances in four that it won’t be.

Well, Brexit was that one… there was one chance in four. Only those who make the mistake of confusing 25% (unlikely) with 0% (not a chance) could blame Hypermind.

The Curse

In fact, we probabilistic forecasters must live under a particularly ironic curse: we know full well that whenever an unlikely event happens – and it must, eventually, otherwise probabilities would be meaningless – we will be loudly (but wrongly) criticized.

How to assess the reliability of probabilistic forecasts

But then how do we know whether the 25% probability for Brexit was correctly estimated? Ideally, we would re-run the referendum dozens of times and observe the frequency of Brexit outcomes: if it won about 1 time in 4, the 25% prediction would be validated. Conversely, if the results deviated too much from that 1-in-4 proportion, we could conclude that the prediction was wrong. The correspondence between the predicted probability of events and their actual frequency of occurrence is what is called “calibration”. The better calibrated a forecasting system is, the more its probabilities can be trusted.
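As a minimal sketch of that calibration check, the hypothetical helper below (illustrative code, not Hypermind’s actual tooling) groups forecasts by the probability level assigned to them and compares each level with the observed frequency of occurrence:

```python
from collections import defaultdict

def calibration(forecasts):
    """forecasts: (probability, occurred) pairs, with occurred in {0, 1}.
    Returns the observed frequency of occurrence at each probability level."""
    by_level = defaultdict(list)
    for prob, occurred in forecasts:
        by_level[prob].append(occurred)
    return {level: sum(outcomes) / len(outcomes)
            for level, outcomes in sorted(by_level.items())}

# Hypothetical forecast history: a well-calibrated forecaster's 25% events
# should occur about one time in four, its 75% events about three in four.
sample = ([(0.25, 1)] + [(0.25, 0)] * 3 +
          [(0.75, 1)] * 3 + [(0.75, 0)])
print(calibration(sample))  # → {0.25: 0.25, 0.75: 0.75}
```

In practice one would bin nearby probabilities together and weight each level by its number of forecasts, as the calibration graphs in this post do.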

Unfortunately, of course, we can’t ever re-run the referendum, nor any other event predicted by Hypermind. Each one is unique. So how can we measure the reliability of our forecasts? The accepted approach is the next best thing: consider as a group all the questions ever addressed by Hypermind over the past two years, including Brexit. The market forecast 181 political, geopolitical, and macroeconomic questions, with 472 possible outcomes. Some were naturally more difficult to forecast than others, but none was trivial, as each question was sponsored by at least one government, bank, or media outlet facing some strategic uncertainty.

The calibration results are illustrated in the graph below. The closer the data points are to the diagonal, the better calibrated the forecasts. The probabilities generated by Hypermind are generally quite reliable: events given about a 25% chance of happening do happen about 20-25% of the time. Events estimated at 50% occur about half the time. Events assigned a probability of 90% occur about nine times out of ten, and fail to occur one time in ten. The correlation is not perfect, but it is remarkable. It’s hard to do much better.

Calib 181 Brexit

Hypermind forecast calibration over 2 years on 181 questions and 472 possible event outcomes. Every day at noon, the estimated probability of each outcome was recorded. Once all the questions are settled, we can compare, at each level of probability, the percentage of events predicted to occur and the percentage that actually occurred. The size of the data points indicates the number of forecasts recorded at each level of probability.

You will notice that the data also exhibit the so-called “favorite-longshot bias”, a slight S-curve pattern that results from overestimating improbable events and underestimating probable ones. Calibration would be better without this systematic distortion at the extremes. It is perhaps a bit ironic that the data from the Brexit question went against this pattern and thus slightly improved Hypermind’s overall calibration (from .007 to .006). It is as if the occurrence of an unlikely event was long overdue in order to better match predicted probabilities to observed outcomes.

What does not kill you makes you stronger

A final lesson is that every confrontation with reality makes the system more reliable, whatever the outcome, because it learns. For every bettor that took a position against Brexit, there was necessarily at least another that bet on it. Everyone who lost that bet will now have less influence on the odds for future forecasts, since he or she will have less money to bet with. Conversely, the forecasts of those who bet correctly will henceforth weigh more on the consensus, because they have more money than ever to move the market prices. Thus the quality of future collective forecasts continuously improves.
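That self-correcting dynamic can be sketched in a deliberately simplified model (two hypothetical traders, fixed stakes, and a wealth-weighted consensus; actual market mechanics are considerably richer than this):

```python
def consensus(traders):
    """Wealth-weighted average belief: traders with more money,
    i.e. a better track record, move the consensus more."""
    total = sum(t["wealth"] for t in traders)
    return sum(t["wealth"] * t["belief"] for t in traders) / total

def settle(traders, outcome_occurred):
    """Pay winners and debit losers once the event settles, shifting
    future influence towards the traders who forecast correctly."""
    for t in traders:
        backed_event = t["belief"] > 0.5
        won = backed_event == outcome_occurred
        t["wealth"] += t["stake"] if won else -t["stake"]

# Hypothetical two-trader market on a Brexit-like event.
traders = [
    {"wealth": 100.0, "belief": 0.2, "stake": 50.0},  # bet against the event
    {"wealth": 100.0, "belief": 0.7, "stake": 50.0},  # bet on the event
]
before = consensus(traders)      # 0.45 with equal wealth
settle(traders, outcome_occurred=True)
after = consensus(traders)       # 0.575: the correct trader now weighs more
```

After settlement, the trader who bet correctly holds more wealth, so the consensus shifts towards his or her beliefs on the next question, which is the learning effect described above.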

Les leçons du Brexit

L’avenir nous dira si le Brexit est une catastrophe ou non pour les Anglais, mais en tout cas cela n’en est pas une pour les marchés prédictifs.

Certes, le Bremain était le grand favori des bookmakers, alors que les sondages étaient contradictoires avec encore au moins 10% d’indécis. La veille du scrutin, Hypermind prévoyait 75% de probabilité pour Bremain et 25% de probabilité pour le Brexit. Au vu du résultat, certains s’interrogent sur la fiabilité et la pertinence des prévisions issues des marchés prédictifs. C’est légitime.

Je vais donc profiter de ce que les américains appellent un “teachable moment” pour expliquer à nouveau ce que sont les prévisions d’un marché prédictif, ce qu’elles ne sont pas, et pourquoi celles d’Hypermind sont fiables.

Des probabilités, pas des certitudes

On ne peut pas dire qu’Hypermind ait eu “raison” sur le Brexit. Mais il faut ne rien entendre aux probabilités pour assurer à l’inverse qu’Hypermind s’est “trompé”. En fait, l’idée même qu’une prévision probabiliste – 25% de chances – puisse être validée ou invalidée par une seule observation est absurde.  A la fin de l’interview dans Le Point, je répondais justement à la question “Si le Brexit l’emporte, quelles conclusions en tirerez-vous ?” de la façon suivante :

Les prévisions d’Hypermind sont des probabilités fiables, pas des certitudes. De tous les événements que nous estimons avoir « seulement » 25 % de chances de se réaliser, comme le Brexit aujourd’hui, nous pouvons garantir qu’environ un sur quatre se réalisera, même s’il n’était pas « favori ». Peut-être que le Brexit sera celui-là…, mais il y a trois chances sur quatre que non.

Le Brexit fut donc celui là… il y avait une chance sur quatre. Si erreur il y a, elle n’est donc pas tant du coté d’Hypermind, que du coté de ceux qui ne font pas la différence entre 25% (“peu probable”) et 0% (“aucune chance”).

Malédiction

De fait, c’est la malédiction particulière du prévisionniste probabiliste que de savoir pertinemment qu’à chaque fois qu’un évènement peu probable se réalisera – et il en faut, car sinon la probabilité n’aurait aucun sens – il se le verra bruyamment reproché. A tort.

Comment évaluer la fiabilité des prévisions

D’accord, me direz vous, mais alors comment savoir si la probabilité de 25% était correctement estimée ? Idéalement, il faudrait pouvoir observer le résultat non pas sur un seul référendum mais sur plusieurs dizaines : Est-ce qu’environ un sur quatre donnerait la victoire au Brexit, et trois sur quatre au Bremain ? Si oui, la prévision de 25% serait vérifiée. Mais si les résultats déviaient trop de ces proportions, alors on pourrait dire que la prévision était mauvaise. L’adéquation entre le pourcentage d’évènements prévus et le pourcentage d’événements réalisés est ce que l’on appelle “l’étalonnage” des prévisions. Mieux le prévisionniste est étalonné, plus ses probabilités sont fiables.

Malheureusement, il n’y aura pas d’autres référendums identiques, et chaque évènement traité par Hypermind est unique. Alors comment évaluer l’étalonnage et la fiabilité des prévisions ? Le mieux que l’on puisse faire c’est de considérer l’ensemble des questions traitées par Hypermind depuis deux ans, Brexit compris: il y en a eu 181, sur des sujets politiques, géopolitiques, et macroéconomiques, avec 472 réponses possibles. Les niveaux de difficultés variaient, naturellement, mais aucune n’était triviale, car chacune était commanditée par au moins un sponsor (gouvernent, banque, média, etc.) faisant face à quelque incertitude stratégique.

Les résultats sont illustrés par le graphe ci-dessous. Moins les data dévient de la diagonale, plus les prévisions sont bien étalonnées. On voit que les probabilités générées par Hypermind sont globalement fiables : les évènements auxquels on accorde 25% de chances se réalisent environ une fois sur quatre ou cinq. Les évènements auxquels on accorde 50% de chances se réalisent une fois sur deux. Les évènements estimés à 90% de chances se réalisent neuf fois sur dix, et ne se réalisent pas une fois sur dix, etc. La corrélation n’est pas parfaite, mais elle est très remarquable. Il est difficile de faire beaucoup mieux.

Calib 181 Brexit FR

Calibration of Hypermind’s forecasts on 181 questions with 472 possible outcomes (events) over a two-year period. Each day at noon, the estimated probabilities for all outcomes are recorded. Once the results are known, we can compare, at each probability level, how closely the percentage of predicted events matches the percentage of realized events. The size of each dot indicates the number of forecasts recorded at that probability level.

It is perhaps a little ironic, and certainly counter-intuitive, to realize that the outcome of the Brexit question slightly improved, rather than degraded, Hypermind’s overall calibration. It is as if the system had long been waiting for an improbable event to occur in order to better calibrate its probabilities!

What doesn’t kill you makes you stronger

One final lesson is that each confrontation with reality makes the system more reliable, whatever the outcome, because it learns. For every trader who bet against Brexit, at least one other bet on it. Each loser’s voice is thereby diminished, since he or she will have less money to bet on subsequent questions, and therefore less influence on the odds. Conversely, the opinions of those who got it right gain influence, since they now have more money to stake on their forecasts (presumably shrewder than everyone else’s). The quality of future collective forecasts is thus refined.
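The reweighting mechanism just described can be illustrated with a toy simulation. This is a deliberately simplified sketch, not Hypermind’s actual market mechanics: it assumes each trader bets a fixed stake on the side they favor, and models the crowd forecast as a wealth-weighted average of individual beliefs.

```python
def settle_and_reprice(traders, event_happened):
    """Settle each trader's bet, then return the new crowd forecast as a
    wealth-weighted average of individual beliefs. `traders` is a list of
    dicts with keys `wealth`, `belief` (the probability the trader assigns
    to the event) and `stake` (the amount bet on the side they favor)."""
    for t in traders:
        bet_on_event = t["belief"] > 0.5
        won = (bet_on_event == event_happened)
        t["wealth"] += t["stake"] if won else -t["stake"]
    total_wealth = sum(t["wealth"] for t in traders)
    return sum(t["wealth"] * t["belief"] for t in traders) / total_wealth

traders = [
    {"wealth": 100.0, "belief": 0.25, "stake": 50.0},  # bets against the event
    {"wealth": 100.0, "belief": 0.75, "stake": 50.0},  # bets on the event
]

# The improbable event happens (a Brexit-style upset): the winner now holds
# 150 vs. 50, so the next crowd forecast tilts toward the winner's belief.
new_price = settle_and_reprice(traders, True)
print(new_price)  # → 0.625, up from the unweighted average of 0.5
```

Repeated over many questions, this is how successful forecasters accumulate influence on the odds while unsuccessful ones fade.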

Hypermind wins the 2016 Republican nomination race

trumpwin

Last week the Associated Press reported that Donald Trump had finally acquired enough delegates to lock in the GOP nomination. But he is not the only winner of this extraordinary primary season: of all the leading prediction markets, Hypermind was the most accurate by far. It outperformed Betfair, the Iowa Electronic Markets (IEM), and PredictIt, respectively the largest prediction market in the world (based in the UK), the longest-running and the newest US-based political markets.

Figure 1 below details the forecasts of each prediction market starting from January 25, a week before the Iowa caucuses, and ending on May 3, 2016, on the eve of the Indiana primary which proved fatal to Trump’s last two rivals. (No data is available for the IEM before January 25, so this is also the longest period over which we can compare the performance of all four markets.)

panel-markets

Figure 1 – Probability of winning the GOP presidential nomination for Trump, Cruz, Rubio, or somebody else (Other), according to the four prediction markets, from January 25 to May 3, 2016.

On his way to victory, Trump crushed the hopes of 16 other candidates, and defied the expert forecasts of countless political pundits. However, as Figure 1 shows, even before the first ballot was cast in Iowa, the markets had already anointed Trump the favorite. Then, except for a short week between his Iowa stumble and his New Hampshire comeback in early February, he remained the favorite throughout the campaign until his last rivals finally quit.

Figure 1 also shows that Hypermind was systematically more bullish on Trump than the other markets were, and much less likely to lose confidence and overreact when he stumbled. The contrast is especially vivid in April, when the establishment-fueled fantasy of denying Trump the nomination at a contested convention got a lot of traction in all the markets, but much less so in Hypermind.

For a quantitative measure of accuracy, it is customary to use the Brier score, which sums the squared errors between the predictions and the actual outcomes. The smaller the Brier score, the better the prediction: in a 4-way prediction like this one, a perfect prediction has a Brier score of 0, a chance prediction (i.e., 25% for each option) scores 0.75, and a totally wrong prediction scores 2.
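The calculation is easy to sketch in Python. A minimal illustration, where the probability figures are invented for the example rather than taken from the markets’ actual data:

```python
def brier_score(probs, outcome):
    """Multi-outcome Brier score: the sum of squared errors between the
    forecast probability vector and the 0/1 outcome vector. `probs` maps
    each option to its forecast probability; `outcome` is the option that
    actually happened."""
    return sum((p - (1.0 if name == outcome else 0.0)) ** 2
               for name, p in probs.items())

# Two hypothetical 4-way forecasts, as in the GOP nomination race
confident = {"Trump": 0.7, "Cruz": 0.2, "Rubio": 0.05, "Other": 0.05}
chance = {"Trump": 0.25, "Cruz": 0.25, "Rubio": 0.25, "Other": 0.25}

print(brier_score(confident, "Trump"))  # low score: confident and right
print(brier_score(chance, "Trump"))     # → 0.75, the chance baseline
```

A market’s mean daily Brier score, as used in Figure 2, is then simply the average of such daily scores over the comparison period.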

To get a sense of how accurate the markets were throughout the comparison period, we compute each market’s Brier score on a daily basis, then average those daily scores into a mean daily Brier score for each market. The results are plotted in Figure 2: Hypermind was 35% more accurate than Betfair, and 40% more accurate than IEM and PredictIt.

PIBH-brier

Figure 2 – Mean daily Brier score for each prediction market from January 25 to May 3, 2016. Lower scores mean better accuracy.

It is remarkable that a play-money market like Hypermind could significantly outperform the leading real-money markets on a question that made daily front-page news all over the world for many months. But it is not overly surprising. Consider this:

  1. It isn’t the first time that Hypermind has forecast U.S. elections more accurately than more often-quoted outfits: it did so in the 2014 midterm elections as well (Servan-Schreiber & Atanasov, 2015).
  2. The idea that prediction markets work better when traders must “put their money where their mouth is” is a hard-to-kill cliché that has no basis in fact, as Servan-Schreiber et al. (2004) proved more than a decade ago. Hard currency need not be involved as long as traders risk something that is valuable to them: reputation, status, and self-satisfaction will do just fine for many, especially among the smartest. One particular advantage of play-money markets over their real-money counterparts is that they can better match influence with past success: everyone starts at the same level of wealth, and the only way to amass more play money than others, and thus weigh more on the market prices, is to bet successfully. There is less dumb money than in real-money markets.
  3. Hypermind is much more than just a play-money version of Betfair, IEM or PredictIt. Spawned from Lumenogic’s multi-year collaboration with the Good Judgment Project, winner of the IARPA ACE forecasting competition, Hypermind’s sole purpose is to make the best possible predictions, rather than enriching a bookmaker, conducting academic research, or providing entertainment. Its few thousand traders are carefully selected and rewarded (with cash prizes) solely based on actual performance. Good forecasters thrive, while poor forecasters wither and drop out. In this competitive environment, there are no second chances, which makes the Hypermind community an elite bunch, not just any crowd.

References: