Should the stunning result of the 2016 election, coming only a few months after the equally surprising Brexit, lead us to question the usefulness of prediction markets and, perhaps, the very idea of the wisdom of crowds? Let’s look at the facts to try to understand what happened.
Prediction markets failed to call President Trump
Trump’s victory indeed confounded all the prediction markets. Never, at any point in the race, was he the favorite to win the White House. On election day, as recorded here, Hypermind gave him at most a 25% probability (1 chance in 4), which is more than any other crowd-based forecasting system gave him, including major UK bookmakers, the leading real-money prediction markets Betfair and PredictIt, and various other wisdom-of-crowds methods. This is failure across the board, as with Brexit; the only silver lining, also as with Brexit, is that Hypermind failed slightly less than the others.
Notes: Betfair, based in the U.K., is the largest real-money prediction market in the world, but it does not allow U.S. residents to participate. PredictIt is the largest U.S.-based real-money prediction market. Pivit and Almanis, like Hypermind, are play-money markets offering prizes. Good Judgment (aka GJOpen) is not a market but a sophisticated “prediction poll”. PredictWise is primarily an aggregator of prediction-market data from various sources.
Prediction markets did not fail as badly as most poll aggregators
Another popular approach to election forecasting is the statistical aggregation of vast amounts of polling data, at both the national and state levels. This approach, supposedly more data-driven and objective than crowd-based forecasts, did not perform any better. In fact, except for FiveThirtyEight, which on election day still gave Trump less than a 30% chance of winning, all the poll-based models were more pessimistic about Trump’s chances than the four leading crowd-based models. The variance among poll-based models was also much larger, which may indicate that, despite the sophisticated statistics, poll aggregation is still closer to art than to science, and is heavily influenced by the subjective choices of the modeler: if Nate Silver’s FiveThirtyEight failed less than the others in the general election, it is because it became extra cautious after being blind-sided by Trump in the GOP primaries.
It’s a near miss, not an epic fail
Favoring Clinton wasn’t completely off-base: she decisively won the popular vote, by more than 2 million votes and by a margin larger not only than Al Gore’s in 2000, but also than John Kennedy’s in 1960 and Richard Nixon’s in 1968. Besides Gore, only three other candidates have won the popular vote but not the presidency, and they all lived in the 19th century: Andrew Jackson in 1824, Samuel Tilden in 1876, and Grover Cleveland in 1888. On election day, both Hypermind and the Iowa Electronic Markets (IEM) gave Clinton about an 80% chance of winning the popular vote.
Furthermore, Trump’s win in the Electoral College was razor thin: just about 100,000 votes in Pennsylvania, Wisconsin, and Michigan combined apparently decided the election. That’s less than 0.8% of the more than 13.6 million votes cast in those states, and less than 0.1% of the 120 million votes cast nationwide. What is now touted as the inevitable outcome of widespread voter discontent could easily have gone the other way, as Nate Silver brilliantly explains.
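The vote-share arithmetic above is easy to verify; here is a minimal sketch using the rounded figures cited in the text:

```python
# Approximate figures cited above (rounded, as in the text).
decisive_margin = 100_000       # combined winning margin in PA, WI, MI
three_state_votes = 13_600_000  # total votes cast in those three states
national_votes = 120_000_000    # total votes cast nationwide

share_three_states = decisive_margin / three_state_votes
share_national = decisive_margin / national_votes

print(f"{share_three_states:.2%}")  # under 0.8% of the three-state vote
print(f"{share_national:.3%}")      # under 0.1% of the national vote
```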
Polls aren’t the reliable beacons of public opinion that they used to be
The post-mortem consensus is that the Trump surprise was mostly due to large polling errors in the Midwestern states mentioned above, which caused everyone to think they were safe for the Democratic candidate (everyone except the astute Trump campaign, evidently).
Poll-aggregation models depend directly on the quality of the polling data, and if those data aren’t good, a model’s forecast can’t be either, no matter how sophisticated its mathematics. In turn, the most reputable pollsters and poll aggregators are a major source of information for prediction-market traders before they place their bids, so bad polling data also damages the crowd’s consensus forecast.
Prediction markets weren’t just following the polls
However closely Hypermind and some of the other prediction markets watched the polls, the fact that they performed better than most poll aggregators shows that they weren’t just following the polls. Throughout the campaign, prediction traders had to balance two incompatible narratives and data sets. On the one hand, there were the cold numbers emerging from scientific polling by reputable institutes, and their careful analysis by statistical whizzes; those data consistently favored Clinton until the very end. On the other hand, there was the unusual size and enthusiasm of the crowds at Trump’s rallies, which never showed any sign of faltering.
Are traders not sufficiently representative of the voting population?
No good explanation has yet been given for the failure of polls to detect the true level of support for Trump. But in the case of prediction markets, it is tempting to think that Trump voters and enthusiasts were under-represented in the population of prediction traders. That would nicely fit the populist narrative of Trump voters being invisible to an out-of-touch elite. However, there are several problems with this story:
Firstly, ever since the Iowa Electronic Markets started trading national U.S. elections in 1988, it has been shown empirically that a trading population not at all representative of the U.S. population can predict elections better than careful polls of representative U.S. population samples.
Secondly, the U.K.’s betting shops and prediction markets were full of regular-folk punters, and they got blind-sided by Brexit just the same.
Thirdly, prediction traders, at least on Hypermind, were in fact keenly aware of the point of view and arguments of those who backed Trump – after all, for every trader who bet on Clinton, there had to be a Trump-backing counterpart. Several participated actively in the forum discussions and explained their thinking very clearly. Yet, stripped of the anti-Clinton “lock her up” rants, their case rested essentially on paying more attention to the size of the crowds and waiting lines at Trump’s rallies than to his poll numbers. Just as those who bet on Clinton are now accused of having lived in a bubble, it wasn’t obvious at the time that Trump backers weren’t themselves deliberately ignoring inconvenient polling data in favor of their own reality.
In the end, more traders evidently gave more credence to the polling data – which turned out to be deeply flawed – but the direct exposure to the Trump-backing viewpoint, as well as the still-fresh memory of Brexit, prevented the markets from going all-in on Clinton, unlike most of the poll-based models.
Brexit did not obviously imply a Trump win
There was a lot of talk about a possible Brexit-like surprise towards the end of the campaign: “Brexit plus,” Trump liked to call it. With 20/20 hindsight, the trans-Atlantic populist parallel now seems obvious to everyone, but it wasn’t so obvious before the vote, when even those who had correctly bet on Brexit collectively gave Trump only a 30% chance of winning. Furthermore, the Good Judgment researchers found that the predictions of those Brexit champions tended to be sub-par on most other geopolitical questions. On Hypermind, only 47% of those who bet on Brexit also bet on President Trump. So having called Brexit correctly did not automatically give one any special insight into this U.S. election, nor does it make one a superior forecaster in general.
The accuracy record of prediction markets is still amazing
Prediction markets like Hypermind occasionally fail to assign the better odds to the winning outcome. That’s a feature, not a bug, of probabilistic forecasting. That it happened on two of the highest-impact and most closely watched events of 2016 – Brexit and President Trump – makes for deplorable publicity, but it doesn’t invalidate the method. We only claim to provide accurate probabilities, not black-or-white predictions, and the accuracy of probabilistic forecasts can only be judged over many outcomes. A couple of misses, no matter how high-profile, are not enough to invalidate the long track record of accuracy that prediction markets have accumulated over more than a quarter century.
For instance, Hypermind’s performance over 2.5 years, 213 questions, and 561 outcomes (including Brexit and President Trump) shows remarkable calibration with reality across a broad range of electoral, geopolitical, and economic forecasts. Besides prediction markets and related crowd-forecasting methods, nothing else and no one else today can claim a better accuracy record.
Hypermind forecast calibration over 2.5 years, 213 electoral, geopolitical, and economic questions, and 561 possible event outcomes. Every day at noon, the estimated probability of each outcome was recorded. Once all the questions were settled, we compared, at each level of probability, the percentage of events predicted to occur with the percentage that actually occurred. The size of each data point indicates the relative number of daily forecasts recorded at that level of probability, out of a total of 56,949.
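The calibration procedure described in this caption can be sketched in a few lines of code: group the daily probability forecasts into bins, then compare each bin’s nominal probability to the observed frequency of the corresponding outcomes. This is a generic illustration with made-up toy data, not Hypermind’s actual code or data:

```python
from collections import defaultdict

def calibration(forecasts):
    """forecasts: list of (probability, occurred) pairs,
    one per daily recorded forecast per outcome."""
    bins = defaultdict(list)
    for p, occurred in forecasts:
        bins[round(p, 1)].append(occurred)  # group into 10%-wide bins
    # For each bin: (observed frequency of occurrence, number of forecasts)
    return {b: (sum(o) / len(o), len(o)) for b, o in sorted(bins.items())}

# Toy data: in a well-calibrated system, ~20% of outcomes forecast at
# 20% probability actually occur, ~80% of those forecast at 80%, etc.
sample = [(0.2, 0), (0.2, 0), (0.2, 0), (0.2, 0), (0.2, 1),
          (0.8, 1), (0.8, 1), (0.8, 1), (0.8, 1), (0.8, 0)]
print(calibration(sample))  # {0.2: (0.2, 5), 0.8: (0.8, 5)}
```

Perfect calibration would put every bin on the diagonal of the chart: events forecast at probability p occur a fraction p of the time.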