With Intrade gone and the rise of sophisticated statistical models à la FiveThirtyEight run by various U.S. media outlets, we haven’t heard much about prediction markets during the 2014 U.S. midterm election cycle. It was as if the allure of big data and statistical rock stars like Nate Silver had eclipsed the robust, well-documented success of collective human intelligence. Are prediction markets doomed to be roadkill on the big-data superhighway?
Not so fast.
In head-to-head comparisons, the Hypermind prediction market offers evidence that the aggregated brainpower of a crowd of forecasters can still outpredict the much-hyped statistical machines.
Hypermind listed several stocks on the 2014 U.S. midterm elections, focusing on control of the Senate and the 5 most undecided individual races: Kansas, Iowa, North Carolina, Colorado, and Georgia. This allows us to compare Hypermind’s predictions with those of the 7 major statistical models: FiveThirtyEight (Nate Silver), Washington Post, New York Times, Huffington Post, Princeton Election Consortium, PredictWise, and Daily Kos.
In the analysis below, we compare the predictions of each model against Hypermind, against each other, and against the average prediction of the 7 models. Importantly, we compare not just the predictions made on election day, but those made throughout the weeks or months (depending on the question) during which the market and all the models were simultaneously spewing out predictions.*
Accuracy is measured using Brier scores, which compute the squared error between the predicted probabilities and the actual outcomes. The smaller the Brier score, the better the prediction: a perfect prediction scores 0, a chance prediction (think 50/50) scores 0.5, and a totally wrong prediction scores 2.
To get a sense of how the methods compared overall, we computed, for each question, the Brier score of each method on every day of the comparison period. We then averaged those daily Brier scores into a mean daily Brier score for each method and each question. Finally, we averaged these across the 6 questions to get an overall mean daily Brier score for each method.**
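The scale above (0 for perfect, 0.5 for 50/50, 2 for totally wrong) matches the original two-outcome form of the Brier score, which sums the squared errors over both possible outcomes of a yes/no question. Here is a minimal Python sketch, assuming each forecast is a single probability that the "yes" outcome occurs; the function name `brier_score` is illustrative and not taken from Hypermind's or any model's code:

```python
def brier_score(p_yes: float, outcome_yes: bool) -> float:
    """Two-outcome Brier score for a binary question (assumed formulation).

    Sums the squared error on the "yes" outcome and on the "no" outcome,
    so it ranges from 0 (perfect) to 2 (fully confident and wrong).
    """
    o = 1.0 if outcome_yes else 0.0
    error_yes = (p_yes - o) ** 2                # error on the "yes" outcome
    error_no = ((1 - p_yes) - (1 - o)) ** 2     # error on the "no" outcome
    return error_yes + error_no


print(brier_score(1.0, True))  # perfect forecast
print(brier_score(0.5, True))  # 50/50 chance forecast
print(brier_score(0.0, True))  # totally wrong forecast
```

The three calls reproduce the three reference points in the paragraph above: 0, 0.5, and 2.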
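The three averaging steps can be sketched as follows. This is a minimal illustration, assuming each question's forecasts are stored as a list of daily probabilities for the "yes" outcome; the function names and data layout are hypothetical, and the toy numbers in the example are made up rather than drawn from the actual Hypermind or model data.

```python
from statistics import mean


def brier_score(p_yes, outcome_yes):
    """Two-outcome Brier score: 0 is perfect, 2 is fully confident and wrong."""
    o = 1.0 if outcome_yes else 0.0
    return (p_yes - o) ** 2 + ((1 - p_yes) - (1 - o)) ** 2


def mean_daily_brier(daily_probs, outcome_yes):
    """Steps 1 and 2: score each day's forecast, then average over the period."""
    return mean(brier_score(p, outcome_yes) for p in daily_probs)


def overall_score(forecasts, outcomes):
    """Step 3: average the per-question mean daily Brier scores.

    forecasts: question -> list of daily probabilities of "yes"
    outcomes:  question -> True if the "yes" outcome happened
    """
    return mean(mean_daily_brier(forecasts[q], outcomes[q]) for q in forecasts)


# Toy example with made-up numbers (not the actual election data)
forecasts = {"A": [0.6, 0.8], "B": [0.5, 0.3]}
outcomes = {"A": True, "B": False}
print(overall_score(forecasts, outcomes))
```

Because every question gets equal weight in the final average, a question with a short comparison period (such as Georgia) counts as much as one with a long period (such as Senate control).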
The chart below plots the results. By this measure, all models except Princeton’s did slightly better than chance, but Hypermind out-predicted all of them, including the average prediction of all the models (“Models Mean”).
We then took a closer look at the most important question of these elections: would Republicans win control of the Senate? Here again, Hypermind outperformed all the models, as the chart below shows. Except for the Washington Post’s, all the models remained, throughout the comparison period, much less confident than Hypermind that Republicans would ultimately control the Senate.
The Washington Post model, although more unstable than any other (notice the large dip around 50% from late August to mid-September), did particularly well at the end of the campaign, so Hypermind’s advantage over it isn’t as visually obvious as it is against the other models. However, if we compute the mean daily Brier scores over the entire period during which the Washington Post and Hypermind operated in parallel, from early July to election day, Hypermind’s score (.096) is 36% lower than the Washington Post’s (.150): a 36% accuracy advantage for Hypermind.
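The 36% figure is the relative reduction in mean daily Brier score, (.150 - .096) / .150 = .36. As a quick check in Python (the helper name `relative_advantage` is hypothetical):

```python
def relative_advantage(score_a: float, score_b: float) -> float:
    """Fractional reduction in Brier score of method A relative to method B.

    Lower Brier scores are better, so a positive value means A was more accurate.
    """
    return (score_b - score_a) / score_b


# Mean daily Brier scores reported above: Hypermind vs. Washington Post
print(f"{relative_advantage(0.096, 0.150):.0%}")  # prints 36%
```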
There is an important lesson here: even in this age of big data and supercomputers, human collective intelligence is still our best means of predicting the future. Isn’t that reassuring?
(*) The comparison periods for each question were: Senate control [Sept. 3 to Nov. 4]; IA, KS, CO, NC [Oct. 9 to Nov. 4]; GA [Oct. 20 to Nov. 4].
(**) Computing mean daily Brier scores over entire forecasting periods, as we do here, is also how the U.S. Government scores the geopolitical predictions of the IARPA-sponsored Good Judgment Project.
Hypermind’s daily closing prices for each contract are available for download in Excel format.
Models’ data were recorded by the New York Times here and here, and are available for download courtesy of The Upshot’s Josh Katz.