Hypermind sells predictions, so the first question that comes up is usually: “how accurate are they?”. We have now accumulated enough data to be able to take a deep look, and the results are very good.
But before we dive in, let’s be clear about what we mean by “accuracy”. Market predictions are typically expressed as probabilities: the market won’t say “Event E will occur”; it will say instead: “There is a 70% chance that event E will occur”. Implicit in that statement is that there is also a 30% chance that event E won’t occur, which means that any single prediction like this cannot be considered right or wrong, whatever happens.
However, over many predictions, accuracy can be measured as a product of both calibration and discrimination:
Calibration – Predictions are said to be well calibrated when the events deemed more probable do occur more often, and those deemed less probable in fact occur less often. For example, if we consider all the events to which the market ascribed 30% probability, we should observe that 30% of them actually do occur. Similarly, if we consider all the events to which the market ascribed 80% probability, we should observe that 80% of them actually do occur. And so on.
Discrimination – This is a measure of how extreme the predictions are. The closer they are to 0% (absolutely unlikely) or 100% (absolutely likely), the more discriminating they are said to be. Decision makers like predictions that are discriminating because they are more actionable.
Only God’s predictions could be both perfectly calibrated and perfectly discriminating: events would always be predicted to be 0% likely or 100% likely, and the prediction would always be correct. Barring such perfection, calibration is preferable to discrimination: a fuzzy but generally correct forecast is better than a categorical but misleading forecast.
Let us now turn to Hypermind’s data. The prediction market has been operating since May 16th, 2014 with a panel of a few hundred traders recruited and rewarded based on performance.(1)
At this point, 75 questions of political, geopolitical, economic and business nature have been settled: questions about elections in Europe, the U.S., Brazil, Afghanistan and elsewhere, the P5+1 negotiations with Iran over its nuclear program, the war in Ukraine, the GE takeover of Alstom, the ECB stress test, the price of oil, and a whole lot more. The time horizon for the predictions in this data set was in the range of a few days to a few months. All in all, 41,442 trades have been conducted on 196 possible outcomes.
As the chart below illustrates, Hypermind’s predictions are well calibrated. The chart plots the percentage of events that occur at each price level between 1 and 99H (the market’s virtual money). It shows that the prices at which various outcomes are traded on the market can readily be interpreted as realistic probabilities for those outcomes, give or take a few percentage points.
To generate this chart, we recorded the price of each traded outcome every day at 12 Noon, grouped all outcomes traded at the same price and computed the percentage of them that actually occurred. The closer the data points are to the diagonal in the chart, the more the market’s prices predict true probabilities in the real world. Some data points are larger than others to indicate the relative number of outcomes traded at each price level. (The colors, however, are just for show!)
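For readers who want to run this kind of calibration analysis on their own data, the procedure described above can be sketched in a few lines of Python. The data layout and function name here are our own illustrative assumptions, not Hypermind’s actual code:

```python
from collections import defaultdict

def calibration_by_price(snapshots):
    """Given daily noon snapshots as (price_in_H, occurred) pairs,
    where occurred is 1 if the outcome happened and 0 otherwise,
    group outcomes by price level and return, for each level, the
    number of observations and the fraction that actually occurred."""
    groups = defaultdict(list)
    for price, occurred in snapshots:
        groups[price].append(occurred)
    return {price: (len(v), sum(v) / len(v))
            for price, v in sorted(groups.items())}
```

Plotting the observed frequency against the price, with point size proportional to the observation count, yields a chart of the kind described above: perfect calibration would put every point on the diagonal.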
To assess discrimination, it is visually useful to plot the same data at a coarser level by clustering prices in ten intervals of 10H each. As the larger data points include more observations, we can see that most trades occur at price points closer to the extremes, where predictions are more certain, than towards the middle, around 50H, where uncertainty is at its peak.
By this measure, Hypermind is also usefully discriminating: For instance, on a daily basis, two thirds of its predictions (64%) indicate outcome probabilities below 20% (very unlikely) or above 80% (very likely). Similarly, 80% of its predictions are either unlikely (below 30%) or likely (above 70%).
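The decile clustering and the extremeness statistics quoted above can be sketched the same way; again, the function names, bin edges, and data layout are illustrative assumptions on our part:

```python
def decile_buckets(snapshots):
    """Cluster daily snapshots (price_in_H, occurred) into ten 10H-wide
    bins and return, per bin, (observation count, observed frequency)."""
    bins = [[0, 0] for _ in range(10)]
    for price, occurred in snapshots:
        i = min(int(price // 10), 9)  # prices 1..99H fall in bins 0..9
        bins[i][0] += 1
        bins[i][1] += occurred
    return [(n, hits / n if n else None) for n, hits in bins]

def extreme_share(snapshots, lo=20, hi=80):
    """Fraction of daily predictions priced below `lo` or above `hi`,
    i.e. the share of predictions that are strongly discriminating."""
    k = sum(1 for price, _ in snapshots if price < lo or price > hi)
    return k / len(snapshots)
```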
This analysis shows that Hypermind’s predictions are both accurate and actionable, but it tells us little about the intrinsic difficulty of the questions, or about how well other forecasting methods might have done on those same questions.
Unfortunately for this purpose, only a few of the questions addressed by Hypermind so far have also been systematically forecasted by other methods or venues. That is partly by design, because the value of Hypermind predictions depends as much on their exclusivity as on their accuracy. We would rather focus on important questions that only a few – but the right few – care about, than on entertaining issues that everybody else is already forecasting.
A particularly interesting point of comparison is with the Good Judgment Project, a multi-million dollar research project sponsored by the U.S. government’s Intelligence Advanced Research Projects Activity.(2) Since August 2014, Hypermind has been allowed to forecast several dozen of the same geopolitical questions submitted to the Good Judgment forecasters. Based on the score of questions that have closed so far, Hypermind seems to be performing very well. However, there isn’t enough data yet to draw firm conclusions, so this is an issue we will revisit at a later date when more questions have closed.
In the meantime, events like political elections are both important and entertaining, and are widely forecasted. In an earlier post, we documented how Hypermind outperformed all the big-data statistical poll-aggregation models (aka Nate Silver and friends) when predicting the results of the 2014 U.S. midterm elections.
Although the comparative data is still sparse, it clearly suggests that Hypermind exhibits excellent accuracy not so much because the predictions are easy, but because it performs at a best-in-class level.