Live Webinar with Dr. Emile Servan-Schreiber

I was recently invited by our friends at PredictIt to discuss the accuracy and significance of prediction markets and collective intelligence.

During this live 30-minute webinar, I dive into why markets like PredictIt and Hypermind are able to forecast the future by pooling the speculation of many. I go into the types of individuals who thrive in prediction markets, and why diversity and independent thinking are required for accuracy. I also discuss why the possibility of reward and loss promotes more objective and less passionate thinking, enhancing the quality of the opinions that can be aggregated. And more!

 

 

Hypermind accuracy over its first 18 months

Hypermind was launched in May 2014. The chart below plots the accuracy of its predictions over the 151 questions and 389 outcomes that have expired as of this writing. All the predictions so far have been about politics, geopolitics, macroeconomics, business issues, and some current events. No sports.

To generate this chart, we proceeded as follows. The data was collected daily: every day at noon we recorded the latest transaction price on each traded outcome and treated it as a probability for this outcome. These observations were then grouped into 20 probability bins: 1-5%, 6-10%, 11-15%, …, 96-99%. Then, we plotted the average of the probabilities in each bin against the percentage of the outcomes represented in the bin that actually occurred.

The market is accurate to the extent that the two numbers are well calibrated, i.e., that the data points are aligned with the chart’s diagonal. In our case the measure of calibration is .001, meaning that the average difference between the percentage of events actually coming true and the forecast at each level of probability is only about 3.3%. If we did not know better, we might conclude that reality aligns itself with Hypermind’s predictions.
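The binning procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample observations, and the exact form of the calibration index (an observation-weighted mean squared gap between forecast and observed frequency) are hypothetical choices, not Hypermind's actual code or formula. Under that assumption, the square root of the index gives a typical gap in probability terms, so an index around .001 would correspond to a gap on the order of 3 percentage points.

```python
from collections import defaultdict


def calibration_table(observations, bin_width=5):
    """Group (price, occurred) pairs into probability bins.

    observations: iterable of (price, occurred) where price is an integer
    between 1 and 99 read as a percentage, and occurred is True/False.
    Returns a list of (mean_forecast, observed_rate, count), one per bin.
    """
    bins = defaultdict(list)
    for price, occurred in observations:
        # With bin_width=5: prices 1-5 -> bin 0, 6-10 -> bin 1, ..., 96-99 -> bin 19.
        bins[(price - 1) // bin_width].append((price / 100.0, 1.0 if occurred else 0.0))

    table = []
    for index in sorted(bins):
        rows = bins[index]
        mean_forecast = sum(p for p, _ in rows) / len(rows)
        observed_rate = sum(o for _, o in rows) / len(rows)
        table.append((mean_forecast, observed_rate, len(rows)))
    return table


def calibration_index(table):
    """Assumed definition: observation-weighted mean squared gap per bin."""
    total = sum(n for _, _, n in table)
    return sum(n * (f - o) ** 2 for f, o, n in table) / total


# Hypothetical daily observations: outcomes priced at 10 occur 1 time in 10,
# outcomes priced at 90 occur 9 times in 10 (perfectly calibrated toy data).
observations = (
    [(10, False)] * 9 + [(10, True)]
    + [(90, True)] * 9 + [(90, False)]
)
table = calibration_table(observations)
for mean_forecast, observed_rate, count in table:
    print(f"forecast {mean_forecast:.2f}  observed {observed_rate:.2f}  n={count}")
print("calibration index:", round(calibration_index(table), 6))
```

Passing `bin_width=10` yields ten coarser bins instead of twenty, which trades resolution for more observations per data point.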

calibration 151x5 171215

 

Polls are dead, long live markets


The polling fiasco in the 2015 UK general election is just the latest in a string of high-profile failures over the last few months. This contrasts with the good performance of prediction markets, and Hypermind in particular.

Let’s start with the referendum on Scottish independence in September 2014. In the final weeks before the referendum, the polls consistently announced a cliffhanger with Yes and No tied within the margin of error. Yet the actual results gave “No” a large majority of 55%, 10 points ahead of “Yes” (45%).

The betting markets on the other hand clearly favored the “No” vote throughout. Witness for instance how the “Yes” vote on Hypermind always stayed below the 50% likelihood threshold, and was given a low probability just before the referendum took place on September 18th.

SCOTLAND

Then came the midterm congressional elections in the U.S., in November 2014. The big question then was whether the Republicans would recapture control of the Senate, which they did. The polls mostly saw this coming, but were much more timid in their forecasts than the betting markets.

In fact, as discussed earlier in this blog, Hypermind out-predicted all the poll-aggregation models operated by the biggest U.S. media, as well as Nate Silver’s FiveThirtyEight. (Only the Washington Post model ended up out-predicting Hypermind at the very end, but its prediction was all over the place beforehand, as can be seen in the chart below.)

midterms2014Senate

The Israeli elections in March 2015 again stumped the polls and the pundits. The closer we got to election day, the more Benjamin Netanyahu was given up for dead, politically. The latest polls even predicted his Likud party would be 4 seats behind his leftist rival, and considered how difficult it would be for him to assemble a 61-seat majority coalition in the Knesset. Instead, Likud scored 6 more seats than its closest rival, and Bibi was able to remain prime minister for a 4th term.

What about the betting markets? On the day before the election, while noting that the election was a rare instance of an “actual tossup”, the New York Times also noted that Hypermind was giving Netanyahu a 55% chance of staying prime minister. In fact, Hypermind had clearly kept Netanyahu in the favorite seat all through the campaign.

BIBISTAY

Which now brings us to the UK general election 2015. It concluded yesterday with a big win for David Cameron’s Conservative Party, a hair’s breadth away from an absolute majority in parliament. This was in contrast to all the polling data which had Labour tied with the Conservatives, both very far from a majority. Based on the poll projections of a hung parliament, the pundits could not see how Cameron could gather a governing coalition, even when adding up UKIP and the LibDems. Everyone gave Labour’s Miliband a much better chance of forming a government, with tacit support from the Scottish National Party. In fact, the polls gave Labour+SNP a clear majority in the House…

The story was different in the betting markets. At worst, Cameron’s chances of forming the next government remained close to 50%, tied with Labour’s Miliband’s, a far cry from the large Labour advantage everyone assumed from the parliamentary arithmetic based on poll projections. On Hypermind, a Cameron rebound even occurred just before election day.

UKPRIME-full-en

It will take some time to understand why election polls, which had served the media so well for so long, seem to be suddenly experiencing a global meltdown. Perhaps the simple, powerful idea of the “representative panel” just no longer works well when individualism is pushed to the extreme in modern societies…

What is encouraging, though, is that betting markets – an approach that preexisted polls by decades – are proving more reliable, especially when the going gets tough. This is probably related to the idea, explored earlier in this blog, that predicting human affairs is in general better left to human brains than to algorithms and statistics.

Why you need collective intelligence in the age of big data

Illustration (c) Philippe Andrieu

There’s an old joke about someone who has lost his car keys and keeps looking for them under a street light, but with no success. After a while, a policeman finally asks why he doesn’t extend his search elsewhere. “Because that’s where the light is,” answers the man.

The current obsession with Big Data is somewhat reminiscent of this so-called “street light effect” – the tendency to look for answers where they are easiest to look for, not most likely to be found.

In fact, whether or not a big-data search party is likely to discover something useful really depends on the kinds of data that are at hand. Computers are really good at processing data that are well structured: digital, clean, explicit and unambiguous. But when the data are unstructured – analog, noisy, implicit or ambiguous – human brains are better at making sense of them.

Whereas a single human brain, or a modest personal computer, may deal with small data sets of the preferred kind, the “bigger” the data is, the more computing power has to be brought to bear. In the case of structured data, bigger computers will come in handy, but in the case of unstructured data – the kind computers can’t properly deal with – there’s also a hard limit on how much computing power a single human brain can deliver. So the best way to make sense of big unstructured data sets is to tap into the collective intelligence of a multitude of brains.

Big Data vs Collective Intelligence

The best kind of computing power to bring to bear on big data depends on the kind of data that has to be processed. Collective intelligence delivers the best performance when dealing with big unstructured data sets.

When the goal is to peer into the future, statistical big-data approaches are especially brittle, because the data at hand are necessarily rooted in the past. That’s ok when what you are trying to forecast is extremely similar to what has already happened – like a mature product line in a stable market – but it breaks down disgracefully when you are dealing with brand new products or disrupted markets.

Here are just a few examples of situations we have encountered where collective forecasting proves superior to data-driven projections:

Disrupted market: When in the mid-2000s the worldwide demand for dairy products suddenly increased three-fold in the space of a few months, after a decade of stability, dairy producers could no longer rely on their data-driven forecasting models. Instead, they tapped into the collective forecasting insights of their people on the ground, closest to the customers, to better understand and model the new demand drivers.

New products: A few years ago Lumenogic collaborated with a team of marketing researchers to run a prediction market within a Fortune 100 consumer packaged goods firm, focusing on new products. When compared to the forecasts issued from the classic data-driven methods, the researchers found that the collective forecasts provided superior results in 67% of the cases, reduced average error by approximately 15 percentage points, and reduced the error range by over 40%.

Political elections: In the past 20 years, prediction markets have become famous for their ability to outperform polls as a means to forecast electoral outcomes. So much so that a slew of distinguished economists eventually petitioned the U.S. government to legalize political betting for the benefit of society – which it did recently, to some extent, as evidenced by the recent launch of PredictIt. The big-data camp fought back in the form of poll aggregators, as popularized by statistical wizard Nate Silver, and further enriched by other non-poll data sets such as campaign contributions, ad spend, etc. To no avail. In last November’s U.S. midterm elections, the collective intelligence of Hypermind’s few hundred (elite) traders outperformed all the big data-driven statistical prediction models put forth by major media organizations. That’s because the wisdom of crowds is able to aggregate a lot of information – unstructured data – about what makes each election unique, whereas this data lies out of the reach of statistical algorithms, however sophisticated.

Despite the current and growing flood of digital data – the kind computers and algorithms can deal with – we should not lose sight of the fact that the world offers orders of magnitude more unstructured data – the kind only human brains can collectively make sense of. So if you ever find yourself searching fruitlessly under that big-data street light, remember that collective intelligence may provide just the night goggles you need to extend your search.

How accurate is the Hypermind prediction market?

Hypermind sells predictions, so the first question that comes up is usually: “how accurate are they?”. We have now accumulated enough data to be able to take a deep look, and the results are very good.

But before we dive in, let’s be clear about what we mean by “accuracy”. Market predictions are typically expressed as probabilities: the market won’t say “Event E will occur”, it will say instead: “There is a 70% chance that event E will occur”. Implicit in that statement is that there also is a 30% chance that event E won’t occur… Which means that any single prediction like this cannot be considered right or wrong, whatever happens.

However, over many predictions, accuracy can be measured as a product of both calibration and discrimination:

Calibration – Predictions are said to be well calibrated when the events deemed more probable do occur more often, and those deemed less probable in fact occur less often. For example, if we consider all the events to which the market ascribed 30% probability, we should observe that 30% of them actually do occur. Similarly, if we consider all the events to which the market ascribed 80% probability, we should observe that 80% of them actually do occur. And so on.

Discrimination – This is a measure of how extreme the predictions are. The closer they are to 0% (absolutely unlikely) or 100% (absolutely likely), the more discriminating they are said to be. Decision makers like predictions that are discriminating because they are more actionable.

accuracy-en-sqr

Only God’s predictions could be both perfectly calibrated and perfectly discriminating: events would always be predicted to be 0% likely or 100% likely, and the prediction would always be correct. Barring such perfection, calibration is preferable to discrimination: a fuzzy but generally correct forecast is better than a categorical but misleading forecast.

HYPERMIND DATA

Let us now turn to Hypermind’s data. The prediction market has been operating since May 16th, 2014 with a panel of a few hundred traders recruited and rewarded based on performance.(1)

At this point, 75 questions of political, geopolitical, economic and business nature have been settled: questions about elections in Europe, the U.S., Brazil, Afghanistan and elsewhere, the P5+1 negotiations with Iran over its nuclear program, the war in Ukraine, the GE takeover of Alstom, the ECB stress test, the price of oil, and a whole lot more. The time horizon for the predictions in this data set was in the range of a few days to a few months. All in all, 41,442 trades have been conducted on 196 possible outcomes.

As the chart below illustrates, Hypermind’s predictions are well calibrated. The chart plots the percentage of events that occur at each price level between 1 and 99H (the market’s virtual money). It shows that the prices at which various outcomes are traded on the market can readily be interpreted as realistic probabilities for those outcomes, give or take a few percentage points.


To generate this chart, we recorded the price of each traded outcome every day at noon, grouped all outcomes traded at the same price and computed the percentage of them that actually occurred. The closer the data points are to the diagonal in the chart, the more the market’s prices predict true probabilities in the real world. Some data points are larger than others to indicate the relative number of outcomes traded at each price level. (The colors, however, are just for show!)

To assess discrimination, it is visually useful to plot the same data at a coarser level by clustering prices in ten intervals of 10H each. As the larger data points include more observations, we can see that most trades occur at price points closer to the extremes, where predictions are more certain, than towards the middle, around 50H, where uncertainty is at its peak.

calibration081214-10

By this measure, Hypermind is also usefully discriminating: For instance, on a daily basis, two thirds of its predictions (64%) indicate outcome probabilities below 20% (very unlikely) or above 80% (very likely). Similarly, 80% of its predictions are either unlikely (below 30%) or likely (above 70%).
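The discrimination figures quoted above are simple shares of forecasts falling outside a middle band. A minimal sketch of that computation follows; the function name and the sample forecasts are hypothetical and chosen only to illustrate the arithmetic, not taken from Hypermind's data.

```python
def extreme_share(probabilities, low, high):
    """Fraction of forecasts strictly below `low` or strictly above `high`."""
    extreme = [p for p in probabilities if p < low or p > high]
    return len(extreme) / len(probabilities)


# Hypothetical daily forecasts, expressed as probabilities.
daily = [0.03, 0.05, 0.12, 0.25, 0.48, 0.55, 0.74, 0.85, 0.93, 0.97]

print(extreme_share(daily, 0.20, 0.80))  # 0.6: very unlikely or very likely
print(extreme_share(daily, 0.30, 0.70))  # 0.8: unlikely or likely
```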

COMPARISON POINTS

This analysis shows that Hypermind’s predictions are both accurate and actionable, but it tells us little about the intrinsic difficulty of the questions, or about how well other forecasting methods might have done in comparison on those same questions.

Unfortunately for this purpose, only a few of the questions addressed by Hypermind so far have also been systematically forecasted by other methods or venues. That is partly by design, because the value of Hypermind predictions depends as much on their exclusivity as on their accuracy. We would rather focus on important questions that only few – but the right few – care about, than on entertaining issues that everybody else is already forecasting.

A particularly interesting point of comparison is with the Good Judgment Project, a multi-million dollar research project sponsored by the U.S. government’s Intelligence Advanced Research Projects Activity.(2) Since August 2014, Hypermind has been allowed to forecast several dozen of the same geopolitical questions submitted to the Good Judgment forecasters. Based on the score of questions that have closed so far, Hypermind seems to be performing very well. However, there isn’t enough data yet to draw firm conclusions, so this is an issue we will revisit at a later date when more questions have closed.

In the meantime, events like political elections are both important and entertaining, and are widely forecasted. In an earlier post, we documented how Hypermind outperformed all the big-data statistical poll-aggregation models (aka Nate Silver and friends) when predicting the results of the 2014 U.S. midterm elections.

Although the comparative data is still sparse, it clearly suggests that Hypermind exhibits excellent accuracy not so much because the predictions are easy, but because it performs at a best-in-class level.


NOTES

(1) The first few hundred Hypermind traders were recruited based on remarkable performance in various prediction markets operated by NewsFutures and Lumenogic between 2000 and 2014.

(2) Full disclosure: Lumenogic, one of the firms backing Hypermind, has also been a member of the Good Judgment Project research team since 2012. Indeed, some of the prediction market technology used by Hypermind was originally developed by Lumenogic for this purpose.

Hypermind correctly predicted no deal with Iran on nuclear centrifuges

The year-long negotiations with Iran over its nuclear program have failed to reach an agreement by the November 24 deadline. At issue, in particular, was the number of centrifuges that Iran would be allowed to operate to enrich its uranium into weapons-grade material. It currently operates about 10,000, while the P5+1 countries initially aimed to bring that number below 4,000.

Starting in mid-September, Hypermind featured a prediction market on this question, as part of a geopolitical contest featuring questions formulated by the Intelligence Advanced Research Projects Activity (IARPA) ACE project.

Iran centrifuges

As the chart shows, Hypermind’s forecast was correctly dire throughout the negotiations, predicting that no deal would be reached on that critical issue. Only briefly did it dip down from around 80% probability of “no agreement” to 50/50 uncertainty. The initial dip was caused by reports that the P5+1 countries, growing desperate for a deal, might allow Iran to operate 5,000 or 6,000 centrifuges… But the Hypermind prediction traders quickly resolved that this wouldn’t save the negotiations.

Hypermind out-predicts big-data models in the 2014 U.S. midterm elections

With Intrade gone and the rise of sophisticated statistical models à la FiveThirtyEight operated by various U.S. media, we haven’t heard much about prediction markets during the 2014 U.S. midterm election cycle. It was as if the allure of big data and statistical rock stars like Nate Silver had eclipsed the robust and well-documented success of collective human intelligence. Are prediction markets doomed to be roadkill on the big-data super highway?

Not so fast.

In head-to-head comparisons, the Hypermind prediction market offers evidence that the aggregated brain power of a prediction market can still outpredict the much-hyped statistical machines.

Hypermind listed several stocks on the 2014 U.S. midterm elections, focusing on control of the Senate and the 5 most undecided individual races in Kansas, Iowa, North Carolina, Colorado, and Georgia. This allows comparisons between Hypermind’s predictions and those of the 7 major statistical models: FiveThirtyEight (Nate Silver), Washington Post, New York Times, Huffington Post, Princeton Election Consortium, PredictWise, and Daily Kos.

In the analysis below we are comparing the predictions of each model against Hypermind, against each other, and against the average prediction of the 7 models. Importantly, we are not just comparing predictions made on election day, but throughout the weeks or months – depending on the question – during which the market and all models were simultaneously spewing predictions.*

Accuracy is measured using Brier scores, which compute the squared error between the predictions and the true outcomes. The smaller the Brier score, the better the prediction: a perfect prediction has a Brier score of 0, while a chance prediction – think 50/50 – has a Brier score of 0.5, and a totally wrong prediction scores 2.
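For readers who want to check these reference values, here is a minimal sketch of the Brier score for a yes/no question. It assumes the two-outcome formulation (squared error summed over both "yes" and "no"), which is the one consistent with the 0, 0.5 and 2 values quoted above; the function name is illustrative.

```python
def brier_score(p_yes, occurred):
    """Two-outcome Brier score: squared error summed over 'yes' and 'no'.

    p_yes: forecast probability (0.0 to 1.0) that the event occurs.
    occurred: True if the event actually occurred, False otherwise.
    """
    outcome = 1.0 if occurred else 0.0
    error_yes = (p_yes - outcome) ** 2
    error_no = ((1.0 - p_yes) - (1.0 - outcome)) ** 2
    return error_yes + error_no


print(brier_score(1.0, True))  # 0.0: perfect prediction
print(brier_score(0.5, True))  # 0.5: chance prediction
print(brier_score(0.0, True))  # 2.0: totally wrong prediction
```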

To get a sense of how the methods compared overall, we computed for each question the Brier score of each method every day throughout the comparison period. Then we averaged those daily Brier scores into a mean daily Brier score for each method and each question. Then we averaged those across the 6 questions to get an overall mean daily Brier score for each method.**
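The two-level averaging just described (first over days within a question, then over questions) can be sketched as follows. The question names, daily probabilities and function names are hypothetical, and the two-outcome Brier formulation is assumed.

```python
def brier_score(p_yes, occurred):
    """Two-outcome Brier score for one forecast on one day."""
    outcome = 1.0 if occurred else 0.0
    return (p_yes - outcome) ** 2 + ((1.0 - p_yes) - (1.0 - outcome)) ** 2


def mean_daily_brier(daily_probs, occurred):
    """Average the daily Brier scores of one method on one question."""
    scores = [brier_score(p, occurred) for p in daily_probs]
    return sum(scores) / len(scores)


def overall_mean_daily_brier(questions):
    """questions: dict mapping a name to (daily probabilities, outcome).

    Returns (per-question mean daily scores, their unweighted average).
    """
    per_question = {
        name: mean_daily_brier(probs, occurred)
        for name, (probs, occurred) in questions.items()
    }
    overall = sum(per_question.values()) / len(per_question)
    return per_question, overall


# Hypothetical forecasts from one method on two questions.
questions = {
    "Question A": ([0.5, 1.0], True),   # daily scores 0.5 and 0.0
    "Question B": ([0.0, 0.5], False),  # daily scores 0.0 and 0.5
}
per_question, overall = overall_mean_daily_brier(questions)
print(per_question)  # each question averages 0.25
print(overall)       # 0.25
```

Averaging per question first means that a question with a long comparison period does not outweigh one with a short period in the overall score.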

The chart below plots the results. By this measure, all models except Princeton’s did slightly better than chance, but Hypermind out-predicted all of them, including the average prediction of all the models (“Models Mean”).

midterms2014Overall

We then took a closer look at these elections’ most important question: would Republicans win control of the Senate? In this case, Hypermind again out-performed all the models, as can be seen in the chart below. Except for the Washington Post’s, all the models remained, throughout the comparison period, much less confident than Hypermind in the Republicans’ ultimate control of the Senate.

 midterms2014Senate

The Washington Post model, although more unstable than any other – notice the large dip around 50% from late August to mid-September – did particularly well at the end of the campaign, so Hypermind’s advantage isn’t as visually obvious as it is against the other models. However, if we compute the average daily Brier scores over the entire period during which the Washington Post and Hypermind operated in parallel – from early July to election day – we find a 36% accuracy advantage for Hypermind (.096) over the Washington Post (.150).
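The 36% figure follows directly from the two scores quoted above, read as the proportional reduction in Brier score relative to the Washington Post. A small sketch of that arithmetic, with an illustrative function name:

```python
def relative_advantage(score, baseline):
    """Proportional reduction in Brier score versus a baseline (lower is better)."""
    return (baseline - score) / baseline


# Scores quoted in the text: Hypermind .096, Washington Post .150.
print(round(relative_advantage(0.096, 0.150), 2))  # 0.36
```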

There is an important lesson to be learned here: even in this age of big data and super computers, human collective intelligence is still our best means of predicting the future. Isn’t that reassuring?

Notes
(*) The periods of comparison for each question were as follows: Senate Control [Sept. 3 to Nov. 4]; IA, KS, CO, NC [Oct. 9 to Nov. 4]; GA [Oct. 20 to Nov. 4].
(**) Computing mean daily Brier scores over entire forecasting periods, like we do here, is also how the geopolitical predictions of the IARPA-sponsored Good Judgment Project are being scored by the U.S. Government.
Data Sources
Hypermind’s daily closing prices for each contract are available for download in Excel format.
Models’ data were recorded by the New York Times here and here. Available for download courtesy of the Upshot’s Josh Katz.