Analysis of survey and prediction market data from previous large-scale replication projects

By Domenico Viganola

How reliable and representative are the research findings in scientific publications?

Motivated by concerns about replication rates in the social and behavioural sciences, a number of large-scale replication projects were initiated (see Ref. 1).

The typical replication project aims to evaluate a large sample of studies from a specific research field through direct replication, which involves collecting and analysing new experimental data with a methodology as similar as possible to that of the original study.

When the main replication indicator is defined as a statistically significant effect in the same direction as in the original study (typically p < 0.05 in a two-sided test), the ‘successful’ replication rates range from 39% to 62%, meaning that roughly two fifths to three fifths of the selected studies find results consistent with the original ones.

All the projects share a similar structure, and the participants’ forecasts are elicited in an analogous way: before the replication outcomes become public information, peer researchers first take part in a survey eliciting their beliefs about the replication probability of each selected claim, and thereafter participate in prediction markets that last, on average, two weeks. Like the surveys, the prediction markets are designed to predict which studies are likely to replicate and which ones are not.

By pooling the prediction market and survey data from these four projects, we created a dataset with the elicited peer beliefs about the replication outcomes of 104 published studies, mainly in the fields of experimental psychology and experimental economics. Pooling the data provides substantially more statistical power for testing the performance of the prediction markets and surveys than the individual projects, each of which is based on relatively few observations. Overall, 51 of the 104 studies replicated.
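
To make the pooling step concrete, here is a minimal sketch of how per-project data could be combined into one dataset; the file names, project labels, and column names are illustrative assumptions, not the actual files used in the analysis.

```python
import pandas as pd

# Hypothetical per-project files; labels and column names are assumptions,
# not the actual pooled dataset used in the analysis.
project_files = {
    "project_1": "project_1.csv",
    "project_2": "project_2.csv",
    "project_3": "project_3.csv",
    "project_4": "project_4.csv",
}

frames = []
for project, path in project_files.items():
    df = pd.read_csv(path)       # one row per original study
    df["project"] = project      # record which replication project it came from
    frames.append(df)

# Pooled dataset: one row per study, with market price, mean survey belief,
# and a binary replication outcome (column name "replicated" is assumed).
pooled = pd.concat(frames, ignore_index=True)
print(len(pooled), "studies;", int(pooled["replicated"].sum()), "replicated")
```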

Interpreting a predicted replication probability above 50% as predicting a successful replication and a predicted replication probability below 50% as predicting a failed replication, we find that the prediction markets correctly predicted 76 of the 104 studies (73%), and the survey correctly predicted 68 of 103 studies (66%).
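
This classification rule and the resulting accuracy can be illustrated with a short sketch; the arrays below are placeholders, not the actual pooled beliefs and outcomes.

```python
import numpy as np

# Placeholder data: final market prices (or mean survey beliefs) in [0, 1]
# and binary replication outcomes (1 = replicated, 0 = did not replicate).
beliefs = np.array([0.72, 0.31, 0.55, 0.18, 0.83])
outcomes = np.array([1, 0, 1, 0, 1])

# A belief above 0.5 counts as predicting a successful replication.
predicted_replication = beliefs > 0.5

# Accuracy: share of studies whose predicted outcome matches the observed one.
accuracy = np.mean(predicted_replication == outcomes.astype(bool))
n_correct = int(round(accuracy * len(outcomes)))
print(f"Correctly predicted {n_correct} of {len(outcomes)} studies ({accuracy:.0%})")
```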

Both the markets and the surveys are more accurate when concluding that a study will not replicate than when concluding that a study will replicate, and the prediction accuracy of the markets is higher than that of the surveys. Both the prediction market estimates and the survey estimates of the likelihood of successful replication are highly correlated with the replication outcomes of the studies selected for replication, suggesting that studies that replicate differ systematically from studies that do not, and that the difference is to some extent identifiable in advance. This implies that peer beliefs can be elicited to obtain important information about the reproducibility of scientific claims.
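
A hedged sketch of both comparisons: accuracy conditional on the predicted direction, and the rank correlation between beliefs and outcomes (computed here with SciPy's spearmanr, standing in for however the reported correlations were obtained). The data are again placeholders.

```python
import numpy as np
from scipy.stats import spearmanr

# Placeholder beliefs in [0, 1] and binary replication outcomes.
beliefs = np.array([0.72, 0.31, 0.55, 0.18, 0.83, 0.44])
outcomes = np.array([1, 0, 0, 0, 1, 1])

predicted_replication = beliefs > 0.5

# Accuracy conditional on the predicted direction: how often "will replicate"
# and "will not replicate" calls turn out to be correct.
acc_predicting_success = np.mean(outcomes[predicted_replication] == 1)
acc_predicting_failure = np.mean(outcomes[~predicted_replication] == 0)

# Spearman rank correlation between elicited beliefs and replication outcomes.
rho, p_value = spearmanr(beliefs, outcomes)

print(f"Accuracy when predicting replication:     {acc_predicting_success:.0%}")
print(f"Accuracy when predicting non-replication: {acc_predicting_failure:.0%}")
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```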

Figure 1: Prediction market and survey beliefs for the probability of successful replication. The figure shows the beliefs elicited through prediction markets (filled dots) and through surveys (hollow dots with a cross) for each of the studies in the pooled dataset. The replication studies on the y-axis are ranked by the final prices of the prediction markets, with the studies least likely to replicate at the bottom and the studies most likely to replicate at the top. Both the prediction market beliefs and the survey beliefs are highly correlated with successful replication (Spearman correlations = 0.567, p < 0.001, n = 104 for the prediction markets and 0.557, p < 0.001, n = 103 for the survey).

As for which elicitation method performs better at aggregating beliefs into accurate forecasts, the empirical analyses of the available data suggest that the markets perform somewhat better than simple survey averages. However, alternative methods of aggregating the survey responses are likely to yield more predictive survey-based forecasts.
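
The text does not specify which alternative aggregation methods might help. One common candidate is to average individual survey forecasts on the log-odds scale rather than taking the simple mean; the sketch below illustrates that idea and is not the authors' method.

```python
import numpy as np

def simple_mean(forecasts):
    """Plain average of individual survey probabilities."""
    return float(np.mean(forecasts))

def logit_mean(forecasts, eps=1e-6):
    """Average the forecasts on the log-odds scale and map back to a
    probability (equivalent to the geometric mean of odds); one common
    alternative to the simple mean, shown here purely for illustration."""
    p = np.clip(np.asarray(forecasts, dtype=float), eps, 1 - eps)
    mean_log_odds = np.log(p / (1 - p)).mean()
    return float(1.0 / (1.0 + np.exp(-mean_log_odds)))

# Illustrative individual forecasts for a single study (not real data).
forecasts = [0.2, 0.35, 0.3, 0.6, 0.25]
print(f"simple mean: {simple_mean(forecasts):.3f}")  # 0.340
print(f"logit mean:  {logit_mean(forecasts):.3f}")   # ~0.330, slightly more extreme
```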
