The use of p-values and hypothesis testing is ubiquitous across science. Everyone remembers the rule they learnt in their first statistics course: 0.05 is the magic cut-off for statistical significance. Statistical evidence with a p-value below 0.05 is accepted; models with p-values above this threshold are discarded.
However, this practice has come under increasing criticism. In 2016 the American Statistical Association (Wasserstein & Lazar, 2016) released a statement setting out six principles for interpreting p-values, with the aim of encouraging data analysis to go beyond the p-value. Benjamin et al. (2018) proposed lowering the default threshold for statistical significance from 0.05 to 0.005.
We looked at the replication outcomes and associated original p-values of 104 studies. These 104 studies were replicated as part of the Replication Project Psychology (RPP), Many Labs 2 Project (ML2), Experimental Economics Replication Project (EERP), or the Social Science Replication Project (SSRP).
Of the hypothesis tests from these 104 studies, 5 (4.8%) had p-values above 0.05 (significant only at the 10% level), 40 (38.5%) had p-values between 0.01 and 0.05 (significant at the 5% level), 25 (24.0%) had p-values between 0.001 and 0.01, and 34 (32.7%) had p-values of 0.001 or below.
Analysing the replication rates in each of these categories, we can see that the p-value of the original study alone is a good predictor of replication success:
- For p-value <= 0.001, the replication success rate is 82%
- For 0.001 < p-value <= 0.01, the replication success rate is 44%
- For p-value > 0.01, the replication success rate is 27%
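As a quick arithmetic check, the short Python sketch below weights each category's success rate by its number of studies (34, 25 and 5 + 40 = 45, from the counts above). The dictionary layout is only illustrative. Because the category rates are rounded, the result is approximate, but it lands at roughly 49%, in line with the overall replication rate of these projects.

```python
# Category sizes among the 104 original studies, paired with the
# replication success rate reported for each category.
categories = {
    "p <= 0.001": (34, 0.82),
    "0.001 < p <= 0.01": (25, 0.44),
    "p > 0.01": (5 + 40, 0.27),  # 0.01 < p <= 0.05 plus p > 0.05
}

total_studies = sum(n for n, _ in categories.values())

# Expected number of successful replications implied by the (rounded) rates.
expected_successes = sum(n * rate for n, rate in categories.values())
overall_rate = expected_successes / total_studies

print(total_studies)           # 104
print(round(overall_rate, 3))  # 0.491
```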
This information gives us a useful starting point for forecasting replication rates. When a prediction market is set up, each asset is given an initial price. In previous replication markets (RPP, EERP, SSRP and ML2), initial prices were set at 0.5. This was a simple, straightforward starting price and not too far off the overall replication rate (around 49%). However, the figures above show that the p-value of the original study lets us make a much more informed initial estimate of a study's probability of replicating.
To set the initial prices, we will assign each study to one of three categories based on its p-value:
- For studies with p-values of 0.001 or below the initial price will be 0.8
- For studies with p-values above 0.001 and up to 0.01 the initial price will be 0.4
- For studies with p-values higher than 0.01 the initial price will be 0.3
These initial prices are simply the historical replication success rates for each category (82%, 44% and 27%), rounded to one decimal place.
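The assignment rule can be sketched in Python as follows. This is a minimal illustration, not the code used to run the markets: the names `HISTORICAL_RATES` and `initial_price` are hypothetical, and placing the boundary values 0.001 and 0.01 in the lower category is an assumption based on the "0.001 or below" and "<= 0.01" wording above.

```python
# Historical replication success rates per p-value category, as reported above.
# Each entry is (upper bound on the original p-value, success rate).
HISTORICAL_RATES = [
    (0.001, 0.82),  # p <= 0.001
    (0.01, 0.44),   # 0.001 < p <= 0.01
    (1.0, 0.27),    # p > 0.01
]


def initial_price(p_value: float) -> float:
    """Hypothetical helper: initial market price for a study given its original p-value.

    The price is the historical success rate of the study's category,
    rounded to one decimal place.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError(f"p-value must lie in [0, 1], got {p_value}")
    for upper_bound, rate in HISTORICAL_RATES:
        if p_value <= upper_bound:
            return round(rate, 1)
    # Unreachable: the last upper bound is 1.0 and p_value <= 1.0 was checked above.
    raise AssertionError("no category matched")


for p in (0.0004, 0.005, 0.03, 0.07):
    print(p, initial_price(p))
```

Deriving each price from the rate with `round(rate, 1)` rather than hard-coding 0.8, 0.4 and 0.3 keeps the prices tied to the underlying data, so updated replication rates would change the prices automatically.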
- Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., … Johnson, V. E. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6. https://doi.org/10.1038/s41562-017-0189-z