We have big changes to announce! Read on to learn about:
- simpler definition of replication
- more markets paying out
- simpler surveys
Simpler Definition of Replication
SCORE has adopted a simpler, unified definition of replication:
Replication is testing the same claims using data that was not used in the original study.
That required some changes from us. Starting in Round 6, Replication Markets will no longer distinguish between “data replication” and “direct replication.”
Studies selected for replication will use the most appropriate type, and all will count for market resolutions.
More Markets Pay Out!
The new definition of replication increases the expected number of market replications from about 50 in Rounds 1-5 to about 125 in Rounds 6-10.
Simpler Surveys
Survey-Takers, rejoice! The new definition also means we can simplify our surveys!
Surveys now have only 4 questions (rather than 7). In essence, they ask:
- What is the probability that a high-power replication would succeed?
- What fraction of participants will give an estimate larger than 50% on the previous question?
- How plausible is this claim?
- Is there anything else we should know?

With the change, the other three questions became redundant. We considered keeping them anyway, but they were providing little information: Old Q3 was 93% correlated with old Q1, with a slope of 1. (We will make this data available to explore after claims resolve.)
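For readers curious what "highly correlated, with a slope of about 1" looks like in practice, here is a minimal sketch. The numbers are made up for illustration and are not our survey data, and the helper name `pearson_and_slope` is hypothetical. It computes the Pearson correlation and the least-squares slope between answers to two questions; when both are close to 1, the second question mostly repeats the first.

```python
from math import sqrt

def pearson_and_slope(x, y):
    """Return (Pearson r, least-squares slope of y on x) for paired answers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-deviations
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    return r, slope

# Made-up answers (in percent) to two hypothetical survey questions
q1 = [20, 35, 50, 65, 80]
q3 = [25, 30, 50, 70, 75]

r, slope = pearson_and_slope(q1, q3)
print(f"correlation = {r:.2f}, slope = {slope:.2f}")
```

A high correlation alone would only say the two answers move together; a slope near 1 adds that they move by about the same amount, which is why the extra question adds so little.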
9 thoughts on “Changes in Round 6”
[…] to the changes in SCORE’s definition of “replication,” (see our blog post), more claims will resolve. This necessitates a change in our prize structure and approval by the […]
[…] change in definition of “replication” (see our blog post on this topic) more than doubled the number of claims that will resolve in Round 6 and later. This has affected […]
But what if it is impossible to find data ‘not used in the original study’ that is still a direct test of the claims? … if the claims are very specific and not based on a replicable experiment (e.g., ‘impact of XXX on the incidence of civil war in Latin America in the 20th century’)
@david – Good question. If the claims made it to RM, then it’s likely the replication team at Center for Open Science thought there was a path to replication. Now, it’s possible some claims sneaked in that can only be tested for “reproducibility” – in which case you would want to skip them in the market. But it’s also possible COS thought the relevant claim was broader than that particular data set. In this case, is the claim only about 20thC Latin America, or would it apply to 19th or 21st C Latin America? Or 20thC Africa?
(Back at the start, we suspected forecasters would be able to guess which claims would never be tested, so the old survey asked about that. But we weren’t seeing signal there, so we dropped it in favor of getting more, shorter surveys for the same effort.)
Thanks for asking!
OK thanks. But some traders may assume that these claims will be 1. chosen at random, and 2. if only testable for reproducibility/coding replication they *would* be tested/paid. That was my impression.
I’d be interested to get a broader impression. Are you on Reddit? We started using r/ReplicationMarkets for the Preprint Markets and are seeing more discussion there than we did here.
Context: It turns out the pandemic forced COS to do far fewer replications than planned, and COS is making up the gap with reproductions. This may be especially true of Round 11 (100 COVID claims). Our read is we cannot use reproductions because the Rules specified direct/data replications, and forecasting a reproduction is very different.
I’d love to be able to use the reproductions, but given the Rules and our responses to questions, I don’t see a clear path. We can probably assume that if a group of replications becomes reproductions, the forecast chances should increase, unless, as you say, most forecasters were really treating them as reproductions anyway.
Among all replications that become reproductions, we could perhaps assume the **rank order** of forecasts remains the same. But that’s not as clear, and even if true, may not get us very far.
I’m so sorry, I just saw this comment; I am bad with WordPress it seems.
It is indeed challenging
It looks like we may have 100 usable replication results in the near future. Still far fewer than expected, but maybe enough to say “good enough”.