[Updated Jan. 2020 to cover unified definition, and clarify]
Concept
Replication is when you repeat a previous study to see if you get the same results. Ideally this happens a lot in science. In practice, not as much. But what counts?
According to Brian Nosek (“What is replication?”, 2019), people commonly say replication is repeating a study’s procedure and observing whether the previous finding recurs. But this definition fails, because no replication repeats the procedure exactly: something always changes, and judging which changes should not matter is what defines the replication.
Consider: when replicating an Israeli study in the US, he didn’t use the original materials – they were in Hebrew! Replications change participants, campus, country, language, etc. But if any of these negated something like the Stroop effect, that would be a surprising and important discovery.
So, Nosek argues, replication is really a conceptual notion:
Replication is attempting to reproduce a previously observed finding with no reason to expect a different outcome.
(He says “no a priori reason” — reasons made up afterwards don’t count. We also assume competent, good-faith replication.)
Claiming something is a replication “is a theoretical commitment.”
Practical Definition for DARPA SCORE
Originally, we distinguished between direct replications and data-analytic replications, and privileged direct replications. Direct replications test the original claim by gathering new data, for example by running a new psychology experiment.
Data-analytic replications test the original claim using new “found” data: existing data that is appropriate for replication but was not collected by the replication team itself. For example, a replication might use the same economic indicator as the original researchers, but from a different time period.
But direct replications exclude most economics, sociology, and political science. So, as of February 2020 (Round 6), we will adopt DARPA’s new unified definition:
A replication is testing the same claim using data that was not used in the original study.
Unifying increases the number of evaluated claims in SCORE from 100 to 250. (However, R1-R5 markets will only pay prizes for direct replications, because that’s what we said then.)
Sample Size & Statistical Power
A decent replication has to have a sample large enough that failure to detect a result is almost certainly due to the claim being wrong, rather than not looking hard enough. If we redo the study but with only 3 participants, the result is almost certainly noise.
We cannot just use the original sample size: most published studies are actually too small. Therefore we need to ensure sufficient power.
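To make the sample-size point concrete, here is a minimal sketch of a power calculation for a two-sided comparison of two group means, using the normal approximation. The function names, the effect size of 0.5, the 0.05 significance level, and the 90% power target are illustrative assumptions only; they are not SCORE's actual criteria, and an exact t-test calculation would ask for slightly more participants.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    effect_size is the standardized mean difference (Cohen's d).
    Uses the normal approximation, so it is slightly optimistic for small n.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Chance the test statistic lands in either rejection tail.
    upper_tail = 1 - _STD_NORMAL.cdf(z_crit - shift)
    lower_tail = _STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def n_per_group_for_power(effect_size, power=0.9, alpha=0.05):
    """Smallest per-group n reaching the target power (normal approximation)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    d = 0.5  # hypothetical effect size, chosen only for illustration
    print("power with 3 per group:", round(approx_power(d, 3), 3))
    print("n per group for 90% power:", n_per_group_for_power(d, power=0.9))
```

Under these assumptions, a three-per-group study has very little chance of detecting the effect even if it is real, which is why a null result from such a study says almost nothing about the claim.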
You might want to read:
- What is statistical power?
- What is a high-quality replication?
- FAQ: What is the replication crisis?