C19 Preprint Resolution Details


In late 2020 we asked forecasters to predict the fate of 400 popular COVID-19 preprints posted to bioRxiv and medRxiv between January and September of that year. For each we asked:

  • Would it be published within a year of posting, and if so, would it be in a high-impact (JIF≥10) journal?  [No / High / Low]
  • Rank its citations on a 0..100 scale for the 400 preprints, as of a year after posting, as counted by Google Scholar. (Including cites to the published form, if any.)
We ran both surveys and markets, but we’re concerned with markets here. There were glitches: between preprint selection and market launch, many preprints had already been published. That glitch revealed large liquidity problems due to fewer forecasters than expected. Happily, liquidity can be adjusted by adding points, which we did. In the end we had over 14K trades by over 63 forecasters.

Now it’s time to resolve the questions. Then we can pay prizes. 


Resolving Publication

To resolve publication within a year of posting, we:

  • checked bio/medRxiv for links to the published version
  • consulted @scottleibrand’s spreadsheet
  • reviewed email tips from forecasters (esp. @vitorper, @unipedal)
  • ran web searches for authors and key text
  • resolved conflicts among the previous
  • deduplicated one preprint (Hess-2020) that had two entries on medRxiv, one marked published and the other not

There were 137 unique journals.  Top destinations were:

Nature                        22
Science                       16
Nature Communications         14
Emerging Infectious Diseases   7
Eurosurveillance               7

To resolve high/low impact factor (2020 JIF ≥10), we had several people use the JCR 2020 index (free account) and resolved discrepancies by manual inspection:

  • In 4 cases one source reported a JIF >10 and the other <10.
  • 3 of these were caused by the JIF trending across the threshold, combined with either JCR updating its data or our reading the 2019 value.
  • 1 was caused by looking up a similarly named journal.
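The discrepancy check described above amounts to asking whether two readers’ JIF lookups straddle the threshold. A minimal sketch (the threshold constant and function names are ours, not part of the original process):

```python
JIF_THRESHOLD = 10.0  # 2020 JIF cutoff separating "High" from "Low" impact

def classify(jif: float) -> str:
    """Classify a journal as High or Low impact by its JIF."""
    return "High" if jif >= JIF_THRESHOLD else "Low"

def needs_manual_inspection(jif_a: float, jif_b: float) -> bool:
    """Flag a journal for manual inspection when two sources'
    JIF readings fall on opposite sides of the threshold."""
    return classify(jif_a) != classify(jif_b)

# A JIF trending across the threshold (e.g. 2019 vs 2020 reading) gets flagged:
needs_manual_inspection(9.8, 10.2)   # True
# Two readings on the same side do not:
needs_manual_inspection(12.0, 14.0)  # False
```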


Resolving Citations

Citation counts were gathered from Google Scholar using the DOI of the original preprint and of its published version.

  • When we did this on the 1-year anniversary, the count could be read off directly.
  • Otherwise we used Google Scholar’s citation-by-date function.

However, not all citations are dated. In those cases we estimate actual citations by multiplying total citations by the proportion of dated citations occurring before the 1-year resolution date.

For example, Daneshkhah-2020-medRxiv has the following data:

Total Citations   220
Dated Before       73
Total Dated       114

  • “Total Citations” is the count when we looked (after the 1-year mark).
  • “Dated Before” counts the dated citations occurring before the 1-year resolution date.
  • “Total Dated” is the total number of citations in the citation-by-date function.

Then Estimated = (Total Citations) * (Dated Before) / (Total Dated)
Here, 220 * (73/114) = 140.877
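The estimation step can be sketched in Python; the function name is ours, but the formula and the Daneshkhah-2020 numbers come straight from the worked example:

```python
def estimate_citations(total: int, dated_before: int, total_dated: int) -> float:
    """Scale total citations by the share of dated citations
    that fall before the 1-year resolution date."""
    return total * dated_before / total_dated

# Daneshkhah-2020-medRxiv: 220 total, 73 of 114 dated citations before the cutoff
est = estimate_citations(220, 73, 114)  # ≈ 140.877
```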
These estimates are then transformed into ranks on the 0..100 scale using Pandas’ .rank(pct=True)*100.
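A minimal pandas sketch of the ranking step, using made-up estimated citation counts (the preprint labels are placeholders):

```python
import pandas as pd

# Hypothetical estimated citation counts for four preprints
est = pd.Series({"A": 140.877, "B": 12.0, "C": 55.0, "D": 12.0})

# Percentile rank scaled to (0, 100]; ties (B and D) share the average rank
ranks = est.rank(pct=True) * 100
# A -> 100.0, C -> 75.0, B and D -> 37.5 (tied)
```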

Check Our Resolutions

As noted in the previous post, you are invited to check our resolutions for accuracy.  We hope to finalize these resolutions and announce prizes next week. 
