Review

In late 2020 we asked forecasters to predict the fate of 400 popular COVID-19 preprints posted to bioRxiv and medRxiv between January and September of that year. For each we asked:

  • Would it be published within a year of posting, and if so, would it be in a high-impact (JIF ≥ 10) journal? [No / High / Low]
  • Rank its citations on a 0–100 percentile scale among the 400 preprints, as of one year after posting, as counted by Google Scholar (including citations to the published version, if any).

We ran both surveys and markets, but here we're concerned with the markets. There were glitches: between preprint selection and market launch, many preprints had already been published. That glitch revealed large liquidity problems due to fewer forecasters than expected. Happily, liquidity can be adjusted by adding points, which we did. In the end we had over 14K trades by over 63 forecasters.

Now it’s time to resolve the questions. Then we can pay prizes. 


Resolving Publication

To resolve publication within a year of posting, we:

  • checked bio/medRxiv for links to the published version (see the sketch after this list)
  • consulted @scottleibrand's spreadsheet
  • reviewed email tips from forecasters (esp. @vitorper, @unipedal)
  • searched the web for authors and key text
  • resolved conflicts among the above sources
  • deduplicated one preprint (Hess-2020) that had two entries on medRxiv, one marked published and the other not
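For the first check, bio/medRxiv expose a public API whose details endpoint reports a published DOI when one is known. A minimal sketch, assuming the api.biorxiv.org details endpoint and its "published" field (our reading of the API docs; the example DOI is made up):

```python
import requests

def published_doi(server: str, doi: str) -> str | None:
    """Return the journal DOI for a preprint if bio/medRxiv records one.

    server: "biorxiv" or "medrxiv"; doi: the preprint's DOI.
    Assumes the api.biorxiv.org details endpoint and its "published" field.
    """
    url = f"https://api.biorxiv.org/details/{server}/{doi}"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    records = resp.json().get("collection", [])
    published = records[0].get("published", "NA") if records else "NA"
    return None if published in ("NA", "", None) else published

# Hypothetical usage (made-up DOI):
# print(published_doi("medrxiv", "10.1101/2020.00.00.000000"))
```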

There were 137 unique journals. Top destinations were:

Nature                        22
Science                       16
Nature Communications         14
Emerging Infectious Diseases   7
Eurosurveillance               7

To resolve high vs. low impact factor (2020 JIF ≥ 10), we had several people look each journal up in the JCR 2020 index (free account) and resolved discrepancies by manual inspection:

  • In 4 cases, one source found JIF > 10 and the other JIF < 10.
  • 3 of these were caused by the JIF trending across the threshold, with either JCR updating between lookups or one of us reading the 2019 value.
  • 1 was caused by looking up a similarly named journal.


Resolving Citations

Citation counts were gathered from Google Scholar, using the DOI of the original preprint and of its published version.

  • When we did this on the 1-year anniversary, it was straightforward.
  • Otherwise, we used Google Scholar's citation-by-date function.

However, not all citations are dated. In those cases we estimated actual citations by multiplying total citations by the proportion of dated citations that occurred before the 1-year resolution date.

For example, Daneshkhah-2020-medRxiv has the following data:

Total Citations   Dated Before   Total Dated
            220             73           114

  • "Total Citations" is the count when we looked (after the 1-year anniversary).
  • "Dated Before" counts the dated citations occurring before the 1-year resolution date.
  • "Total Dated" is the total number of citations in the citation-by-date function.

Then Estimated = (Total Citations) × (Dated Before) / (Total Dated).
Here, 220 × (73/114) ≈ 140.877.
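This estimate is easy to sanity-check in code. A minimal sketch (the function name and zero-division fallback are ours, for illustration):

```python
def estimated_citations(total: int, dated_before: int, total_dated: int) -> float:
    """Scale total citations by the share of dated citations that fall
    before the 1-year resolution date."""
    if total_dated == 0:
        # No dated citations to extrapolate from; fall back to the raw total.
        return float(total)
    return total * dated_before / total_dated

# Daneshkhah-2020-medRxiv, from the table above:
print(estimated_citations(220, 73, 114))  # ≈ 140.877
```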

These are then transformed into 0–100 percentile ranks using Pandas' .rank(pct=True) * 100, as in the sketch below.
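A minimal sketch of that transform (the Series contents are toy values, not our data):

```python
import pandas as pd

# Estimated citation counts per preprint (illustrative values only).
est = pd.Series({
    "Daneshkhah-2020-medRxiv": 140.877,
    "Preprint-B": 12.0,
    "Preprint-C": 375.0,
})

# Percentile rank scaled to (0, 100]; ties get the average rank by default.
ranks = est.rank(pct=True) * 100
print(ranks.round(1))
```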

Check Our Resolutions

As noted in the previous post, you are invited to check our resolutions for accuracy. We hope to finalize them and announce prizes next week.