Review

In late 2020 we asked forecasters to predict the fate of 400 popular COVID-19 preprints posted to bioRxiv and medRxiv between January and September of that year. For each we asked:

  • Would it be published within a year of posting, and if so, would it be in a high-impact (JIF ≥ 10) journal? [No / High / Low]
  • Rank its citations on a 0–100 percentile scale among the 400 preprints, as of one year after posting, as counted by Google Scholar (including citations to the published version, if any).

We ran both surveys and markets, but here we're concerned with the markets. There were glitches: between preprint selection and market launch, many preprints had already been published. That glitch revealed large liquidity problems due to fewer forecasters than expected. Happily, liquidity can be adjusted by adding points, which we did. In the end we had over 14K trades by over 63 forecasters.

Now it’s time to resolve the questions. Then we can pay prizes. 


Resolving Publication

To resolve publication within a year of posting, we:

  • checked bio/medRxiv for links to the published version (see the sketch after this list)
  • consulted @scottleibrand's spreadsheet
  • reviewed email tips from forecasters (esp. @vitorper, @unipedal)
  • searched the web for authors and key text
  • resolved conflicts among the above sources
  • deduplicated one preprint (Hess-2020) that had two entries on medRxiv, one marked published and the other not
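For the first check, bio/medRxiv expose a public API whose details endpoint reports a published DOI when one is known. A minimal sketch, assuming the api.biorxiv.org details endpoint and its "published" field (our reading of the API docs; the example DOI is made up):

```python
import requests

def published_doi(server: str, doi: str) -> str | None:
    """Return the journal DOI for a preprint if bio/medRxiv records one.

    server: "biorxiv" or "medrxiv"; doi: the preprint's DOI.
    Assumes the api.biorxiv.org details endpoint and its "published" field.
    """
    url = f"https://api.biorxiv.org/details/{server}/{doi}"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    records = resp.json().get("collection", [])
    published = records[0].get("published", "NA") if records else "NA"
    return None if published in ("NA", "", None) else published

# Hypothetical usage (made-up DOI):
# print(published_doi("medrxiv", "10.1101/2020.00.00.000000"))
```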

There were 137 unique journals. Top destinations were:

Nature                        22
Science                       16
Nature Communications         14
Emerging Infectious Diseases   7
Eurosurveillance               7

To resolve high vs. low impact factor (2020 JIF ≥ 10), we had several people look each journal up in the JCR 2020 index (free account) and resolved discrepancies by manual inspection:

  • In 4 cases, one source found JIF > 10 and the other JIF < 10.
  • 3 of these were caused by the JIF trending across the threshold, with either JCR updating between lookups or one of us reading the 2019 value.
  • 1 was caused by looking up a similarly named journal.


Resolving Citations

Citation counts were gathered from Google Scholar, using the DOI of the original preprint and of its published version.

  • When we did this on the 1-year anniversary, it was straightforward.
  • Otherwise, we used Google Scholar's citation-by-date function.

However, not all citations are dated. In those cases we estimated actual citations by multiplying total citations by the proportion of dated citations that occurred before the 1-year resolution date.

For example, Daneshkhah-2020-medRxiv has the following data:

Total Citations   Dated Before   Total Dated
            220             73           114

  • "Total Citations" is the count when we looked (after the 1-year anniversary).
  • "Dated Before" counts the dated citations occurring before the 1-year resolution date.
  • "Total Dated" is the total number of citations in the citation-by-date function.

Then Estimated = (Total Citations) × (Dated Before) / (Total Dated).
Here, 220 × (73/114) ≈ 140.877.
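This estimate is easy to sanity-check in code. A minimal sketch (the function name and zero-division fallback are ours, for illustration):

```python
def estimated_citations(total: int, dated_before: int, total_dated: int) -> float:
    """Scale total citations by the share of dated citations that fall
    before the 1-year resolution date."""
    if total_dated == 0:
        # No dated citations to extrapolate from; fall back to the raw total.
        return float(total)
    return total * dated_before / total_dated

# Daneshkhah-2020-medRxiv, from the table above:
print(estimated_citations(220, 73, 114))  # ≈ 140.877
```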

These are then transformed into 0–100 percentile ranks using Pandas' .rank(pct=True) * 100, as in the sketch below.
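A minimal sketch of that transform (the Series contents are toy values, not our data):

```python
import pandas as pd

# Estimated citation counts per preprint (illustrative values only).
est = pd.Series({
    "Daneshkhah-2020-medRxiv": 140.877,
    "Preprint-B": 12.0,
    "Preprint-C": 375.0,
})

# Percentile rank scaled to (0, 100]; ties get the average rank by default.
ranks = est.rank(pct=True) * 100
print(ranks.round(1))
```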

Check Our Resolutions

As noted in the previous post, you are invited to check our resolutions for accuracy. We hope to finalize them and announce prizes next week.