What does peer review know?
Like the rest of New Things Under the Sun, this article will be updated as the state of the academic literature evolves; you can read the latest version here.
People rag on peer review a lot (including, occasionally, New Things Under the Sun). Yet it remains one of the most common ways to allocate scientific resources, whether those be R&D dollars or slots in journals. Is this all a mistake? Or does peer review help in its purported goal to identify the science most likely to have an impact and hence, perhaps most deserving of some of those limited scientific resources?
A simple way to check is to compare peer review scores to other metrics of subsequent scientific impact; does peer review predict eventual impact?
A number of studies find it does.
Peer Review at the NIH
Let’s start with peer review at the stage of reviewing research proposals.
Li and Agha (2015) looks at more than 100,000 research projects funded by the NIH over 1980-2008, comparing the percentile rank of the application peer review scores to the outcomes of these research projects down the road. For each grant, they look for publications (and patents) that acknowledge the grant’s support. Besides counting the number of publications and patents each grant results in, they can also see how often the publications are cited. Note, they are only looking at projects that actually were funded by the NIH, so we don’t need to worry that their results are just picking up differences between funded and unfunded projects.
The upshot is, better peer review scores are correlated with more impact, whether you want to measure that as the number of resulting journal articles, patents, or citations. For example, here’s a scatter plot of the raw data, comparing peer review percentile ranks (lower is better) to citations and publications. Lots of noise, but among funded projects, if people think your proposal is stronger, you’re more likely to get publications and citations.
From Li and Agha (2015)
Li and Agha also look at the correlation between peer review scores and impact measures after controlling for other potentially relevant factors, such as the year or field of the grant, or the PI’s publication history, institution, and career characteristics. The results are moderated a bit, but basically still stand - compare two grants in the same year, in the same study section, from PIs who look pretty similar on paper, and the grant with higher peer review scores will tend to produce more papers, patents, receive more citations, and produce more very highly cited papers.
Among funded proposals, the predictive power of peer review seems to be highest at the top; the difference in citations, for example, between a top-scoring proposal and one at the 20th percentile tends to be much larger than the difference in citations between one at the 20th and 40th percentile.1 Moreover, even at the top, the correlation between peer review scores and outcomes isn’t great. If you compare proposals that score at the top to proposals at the 10th percentile (of grants that were ultimately still funded), the top proposal is twice as likely to result in a one-in-a-thousand top cited paper. I think that’s not actually that high - since a 10th percentile proposal isn’t that far off from the average, if peer review were really accurate, you might have expected the top proposal to be something like ten times as likely as an average proposal to produce a hit paper.
Park, Lee, and Kim (2015) exploits a peculiar moment in NIH history to provide further evidence that the NIH peer review processes, on average, pick projects with higher scientific impact. In 2009, the US government passed the American Recovery and Reinvestment Act, a stimulus bill meant to fight the economic headwinds of the 2008 financial crisis. The bill authorized $831bn in new spending, of which a tiny corner, $1.7bn, was used by the NIH to fund research projects that would not normally have been funded. This provides a rare opportunity to see how good projects that would otherwise have been rejected by the NIH (which relies heavily on peer review to select projects) fare when they unexpectedly receive funding.
When Park, Lee, and Kim (2015) compare stimulus-funded proposals (which got lower peer review scores) to normally funded proposals, they find the stimulus-funded proposals tend to lead to fewer publications and that these publications tend to receive fewer citations. On average, a research proposal with peer review scores high enough to be funded under the NIH’s normal budget produces 13% more publications than a stimulus-funded project. If we focus on a proposal’s most high-impact publication (in terms of citations), Park and coauthors find proposals funded only because of the stimulus got 7% fewer citations. Lastly, we can look at the 5% of publications funded by these NIH grants that received the most citations. A normally funded research proposal had a 7% chance of producing one of these “highest impact” papers; a stimulus-funded proposal had a 4% chance of producing one.
I think these results are pretty consistent with Li and Agha (2015) in a few ways. They replicate the general finding that in the NIH, higher peer review scores are associated with more research impact (as measured with imperfect quantitative methods). But they also find peer review doesn’t have super forecasting acumen. Note that Park, Lee, and Kim are not comparing proposals that just barely clear the NIH’s normal funding threshold to proposals that just barely miss it - they don’t have the data needed for that. Instead, they are comparing the entire batch of proposals rated above the NIH’s normal funding threshold to a batch of proposals that fall uniformly below it. The batch of normally funded proposals includes the ones that were rated very highly by peer review, which Li and Agha’s work suggests is where peer review tends to work best. Even so, the differences Park, Lee, and Kim find aren’t enormous.
Peer Review at Journals
We have some similar results about the correlation between peer review scores and citations at the publication stage too. As discussed in more detail in “Do academic citations measure the impact of new ideas?”, Card and DellaVigna (2020) have data on about 30,000 submissions to four top economics journals, including data on their peer review scores over (roughly) 2004-2013. Because, in economics, it is quite common for draft versions of papers to be posted in advance of publication, Card and DellaVigna can see what happens to papers that are accepted or rejected from these journals, including how many citations they go on to receive (both as drafts and published versions). As with Li and Agha (2015), they find there is indeed a positive correlation between the recommendation of reviewers and the probability a paper is among the top 2% most highly cited in the journal.
From Card and DellaVigna (2020)
Nor is this simply because high peer review scores lead to publication in top economics journals (though that’s also true). Card and DellaVigna also track the fate of rejected articles and find that even among papers rejected by these journals, those that got higher peer review scores still go on to receive more citations.
Siler, Lee, and Bero (2014) obtain similar results using a smaller sample of submissions to the Annals of Internal Medicine, the British Medical Journal, and The Lancet over 2003 and 2004. For a sample of 139 submissions that received at least two peer review scores, they can track down the eventual fate of the submission (whether published in one of these three journals or elsewhere). Among the 89 peer-reviewed submissions that were ultimately rejected, the peer review scores (from the first, initial review) were positively correlated with the number of citations the submissions eventually received, though the correlation was pretty weak. For the 40 submissions that were reviewed and accepted, more favorable (initial) peer review reports were again correlated with more citations eventually received. In this latter case, the correlation was too weak to be confident it’s not just noise (possibly because the sample was so small).
Siler, Lee, and Bero also emphasize that the three journals actually rejected the 14 papers that would go on to receive the most citations (though they did manage to get the 15th!).
From Siler, Lee, and Bero (2014)
Perhaps more reassuring is the fact that, generally speaking, papers that went on to be highly cited tended to be identified as publishable by other journals pretty quickly. The figure below compares the eventual number of citations received to the time elapsed between submission to one of the three journals under study and eventual publication somewhere else. No highly cited paper took longer than 500 days (not great, but better than 2,000!) to find a home. That could be because peer review at one of the next journals the paper was submitted to was quick to recognize the quality of these articles, or because the authors rapidly resubmitted after getting favorable feedback from initial peer reviewers. But this evidence is pretty indirect and other explanations are also possible (for example, maybe the authors believed in the papers’ merit and submitted them more frequently for review, or the papers were more frequently desk-rejected and so could be resubmitted fast).
From Siler, Lee, and Bero (2014)
That said, we have one more study looking at peer review reports and eventual impact, this time in the American Sociological Review. Teplitskiy and Bakanic (2016) have data on 167 articles published in the American Sociological Review in the 1970s, as well as their peer review scores. Among this set of published articles, they find no statistically significant relationship between peer review scores and the number of citations papers go on to earn.
After a...