Can taste beat peer review?
Like the rest of New Things Under the Sun, this article will be updated as the state of the academic literature evolves; you can read the latest version here.
You can listen to this post above, or via most podcast apps here.
Note: Have an idea for a research project about how to improve our scientific institutions? Consider applying for a grant of up to $10,000 from the Metascience Challenge on experiment.com, led by Paul Niehaus, Caleb Watney, and Heidi Williams. From their call for proposals:
We're open to a broad set of proposals to improve science -- for example, experimental designs, surveys, qualitative interviews with scientists, pilot programs for new mechanisms, scientific talent development strategies, and other research outputs that may be relevant for scientific research funders.
The deadline to apply is April 30. On to our regularly scheduled programming!
Scientific peer review is widely used as a way to distribute scarce resources in academic science, whether those are scarce research dollars or scarce journal pages.1 Peer review is, on average, predictive of the eventual scientific impact of research proposals and journal articles, though not super strongly. In some sense, that’s quite unsurprising; most of our measures of scientific impact are, to some degree, about how the scientific community perceives the merit of your work: do they want to let it into a journal? Do they want to cite it? It’s not surprising that polling a few people from a given community is mildly predictive of that community’s views.
At the same time, peer review has several potential shortcomings:
Multiple people reading and commenting on the same document costs more than having just one person do it
Current peer review practices provide little incentive to do a great job at peer review
Peer review may lead to biases against riskier proposals
One alternative is to empower individuals to make decisions about how to allocate scientific resources. Indeed, we do this with journal editors and grant makers, though generally in consultation with peer review. Under what conditions might we expect individuals empowered to exercise independent judgement to outperform peer review?
To begin, while peer review does seem to add value, it doesn’t seem to add a ton of value; at the NIH, top-scoring proposals aren’t that much better than average, in terms of their eventual probability of leading to a hit (see this for more discussion). Maybe individuals selected for their scientific taste can do better, in the same way some people seem to have an unusual knack for forecasting.
Second, peer reviewers are only really accountable for their recommendations insofar as they affect their professional reputations. And often they are anonymous, except to a journal editor or program manager. That doesn’t lead to strong incentives to try and really pin down the likely scientific contribution of a proposal or article. To the extent it is possible to make better judgments by exerting more effort, we might expect better decision-making from people who have more of their professional reputation on the line, such as editors and grant-makers.
Third, the very process of peer review may lead to risk aversion. Individual judgment, relying on a different process, may be able to avoid these pitfalls, at least if taking risks is aligned with professional incentives. Alternatively, it could be that a tolerance for risk is a rare trait in individuals, so that most peer reviewers are risk averse. If so, a grant-maker or journal that wants to encourage risk could do so by seeking out (rare) risk-loving individuals, and putting them in decision-making roles.
Lastly, another feature of peer review is that most proposals or papers are evaluated independently of each other. But it may make sense for a grant-maker or journal to adopt a broader, portfolio-based strategy for selecting science, sometimes elevating projects with lower scores if they fit into a broader strategy. For example, maybe a grant-maker would want to support in parallel a variety of distinct approaches to a problem, to maximize the chances at least one will succeed. Or maybe they will want to fund mutually synergistic scientific projects.
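To see why a portfolio view can differ from simply ranking proposals one by one, here is a toy calculation. The numbers are purely illustrative and not drawn from any of the studies discussed in this post; they just show the arithmetic of betting on distinct approaches.

```python
# Toy illustration of the portfolio argument (made-up numbers only).
# Suppose a funder can back two projects and cares about the chance
# that at least one of them succeeds.

# Two near-identical projects using the same approach: if the approach
# fails, both fail together, so funding the second adds little.
p_same_approach = 0.40                 # chance the shared approach works at all
p_at_least_one_same = p_same_approach  # both live or die with the approach

# Two distinct approaches, each individually less promising, but
# (roughly) independent of one another.
p_a, p_b = 0.30, 0.30
p_at_least_one_diverse = 1 - (1 - p_a) * (1 - p_b)  # = 0.51

print(f"Same approach twice:     P(at least one hit) = {p_at_least_one_same:.2f}")
print(f"Two distinct approaches: P(at least one hit) = {p_at_least_one_diverse:.2f}")
```

Under these (assumed) numbers, the diversified pair is more likely to yield at least one success, even though each of its proposals looks weaker in isolation than the single best-scoring approach.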
We have a bit of evidence that empowered individual decision-makers can indeed offer some of these advantages (often in consultation with peer review).
Picking Winners Before Research
To start, Wagner and Alexander (2013) is an evaluation of the NSF’s Small Grants for Exploratory Research program. This program, which ran from 1990-2006, allowed NSF program managers to bypass peer review and award small short-term grants (up to $200,000 over 2 years).2 Proposals were short (just a few pages), made in consultation with the program manager (but with no other external review), and processed fast. The idea was to provide a way for program managers to fund risky and speculative projects that might not have made it through normal peer review. Over its 16 years, the SGER (or “sugar”) program disbursed $284mn via nearly 5,000 awards.
Wagner and Alexander argue the SGER program was a big success. By the time of their study, about two-thirds of SGER recipients had used their results to apply for larger grant funding from the conventional NSF programs, and of those that applied, 80% were successful (at least, among those who had received a decision). They also specifically identify a number of “spectacular” successes, where SGER provided seed funding for highly transformative research (judged as such from a survey of SGER awardees and program managers, coupled with citation analysis).
Indeed, Wagner and Alexander’s main critique of the program is that it was insufficiently used. Up to 5% of agency funds could be allocated to the program, but a 2001 study found only 0.6% of the budget actually was. Wagner and Alexander also argue that, by their criteria, around 10% of funded projects were associated with transformational research, whereas a 2007 report by the NSF suggests research should be transformational about 3% of the time. That suggests perhaps program managers were not taking enough risks with the program. Moreover, in a survey of awardees, 25% said an ‘extremely important’ reason for pursuing an SGER grant was that their proposed research idea would be seen as either too high-risk, too novel, too controversial, or too opposed to the status quo for a peer review panel. That’s a large fraction, but it’s not a majority (though the paper doesn’t report the share who rated these factors as important but not extremely important). Again, maybe the high-risk program is not taking enough risks!
In general though, the SGER program’s experience seems to support the idea that individual decision-makers can do a decent job supporting less conventional research.
Goldstein and Kearney (2018) is another look at how well discretion compares to peer review, this time in the context of the Advanced Research Projects Agency - Energy (ARPA-E). ARPA-E does not function like a traditional scientific grant-maker, where most of the money is handed out to scientists who independently propose projects for broadly defined research priorities. Instead, ARPA-E is composed of program managers who are goal-oriented, seeking to fund research projects in the service of overcoming specific technological challenges. Proposals are solicited and scored by peer reviewers along several criteria, on a five-point scale. But program managers are very autonomous and do not simply defer to peer review; instead, they decide what to fund in terms of how proposals fit into their overall vision. Indeed, in interviews conducted by Goldstein and Kearney, program managers report that they explicitly think of their funded proposals as constituting a portfolio, and will often fund diverse projects (to better ensure at least one approach succeeds), rather than merely the highest scoring proposals.
From Goldstein and Kearney (2018)
Goldstein and Kearney have data on 1,216 proposals made up through the end of 2015. They want to see what kinds of projects program managers select, and in particular, how they use their peer review feedback. Overall, they find proposals with higher average peer review scores are more likely to get funded, but the effects are pretty weak, explaining about 13% of the variation in what gets funded. The figure above shows the average peer review scores for 74 different proposals to the “Batteries for Electrical Energy Storage in Transportation” program: filled-in circles were funded. As you can see, program managers picked many projects outside the top.
From Goldstein and Kearney (2018)
What do ARPA-E program managers look at, besides the average peer review score? Goldstein and Kearney argue that they are very open to proposals with highly divergent scores, so long as at least one of the peer review reports is very good. Above, we have the same proposals to the Batteries program, but instead of ordering them by their average peer review score, now we’re ordering them by their maximum peer review score. Viewed this way, the funded proposals cluster more closely around the highest scores. This is true beyond the battery program: across all 1,216 project proposals, for a given average score, the probability of being funded is higher if the proposal receives a wider range of peer review scores. Goldstein and Kearney also find proposals are more likely to be funded if they are described as “creative” by peer reviewers, even after taking into account the average peer review score.
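A small sketch of the difference between the two orderings may help. The scores below are hypothetical, not taken from the ARPA-E data; they just show how ranking by the maximum score rather than the mean can favor a proposal that divides its reviewers.

```python
# Hypothetical peer review scores on a 1-5 scale for three proposals;
# these are made-up numbers, not ARPA-E data.
proposals = {
    "consensus": [3.5, 3.5, 3.5, 3.5],  # reviewers agree it is decent
    "divergent": [5.0, 4.5, 2.0, 1.5],  # one reviewer loves it, others do not
    "weak":      [2.5, 2.0, 2.0, 1.5],
}

for name, scores in proposals.items():
    mean_score = sum(scores) / len(scores)
    max_score = max(scores)
    print(f"{name:10s}  mean = {mean_score:.2f}   max = {max_score:.1f}")

# Ranking by mean puts "consensus" ahead of "divergent" (3.50 vs 3.25);
# ranking by max reverses the order (3.5 vs 5.0), which is closer to how
# Goldstein and Kearney describe program managers treating divergent scores.
```

Again, this is only an illustration of the selection rule they describe, not a reconstruction of their analysis.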
ARPA-E was first funded in 2009, and this study took place in 2018, using proposals made up through 2015. So there hasn’t been a ton of time to assess how well the program has worked. But Goldstein and Kearney do an initial analysis to see how well projects turn out when program managers use their discretion to override peer review. To do this, they divid...