Scientists are in a real bind when it comes to peer review. It’s hard to be objective when we’re all among the peer reviewing and peer-reviewed, or plan to be. Still, we should be able to mobilize science’s repertoire to solve our problems.
Yet, with exceptions for a few journals – most notably The BMJ – we haven’t used strong scientific methods to decide what to do about peer review for publications. We’ve largely let peer review remain in science’s blind spot.
When it comes to the potential for bias in the editorial process, that blind spot often turns into a fiercely hot one. Perhaps most of all on the question of being open with identities. Personal convictions and fears, with a study or two that reinforce them, drive strong opinions that in turn drive practice.
Determining practice based on a consensus of practitioner opinion doesn’t always end well, though, does it? Even well-intentioned interventions can cause harm – or have no real consequences at all, leaving the problems untouched. And there’s a lot that can go wrong with editorial review.
There are some consequences that flow inevitably from the choice of anonymity or naming, like workload for journals, or the chance that peer reviewers’ conflicts of interest unknown to editors will be revealed. I’ll come back to that later. But first, what evidence do we have that masking the identities of authors and peer reviewers achieves what it is meant to?
Well, it’s complicated. Which means it really needs a solid, up-to-date systematic review. The only systematic review, by Tom Jefferson and colleagues, has several drawbacks though. It doesn’t address fairness or bias as outcomes, for example. It’s restricted to biomedical journals, and the search for studies was done in 2004. [I’ve added a comment about it on PubMed Commons: archived here, with author reply.] [Update on 12 June 2016: A new systematic review by Rachel Bruce and colleagues addressed openness and blinding. It was more limited in scope – only randomized controlled trials and biomedical journals – and found no studies that were not included here. I added a comment on PubMed Commons, archived here, with author reply.]
So I’ve taken a deep dive into this literature. I found 17 relevant comparative studies, 12 of which are controlled trials. The quality of these studies varies greatly, especially the ability to control for variables. Some are in hypothetical situations. But there are some very good, decent-sized trials.
There is a numbered list of all the relevant studies I found, alphabetically by author, with links, at the bottom of this post. And there’s a declaration of my own interests (including close ties with The BMJ, which features prominently in this literature).
Let’s start with authors. Blinding peer reviewers to authors’ identity is one of the most commonly advocated strategies to ensure that the most deserving papers make it past the peer review gauntlet.
When it’s coupled with a policy of not revealing peer reviewers’ identities, this style is called double-blind peer review. It can mean having a cover page without the authors’ identities and affiliations, through to more detailed effort to remove identifying clues from the manuscript when it goes to the peer reviewers. (When peer reviewers’ identities only are not revealed, that’s called single-blind peer review.)
This was addressed by 6 randomized controlled trials (RCTs) (1, 5, 6, 8, 12, 13), 1 non-randomized controlled trial (2), and 1 cross-sectional study comparing citation rates of journals with different policies (10).
The non-randomized trial (2) had a number of quality issues, with a particularly large number of comparisons and no adjustment for multiple testing. The author used statistical significance testing, but without pre-specifying a level. The findings didn’t reach the conventional 5% level.
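To show why many comparisons without an adjustment are a problem, here is a minimal sketch of one common correction, the Holm step-down method. It is not what the study's author did (no adjustment was reported), and the p-values and the function name `holm_adjust` are hypothetical, chosen only for illustration.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Step-down multiplier: m for the smallest p-value, then m - 1, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Keep adjusted values from decreasing as raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values, not taken from any study discussed here.
example = [0.01, 0.04, 0.03, 0.005]
adjusted = holm_adjust(example)
significant = [p <= 0.05 for p in adjusted]
print(adjusted, significant)
```

In this made-up example all four raw p-values fall below 0.05, but only two remain below it after adjustment. That is the kind of shift an unadjusted analysis can hide.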
The rate of failure of blinding in the trials was high: average failure rates ranged from 46% to 73% (although in 1 journal within one of the trials it was only 10%). Even when the researchers analyzed actual blinding separately, though, the major weight of the evidence here still showed no statistically significant difference in review quality or editorial decision (5 out of 6 RCTs).
When it made a difference to peer reviewers’ reports (10, 12), the difference was small. That difference was 0.4 on a 5-point scale, where editors had pre-specified 0.5 as a difference large enough to affect editorial decision-making (12), and an average of less than 1 extra citation in the study that used citations as the measure of article quality (10).
So masking authors’ identities hasn’t been shown to make an important difference in what studies get published. We’ll come back to studies looking at the impact on bias later.
There were 4 RCTs testing the effects of revealing the peer reviewers’ identities (6, 14, 16, 17), and 1 study comparing a journal that uses single-blind peer review with one that reveals author and peer reviewer names, publishing the peer review report as part of the pre-publication history of articles accepted in the journal (9 – available in an abstract only). All of them are in biomedical journals. They cover over 1,200 peer review reports, about half of which are for The BMJ.
There was also 1 RCT testing telling peer reviewers their reviews would be posted on the internet as part of the pre-publication history of the article (15). That trial is for The BMJ too.
The trials are methodologically sound, with one exception that has fundamental flaws. Questions about the quality of that study (17) were raised in the systematic review by Jefferson and colleagues. The intervention and control groups had very different sizes (222 and 186): groups in a properly randomized trial should have a similar size. Intention-to-treat analysis wasn’t done and there’s not enough information about what happened to the drop-outs. So this study is more prone to bias than the others.
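As a rough way to gauge how unusual a 222 versus 186 split would be, the sketch below computes an exact two-sided binomial probability. It assumes simple 1:1 randomization with no blocking or stratification, and that the reported sizes are the numbers originally allocated. Neither assumption is confirmed by the text here, and the function name is hypothetical.

```python
from math import comb


def two_sided_allocation_p(n_a, n_b):
    """Exact two-sided binomial p-value for a split at least this uneven,
    assuming each participant had a 50:50 chance of either group."""
    n = n_a + n_b
    larger = max(n_a, n_b)
    # Probability that one particular group gets `larger` or more of n.
    upper_tail = sum(comb(n, k) for k in range(larger, n + 1)) / 2 ** n
    # The distribution is symmetric at p = 0.5, so double the tail.
    return min(1.0, 2 * upper_tail)


p_value = two_sided_allocation_p(222, 186)
print(f"P(split at least as uneven as 222/186) = {p_value:.3f}")
```

A check like this speaks only to chance imbalance under the stated assumptions. It says nothing about the missing intention-to-treat analysis or the unexplained drop-outs.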
In the trials, editors did not judge the reviews as better on global scores on average. The researchers in the comparative study of open peer review judged the reports to be better.
Peer reviewers were more likely to substantiate the points they made (9, 14, 16, 17) when they knew they would be named. They were especially likely to provide extra substantiation if they were recommending an article be rejected, and they knew their report would be published if the article was accepted anyway (9, 15).
In some studies, when the reviewers knew they would be named, they were likely to be more courteous or regarded as helpful by the authors (9, 14, 17).
There’s no support here for the concern that naming peer reviewers leads to systematically less critical reviews – and some support for improvement.
There was one large effect: many peer reviewers declined the invitation to peer review when they knew there was a chance they would be named – especially when they knew their colors would be nailed to the public mast if the article was published (15).
The effect on bias is the most complicated question. It’s also the most important one, since it seems anonymity makes no major difference to the quality of the published literature – at least not in the journals where that’s been studied.
Does revealing author and/or peer reviewer identity increase or reduce bias? There were several studies that were either set up to assess the impact of anonymity on bias, looked for signs of impact in subgroups, or compared single- and double-blind journals retrospectively. But none looked at any effect on bias of revealing peer reviewers’ identities.
There were 7 studies here, including 2 RCTs (1, 5), 2 non-randomized controlled trials (2, 11), and 1 comparison study with historical control (3). There was 1 study comparing acceptance rates between journals that did and did not blind authors (4), and 1 comparing reports in an author-blinded journal of peer reviewers who were sure of the authors’ identities anyway, those who suspected they knew, and those who truly had no idea (7).
Other than the RCTs, these studies did not include good methods to control for confounding. Some had serious methodological flaws. So conclusions based on the non-RCTs here are going to be shaky.
There was no difference between acceptance rates for manuscripts from the US versus other countries in 1 RCT that measured this outcome (1). The other RCT looked at whether or not seniority affected outcome in a trial of naming versus blinding authors (5). It found that manuscripts by more experienced scientists were rated more highly by everyone, and even more highly when authorship was blinded.
Out of the non-randomized studies, 1 found reviewers were less favorable to articles when they had no idea who the authors were (7). Gender of the peer reviewers made no difference. (No analysis was apparently done on gender of the authors.)
The other 4 non-randomized studies examined gender by looking at knowledge of authors’ names with one or more women among them. There’s no consistency of how blinding was done in journals in these studies, nor how gender was determined (such as any author, lead author, or corresponding author).
The methodological weakness of 1 of these was already discussed above (2). Blinding authors didn’t make a difference to the acceptance rate of manuscripts with women authors in that study. But males were more likely to recommend rejection of manuscripts with women authors, and vice versa. In another (11), women reviewers were more likely to reject men’s manuscripts, but not vice versa.
The last 2 both found an association between journal policy of blinding authors and the rate of publications with women authors (3, 4).
The first of these had a historical control, looking at rates before and after a journal changed its policy, but could only look at publications, not at acceptance rates (3). The researchers gathered data for comparable journals in that field for the same period, but didn’t do detailed analyses of the available variables or consider the risks of multiple testing.
Other authors re-analyzed the data from that study, and found no interaction effect between the change of policy and the increased rate of publications with women authors [PDF]. It appears the rate was increasing in those fields anyway.
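The logic of "it was increasing anyway" can be pictured with a simple difference-in-differences calculation: the change at the journal that switched policy, minus the change at comparison journals over the same period. This is a simplified stand-in and not necessarily the method used in the re-analysis. The proportions below are hypothetical.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change at the policy-changing journal minus change at comparison journals."""
    return (treated_after - treated_before) - (control_after - control_before)


# Hypothetical proportions of publications with women authors.
estimate = diff_in_diff(
    treated_before=0.24,
    treated_after=0.32,
    control_before=0.25,
    control_after=0.33,
)
print(f"Difference-in-differences estimate: {estimate:.3f}")
```

With these made-up numbers both groups rise by the same amount, so the estimate is essentially zero. The before-and-after rise at the one journal would then say nothing about the policy itself.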
The other looked at rates of acceptance of articles submitted to 12 economics journals between 1978 and 1979, although not the whole period for each of them (4). Half had single-blind reviewing, half double, but this was coincidental: they were the 12 responding to a request for data from 36 journals. Manuscripts with one or more female authors were more likely to be accepted at a journal with a policy of double-blind peer review in that study.
There was 1 other controlled study on the question of gender bias, which involved students (including graduate students) rather than peer reviewers from a journal. So I didn’t include it on my list, or review its quality thoroughly. (It found no evidence suggesting gender bias.)
There’s not much help in this group of studies. We would need much stronger evidence to draw conclusions about the effect of author blinding on bias. Based on these studies, a policy of blinding authors could have a benefit, a detrimental effect, or no effect on gender bias.
So with that knowledge from studies, how does revealing identities stack up against anonymity and attempting to blind authorship?
I think institutionalizing anonymity in publication peer review is probably going out on a limb. It’s only partially successful at hiding authors’ identities, and mostly only when people in their field don’t know what authors have been working on. If blinding authors were a powerful mechanism, that would be evident by now.
Author and peer reviewer anonymity haven’t been shown to have an overall benefit, and they may cause harm. Part of the potential for harm is if journals act as though it’s a sufficiently effective mechanism to prevent bias.
That said, we don’t have an overwhelming evidence basis for anything. In my opinion, the weight tilts in one direction at the moment, with a trade-off of potential harms we don’t know enough about. This ongoing knowledge gap is letting science and scientists down.
Although there are dozens of journals in these studies, the strongest evidence comes from a relatively small number. Collegiality, competitiveness, and discrimination are going to vary from one journal’s community to another. While there is reassurance about systemic gender bias in peer review since the 1990s (for example here, here, and here), that certainly doesn’t mean it never happens.
At some prominent science journals, odds are stacked against women (for example, Nature and Science). And as Fiona Ingleby recently showed from her experience at PLOS One, there are individuals out there who are still comfortable with extreme levels of sexism. Journals need to monitor editor dynamics deeply to be sure there isn’t a problem. (See for example JAMA’s self-analysis).
There are clear signs of other biases that have been shown at some journals. Stand-outs are editors’ personal connections, institutional prestige, as well as a US and English language advantage internationally. Science has a status bias problem.
The last thing we need, though, is for people to avoid submitting articles because of fear of bias or aggressive peer review. Under-representation in article submission could be a bigger problem than bias from peer reviewers.
A lot of scientists are concerned that there’s widespread fear of retaliation for writing critical peer review reports about the work of people with more status than them. There’s no doubt that happens. But then, even who you cite or don’t cite in an article or anywhere else can come back and bite you.
Building a reputation and increasing your network of collaborators is critical to success in science, too – and that can come from being visible and critical. What is science about, if not applying your intellect critically? But early career researchers have to make their decisions about anonymity without the benefit of good scientific evidence on the balance of potential benefits and harms and wider impact on their careers.
Some journals offer authors the option to opt in to double-blind reviewing. That doesn’t seem to have been evaluated in a published study. Although double-blind peer review has a lot of advocates, that doesn’t mean a lot of people want to do it. Nature reports that only 20% of authors take the option. It might protect them from bias, but this is unknown territory. Hopefully, choosing it doesn’t itself become a cause of bias.
On the other hand, the anonymity of peer review reports definitely enables negative, and even egregious, behavior – without accountability. Revealing who’s peer reviewing can also reveal conflicts of interest of which editors are either unaware or perhaps even allow without informing the authors. Accountability increases the care and effort many people would put into a peer review.
One of the arguments in favor of allowing anonymity is that people aware of scientific misconduct in a manuscript won’t put that in writing unless they can be anonymous. I’m not sure what the chances are of just that person being chosen to be the peer reviewer, even though it’s important when it does happen. Blinded peer review has never been a good mechanism for keeping research tainted by misconduct out of the literature, though, so it’s hard to weigh this up against common issues.
Misconduct occurs in peer review too. Opening the black box might give others a chance to detect issues like a large peer review “ring” that continued undetected for too long, and the kinds of peer review abuse investigated by COPE (the Committee on Publication Ethics). An open system allows researchers to investigate a range of other issues, such as this study by Sally Hopewell and colleagues of the impact on the quality of reports of clinical trials.
Substantiating our statements, and being accountable for what we say and how we say it when we are gatekeepers for publication, is decisive for me. That’s all the more important for people whose work or critique loses out because of status bias, and those who may be repelled from publishing and science by reviewer aggressiveness.
There are scientific communities where open collaboration, articulating critique well and accepting it too, are not only the cultural norm, they’re what propel successful careers. That needs to spread widely. I think it will happen one scholarly community and journal at a time. However the rest of science catches up, I doubt it can happen behind closed curtains.
Update: By October 2017, and digging more into research on bias in journals, I shifted to a stronger position. I posted about that here: The Fractured Logic of Blinded Peer Review in Journals. And later, Signing Critical Peer Review & the Fear of Retaliation: What Should We Do?
Declarations: A part of my day job is responsibility for PubMed Commons, a forum for open, signed, post-publication commenting. I am currently an academic editor for PLOS Medicine and on the human research ethics advisory group for PLOS One. I’ve had editorial roles with other journals in the past, including part-time professional lead editing.
As I mention The BMJ’s research and policy favorably in this post, I note that I have close ties with this journal. I was a member of their ethics committee for several years (including regular supported travel) and participated in advising on some special issues of the journal. I recently traveled to speak at Evidence Live with their support, and I’ve published multiple articles with them (see this search, and a further two invited commentaries: this and at the foot of this). I contributed a chapter to their book, Peer Review in Health Sciences (2nd edition, 2003, edited by Tom Jefferson and Fiona Godlee).
Alphabetical list of comparative studies on revealing versus not revealing author/peer reviewer identities:
Note: Quite a few of these have academic spin, especially in the abstracts.
The study is done in biomedical publication unless identified “Non-biomed”.
1. Alam (2011): RCT (randomized controlled trial).
2. Blank (1991): Non-randomized controlled trial. Non-biomed.
3. Budden (2008): Comparative study. Non-biomed. [PDF]
4. Ferber (1980): Comparative study. Non-biomed.
5. Fisher (1994): RCT.
6. Godlee (1998): RCT.
7. Isenberg (2009): Comparative study.
8. Justice (1998): RCT.
9. Kowalczuk (2013): Comparative study. (Abstract only.)
10. Laband (1994): Comparative (cross-sectional) study.
11. Lloyd (1990): Controlled trial, method of allocation to the groups unclear. Non-biomed.
12. McNutt (1990): RCT.
13. Van Rooyen (1998): RCT.
14. Van Rooyen (1999): RCT.
15. Van Rooyen (2010): RCT.
16. Vinther (2012): RCT.
17. Walsh (2000): RCT.
(The controlled trial I “excluded” because the reviewers weren’t associated with a journal was Borsuk (2009).)
* The thoughts Hilda Bastian expresses here at Absolutely Maybe are personal, and do not necessarily reflect the views of the National Institutes of Health or the U.S. Department of Health and Human Services.