You would think that something as critical to science as peer review in journals would itself have a strong grounding in science, wouldn’t you? But it doesn’t. The quantity of research is meagre, relative to the importance of the topic – especially the kinds of studies that could be a strong evidence base for our actions. People sure have a lot of un-evidence-based strong opinions on the process of getting opinions about scientific manuscripts, though!
Peer review didn’t start to become common at journals until the 1940s. The influential medical journal editor Franz J. Ingelfinger pointed out in 1974 that the American Journal of Medicine didn’t use peer review in the ’40s and ’50s – and that The Lancet hadn’t used it from the ’60s up to the time he was writing:
If [they] can do without the reviewing system, and do very well at that, why do so many editors of medical journals, particularly in America, faithfully and meticulously depend on a system of peer review that is, to be sure, far older than the other two peer review systems now in the news, but is also laborious, disorganized and time-consuming? The answer is moot, for the system as it applies to biomedicine has never been subjected to the kind of evaluation that those subject to its discipline use in studying the phenomena of health and disease. That data on the performance of the reviewing system are lacking is all the more astounding in view of the momentous influence the system exerts on the lives of those who write biomedical articles.
There’s been a bit more evaluation since then, but nowhere near enough. Meanwhile, it’s gotten far more time-consuming. Publons recently estimated that 68.5 million hours are spent peer reviewing for journals each year – the equivalent of over 34,000 people working full-time, year in, year out. And the effects of journals’ decisions on scientific workers’ careers are even more momentous now, too.
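As a rough check on that full-time equivalent, here is a back-of-the-envelope calculation. It assumes a full-time working year of about 2,000 hours (40 hours a week for 50 weeks); that working-year figure is an assumption for illustration, not part of the Publons estimate.

```python
# Back-of-the-envelope check of the full-time-equivalent (FTE) figure.
# ASSUMPTION: a full-time working year is 40 hours/week x 50 weeks.
# This working-year figure is not part of the Publons estimate.
HOURS_PER_WEEK = 40
WEEKS_PER_YEAR = 50
hours_per_fte_year = HOURS_PER_WEEK * WEEKS_PER_YEAR  # 2,000 hours

# Publons' estimate of hours spent peer reviewing for journals each year.
review_hours_per_year = 68_500_000

fte_equivalent = review_hours_per_year / hours_per_fte_year
print(f"About {fte_equivalent:,.0f} full-time workers")  # About 34,250 full-time workers
```

Assuming a shorter working year (say 48 weeks) pushes the figure higher, so “over 34,000” holds up under either assumption.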
So let’s take a tour of milestones in journal peer review research, to get a bit of an idea of what we know and don’t know. I’ve tried to pick out particularly influential and/or groundbreaking research, but there’s no strong method here. I’d be delighted to hear about others’ picks, either in the comments here, or on Twitter.
(Note: bits in [square brackets] in quotes are my words.)
Is there bias against researchers from “minor” universities?
“An analysis of manuscripts received by the American Sociological Review…” (1945)
Too few journals have allowed or released studies of what goes on behind the curtain. The first I’ve found was when Dorris West Goodrich was a new academic editor at the American Sociological Review. She reported on what happened to 182 manuscripts submitted in the 16 months up to September 1945, while the team had the reins. She had no assessment of the manuscripts’ quality – and in fact points to various chance elements that affect decisions to accept, like the quality or quantity of other manuscripts that arrive at around the same time.
The acceptance rate was 57% – with a third of those coming from invited papers for annual meetings or special issues. Although members and non-members of the American Sociological Society were equally likely to submit manuscripts, only 19% of members’ submissions were rejected, compared with 59% of non-members’.
What about “major” versus “minor” universities (for sociology)? The acceptance rate for authors from major departments was disproportionately high: 83%. People from “non-major” academic institutions and from non-academic institutions had roughly average acceptance rates (57% and 53% respectively). People with no institutional affiliation had an acceptance rate of only 27%.
I couldn’t find an influential study from the 1950s. The “peer-reviewed journal” was on the rise, but it wasn’t being led or followed by robust science.
Does the institutional home of a journal preferentially advance people from that institution?
“Institutional affiliation of the contributors to three professional journals” (1961)
Pan Yotopoulos responded to a publication on contributors to the American Economic Review by taking the analysis a step further and comparing it with 2 other journals: the Journal of Political Economy (JPE), based at the University of Chicago, and the Quarterly Journal of Economics (QJE), based at Harvard:
As it might be expected, the University of Chicago dominated the contributions to the JPE with 15.6 per cent of the total pages while Harvard University dominated the contributions to the QJE with 14.5 per cent. This heavy concentration of authors in one institution for each journal leaves its imprint on the [big] picture…
The overall domination of a small number of institutions, he said, could be a sign of bias at journals. Or it could be something else, such as academics at those universities having more time and support for research.
Does blinding authors’ names and institutions affect publication patterns?
“The gatekeepers of science…” (1967)
Diana Crane reported a survey of blinding practices, and she also tackled the question with two non-experimental studies. The first was a before-and-after study of a journal that introduced blinding of authors’ identities in 1955 (the American Sociological Review): she examined the articles of 1,322 authors in the 20 years around 1955. The second compared this with 294 authors in a journal from another social science discipline that didn’t use blinding (the American Economic Review). Crane concluded:
These findings suggest that anonymity does not produce the expected results… [D]iversity in the academic backgrounds of editors rather than anonymous evaluation of manuscripts is the more important factor influencing the selection of manuscripts…
It appears that the academic characteristics of authors of articles selected for publication by scientific journals are similar to the characteristics of the editors of the journals and that anonymity does not affect this relationship…
The analysis presented here suggests that disciplines vary in the extent to which articles by authors from diverse institutional backgrounds are selected for publication in their principal journals.
The diversity factors she’s talking about are age, educational level, and institutional affiliation.
Crane also reported her analysis of blinding practices at 50 journals in 7 disciplines: 9 of them blinded peer reviewers to authors’ names and institutions – and 8 of those were sociology journals.
Historical and sociological overview – plus data on rejection rates and analysis of author prestige
The groundbreaking sociologists of science Harriet Zuckerman and Robert Merton tackled the subject from various angles. They surveyed 117 journals in the sciences and humanities in 1967 and got rejection rates from 97 of them. These ranged from a mean of 90% for 3 history journals down to less than a quarter of articles for physics (12 journals), geology (2 journals), and linguistics (1 journal):
…the more humanistically oriented the journal, the higher the rate of rejecting manuscripts for publication; the more experimentally and observationally oriented, with an emphasis on rigour of observation and analysis, the lower the rate of rejection…Beyond this are objective differences in the relative amount of space available for publication…Journals in the sciences can apparently publish a higher proportion of manuscripts submitted to them because the available space is greater than that found in the humanities [and the articles are shorter].
They also analyzed the editorial records of the 14,512 manuscripts submitted to The Physical Review from 1948 to 1956. It was a leading journal that published 6% of the journal literature in physics globally, and it was by far the most important to academic American physicists at the time.
To study the impact of author prestige, they looked only at a sample of single-author papers, with both the authors and the 354 people who reviewed their papers stratified by markers of prestige. For example, the first rank included Nobel Prize winners and members of the Royal Society and the National Academy of Sciences.
There were signs of what might be prestige bias, although we would need an analysis of the relative quality of the articles to be sure. For example, physicists at leading and other university departments submitted articles at about the same rate – but their acceptance rates were 91% and 72% respectively.
First experimental study?
“Publication prejudices: an experimental study of confirmatory bias in the peer review system” (1977)
Michael Mahoney approached the question of confirmatory bias with this assumption:
…given that the researched question is relevant and the experimental methodology adequate, the obtained results – whatever they might be – should be of interest to the scientific community. Assuming that they are clearly and comprehensively described, the data should not be viewed prejudicially on the basis of whether they conform to current theoretical predictions.
He selected 75 peer reviewers listed for 1974 in a journal with a “very energetic” advocacy perspective on behaviorist psychology. They were randomly assigned to one of 5 versions of an article, in which the introduction and methods were the same but the results and discussion differed or were omitted. Of the 75, 67 returned peer review reports. The result? “Wide variability”, with “apparent prejudice against ‘negative’ or disconfirming results”:
Looking only at the comments, one would hardly think that very similar or even identical manuscripts were being evaluated…
Confirmatory bias is the tendency to emphasize and believe experiences which support one’s views and to ignore or discredit those which do not…[R]eviewers were strongly biased against manuscripts which reported results contrary to their theoretical perspective.
Country bias in peer reviewers?
“A study of the evaluation of research papers by primary journals in the UK” (1978)
I didn’t manage to get hold of a copy of this monograph by Michael Gordon, so I’m relying on data and descriptions from here. Gordon studied the archives of physical science journals, as well as interviewing the editors of 32 journals.
In an analysis of 2,572 peer reviews of 1,980 articles in 2 journals, authors from the UK were more likely to get a hard time from American peer reviewers, and vice versa. Peer reviewers from “major” universities showed bias against authors from “minor” universities, while peer reviewers from “minor” universities did not.
Are women authors disadvantaged in the editorial process?
“Are women economists at a disadvantage in publishing journal articles?” (1980 – PDF)
Marianne Ferber and Michelle Teiman studied the question in a few ways, comparing publications at several economics journals (and a statistics journal). One of their comparisons was between outcomes at journals with and without double-blind peer review. However, only 12 of the 36 journals they approached provided data on manuscript submissions by gender. Across their various comparisons, articles by women, or with women among the authors, had higher acceptance rates than articles without women authors at journals with double-blind peer review, with no strong difference at the others.
The capriciousness of acceptance – and impact of prestige?
“The fate of published articles, submitted again” (1982)
Douglas Peters and Stephen Ceci published the results of a small study with many flaws that nonetheless had an outsized impact. The journal that published it guaranteed it would set the cat among the pigeons by accompanying it not with an editorial commentary, but with 59 invited expert commentaries!
Peters and Ceci selected one article by authors from prestigious institutions from each of 12 leading American psychology journals; each had been published 18 to 32 months previously. The articles were given fictitious authors and institutions and re-submitted to the same journals that had published them in the first place.
Only 3 were recognized as re-submissions. Of the other 9, all but 1 were rejected – “primarily for reasons of methodology and statistical treatment”.
Documenting the full flow of manuscripts from submission to publication at one journal or another
“A difficult balance: editorial peer review in medicine” (1985 – PDF)
This is a look at the state of knowledge and editors’ opinions from a Rock Carling Lecture by Stephen Lock, then editor of the BMJ. The monograph includes a prospective study of every manuscript submitted to the BMJ between 1 January and 15 August 1979. A total of 1,551 manuscripts were analyzed, with an acceptance rate of 21%. Editors accepted 74% of the papers that external peer reviewers had recommended for acceptance, and 35% of those the reviewers had recommended rejecting. Just over half of the 328 accepted papers had undergone scientific revisions by the time they were published.
Of the 1,143 that were rejected, 825 were rejected without going to peer review. Most of the rejected manuscripts (68%) were later published in another journal – 130 of them in a journal of equal or higher status – and 25% were never published. The fate of the other 7% couldn’t be determined. Only 20% of the manuscripts published elsewhere had been changed before appearing in another journal.
Prominent calls for more research on peer review
“Journal peer review: the need for a research agenda” (1985)
John Bailar and Kay Patterson reviewed the evidence on journal peer review, publishing their call for a research agenda in the influential New England Journal of Medicine:
It seems to us that there is a paradox here: the arbiters of rigor, quality, and innovation in scientific reports submitted for publication do not apply to their own work the standards they use in judging the work of others… Most studies of journal peer review have been methodologically weak, and most have focused on process rather than outcome.
They found only 12 studies from the previous 10 years that met their criteria: relevant to biomedicine, designed to test a hypothesis or study a specific issue, and based on a clearly defined sample – but none used a random sample of journals. Bailar and Patterson map out issues needing research, from effectiveness to different methods and costs, stressing the importance of cross-journal studies and rigorous study design.
Drummond Rennie, from another prominent medical journal, JAMA, cited the Bailar and Patterson paper as an impetus for starting the Peer Review Congress (PRC): the first was held in May 1989. The conference was widely promoted for 3 years prior, with the aim of stimulating a research community (with the lure of publication in JAMA, too). Dozens of abstracts of research on peer review and bias in papers were submitted, with most accepted. It’s been held every 4 years since. I’ve attended a few, and blogged about them (follow the link to all my peer review posts below if you’re interested).
It seems to me the first PRC was a watershed moment for research on journal peer review. Rennie reflects on his experience and the congresses here, concluding that despite the progress in research, the scientific community is in many ways still kind of in the dark ages when it comes to peer review:
A long time ago, scientists moved from alchemy to chemistry, from astrology to astronomy. But our reverence for peer review still often borders on mysticism.
The first randomized trial of journal peer review?
“Calling Medical Care reviewers first: a randomized trial” (1989)
Duncan Neuhauser and Connie Koran tested the theory that phoning peer reviewers to let them know a manuscript was on the way would improve their response, randomizing 95 manuscripts to either get the phone call or not. It didn’t work. The call added to the costs of peer review, and seemed to lengthen peer review turnaround instead of shortening it.
More trials were on the way in the next decade... Continued in Part 2: Trials at Last and Even More Questions.
All my posts on peer review are here.
Disclosures: In the 1990s, I was the founding lead (“Coordinating”) editor for the Cochrane Collaboration’s reviews on consumers and communication. I have served on the ethics committee for the BMJ, participated in organizing special issues of the BMJ, attended conferences funded by the BMJ, and I am a contributor to BMJ Blogs. I contributed a chapter to a book on peer review. I was an associate editor for Clinical Trials, academic editor of PLOS Medicine, and am a member of the PLOS human ethics advisory committee. These days, I only peer review for open access journals.