There’s a lot of hype in scientific publications – and about them, too. Hype doesn’t help if we want to make informed decisions about where we submit our work, and where we invest our peer reviewing and community effort.
Preprints are one of many important options these days: manuscripts uploaded to a preprint server and released without in-depth editorial peer review or editing. February and March brought a blaze of advocacy for preprints aimed at scientists in the biological sciences: Nature, Wired, the New York Times, the Economist, and the Chronicle of Higher Education all published generally enthusiastic articles. Participants and organizers have also written about it, including Needhi Bhalla, James Fraser, Mike Eisen, and Gary McDowell. And there's an active Twitter community around the hashtag #ASAPBio.
ASAPBio was a meeting for invited participants held at the Howard Hughes Medical Institute (HHMI) in February. This round of activism to ratchet up interest in preprints started with a publication by one of the organizers, Ron Vale [preprint version], and the initiative, Rescuing Biomedical Research. Preprints were also part of lively discussion about the future of scholarly scientific publication at the Royal Society last year.
The ASAPBio strategies are to push journals, funders, and academic institutions to give preprints formal recognition and status – and to get scientists, especially famous and successful ones, to be preprint trendsetters. There will be a second, smaller, closed meeting in June.
The preprint case relies heavily on the precedent set by arXiv, the enormously successful open access preprint server that enabled the crossover of the physics community's manuscript-sharing practices to the internet, starting in 1991. A version for the biological sciences, bioRxiv, launched at the end of 2013. Both are critical pieces of science infrastructure – free for users, and operated by academic institutions (Cornell University and Cold Spring Harbor Laboratory). There are other venues for preprints, but these are the biggest relevant ones.
Do preprints “accelerate scientific progress by speeding up access to new findings”?
Improving the efficiency of a complex ecosystem like science is an incredibly daunting challenge. I didn’t see this issue studied directly in the literature on preprint servers. I think the conclusion by Vincent Larivière and colleagues in 2014 [preprint] still holds: “All in all, this literature shows that e-prints are having an effect on how scientists communicate the findings of their research. However, the precise nature of the effect(s) remains fuzzy”.
There are 3 major links in the chain of presumed effect on accelerated progress:
- Do preprints have an appreciable net impact on the speed of findings getting used (not just being available), across science disciplines and across scientists’ working lives?
- If so, does that accelerate scientific progress in those fields?
- If so, will it do so in biomedicine?
In terms of the first of these, speed of use, Larivière found that papers released as preprints and then published in a journal are cited more quickly. But the journal version is the version of record: it's not the preprint that is cited. Preprints aren't cited much in journals in any discipline – Larivière found a range from 0.2% to 6.6% of journal citations – and citation rates for preprints are declining. While the arXiv links to journal versions, the journals aren't a road back to the arXiv. That doesn't mean, of course, that the preprint wasn't used.
On average across disciplines, more than 60% of all arXiv preprints appear in a journal, and that proportion has been increasing. The time to appearance in a journal has been shrinking, though it's not clear why. Most arXiv preprints now appear in a journal in the same year they're released on the server. The time is shorter for physics, longer for mathematics.
For the 3 disciplines that are the heaviest arXiv users, a third of published journal articles were never preprints.
So what about the quality of preprints, a key requirement for genuine progress? Martin Klein and colleagues recently uploaded a preprint comparing preprints with their journal alter egos. That study has serious limitations. Of the more than 1 million arXiv preprints they downloaded, they were able to match only 1.2% to journal versions. Those matches aren't a representative subset: 96% came from a single publishing house, and multiple versions were far more common than average. The authors assessed text similarity quantitatively, concluding that the differences were not major. The quality of the content before and after was not analyzed.
What about biology and medicine, then? I don’t think there is enough evidence to be sure what would happen if biology or medicine adopted preprints on a large scale.
To put this in perspective: Larivière found that less than 4% of the articles in Web of Science were ever preprints. And 3 disciplines account for over 90% of the arXiv. So low-scale adoption remains the most common experience in a science discipline or specialty.
Although it’s early days, a similar pattern might be emerging on bioRxiv: by my count, 5 disciplines account for 66% of bioRxiv’s content on 30 April. (Data are below this post.) In its biggest month so far, March 2016, the number of preprints added to bioRxiv was 0.4% of the number of publications added to PubMed.
Biology and medicine are vastly larger publishing communities than any of those with high preprint uptake so far. Their communication culture is different, and they are hyper-competitive. Fast-tracking publications of clinical importance is common. And uptake on a high enough scale to accelerate progress would mean a higher order of challenge on the system side.
For preclinical and clinical research, the higher scale of media interest in results changes the stakes, too. The impact of error could be far greater. I couldn't find studies of the impact of error in preprints.
There's a high level of correction or revision of preprints – about 30% on bioRxiv. I couldn't find an evaluation of these changes for arXiv or bioRxiv, of how much impact mistakes have, or of the effect of preprints that were later withdrawn. Preprints can't be retracted by their authors, although authors can add a "withdrawn" notice as the most recent version. (When you search for withdrawn articles on arXiv, it shows the first 1,000, with no total.)
Do preprint servers “give the authors feedback from a larger group of people than the anonymous peer reviewers”?
This definitely happens for many papers, although what proportion of preprints could expect a reaction is unclear. About 10% of bioRxiv preprints get a “public comment” of some kind (including tweets).
That said, preprint servers are not unique in attracting attention. There are many ways of doing science in the open. Some of them may have more of an impact on science and science publishing than preprints – like publishing protocols before starting, fully open peer reviewing, or enabling peer reviews to travel between journals.
Do preprints advance open access?
Having a version available if you know to go looking for it definitely lands in the "pros" column. Many wouldn't know to go looking, though, if journals don't link back prominently at the abstract level. And it doesn't overcome the drawbacks of publishing in a closed access journal.
[Update 5 December 2016: Daniel Himmelstein’s analysis of bioRxiv preprints found that only 18% were fully open, with the proportion declining over time.]
Could preprints enable sharing work and “serve as interim evidence for productivity”, helping scientists get jobs, grants, and other benefits?
Ways of showing productivity other than the validation of a journal article could recalibrate the odds stacked against many researchers – early career researchers in hyper-competitive fields in particular. That's the hope here. Preprints will do that for some people. But it might be risky to count on it.
Self-publishing is likely to be less of a risk for those who are already advantaged by their circumstances. When it comes to benefits like publications and grants, institutional prestige makes a difference, as well as being in the US and having English as a first language.
Part of the territory, then, is a preprint that's more likely to be highly polished – perhaps even by in-house professional writing support. Encouraging people to self-publish without enough editorial or critical support could entrench disadvantage that is not related to scientific merit. Speed isn't going to advantage everybody equally.
Other strategies can address this issue as well. For example, publishing protocol articles, methods report articles, and registered reports for major projects are other important ways to demonstrate good scientific practice, productivity, and article production skills.
Can preprints help scientists “establish priority of their work”?
Ron Vale and Anthony Hyman wrote a commentary for the February ASAPBio meeting about this. “It’s complicated”, is their basic message. Exactly what contribution different scientists make often isn’t easy to establish, and pegging a public stake on some ground isn’t all there is to it.
The pros and cons here are arguably different for physics and biomedicine. In biomedicine, so the argument goes, it might be easier to fold someone else's insights into an experiment or paper and beat them to press. Perhaps this is partly a concern about losing a citation in a higher impact journal if your work is no longer seen as exciting. If it's a common concern, then it's a serious hurdle for preprint acceptance.
It’s a chicken and egg situation in many ways, isn’t it? There are underlying issues around science’s culture and values, that people hope to influence by changing incentives and processes around publishing. But swimming against the tide of culture and psychological drivers makes that harder. Using preprint servers might, in the end, be more a sign of change in a science community, than a cause of it. There may be no shortcut.
Do we need to edit science’s DNA? An Absolutely Maybe post on priority, credit, and values in science.
Disclosure: My day job includes work on PubMed Commons, the post-publication commenting system in PubMed.
I gathered these data from bioRxiv manually, last checked around noon (US EDT) on 30 April 2016. (Click on the image to see full size.)
[Update] On 1 May, a tweet from Nicola Low reminded me that I had meant to list publishing a protocol as one of the potentially more powerful ways to improve science. It also raised another point: that a protocol publication, study methods article, or registered report shows productivity. She also reminded me about the importance of fast-tracked articles for research of clinical importance. Both added to the post – thanks, Nicola!
And a tweet from Stephen Curry reminded me that preprints were also much discussed last year at a Royal Society meeting – added to post. Thanks, Stephen!
* The thoughts Hilda Bastian expresses here at Absolutely Maybe are personal, and do not necessarily reflect the views of the National Institutes of Health or the U.S. Department of Health and Human Services.