The stick, the carrots and the bear trap


UK HEFCE Public Access policy is a shift in the landscape

After a one-year consultation, the four UK higher education funding bodies, led by the Higher Education Funding Council for England (HEFCE), have issued the first national policy to explicitly link public access with research evaluation. Any research article or conference proceeding accepted after 1st April 2016 that does not comply with their public access policy will be inadmissible to the UK Research Excellence Framework (REF), the system by which all government-funded higher education institutions in the UK are assessed. The policy is a significant landmark in the global transition to Open Access and, in parallel with the Research Councils UK (RCUK) OA policy, will make research from the UK the most publicly available in the world.

The UK’s Higher Education Funding Councils are direct funders of UK higher education institutions (HEIs). They distribute large sums of money to UK universities based on an assessment exercise that occurs roughly every seven years. The funding council money can be used as institutions see fit to support research, and can make the difference between being in the black or the red. The funding is distributed largely on the basis of an assessment of four outputs submitted by each assessed researcher. It is these outputs that will need to be free to read, linking public access to research assessment at a national scale for the first time.

At the same time, the Funding Councils are indirect funders from the perspective of researchers. Because they do not have a direct relationship with researchers, they are more limited in their policy options. Within those limitations, this is a good policy. Yes, there are loopholes, and some of the provisions for text mining are disappointing – the minimum acceptable licence restricts any derivative reuse, for example – but overall the policy provides a strong signal about the direction of travel. Moreover, it contains not just a hefty stick, but also some well-aimed carrots and a bear trap if things go off track.

Crucially, HEFCE’s aim is full compliance. There are no soft percentage targets here. They also note that 96% of submitted outputs to the 2014 REF could have complied with this policy if authors had deposited whenever they were able (Appendix B paragraph 54).

Here are the basics:

  • The policy applies to all research articles and conference proceedings (with an International Standard Serial Number) that list a UK HEI in the address field, regardless of subject area. It doesn’t apply to data, monographs, book chapters or other reports that might have security or commercial implications.
  • The final peer-reviewed accepted article must be deposited on acceptance (with a three-month grace period) in an institutional repository, a repository service shared between multiple institutions, or in a subject repository such as arXiv or PubMed Central.
  • Deposited material should be discoverable, and free to read and download, in perpetuity for anyone with an internet connection.
  • They don’t recommend any specific licence but note that outputs licensed under a Creative Commons Attribution-NonCommercial-NoDerivs (CC BY-NC-ND) licence would be compliant.
  • Embargoes on access to the deposited article are permitted – up to 12 months for STEM and 24 months for AHSS – but articles must still be discoverable from the repository during this period even if the full text is not accessible.
  • Note that deposition and access are treated separately in the policy: immediate deposition is mandatory but immediate access is not if there is an embargo (which they term ‘closed deposit’).
  • All articles published in a PLOS journal can be compliant through deposition of the final manuscript or the final published version in any appropriate repository. All PLOS articles are also deposited in a subject repository, PubMed Central, as part of the publication process.
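The deposit and embargo rules above can be expressed as a simple compliance check. What follows is an illustrative sketch only – the function and field names are my own, the ‘STEM’/‘AHSS’ labels are shorthand for the two embargo tiers, and the real policy also includes exemptions that are not modelled here:

```python
from datetime import date

# Maximum permitted embargo on *access* to the deposited article (months).
MAX_EMBARGO_MONTHS = {"STEM": 12, "AHSS": 24}

# Grace period for *deposit* after acceptance (months).
DEPOSIT_GRACE_MONTHS = 3


def months_between(start: date, end: date) -> int:
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


def hefce_compliant(accepted: date, deposited: date,
                    panel: str, embargo_months: int) -> bool:
    """True if an output meets the basic deposit and embargo rules
    described above (exemptions and 'closed deposit' discoverability
    requirements are not modelled)."""
    deposited_in_time = (
        months_between(accepted, deposited) <= DEPOSIT_GRACE_MONTHS
    )
    embargo_ok = embargo_months <= MAX_EMBARGO_MONTHS[panel]
    return deposited_in_time and embargo_ok


# A STEM article deposited two months after acceptance,
# with a 12-month access embargo, satisfies both rules.
print(hefce_compliant(date(2016, 5, 1), date(2016, 7, 1), "STEM", 12))
```

Note that, as the policy itself stresses, deposit and access are checked separately: an article can be deposited on time yet remain closed during its embargo, and both conditions must hold for compliance.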

The problems and the loopholes

The problems with the policy are the embargo times permitted, the weakness of the licensing requirements, a lack of requirements for the ability to mine repositories and the set of exemptions. These exemptions mean that research outputs that are not available could still be submitted to the REF if 1) the output depends on third party material that is not legally accessible or 2) the embargo required by the publisher is longer than HEFCE has stipulated or 3), worse still, the publisher just doesn’t permit deposition in a repository.

These exemptions raise concerns that publishers will simply impose more stringent licensing conditions, restrict deposition and/or start seeing the embargo periods specified by HEFCE as targets, rather than limits. But this is also where the judicious use of sticks and carrots (and bear traps) makes the policy more effective than you might first assume.

Here’s the stick

The stick is aimed at institutions – any research output that does not meet HEFCE’s requirements or exemptions will be treated as non-compliant. “Non-compliant outputs will be given an unclassified score and will not be assessed in the REF”. That’s a major incentive to be compliant and it’s powerful because it is aimed at institutions, rather than at researchers. Each exemption will have to be individually justified and this will require time and effort. Institutions will have a strong incentive to reduce as far as possible the number of exemptions that they want to claim.

Here are the carrots

The policy has two important carrots, which create incentives to broaden the types of research outputs deposited and to encourage more liberal licensing for reuse. Specifically, they state that any higher education institution that can demonstrate it has taken steps towards enabling open access for outputs above and beyond the basic requirements, or that can demonstrate that outputs are presented in a form that allows re-use of the work, including via text mining, will be given ‘credit’ in the Research Environment component of the post-2014 REF.

It’s not clear exactly what that credit will be, although they note in Appendix B (paragraph 34) that further details will be developed in the coming years as part of their planning work for the next REF. Nevertheless, it seems likely that it will provide a way for a department to increase its research rating. That’s a powerful incentive to be more open.

And here’s the bear trap

A significant implication of the policy is only made clear in Appendix B (paragraph 67). The policy aims to enable researchers to have the academic freedom to publish where they choose. And this is why HEFCE allowed exemptions to the policy. However, as noted above, they expect exemptions to be applied in only a small proportion of cases. And what happens next depends on how different ‘actors in the system’ behave. So watch out for the bear trap: “If it becomes apparent that our policy is having a neutral, or detrimental, effect on the UK’s journey towards open access, we will revisit this policy to reconsider these provisions.”

The take home message

This policy is a game changer. It will result in a substantial increase in the proportion of UK research that is free to read. The UK will take a strong lead compared to other countries in making research accessible. When we talk about Open Access at PLOS we mean access to read with generous re-use rights. In this light there are some disappointing aspects but these are primarily a result of the Funding Councils being indirect funders – their funding is not tied to specific projects that generate specific outputs.

It is precisely due to the indirect nature of this funding, and its importance in the UK system, that the reach of the policy is so great. Every paper published with a UK university affiliation will be subject to this policy. The disappointments we may have with the details therefore need to be tempered with an appreciation of how substantial its effect will be. This policy represents an important step forward in the transition to full Open Access.


Opens Roundup (March 25)



In this issue, the US FIRST Act meets opposition on all fronts, David Wiley (champion of open education) on the 5Rs of openness, Wellcome releases its report on how to promote an effective market for APCs, PeerJ one year on, ten tips for navigating publishing ethics, the power of post-publication review, altmetrics for developing countries, the pros and cons of publishing in arXiv, a fantastic crowdsourcing project in the humanities, John Ioannidis on the look-out for a journal near you… and more.

With thanks to Allison Hawxhurst and Katie Fleeman for tips and links.


US: Why FIRST is a Trojan Horse

March 11: If you haven’t read PLOS’s reaction to the introduction of the FIRST Bill then here is another opportunity to read Cameron’s post (main link), as well as that of EFF published the same day and SPARC a couple of days before that. But it wasn’t just open access advocates objecting to the public access language in Section 303 (which could see embargo lengths increase to three years after publication). Science Insider also lists the growing opposition from scholarly societies and various research and university groups, noting that the overwhelming feeling is that the bill represents a missed opportunity for the US to maintain its scientific lead, in particular because of constraints on the amount of funding for the NSF and the restrictions the bill imposes on the type of research it would fund. The bill, introduced by Republicans, has been particularly divisive and at odds with the original America COMPETES Act introduced by Democrats. The ensuing markup of the FIRST Bill on the 12th was lively, but an impassioned plea by Rep. Zoe Lofgren to strike the public access language in Section 303 was narrowly defeated 9-8 along party lines. The bill still has a long way to go.


Legal Dispute between a Professional Association and Press

March 16: On H-Net, a listserv for the humanities, Phil Brown from H-Asia calls attention to an article in the Chronicle of Higher Education (available to subscribers only) discussing a dispute between the Social Science History Association and Duke University Press over control of the Association’s journal, Social Science History. Here’s the nub of what he says:

(After notifying Duke of its intent to solicit bids for a new publication contract):

In June 2012, not having gotten a bid from Duke, the association sent the press a letter saying it would be moving on and ending the contract; Duke disputed its right to do that. According to the association, Duke interprets the phrase “participation in the journal” to mean that the association has only two choices: “continue with Duke as the sole publisher in perpetuity or give up its ownership of the journal altogether and allow Duke to continue to publish the journal,” even though the university “has no ownership or intellectual-property rights” in it.

The full details of the suit are available.

Clarifying the 5th R

Image by Sean MacEntee (CC BY 2.0)

March 15: David Wiley, a proponent of open educational resources (OER), introduced the concept of the 4Rs as a way to think about openness in 2007. He described the four main activities enabled by open content as Reuse, Revise, Remix and Redistribute, which is what the Creative Commons Attribution license permits. The 4Rs have been influential as a framework for educational resources and for what being open means more generally. On March 4th he added a 5th R – Retain – to the mix. This is the right to make, own and control a copy of the work. The addition was in response to the fact that while many OER publishers permit access to their content, they “do not make it easy to grab a copy of their OER that you can own and control forever”. “How many OER publishers enable access but go out of their way to frustrate copying? How many more OER publishers seem to have never given a second thought to users wanting to own copies, seeing no need to offer anything beyond access?”

Some (e.g. David Draper and Mike Caulfield) have subsequently questioned the need for a 5th R – surely being able to own and control a copy of the work is implicit in any licence that permits the 4Rs in the first place? Wiley agrees with this but believes that ownership per se has never been explicitly addressed in the discussion of open content (main link) and this has direct consequences: “If the right to Retain is a fundamental right, we should be building systems that specifically enable it. When you specifically add ‘Enable users to easily download copies of the OER in our system’ to your feature list, you make different kinds of design choices” – ones that most designers of OER systems have failed to think about.

The implications of the 5th R are fundamentally important as they change the context of what we’re doing – it’s no longer enough to state that others can reuse, remix or redistribute your work. It suggests that those involved in providing open content also need to help build or facilitate the infrastructure that enables the content to be easily reused by others. In other words, we have a responsibility to reduce the friction around the content we make open (see, e.g., Cameron’s article in PLOS Biology). And this applies not just to libraries and institutions with an educational remit but to funders and publishers as well.

Metaphysicians: Sloppy researchers beware. A new institute has you in its sights

March 15: Article in the Economist about a new institute at Stanford called the Meta-Research Innovation Center (METRICS), to be run by John Ioannidis (author of the famous ‘Why Most Published Research Findings Are False’ paper in PLOS Medicine, currently viewed almost 980,000 times). The new laboratory aims to find out whether attempts to reproduce studies actually work and to guide policy on how to improve the validation of research more generally. Among the initiatives is “a ‘journal watch’ to monitor scientific publishers’ work and to shame laggards into better behaviour”. And they will spread the message to policymakers, governments and other interested parties, “in an effort to stop them making decisions on the basis of flaky studies.”

PeerJ’s $99 open access model one year on

March 13: A brief review in Times Higher Education of how PeerJ is doing. “PeerJ co-founder Jason Hoyt said that the journal had fulfilled its year-one aims of “staying alive”, starting to grow and “laying the groundwork to show this business model could be sustainable”. It is on track to be self-sustaining by early 2015, and there are no plans to raise prices. “If anything we would like to lower them if we can figure out some other revenue stream,” Dr Hoyt said.” PeerJ was also explicitly referenced in the Wellcome Trust report into an effective APC market by Björk and Solomon mentioned below. That report concludes that “PeerJ has published around 220 articles in the first nine months of operation. It is not clear at this point if the journal will scale up to a level that will make it financially sustainable but it offers an innovative funding model”.

Institutional repositories provide an ideal medium for scholars to move beyond the journal article

Academic Commons use-per-item graph (CC BY 3.0)

March 12: Leyla Williams, Kathryn Pope, and Brian Luna Lucero from Columbia University make the case for institutional repositories to collect all the work of their scholars rather than focusing only on peer-reviewed journal articles or monographs. They discuss how their IR, ‘Academic Commons’, hosts conference videos, presentations, technical reports and other “grey literature” (as the figure shows). “IRs are crucial for authors whose work may not fit within the scope of any one scholarly journal,” they note. “They are also vital for researchers with data that lies outside the parameters of disciplinary data repositories, for dissertation authors who want to make supplemental materials available, and for undergraduates.” They discuss examples where deposition in their repository has helped young researchers find a job, and how the lively Twitter feed around what’s deposited helps disseminate the work.

Top 10 tips for navigating ethical challenges in scholarly publishing

Image by Brett Jordan (CC BY)

March 12: Jackie Jones, Executive Journals Editor at Wiley, provides her top tips on publication ethics to coincide with the 2nd edition of Wiley’s “Best Practice Guidelines on Publishing Ethics: A Publisher’s Perspective”. The guidelines (available online or as a PDF) are published under a Creative Commons Non-Commercial license and have drawn on a range of expertise, including from COPE. They form a useful addition to the more in-depth guidelines and forums that COPE already provides, and signal the increasing accountability that all reputable publishers have to ensure high ethical standards in scholarly publishing. In a subsequent post, Udo Schuklenk, an editor of the journals Bioethics and Developing World Bioethics, lists some of the infringements he’s seen as editor with respect to plagiarism: “Over the years you begin to delude yourself into thinking that you have seen the full range of ethics infringements. It’s particularly ironic, I guess, when you edit bioethics journals: you would hope that your authors would be clued in to publication ethics issues.” Unfortunately no journal can escape these issues.

Fostering a transparent and competitive market for open access publishing

Image by 401(K) 2013 (CC BY-SA 2.0)

March 12: A range of funders, led by the Wellcome Trust, released a report by Bo-Christer Björk of the Hanken School of Economics, Finland, and Professor David Solomon of Michigan State University on the structure and shaping of the APC market. It’s worth skimming the whole report as it has a lot of good information as well as giving a sense of the directional thinking of funders. The report contains useful figures and data.

The key conclusions are that the full OA market is competitive and functioning, with varied pricing depending on field and impact, while the market for APCs in hybrid journals run by subscription publishers is dysfunctional, with relatively flat pricing (and low uptake). Analyses of the report have started to appear; look out for some comments and a summary here over the next few days.

Science self-corrects – instantly

Image by Boston Public Library (CC BY 2.0)

March 11: The blog for PubPeer, an online post-publication commenting service, discusses two papers published in Nature in January purporting to provide a revolutionary, simple technique for producing pluripotent stem cells, termed STAP cells. A post by Paul Knoepfler on his own blog expressed initial doubts, which were fuelled by comments on PubPeer exposing problems with the figures in the paper. Knoepfler then hosted a post to crowdsource researchers trying to replicate the apparently simple production of mouse STAP cells – so far with little success. Then comments posted last week suggested that some of the figures in one of the Nature papers were duplicated from seemingly different experiments in the lead author’s doctoral thesis. Not long after, the Wall Street Journal reported that the senior co-author from RIKEN (a leading Japanese research institute) asked for the papers to be retracted, though a co-author at Harvard continues to support the work. Nature also reported on the press conference with RIKEN, who announced the findings of an interim investigation but have not made any decision about retracting the papers. The story rumbles on – this week, RIKEN withdrew the original press release about the paper, stating “The reports described in the CDB news story “STAP cells overturn the pluripotency paradigm” published on Jan 30, 2014 are currently under investigation. Due to a loss of confidence in the integrity of these publications, RIKEN CDB has withdrawn press release from its website.”

The real question is not just whether the papers in Nature hold up but whether commenting platforms like PubPeer and blogs provide a valid means to scrutinise bold claims. While there are cases where genuine issues are identified, there is also concern about the potential for personal attacks, particularly given the anonymity that some of these platforms provide.

This is especially important given the increasing number of times pre-publication review fails. In a recent and related post in Nature, Richard Van Noorden discusses how to choose among the many venues now available for researchers to discuss work after publication, focusing on a discussion of the same papers by Kenneth Lee on ResearchGate. ResearchGate provided Lee with a more structured review form that they’re calling Open Review. PLOS Labs has also recently set up a similar initiative, called ‘Open Evaluation’. As PubPeer conclude at the end of the post (main link): “Science is now able to self-correct instantly. Post-publication peer review is here to stay.”

How is it possible that Elsevier are still charging for copies of open-access articles?

March 11: Mike Taylor provides a run-down of the charges that Elsevier still try to apply for reusing their ‘Open Access’ articles. Apparently, it’s all down to a problematic technical fix that Elsevier has been trying to solve for a couple of years now. And this week Peter Murray-Rust publishes a letter he received from Elsevier’s Director of Access and Policy, Alicia Wise, about how they have now taken steps to fix the problem and compensate individuals who have been mis-sold Open Access products.

Should we eliminate the Impact Factor?

March Issue: An interesting take on the impact factor by the Editor-in-Chief of BioTechniques, Nathan Blow. He reviews the pros and cons of the impact factor versus other article-level metrics and concludes that there is an equal danger of misuse if researchers become wedded to any single alternative metric. And he thinks they will, because scientists need something on which to base their publishing decisions. What he ends up calling for is a bit more sense in the way we use metrics, and a bit less laziness: “we need to change the way scientists view such metrics: While it might be good to publish in a top tier journal with an Impact Factor of 30—if your article only gets 2 citations, what does this mean? And the opposite is also true—if the journal has an Impact Factor of 2, but your article receives 500 citations in 2 years, should you be penalized for where you publish? And fundamentally, what does it mean to get 2 versus 500 citations? The validity of any statistic or analysis tool depends on careful and appropriate application by an informed user. Maybe scientists need to look beyond sheer numbers towards the “community” impact of their studies. Here, network analysis showing the reach of an article based on a deeper citation analysis might provide stronger insights into its impact. Tenure committee members also need to look beyond the simple “30-versus-2” Impact Factor debate and use their experience and knowledge to see the true contribution that a scientist is making to their field and beyond—you cannot ask a young scientist to do something that you are not willing to do yourself! In the end, measures such as the Impact Factor are only “lazy” statistics because we make them lazy.”

While I agree with much of what he says, I think he omits another factor that scientists should consider when they choose where to publish, and that’s the service the journal (or platform) provides. Are there checks and balances that ensure your work is properly reported? What sort of peer-review service do they have? Does the publisher help ensure that the data underlying the paper’s findings are available? Is the metadata in a form that means your article, or the components within it, can be found by anyone interested regardless of subject or geographic location? And can the content be reused easily if others do find it? Making sure your work can be validated, disseminated, searched and reused is what really counts. The metrics will follow.

Is a Rational Discussion of Open Access Possible?

Image by Vaguery (CC BY 2.0)

March 10: A dedicated blog set up by Rick Anderson to host the slides and transcripts of a talk he gave at the Smithsonian Libraries. Rick is perhaps better known as a chef at the Scholarly Kitchen (where he’s posted a link to the video of his talk). Both blogs have a lively set of comments – largely supportive ones from Mike Taylor, for example, even though he was criticised in the talk, and some insights from Jan Velterop (who started BMC with Vitek Tracz).

Altmetrics could enable scholarship from developing countries to receive due recognition

March 10: Fantastic post by Juan Pablo Alperin on the potential impact of altmetrics for researchers in developing countries. One of the issues he raises is the perception that researchers in developing countries don’t produce as much research, but this is largely because the research they do produce is not represented in abstracting and indexing services such as Thomson Reuters’ Web of Science. This means that the work doesn’t get cited as much and the journals don’t even gain entry into the notorious impact factor game. Resources like SciELO are trying to redress this balance but are still working with only a subset of the 5,000+ regional journals (largely from South America). He provides a striking image (below) of the world scaled by the number of papers in Web of Science by authors actually living there, which puts the lack of representation of these countries into stark relief.

World Scaled Image by Juan Pablo Alperin (CC BY)

But whether altmetrics can help redress this balance is open to question. The potential is huge, but to realise the promise, he argues, altmetrics (and the ALM community more generally) need to engage with scholars from developing regions. He cautions that if altmetrics are used as just another means to rank scholars then the danger is that they will evolve to cater only for those in locations where they are most heavily used (i.e. not in developing countries). However, he is part of the Public Knowledge Project, working with PLOS’s ALM application to provide a free altmetrics service to journals run and hosted in developing countries (via the OJS platform). “As the field begins to consolidate, I remain optimistically pro-altmetrics for developing regions, and I have faith in the altmetrics community to serve all scholars. Which directions altmetrics should go, how they should be used, or how the tools should be implemented is not for me to prescribe, but if we exclude (or do not seek to include) scholars from developing regions, altmetrics will become another measure from the North, for the North. And we already know that story.”

Dubiously online

March 08: Article about the need to police open access journals in India by an Indian academic, Rudrashis Datta: “Unless the higher education authorities step in to stem the rot, serious open access publishing, so important in the Indian academic context, runs the risk of dying a natural death, leaving serious researchers and academics without the advantage of having to showcase their scholarly contributions to readers across the world.”

The price of publishing with arXiv

March 05: A mathematician discusses the advantages and disadvantages of publishing in arXiv: “The advantage: I had a lot of fun. I wrote articles which contain more than one idea, or which use more than one field of research. I wrote articles on subjects which genuinely interest me, or articles which contain more questions than answers. I wrote articles which were not especially designed to solve problems, but to open ones. I changed fields, once about 3-4 years.” … “The price: I was told that I don’t have enough published articles…”

“But, let me stress this, I survived. And I still have lots of ideas, better than before, and I’m using dissemination tools (like this blog) and I am still having a lot of fun.”

Making it Free, Making it Open: Crowdsourced transcription project leads to unexpected benefits to digital research

Image by Ewan Munro (CC BY-SA)

March 03: Melissa Terras, Professor of Digital Humanities at University College London, discusses how they used crowdsourcing to transcribe all of the philosopher and reformer Jeremy Bentham’s writings. The ‘Transcribe Bentham’ site is hosted by UCL. It is a fantastic project and seems likely to follow the success of similar crowdsourcing initiatives in the sciences, like Galaxy Zoo. As Melissa notes, “This week we hit over 7000 manuscripts transcribed via the Transcription Desk, and a few months ago we passed the 3 million words of transcribed material mark. So we now have a body of digital material with which to work, and make available, and to a certain extent play with. We’re pursuing various research aims here – from both a Digital Humanities side, and a Bentham studies side, and a Library side, and Publishing side. We’re working on making canonical versions of all images and transcribed texts available online. Students in UCL Centre for Publishing are (quite literally) cooking up plans from what has been found in the previously untranscribed Bentham material, unearthed via Transcribe Bentham. What else can we do with this material?” And there are lots of doors opening for them too – such as looking into Handwritten Text Recognition (HTR) technologies.

Experiment in open peer review for books suggests increased fairness and transparency in feedback process

Feb 28: Hazel Newton, the Head of Digital Publishing at Palgrave Macmillan, describes their current peer review pilot investigating how open feedback functions in monograph publishing, and gets feedback from the authors involved in the project. Great to see open peer review experiments in the humanities as well as the sciences.


Why FIRST is a Trojan Horse


Bill would be a major setback to progress on public access to US federally funded research

PLOS opposes the public access language set out within a bill introduced to the US House of Representatives on Monday, March 10. Section 303 of H.R. 4186, the Frontiers in Innovation, Research, Science and Technology (FIRST) Act, would undercut the ability of federal agencies to effectively implement the widely supported White House Directive on Public Access to the Results of Federally Funded Research and undermine the successful public access program pioneered by the National Institutes of Health (NIH) – recently expanded through the FY14 Omnibus Appropriations Act to include the Departments of Labor, Education, and Health and Human Services. Adoption of Section 303 would be a step backward from existing federal policy in the directive, and would put the U.S. at a disadvantage among its global competitors.

PLOS has never previously opposed public access provisions in US legislation, but the passage of FIRST as currently written would reduce access to taxpayer-funded publications and data, restrict searching, text mining and crowdsourcing, and place US scientists and businesses at a competitive disadvantage.

“PLOS stands firmly alongside those seeking to advance public access to publicly funded knowledge”, said PLOS Chief Executive Officer Elizabeth Marincola. “This legislation would be a substantial step backwards compared to the existing U.S. policy as set out by the White House and in the recent Omnibus Bill.”

As the Scholarly Publishing and Academic Resources Coalition (SPARC) outlines, Section 303 would:

  • Slow the pace of scientific discovery by restricting public access to articles reporting on federally funded research for up to three years after initial publication. This stands in stark contrast to the policies in use around the world, which call for maximum embargo periods of no more than six to 12 months.
  • Fail to support provisions that allow for shorter embargoes on access to publicly funded research results. This provision ignores the potential harm to stakeholders that can accrue through unnecessarily long delays.
  • Fail to ensure that federal agencies have full-text copies of their funded research articles to archive and provide to the public for full use, and for long-term archiving. By condoning a link to an article on a publisher’s website as an acceptable compliance mechanism, this provision puts the long-term accessibility and utility of federally funded research articles at serious risk.
  • Stifle researchers’ ability to share their own research and to access the works of others, slowing progress towards scientific discoveries, medical breakthroughs, treatments and cures.
  • Make it harder for U.S. companies – especially small businesses and start-ups – to access cutting-edge research, thereby slowing their ability to innovate, create new products and services, and generate new jobs.
  • Waste further time and taxpayer dollars by calling for a needless additional 18-month delay while agencies “develop plans for” policies. This duplicates federal agency work that was required by the White House Directive and has, in large part, already been completed.
  • Impose unnecessary costs on federal agency public access programs by conflating access and preservation policies as applied to articles and data. The legislation does not make clear what data must be made accessible, where such data would reside, or its terms of use.

The FIRST Act was introduced in the House of Representatives by Chairman Lamar Smith (R-TX) and Rep. Larry Bucshon (R-IN). It is expected to be referred to the House Committee on Science, Space, and Technology.

Take Action Before Thursday, March 13:

Encourage federal agencies to implement the White House Directive and ensure the passage of the bipartisan, bicameral Fair Access to Science and Technology Research (FASTR) Act.

Category: Open Access policy

Best Practice in Enabling Content Mining


This is a document prepared to support the European Commission’s ongoing discussion on Content Mining. In particular, it discusses publisher best practice in enabling content mining, and the challenges that can arise when particular types of traffic reach high levels, from the perspective of a purely Open Access publisher.


Enabling the discovery and creative re-use of content is a core aim of Open Access and of Open Access publishers. For those offering Open Access publication services enabling downstream users to discover and use published research is a crucial part of the value offering for customers. Content mining is an essential emerging means of supporting discovery of research content and of creating new derivative works that enhance the value of that content.

Content mining generally involves the computational reading of content, either by obtaining specific articles from a publisher website or by working on a downloaded corpus. Computational access to a publisher website has the potential to create load issues that may degrade performance for human or other machine users.

In practice downloads that result from crawling and content mining contribute a trivial amount to the overall traffic at one of the largest Open Access publisher sites and are irrelevant compared to other sources of traffic. This is true both of average traffic levels and of unexpected spikes.

Managing high traffic users is a standard part of running a modern web service and there are a range of technical and social approaches to take in managing that use. For large scale analysis a data dump is almost always going to be the preferred means of accessing data and removes traffic issues. Mechanisms exist to request that automated traffic be kept at certain levels and these requests are widely followed – where they are not, technical measures are available to manage the problematic users.

Scale and scope of the problem

PLOS receives around 5 million page views per month from human users to a corpus of 100,000 articles, as reported by Google Analytics. This is a small proportion of the total traffic, as it does not include automated agents such as the Google bot. The total number of page views per month is over 60 million for PLOS ONE alone. Scaling this up to the whole literature suggests that there might be a total of 500 million to 5 billion page views per month across the industry, or up to seven million an hour from human users. As noted below, the largest traffic websites in the world provide guidance that automated agents should limit retrieving pages to a specified rate. Wikipedia suggests one page per second; PLOS requests a delay of 30 seconds between downloading pages.
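The scaling estimate above can be checked with a quick back-of-envelope calculation. This is only a sketch using the rough figures quoted in the text – the 100× to 1,000× multipliers for scaling PLOS traffic to the whole industry are estimates, not measurements:

```python
# Back-of-envelope check of the traffic estimates quoted above.
plos_human_views = 5_000_000              # page views/month (Google Analytics, human users)
industry_low = 100 * plos_human_views     # low-end scaling to the whole literature
industry_high = 1_000 * plos_human_views  # high-end scaling
hours_per_month = 30 * 24

print(f"{industry_low:,} to {industry_high:,} page views/month")
print(f"~{industry_high // hours_per_month:,} page views/hour at the high end")
```

The high-end figure works out to roughly seven million page views per hour, matching the number in the text.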

PLOS infrastructure routinely deals with spikes of activity that are ten times the average traffic and is designed to manage loads of over 100 times average traffic without suffering performance problems. Thus it would require hundreds of thousands of simultaneously operating agents to even begin to degrade performance.

Content mining is a trivial and easily managed source of traffic compared to other sources, particularly coverage on popular social media sites. Coverage of an article on a site like Reddit often leads to tens of thousands of requests for a single page within an hour. By contrast, automated crawling usually leads to a smaller number of overall downloads spread out over longer time periods, making it much easier to manage. As an example, there are attempts made to artificially inflate article download counts, which involve tens of thousands of requests for the same article. We do not even attempt to catch these at the level of traffic spikes, because there they would be undetectable; they are detected through later analysis of the article usage data.
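That kind of after-the-fact analysis can be as simple as counting repeat requests per (IP, article) pair in the usage logs. The sketch below illustrates the idea only; the log format, field names and threshold are hypothetical, not a description of PLOS’s actual systems:

```python
# Sketch: flag suspicious repeat downloads of a single article in usage logs.
# The (ip, article_id) event format and the threshold are hypothetical.
from collections import Counter

def flag_inflation(events, threshold=1000):
    """events: iterable of (ip, article_id) pairs; return pairs at/over threshold."""
    counts = Counter(events)
    return {pair: n for pair, n in counts.items() if n >= threshold}

# A single IP requesting the same article thousands of times stands out in
# later analysis even though it never registers as a traffic spike.
events = [("10.0.0.1", "pone.0012345")] * 1500 + [("10.0.0.2", "pone.0067890")] * 3
print(flag_inflation(events))  # → {('10.0.0.1', 'pone.0012345'): 1500}
```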

Sources of traffic that do cause problems are generally rogue agents and distributed denial of service attacks, where hundreds of thousands or millions of requests occur per second. These are the main cause of service degradation and need to be managed based on the scale of traffic and the likelihood of being a target for such attacks. The scale of content mining traffic for any given publisher will depend on the scale of interest in the content that publisher provides.

Management approaches

There are broadly three complementary approaches to supporting content mining in a way that does not have any impact on user experience. While all of these approaches are implemented by effective scholarly publishers it is worth examining these approaches in the context of a truly high traffic site. Wikipedia is an excellent example of an extremely high traffic site that is also subject to large scale mining, scraping, and analysis.

Providing a data dump

The first and simplest approach is to provide a means of accessing a dump of all the content that can be obtained for offline analysis. Generally speaking the aim is to mine a whole corpus, and enabling the user to obtain a single dump and process it offline improves the experience for the miner while removing any risk of impact on website performance. Wikipedia provides a regular full dump of all content for all language versions and recommends that this be the first source of content for analysis. Many Open Access publishers adopt a similar strategy, utilising deposition at PubMed Central or on their own websites as a means of providing access to a full dump of content. PLOS recommends that those wishing to parse the full corpus use PMC or EuropePMC as the source of that content.
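Processing such a dump offline might look like the following minimal sketch, which walks a directory of JATS XML files (the article format used for PMC deposits) and extracts titles with the standard library. The directory name and flat file layout are assumptions for illustration, not a specification of how PMC packages its bulk data:

```python
# Sketch: mine a downloaded corpus offline instead of crawling the live site.
# Assumes a local directory of JATS XML article files; the "oa_bulk/" path
# and flat layout are illustrative only.
import xml.etree.ElementTree as ET
from pathlib import Path

def article_titles(corpus_dir):
    """Yield (filename, title) for each JATS XML file in the dump."""
    for path in sorted(Path(corpus_dir).glob("*.xml")):
        root = ET.parse(path).getroot()
        # JATS places the title at front/article-meta/title-group/article-title
        title = root.find(".//article-title")
        if title is not None:
            yield path.name, "".join(title.itertext()).strip()

for name, title in article_titles("oa_bulk/"):
    print(name, "-", title)
```

Because everything happens against local files, even a whole-corpus analysis generates no load on the publisher’s site.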

This approach is especially useful for smaller publishers running their own infrastructure, as it means they can rely on a larger third party to handle dumps. Of course, for smaller publishers with a relatively small corpus, the scale of such a data dump may also be such that readily available file sharing technologies suffice. For a small publisher with a very large backfile, the imperative to ensure persistence and archiving for the future would be further supported by working with appropriate deposit sites to provide both access for content miners and preservation. Data dumps of raw content files are also unlikely to provide a viable alternative to access for human readers, so need not concern subscription publishers.

Agreed rates of crawling

It is standard best practice for any high traffic website to provide a “robots.txt” file that includes information on which parts of the site may be accessed by machine agents, or robots, and at what rate. These files should always include a ‘crawl-delay’ which indicates the time in seconds that an agent should wait before downloading a new page. Wikipedia’s robots.txt file says for instance “Friendly, low-speed bots are welcome viewing article pages, but not dynamically-generated pages please” and suggests a delay of at least one second between retrieving pages. This is not enforced technically but is a widely recognised mechanism that is respected by all major players – not following it is generally regarded as grounds for taking the technical measures described below.

PLOS currently requests a crawl delay of 30 seconds, BioMed Central asks for one second, and eLife for ten. When working with content from a large publisher, crawl delays of this magnitude mean that it is more sensible for large scale work to obtain a full data dump. Where a smaller number of papers are of interest, perhaps a few hundred or a few thousand, the level of traffic that results from even large numbers of content mining agents that respect the crawl delay is trivial compared to human and automated traffic from other sources.
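A crawl-delay declaration is machine-readable, so a well-behaved mining agent can honour it automatically. Here is a minimal sketch using Python’s standard-library robots.txt parser; the file content below is illustrative, modelled on the delays discussed in the text:

```python
# Sketch: a polite crawler checks robots.txt before fetching anything.
# The robots.txt content here is illustrative, not a real publisher's file.
import urllib.robotparser

robots_txt = """\
User-agent: *
Crawl-delay: 30
Disallow: /search
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("my-miner", "https://example.org/article/1"))  # True
print(rp.can_fetch("my-miner", "https://example.org/search"))     # False
print(rp.crawl_delay("my-miner"))  # 30 — seconds to wait between requests
```

In a real crawler the agent would sleep for `crawl_delay` seconds between page fetches and skip any URL for which `can_fetch` returns False.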

Technical measures

It is however the case that some actors will not respect crawl-delays and other restrictions in robots.txt. In our experience this is rarely the case with content miners and much more frequently the result of malicious online activity, rogue automated agents, or in several cases the result of security testing at research institutions, which sometimes involves attempts to overload local network systems.

Whether the source is a spike in human traffic, malicious agents, or other heavy traffic, maintaining a good service requires that these issues be managed. The robots.txt restrictions become useful here: when it is clear that an agent is exceeding those recommendations it can be shut down. The basic approach is to “throttle” access from the specific IP address that is causing problems. This can be automated, although care is required because in some cases a single IP may represent a large number of users, for instance a research institution proxy. For PLOS such throttling is therefore only activated manually at present. This has been done in a handful of cases, none of which related to text mining.
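In code, per-IP throttling is often a simple sliding-window rate limit. The sketch below is illustrative only: the limit and window values are hypothetical, and as noted in the text, a real deployment would exempt or manually review shared IPs such as institutional proxies before blocking them:

```python
# Sketch: per-IP request throttling with a sliding window.
# The limit and window values are hypothetical, chosen for illustration.
import time
from collections import defaultdict, deque

class Throttle:
    def __init__(self, limit=100, window=60.0):
        self.limit, self.window = limit, window
        self.hits = defaultdict(deque)   # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        """Return True if this request is under the limit for its IP."""
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        while q and now - q[0] > self.window:
            q.popleft()                  # drop requests outside the window
        if len(q) >= self.limit:
            return False                 # over the limit: throttle this IP
        q.append(now)
        return True

t = Throttle(limit=3, window=60.0)
print([t.allow("10.0.0.1", now=float(i)) for i in range(5)])
# → [True, True, True, False, False]
```

A service would call `allow()` for each incoming request and return an HTTP 429 (or simply drop the connection) when it comes back False.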

At larger scale automated systems are needed, but again this is part of running any heavily used website. Load-balancing, monitoring incoming requests and managing the activity of automated agents are a standard part of running a good website. Identifying and throttling rogue activity is just one part of the suite of measures required.


Enabling content mining is a core part of the value offering for Open Access publication services. Downloads that result from crawling and content mining contribute a trivial amount to the overall traffic at one of the largest Open Access publisher sites and are irrelevant compared to other sources of traffic. This is true both of average traffic levels and of unexpected spikes.

Managing high traffic users is a standard part of running a modern web service and there are a range of technical and social approaches to take in managing that use. For large scale analysis a data dump is almost always going to be the preferred means of accessing data and removes traffic issues. Mechanisms exist to request that automated traffic be kept at certain levels and these requests are widely followed – and where they are not, technical measures are available to manage the problematic users.

There are sources of traffic to publisher websites that can cause problems and performance degradation. Managing these issues is part of the competent management of any modern website. Content mining, even if it occurred at volumes orders of magnitude above what we see currently, would not be a significant source of issues.


Category: RFI

PLOS Opens Roundup (March 7)



In this issue, Obama signals a commitment to open access and the Dutch libraries start cloudsharing, whilst in other news there is a new science magazine published by the Wellcome Trust, a round-up of the posts surrounding the withdrawal of nonsense papers published by Springer and IEEE, a curious case at the London Mathematical Society, an interview with Sydney Brenner discussing why the current culture of some labs stifles innovation, a thought experiment on why publishers don’t need embargoes, a brief review of the cost of hybrid publishing, and the odd mention of data sharing policies… [and see the update added on March 09 to the entry ‘PLOS’s Bold Data Policy’]

With thanks to Heather Joseph, Alma Swan, Ginny Barbour and Susan Au for links and tip-offs.


US: The President’s 2015 Budget Request – Public Access Language

04 March: President Obama released his FY15 Budget request. As you may know, this request essentially represents the President’s policy “wish list” for the year, and signals the official start to the federal budget process. The budget includes a section titled “Creating a 21st Century Government,” with a subsection on “Economic Growth: Open Government Assets as a Platform for Innovation and Job Creation.” It includes language discussing the need for greater public access to government-generated assets, including scientific research. The budget explicitly states:

By opening up Government-generated assets including data and the fruits of federally funded research and development (R&D)—such as intellectual property and scientific publications—to the public, Government can empower individuals and businesses to significantly increase the public’s return on investment in terms of innovation, job creation, and economic prosperity.

While it carries no executive or legislative force, the language signals a continued commitment to the issue of ensuring public access to the results of publicly funded research. For full details and context, please see pages 41-42 of the main link. (Thanks to Heather Joseph for providing this update)

Australia: The National Health and Medical Research Council signs up to DORA and a guide for researchers to the ARC data management plan

Feb 24: Both announcements are noted by Stephen Matchett in different issues of the ‘Campus Morning Mail’ (you need to scroll down). On the San Francisco Declaration on Research Assessment (DORA), he notes that the Australian Research Council (ARC) has still to sign up (and should). He also adds that “hoping the publishers will actually acknowledge a reform not in their interests is probably too much to expect.” But several publishers, such as PLOS, the Royal Society and AAAS, as well as individual journals, have signed up to DORA, in addition to the growing number of funders. Lagging behind even some publishers, however, are institutions themselves, relatively few of which have signed. On the data front, the Australian National Data Service has produced a guide for researchers about creating the required data management plan for the ARC, which has to include details of how they will store and share their data (sound familiar – see later).

Netherlands: Dutch consortium of university libraries and the National Library of the Netherlands move to cloudshare their metadata with OCLC Worldshare

Feb 19: Not strictly a policy move, but a significant one by the libraries of an entire country to shift their services to OCLC WorldShare. “WorldShare provides an open cloud-based approach for sharing metadata, applications and innovation, enabling library consortia to collaborate at a national or regional level as well as connecting globally to raise visibility and awareness of their institutions on the web, and take advantage of the economies of scale that global collaboration brings.” As Eric van Lubeek, Managing Director, OCLC EMEA notes in the press release: “UKB’s move will serve as an example for other libraries and library consortia, not just in the Netherlands, but around the world.”


Thoughts on journal embargoes

March 05: Ben Johnson provides a carefully constructed thought experiment re-imagining scholarly communication and deconstructing the arguments about why publishers think they need to impose embargoes on the final version of the manuscript. He discusses what publishers add that is considered ‘essential’ – i.e. peer-review and ‘brand-recognition’ – and shows that neither has to be supplied by the publisher or subject to embargo, because both are available via the accepted version of the article, which can generally be posted to a repository immediately. He goes on to argue that if embargoes were completely lifted, libraries would still continue to subscribe to journals because 1) that’s how articles get cited, 2) readers like all the peripheral content in journals (book reviews etc.), 3) it’s easier to find the article, 4) librarians cancel journals because of price, rather than embargo length, and 5) most librarians won’t cancel anyway as they are tied into big deals.

It’s an interesting analysis and I think he’s right, but I think it also omits one other service. He doesn’t discuss the role of either marking up the html or xml version of the article or ensuring that the article adheres to the appropriate metadata standards (including metadata that enables you to know what licence is associated with the article). Again, this is not something that publishers need to do, but it is currently a service that most established publishers provide because it makes their content more discoverable on the web (until you come up against that paywall). And it is also these jewels of the digital age that subscription publishers are protecting by trying to restrict text mining. They want you to find their version of the article but only to use it under the conditions they stipulate (thus protecting a potential revenue stream).

A new way to explore the science of life: Mosaic launches today

March 04: In another pioneering move, the Wellcome Trust has launched a new #OA magazine about science called ‘Mosaic’, which will feature in-depth stories (including video) across the biosciences but will also include some topics from the humanities (reflecting Wellcome’s roots and focus of funding). A cross between a blog and a more formal online magazine, with a wonderfully sleek design, it has a really strong line-up of regular contributors including people like Oliver Burkeman (well known to Guardian readers), whose opening feature is an interview with Steven Pinker, as well as Emily Anthes writing on the female condom (no interest there then…) and Michael Regnier exploring Alzheimer’s Disease. Mosaic actively encourages you to not only read the written content for free but also share it and, yes, even republish it – even commercial re-use is permitted (as long as there is appropriate attribution). Giles Nelson, the editor of Mosaic, provides the rationale for why they opted for a CC BY licence. Such a licence is still rare for this type of ‘front-section’ content and represents a potential game changer in science writing (although note it doesn’t apply to all of the content). For example, it is sometimes argued that the front section of journals like Science and Nature could never convert to OA because the work is often commissioned from e.g. science writers and journalists and you therefore have to charge for access because it is not feasible to recoup the cost with e.g. an APC (although this wouldn’t apply to practicing scientists writing in these sections – they generally aren’t paid a commission and often acknowledge funders). Certainly there is a cost that has to be paid for somewhere – but Wellcome have obviously decided that footing the entire bill for the sake of engaging the public with the research they are funding is worth it. A fantastic initiative.

Update (March 07): The SpotOn London conference last fall had a session on Creative Commons journalism and the video is available.

UKSG official journal fully Open Access (without publishing charges) with special issue on OA

March 04: UKSG, an organisation with a mission to “connect the information community and encourage the exchange of ideas on scholarly communication”, has just flipped its official journal, Insights, to OA. To mark the occasion, they commissioned a special issue on Open Access, featuring articles from speakers at a conference they hosted last year. Among the contributions are those focusing on policy (e.g. on Finch by Michael Jubb, and on HEFCE from David Sweeney and Ben Johnson), on publishers (e.g. by PLOS ONE’s Damian Pattinson and myself, and by Taylor and Francis on how they’re riding out the transition), on OA in the humanities (by Caroline Edwards, co-founder and co-director of the Open Library of Humanities), as well as on OA developments in China by Xiaolin Zhang (Chinese Academy of Sciences).

PLOS’ Bold Data Policy

Image by planeta (CC BY-SA 2.0)

March 04: There will be more to come on the PLOS data policy* [see update below] but in the meantime, here is a list of some of the many links and posts that have discussed it. From PLOS, there is an article in PLOS Biology detailing the policy, our FAQ page, and a post on the EveryONE blog. As well as the numerous tweets, there are reactions from Ian Dworkin, Edmund Hart, Practical Data Management for Bug Counters, DrugMonkey, the MacManes Lab, Erin C. McKiernan, Neuropolarbear, motorcar nine, Small Pond Science and David Crotty over at Scholarly Kitchen (main link). Note that the comments are often as interesting and revealing as the posts, and there is a focus on behavioural, neurological and ecological data. Be sure to check out related discussion articles by e.g. Joel Hartter et al., Bryan Drew et al. and Dominique Roche et al., all in PLOS Biology, with associated blog posts from Roli Roberts and Emma Ganley (editors on PLOS Biology). And then read Cameron Neylon’s post ‘Open is a state of mind’.

Update (March 07): Please also see this post by Björn Brembs and another by ecologist Timothée Poisot.

*Update March 09: Theo Bloom (PLOS Editorial Director, Biology) has provided a correction, apology and further clarification about our data sharing policy given “the extraordinary outpouring of discussions on open data and its place in scientific publishing”. As she notes, much of the discussion centered on a misunderstanding in a previous PLOS ONE blog post and also on our site for PLOS ONE Academic Editors: “an attempt to simplify our policy did not represent the policy correctly and we sincerely apologize for that and for the confusion it has caused” … “We have struck out the paragraph in the original PLOS ONE blog post headed “What do we mean by data”, as we think it led to much of the confusion.”

As Ivan Oransky reports on Retraction Watch: “The move looks like the right thing to do. The problem seemed to have stemmed from how the policy was communicated, rather than what PLOS actually wanted to accomplish, which is better data sharing. In a time when reproducibility is a growing concern, the latter is a must.”

Here’s the salient points of the clarified policy:

“Two key things to summarize about the policy are:

  1. The policy does not aim to say anything new about what data types, forms and amounts should be shared.
  2. The policy does aim to make transparent where the data can be found, and says that it shouldn’t be just on the authors’ own hard drive.”

“We ask you to make available the data underlying the findings in the paper, which would be needed by someone wishing to understand, validate or replicate the work. Our policy has not changed in this regard. What has changed is that we now ask you to say where the data can be found.

As the PLOS data policy applies to all fields in which we publish, we recognize that we’ll need to work closely with authors in some subject areas to ensure adherence to the new policy. Some fields have very well established standards and practices around data, while others are still evolving, and we would like to work with any field that is developing data standards. We are aiming to ensure transparency about data availability.”

She then goes on to demonstrate with an example question and answer. If you have further questions you can post comments on her post, or contact PLOS by email and via all the usual channels.

Access and Accessibility for the London Mathematical Society Journals

March 03: Fascinating article by Susan Hezlet in the March issue of ‘Notices of the American Mathematical Society’ about whether the presence of a preprint version on the arXiv has an effect on the usage of the final published version, and what the LMS is thinking about open access. The key conclusion is that there is essentially no difference between their usage figures for papers in and not in the arXiv, but they still fear that their revenues will be undermined by the UK Government’s open access policies – “…it seems the danger does not lie in the arXiv version…I believe there would be a threat to the subscription base if we were required to deposit the final published version and not just the author’s accepted manuscript. I should be clear that no one is asking this…”. It is also curious to see that even in a mathematics journal they don’t provide confidence limits on their data (e.g. Fig 1).

Opening Science

Some rights reserved by Martin Clavey. Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0)


March 02: A new book on open science released in January under a CC BY-NC licence, with the latest version just out (see menu in the top left). Each chapter has a different author and can be downloaded as a separate pdf (with its own DOI). The chapters span a wide range of topics covering the basics, such as what ‘open science’ means by Benedikt Fecher and Sascha Friesike, and another on why impact factors and publication pressure reduce the quality of scientific publications by economist Mathias Binswanger, as well as a host of chapters on the tools and the vision for the future. It’s a Euro-centric compilation that also includes chapters by many you’ll recognise, including Pete Binfield, John Willinsky and Martin Fenner, who told me that they hope to keep the work updated and will consider adding chapters in the future as needed. Be sure to note that you can take all the code for this from Github (‘Fork me on Github’ – top right) and remix it as you see fit (although you can’t then sell it). A great resource.

Small firms lack resources to make most of open access

Feb 27: In Times Higher, Paul Jump discusses Elsevier’s contention that small firms lack the resources to make the most of open access and that just providing access to the literature isn’t going to lead to the sort of economic innovation that David Willetts (the UK Minister for Universities and Science) thought it would. According to David Mullen, Elsevier’s regional sales director of corporate markets in Europe, the Middle East and Asia, when Elsevier provided access to their journals to a dozen small firms in the Netherlands, it had little impact. This directly contradicts a study of small and medium enterprise firms in Denmark, however, by John Houghton, Alma Swan and Sheridan Brown. Houghton et al. showed that it would have taken an average of 2.2 years longer to develop or introduce new products or processes in the absence of contributing academic research, at a cost of around DKK 210,000 per firm in lost savings (DKK 10 million per year in total across the sample) – see page 47 of the pdf. Elsevier also fail to note that a 2007 study across the EU showed a very weak link between innovative enterprises and public research institutes and universities (see slide 13 and the rest of the talk given by Alma Swan at the UHMLG Spring Forum).

Mullen does have a point though – open access to literature is not enough; there also needs to be the social, technological and policy infrastructure in place to ensure seamless searching and filtering between different platforms regardless of the content provider. But I don’t think that Mullen’s solution is the one to adopt. He concludes that the key would be “to provide companies with Elsevier’s entire set of tools for identifying useful research among its journals at an affordable price to help them quickly find the information they needed”.

Update (Mar 6): John Houghton responded to the article in Times Higher: “the proposed ‘tools’, in the form of paywall-controlled proprietary access silos, are the problem, not the solution”.

Publishers withdraw more than 120 gibberish papers


Credit: L0031339 – Wellcome Library, London Nonsense talked by a cobbler compared to the talk of a parson and a surgeon-apothecary. Coloured etching attributed to C. Williams, ca. 1812. (CC BY)

Feb 24: Richard van Noorden (Nature) covers the story that two publishers – IEEE and Springer – have published computer-generated papers and were selling them as part of their subscription services. Springer swiftly released a statement on 27th Feb stating that they were removing rather than retracting the papers ‘since they are all nonsense’. The news was covered both by the science press (not just Nature but also Retraction Watch) and more general media, such as Slate magazine, Fox News, the Wire, and the Telegraph. Most of the posts linked the generation of fake papers with the pressure to publish (and the problems of the impact factor) while raising questions about whether the rash of fake papers was indicative of ‘slipping standards among scientists’ or the fact that the salaries of some professors are linked to the number of papers they publish.

Achilleas Kostoulas lays bare many of the underlying problems in his aptly titled post Fake Papers are not the real problem in Science, where he discusses the long history of hoaxes and retractions in science, drawing on Curt Rice’s article in the Guardian about why you can’t trust research. Although papers published as conference proceedings are often not subject to the same rigour of peer-review as articles submitted to journals, there is no doubt that this mess, like others before it (e.g. the OA sting by John Bohannon or the ‘Arsenic Life’ paper in Science), is a larger symptom of a system of peer-review and research evaluation that is increasingly failing. It is not, however, an indictment of the rigour of peer-review for subscription services. Regardless of the type of publisher – OA or subscription – there is an urgent need to rethink how research itself is evaluated both before and after publication. One question that remains unanswered is why these computer-generated papers were submitted in the first place. One possibility is that they come from scientists wanting to boost their publication records, although as Richard van Noorden notes (main link), some of the authors were unaware of the submissions. An alternative is that conference organisers might be trying to boost their profile, although there is no direct evidence of this.

How Academia and Publishing are Destroying Scientific Innovation: A Conversation with Sydney Brenner

C0009284. A traditional glass lightbulb with a metal filament against a glowing yellow background. Photograph 1/9/2001. Credit: Wellcome Library, London. Collection: Wellcome Images. Copyrighted work available under CC BY 2.0.

Feb 24: In a somewhat related article on the same day as the Nature story above, Elizabeth Dzeng discusses the state of science evaluation with Nobel prize winner Sydney Brenner. Her interview starts out with a fascinating insight into the emerging field of genetics and molecular biology, when individuals at the Laboratory of Molecular Biology in Cambridge, such as Brenner, Fred Sanger and Francis Crick, were seen as extremists – part of some kind of evangelical sect. But Brenner goes on to note that the culture of innovation that facilitated their discoveries no longer exists, because it has been replaced by a new culture in [US] science that relies on ‘the slavery of graduate students’ and the ‘post-doc as an indentured labourer’. Peer review is hindering science, he says, and has become ‘completely corrupt’ – “it’s not publish or perish, it’s publish in the okay places [or perish]” – while a system of publishing in which the author hands over copyright to publishers perpetuates this. He concludes that the open access movement is beginning to shift the culture back, and that even journals like Cell, Nature and Science will have to bow in the end.

ALM community site launched

Feb 24: Jennifer Lin and Martin Fenner have launched a community site, including a blog, which aims to aggregate information about ALMs from different sites and to showcase example ALM visualizations “done with d3.js and R, with source code and data openly available to make it easier for people to get started using ALM data.” A great one is the top ten most-cited articles on Wikipedia (measured as the number of pages citing a particular article), which lists a paper from Science about life on Mars at the top and also features two PLOS ONE articles (e.g. one about a new species of river dolphin). Notice the source code on the main page for the Wikipedia citations – you can lift it and use it to host the metrics on your own site (though you will need an appropriate API key to make it work).

Update (March 07) from Martin Fenner: The Wikipedia example is based on roughly 320,000 articles from January/February 2014 loaded by CrossRef Labs. So it indicates what recently published papers are popular, not all Wikipedia content (yet!)
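The ranking behind examples like the Wikipedia top ten boils down to sorting articles by a per-source event count. A minimal sketch of that idea in Python is below; the record layout and DOIs are illustrative only, and do not reproduce the exact JSON schema the ALM API returns.

```python
# Sketch: ranking articles by Wikipedia citation count from ALM-style records.
# The "sources" structure here is a simplified stand-in for real ALM data.

def top_wikipedia_cited(records, n=3):
    """Return (doi, count) pairs sorted by Wikipedia citation count, descending."""
    counts = []
    for rec in records:
        wiki = next((s for s in rec["sources"] if s["name"] == "wikipedia"), None)
        counts.append((rec["doi"], wiki["count"] if wiki else 0))
    counts.sort(key=lambda pair: pair[1], reverse=True)
    return counts[:n]

sample = [
    {"doi": "10.1371/journal.pone.0000001",
     "sources": [{"name": "wikipedia", "count": 12}, {"name": "twitter", "count": 40}]},
    {"doi": "10.1371/journal.pone.0000002",
     "sources": [{"name": "twitter", "count": 5}]},
    {"doi": "10.1371/journal.pbio.0000003",
     "sources": [{"name": "wikipedia", "count": 30}]},
]

print(top_wikipedia_cited(sample, n=2))
# → [('10.1371/journal.pbio.0000003', 30), ('10.1371/journal.pone.0000001', 12)]
```

The real visualizations on the community site do the same ranking over live API data, then hand the result to d3.js for display.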

Collaborate, co-operate, communicate!

Photo by Krista Baltroka (CC BY)

Feb 20: Over at the Scholarly Kitchen, Alice Meadows provides a refreshing post about the need for more collaboration and communication between open access advocates and those from the more traditional wing of the publishing industry, with the aim of celebrating success regardless of where it comes from. This is partly in response to the less than positive acclaim for initiatives like ‘Access to Research’ (e.g. by Cameron). Much of the discussion has been about the tone adopted by one side or the other, and it is worth distinguishing between tone and a genuine difference in substance. As Cameron notes in his more recent response, “Discussion is always more useful than shouting matches. And sometimes that discussion will be robust, and sometimes people will get angry. It’s always worth trying to understand why someone has a strong response. Of course a strong response will always be better received if it focuses on issues. And that goes regardless of which side of any particular fence we might be standing on.”

Cost of hybrid

From Theo Andrew (2012), “Gold Open Access: Counting the Costs”, Ariadne, 3 December 2012; the figure maps the cost of article processing charges against journal impact factor. Published under the Creative Commons Attribution 3.0 Unported (CC BY 3.0) licence.

Feb 20: Danny Kingsley discusses the cost of publishing OA in a hybrid journal at the Australian Open Access Support Group blog, including this figure comparing the cost of publishing in a hybrid journal with that of a pure OA journal with an APC. The APCs of hybrid journals are higher, regardless of the impact factor of the journal (used here as a surrogate for journal quality). She also discusses other research showing that most hybrid journals charge about $3,000 to make an article OA. She doesn’t discuss why $3,000 is a magic figure, but I believe it was a legacy-based estimate that resulted from a calculation by a publisher several years ago. Note that the AOASG also has a page about how different publishers are dealing with accusations of double dipping.

March 5: In a related post, Wouter Gerritsma discusses how much the Netherlands is currently spending on article processing charges, based on the full list of publications with a Dutch affiliation. It’s worth following the calculation in detail, because many more of these estimations will be made over the next few years as the debate on implementing Open Access intensifies. In particular, he concludes that a move to Open Access for all Dutch publications at the current average of €1087 per article would cost €10.5M more than what the Netherlands pays in subscriptions. However, a similar study in the UK by Alma Swan and John Houghton showed that even the most research-intensive UK institutions would save money in an Open Access world once the costs of co-authored papers are evenly distributed. Gerritsma notes that 50% of Dutch papers involve international collaboration, suggesting the potential for a saving of up to €10M. Getting the details of these calculations right matters. Look back here next week for a discussion of how these calculations can vary.
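A back-of-the-envelope version of this kind of calculation can be sketched in a few lines. Only the €1,087 average APC and the 50% collaboration share come from the post; the article count and subscription spend below are assumed round numbers, chosen for illustration so the totals land near the €10.5M and €10M figures quoted above.

```python
# Back-of-the-envelope sketch of the Dutch APC calculation discussed above.
avg_apc = 1087                    # average APC per article, EUR (from the post)
n_articles = 40_000               # ASSUMED annual Dutch output, for illustration
subscription_spend = 33_000_000   # ASSUMED current subscription bill, EUR

full_oa_cost = avg_apc * n_articles
extra_cost = full_oa_cost - subscription_spend

# If 50% of papers have international co-authors, splitting those APCs
# evenly with collaborators would roughly halve the cost of that half:
shared_saving = 0.5 * 0.5 * full_oa_cost

print(f"Full OA cost: €{full_oa_cost:,}")          # €43,480,000
print(f"Extra over subscriptions: €{extra_cost:,}")  # €10,480,000
print(f"Potential cost-sharing saving: €{shared_saving:,.0f}")  # €10,870,000
```

The point is less the exact numbers than how sensitive the bottom line is to each assumption: small changes in the average APC or the co-authorship split move the result by millions.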

Why open access should be a key issue for university leaders

Feb 18: Martin Hall (chair of Jisc and vice-chancellor of the University of Salford) makes an impassioned plea in the Guardian for universities to be far more proactive about ‘openness’ – “the extent to which those working and studying within the university and college system can get access to any digitally-based information they need without encountering a virtual gateway: a password, subscription requirement or payment.” “Without openness across global digital networks,” he adds, “it is doubtful that large and complex problems in areas such as economics, climate change and health can be solved.” The tagline for the Guardian piece states it’s time senior leaders made openness – and its consequences – their concern. But everyone can be a leader in this regard; individual researchers, readers and publishers all have the power to influence those in positions to take action.


Imagine – What is the far future of research communications?


This coming Saturday I will be at the annual Science Online meeting running a session called Imagine: Future of Scholarly Communications in 10-20 Years. This will be the first time I’ve been to Science Online for a few years. What motivated my original pitch for the session was my memories of the discussions at those earlier meetings that I did attend.

Some of these sessions were heated debates on whether Open Access could work. Sometimes there were just crazy ideas like publishing a single figure. Almost always these discussions started with the, usually valid, assumption that we were a fringe element advocating radical change. Fast forward a few years and making research outputs publicly accessible is the mainstream policy position for most funders. That crazy idea of publishing just single figures turned into a startup doing deals with big publishers. The young radicals have turned into entrepreneurs, community leaders, industrial researchers and tenured professors.

So for me this is a chance to reflect, to look back at what has changed, but more importantly to look forward. What does the trajectory of change tell us? What technologies are developing? And perhaps most crucially, what are the aspects of our current system that we understand to be core to its value? And how has that changed over the last few years?

The session is an open-ended discussion, and comes at the end of the meeting, so attendees will have absorbed ideas about the technical and social changes that are happening today and debated vigorously what matters about the principles of how and why we communicate research findings. This is our chance to take that and use it to debate what the far future could look like.

From the session outline:

In the early days and incarnations of Science Online we talked a lot about a future for research communication which was not just on the web, but of the web. Looking back now, many of the changes we predicted (or wished for!) have happened, or at least are happening. From our perspective of 2014, with Open Access a reality, dynamic publications appearing, and experiments in pre- and post-publication peer review gathering pace, what can we see if we look not just a few years down the road but far out into the future? What might change? What will probably not change? And how can we extrapolate from the trends we see today into the far future?

What do you think the far future of scholarly communications will look like? What can change? What should change? And what should not change? If you’re at Science Online I hope to see you there, and if not, feel free to leave your comments here.


The Opens Roundup


Photo by Benimoto via Flickr CC BY

This is the first of a regular (weekly or bi-weekly) feature of this new blog, called The Opens Roundup. In it, we will highlight initiatives and announcements relating to any aspect of the ‘opens’ that have happened in the preceding weeks. Occasionally, we’ll also offer previews of upcoming public events where PLOS people, including members of the Advocacy team, will be participating.

PLOS People at Upcoming Events:

And now for the latest Open Access Headlines and News Summaries:

This roundup has been running as an internal newsletter for PLOS staff since March last year, and its remit will stay the same – to flag some of the recent events that are of potential interest to those working in open access and open science. We do not aspire to be comprehensive (for that you should subscribe to the Open Access Tracking Project, run on TagTeam) or to be neutral – it will no doubt be heavily PLOS-centric – but we do aspire to make our opinions evidence-based. The roundup is structured with ‘Policy Developments’ at the top and ‘In Other News’ below, with the main links running chronologically in each section. If you wish to send us ideas for links then email them to us, but please only include posts and links that are new. Please note that we may not be able to include them all. Welcome to the roundup.


Austria: FWF supports funds for OA books

Feb 07: The Austrian funding agency FWF has a progressive Open Access policy that requires their grantees to publish with a CC BY or CC BY-NC licence. This is their latest guideline for books – applying the same criteria and announcing that authors can use funds from their grants to support the costs. Interestingly, the guidelines stipulate how books ought to be peer-reviewed and how authors should choose a publisher (pointing to the same criteria that OASPA uses for membership). In a related OA move, FWF has also announced a new pilot project with the Institute of Physics to provide funds to make articles in their subscription journals Open Access. And the Institute of Science and Technology Austria has also just signed the Berlin Declaration and launched its own Open Access policy in line with FWF’s (and the European Research Council’s) recommendations.

Australia: UQ adopts open access policy for research

07 Feb: The University of Queensland has announced an OA policy to ensure their articles are freely available as soon as possible, and no later than 12 months after publication. They also encourage authors to retain copyright and to use a Creative Commons licence (though without specifying which). They state that their policy is based on the ARC Open Access Policy and the NHMRC Dissemination of Research Findings. It’s a start.

UK: Willetts calls for publisher offsetting to encourage open access

31 Jan: In an open letter to Dame Janet Finch about progress towards the Finch recommendations on Gold OA in the UK, David Willetts (UK Minister for Universities and Science) calls on publishers to ensure that the transition ‘involves a “meaningful proportion of an institution’s total [article charges] with a publisher” being “offset against total subscription payments with that publisher” on a sliding scale up to a set limit.’ He also warns against the bundling of articles and journals by publishers and urges scholarly societies to seek help in developing new business models. The Times Higher post (main link) provides a good summary. His letter also discusses OA developments in 2013, including the strong endorsement by the Netherlands (see below) and the need for international collaboration by funders and governments.

EU launches FOSTER Factsheet

28 Jan: FOSTER (Facilitate Open Science Training for European Research) is an EU-funded project run by Eloy Rodrigues (University of Minho) that aims to provide educational resources to help researchers comply with the EU’s Horizon 2020 guidelines on Open Access. They will be creating a portal to support the initiative and provide content that can be reused for the training programme. They are actively calling for individuals and organisations to help supply content or even host a course about open access.


Why open access should be a key issue for university leaders

18 Feb: Nice article in the Guardian by Martin Hall (chair of Jisc and vice-chancellor of the University of Salford): “For the future of research, though, the need for openness is far more than a convenience. It arises because the volume and rate of production of online publications and digital data sets has now outgrown the limits of conventional research methods and is changing the ways in which new knowledge is created. Without openness across global digital networks, it is doubtful that large and complex problems in areas such as economics, climate change and health can be solved.”

World’s first scientific publisher launches new open access journal and so does AAAS

18th Feb: Two of the most prestigious scholarly societies in the world – the American Association for the Advancement of Science (the publisher of Science magazine) and the Royal Society of London – have each announced their intention to launch a new Open Access journal, called Science Advances and Open Science, respectively.

The AAAS announcement was also covered in Nature. In the Science article (main link), Jocelyn Kaiser notes that Science is now ‘joining the herd’ and that the intention is to capture the many presumably decent articles rejected by the existing stable of Science’s journals. This has now been confirmed in a more recent editorial in Science by the Editor-in-Chief Marcia McNutt and by the CEO of AAAS, Alan Leshner. Science Advances aims to be publishing a few thousand papers annually within a few years, at a price somewhere between $1200 and $5000. This is a very welcome addition. As Heather Joseph (SPARC) and Peter Suber (Harvard) note, AAAS has been slow in joining the party, but their involvement signals another landmark in the progress towards OA and a means to remove the access barriers from more research. However, both the price they will charge and the licence they will offer are still to be determined – so despite getting onto the bandwagon, it remains unclear how genuinely open they intend to be…

The Royal Society’s new journal – Open Science – will be launched later this year and is their second OA journal, the other being Open Biology. Like Science Advances, Open Science will be interdisciplinary, accept papers rejected from their other journals, and charge an APC in return for providing access. But in what looks like a more progressive move than Science’s, it seems likely their new journal will use a CC BY licence (though I haven’t seen this confirmed), given that Open Biology does as well. They also state they will not use subjective criteria to assess papers but will publish all sound science (so more like PLOS ONE), and will be advocating both open data and open peer review. It sounds like a great initiative. The story was also covered by grrlscientist in the Guardian.

And there have also been three other very welcome recent announcements by societies about switching their existing titles to OA, or at least providing more OA options within them. Note this is very different from the more numerous launches of new OA journals by societies, either independently, as with AAAS and the Royal Society above, or in partnership with commercial publishers, such as the recent announcement by Wiley and the American Geophysical Union. Societies have been a lot more wary about converting their existing titles and potentially reducing existing revenue streams. First up, on 27th Jan, was the Society for Solid-State and Electrochemical Science and Technology (ECS), closely followed by the Paleontological Society, and then in February the American Anthropological Association announced the conversion of their journal Cultural Anthropology. Lawrence Biemiller discusses in the Chronicle of Higher Ed how the AAA hope to use their journal to show that OA can provide a viable business model in the humanities, while Mike Taylor looks at the pros and cons of the Paleo Soc’s foray into OA (i.e. CC BY-NC on offer as well as CC BY, the highish APC, and an embargo if green).

Elsevier’s new text mining initiative is a step sideways

15 Feb: Egon Willighagen discusses Elsevier’s announcement that the content on their platform ScienceDirect is now available for text and data mining via a proprietary API. He discusses how it is potentially a positive step, in that it will enable their content to be mined, but also a huge missed opportunity, because they have imposed a non-commercial restriction on the mining. Researchers are likely to have the fewest problems, he says, but the restriction will significantly hinder the activities of small and medium enterprises throughout Europe (exactly the sort of economic innovation, by the way, that governments wanted to promote via their open access policies…).

In fact, it will also be a problem for researchers, not least because institutions will have to decide what counts as commercial re-use when their researchers start mining. Peter Murray-Rust lays out the problems in a more detailed appraisal of Elsevier’s terms and conditions (Why researchers and libraries should think very carefully and then not sign (1), Can they force us to copyright data? (2), and The small print absolutely prevents responsible science). He also raises the issue of the role of legacy publishers more generally, after Richard van Noorden posted a largely positive article about Elsevier’s announcement in Nature. Peter concludes that “Nature has a vested interest in seeing this happen. For whatever reasons it supports the STM publishers in their intention to offer licences for content mining. Note that this is not the result of a negotiation – it is a unilateral move by the publishers.”

And as Ross Mounce also noted at the end of the Nature article “Our plan is just to wait for the copyright exemption to come into law in the United Kingdom so we can do our own content-mining our own way, on our own platform, with our own tools” … “Our project plans to mine Elsevier’s content, but we neither want nor need the restricted service they are announcing here.”

CC BY dominates among the Creative Commons-licensed journals in the Directory of Open Access Journals (DOAJ)

12 Feb: Christian Heise, who is doing a PhD on open science, has looked at the distribution of Creative Commons licences used by journals listed in the DOAJ. “The good news is that 3,772 of these Journals (almost 38 %) use a Creative Commons license. The bad news: most of the publications listed in the DOAJ are still not ‘Open’.” Only 20% use a CC BY licence (and 0.5% use CC BY-SA), with the remainder restricting either commercial reuse or any derivative reuse (NC and ND respectively). Rupert Gatti (founder of Open Book Publishers) poses a good question in the comments – if 62% aren’t using a CC licence, what are they using? Unfortunately the DOAJ doesn’t collect information on other licences. It would also be interesting to look at the number of articles that use CC BY. Because CC BY is used exclusively by many pure open access publishers and journals, the percentage of CC BY content among articles is likely to be much higher than among journals.
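As a quick sanity check, the journal counts implied by these percentages can be recovered in a few lines; the percentages are from the post, and the derived totals are necessarily approximate.

```python
# Implied journal counts behind the DOAJ licence percentages quoted above.
cc_journals = 3772          # journals with any CC licence (from the post)
cc_share = 0.38             # "almost 38%" of listed journals

total = round(cc_journals / cc_share)   # implied DOAJ size, roughly 9,900 journals
cc_by = round(0.20 * total)             # ~20% of journals use CC BY
non_cc = round(0.62 * total)            # the 62% with no CC licence recorded

print(total, cc_by, non_cc)
```

So on these figures only around 2,000 of nearly 10,000 listed journals meet the fully open CC BY standard.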

Instructions from Mike Taylor: Stop what you’re doing and go read Cameron Neylon’s blog

09 Feb: On his personal blog, Science in the Open, Cameron highlights the absurdity of a new initiative by traditional publishers that has been endorsed by the UK Government. As part of the Finch negotiations around Open Access, several major publishers, including Elsevier, Wiley-Blackwell, Springer, Taylor & Francis and Nature Publishing Group, proposed to set up a scheme to let public libraries (the kind you walk into) host their content so that members of the public could access it for free. At the time, Mike Taylor and others pointed out the limitations of the proposal, but last week ‘Access to Research’ was launched with a flurry by David Willetts at Lewisham public library (in London), with additional coverage at Times Higher and on the BBC. In a five-minute recording, Willetts acclaims the initiative as a really imaginative offering. So what is on offer? On their about page you find out that 1.5 million academic articles are available, but not every article in every journal (no reason given). And if you look at the terms and conditions, you can’t copy, distribute, forward or store any of the content.

With the use of a wet Saturday afternoon and some judicious text mining of the locations of the public libraries in the scheme, Cameron simultaneously provides a map of access (currently limited to not much more than Kent) and a demonstration of why such a scheme actually reflects a woeful lack of imagination. “What I have done here is Text Mining. Something these publishers claim to support, but only under their conditions and licenses. Conditions that make it effectively impossible to do anything useful. However I’ve done this without permission, without registration, and without getting a specific license to do so. All of this would be impossible if this were research that I had accessed through the scheme or if I had agreed to the conditions that legacy publishers would like to lay down for us to carry out Content Mining.” As Mike Taylor points out in the comments below Cameron’s post, the only legitimate way of saving any of the research in the scheme is to copy it out by hand.

For more reaction and a look at the PR tussle around the project, there is a Storify by IOP Publishing, with tweets from Elsevier and Wiley promoting it and others taking a hard look at what is really on offer (see also the announcement from Elsevier above about their new text-mining initiative…). And of course, no-one has mentioned that the UK Government is also in the process of closing down many walk-in libraries.

Should all correction notices be open access?

04 Feb: Retraction Watch is conducting a poll on whether corrections to an article should be open access, after blogger See Arr Oh hit the American Chemical Society’s paywall when he tried to access one. Yup, $35 to access someone’s mistake. As they note, COPE recommends that all retraction notices be open access, but doesn’t have a stance on whether corrections should be. At the time of writing, 82.81% say yes. Take the poll…

EDP Sciences announces the launch of EDP Open

04 Feb: This is interesting because it also reflects a growing shift to OA by society publishers. EDP Sciences, located in Paris and London, is a subsidiary of several learned societies. It publishes more than 55 journals across the sciences, in addition to magazines and conference proceedings. Some of their journals have converted to Open Access while others have hybrid options, but until now they haven’t had a coordinated OA service. EDP Open is a platform for all their Open Access articles regardless of journal, and a means to enable their closed journals to convert to OA. As Jean-Marc Quilbé, President of EDP Sciences, notes in the press release: “EDP Open provides access to more than 70,000 Open Access scientific articles from across all EDP Sciences’ journals and also hundreds of conference proceedings. EDP Open has also launched two new Gold Open Access journals.” It’s a great initiative, but with one downside – the articles are published under a range of different licences. Not all are CC BY; in some cases the publisher retains copyright, and in others the articles are merely free to read. This will be problematic for those wanting to reuse the content and confusing for those who are unsure what Open Access is – the site really needs the ‘HowOpenIsIt?’ guide on its front page (or at least on its licence page).

Johnson & Johnson Will Share Clinical Trial Data

30 Jan: Another major victory for the ALLTrials campaign, this time discussed by Larry Husten on Forbes. See also the comment from Ben Goldacre at the end: “We are seeing the beginning of the end of a dark era in medicine. Several individual companies are breaking ranks, showing leadership, and doing what PhRMA said was impossible just a year ago.”

Masters in Publishing: An Interactive online discussion with Elizabeth Marincola

28 Jan: For those who missed it on the 23rd, Copyright Clearance Center and ALPSP have posted a recording of the online session with Elizabeth (PLOS CEO) in discussion with Crispin Taylor, Executive Director of the American Society of Plant Biologists.

Open Problems in Mathematics, a new open access journal

27 Jan: Post about a new mathematics journal with a twist: “It’s a very low-stakes journal, presumably because the editors want to encourage people to actually write things for it, and nothing published there should be new work, so it won’t be controversial. Thus, the barrier to entry is low: in order for a paper to be published, the only requirements are that submissions can be at most four pages long, and they must be ‘sponsored’ by a mathematician who’s an expert in the field being discussed. For the sake of a definition, they’ve said a sponsor must be someone who’s had their work cited more than 100 times.” One needs to remember that four pages can be long in maths – even their PhD theses can be surprisingly short.

Taylor & Francis extends green Open Access zero-embargo pilot scheme for Library & Information Science authors until end of 2014

24 Jan: In one of the first pieces of real evidence about the impact of embargoes from a commercial publisher, T&F (under the auspices of their Routledge imprint) have decided to extend their two-year pilot scheme letting authors deposit articles in repositories immediately. Why? Because it is good for business. The scheme only applies to authors in their Library & Information Science journals, but the results are exactly the opposite of the doom-laden predictions normally perpetuated about embargoes. In this case: “The implementation of the author rights pilot saw the number of respondents who would recommend Routledge as a publishing outlet increase by 34% while the average willingness to publish with Routledge on a scale of 1 to 10 increased from 6.6 to 8.3.” Isn’t evidence refreshing?

And a new blog and video…

February 13: And not to forget, we released a video on the Impact of Open Access from the Accelerating Science Awards Program in the first post on this blog!


Welcome to PLOS Opens…


Welcome to a new blog from PLOS focussed on how scholarly communications is changing, and how it should be changing. The big announcements will still be on the official PLOS blog, but here we will be regularly covering policy, evidence, and opinion on how our world is changing.

Open Access will be at the center of what we discuss, but we deliberately chose not to have ‘Open Access’ in the name. The successful implementation of full Open Access is a necessary, but not sufficient, condition for realizing the potential that the web brings to research communication.

We need proper (and appropriate!) sharing of research data and materials, we need to more effectively share our methods and processes, we need continuous review systems that provide the quality assurance that makes these useful. We need new infrastructures to make this possible, improved funding instruments and incentives that support an effective research community, and new means of monitoring and tracking research and how and where it is used.

So this blog will be about the “Opens”: open source, open data, open standards, open review and more. “Opens” as a noun if you like. But it will also be about “opens” as a verb. A discussion of what needs to be done to take advantage of the potential of the web. At the center of this will be Open Access as the critical step we are now negotiating. News, views and critical analysis alongside guest posts from the wider community. But always with an eye to the future; a view of how the world could be if we choose to make it.

William Gibson said, “The future is already here – it’s just not very evenly distributed.” Like all the best clichés, there is a nugget of real truth here: an admonition to open our eyes, to truly look for and see the way the world is changing around us, rather than to filter everything through our existing world view. What we aim to do here is to tell the stories, critically analyse the evidence, and use these to suggest a path forwards.

And what better way to kick this off than with this video of Open Access advocates, funders, publishers, and above all those using the research to advance understanding, health and education in the wider world.

Accelerating Impact: View exceptional real-world applications of Open Access research. The video features six teams of scientists whose innovative reuse of existing research enabled important advances in medical treatment and detection, ecology and science education. These examples demonstrate how the reuse of Open Access research can accelerate scientific progress and benefit society as a whole. Includes comments from Open Access advocates from publishing, academia and industry, and features finalists, winners and sponsors from the Accelerating Science Awards Program (ASAP).
