PLOS Response to the HEFCE RFI on Metrics in Research Assessment

The Higher Education Funding Council for England, the body that manages the UK's Research Excellence Framework, recently announced an enquiry into the use of metrics in research assessment. HEFCE's views on research assessment matter a great deal to UK universities because the REF distributes a substantial proportion of the UK's research funding as block grants on the basis of that assessment. As part of this process the enquiry committee issued a call for evidence. The covering letter and summary of the PLOS submission are provided below; you can find the full PLOS RFI response at Figshare.

Dear Committee Members

Thank you for the opportunity to respond to your call for evidence. PLOS has been at the forefront of experimenting with and advocating for new modes of research assessment for a decade. Recent developments such as DORA and your own enquiry suggest that the time is appropriate for a substantial consideration of our approaches and tools for research assessment.

Our ability to track the use of research through online interactions has increased at an unprecedented rate, providing new forms of data that might be used to inform resource allocation. At the same time the research community has profound misgivings about the "metrication" of research evaluation, as demonstrated by submissions to your enquiry from, for example, David Colquhoun and from Meera Sabaratnam and Paul Kirby (although see also a response by Steve Fuller). These disparate strands do not, however, need to be in direct opposition.

As a research community we are experienced in working with imperfect and limited evidence. Neither extreme uncritical adoption of data nor wholesale rejection of potentially useful evidence should be countenanced. Rather, we should use all the critical faculties that we bring to research itself to gather and critique evidence that is relevant to the question at hand. We would argue that determining the usefulness of any given indicator or proxy, whether qualitative or quantitative, depends on the question or decision at hand.

In establishing the value of any given indicator or proxy for assisting in answering a specific question, we should therefore bring a critical scholarly perspective to the quality of the data and the appropriateness of any analysis framework or model, as well as to how the question is framed. Such considerations may draw on approaches from the quantitative sciences, the social sciences or the humanities or, ideally, a combination of all of them. And in doing so they must adhere to scholarly standards of transparency and data availability.

In summary, therefore, we will argue in answers to the questions you pose that there are many new (and old) sources of data that will be valuable in providing quantitative and qualitative evidence to support evaluative and resource allocation decisions associated with research assessment. The application of this data and its analysis to date has been both naive and limited by issues of access to underlying data and proprietary control. Enabling a rich critical analysis requires that we work to ensure that data is openly available, that its analysis is transparent and reproducible, and that its production and use are subject to full scholarly critique.

Yours truly,
Cameron Neylon
Advocacy Director
PLOS

Summary of Submission

  1. The increasing availability of data on the use and impact of research outputs as a result of the movement of scholarship online offers an unprecedented opportunity to support evidence-based decision-making in research resource allocation decisions.
  2. The use of quantitative or metrics-based assessment across the whole research enterprise (e.g. in a future REF) is premature, because our access to data, our understanding of its quality and the tools for its analysis are all limited. In addition, it is unclear whether any unique quality of research influence or impact is sufficiently general to be measured.
  3. To support the improvement of data quality, sophisticated and appropriate analysis and scholarly critique of the analysis and application of data, it is crucial that the underlying usage data used to support decision making be open.
  4. To gain acceptance of the use of this evidence in resource allocation decisions, it is crucial that the various stakeholder communities be engaged in a discussion of the quality, analysis and application of such data. Such a discussion must be underpinned by transparent approaches and systems that support the community engagement that will lead to trust.
  5. HEFCE should take a global leadership position in supporting the creation of a future data and analysis environment in which a wide range of indicators, acting as proxies for many diverse forms of research impact (in its broadest sense), are openly available for community analysis, use and critique. HEFCE is well placed, alongside other key stakeholders, to support pilots and community development towards trusted community observatories of the research enterprise.

From Open Buttons to OpenCon – Building a student community

This is a guest post from Joe McArthur, one of the founders of the OA Button project and newly appointed Assistant Director of the Right to Research Coalition.

Seven months ago, after little sleep, I boarded a plane to Berlin to attend a conference and launch a project I'd been working on tirelessly for five months. That project was the Open Access Button, a browser plug-in which visualises when paywalls stop people reading research. Since the launch, which was covered in the Guardian and Scientific American and caught the attention of EU science ministers, the project has continued to progress. As a co-founder I'd normally now go on to talk all about it. Today is different though: I'm going to briefly tell the story of the conferences which launched, grew and gave birth to the Button, and why we, as a community, should support a new one, OpenCon 2014, which will do the same for many other ideas.

The Berlin 11 Satellite Conference for Students and Early Stage Researchers, which brought together more than 70 participants from 35 countries (and was webcast to many more around the world) to engage on Open Access, was the stage for the Button's launch. We launched the Button on stage with a timed social media push (a Thunderclap) which reached over 800,000 people. Without this platform we'd never have been able to obtain that level of publicity or move the project forward at the pace we have since.

The story of instrumental conferences goes back further though. Months before our launch we met with organisational leaders from across the globe at the Right to Research Coalition general assembly. This was the first time we were truly able to talk about the Button with our peers. We sought feedback, buy-in and help moving the project forwards – all of which we got in spades. An afternoon training session used the Button as a case study, and the ideas from student leaders fed into what we did next.

The final conference worth highlighting is the one where it all began. While attending a conference of the International Federation of Medical Students, my co-founder (David Carroll) and I got talking to Nick Shockey, Director of the Right to Research Coalition. Prior to that conversation, David and I knew no alternative to the system of publishing that frustrated us both. After it, well, the Open Access Button was born.

These three events provided us with a launch venue and a place to develop our ideas, raised our awareness and inspired us to act. In between each were hundreds of hours of work, but these were each transformative points in our journey. We're not alone in this experience, though; at each event we were just one of many projects doing the same. I'm now working, along with a student team from across the globe, to create a conference which will do this for many others.

OpenCon 2014 is a unique student and early career researcher conference on Open Access, Open Education and Open Data. On November 15-17 in Washington, DC, the event will bring together attendees from across the world to learn, develop critical skills, and return home ready to catalyze action toward a more open system for sharing the world's information — from scholarly research, to educational materials, to digital data.

OpenCon 2014's three-day program will begin with two days of conference-style keynotes, panels and interactive workshops from leaders in the Open Access, Open Education and Open Data movements and from participants who have led successful projects. The final day will be a half-day of advocacy training followed by the opportunity for in-person meetings with relevant policymakers, ranging from members of the U.S. Congress to representatives from national embassies and NGOs. Participants will arrive with plans of action or projects they'd like to take forwards and leave with a deeper understanding of the conference's three issue areas, stronger skills in organizing projects, and connections with policymakers and prominent leaders.

Plans this ambitious, though, come with a price tag. To help support the travel of students from across the globe, feed them, provide them with the vital lifeblood of conferences (coffee) and put on the best conference possible, we need the support of the Open Access, Open Education and Open Data movements. There is a huge variety of sponsorship opportunities, each with its own unique benefits, which can be found here, but equally we appreciate the help of anyone in drawing attention to the event or, of course, attending.

Author:
Joe McArthur
Assistant Director at the Right to Research Coalition
Co-founder of the Open Access Button
Joe@righttoresearch.org
@mcarthur_joe

The content of guest posts is always the view of the authors and not the position of the PLOS Opens Blog or PLOS.


Opens Roundup (May)

To help navigate the content in this issue of the roundup, here’s an index of the topics covered with links to the items below:

POLICY DEVELOPMENTS:

AND IN OTHER NEWS…

With thanks to Adrian Aldcroft and Allison Hawxhurst for tips and links.

POLICY DEVELOPMENTS

US: House Committee amends FIRST Act to reduce embargo length

May 22: The US FIRST Act (which we discussed previously on PLOS OPENs) has been amended to reduce the embargo period for articles from 24 to 12 months. This is definitely an improvement over the draconian embargo periods the act initially stipulated (up to three years in some cases) but still falls short. SPARC (main link), PLOS and EFF all support the stronger open access language (e.g. around reuse) in the White House Directive and the bipartisan, bicameral Fair Access to Science and Technology Research (FASTR) Act. [Back]

 

UK: Connecting knowledge to power: the future of digital democracy in the UK

May 22: Wikimedia UK and Demos are encouraging participation in an attempt to crowdsource a submission to a call for evidence on digital democracy from the Speaker of the House of Commons. What is digital democracy, you might ask? See this article from Wired for context. [Back]

Mexico: Open access and repository legislation in Mexico

May 21: More landmark legislation (text in Spanish) in Latin America: it mandates that all research funded by the Mexican Government be deposited in Open Access repositories. This puts Mexico in line with the national mandates of Peru and Argentina. [Back]

Chinese agencies announce open-access policies

May 19: Two of the major Chinese research funders (the National Science Foundation and the National Science Library of the Chinese Academy of Sciences) have mandated that all their researchers deposit their papers into online repositories and make them publicly available within 12 months of publication. While this is largely repository-based legislation, funds will also be made available to grantees to cover Article Processing Charges to make articles immediately available. Richard van Noorden outlines the implications of this important legislation in Nature (main link). [Back]

Europe: European Research Council signs deal with Elsevier

May: Elsevier has agreed to make all the ERC-funded papers they publish immediately available in return for an Article Processing Charge (APC), paid for by the council. Unfortunately, however, the policy on Elsevier's website stipulates that articles will only be deposited in a repository (Europe PubMed Central) or made available for reuse under an attribution-only licence (CC BY) if requested by the author. If the authors don't make the requests, the articles will be archived within Elsevier's own portal, ScienceDirect, under more restrictive licences that prohibit some or all reuse. [Back]

Global: Major International Associations Underscore Their Support for Immediate Open Access to Research Articles

May 14:  LIBER (the Association of European Research Libraries), COAR (Confederation of Open Access Repositories) and others including, interestingly, the National Science Library of the Chinese Academy of Sciences have co-signed a statement that makes a commitment to reduce and eliminate embargo periods: “We consider the use of embargo periods as an acceptable transitional mechanism to help facilitate a wholesale shift towards Open Access. However, embargo periods dilute the benefits of open access policies and we believe that, if they are adopted, they should be no more than 6 months for the life and physical sciences, 12 months for social sciences and humanities.  We further believe that mechanisms for reducing – or eliminating – embargo periods should be included in any Open Access policy.” [Back]

AND IN OTHER NEWS…

His life is Open Access and Open Data: meet Mark Thorley of RCUK

June 02: Wiley’s Fiona Murphy interviews Mark Thorley from the UK’s Natural Environment Research Council (and chair of RCUK Research Outputs Network) about the shift to open access and the future of publishing. [Back]

USF Professor creates OA textbook for students

May 30: A University of South Florida Professor has created an open access textbook for the social sciences under the remit of the University’s textbook affordability project. [Back]

Dinosaurs go Open Access

May 30: Andy Farke, an Academic Editor for PLOS ONE, shows that 42% of new dinosaur species in 2013 were described in free-to-read journals and that almost half of these were fully Open Access in PLOS ONE. [Back]

Frederick Friend, 1941-2014

May 30: Very sad news about the death of Fred Friend, a librarian and leading proponent of Open Access. Fred provided a huge amount of support and encouragement to the UK wing of PLOS in 2003 (Mark Patterson and me) when PLOS first got off the ground, helping to navigate some of the policy landscape. For more on Fred’s background and views, see the 2013  in-depth interview with Richard Poynder. [Back]

Copyright Clearance Center Launches RightsLink for Open Access

May 28: The Copyright Clearance Center now offers a service to handle Article Processing Charges (APCs) for open access articles between publishers and authors at institutions. The service will be integrated with Aries Systems' Editorial Manager. Such third-party intermediaries are beginning to compete for the role of managing APCs. [Back]

Watchmen creator Alan Moore announces open-access indie comics app

May 28: A bit of an aside, but is OA publishing spreading to mainstream comics? (Open Access to the scholarship about comics isn't new – see The Comics Grid.) [Back]

EDP Open survey reveals Learned society attitudes towards Open Access

May 27: A depressing survey of 33 unnamed learned societies conducted by the publisher EDP about what societies think of open access – focusing only on what they might lose and the need to retain their existing revenue, rather than the new opportunities that Open Access offers their members and the benefits to science and the wider public of opening up research. But, as Peter Suber notes, the survey doesn't cite the Societies and Open Access Research (SOAR) project, which is cataloguing the many societies that are actively publishing new OA journals. "…SOAR identifies 868 societies publishing 827 full (non-hybrid) OA journals. It names the societies, names the journals, and provides links to facilitate confirmation. Read the EDP report to see what 33 society publishers said about OA in a survey. But consult the SOAR catalog to see how 868 society publishers have already embraced OA in practice." [Back]

The Dawn of Open Access to Phylogenetic Data

May 23: A preprint paper by Andrew F. Magee, Michael R. May and Brian R. Moore showing that ~60% of phylogenetic studies are effectively lost to science because the data are unavailable. These results support the conclusions of related studies (e.g. in PLOS Biology) but, interestingly, they also show that "data are more likely to be deposited in online archives and/or shared upon request when: (1) the publishing journal has a strong data-sharing policy; (2) the publishing journal has a higher impact factor, and; (3) the data are requested from faculty rather than students." [Back]

Institute of Physics launches ‘offsetting’ scheme to cut cost of open access

May 23: IOP Publishing has introduced a scheme whereby authors publishing in its hybrid journals can offset the cost of Article Processing Charges against their library's subscription cost for the journal. The scheme has a 'sliding scale' such that an increasing proportion of savings will be passed on to subscribers as OA uptake increases. The first initiative of its kind, it came about after discussion between IOP, Research Libraries UK (RLUK) and the UK's Russell Group of universities (the 24 leading research-intensive universities). [Back]

Royal Society Open Science now open

May 22: Over on the Guardian blog, Grrrlscientist plugs the new interdisciplinary open access journal launched by the Royal Society. This is not only another endorsement of open access by a significant learned society but also an endorsement of the PLOS ONE model it explicitly follows (e.g. ensuring that there is a place to publish negative results and that peer review is not based on subjective measures of importance or impact). [Back]

Growth of Fully OA Journals Using a CC-BY License

May 21: OASPA has released the number of articles published by some of its members that are fully CC BY and within fully OA journals. A total of 399,854 articles were published from 2000-2013, with 120,972 published in 2013 alone. The growth of hybrid publishing and the fact that some of the larger members (e.g. Wiley) did not release their data mean this is an underestimate. [Back]

Wellcome Trust – funding opportunities for (Open Access) researchers based outside of the UK

May 20: Two less well known facts about the Wellcome Trust, pointed out by Witold Kieńć at OpenScience, are 1) that they fund research not just in the sciences but also in the medical humanities and 2) that funds are available to researchers outside the UK (in North Africa, parts of Asia and Eastern Europe). But as with their UK grantees, all research outputs must be made Open Access either through a repository or via an OA journal. [Back]

The cost of scientific publishing: update and call for action

May 16: Stimulated by Tim Gowers' post (also covered by PLOS Opens), Tom Olijhoek from the Open Access Working Group summarises the flurry of activity and discussions around the cost of publishing on the open access list of the Open Knowledge Foundation. His aim is to encourage others to help fill out a Google spreadsheet tracking the cost of publishing across many different publishers. He covers fair use – or the lack of it – for educational purposes, the cost of subscription versus OA publishing, and Open Data. If you get hold of relevant data you can add it to the spreadsheet on Google Docs or report it on their wiki. [Back]

A new model for Open Access in the Humanities and Social Sciences?

May 15: Jean Harris from UCL flags a white paper published in April by Rebecca Kennison (who was PLOS employee No. 1 way back then!) and Lisa Norberg. Disclaimer: I haven't read the 69-page report in full. Quoting from the report, Harris notes that the model "encourages partnerships among scholarly societies, research libraries, and other institutional partners (e.g., collaborative e-archives and university presses) who share a common mission to support the creation and distribution of research and scholarship to improve society and to help solve the world's most challenging problems". Carl Straumsheim provides much more background on the report at Inside Higher Ed. [Back]

Global-level data sets may be more highly cited than most journal articles

May 15: Chris Belter writes on the LSE blog about his analysis (in PLOS ONE) of the impact of data sets curated by the US National Oceanographic Data Center, showing that the data sets are cited more often than most journal articles. "One data set in particular, the World Ocean Atlas and World Ocean Database, has been cited or referenced in over 8,500 journal articles since it was first released in 1982. To put that into perspective, this data set has a citation count over six times higher than any single journal article in oceanography from 1982 to the present." [Back]

The Embargoes Don’t Work: The British Academy provides the best evidence yet

May 14: For those that missed it, Cameron pulls apart the evidence from the British Academy report on Open Access to come to the opposite conclusion from the report itself. “To my mind the British Academy report shows pretty strongly that embargoes don’t help. They offer essentially no business benefit while reducing the focus on the one thing that matters – creating the value that should differentiate scholarly communications from just slapping something up on the web.” See also the response to the BA report on Open Access by HEFCE’s Ben Johnston (HEFCE commissioned the report): “the very idea of the open academy challenges the assumptions and motivations of some scholars, and open access is perhaps resisted so vociferously precisely because it is seen as disruptive to these. In my view, academics must move beyond this resistance: they have so much to gain from greater openness, and so much to lose by staying closed off from the world.” [Back]


The Embargoes Don’t Work: The British Academy provides the best evidence yet

The debates today on implementing Open Access pivot around two key points. The first is the perceived cost of a transition to a fully funded Open Access publishing environment. The second is the question of delaying access to copies of research outputs made available through repositories – how long should embargoes be? In both cases the question of what data and evidence are reliable has been contentious, but information is accumulating.

Given the obsession with embargoes from everyone interested in Open Access, it would be easy to think that they logically sit at the core of the debate. Traditional publishers and their lobbyists spend a lot of time and money lobbying for policy statements and legislation that include longer embargoes. And in the interest of full disclosure, PLOS and others spend quite a bit of time (although not so much money) advocating for shorter, preferably zero, embargoes. They must surely be important!

But in actual fact they are the residue of a messy compromise. The assumption that lies at the heart of the embargo argument is that if some version of a peer-reviewed article is made available through a repository, then a subscription publisher needs some period of exclusivity to recover their costs. The evidence that this is the best way to protect publishers while widening access is thin, to say the least.

The basic assumption is that having an author manuscript freely available online poses a risk to the subscription business of the publisher. This is a matter of controversy, and I am on the record (in written and oral comments) as saying that I see substantial evidence that no damage is done and no credible evidence of any damage – a conclusion endorsed by a UK Parliamentary report. But it is the flip side of the argument that is perhaps more important: the idea that embargoes help traditional publishers maintain their business while sustainably allowing wider access through repositories.

When the NIH Public Access Policy was implemented, the idea of a 12-month embargo on access to the content in PubMed Central was a compromise. The figure of 12 months wasn't really evidence-based but was seen as 'safe'. But it was always a messy compromise. For advocates of public access it was a high-water mark, a figure to be reduced over time, eventually to zero. For traditional publishers the narrative was built on "well, it's OK for biomedicine – it's a fast-moving field". You can draw a straight line from that narrative to the bizarre situation we have today in which many access policies allow embargoes to differ from discipline to discipline.

The latest salvo in this debate comes from a report commissioned by the UK's HEFCE from the British Academy, the UK's national academy for the humanities and social sciences. The first thing to say is that the report brings a valuable set of information on the landscape of scholarly publishing in H&SS to the table – in particular the data reported in Chapters 2 and 3 are going to be very helpful. In this post I want to focus on the conclusions the report comes to on embargoes and the work reported in Chapters 4 and 5.

The report sought to provide new data on potential risks to the sustainability of the H&SS journals business. These data come in two forms. The first (Chapter 4) is an analysis of the online usage patterns of a set of H&SS journals, which is intended to inform a discussion of the business risks to journals of their articles appearing in repositories. The second (Chapter 5) is survey data from UK collections librarians on what the most important factors are in cancelling journal subscriptions.

The usage data are problematic for two reasons. Firstly, while much data has been presented on the way article usage declines over time, with claims that it is relevant to discussions of embargoes, no link has ever been established between usage patterns, free online access and damage to the business of a journal. Secondly, there are some serious issues with the maths at the heart of the analysis. For example, while the mean half-lives (putting aside a curve shape for which the term 'half-life' is meaningless) differ between disciplines, the 95% confidence intervals are huge (Table 5). No testing of whether these differences are statistically significant is reported, but a line in the report suggests that using these data to support the report's conclusion is untenable: '…the main differences are within disciplines, not between them' [p59]. I will dissect those issues in detail at a later date, but for the purpose of this discussion I'm going to take the usage conclusions of the report at face value. On the basis of reported mean half-lives the report states that H&SS, along with physics and maths, form one cluster, with the biomedical sciences being different.
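As an illustration of the kind of check that paragraph argues is missing, here is a minimal sketch in Python: given two reported mean half-lives and their 95% confidence interval half-widths, it approximates whether the difference between them is statistically significant. The function and all of the numbers are illustrative assumptions, not figures taken from Table 5 of the report.

import math

def z_test_from_ci(mean_a, halfwidth_a, mean_b, halfwidth_b):
    # A 95% CI half-width of w implies a standard error of roughly w / 1.96.
    se_a = halfwidth_a / 1.96
    se_b = halfwidth_b / 1.96
    z = (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided p-value from the normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical mean usage half-lives (in months) with wide 95% CI half-widths.
z, p = z_test_from_ci(mean_a=48, halfwidth_a=30, mean_b=36, halfwidth_b=28)
print("z = %.2f, p = %.3f" % (z, p))  # wide intervals give a large p-value: no significant difference

With intervals as wide as those reported, apparent between-discipline differences are unlikely to clear this bar, which is the substance of the objection above.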

The survey data from Chapter 5 are also useful. While all surveys have their limitations, the results confirm the anecdotal evidence that librarians regard the availability of non-version-of-record copies of articles, and the timing of that availability, as among the least important factors in deciding on journal cancellations. Usage by their institution, the requirements of research staff to access the version of record, and cost are rated as more important. As an aside, there is an oft-cited ALPSP survey that is claimed to come to the opposite conclusion to the BA results. That survey was deeply flawed (see my Commons testimony for details), but even putting aside those issues it asked a different question: essentially 'all other things being equal…would shorter embargoes lead to cancellations?'. The BA survey shows that in the real world, where things are never equal, this is an unimportant factor.

The conclusion I draw from these two sets of data is that there is no value in longer embargoes for H&SS – indeed that there is no need for embargoes at all. H&SS cluster with physics and maths, disciplines where substantial, and importantly concentrated, portions of the literature have been available prior to publication for over 20 years and where there is no evidence of a systemic failure in the running of sustainable publishing businesses. A recent report from the London Mathematical Society (LMS) stated that they saw no risk to their business from the availability of author manuscripts online and that online availability of author manuscripts had no significant effect on traffic to their journal sites (they note that the situation might be different if the version of record is online but no public access policy requires this).

This runs counter to the standard narrative used to justify embargoes. Why do institutions continue to subscribe to journals when the 'same' content is available online for free? This would only be the case if factors other than the online availability of manuscripts – such as overall cost, scholar demand or access to the version of record – drove subscription decisions. That is exactly what the survey data in the BA report show, supporting the view that short embargoes are not a risk to the sustainability of subscription journals in H&SS.

The report itself, however, comes to the opposite conclusion. It does this by creating a narrative of increasing risk based on potential loss: that things might change in the future, particularly if the degree of access rises, and that the survey can only ask about the current environment and hypothetical decisions. We can, and no doubt will, continue to argue about the degree of this risk and what evidence can be brought to bear. For me, decades of the arXiv and the Astronomy Data Service and seven years of mandated deposit in PubMed Central with no evidence of linked subscription cancellations seem like strong evidence. Remember that the report states that physics and maths are similar to H&SS. But increasingly I'm feeling this whole argument is rather sterile.

The more productive approach is to ask the question from a positive perspective. Do embargoes help a traditional publisher or journal to navigate the rapidly changing environment? If so, what embargo is optimal? Many publishers are choosing to offer an APC-based Open Access publishing option. We know this works in many (well-funded) STM disciplines. However, we also know that in H&SS there is less free cash and that a transition to a directly funded system will be more challenging. Different approaches may well be required. A real alternative for a publisher is to support author archiving and take a strong stance on the additional value they offer in the version of record.

If the final published product is easier to read, easier to find, better formatted or better integrated with a scholar's workflow, and they find the value-add sufficient, then they or their institution will be willing to pay for it. The survey data, the LMS report and, in a different way, the SCOAP3 program all show that where customers see added value created by a publisher over the manuscript they are willing to pay for it. Any publisher, if they want to survive, has to compete with the free circulation of scholarly documents – both 'grey' literature and the circulation of copies of the version of record. That competition just got a bit tougher with the advent of the internet.

Embargoes are an artificial monopoly created to make the competition a bit less fierce. But truly, if a publisher believes that they add value and wants to be competitive, then why should they fear a Word doc sitting on the web? Indeed, if they do, it suggests a lack of confidence in the additional value that they offer in the version of record. The best way to give yourself that confidence is to be tough on yourself and take a good look at how and where you add value. And the best way to do that is to compete successfully with "free".

The best way to stay competitive is to be prepared and ready to adapt when the competition gets tougher, not to collude on retaining monopolies that make that competition less intense. (As an aside, I wonder from time to time whether the push from large publishers for longer embargoes is in part a strategy to make smaller publishers more complacent on this issue and easier to pick off as margins shrink and scale comes to matter even more than it already does.)

No doubt we will continue to argue for some time yet about whether short embargoes cause harm. But for those traditional publishers with a subscription business seeking to navigate our rapidly changing world, I think it's precisely the wrong question. Ask whether longer embargoes actually help your business over the short, medium and longer term. Really ask that question, because if the answer is yes, you have a problem with your confidence in your own value creation.

To my mind the British Academy report shows pretty strongly that embargoes don’t help. They offer essentially no business benefit while reducing the focus on the one thing that matters – creating the value that should differentiate scholarly communications from just slapping something up on the web. Clearly the authors of the report differ.

But rather than just take my summary, or that of the report, at face value, my suggestion would be to do what any scholar should do: read the report in full, critique the evidence and come to your own conclusions. What this report does bring is new data, and in particular data focussed on the questions of managing a transition to wider access for the H&SS literature. We need more of this data and we need to focus all of our critical faculties, and all of our various disciplinary approaches, on understanding how to use that data to effectively plan and manage a transition to the Open Access future.


Opens Roundup (April)

This issue sees policy moves by the European Commission, the UK, the Nordic countries and the World Health Organisation. In other news, OASPA investigates different publishers, Wellcome releases data on their APC spend, Tim Gowers issues a freedom of information request to UK institutions about Elsevier, there's a survey of megajournals, a confusion of licensing at Nature, a new 'Journal Openness Index' (leading to the JOI Factor), encouragement from Stephen Curry and more…

With thanks for links and tips to Rosie Dickin, Matt Hodgkinson, Theo Bloom, Katie Fleeman, and Ginny Barbour

POLICY DEVELOPMENTS

UK: Using metrics to assess and manage research

May 1: “HEFCE has launched a call for evidence to gather views and evidence relating to the use of metrics in research assessment and management….The review of metrics will explore the current use of metrics for research assessment, consider their robustness across different disciplines, and assess their potential contribution to the development of research excellence and impact.” PLOS will be submitting a report.

UK: HEFCE Public Access policy is a shift in the landscape

April 1: Another chance to read the PLOS OPENs analysis of HEFCE's policy announcement if you missed it, but many others also commented on the significance of the policy, including Alma Swan, Mike Taylor, Nature, JISC, UKCoRR (the UK Council of Research Repositories) and the Wellcome Trust.

UK: GMC asked to call doctors to account for withholding trial information

April 2: Part of a new report from a UK House of Commons enquiry stipulates that the General Medical Council (which regulates doctors) should make it clear to all registered doctors that withholding trial results is misconduct and actionable. They also note that journals are no longer the barrier to accessing trial results. Ben Goldacre (doctor and author of Bad Pharma), Ginny Barbour (Editorial Director, Medicine, PLOS) and Fiona Godlee (Editor of the BMJ) made a joint submission to the enquiry about the need to publish the outcomes of research and randomised controlled trials. This follows on from a UK Government report last year on access to clinical trial information and is yet another vindication of the AllTrials campaign coordinated by the campaign group Sense About Science, which Ben Goldacre, the BMJ and PLOS have all played a key part in.

UK: Wellcome Trust: The cost of open access publishing: a progress report

March 28: In an unprecedented move, Wellcome released all the data on what they had paid to publishers for Article Processing Charges (APCs) from 2012-2013 on figshare. And in a fantastic demonstration of reuse, the data were posted to a Google Doc, where they were cleaned up and enhanced (e.g. by adding the licensing information) through crowdsourcing. This was coordinated by Michelle Brook from the Open Knowledge Foundation with help from a host of others including Cameron (Michelle provides a summary on her blog). Read more about the implications of this on PLOS Opens. See also Neil Jacobs' (JISC) post on indicators for a competitive market.

WHO commits to open access by joining Europe PubMed Central

May 1: The World Health Organisation joins 25 other research funders at Europe PMC. And look out for their OA policy announcement on July 1st.

C4C's initial reaction to the European Commission Impact Assessment on Copyright Review – Part 1 and Part 2

April 28 & 30: Parts of the European Commission's impact assessment of its proposed new copyright framework have been leaked. The issues are complex, but Copyright for Creativity (C4C), a consortium of stakeholders concerned about copyright (including the Association of European Research Libraries and Research Libraries UK), have released their initial reactions. Although the report is still to be finalised, it seems the Commission has still not clarified that text and data mining (TDM) does not fall under copyright provisions and that an exception allowing the copying of content for the purpose of TDM is necessary. They point to Cameron's recent post about best practices in the area of TDM as an example for the EU to follow.

The draft impact assessment also goes against the recommendations of an expert group, commissioned by the EU and published on 4 April, that there should be a TDM exception for scientific purposes. The group went further and stated that this should only be an interim measure and that there should be wholesale reform aimed at establishing "a durable distinction in European law between copyright's longstanding and legitimate role in protecting the rights of authors of 'expressive' works and copyright's questionable role in the digital age of presenting a barrier to modern research techniques and so to the pursuit of new knowledge".

Nordic publications made available for free

March 25: All publications funded by the Nordic Council of Ministers (representing Denmark, Finland, Iceland, Norway, Sweden and the Faroe Islands, Greenland and Åland) will be made freely available from 1 June 2014. Although not required, the mandate recommends that a Creative Commons attribution licence be used (including the share-alike licence). The policy also stipulates deposition in full-text form, with the requested descriptive information (metadata), in the NCM repository held at Uppsala University. Another landmark policy.

AND IN OTHER NEWS

OASPA reinstates Sage membership (and investigates several others)

April 29: OASPA has been conducting several investigations lately, which not only ended with SAGE's reinstatement as a member but also gave MDPI the all-clear after concerns had been raised about them. There are, however, as yet unanswered questions for Springer about why and how they accepted 16 nonsense papers. And in defence of one of its members, OASPA has made public a letter its president, Paul Peters (Hindawi), wrote to a bogus website that has hijacked the name and print ISSN of the reputable journal Bothalia, published by AOSIS. Curiously, the bogus journal is also indexed by Thomson Reuters, who have not yet responded to enquiries from OASPA. (Disclaimer: I am a member of the OASPA board.)

Let’s shine a light on paywalls that deny open access to scientific research

April 29: Tania Browne on paywalls and the OA Button in the Guardian, with shout-outs for OA publishers like PLOS. "The ethos of these journals is that science is not a privilege but a right. It's something for all of us." Note also that the OA Button team has released version 2.0.

Elsevier Journals – some facts

April 24: A really insightful 10,000-word post by Tim Gowers on the results of his Freedom of Information request to the 24 Russell Group UK universities about what they pay Elsevier for their 'big deal' subscription package. In addition to the recent PLOS Opens post about this, see also comments from Peter Murray-Rust, Michelle Brook and Research Libraries UK (Phil Sykes and David Prosser). Required reading.

What is open access?

April 23: Eva Amsen from F1000Research gives a concise primer on Open Access, debunking the three most common myths (that Open Access just means free to read, that Gold means 'author pays' and that Open Access implies bad quality). Worth a read.

A survey of authors publishing in four megajournals

April 22: David Solomon surveyed more than 2000 authors who recently published in the ‘megajournals’ BMJ Open, PeerJ, PLOS ONE and SAGE Open. It’s an interesting analysis that tries to determine who is publishing in these journals and why. The article is published in PeerJ and the anonymised data underlying the paper are available on figshare.

Creating an efficient workflow for publishing scholarly papers on Wikipedia

April 17: Martin Poulter on the LSE blog explains how Wikipedia can be populated with high-quality content about research. He describes an initiative from PLOS Computational Biology and Daniel Mietchen, whereby PLOS Computational Biology commissions articles on specific topics that Wikipedia doesn't cover well, in a format that is appropriate for Wikipedia. These peer-reviewed 'Topic Pages' can be copied straight over to Wikipedia (e.g. "the Topic Page on Circular permutation in proteins, published in PLOS Computational Biology and visible in the relevant citation databases, can also be read on Wikipedia."). Poulter discusses the changes the journal had to make to enable this (e.g. figures in Scalable Vector Graphics format) and how this initiative might be extended to other subject areas.

British Academy fears for humanities in open access world

April 17: A brief news item in Times Higher Education about a report published by the British Academy. We will come back to this in more detail, as the PLOS Advocacy team feels there are real issues with the data, the arguments and the approach. And if you want evidence of open access already working for the humanities, look no further than Open Book Publishers or the Open Library of Humanities and the fantastic project from Knowledge Unlatched that promotes collaboration between a network of libraries and publishers to make existing monographs Open Access. Ernesto Priego also gave an inspiring talk about Open Access and 'impact' at the UKSG conference recently (see also an article based on his presentation).

Open Access – yes you can

April 20: A lovely call to arms from Stephen Curry about why it's not so hard for researchers to get involved with Open Access advocacy if they know how and when to push, and how those who do can have a huge and positive influence. He goes through a series of examples to demonstrate how easy it is to take part and manages to provide a lucid and authoritative take on recent issues that are relevant to many events in this round-up and the one last month, including the posting of the Wellcome APC data, NPG licence concerns at Duke and tackling Elsevier (over access to his own papers). More required reading, I think.

Journal Openness Index proposed by librarians

April 10: An article by librarians Micah Vandegrift and Chealsye Bowley in the wonderfully named journal In the Library with the Lead Pipe proposes a new index of openness with which to rank journals. This is based on an analysis of 111 journals in library and information science (the data are available on figshare as well) and uses a simplified version of the open access spectrum (developed by SPARC, OASPA and PLOS) to create the "JOI factor". This uses scoring based on three of the six categories in the spectrum (copyright, reuse rights and author posting rights) but does not factor in reader rights, automatic posting or machine readability (and the latter is also crucial for reuse). Although just a proof of concept at this stage, it's an interesting initiative. Most revealing is the extent to which the 'open access' journals they evaluated had restrictions on reuse or very poorly defined reuse policies, and how few of the journals traditionally ranked as top in their field were open at all. As they conclude, "are these the journals we want on a top tier list, and what measure of openness will we define as acceptable for our prestigious journals?"
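As a rough illustration of how an index of this kind can be computed, here is a minimal sketch in Python that scores journals on the three categories the article uses (copyright, reuse rights and author posting rights) and averages them. The 0-4 point scale, the example journals and the unweighted average are illustrative assumptions, not the actual JOI scoring from the article.

from dataclasses import dataclass

@dataclass
class JournalOpenness:
    name: str
    copyright_score: int       # 0 (publisher holds all rights) to 4 (author retains all rights) - assumed scale
    reuse_score: int           # 0 (no reuse permitted) to 4 (CC BY or equivalent) - assumed scale
    author_posting_score: int  # 0 (no posting allowed) to 4 (any version, anywhere, immediately) - assumed scale

    def openness_index(self) -> float:
        # Simple unweighted average of the three category scores.
        return (self.copyright_score + self.reuse_score + self.author_posting_score) / 3

journals = [
    JournalOpenness("Hypothetical fully OA journal", 4, 4, 4),
    JournalOpenness("Hypothetical hybrid journal", 1, 1, 2),
]
for j in sorted(journals, key=lambda j: j.openness_index(), reverse=True):
    print("%s: %.2f" % (j.name, j.openness_index()))

Adding the categories the authors left out (reader rights, automatic posting and machine readability) would simply mean extending the fields and the average.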

New Nature Journals to Launch

April 2: Tracy Vence in The Scientist briefly mentions Nature's latest open access venture. They intend to launch a portfolio of OA journals later this year – under the name of Nature Partner Journals – in partnership with institutions, foundations and academic societies. They will be charging $4,000 for an APC. Disappointingly, they are also offering authors the choice of more restrictive Creative Commons licences that prohibit commercial reuse or any derivative reuse. These more restrictive licences do not comply with the Budapest definition of open access. Such licences are also increasingly at odds with the requirements of funders such as RCUK, the EU and the Wellcome Trust.

Around the Web: April Fools’ Follies!

April 2: Over at Confessions of a Science Librarian, John Dupuis lists all the internet silliness that occurred in the publishing world on April Fools' Day…

Attacking academic values

March 27: Kevin Smith created a stir when he started digging into the licence agreements that Nature Publishing Group asks authors to sign. It was prompted by Nature asking Duke University authors to obtain a waiver of their faculty open access policy when they publish in Nature. After digging deeper he realised that NPG were also asking authors to waive their moral rights when signing up to the NPG licence, which in principle could mean waiving the right to be credited for their work (attribution). NPG responded, arguing that the Duke policy is too broad and that the moral rights waiver was required to enable retractions. With litigation, or at least its threat, depressingly becoming more common in the scholarly communication space, expect to see more issues arise where legal measures to protect the interests of authors, publishers and institutions run counter to the traditions of the scholarly community.

Managing Article Processing Charges

March 27: A useful essay on the complexities of managing APCs from Danny Kingsley at AOASG. This is part of their ongoing Payment for Publication series.

DataCite, re3data.org, and Databib Announce Collaboration

March 25: Why is this good news? "The aim of this merger is to reduce duplication of effort and to better serve the research community with a single, sustainable registry of research data repositories."


Transparency: A bit of grin and bear it, a bit of come and share it…

Over the past month, an unprecedented amount of data has been released that throws light on the flows of money in scholarly communication, both subscription and open access. While some of this information is depressing – there is so much wrong with the way the current system works – the very act of releasing the information is surprisingly heartening. These cracks in the publishing edifice are perhaps the first signal of a genuine shift towards price transparency. Transparency will not only throw light on the complexity of the system but will also be the means to foster real change and enable competition and market forces to act.

Opening up the big deal

On the subscription side, Tim Gowers has detailed the results of a Freedom of Information (FOI) request to the 24 Russell Group UK universities about what they pay Elsevier for their 'big deal' subscription package. An FOI request is necessary because Elsevier has required institutions to sign a non-disclosure agreement (NDA). Needless to say, it is in the interests of Elsevier not to let on to anyone how much they are charging. In his insightful post, Gowers first outlines the background to his request (including the Elsevier boycott, issues around double dipping and, more recently, Elsevier charging readers for their OA articles) before presenting a table of the results from the 19 universities that agreed to release the information. This shows, as we might have expected, high variance in what different institutions pay for similar packages: "University College London pays over twice as much as King's College London, and almost six times as much as Exeter".

The real revelations, however, come from the correspondence he had with different institutions (especially their reasons for not giving Gowers the information), and with Alicia Wise from Elsevier and Lorraine Estelle from Jisc Collections, a UK higher education consortium (which is against NDAs). It turns out the system is far more complex than often assumed. Charging is largely based on ancient history – how much a given institution paid for individual print subscriptions in the 1990s. (Note also an older analysis by UCL's David Courant that Gowers links to, which shows how the usage of Elsevier journals is high for a few titles but tails off very rapidly, with more than 200 journals not being accessed at all by UCL in one year.) Price negotiations with Elsevier have focused on the percentage rise each year for electronic access to the historic package, plus relatively small fees for their larger bundled collections. Ultimately "the system ensures that what universities pay now closely matches their historic spend" even though universities today are largely gaining access to the same content.

Gowers’ correspondence also exposes the ‘tricksy’ relationships between Elsevier and different institutions that help maintain this status quo. Some negative responses from the Universities, for example, contained paragraphs matching almost word for word the same arguments for not complying with his request, which he suggests points to a template answer that Elsevier provided to the institutions.

Although Elsevier and the UK are singled out here (with reason), Gowers also includes a preview of what some US universities pay, with information provided by Paul Courant, Ted Bergstrom, Preston McAfee and Michael Williams. This will be part of an upcoming preprint and will include other major publishers as well, such as Wiley and Springer, but it is likely to recount essentially the same story.

APC transparency

In a significant move on the Open Access side, the Wellcome Trust released article-level data on what had been paid to publishers for Article Processing Charges (APCs) from 2012-2013 on figshare. In a fantastic demonstration of reuse, the data were then posted to a public Google Doc, where they were cleaned up and enhanced (e.g. by adding the licensing information) through crowdsourcing. This was coordinated by Michelle Brook from the Open Knowledge Foundation with help from a host of others.

The subsequent analysis, summarised by the Trust, revealed that Elsevier received the lion's share of the Wellcome's APC spend and that hybrid journals charged the most for their APCs (which supports the Wellcome's recent report suggesting the hybrid APC market is dysfunctional). But the data also raised many concerns. Papers for which thousands of pounds had been paid were in some cases still behind a paywall or did not have the correct licence.
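For anyone who wants to repeat this kind of analysis, here is a minimal sketch of the sort of aggregation such a release makes possible: total and median APC spend per publisher. The file name and column names are assumptions for illustration; they are not the actual headers of the Wellcome dataset.

import pandas as pd

# Hypothetical local copy of the released APC data (file and column names are assumed).
apcs = pd.read_csv("wellcome_apcs_2012_2013.csv")

summary = (
    apcs.groupby("publisher")["apc_gbp"]
        .agg(total_spend="sum", median_apc="median", articles="count")
        .sort_values("total_spend", ascending=False)
)
print(summary.head(10))  # the publishers receiving the largest share of APC spend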

Reasons to be cheerful

All these releases point to a trend for funders and research institutions to demand more transparent pricing for both subscriptions and APCs. Gowers' 10,000-word post goes into the nuances, complexities and publishing norms that have led the subscription business model to where it is today – it's not just Elsevier. Gowers acknowledges that getting rid of such an entrenched monopolistic system will not be easy and that the differential pricing he has exposed is a symptom of the system, not the problem itself.

The release of APC information by funders relatively early in the evolution of this Open Access business model, however, is a signal that those footing the bill for APCs are trying to avoid the mistakes of the subscription system and ensure price transparency and competition. Following the Wellcome Trust's lead, Cambridge University, Queen's University Belfast and the Austrian Science Fund have also released data for 2013 (all links to figshare). In the past few days Wellcome released additional data from 2010-2012.

The increased transparency that these data releases bring is very welcome, but we will need more complete and more consistent data to really understand how the cash flows are changing at the system level. The data being released today require a great deal of further work to process and integrate – Cameron outlines some of the problems of collectively curating data in a separate post – and they represent only a small part of the whole, even within the UK. Regular, scheduled and consistent data releases will be needed if we are to manage the transition to Open Access effectively. The reporting of the first year of payments from RCUK institutional funds will be the next big step in this direction.

Ultimately as this data availability increases we can start to actively plan a shift from subscriptions to APCs. Many of the concerns raised about the move to Open Access publishing result from a lack of confidence about our ability to shift money from subscriptions to publication services.  Understanding how much institutions are spending is a first crucial step.

For publishers, these data releases focus the attention of our customers – funders, researchers and institutions – on the quality of service they are receiving. The data release from Cambridge University is particularly sobering. The final column, showing whether a 'problem-free OA publication' was received for the thousands of pounds paid in APCs, shows an awful lot of red for traditional publishers (OUP appears to be an honourable exception). Those publishers offering only an OA option did much better, as you might expect. BMC, PLOS or Copernicus would be unlikely to accidentally leave a paper behind a paywall. As the Wellcome post concluded, "We expect every publisher who levies an open access fee to provide a first class service to our researchers and their institutions."


The stick, the carrots and the bear trap

UK HEFCE Public Access policy is a shift in the landscape

After a one-year consultation, the four UK higher education funding bodies, led by HEFCE, have issued the first national policy to explicitly link public access with research evaluation. Any research article or conference proceeding accepted after 1 April 2016 that does not comply with their public access policy will be inadmissible to the UK Research Excellence Framework (REF), the system by which all government-funded higher education institutions in the UK are assessed. The policy is a significant landmark in the global transition to Open Access and, in parallel with the Research Councils UK (RCUK) OA policy, will make research from the UK the most publicly available in the world.

The UK's higher education funding councils are direct funders of UK higher education institutions (HEIs). They distribute large sums of money to UK universities based on an assessment exercise that occurs roughly every seven years. The funding council money can be used as institutions see fit to support research and can make the difference between being in the black or in the red. The funding is distributed largely on the basis of an assessment of four outputs submitted by each assessed researcher. It is these outputs that will need to be free to read, linking public access to research assessment at a national scale for the first time.

At the same time the Funding Councils are indirect funders from the perspective of researchers. Because they do not have a direct relationship with researchers, they are more limited in their policy options. Within the limitations this imposes, it is a good policy. Yes, there are loopholes, and some of the provisions for text mining are disappointing – the minimum acceptable licence restricts any derivative reuse, for example – but overall the policy provides a strong signal about the direction of travel. Moreover, it contains not just a hefty stick, but also some well-aimed carrots and a bear trap if things go off track.

Crucially, HEFCE’s aim is full compliance. There are no soft percentage targets here. They also note that 96% of submitted outputs to the 2014 REF could have complied with this policy if authors had deposited whenever they were able (Appendix B paragraph 54).

Here’s the basics:

  • The policy applies to all research articles and conference proceedings (with an International Standard Serial Number) that list a UK HEI in the address field, regardless of subject area. It doesn’t apply to data, monographs, book chapters or other reports that might have security or commercial implications.
  • The final peer-reviewed accepted article must be deposited on acceptance (with a three-month grace period) in an institutional repository, a repository service shared between multiple institutions, or in a subject repository such as arXiv or Pubmed Central.
  • Deposited material should be discoverable, and free to read and download, in perpetuity for anyone with an internet connection.
  • They don’t recommend any specific licence but note that outputs licensed under a Creative Commons Attribution Non-Commercial Non-Derivative (CC BY-NC-ND) licence would be compliant.
  • Embargoes on access to the deposited article are permitted – up to 12 months for STEM and 24 months for AHSS – but articles must still be discoverable from the repository during this period even if the full text is not accessible.
  • Note that deposition and access are treated separately in the policy: immediate deposition is mandatory but immediate access is not if there is an embargo (which they term ‘closed deposit’).
  • All articles published in a PLOS journal can be compliant through deposition of the final manuscript or the final published version in any appropriate repository. All PLOS articles are also deposited in a subject repository, Pubmed Central, as part of the publication process.

The problems and the loopholes

The problems with the policy are the embargo times permitted, the weakness of the licensing requirements, the lack of any requirement that repositories support mining, and the set of exemptions. These exemptions mean that research outputs that are not available could still be submitted to the REF if 1) the output depends on third-party material that is not legally accessible, 2) the embargo required by the publisher is longer than HEFCE has stipulated, or 3), worse still, the publisher simply does not permit deposition in a repository.

These exemptions raise concerns that publishers will simply impose more stringent licensing conditions, restrict deposition and/or start seeing the embargo periods specified by HEFCE as targets, rather than limits. But this is also where the judicious use of sticks and carrots (and bear traps) makes the policy more effective than you might first assume.

Here’s the stick

The stick is aimed at institutions – any research output that does not meet HEFCE’s requirements or exemptions will be treated as non-compliant. “Non-compliant outputs will be given an unclassified score and will not be assessed in the REF”. That’s a major incentive to be compliant and it’s powerful because it is aimed at institutions, rather than at researchers. Each exemption will have to be individually justified and this will require time and effort. Institutions will have a strong incentive to reduce as far as possible the number of exemptions that they want to claim.

Here’s the carrots

The policy has two important carrots which create incentives to increase the types of research outputs deposited and to encourage more liberal licensing for reuse. Specifically, they state that any higher education institution that can demonstrate it has taken steps towards enabling open access for outputs above and beyond the basic requirements, or that can demonstrate that outputs are presented in a form that allows re-use of the work, including via text-mining, will be given ‘credit’ in the Research Environment component of the post-2014 REF.

It’s not clear exactly what that credit will be although they note in Appendix B (paragraph 34) that further details of this will be developed in the coming years as part of their planning work for the next REF. Nevertheless, it seems likely that it will provide a way for a department to increase its research rating. That’s a powerful incentive to be more open.

And here’s the bear trap

A significant implication of the policy is only made clear in Appendix B (paragraph 67). The policy aims to enable researchers to have the academic freedom to publish where they choose. And this is why HEFCE allowed exemptions to the policy. However, as noted above, they expect exemptions to be applied in only a small proportion of cases. And what happens next depends on how different ‘actors in the system’ behave. So watch out for the bear trap: “If it becomes apparent that our policy is having a neutral, or detrimental, effect on the UK’s journey towards open access, we will revisit this policy to reconsider these provisions.”

The take home message

This policy is a game changer. It will result in a substantial increase in the proportion of UK research that is free to read. The UK will take a strong lead compared to other countries in making research accessible. When we talk about Open Access at PLOS we mean access to read with generous re-use rights. In this light there are some disappointing aspects but these are primarily a result of the Funding Councils being indirect funders – their funding is not tied to specific projects that generate specific outputs.

It is precisely due to the indirect nature of this funding, and its importance in the UK system, that the reach of the policy is so great. Every paper published with a UK university affiliation will be subject to this policy. The disappointments we may have with the details therefore need to be tempered with an appreciation of how substantial its effect will be. This policy represents an important step forward in the transition to full Open Access.


Opens Roundup (March 25)


In this issue: the US FIRST Act meets opposition on all fronts, David Wiley (champion of open education) on the 5Rs of openness, Wellcome releases its report on how to promote an effective market for APCs, PeerJ one year on, ten tips for navigating publishing ethics, the power of post-publication review, altmetrics for developing countries, the pros and cons of publishing in arXiv, a fantastic crowdsourcing project in the humanities, John Ioannidis on the look-out for a journal near you, and more.

With thanks to Allison Hawxhurst and Katie Fleeman for tips and links.

POLICY DEVELOPMENTS

US: Why FIRST is a Trojan Horse

March 11: If you haven’t read PLOS’s reaction to the introduction of the FIRST Bill then here is another opportunity to read Cameron’s post (main link), as well as that of EFF published the same day and SPARC a couple of days before that. But it wasn’t just open access advocates objecting to the public access language in Section 303 (which could see embargo lengths increase to three years after publication). Science Insider also lists the growing opposition coming from scholarly societies and various research and university groups, noting that the overwhelming feeling is that the bill represents a missed opportunity for the US to maintain its scientific lead, in particular because of constraints on the amount of funding for NSF and the restrictions the bill imposes on the type of research it would fund. The bill, introduced by Republicans, has been particularly divisive and at odds with the original America COMPETES Act introduced by Democrats. The ensuing markup of the FIRST Bill on the 12th was lively, but an impassioned plea by Rep. Zoe Lofgren to strike the public access language in Section 303 was narrowly defeated 9-8 along party lines. The bill still has a long way to go.

AND IN OTHER NEWS

Legal Dispute between a Professional Association and Press

March 16: On H-Net, a listserv for the humanities, Phil Brown from H-Asia calls attention to an article in the Chronicle of Higher Education (available to subscribers only) discussing a dispute between the Social Science History Association and Duke University Press over control of the Association’s journal, Social Science History. Here’s the nub of what he says:

(After notifying Duke of its intent to solicit bids for a new publication contract):

In June 2012, not having gotten a bid from Duke, the association sent the press a letter saying it would be moving on and ending the contract; Duke disputed its right to do that. According to the association, Duke interprets that phrase “participation in the journal” to mean that the association has only two choices: “continue with Duke as the sole publisher in perpetuity or give up its ownership of the journal altogether and allow Duke to continue to publish the journal,” even though the university “has no ownership or intellectual-property rights” in it.

The full details of the suit are available.

Clarifying the 5th R

Image by Sean MacEntee (CC BY 2.0)

March 15: David Wiley, a proponent of open educational resources (OER), introduced the concept of the 4Rs as a way to think about openness in 2007. He described the four main activities enabled by open content as Reuse, Rework, Remix and Redistribute, which is what the Creative Commons Attribution licence permits. The 4Rs have been influential as a framework for educational resources and for what being open means more generally. On March 4 he added a 5th R – Retain – to the mix. This is the right to make, own and control a copy of the work. This addition was in response to the fact that while many OER publishers permit access to their content they “do not make it easy to grab a copy of their OER that you can own and control forever”. “How many OER publishers enable access but go out of their way to frustrate copying? How many more OER publishers seem to have never given a second thought to users wanting to own copies, seeing no need to offer anything beyond access?”

Some (e.g. David Draper and Mike Caulfield) have subsequently questioned the need for a 5th R – surely being able to own and control a copy of the work is implicit in any licence that permits the 4Rs in the first place? Wiley agrees with this but believes that ownership per se has never been explicitly addressed in the discussion of open content (main link) and this has direct consequences: “If the right to Retain is a fundamental right, we should be building systems that specifically enable it. When you specifically add ‘Enable users to easily download copies of the OER in our system’ to your feature list, you make different kinds of design choices” – ones that most designers of OER systems have failed to think about.

The implications of the 5th R are fundamentally important as they change the context of what we’re doing – it’s no longer enough to state that others can reuse, remix or redistribute your work. It suggests that those involved in providing open content also need to help build or facilitate the infrastructure that enables the content to be easily reused by others. In other words, we have a responsibility to reduce the friction around the content we make open (see, for example, Cameron’s article in PLOS Biology on this). And this applies not just to libraries and institutions with an educational remit but to funders and publishers as well.

Metaphysicians: Sloppy researchers beware. A new institute has you in its sights

March 15: Article in the Economist about a new institute at Stanford called the Meta-Research Innovation Center (Metrics) to be run by John Ioannidis (author of the famous ‘Why Most Published Research Findings Are False’ paper in PLOS Medicine, currently viewed almost 980,000 times). The new laboratory aims to find out whether attempts to reproduce studies actually work and to guide policy on how to improve the validation of research more generally. Among the initiatives is “a ‘journal watch’ to monitor scientific publishers’ work and to shame laggards into better behaviour”. They will also spread the message to policymakers, governments and other interested parties, “in an effort to stop them making decisions on the basis of flaky studies.”

PeerJ’s $99 open access model one year on

March 13: A brief review in Times Higher Education of how PeerJ is doing. “PeerJ co-founder Jason Hoyt said that the journal had fulfilled its year-one aims of “staying alive”, starting to grow and “laying the groundwork to show this business model could be sustainable”. It is on track to be self-sustaining by early 2015, and there are no plans to raise prices. “If anything we would like to lower them if we can figure out some other revenue stream,” Dr Hoyt said.” PeerJ was also explicitly referenced in the Wellcome Trust report into an effective APC market by Björk and Solomon mentioned below. That report concludes that “PeerJ has published around 220 articles in the first nine months of operation. It is not clear at this point if the journal will scale up to a level that will make it financially sustainable but it offers an innovative funding model”.

Institutional repositories provide an ideal medium for scholars to move beyond the journal article

Academic Commons Use-per-item graph (CC BY 3.0)

March 12: Leyla Williams, Kathryn Pope, and Brian Luna Lucero from Columbia University make the case for institutional repositories to collect all the work of their scholars rather than focusing only on peer-reviewed journal articles or monographs. They discuss how their IR ‘Academic Commons’ hosts conference videos, presentations, technical reports and other “grey literature” (as the figure shows). “IRs are crucial for authors whose work may not fit within the scope of any one scholarly journal,” they note. “They are also vital for researchers with data that lies outside the parameters of disciplinary data repositories, for dissertation authors who want to make supplemental materials available, and for undergraduates.” They discuss examples where deposition in their repository has helped young researchers find a job and how the lively twitter feed around what’s deposited helps disseminate the work.

Top 10 tips for navigating ethical challenges in scholarly publishing

Image by Brett Jordan (CC BY)

March 12: Jackie Jones, Executive Journals Editor at Wiley, provides her top tips on publication ethics to coincide with the 2nd edition of Wiley’s “Best Practice Guidelines on Publishing Ethics: A Publisher’s Perspective”. The guidelines (available online or as a PDF) are published under a Creative Commons Non-Commercial license and have drawn on a range of expertise, including from COPE. They form a useful addition to the more in-depth guidelines and forums that COPE already provides and signal the increasing accountability that all reputable publishers have to ensure high ethical standards in scholarly publishing. In a subsequent post, Udo Schuklenk, an editor of the journals Bioethics and Developing World Bioethics, lists some of the infringements he’s seen as editor with respect to plagiarism: “Over the years you begin to delude yourself into thinking that you have seen the full range of ethics infringements. It’s particularly ironic, I guess, when you edit bioethics journals: you would hope that your authors would be clued in to publication ethics issues.” Unfortunately no journal can escape these issues.

Fostering a transparent and competitive market for open access publishing

Image by 401(K) 2013 (CC BY-SA 2.0)

March 12: A range of funders, led by the Wellcome Trust, released a report by Bo-Christer Björk of the Hanken School of Economics, Finland, and Professor David Solomon of Michigan State University on the structure and shaping of the APC market. It’s worth skimming the whole report as it has a lot of good information as well as giving a sense of the directional thinking of funders. The report contains useful figures and data.

The key conclusions are that the full OA market is competitive and functioning, with varied pricing depending on field and impact, while the market for APCs for articles in hybrid journals run by subscription publishers is dysfunctional, with relatively flat pricing (and low uptake). A set of analyses of the report has started to appear; look out for some comments and a summary here over the next few days.

Science self-corrects – instantly

Image by Boston Public Library (CC BY 2.0)

March 11: The blog for PubPeer, an online post-publication commenting service, discusses two papers published in Nature in January purporting to provide a revolutionary and simple technique for producing pluripotent stem cells, termed STAP cells. A post by Paul Knoepfler on his own blog expressed initial doubts, which were fuelled by comments on PubPeer exposing problems with the figures in the paper. Knoepfler then hosted a post to crowdsource researchers trying to replicate the apparently simple production of mouse STAP cells – so far with little success. And then comments posted last week suggested that some of the figures in one of the Nature papers were duplicated from seemingly different experiments in the lead author’s doctoral thesis. Not long after, the Wall Street Journal reported that the senior co-author from RIKEN (a leading Japanese research institute) asked for the papers to be retracted, though a co-author at Harvard continues to support the work. Nature also reported on the press conference with RIKEN, who announced the findings of an interim investigation but have not made any decision about retracting the papers. The story rumbles on – this week, RIKEN withdrew the original press release about the paper, stating “The reports described in the CDB news story “STAP cells overturn the pluripotency paradigm” published on Jan 30, 2014 are currently under investigation. Due to a loss of confidence in the integrity of these publications, RIKEN CDB has withdrawn press release from its website.”

The issue here is not just whether the papers in Nature hold up but whether commenting platforms like PubPeer and blogs provide a valid means to scrutinise bold claims. While there are cases where issues are identified, there is also concern about the potential for personal attacks, particularly given the anonymity that some of these platforms provide.

This is especially important given the increasing number of times pre-publication review actually fails. In a recent and related post in Nature, Richard van Noorden discusses how to choose among the many venues now available for researchers to discuss work after publication, focusing on a discussion of the same papers by Kenneth Lee on ResearchGate. ResearchGate provided Lee with a more structured review form that they are calling Open Review. PLOS Labs has also recently set up a similar innovative initiative, called ‘Open Evaluation’. As PubPeer conclude at the end of the post (main link): “Science is now able to self-correct instantly. Post-publication peer review is here to stay.”

How is it possible that Elsevier are still charging for copies of open-access articles?

March 11: Mike Taylor provides a run-down of the charges that Elsevier still try to apply for reusing their ‘Open Access’ articles. Apparently, it’s all to do with a problematic technical fix that Elsevier has been trying to solve for a couple of years now. And this week Peter Murray-Rust publishes a letter he received from Elsevier’s Director of Access and Policy, Alicia Wise, about how they have now taken steps to fix the problem and compensate individuals who have been mis-sold Open Access products.

Should we eliminate the Impact Factor?

March Issue: An interesting take on the impact factor by the Editor in Chief of Biotechniques, Nathan Blow. He reviews the pros and cons of the impact factor versus other article level metrics and concludes that there is an equal danger of misuse if researchers get wedded to any single alternative metric. And he thinks they will because scientists need something on which to base their publishing decisions. What he ends up calling for is a bit more sense in the way we use metrics, and a bit less laziness: “we need to change the way scientists view such metrics: While it might be good to publish in a top tier journal with an Impact Factor of 30—if your article only gets 2 citations, what does this mean? And the opposite is also true—if the journal has an Impact Factor of 2, but your article receives 500 citations in 2 years, should you be penalized for where you publish? And fundamentally, what does it mean to get 2 versus 500 citations? The validity of any statistic or analysis tool depends on careful and appropriate application by an informed user. Maybe scientists need to look beyond sheer numbers towards the “community” impact of their studies. Here, network analysis showing the reach of an article based on a deeper citation analysis might provide stronger insights into its impact. Tenure committee members also need to look beyond the simple “30-versus-2” Impact Factor debate and use their experience and knowledge to see the true contribution that a scientist is making to their field and beyond—you cannot ask a young scientist to do something that you are not willing to do yourself! In the end, measures such as the Impact Factor are only “lazy” statistics because we make them lazy.”

While I agree with much of what he says, I think he omits another factor that scientists should consider when they choose where to publish, and that’s the service the journal (or platform) provides. Are there checks and balances that ensure your work is properly reported? What sort of peer-review service do they have? Does the publisher help ensure that the data underlying the paper’s findings are available? Is the metadata in a form that means your article, or the components within it, can be found by anyone interested regardless of subject or geographic location? And can the content be reused easily if others do find it? Making sure your work can be validated, disseminated and is searchable and reusable is what really counts. The metrics will follow.

Is a Rational Discussion of Open Access Possible?

Image by Vaguery (CC BY 2.0)

March 10: A dedicated blog set up by Rick Anderson to host the slides and transcripts of a talk he gave at the Smithsonian Libraries. Rick is perhaps better known as a chef for the Scholarly Kitchen (where he’s posted a link to the video of his talk). Both blogs have a lively set of comments – largely supportive ones from Mike Taylor, for example, even though he was criticised in the talk, and some insights from Jan Velterop (who started BMC with Vitek Tracz).

Altmetrics could enable scholarship from developing countries to receive due recognition

March 10: Fantastic post by Juan Pablo Alperin on the potential impact of altmetrics for researchers in developing countries. One of the issues he raises is the perception that researchers in developing countries don’t produce as much research, but this is largely because the research they do produce is not represented in abstracting and indexing services such as Thomson Reuters’ Web of Science. This means that the work doesn’t get cited as much and the journals don’t even gain entry into the notorious impact factor game. Resources like SciELO are trying to redress this balance but are still working with only a subset of the 5,000+ regional journals (largely from S. America). He provides a striking image (below) of the world scaled by the number of papers in Web of Science by authors actually living there, which puts the lack of representation in these countries into stark relief.

World scaled by Web of Science papers: image by Juan Pablo Alperin (CC BY)

But whether altmetrics can help redress this balance is open to question. The potential is huge but to realise the promise, he argues, altmetrics (and the ALM community more generally) need to engage with scholars from developing regions. He cautions that if altmetrics are used as just another means to rank scholars then the danger is that they will evolve to cater only for those in locations where they are most heavily used (i.e. not in developing countries). However, he is part of the Public Knowledge Project, which is working with the PLOS ALM application to provide a free altmetrics service to journals being run and hosted in developing countries (via the OJS platform). “As the field begins to consolidate, I remain optimistically pro-altmetrics for developing regions, and I have faith in the altmetrics community to serve all scholars. Which directions altmetrics should go, how they should be used, or how the tools should be implemented is not for me to prescribe, but if we exclude (or do not seek to include) scholars from developing regions, altmetrics will become another measure from the North, for the North. And we already know that story.”

Dubiously online

March 08: Article about the need to police open access journals in India by an Indian academic, Rudrashis Datta: “Unless the higher education authorities step in to stem the rot, serious open access publishing, so important in the Indian academic context, runs the risk of dying a natural death, leaving serious researchers and academics without the advantage of having to showcase their scholarly contributions to readers across the world.”

The price of publishing with arXiv

March 05: Mathematician discussing the advantages and disadvantages of publishing in arXiv: “The advantage: I had a lot of fun. I wrote articles which contain more than one idea, or which use more than one field of research. I wrote articles on subjects which genuinely interest me, or articles which contain more questions than answers. I wrote articles which were not especially designed to solve problems, but to open ones. I changed fields, once about 3-4 years.”….”The price: I was told that I don’t have enough published articles…”

“But, let me stress this, I survived. And I still have lots of ideas, better than before, and I’m using dissemination tools (like this blog) and I am still having a lot of fun.”

Making it Free, Making it Open: Crowdsourced transcription project leads to unexpected benefits to digital research

Image by Ewan Munro (CC BY-SA)

Image by Ewan Munro (CC BY-SA)

March 03: Melissa Terras, Professor of Digital Humanities at University College London, discusses how they used crowdsourcing to transcribe the writings of the philosopher and reformer Jeremy Bentham. The ‘Transcribe Bentham’ site is hosted by UCL. It is a fantastic project and seems likely to follow the success of similar crowdsourcing initiatives in the sciences like Galaxy Zoo. As Melissa notes, “This week we hit over 7000 manuscripts transcribed via the Transcription Desk, and a few months ago we passed the 3 million words of transcribed material mark. So we now have a body of digital material with which to work, and make available, and to a certain extent play with. We’re pursuing various research aims here – from both a Digital Humanities side, and a Bentham studies side, and a Library side, and Publishing side. We’re working on making canonical versions of all images and transcribed texts available online. Students in UCL Centre for Publishing are (quite literally) cooking up plans from what has been found in the previously untranscribed Bentham material, unearthed via Transcribe Bentham. What else can we do with this material?” And there are lots of doors opening for them too – such as looking into Handwritten Text Recognition (HTR) technologies.

Experiment in open peer review for books suggests increased fairness and transparency in feedback process

Feb 28: Hazel Newton, the Head of Digital Publishing at Palgrave Macmillan, describes their current peer review pilot investigating how open feedback functions in monograph publishing and gets feedback from authors involved in the project. Great to see open peer review experiments in the humanities as well as the sciences.


Why FIRST is a Trojan Horse


Bill would be a major setback to progress on public access to US federally funded research

PLOS opposes the public access language set out within a bill introduced to the US House of Representatives on Monday, March 10. Section 303 of H.R. 4186, the Frontiers in Innovation, Research, Science and Technology (FIRST) Act, would undercut the ability of federal agencies to effectively implement the widely supported White House Directive on Public Access to the Results of Federally Funded Research and undermine the successful public access program pioneered by the National Institutes of Health (NIH) – recently expanded through the FY14 Omnibus Appropriations Act to include the Departments of Labor, Education, and Health and Human Services. Adoption of Section 303 would be a step backward from existing federal policy in the directive and put the U.S. at a disadvantage among its global competitors.

PLOS has never previously opposed public access provisions in US legislation but the passage of FIRST as currently written would reduce access to tax-payer funded publications and data, restrict searching, text-mining and crowdsourcing and place US scientists and businesses at a competitive disadvantage.

“PLOS stands firmly alongside those seeking to advance public access to publicly funded knowledge”, said PLOS Chief Executive Officer Elizabeth Marincola. “This legislation would be a substantial step backwards compared to the existing U.S. policy as set out by the White House and in the recent Omnibus Bill.”

As the Scholarly Publishing and Academic Resources Coalition (SPARC) outlines, Section 303 would:

  • Slow the pace of scientific discovery by restricting public access to articles reporting on federally funded research for up to three years after initial publication.  This stands in stark contrast to the policies in use around the world, which call for maximum embargo periods of no more than six to 12 months.
  • Fail to support provisions that allow for shorter embargo periods for access to publicly funded research results.  This provision ignores the potential harm to stakeholders that can accrue through unnecessarily long delays.
  • Fail to ensure that federal agencies have full text copies of their funded research articles to archive and provide to the public for full use, and for long-term archiving.  By condoning a link to an article on a publisher’s website as an acceptable compliance mechanism, this provision puts the long term accessibility and utility of federally funded research articles at serious risk.
  • Stifle researchers’ ability to share their own research and to access the works of others, slowing progress towards scientific discoveries, medical breakthroughs, treatments and cures.
  • Make it harder for U.S. companies – especially small businesses and start-ups – to access cutting-edge research, thereby slowing their ability to innovate, create new products and services and generate new jobs.
  • Waste further time and taxpayer dollars by calling for a needless, additional 18-month delay while agencies “develop plans for” policies.  This is a duplication of federal agency work that was required by the White House Directive and has, in large part, already been completed.
  • Impose unnecessary costs on federal agency public access programs by conflating access and preservation policies as applied to articles and data.  The legislation does not make clear enough what data must be made accessible, nor adequately articulate the location of where such data would reside, or its terms of use.

The FIRST Act was introduced in the House of Representatives by Chairman Lamar Smith (R-TX) and Rep. Larry Bucshon (R-IN). It is expected to be referred to the House Committee on Science, Space, and Technology.

Take Action Before Thursday, March 13:

Encourage federal agencies to implement the White House Directive and ensure the passage of the bipartisan, bicameral Fair Access to Science and Technology Research (FASTR) Act.


Best Practice in Enabling Content Mining


This is a document prepared to support the European Commission’s ongoing discussion on Content Mining. In particular, it discusses, from the perspective of a purely Open Access publisher, best practice in enabling content mining and the challenges that can arise when particular types of traffic reach high levels.

Introduction

Enabling the discovery and creative re-use of content is a core aim of Open Access and of Open Access publishers. For those offering Open Access publication services enabling downstream users to discover and use published research is a crucial part of the value offering for customers. Content mining is an essential emerging means of supporting discovery of research content and of creating new derivative works that enhance the value of that content.

Content mining generally involves the computational reading of content, either by obtaining specific articles from a publisher website or by working on a downloaded corpus. Computational access to a publisher website has the potential in theory to create load issues that may degrade performance for human or other machine users.

In practice downloads that result from crawling and content mining contribute a trivial amount to the overall traffic at one of the largest Open Access publisher sites and are irrelevant compared to other sources of traffic. This is true both of average traffic levels and of unexpected spikes.

Managing high-traffic users is a standard part of running a modern web service and there is a range of technical and social approaches to managing that use. For large-scale analysis a data dump is almost always going to be the preferred means of accessing data and removes traffic issues. Mechanisms exist to request that automated traffic be kept at certain levels and these requests are widely followed – where they are not, technical measures are available to manage the problematic users.

Scale and scope of the problem

PLOS receives around 5 million page views per month from human users to a corpus of 100,000 articles, as reported by Google Analytics. This is a small proportion of the total traffic as it does not include automated agents such as the Google bot: the total number of page views per month is over 60 million for PLOS ONE alone. Scaling this up to the whole literature suggests that there might be a total of 500 million to 5 billion page views per month across the industry, or up to seven million an hour from human users. As noted below, the largest-traffic websites in the world provide guidance that automated agents should limit retrieving pages to a specified rate. Wikipedia suggests one page per second; PLOS requests a delay of 30 seconds between downloading pages.
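The arithmetic behind the hourly figure is simple to check; a minimal back-of-envelope calculation using the rounded estimates above:

hours_per_month = 30 * 24                      # roughly 720 hours
for monthly in (500_000_000, 5_000_000_000):   # estimated industry-wide range from the text
    print(f"{monthly:>13,} page views/month  ~  {monthly / hours_per_month:,.0f} per hour")
# The upper bound comes out at roughly 7 million page views per hour,
# the figure quoted above for human traffic across the industry.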

PLOS infrastructure routinely deals with spikes of activity that are ten times the average traffic and is designed to manage loads of over 100 times average traffic without suffering performance problems. Thus it would require hundreds of thousands of simultaneously operating agents to even begin to degrade performance.

Content mining is a trivial and easily managed source of traffic compared to other sources, particularly coverage on popular social media sites. Coverage of an article on a site like Reddit often leads to tens of thousands of requests for a single page within an hour. By contrast, automated crawling usually leads to a smaller number of overall downloads and is spread out over longer time periods, making it much easier to manage. As an example, there are attempts made to artificially inflate article download counts, which involve tens of thousands of requests for the same article. We do not even attempt to catch these at the level of traffic spikes because at that level they would be undetectable; they are detected instead through later analysis of the article usage data.
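As an illustration of that after-the-fact analysis, the sketch below flags articles for which a single address accounts for an implausible share of downloads. The log format (a CSV with doi and ip columns) and the thresholds are hypothetical, not a description of PLOS’s actual pipeline.

import csv
from collections import Counter, defaultdict

def suspicious_articles(log_path, min_downloads=1000, max_single_ip_share=0.5):
    """Yield (doi, total downloads, top-IP share) for articles dominated by one address."""
    per_article = defaultdict(Counter)            # doi -> Counter of requesting IPs
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):             # expects columns: doi, ip
            per_article[row["doi"]][row["ip"]] += 1
    for doi, ips in per_article.items():
        total = sum(ips.values())
        _, top_count = ips.most_common(1)[0]
        if total >= min_downloads and top_count / total > max_single_ip_share:
            yield doi, total, top_count / total

Because this analysis runs on stored usage data rather than live traffic, even very large inflation attempts impose no load on the site itself.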

Sources of traffic that do cause problems are generally rogue agents and distributed denial of service attacks where hundreds of thousands or millions of requests occur per second. These sources of traffic are the main source of service degradation and need to be managed based on the scale of traffic and the likelihood of being a target for such attacks. The scale of content mining traffic for any given publisher will be dependent on the scale of interest in the content that publisher is providing.

Management approaches

There are broadly three complementary approaches to supporting content mining in a way that does not have any impact on user experience. While all of them are implemented by effective scholarly publishers, it is worth examining them in the context of a truly high-traffic site. Wikipedia is an excellent example of an extremely high-traffic site that is also subject to large-scale mining, scraping and analysis.

Providing a data dump

The first and simplest approach is to provide a means of accessing a dump of all the content so that it can be obtained for offline analysis. Generally speaking the aim of analysis is to mine a whole corpus, and enabling the user to obtain a single dump and process it offline improves the experience for the miner while removing any risk of impact on website performance. Wikipedia provides a regular full dump of all content for all language versions and recommends that this be the first source of content for analysis. Many Open Access publishers adopt a similar strategy, using deposition at Pubmed Central or on their own websites as a means of providing access to a full dump of content. PLOS recommends that those wishing to parse the full corpus use PMC or EuropePMC as the source of that content.

This approach is especially useful for smaller publishers running their own infrastructure as it means they can rely on a larger third party to handle dumps. Of course, for smaller publishers with a relatively small corpus the scale of such a data dump may also be such that readily available file-sharing technologies suffice. For a small publisher with a very large backfile, the imperative to ensure persistence and archiving for the future would be further supported by working with appropriate deposit sites to provide both access for content miners and preservation. Data dumps of raw content files are also unlikely to provide a viable alternative to access for human readers, so they need not concern subscription publishers.
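As an illustration of why the dump route scales well, here is a minimal sketch of offline processing. It assumes the corpus has been unpacked into a local directory containing one JATS-style XML file per article, as PMC-style dumps typically provide; exact element names vary between archives.

import pathlib
import xml.etree.ElementTree as ET

def iter_titles(corpus_dir):
    """Yield (file name, article title) pairs from a directory of JATS XML files."""
    for path in pathlib.Path(corpus_dir).glob("*.xml"):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            continue                      # skip malformed files rather than aborting the run
        yield path.name, root.findtext(".//article-title")

if __name__ == "__main__":
    for name, title in iter_titles("corpus"):
        print(name, "-", title)

All of the network cost is paid once, when the dump is fetched, and the mining itself never touches the publisher’s servers.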

Agreed rates of crawling

It is standard best practice for any high-traffic website to provide a “robots.txt” file that includes information on which parts of the site may be accessed by machine agents, or robots, and at what rate. These files should always include a ‘crawl-delay’, which indicates the time in seconds that an agent should wait before downloading a new page. Wikipedia’s robots.txt file says, for instance, “Friendly, low-speed bots are welcome viewing article pages, but not dynamically-generated pages please” and suggests a delay of at least one second between retrieving pages. This is not enforced technically but is a widely recognised mechanism that is respected by all major players – not following it is generally regarded as grounds for taking the technical measures described below.

PLOS currently requests a crawl delay of 30 seconds, BioMed Central asks for one second and eLife for ten. When working with content from a large publisher, crawl delays of this magnitude mean that it is more sensible for large-scale work to obtain a full data dump. Where a smaller number of papers is of interest, perhaps a few hundred or a few thousand, the level of traffic that results from even large numbers of content mining agents that respect the crawl delay is trivial compared to human and automated traffic from other sources.
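For smaller, targeted crawls the crawl-delay can be honoured programmatically. The sketch below reads a site’s robots.txt, skips disallowed paths and waits the advertised delay between requests; the user agent string and the idea of passing in a list of article URLs are illustrative rather than any real PLOS endpoint or policy.

import time
import urllib.request
import urllib.robotparser

USER_AGENT = "example-miner/0.1"   # illustrative; identify your agent honestly

def polite_fetch(urls, robots_url):
    """Yield (url, body) for each allowed URL, respecting the site's crawl-delay."""
    rp = urllib.robotparser.RobotFileParser(robots_url)
    rp.read()
    delay = rp.crawl_delay(USER_AGENT) or 1   # fall back to 1 s if none is declared
    for url in urls:
        if not rp.can_fetch(USER_AGENT, url):
            continue                           # respect paths the site asks robots to avoid
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req) as resp:
            yield url, resp.read()
        time.sleep(delay)                      # wait the advertised crawl-delay

At a 30-second delay even 1,000 articles take over eight hours to retrieve, which is why the data dump route described above remains the sensible choice for corpus-scale mining.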

Technical measures

It is however the case that some actors will not respect crawl-delays and other restrictions in robots.txt. In our experience this is rarely the case with content miners and much more frequently the result of malicious online activity, rogue automated agents, or, in several cases, security testing at research institutions, which sometimes involves attempts to overload local network systems.

Whether the source is a spike in human traffic, malicious agents, or other heavy traffic, maintaining a good service requires that these issues be managed. The robots.txt restrictions become useful here: when it is clear that an agent is exceeding those recommendations it can be shut down. The basic approach is to “throttle” access from the specific IP address that is causing problems. This can be automated, although care is required because in some cases a single IP may represent a large number of users, for instance a research institution proxy. For PLOS, such throttling is therefore only activated manually at present. This has been done in a handful of cases, none of which related to text mining.
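A minimal sketch of that kind of per-address throttling follows: count requests per source IP in a sliding window and flag addresses that exceed a limit for manual review. The window size and limit are illustrative, not real PLOS settings.

import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 600          # illustrative limit only

_recent = defaultdict(deque)           # ip -> timestamps of recent requests

def over_limit(ip, now=None):
    """Record a request from `ip` and return True if it exceeds the window limit."""
    now = time.monotonic() if now is None else now
    window = _recent[ip]
    window.append(now)
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window) > MAX_REQUESTS_PER_WINDOW

Flagging for review rather than blocking automatically matches the manual activation described above, since a single busy address may simply be an institutional proxy serving many legitimate users.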

At larger scale automated systems are needed, but again this is part of running any highly used website. Load balancing, monitoring incoming requests and managing the activity of automated agents are a standard part of running a good website. Identifying and throttling rogue activity is just one part of the suite of measures required.

Summary

Enabling content mining is a core part of the value offering for Open Access publication services. Downloads that result from crawling and content mining contribute a trivial amount to the overall traffic at one of the largest Open Access publisher sites and are irrelevant compared to other sources of traffic. This is true both of average traffic levels and of unexpected spikes.

Managing high-traffic users is a standard part of running a modern web service and there is a range of technical and social approaches to managing that use. For large-scale analysis a data dump is almost always going to be the preferred means of accessing data and removes traffic issues. Mechanisms exist to request that automated traffic be kept at certain levels and these requests are widely followed – and where they are not, technical measures are available to manage the problematic users.

There are sources of traffic to publisher websites that can cause problems and performance degradation. Managing these is part of the competent operation of any modern website. Content mining, even if it occurred at volumes orders of magnitude above what we see currently, would not be a significant source of issues.

 
