Feedback Wanted: Publishers & Data Access

Short Version: We need your help!

We have generated a set of recommendations for publishers to help increase access to data in partnership with libraries, funders, information technologists, and other stakeholders. Please read and comment on the report (Google Doc), and help us to identify concrete action items for each of the recommendations here (EtherPad).


Background and Impetus

Recent governmental policies addressing access to data from publicly funded research across the US, UK, and EU reflect the growing need to revisit how research outputs are handled. These policies have implications for many different stakeholders (institutions, funders, researchers), who will need to consider the best mechanisms for preserving and providing access to the outputs of government-funded research.

The infrastructure for providing access to data is largely still being architected and built. In this context, PLOS and the UC Curation Center hosted a set of leaders in data stewardship issues for an evening of brainstorming to re-envision data access and academic publishing. A diverse group of individuals from institutions, repositories, and infrastructure development collectively explored the question:

“What should publishers do to promote the work of libraries and IRs in advancing data access and availability?”

We collected the themes and suggestions from that evening in a report: The Role of Publishers in Access to Data. The report contains a collective call to action from this group for publishers to participate as informed stakeholders in building the new data ecosystem. It also enumerates a list of high-level recommendations for how to effect social and technical change as critical actors in the research ecosystem.

We welcome the community to comment on this report. Furthermore, the high-level recommendations need concrete details for implementation. How will they be realized? What specific policies and technologies are required? We have created an open forum for the community to contribute their ideas. We will then incorporate the collected suggestions into a final report for publication. Please participate in this collective discussion with your thoughts and feedback by April 24, 2014.

We need suggestions! Feedback! Comments! From Flickr by Hash Milhan


(Blog post is cross-posted on Data Pub)

Category: Tech | Leave a comment

The Mystery of the Missing ALM

What is ten years old and has 1.2 billion interactions a day?

No, it’s not the Large Hadron Collider. Nor is it a philosophical riddle, but rather a well-known fact in the tech world (news here). Facebook, the largest social network, is bigger than LinkedIn, Twitter, and Google+ combined. At the same time, it is largely absent from scholarly communication conversations, which focus on Twitter and Mendeley.

Continue reading »

Category: Tech | Leave a comment

Disambiguation with ORCID at PLOS

Here at PLOS, linking authors and publications continues to be a persistent problem. Trying to find that one paper by “John Kim” that you wanted to reference? The prospect of wading through numerous “John Kims” can seem daunting. At PLOS ONE, a search for John Kim as an author returns four different results. Combing through search results and DOIs is not ideal. Throw ambiguous reviewer and academic editor identities into the mix and things only get worse.

Enter Open Researcher & Contributor ID (ORCID). ORCID provides a persistent digital identifier that distinguishes you from every other researcher and, through integration in key research workflows such as manuscript and grant submission, supports automated linkages between you and your professional activities, ensuring that your work is recognized. The ORCID service launched in October 2012, and to date more than half a million researchers have registered for an ORCID identifier. During that time, a number of integrations were launched that supported the visibility and adoption of ORCID identifiers by the community, and we at PLOS are proud to be a platinum sponsor of ORCID and one of the more than 100 organizations that have become ORCID members.


ORCID integration chart at

Integration with the ORCID service can happen in a variety of ways, depending of course on the type of organization, e.g. publisher, academic institution, or funder. As a publisher, PLOS considers the following integrations important:

  1. allow authors to provide their ORCID identifier when they submit a manuscript;
  2. allow authors and other contributors to link their ORCID identifier to their PLOS profile page;
  3. include the ORCID identifiers for authors in the manuscript metadata sent to services such as CrossRef and PubMed;
  4. import and verify the authorship claims made by authors in their ORCID profile for already published PLOS papers (and again send this information to CrossRef and PubMed);
  5. import other information from the ORCID service into the PLOS profile such as affiliation, publications and other research outputs not published with PLOS, and grant information.
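
Integration #3 above means carrying the ORCID identifier in the article metadata itself. As a rough sketch of what that looks like in JATS-style XML (the `<contrib-id contrib-id-type="orcid">` element is the JATS convention for this; the author name and iD below are made-up examples, not real PLOS data):

```python
# Sketch: embedding an author's ORCID identifier in JATS-style article
# metadata. The name and iD are illustrative examples only.
import xml.etree.ElementTree as ET

def contrib_with_orcid(surname, given_names, orcid_uri):
    """Build a JATS <contrib> element carrying an ORCID identifier."""
    contrib = ET.Element("contrib", {"contrib-type": "author"})
    contrib_id = ET.SubElement(contrib, "contrib-id", {"contrib-id-type": "orcid"})
    contrib_id.text = orcid_uri
    name = ET.SubElement(contrib, "name")
    ET.SubElement(name, "surname").text = surname
    ET.SubElement(name, "given-names").text = given_names
    return contrib

contrib = contrib_with_orcid("Kim", "John", "")
xml = ET.tostring(contrib, encoding="unicode")
print(xml)
```

Downstream services such as CrossRef and PubMed can then pick the identifier up from this metadata rather than trying to disambiguate the name string.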

In September 2013, PLOS announced integration #1, allowing authors to include their ORCID identifier with manuscript submissions. Today we are happy to announce integration #2, allowing everyone with a PLOS account to link their account to their ORCID identifier. This is done via the OAuth protocol and works in a three-step authentication process:

  • First, we (PLOS) authenticate ourselves with ORCID.
  • Second, you (the user) authenticate yourself with ORCID, and grant PLOS the authority to read data from your ORCID profile on your behalf.
  • Finally, PLOS queries ORCID for metadata about you (the user).
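
The user-facing part of this dance is an authorization-code redirect. A minimal sketch of how a server might construct that redirect URL (the endpoint path, scope name, client ID, and callback URL below are illustrative assumptions, not PLOS's actual configuration):

```python
# Sketch of step 2 of the OAuth flow described above: sending the user to
# ORCID to authenticate and grant read access. All identifiers here are
# illustrative placeholders.
from urllib.parse import urlencode

ORCID_AUTHORIZE_URL = ""  # assumed endpoint

def authorization_url(client_id, redirect_uri, scope="/read-limited"):
    """URL to send the user to so they can grant read access on their behalf."""
    params = {
        "client_id": client_id,
        "response_type": "code",   # authorization-code grant
        "scope": scope,
        "redirect_uri": redirect_uri,
    }
    return ORCID_AUTHORIZE_URL + "?" + urlencode(params)

url = authorization_url("APP-XXXXXXXX", "")
print(url)
# After the user approves, ORCID redirects back with a short-lived code,
# which the server exchanges for an access token (step 3) via a POST to
# the token endpoint -- omitted here because it requires a network call.
```

Revoking access (see below) simply invalidates that token, so de-linking requires no change on the user's ORCID record itself.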


Authenticating with ORCID and allowing PLOS to read profile information. 

At any time, users can revoke this access by de-linking their PLOS account from the ORCID service, again via OAuth.


PLOS account with linked ORCID identifier. 

Linking the PLOS account with the ORCID identifier is a very important integration step, and we encourage all users to update their accounts. And if you don’t have an ORCID ID yet, you can easily create one in the process.

As you can see from the list of needed integrations above, there is still a lot of work to do. Not only do we have to encourage everyone to add their ORCID ID, but we also have to exchange this information with external services. It is not enough that PLOS knows which ORCID identifiers are associated with which PLOS publications; this information has to be shared with external services such as CrossRef, PubMed, commercial indexing services such as Web of Science and Scopus, and of course ORCID. Additionally, PLOS wants to know about the authorship claims to PLOS papers made in other places, e.g. the ORCID registry.

As of last Friday, there were 551,203 live ORCID IDs, but only 121,529 of them (about 22%) had at least one linked work (e.g. a publication). Do your part: the more works are linked to ORCID IDs, the richer the ORCID profiles (associated publications, affiliations and, coming soon, grants) and the stronger the assurance that you are recognized for your hard work.

Category: Tech | 2 Comments

One Step Closer to Article-Level Metrics Openly Available for All Scholarly Content


Since PLOS embarked on the Article-Level Metrics (ALM) work in 2009, we have always imagined a future in which ALMs would be freely available regardless of publisher. Metrics would be compiled to facilitate comparisons between articles and add even greater value to the scholarly community.

PLOS built the ALM application to handle our own substantial publishing enterprise, and it has been running well for us for almost exactly five years now. We made the ALM software available as open source in 2011, and last year a number of publishers started to use it for their own journals. With more publishers expressing interest in collecting and displaying the data, and with ongoing efforts to discuss altmetrics standards and best practices, in particular the NISO Altmetrics Initiative, we have seen the discussion shift from WHAT (are ALMs and should we care) to HOW (can we implement them)?

As part of this increased interest in ALM, today we are launching the ALM community site. The site gathers content previously scattered across a number of separate places, and we plan to add more in the coming months. We particularly like the Examples section, which showcases ALM visualizations done with d3.js and R, with source code and data openly available to make it easier for people to get started using ALM data.

With the launch of the CrossRef Labs ALM application today, we move another big step closer to that goal. For the first time, the latest ALMs will be available for journal articles and other content items (book chapters, data, theses, technical reports) from thousands of scholarly and professional publishers around the globe. The publications span the entire spectrum of scholarly research, including life sciences, physical sciences, humanities, social science, etc. As this CrossRef Labs experiment is just getting started, it will take a couple of months to collect data for the 11 million+ publications from 2010 onwards that are currently included. But there’s no need to wait until then. We encourage everyone to start using the data that has already been harvested, which is expanding to cover more of CrossRef’s collection on a daily basis. All data are freely available online or via API for customized, bulk requests.
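
A bulk API request against an ALM server might be sketched like this (the host, path, and parameter name below are illustrative assumptions modeled on the open source ALM application, not a documented CrossRef Labs endpoint; consult the instance's own documentation):

```python
# Sketch: building a bulk metrics request for a list of DOIs against an
# ALM server's JSON API. The base URL, path, and "ids" parameter are
# illustrative assumptions.
from urllib.parse import urlencode

def alm_query_url(base, dois):
    """Construct a bulk ALM API request URL for a list of DOIs."""
    return base + "/api/v3/articles?" + urlencode({"ids": ",".join(dois)})

url = alm_query_url("", ["10.1371/journal.pone.0078080"])
print(url)
# The JSON response could then be fetched and decoded with, e.g.,
# json.load(urllib.request.urlopen(url)) -- omitted here to avoid a
# network call against a made-up host.
```
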

We invite everyone across the scholarly research ecosystem – researchers, bibliometricians, institutions, funders, librarians, technology providers, etc. – to think big with us. Now that we have systematic data about the activity surrounding research publications, how do we turn it into useful information for discovery and evaluation? This is a work in progress, not yet a formal service. But the launch of the ALM application by CrossRef Labs is a monumental step towards making ALMs an underlying and integral part of the infrastructure that supports the research process and facilitates its progress.

When testing out the CrossRef Labs ALM application, please keep in mind that this is a lab experiment. Not only are there some differences in the available data sources for a variety of reasons, but the number of articles managed by the CrossRef Labs ALM application is orders of magnitude larger. So please be patient as you try out this new resource. As an open source project, it is under active development, so look out for continued improvements and new types of article activity coming soon.

Category: Tech | 5 Comments

A new way to explore figures

We’re thrilled to announce another improvement to our journal websites at PLOS. Head over to your favorite article and open any figure to see our brand new figure lightbox that makes it easy to explore the details of very large figures. I recommend this fantastic article about thresher sharks.

You’ll notice that figures open to take advantage of your full browser window. If you use a large monitor, maximize your browser window to see the new lightbox in full effect.




For large, detailed figures you can now zoom in and pan around to study every data point or nuanced tail slap.




The full caption is still easily accessible in the lower right, while simple navigation options will help you quickly skim through a figure-heavy paper.




And to keep things snappy, the lightbox will initially load a smaller version of the figure while the full resolution image loads in the background. Once the larger version is fully loaded, it will automatically replace the smaller version, enabling the zoom and pan options mentioned above.

We’ve also added some new functionality to help you quickly assess whether an article is relevant to your research. When you’re browsing the PLOS ONE homepage, subject area browse pages, figure-based search results, or monthly issue pages, you can launch the figure lightbox for research articles without first loading the full article page. From there, you can toggle between an abstract view, the figure browser, or a reference view using the links in the upper right corner, enabling you to quickly judge whether an article is worth further reading.

We hope you find this helpful to your research process. As always, we’d love to hear your feedback, so feel free to write us or leave your thoughts as a comment below.

Category: Tech | Leave a comment

Connecting all the dots – looking at the research paper & beyond


Research catalyzes research, which catalyzes more research…  



At PLOS, we’re connecting the dots between our publications and the plethora of outside content discussing the research. On every article, we now showcase the rich tapestry of activity surrounding the research paper. These stories are pulled together by our staff and shared collectively with the article. We surface coverage of the research in news media and blog commentary, as well as a diverse set of other contexts. As a whole, these stories enrich the research narrative and offer a vivid view into the broad scholarly and public activity surrounding the article.

Continue reading »

Category: Tech | 4 Comments

Research findings: going deeper than the article

We have recently released two new Article-Level Metrics (ALM) data sources: Europe PMC Database Citations and DataCite. The data from both sources are displayed on the metrics tab of PLOS articles, are available through the PLOS ALM API, and are available to all users of the open source ALM application. These new sources are both related to research data, and they represent a new breed of metrics in our ALM suite.

Continue reading »

Category: Tech | Leave a comment

Creative Commons for Science: Interview with Puneet Kishor

Creative Commons provides copyright licenses to help standardize and simplify the sharing of scientific content and other creative works. PLOS applies the Creative Commons attribution license to all published works, and Creative Commons licenses are essential for Open Access publications. The long-awaited version 4.0 of the Creative Commons licenses was released last week. This release provided a perfect opportunity to ask Puneet Kishor from Creative Commons a few questions.

Continue reading »

Category: Tech | Tagged | 3 Comments

Evaluating Impact: What’s your number?

Flickr photo by szczel.

Flickr photo by szczel.

What’s your number? This Saturday, we (MF) will be addressing this very question at the SpotOn London conference with Marie Boran, David Colquhoun, Jean Liu and Stephen Curry. In advance of the discussion on the role of altmetrics in evaluating scientific research, we offer our views from a historical perspective.

Continue reading »

Category: Tech | Tagged , | Leave a comment

How to find an appropriate research data repository?

As more and more funders and journals adopt data policies that require researchers to deposit underlying research data in a data repository, the question of how to choose a repository becomes more and more important. Heinz Pampel is one of the people behind, an Open Science tool that helps researchers easily identify a suitable repository for their data and thus comply with requirements set out in data policies.


The debate on open access to research data is gaining relevance. This February, the federal agencies in the U.S. were directed by the Office of Science and Technology Policy (OSTP) to maximize access to data from publicly funded research. In June, the G8 science ministers published a set of principles for open scientific research data, declaring that, where possible, “publicly funded scientific research data should be open”. And already last year, the European Commission announced a pilot framework in Horizon 2020, the coming EU framework programme for research and innovation, to promote open access to research data.

Although scientists agree on the potential benefit of data sharing for scientific progress, most are hesitant when it comes to practical implementation. One reason for the reluctance is a lack of reliable “systems that make it quick and easy to share data” (Tenopir et al. 2011).

The current landscape of data repositories is heterogeneous. Some initiatives, like the Data Seal of Approval (DSA) and the World Data System (WDS), are working on the standardization of data repositories, and there are already certification and auditing procedures for data repositories; two examples are the DIN 31644 and ISO 16363 standards. But these standards are not yet widely used. Research data repositories and their services are mostly characterized by the scientific discipline in which they work. They store a wide variety of file formats under different conditions for access and reuse. In many cases it is difficult for researchers to find an appropriate repository for the storage of their data. To overcome these shortcomings we started – the Registry of Research Data Repositories.

Launched in 2012, provides an overview of existing research data repositories. In September 2013, the registry listed 600 research data repositories, 400 of which are described in detail using a comprehensive vocabulary. The registry covers data repositories from all academic disciplines.

In, researchers can easily see the terms of access and use of each data repository, along with other characteristics. Information icons help researchers easily identify an adequate repository for the storage and reuse of their data.


Aspects of a Research Data Repository with the corresponding icons used in

 covers the following aspects of a research data repository:

  • general information (e.g. short description of the repository, content types, keywords),
  • responsibilities (e.g. institutions responsible for funding, content or technical issues),
  • policies (e.g. guidelines and policies of the repository),
  • legal aspects (e.g. licenses of the database and datasets),
  • technical standards (e.g. APIs, versioning of datasets, software of the repository),
  • quality standards (e.g. certificates, audit processes).

The portal offers two search options: (1) free text search through a simple search box, and (2) filters for more specific searches. In the list of results, each record includes the name of the repository, the subjects covered, a brief description of the content and a set of icons visualizing key properties of the repository. A comprehensive view of the descriptive record can be obtained by clicking on the name of the repository in the search results. It is also possible to simply browse the list of indexed data repositories.
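
The two search modes can be sketched as free-text matching plus facet filters over repository records. A minimal illustration (the records and field names below are simplified inventions for this sketch; PANGAEA is a real geosciences repository, but these are not's actual data structures):

```python
# Sketch of free-text search plus facet filters over made-up repository
# records, mimicking the two search modes described above.
repositories = [
    {"name": "PANGAEA", "subjects": ["geosciences"], "pid": True, "open_access": True},
    {"name": "Example Archive", "subjects": ["biology"], "pid": False, "open_access": True},
]

def search(records, text=None, **filters):
    """Free-text match on the name plus exact-match facet filters."""
    results = records
    if text:
        results = [r for r in results if text.lower() in r["name"].lower()]
    for key, value in filters.items():
        results = [r for r in results if r.get(key) == value]
    return results

# Geosciences repositories offering persistent identifiers:
hits = search(repositories, subjects=["geosciences"], pid=True)
print([r["name"] for r in hits])  # → ['PANGAEA']
```
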


Example screenshot of search results for geosciences data repositories using persistent identifiers.

Operators of data repositories can suggest their infrastructures for listing in by filling in an online application form. A repository is indexed once it meets the minimum requirements for inclusion in, which are described in the vocabulary. The project team reviews each repository, and reviewed repositories are identified by a green check mark.

The project cooperates with other Open Science initiatives such as BioSharing, DataCite and OpenAIRE. Some publishers already refer to in their editorial policies as a tool for identifying suitable data repositories.

Next Steps

In the upcoming project phase the focus will be on improving usability and implementing new features. Among other things, the dialog with repository operators will be supported by a workflow system. Beyond the development of the registry, the project will promote the standardization of research data repositories. is funded by the German Research Foundation (DFG). Project partners are the GFZ German Research Centre for Geosciences, Humboldt-Universität zu Berlin and the Karlsruhe Institute of Technology (KIT). These three partners, with their expertise in information infrastructures, guarantee the sustainability of the registry.

Further information on can be found in a recently published article in PLOS ONE:

Pampel, H., et al. (2013). Making Research Data Repositories Visible: The Registry. PLOS ONE. doi: 10.1371/journal.pone.0078080

Category: Tech | Tagged | 8 Comments