Disambiguation with ORCID at PLOS

Here at PLOS, linking authors to their publications remains a persistent problem. Trying to find that one paper by “John Kim” that you wanted to reference? The prospect of wading through numerous “John Kims” can seem daunting. Searching PLOS ONE for John Kim as an author returns four different results. Combing through search results and DOIs is not ideal. Throw ambiguous reviewer and academic editor identities into the mix and things only get worse.

Enter Open Researcher & Contributor ID (ORCID). ORCID provides a persistent digital identifier that distinguishes you from every other researcher and, through integration in key research workflows such as manuscript and grant submission, supports automated linkages between you and your professional activities ensuring that your work is recognized. The ORCID service launched in October 2012, and to date more than half a million researchers have registered for an ORCID identifier. During that time, a number of integrations were launched that supported the visibility and adoption of ORCID identifiers by the community, and we at PLOS are proud to be a platinum sponsor of ORCID and one of the more than 100 organizations who have become ORCID members.


ORCID integration chart at https://orcid.org/organizations/integrators/integration-chart.

Integration with the ORCID service can happen in a variety of ways, and that of course depends on the type of organization, e.g. publisher, academic institution or funder. As a publisher, the following integrations are important at PLOS:

  1. allow authors to provide their ORCID identifier when they submit a manuscript;
  2. allow authors and other contributors to link their ORCID identifier to their PLOS profile page;
  3. include the ORCID identifiers for authors in the manuscript metadata sent to services such as CrossRef and PubMed;
  4. import and verify the authorship claims made by authors in their ORCID profile for already published PLOS papers (and again send this information to CrossRef and PubMed);
  5. import other information from the ORCID service into the PLOS profile such as affiliation, publications and other research outputs not published with PLOS, and grant information.

In September 2013 PLOS announced integration #1 allowing authors to include their ORCID identifier with manuscript submissions. Today we are happy to announce integration #2, allowing everyone with a PLOS account to link their account to their ORCID identifier. This is done via the OAuth protocol and works via a three-step authentication process:

  • First, we (PLOS) authenticate ourselves with ORCID
  • Second, you (the user) authenticate yourself with ORCID, and grant PLOS the authority to read data from your ORCID profile on your behalf.
  • Finally, PLOS queries ORCID for metadata about you (the user).
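The three steps follow the general shape of an OAuth 2.0 authorization-code flow. The Python sketch below only builds the requests rather than sending them. The endpoint URLs, the scope value, the client credentials and the redirect URI are all assumptions made for illustration; they are not taken from the PLOS implementation.

```python
from urllib.parse import urlencode

# All values below are assumptions for illustration, not PLOS configuration.
AUTHORIZE_URL = "https://orcid.org/oauth/authorize"   # assumed endpoint
TOKEN_URL = "https://orcid.org/oauth/token"           # assumed endpoint
CLIENT_ID = "APP-EXAMPLE"                             # hypothetical client ID
CLIENT_SECRET = "example-secret"                      # hypothetical secret
REDIRECT_URI = "https://example.org/orcid/callback"   # hypothetical callback


def build_authorization_url(state):
    """URL the user visits to sign in at ORCID and grant read access."""
    params = {
        "client_id": CLIENT_ID,
        "response_type": "code",
        "scope": "/authenticate",  # assumed scope name
        "redirect_uri": REDIRECT_URI,
        "state": state,            # lets the client match the callback to the request
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


def build_token_request(code):
    """Form fields the client posts to TOKEN_URL, identifying itself with its
    own credentials and exchanging the user's code for an access token."""
    return {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
    }


def build_profile_headers(access_token):
    """Headers for the final step, querying metadata on the user's behalf."""
    return {"Authorization": "Bearer " + access_token, "Accept": "application/json"}


print(build_authorization_url("xyz"))
```

The client credentials cover the first step, the authorization URL covers the second, and the bearer token returned by the token request is what makes the third step possible.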


Authenticating with ORCID and allowing PLOS to read profile information. 

At any point in time our users will have the authority to revoke this access by de-linking their PLOS account from the ORCID service, again via OAuth.


PLOS account with linked ORCID identifier. 

Linking the PLOS account with the ORCID identifier is a very important integration step and we encourage all users to update their accounts. And if you don’t have an ORCID ID yet, you can easily create an ORCID ID in the process.

As you can see from the list of needed integrations above, there is still a lot of work to do. Not only do we have to encourage everyone to add their ORCID ID, but we also have to exchange this information with external services. It is not enough that PLOS knows what ORCID identifiers are associated with what PLOS publications, but this information has to be shared with external services such as CrossRef, PubMed, commercial indexing services such as Web of Science and Scopus, and of course ORCID. Additionally, PLOS wants to know about the authorship claims to PLOS papers made in other places, e.g. the ORCID registry.

As of last Friday, there were 551,203 live ORCID IDs, but only 121,529 ORCID IDs with at least one linked work (e.g. a publication). Do your part: the more works that are linked to ORCID IDs, the richer the ORCID profiles become (associated publications, affiliation and, coming soon, grants), and the greater the assurance that you are successfully disambiguated and recognized for your hard work.

Category: Tech

One Step Closer to Article-Level Metrics openly available for all Scholarly Content


Since PLOS embarked on the Article-Level Metrics (ALM) work in 2009, we have always imagined a future in which ALMs would be freely available regardless of publisher. Metrics would be compiled to facilitate comparisons between articles and add even greater value to the scholarly community.

PLOS built the ALM application to handle our own substantial publishing enterprise, and it has been running well for us for almost exactly 5 years now. We made the ALM software available as open source software in 2011, and last year a number of publishers started to use it for their own journals. With more publishers expressing interest in collecting and displaying the data, and with ongoing efforts to discuss altmetrics standards and best practices, in particular the NISO Altmetrics Initiative, we have seen the discussion shift from WHAT (are ALMs and should we care?) to HOW (can we implement ALM?).

As part of this increased interest in ALM we are today launching the ALM community site. This site contains a lot of content previously available in a number of separate places, and we plan to add more content in the coming months. We like the Examples section, which showcases ALM visualizations done with d3.js and R, with source code and data openly available to make it easier for people to get started using ALM data.

With the launch of the CrossRef Labs ALM application today, we get another big step closer to that goal. For the first time, the latest ALMs will be available for journal articles and other content items (book chapters, data, theses, technical reports) from thousands of scholarly and professional publishers around the globe. The publications span the entire spectrum of scholarly research, including life sciences, physical sciences, humanities, social science, etc. As this CrossRef Labs experiment is just getting started, it will take a couple of months to begin collecting data for all of the 11 million+ publications from 2010 onwards that are currently included. But there’s no need to wait until then. We encourage everyone to start using the data that has already been harvested, which is expanding to cover more of CrossRef’s collection on a daily basis. All data are freely available online or via API for customized, bulk requests.

We invite everyone across the scholarly research ecosystem – researchers, bibliometricians, institutions, funders, librarians, technology providers, etc. – to think big with us. Now that we have systematic data about the activity surrounding research publications, how do we turn it into useful information for discovery and evaluation? This is a work in progress, not yet a formal service. But the launch of the ALM application by CrossRef Labs is a monumental step towards making ALMs an underlying and integral part of the infrastructure that supports the research process and facilitates its progress.

When testing out the CrossRef Labs ALM application, please keep in mind that this is a lab experiment. Not only are there some differences in the available data sources for a variety of reasons, but the number of articles managed by the CrossRef Labs ALM application is larger by orders of magnitude. So please be patient as you try out this new resource. As an open source project, it is under active development, so look out for continued improvements and new types of article activity coming soon.

Category: Tech

A new way to explore figures

We’re thrilled to announce another improvement to our journal websites at PLOS. Head over to your favorite article and open any figure to see our brand new figure lightbox that makes it easy to explore the details of very large figures. I recommend this fantastic article about thresher sharks.

You’ll notice that figures open to take advantage of your full browser window. If you use a large monitor, maximize your browser window to see the new lightbox in full effect.




For large, detailed figures you can now zoom in and pan around to study every data point or nuanced tail slap.




The full caption is still easily accessible in the lower right, while simple navigation options will help you quickly skim through a figure-heavy paper.




And to keep things snappy, the lightbox will initially load a smaller version of the figure while the full resolution image loads in the background. Once the larger version is fully loaded, it will automatically replace the smaller version, enabling the zoom and pan options mentioned above.

We’ve also added some new functionality to help you quickly assess whether an article is relevant to your research. When you’re browsing the PLOS ONE homepage, subject area browse pages, figure-based search results, or monthly issue pages, you can launch the figure lightbox for research articles without first loading the full article page. From there, you can toggle between an abstract view, the figure browser, or a reference view using the links in the upper right corner, enabling you to quickly judge whether an article is worth further reading.

We hope you find this to be helpful to your research process. As always, we’d love to hear your feedback–feel free to write us or leave your thoughts as a comment below.

Category: Tech

Connecting all the dots – looking at the research paper & beyond


Research catalyzes research, which catalyzes more research…  



At PLOS, we’re connecting the dots between our publications and the plethora of outside content discussing the research.  On every article, we now showcase the rich tapestry of activity surrounding the research paper.  These stories are pulled together by our staff and are now collectively shared with the article.  We surface stories about the research from news media, blog commentary and a diverse set of other contexts.  As a whole, these stories enrich the research narrative and offer a vivid view into the broad scholarly and public activity surrounding the article.

Category: Tech

Research findings: going deeper than the article

We have recently released two new Article-Level Metrics (ALM) data sources: Europe PMC Database Citations and DataCite. The data from both sources are displayed on the metrics tab of PLOS articles, are available through the PLOS ALM API, and are available to all users of the open source ALM application. These new sources are both related to research data, and they represent a new breed of metrics in our ALM suite.

Category: Tech

Creative Commons for Science: Interview with Puneet Kishor

Creative Commons provides copyright licenses to help standardize and simplify the sharing of scientific content and other creative works. PLOS applies the Creative Commons attribution license to all published works, and Creative Commons licenses are essential for Open Access publications. The long-awaited version 4.0 of the Creative Commons licenses was released last week. This release provided a perfect opportunity to ask Puneet Kishor from Creative Commons a few questions.

Category: Tech

Evaluating Impact: What’s your number?

Flickr photo by szczel.


What’s your number? This Saturday, we (MF) will be addressing this very question at the SpotOn London conference with Marie Boran, David Colquhoun, Jean Liu and Stephen Curry. In advance of the discussion on the role of altmetrics in evaluating scientific research, we offer our views from a historical perspective.

Category: Tech

How to find an appropriate research data repository?

As more and more funders and journals adopt data policies that require researchers to deposit underlying research data in a data repository, the question of how to choose a repository becomes more and more important. Heinz Pampel is one of the people behind re3data.org, an Open Science tool that helps researchers to easily identify a suitable repository for their data and thus comply with the requirements set out in data policies.


The debate on open access to research data is gaining relevance. This February, federal agencies in the U.S. were told by the Office of Science and Technology Policy (OSTP) to maximize access to data from publicly funded research. In June, the G8 science ministers published a set of principles for open scientific research data. The ministers declared that, if possible, “publicly funded scientific research data should be open”. And already last year, the European Commission announced a pilot framework in Horizon 2020, the coming EU framework programme for research and innovation, to promote open access to research data.

Although scientists agree on the potential benefit of data sharing for scientific progress, the majority are reserved when it comes to practical implementation. One reason for the reluctance is a lack of reliable “systems that make it quick and easy to share data” (Tenopir et al. 2011).

The current landscape of data repositories is heterogeneous. Some initiatives like the Data Seal of Approval (DSA) and the World Data System (WDS) are working on the standardization of data repositories. And there are already certification and auditing procedures for data repositories. Two examples are the DIN 31644 and the ISO 16363 standards. But these standards are not widely used yet. Research data repositories and their services are mostly characterized by the scientific discipline in which they work. They store a wide variety of file formats under different conditions for access and reuse. In many cases it is difficult for researchers to find an appropriate repository for the storage of their data. To overcome these shortcomings we started re3data.org – Registry of Research Data Repositories.

re3data.org – Registry of Research Data Repositories

Launched in 2012, re3data.org provides an overview of existing research data repositories. As of September 2013, re3data.org lists 600 research data repositories; 400 of these are described in detail using a comprehensive vocabulary. The registry covers data repositories from all academic disciplines.

In re3data.org researchers can easily see the terms of access and use of each data repository, along with other characteristics. Information icons help researchers to easily identify an adequate repository for the storage and reuse of their data.


Aspects of a Research Data Repository with the corresponding icons used in re3data.org.


re3data.org covers the following aspects of a research data repository:

  • general information (e.g. short description of the repository, content types, keywords),
  • responsibilities (e.g. institutions responsible for funding, content or technical issues),
  • policies (e.g. guidelines and policies of the repository),
  • legal aspects (e.g. licenses of the database and datasets),
  • technical standards (e.g. APIs, versioning of datasets, software of the repository),
  • quality standards (e.g. certificates, audit processes).

The re3data.org portal offers two search possibilities: (1) free text search through a simple search box, and (2) filters for more specific searches. In the list of results each record includes the name of the repository, the subjects covered, a brief description of the content and a set of icons visualizing key properties of the repository. A comprehensive view of the descriptive record of the repository can be obtained by clicking on the name of the repository in the search results.  It is also possible to simply browse through the list of indexed data repositories.
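To make the two search modes concrete, here is a small Python sketch that runs a free-text search and property filters over in-memory records. The `Repository` class, its field names and the sample records are hypothetical and invented for illustration; they do not reflect the actual re3data.org vocabulary or software.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical, simplified record for one data repository."""
    name: str
    description: str
    subjects: list = field(default_factory=list)
    properties: dict = field(default_factory=dict)  # e.g. access terms, identifiers


def search(repositories, text="", **filters):
    """Free-text match on name and description, narrowed by exact-match filters."""
    text = text.lower()
    results = []
    for repo in repositories:
        haystack = f"{repo.name} {repo.description}".lower()
        if text and text not in haystack:
            continue
        if all(repo.properties.get(key) == value for key, value in filters.items()):
            results.append(repo)
    return results


# Invented sample records, not real registry entries.
records = [
    Repository("GeoStore", "Geosciences field data", ["Geosciences"],
               {"access": "open", "persistent_ids": True}),
    Repository("BioVault", "Sequence data archive", ["Biology"],
               {"access": "restricted", "persistent_ids": False}),
]

print([r.name for r in search(records, "geosciences", persistent_ids=True)])
```

Filters narrow the free-text results rather than replacing them, which matches the idea of using filters for more specific searches.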


Example screenshot of search results for geosciences data repositories using persistent identifiers.

Operators of data repositories can propose their infrastructure for listing in re3data.org by filling in an online application form. A repository is indexed when the minimum requirements for inclusion in re3data.org are met. These requirements are described in the re3data.org vocabulary. The project team reviews each repository, and reviewed repositories are identified by a green check mark.

The project cooperates with other Open Science initiatives like BioSharing, DataCite and OpenAIRE. Some publishers already refer to re3data.org in their Editorial Policies as a tool for the identification of suitable data repositories.

Next Steps

In the upcoming project phase the focus will be on improving usability and implementing new features. Among other things, the dialog with repository operators will be supported by a workflow system. Beyond the development of the registry, the project will promote the standardization of research data repositories.

re3data.org is funded by the German Research Foundation (DFG). Project partners are GFZ German Research Centre for Geosciences,  Humboldt-Universität zu Berlin and Karlsruhe Institute of Technology (KIT).  These three partners, with their expertise in information infrastructures, guarantee the sustainability of the registry.

Further information on re3data.org can be found in a recently published article in PLOS ONE:

Pampel, H., et al. (2013). Making Research Data Repositories Visible: The re3data.org Registry. PLOS ONE. doi: 10.1371/journal.pone.0078080

Category: Tech

ALM Data Challenge: Metrics for a Standard Set of DOIs

As part of the ALM workshop described in more detail in the previous post we also met for an ALM Data Challenge which took place in the PLOS offices on October 12.

Many readers are probably familiar with a hackathon, where a group of people collaborate on one or more software projects for a day (or a few days). This is what we did at the ALM workshop last year, but this year we wanted to focus on the data, and the interesting things we could do with them, rather than the software development aspect. We thought that this makes it easier for people who are not software developers to get involved, to get something done in the limited time available, and to have something that can be continued after the workshop.

Category: Tech

Advanced search API capabilities

Most users rely on the search functionality on our journals to find PLOS articles by keyword, subject term, author, and title. However, we offer many more features via our search API. We use Apache Solr, exposed as a read-only service for public consumption, which can do a lot more than the simple and advanced search available via the journal websites. While our search API can do more than sensibly fits into a single blog post, I want to walk you through three areas of our implementation: ALM statistics, word proximity searching and search result response types.

ALM Statistics

ALM statistics allow our users to sort their search results based on popularity.  However, there is a lot more you can do with this data besides just sorting.  I have always felt that the ability to show only research that is popular in particular networks could be very powerful.

Here are all of the fields in our search server that contain ALM statistics for you to search on.

Field Name                 Field Description
counter_total_all          Total article views, all time
counter_total_month        Total article views, last 30 days
alm_scopusCiteCount        Count of cites, as per Scopus
alm_citeulikeCount         Count of bookmarks on CiteULike
alm_connoteaCount          Count of bookmarks on Connotea
alm_mendeleyCount          Count of bookmarks in Mendeley
alm_twitterCount           Count of tweets
alm_facebookCount          Count of Facebook Likes
alm_pmc_usage_total_all    Total article views, from PMC

Example Query: (note: while the below API key works, you are required to get your own key prior to use)

http://api.plos.org/search?q=*:*&fq=doc_type:full&fl=id,title,counter_total_all,counter_total_month,alm_scopusCiteCount,alm_citeulikeCount,alm_connoteaCount,alm_mendeleyCount,alm_twitterCount,alm_facebookCount,alm_pmc_usage_total_all&api_key=DEMO

The default response format is XML; more on this below.  Note the parameters used with this query.

  • “fl” specifies which fields you want the API to return.  A full list is available at: http://api.plos.org/solr/search-fields/.
  • “fq=doc_type:full” filters on the doc_type field; it is used in every query in this post.  We’ll spare the details for now.
  • “q=*:*” is the meat of the search query; *:* is a shortcut for “show me everything”.
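If you would rather assemble such a query in code than by hand, a minimal Python sketch could look like this. The helper name `build_search_url` is my own invention for illustration; the endpoint, the parameters and the DEMO key are the ones used in this post, and you should substitute your own key.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://api.plos.org/search"  # endpoint used in the examples in this post


def build_search_url(query, fields, filters=(), api_key="DEMO"):
    """Assemble a search URL from the q, fq, fl and api_key parameters."""
    params = [("q", query)]
    params += [("fq", f) for f in filters]       # one fq parameter per filter
    params.append(("fl", ",".join(fields)))      # comma-separated field list
    params.append(("api_key", api_key))
    return SEARCH_URL + "?" + urlencode(params)  # urlencode escapes special characters


url = build_search_url(
    "*:*",
    ["id", "title", "counter_total_all", "alm_twitterCount"],
    filters=["doc_type:full"],
)
print(url)
```

Passing the parameters as a list of pairs rather than a dictionary keeps their order and allows the same key, such as fq, to appear more than once.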

Now, if I wanted to sort on article usage data to get the most viewed articles of all time, I would add the sort parameter to the query, like so: “sort=counter_total_all desc”.

http://api.plos.org/search?q=*:*&fq=doc_type:full&fl=id,title,counter_total_all&sort=counter_total_all desc&api_key=DEMO

It’s worth noting that your browser may be encoding special characters in the URL.  More details can be found here.  I’m letting your browser worry about this; if you write a tool, you will have to do this encoding yourself.
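As a rough illustration of what that encoding looks like when you do it yourself, the Python standard library offers two common ways to escape the space in the sort value. The helper name `sort_clause` is hypothetical.

```python
from urllib.parse import quote, urlencode


def sort_clause(field, direction="desc"):
    """Return the unencoded value for the sort parameter."""
    return f"{field} {direction}"


clause = sort_clause("counter_total_all")
print(quote(clause))                # percent-encoding: the space becomes %20
print(urlencode({"sort": clause}))  # form encoding: the space becomes +
```

Both forms are standard ways of writing a space in a query string; the %20 form is the one that appears in the response type examples later in this post.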

Next, let’s talk about range and filter queries.  Below I’m using two new parameters of type “fq”, a filter query.  A filter query means “only give me documents that match THIS filter”.  A value such as “[100 TO *]” matches a range, with * leaving that end of the range open.

  • fq=subject:"Social networks": only show me articles that are tagged with the subject “Social networks”.
  • fq=alm_twitterCount:[100 TO 10000]: only show me articles that have between 100 and 10000 tweets.

Now the query.  A list of articles about social networks that are popular on a social network:

http://api.plos.org/search?q=*:*&fq=subject:"Social networks"&fq=alm_twitterCount:[100 TO 10000]&fq=doc_type:full&fl=id,title,alm_twitterCount&sort=counter_total_month desc&api_key=DEMO
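Here is a hedged Python sketch of how the filters in this query could be generated. The helpers `range_filter` and `phrase_filter` are hypothetical names of my own; the field names and the DEMO key come from the examples above.

```python
from urllib.parse import urlencode


def range_filter(field, low="*", high="*"):
    """Build a range filter; * leaves that end of the range open."""
    return f"{field}:[{low} TO {high}]"


def phrase_filter(field, phrase):
    """Build a filter that matches an exact, quoted phrase."""
    return f'{field}:"{phrase}"'


filters = [
    phrase_filter("subject", "Social networks"),
    range_filter("alm_twitterCount", 100, 10000),
    "doc_type:full",
]

# Each filter becomes its own fq parameter.
params = [("q", "*:*")] + [("fq", f) for f in filters]
params += [
    ("fl", "id,title,alm_twitterCount"),
    ("sort", "counter_total_month desc"),
    ("api_key", "DEMO"),
]
url = "http://api.plos.org/search?" + urlencode(params)
print(url)
```

Calling range_filter("alm_twitterCount", 100) without an upper bound gives the open-ended form alm_twitterCount:[100 TO *].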

Fun stuff.  You can find available subject areas by going to the PLOS ONE home page and clicking “Subject Areas”.  Any of the terms you see are valid.

Word proximity search

Now let’s try a word proximity search.  Let’s say I want to find all research that involves sports and alcohol.

http://api.plos.org/search?q=everything:"sports and alcohol"&fq=doc_type:full&fl=id,title&api_key=DEMO

That yields a few results, but they are not so great.  Now let’s try a proximity search.

I’m adding “~15” to the query so that it looks like this: q=everything:"sports alcohol"~15. This means: show me all articles that have these two words fewer than about 15 words apart.

http://api.plos.org/search?q=everything:"sports alcohol"~15…

Many more results!  Now let’s try to narrow our results to 7 words apart.  Here I’m changing the “~15” to “~7”.

http://api.plos.org/search?q=everything:"sports alcohol"~7…

Now, let’s also only look at articles that have seen some activity on Twitter.  I’ve added “fq=alm_twitterCount:[1 TO *]” as a parameter.

http://api.plos.org/search?fq=alm_twitterCount:[1 TO *]&q=…
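The proximity syntax is easy to generate in code. In this Python sketch the helper `proximity_query` is a hypothetical name, and the remaining parameters are my guess at a complete query based on the earlier examples, since the URLs above are abbreviated.

```python
from urllib.parse import urlencode


def proximity_query(field, words, distance):
    """Quote the words as one phrase and append ~distance for a proximity match."""
    phrase = " ".join(words)
    return f'{field}:"{phrase}"~{distance}'


query = proximity_query("everything", ["sports", "alcohol"], 7)

# Assumed full parameter set, modelled on the earlier examples in this post.
params = [
    ("q", query),
    ("fq", "doc_type:full"),
    ("fq", "alm_twitterCount:[1 TO *]"),
    ("fl", "id,title"),
    ("api_key", "DEMO"),
]
url = "http://api.plos.org/search?" + urlencode(params)
print(query)
print(url)
```

Changing the distance argument is all it takes to widen or narrow the search.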

So, I hope you can see where I’m going with this.  I’m sure you can combine these parameters for some interesting results.

Response Types

Lastly, I wanted to touch on the various result types for your search queries.  We currently support:

Type    Description
JSON    A JavaScript-based syntax
CSV     Formatted for consumption directly into a spreadsheet (like Excel)
ATOM    Formatted for consumption by a news reader
RSS     Formatted for consumption by a news reader
HTML    HTML (as a beta)

You can change the result type by adding the wt parameter.  For example:

JSON: http://api.plos.org/search?q=*:*&fq=doc_type:full&fl=id,title_display&sort=counter_total_month%20desc&api_key=DEMO&wt=json

CSV: http://api.plos.org/search?q=*:*&fq=doc_type:full&fl=id,title_display&sort=counter_total_month%20desc&api_key=DEMO&wt=csv

ATOM, RSS and HTML are special types, and you’ll need to specify an additional parameter.  These response types also depend on a set of fields being included in the field list, namely id, title_display, abstract and publication_date, because we use an XSL transformation to render these response types.

RSS: http://api.plos.org/search?q=*:*&… …&wt=xslt&tr=rss.xsl

ATOM: http://api.plos.org/search?q=*:*&… …&wt=xslt&tr=atom.xsl

HTML: http://api.plos.org/search?q=*:*… …&wt=xslt
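To summarize the rules above, here is a small Python sketch that maps a response type to the parameters it needs. The function name `response_params` and the constant `XSLT_FIELDS` are hypothetical; the parameter values are the ones shown in the examples above.

```python
from urllib.parse import urlencode

# Fields the XSL-based formats rely on, as listed above.
XSLT_FIELDS = ["id", "title_display", "abstract", "publication_date"]


def response_params(fmt):
    """Return the wt (and, where needed, tr) parameters for a response type."""
    fmt = fmt.lower()
    if fmt in ("json", "csv"):
        return {"wt": fmt}
    if fmt in ("rss", "atom"):
        return {"wt": "xslt", "tr": f"{fmt}.xsl"}
    if fmt == "html":
        return {"wt": "xslt"}
    raise ValueError(f"unsupported response type: {fmt}")


params = {"q": "*:*", "fq": "doc_type:full", "fl": ",".join(XSLT_FIELDS), "api_key": "DEMO"}
params.update(response_params("rss"))
print("http://api.plos.org/search?" + urlencode(params))
```

Requesting all four fields in fl keeps the same query usable for every response type.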

This is only the beginning.  Please join the discussion, we love feedback!



Category: Tech