Author: Martin Fenner

Disambiguation with ORCID at PLOS

Here at PLOS, linking authors and publications remains a persistent problem. Trying to find that one paper by “John Kim” that you wanted to reference? The prospect of wading through numerous John Kims can seem daunting. At PLOS ONE, searching for John Kim as an author returns four different results. Combing through search results and DOIs is not ideal. Throw ambiguous reviewer and academic editor identities into the mix and things only get worse.

Enter Open Researcher & Contributor ID (ORCID). ORCID provides a persistent digital identifier that distinguishes you from every other researcher and, through integration in key research workflows such as manuscript and grant submission, supports automated linkages between you and your professional activities ensuring that your work is recognized. The ORCID service launched in October 2012, and to date more than half a million researchers have registered for an ORCID identifier. During that time, a number of integrations were launched that supported the visibility and adoption of ORCID identifiers by the community, and we at PLOS are proud to be a platinum sponsor of ORCID and one of the more than 100 organizations who have become ORCID members.


ORCID integration chart at https://orcid.org/organizations/integrators/integration-chart.

Integration with the ORCID service can happen in a variety of ways, and that of course depends on the type of organization, e.g. publisher, academic institution or funder. As a publisher, the following integrations are important at PLOS:

  1. allow authors to provide their ORCID identifier when they submit a manuscript;
  2. allow authors and other contributors to link their ORCID identifier to their PLOS profile page;
  3. include the ORCID identifiers for authors in the manuscript metadata sent to services such as CrossRef and PubMed;
  4. import and verify the authorship claims made by authors in their ORCID profile for already published PLOS papers (and again send this information to CrossRef and PubMed);
  5. import other information from the ORCID service into the PLOS profile such as affiliation, publications and other research outputs not published with PLOS, and grant information.

In September 2013 PLOS announced integration #1 allowing authors to include their ORCID identifier with manuscript submissions. Today we are happy to announce integration #2, allowing everyone with a PLOS account to link their account to their ORCID identifier. This is done via the OAuth protocol and works via a three-step authentication process:

  • First, we (PLOS) authenticate ourselves with ORCID.
  • Second, you (the user) authenticate yourself with ORCID, and grant PLOS the authority to read data from your ORCID profile on your behalf.
  • Finally, PLOS queries ORCID for metadata about you (the user).
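
The second step above, where the user is sent to ORCID to grant access, can be sketched roughly as follows. This is a minimal sketch of a generic OAuth 2.0 authorization-code request, not the actual PLOS implementation: the client ID, the redirect URI, and the scope value are placeholders, and the authorization endpoint is assumed to be https://orcid.org/oauth/authorize.

```python
from urllib.parse import urlencode

# Assumed ORCID authorization endpoint; check the ORCID documentation before use.
AUTHORIZE_URL = "https://orcid.org/oauth/authorize"


def build_authorize_url(client_id, redirect_uri, scope="/authenticate"):
    """Return the URL a user visits to grant an application access (step 2).

    build_authorize_url is a hypothetical helper, and the default scope
    is an assumption made for illustration.
    """
    params = {
        "client_id": client_id,        # identifies the application to ORCID
        "response_type": "code",       # ask for an authorization code
        "scope": scope,                # what the application may do
        "redirect_uri": redirect_uri,  # where the user is sent afterwards
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


# Placeholder values, not real credentials.
url = build_authorize_url("APP-PLACEHOLDER", "https://example.org/orcid/callback")
print(url)
```

After the user approves the request, the application would exchange the returned code for an access token and then query ORCID for metadata (the final step); that exchange is left out of this sketch.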


Authenticating with ORCID and allowing PLOS to read profile information. 

At any point in time our users will have the authority to revoke this access by de-linking their PLOS account from the ORCID service, again via OAuth.


PLOS account with linked ORCID identifier. 

Linking the PLOS account with the ORCID identifier is a very important integration step and we encourage all users to update their accounts. And if you don’t have an ORCID ID yet, you can easily create an ORCID ID in the process.

As you can see from the list of needed integrations above, there is still a lot of work to do. Not only do we have to encourage everyone to add their ORCID ID, but we also have to exchange this information with external services. It is not enough that PLOS knows which ORCID identifiers are associated with which PLOS publications; this information also has to be shared with external services such as CrossRef, PubMed, commercial indexing services such as Web of Science and Scopus, and of course ORCID. Additionally, PLOS wants to know about the authorship claims to PLOS papers made in other places, e.g. the ORCID registry.

As of last Friday, there were 551,203 live ORCID IDs, but only 121,529 ORCID IDs with at least one linked work (e.g. a publication). Do your part: the more works that are linked to ORCID IDs, the richer the ORCID profiles (associated publications, affiliations and, coming soon, grants), and the greater the assurance that you are recognized for your hard work and successfully disambiguated.


Advanced search API capabilities

Most consumers use the search functionality on our journals to find PLOS articles by keyword, subject term, author, and title. However, we offer many more features via our search API. We use Apache Solr, exposed as a read-only service for public consumption, which can do a lot more than the simple and advanced search available via the journal websites. While our search API can do more than sensibly fits into a single blog post, I want to walk you through three areas of our implementation: ALM statistics, word proximity searching, and search result response types.

ALM Statistics

ALM statistics allow our users to sort their search results based on popularity. However, there is a lot more you can do with this data besides just sorting. I have always felt that the ability to show only research that is popular in particular networks could be very powerful.

Here are all of the fields in our search server that contain ALM statistics for you to search on.

Field Name                 Field Description
counter_total_all          Total article views, all time
counter_total_month        Total article views, last 30 days
alm_scopusCiteCount        Count of cites, as per Scopus
alm_citeulikeCount         Count of bookmarks on CiteULike
alm_connoteaCount          Count of bookmarks on Connotea
alm_mendeleyCount          Count of bookmarks in Mendeley
alm_twitterCount           Count of tweets
alm_facebookCount          Count of Facebook Likes
alm_pmc_usage_total_all    Total article views, from PMC

Example Query: (note: while the below API key works, you are required to get your own key prior to use)

http://api.plos.org/search?q=*:*&fq=doc_type:full&fl=id,title,counter_total_all,counter_total_month,alm_scopusCiteCount,alm_citeulikeCount,alm_connoteaCount,alm_mendeleyCount,alm_twitterCount,alm_facebookCount,alm_pmc_usage_total_all&api_key=DEMO

The default response format is XML; more on this below. Note the parameters used with this query.

  • “fl” specifies which fields you want the API to return. A full list is available at: http://api.plos.org/solr/search-fields/.
  • The doc_type field, specifically “fq=doc_type:full”, will always be in use for the purposes of this post. We’ll spare the details for now.
  • “q=*:*” is the meat of the search query; *:* is a shortcut for “show me everything”.
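
If you would rather build this query in a script than type it by hand, something like the following could work. It is only a sketch: build_search_url is a hypothetical helper of mine, not part of any PLOS library, and it only assembles the URL without sending a request.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.plos.org/search"

# id and title, plus the ALM fields listed in the table above.
ALM_FIELDS = [
    "id", "title",
    "counter_total_all", "counter_total_month",
    "alm_scopusCiteCount", "alm_citeulikeCount", "alm_connoteaCount",
    "alm_mendeleyCount", "alm_twitterCount", "alm_facebookCount",
    "alm_pmc_usage_total_all",
]


def build_search_url(query, fields, api_key):
    """Assemble a search URL from q, the doc_type filter, fl and api_key."""
    params = [
        ("q", query),                # *:* means "show me everything"
        ("fq", "doc_type:full"),     # always used in this post
        ("fl", ",".join(fields)),    # fields the API should return
        ("api_key", api_key),        # get your own key for real use
    ]
    # urlencode percent-encodes special characters such as * : and ,
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("*:*", ALM_FIELDS, "DEMO"))
```

The printed URL looks different from the one above because the special characters are percent-encoded, but it carries the same parameters.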

Now, if I wanted to sort on article usage data to get the most viewed articles of all time, I would add the sort parameter to the query, like so: “sort=counter_total_all desc”.

http://api.plos.org/search?q=*:*&fq=doc_type:full&fl=id,title,counter_total_all&sort=counter_total_all desc&api_key=DEMO

It’s worth noting that your browser may be encoding special characters in the URL. More details can be found here. I’m letting your browser worry about this; if you write a tool, you will have to do this encoding yourself.
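
As a rough illustration of doing that encoding yourself, here is a minimal sketch using Python's standard urllib.parse.quote. The encode_value helper is hypothetical; the point is only that the space in the sort value has to become %20 (or +) before the URL is sent.

```python
from urllib.parse import quote


def encode_value(value):
    """Percent-encode a single query-string value, e.g. a space becomes %20."""
    return quote(value, safe="")


sort_value = encode_value("counter_total_all desc")
url = (
    "http://api.plos.org/search"
    "?q=" + encode_value("*:*")
    + "&fq=" + encode_value("doc_type:full")
    + "&fl=" + encode_value("id,title,counter_total_all")
    + "&sort=" + sort_value
    + "&api_key=DEMO"
)
print(sort_value)  # counter_total_all%20desc
print(url)
```

Only the values are encoded here; the ?, & and = characters that separate the parameters are left alone.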

Next, let’s talk about ranged and filter queries. Below I’m using two new parameters of type “fq”, a filter query. A filter query means “only give me documents that match THIS filter”. “[100 TO *]” matches values in a range.

  • fq=subject:"Social networks" (only show me articles that are tagged with the subject “Social networks”)
  • fq=alm_twitterCount:[100 TO 10000] (only show me articles that have between 100 and 10000 tweets)

Now the query.  A list of articles about social networks that are popular on a social network:

http://api.plos.org/search?q=*:*&fq=subject:"Social networks"&fq=alm_twitterCount:[100 TO 10000]&fq=doc_type:full&fl=id,title,alm_twitterCount&sort=counter_total_month desc&api_key=DEMO
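
The same query can be put together in a script. This is a sketch under my own assumptions: range_filter and phrase_filter are hypothetical helpers, and the code only builds the URL. Passing a list of pairs to urlencode lets the “fq” parameter repeat.

```python
from urllib.parse import urlencode


def range_filter(field, low, high="*"):
    """Build a range filter such as alm_twitterCount:[100 TO *]."""
    return f"{field}:[{low} TO {high}]"


def phrase_filter(field, phrase):
    """Build an exact-phrase filter; note the straight double quotes."""
    return f'{field}:"{phrase}"'


params = [
    ("q", "*:*"),
    ("fq", phrase_filter("subject", "Social networks")),
    ("fq", range_filter("alm_twitterCount", 100, 10000)),
    ("fq", "doc_type:full"),
    ("fl", "id,title,alm_twitterCount"),
    ("sort", "counter_total_month desc"),
    ("api_key", "DEMO"),
]
# urlencode keeps the order and percent-encodes quotes, brackets and spaces.
url = "http://api.plos.org/search?" + urlencode(params)
print(url)
```

Leaving out the upper bound, as in range_filter("alm_twitterCount", 100), gives the open-ended form “[100 TO *]”.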

Fun stuff. You can find available subject areas by going to the PLOS ONE home page and clicking “Subject Areas”. Any of the terms you see are valid.

Word proximity search

Now let’s try a word proximity search. Let’s say I want to find all research that involves sports and alcohol.

http://api.plos.org/search?q=everything:"sports and alcohol"&fq=doc_type:full&fl=id,title&api_key=DEMO

That yields a few results, but they are not so great. Now let’s try a proximity search.

I’m adding the "~15" text to the query so that it looks like this: q=everything:"sports alcohol"~15. In other words, show me all articles that have these two words within about 15 words of each other.

http://api.plos.org/search?q=everything:"sports alcohol"~15…

Many more results! Now let’s try to narrow our results to 7 words apart. Here I’m changing the "~15" to "~7".

http://api.plos.org/search?q=everything:"sports alcohol"~7…

Now, let’s also look only at articles that have seen some activity on Twitter. I’ve added “fq=alm_twitterCount:[1 TO *]” as a parameter.

http://api.plos.org/search?fq=alm_twitterCount:[1 TO *]&q=…

So, I hope you can see where I’m going with this.  I’m sure you can combine these parameters for some interesting results.
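
If you want to generate proximity queries like the ones in this section from a script, a small helper along these lines might do. proximity_query is a hypothetical name of mine; it only builds the value of the “q” parameter, which still needs URL encoding before it is sent.

```python
def proximity_query(field, words, distance):
    """Return a proximity query: the words within `distance` words of each other."""
    phrase = " ".join(words)
    # Straight double quotes around the phrase, then ~ and the distance.
    return f'{field}:"{phrase}"~{distance}'


# The two distances tried above.
for distance in (15, 7):
    print(proximity_query("everything", ["sports", "alcohol"], distance))
```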

Response Types

Lastly, I wanted to touch on the various result types for your search queries. We currently support:

Type    Description
JSON    A JavaScript-based syntax
XML     The default
CSV     Formatted for consumption directly into a spreadsheet (like Excel)
ATOM    Formatted for consumption by a news reader
RSS     Formatted for consumption by a news reader
HTML    HTML (as a beta)

You can change the result type by adding the wt parameter.  For example:

JSON: http://api.plos.org/search?q=*:*&fq=doc_type:full&fl=id,title_display&sort=counter_total_month%20desc&api_key=DEMO&wt=json

CSV: http://api.plos.org/search?q=*:*&fq=doc_type:full&fl=id,title_display&sort=counter_total_month%20desc&api_key=DEMO&wt=csv

ATOM, RSS, and HTML are special types, and you’ll need to specify an additional parameter. These response types also depend on a set of fields being included in the field list, namely id, title_display, abstract, and publication_date, because we use an XSL transformation to render them.

RSS: http://api.plos.org/search?q=*:*&… …&wt=xslt&tr=rss.xsl

ATOM: http://api.plos.org/search?q=*:*&… …&wt=xslt&tr=atom.xsl

HTML: http://api.plos.org/search?q=*:*… …&wt=xslt
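
To keep the parameters for each response type straight in a script, you could use a small lookup like the one below. This is only a sketch: the wt and tr values are the ones shown in this post, while response_params and the two constants are hypothetical names of mine.

```python
# Parameters for each response type, as shown in this post.
RESPONSE_PARAMS = {
    "xml": {},                                # the default, no wt needed
    "json": {"wt": "json"},
    "csv": {"wt": "csv"},
    "rss": {"wt": "xslt", "tr": "rss.xsl"},
    "atom": {"wt": "xslt", "tr": "atom.xsl"},
    "html": {"wt": "xslt"},
}

# Fields the XSL-rendered types expect in the field list.
XSLT_FIELDS = ["id", "title_display", "abstract", "publication_date"]


def response_params(fmt, fields):
    """Return the wt, tr and fl parameters for a response type."""
    params = dict(RESPONSE_PARAMS[fmt.lower()])
    if params.get("wt") == "xslt":
        # Add any required fields that the caller left out.
        fields = list(fields) + [f for f in XSLT_FIELDS if f not in fields]
    params["fl"] = ",".join(fields)
    return params


print(response_params("json", ["id", "title_display"]))
print(response_params("rss", ["id"]))
```

The returned dictionary can be merged with the q, fq, sort and api_key parameters before the URL is encoded.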

This is only the beginning. Please join the discussion; we love feedback!

https://groups.google.com/forum/#!forum/plos-api-developers

-Joe
