Advanced search API capabilities

Most consumers use the search functionality on our journals to find PLOS articles by keyword, subject term, author, and title. However, we offer many more features via our search API. We run Apache Solr, exposed as a read-only service for public consumption, and it can do much more than the simple and advanced search available on the journal websites. The API can do more than is sensible to fit into a single blog post, so I want to walk you through three areas of our implementation: ALM statistics, word proximity searching, and search result response types.

ALM Statistics

ALM (article-level metrics) statistics allow our users to sort their search results based on popularity.  However, there is a lot more you can do with this data besides sorting.  I have always felt that the ability to show only research that is popular on particular networks could be very powerful.

Here are all of the fields in our search server that contain ALM statistics for you to search on.

Field Name                 Field Description
counter_total_all          Total article views, all time
counter_total_month        Total article views, last 30 days
alm_scopusCiteCount        Count of citations, as per Scopus
alm_citeulikeCount         Count of bookmarks on CiteULike
alm_connoteaCount          Count of bookmarks on Connotea
alm_mendeleyCount          Count of bookmarks in Mendeley
alm_twitterCount           Count of tweets
alm_facebookCount          Count of Facebook Likes
alm_pmc_usage_total_all    Total article views, from PMC

Example query (note: while the API key below works, you are required to get your own key before using the API):

http://api.plos.org/search?q=*:*&fq=doc_type:full&fl=id,title,counter_total_all,counter_total_month,alm_scopusCiteCount,alm_citeulikeCount,alm_connoteaCount,alm_mendeleyCount,alm_twitterCount,alm_facebookCount,alm_pmc_usage_total_all&api_key=DEMO

The default response format is XML; more on this below.  Note the parameters used with this query:

  • “fl” specifies which fields you want the API to return.  A full list is available at: http://api.plos.org/solr/search-fields/.
  • The doc_type field, specifically “fq=doc_type:full”, will always be in use for the purposes of this post.  We’ll spare the details for now.
  • “q=*:*” is the meat of the search query; *:* is a shortcut for “show me everything”.
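If you would rather build this query from a script than type it into a browser, here is a minimal Python sketch of how the three parameters fit together. The helper name build_search_url is my own invention for illustration; only the endpoint, the parameter names, and the DEMO key come from the example above. It only builds the URL and does not send a request.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the example query above.
BASE_URL = "http://api.plos.org/search"


def build_search_url(query, fields, api_key, base_url=BASE_URL):
    """Hypothetical helper: assemble a search URL from q, fq, fl and api_key."""
    params = [
        ("q", query),                # the search itself, e.g. *:* for everything
        ("fq", "doc_type:full"),     # the filter used throughout this post
        ("fl", ",".join(fields)),    # comma-separated list of fields to return
        ("api_key", api_key),
    ]
    # quote_via=quote percent-encodes special characters (a space becomes %20).
    return base_url + "?" + urlencode(params, quote_via=quote)


url = build_search_url("*:*", ["id", "title", "counter_total_all"], "DEMO")
print(url)
```

The printed URL has its special characters percent-encoded (for example, *:* becomes %2A%3A%2A), which is the same thing your browser does behind the scenes.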

Now, if I wanted to sort on article usage data to get the most viewed articles of all time, I would add the sort parameter to the query, like so: “sort=counter_total_all desc”.

http://api.plos.org/search?q=*:*&fq=doc_type:full&fl=id,title,counter_total_all&sort=counter_total_all desc&api_key=DEMO

It’s worth noting that your browser may be encoding special characters in the URL.  More details can be found here.  I’m letting your browser worry about this; if you write a tool, you will have to do this encoding yourself.
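To give a feel for what that encoding looks like when you do it yourself, here is a small sketch using Python's standard library. The function name encode_param is hypothetical; quote is the standard urllib.parse function for percent-encoding.

```python
from urllib.parse import quote


def encode_param(name, value):
    """Hypothetical helper: percent-encode one value and return 'name=value'."""
    # safe="" means every reserved character is encoded, including ':' and '/'.
    return f"{name}={quote(value, safe='')}"


print(encode_param("sort", "counter_total_all desc"))
# sort=counter_total_all%20desc
print(encode_param("fq", "alm_twitterCount:[100 TO *]"))
# fq=alm_twitterCount%3A%5B100%20TO%20%2A%5D
```

The space in the sort value becomes %20, and the brackets, colon, and asterisk of a range filter are escaped in the same way.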

Next, let’s talk about range and filter queries.  Below I’m using two new parameters of type “fq”, a filter query.  A filter query means “only give me documents that match THIS filter”.  A value in square brackets matches a range: “[100 TO 10000]” matches values between 100 and 10000, and “[100 TO *]” matches values of 100 or more.

  • fq=subject:"Social networks" means: only show me articles that are tagged with the subject “Social networks”.
  • fq=alm_twitterCount:[100 TO 10000] means: only show me articles that have between 100 and 10000 tweets.

Now the query.  A list of articles about social networks that are popular on a social network:

http://api.plos.org/search?q=*:*&fq=subject:"Social networks"&fq=alm_twitterCount:[100 TO 10000]&fq=doc_type:full&fl=id,title,alm_twitterCount&sort=counter_total_month desc&api_key=DEMO
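Because “fq” can be repeated, a script needs to pass it as several separate parameters rather than one. The sketch below shows one way to do that in Python; range_filter and phrase_filter are hypothetical helper names, and the field names and values are the ones from the query above. Note the plain straight quotes around the subject phrase.

```python
from urllib.parse import quote, urlencode


def range_filter(field, low, high="*"):
    """Hypothetical helper: build a range filter such as field:[100 TO *]."""
    return f"{field}:[{low} TO {high}]"


def phrase_filter(field, phrase):
    """Hypothetical helper: build an exact-phrase filter such as field:"some words"."""
    return f'{field}:"{phrase}"'


filters = [
    phrase_filter("subject", "Social networks"),
    range_filter("alm_twitterCount", 100, 10000),
    "doc_type:full",
]

# A list of pairs (rather than a dict) lets the same key, fq, appear several times.
params = [("q", "*:*")]
params += [("fq", f) for f in filters]
params += [
    ("fl", "id,title,alm_twitterCount"),
    ("sort", "counter_total_month desc"),
    ("api_key", "DEMO"),
]

url = "http://api.plos.org/search?" + urlencode(params, quote_via=quote)
print(url)
```

Leaving out the upper bound, as in range_filter("alm_twitterCount", 100), gives the open-ended form alm_twitterCount:[100 TO *].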

Fun stuff.  You can find available subject areas by going to the PLOS ONE home page and clicking “Subject Areas”.  Any of the terms you see there are valid.

Word proximity search

Now let’s try a word proximity search.  Let’s say I want to find all research that involves sports and alcohol.

http://api.plos.org/search?q=everything:"sports and alcohol"&fq=doc_type:full&fl=id,title&api_key=DEMO

That yields a few results, but they are not so great.  Now let’s try a proximity search.

I’m adding “~15” to the query, so it looks like this: q=everything:"sports alcohol"~15.  This means: show me all articles in which these two words appear within about 15 words of each other.

http://api.plos.org/search?q=everything:"sports alcohol"~15…

Many more results!  Now let’s try to narrow our results to words that are at most 7 words apart.  Here I’m changing the “~15” to “~7”.

http://api.plos.org/search?q=everything:"sports alcohol"~7…
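If you want to experiment with different distances from a script, the proximity syntax is easy to generate. This is only a sketch: proximity_query is a made-up helper name, and it builds the value of the q parameter without sending anything to the API.

```python
def proximity_query(field, words, distance):
    """Hypothetical helper: build a proximity query such as field:"a b"~15."""
    phrase = " ".join(words)
    # The words go inside straight double quotes, followed by ~ and the distance.
    return f'{field}:"{phrase}"~{distance}'


for distance in (15, 7):
    print(proximity_query("everything", ["sports", "alcohol"], distance))
# everything:"sports alcohol"~15
# everything:"sports alcohol"~7
```

The resulting string still needs to be URL-encoded before it goes into the query, just like any other parameter value.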

Now, let’s also look only at articles that have seen some activity on Twitter.  To do that, I’ve added “fq=alm_twitterCount:[1 TO *]” as a parameter.

http://api.plos.org/search?fq=alm_twitterCount:[1 TO *]&q=…

So, I hope you can see where I’m going with this.  I’m sure you can combine these parameters for some interesting results.

Response Types

Lastly, I want to touch on the various response types for your search queries.  We currently support:

JSON    A JavaScript-based syntax
XML     The default
CSV     Formatted for consumption directly into a spreadsheet (like Excel)
ATOM    Formatted for consumption by a news reader
RSS     Formatted for consumption by a news reader
HTML    HTML (as a beta)

You can change the result type by adding the wt parameter.  For example:

JSON: http://api.plos.org/search?q=*:*&fq=doc_type:full&fl=id,title_display&sort=counter_total_month%20desc&api_key=DEMO&wt=json

CSV: http://api.plos.org/search?q=*:*&fq=doc_type:full&fl=id,title_display&sort=counter_total_month%20desc&api_key=DEMO&wt=csv

ATOM, RSS, and HTML are special types, and you’ll need to specify an additional parameter.  These response types also depend on a set of fields being included in the field list, namely id, title_display, abstract, and publication_date.  This is because we use an XSL transformation to render these response types.

RSS: http://api.plos.org/search?q=*:*&… …&wt=xslt&tr=rss.xsl

ATOM: http://api.plos.org/search?q=*:*&… …&wt=xslt&tr=atom.xsl

HTML: http://api.plos.org/search?q=*:*… …&wt=xslt
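To keep these combinations straight in a script, you could collect them in a small lookup table. The sketch below is one way to do it in Python; FORMAT_PARAMS and response_params are names I made up, and the values simply mirror the examples above (no wt parameter for the default XML, wt=xslt with a tr stylesheet for RSS and ATOM, and wt=xslt on its own for HTML). For the XSLT-based types it also adds any of the four required fields that are missing from the field list.

```python
# Extra parameters per response type, copied from the examples in this post.
FORMAT_PARAMS = {
    "xml": {},                               # the default, so nothing to add
    "json": {"wt": "json"},
    "csv": {"wt": "csv"},
    "rss": {"wt": "xslt", "tr": "rss.xsl"},
    "atom": {"wt": "xslt", "tr": "atom.xsl"},
    "html": {"wt": "xslt"},                  # shown above without a tr parameter
}

# Fields the XSLT-based response types depend on.
XSLT_FIELDS = ["id", "title_display", "abstract", "publication_date"]


def response_params(fmt, fields):
    """Hypothetical helper: return the fl/wt/tr parameters for a response type."""
    key = fmt.lower()
    if key not in FORMAT_PARAMS:
        raise ValueError(f"unsupported response type: {fmt}")
    extra = FORMAT_PARAMS[key]
    wanted = list(fields)
    if extra.get("wt") == "xslt":
        # Append any required field that the caller did not ask for.
        wanted += [name for name in XSLT_FIELDS if name not in wanted]
    return {"fl": ",".join(wanted), **extra}


print(response_params("rss", ["id", "title_display"]))
print(response_params("json", ["id", "title_display"]))
```

The returned dictionary can be merged with your q, fq, sort, and api_key parameters before the URL is encoded.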

This is only the beginning.  Please join the discussion; we love feedback!

https://groups.google.com/forum/#!forum/plos-api-developers

-Joe
