Most consumers use the search functionality on our journals to find PLOS articles by keyword, subject term, author, and title. However, we offer many more features via our search API. We use Apache Solr, exposed as a read-only service for public consumption, and it can do a lot more than the simple and advanced search available on the journal websites. The search API can do more than is sensible to fit into a single blog post, so I want to walk you through three areas of our implementation: ALM statistics, word proximity searching, and search result response types.
ALM (article-level metrics) statistics allow our users to sort their search results based on popularity. However, there is a lot more you can do with this data besides sorting. I have always felt that the ability to show only research that is popular in particular networks could be very powerful.
Here are all of the fields in our search server that contain ALM statistics for you to search on.
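counter_total_all || Total article views, all time
counter_total_month || Total article views, last 30 days
alm_scopusCiteCount || Count of citations, as per Scopus
alm_citeulikeCount || Count of bookmarks on CiteULike
alm_connoteaCount || Count of bookmarks on Connotea
alm_mendeleyCount || Count of bookmarks in Mendeley
alm_twitterCount || Count of tweets
alm_facebookCount || Count of Facebook Likes
alm_pmc_usage_total_all || Total article views, from PMC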
Example query (note: while the API key below works, you are required to get your own key prior to use):
http://api.plos.org/search?q=*:*&fq=doc_type:full&fl=id,title,counter_total_all,counter_total_month,alm_scopusCiteCount,alm_citeulikeCount,alm_connoteaCount,alm_mendeleyCount,alm_twitterCount,alm_facebookCount,alm_pmc_usage_total_all&api_key=DEMO
The default response format is XML; more on this below. Note the parameters used with this query:
- “fl” specifies which fields you want the API to return. A full list is available at: http://api.plos.org/solr/search-fields/.
- “fq=doc_type:full” filters on the doc_type field and is used in every query in this post. We’ll spare the details for now.
- “q=*:*” is the meat of the search query; *:* is a shortcut for “show me everything”.
Now, if I wanted to sort on article usage data to get the most viewed articles of all time, I would add the sort parameter to the query, like so: “sort=counter_total_all desc”.
It’s worth noting that your browser may encode special characters in the URL. More details can be found here. I’m letting your browser worry about this; if you write a tool, you will have to do this encoding yourself.
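If you are writing a tool rather than pasting URLs into a browser, the encoding step might look something like the sketch below. It uses Python's standard `urllib.parse.urlencode`; the helper name `build_search_url` is my own invention for illustration and is not part of the API. Note that `urlencode` escapes more characters than a browser typically would (for example `*` and `:`), which is still valid form encoding.

```python
from urllib.parse import urlencode

# Endpoint taken from the example queries in this post.
BASE_URL = "http://api.plos.org/search"


def build_search_url(params):
    """Return a percent-encoded search URL.

    `params` is a list of (name, value) pairs, so a parameter name
    may appear more than once. This helper is hypothetical.
    """
    return BASE_URL + "?" + urlencode(params)


url = build_search_url([
    ("q", "*:*"),                        # match everything
    ("fq", "doc_type:full"),             # used by every query in this post
    ("fl", "id,title,counter_total_all"),
    ("sort", "counter_total_all desc"),  # most viewed of all time first
    ("api_key", "DEMO"),                 # replace with your own key
])
print(url)
```

The space in the sort value comes out as `+`, and characters such as `:` and `,` become `%3A` and `%2C`.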
Next, let’s talk about ranged and filter queries. Below I’m using two new parameters of type “fq”, a filter query. A filter query means “only give me documents that match THIS filter”. A range such as “[100 TO *]” matches any value of 100 or more; the “*” leaves that end of the range open.
- fq=subject:"Social networks" (only show me articles that are tagged with the subject “Social networks”)
- fq=alm_twitterCount:[100 TO 10000] (only show me articles that have between 100 and 10000 tweets)
Now the query. A list of articles about social networks that are popular on a social network:
http://api.plos.org/search?q=*:*&fq=subject:"Social networks"&fq=alm_twitterCount:[100 TO 10000]&fq=doc_type:full&fl=id,title,alm_twitterCount&sort=counter_total_month desc&api_key=DEMO
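Here is a rough sketch of how a script might assemble the same query. The helpers `range_filter` and `phrase_filter` are hypothetical names I made up for this example; only the field names and parameter names come from the query above. Passing a list of pairs to `urlencode` lets “fq” appear several times.

```python
from urllib.parse import urlencode


def range_filter(field, low, high="*"):
    """Build a range filter, e.g. alm_twitterCount:[100 TO 10000].

    Leaving `high` as "*" keeps the upper end of the range open.
    """
    return f"{field}:[{low} TO {high}]"


def phrase_filter(field, phrase):
    """Build an exact-phrase filter, e.g. subject:"Social networks"."""
    return f'{field}:"{phrase}"'


params = [
    ("q", "*:*"),
    ("fq", phrase_filter("subject", "Social networks")),
    ("fq", range_filter("alm_twitterCount", 100, 10000)),
    ("fq", "doc_type:full"),
    ("fl", "id,title,alm_twitterCount"),
    ("sort", "counter_total_month desc"),
    ("api_key", "DEMO"),  # replace with your own key
]
url = "http://api.plos.org/search?" + urlencode(params)
print(url)
```

Each “fq” narrows the result set further, so the three filters here are applied together.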
Fun stuff. You can find available subject areas by going to the PLOS One home page and clicking “Subject Areas”. Any of the terms you see are valid.
Word proximity search
Now let’s try a word proximity search. Let’s say I want to find all research that involves sports and alcohol.
http://api.plos.org/search?q=everything:"sports and alcohol"&fq=doc_type:full&fl=id,title&api_key=DEMO
That yields a few results, but they are not great. Now let’s try a proximity search.
I’m adding “~15” to the query so it looks like this: q=everything:"sports alcohol"~15. This means: show me all articles that have these two words within about 15 words of each other.
Many more results! Now let’s try to narrow our results to 7 words apart. Here I’m changing the “~15” to “~7”.
Now, let’s also look only at articles that have seen some activity on Twitter. I’ve added “fq=alm_twitterCount:[1 TO *]” as a parameter.
http://api.plos.org/search?fq=alm_twitterCount:[1 TO *]&q=…
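For anyone scripting this, a minimal sketch of building the proximity query follows. The function name `proximity_query` is hypothetical; the “everything” field, the “~7” syntax, and the Twitter filter are the ones used above. The remaining parameters (doc_type filter, field list, DEMO key) are simply carried over from the earlier sports and alcohol example as an assumption, since the URL above is abbreviated.

```python
from urllib.parse import urlencode


def proximity_query(field, words, distance):
    """Build a proximity query, e.g. everything:"sports alcohol"~15."""
    phrase = " ".join(words)
    return f'{field}:"{phrase}"~{distance}'


q = proximity_query("everything", ["sports", "alcohol"], 7)

params = [
    ("q", q),
    ("fq", "alm_twitterCount:[1 TO *]"),  # at least one tweet
    ("fq", "doc_type:full"),              # assumed, as in earlier examples
    ("fl", "id,title"),
    ("api_key", "DEMO"),                  # replace with your own key
]
url = "http://api.plos.org/search?" + urlencode(params)
print(url)
```

Changing the last argument of `proximity_query` is all it takes to widen or narrow the distance.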
So, I hope you can see where I’m going with this. I’m sure you can combine these parameters for some interesting results.
Lastly, I want to touch on the various result types for your search queries. We currently support:
CSV || Formatted for consumption directly into a spreadsheet (like Excel)
RSS || Formatted for consumption by a news reader
ATOM || Formatted for consumption by a news reader
HTML || HTML (as a beta)
You can change the result type by adding the wt parameter. For example:
JSON: http://api.plos.org/search?q=*:*&fq=doc_type:full&fl=id,title_display&sort=counter_total_month%20desc&api_key=DEMO&wt=json
CSV: http://api.plos.org/search?q=*:*&fq=doc_type:full&fl=id,title_display&sort=counter_total_month%20desc&api_key=DEMO&wt=csv
ATOM, RSS, and HTML are special types, and you’ll need to specify an additional parameter. These response types also depend on a set of fields being included in the field list, namely id, title_display, abstract, and publication_date, because we use an XSL transformation to render them.
RSS: http://api.plos.org/search?q=*:*&… …&wt=xslt&tr=rss.xsl
ATOM: http://api.plos.org/search?q=*:*&… …&wt=xslt&tr=atom.xsl
HTML: http://api.plos.org/search?q=*:*… …&wt=xslt
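To tie the response types together, here is a small sketch that picks the right “wt” and “tr” values for each type. The function names `format_params` and `build_url` are hypothetical, and the sketch assumes the parameter combinations are exactly the ones shown in the examples above (including HTML using “wt=xslt” with no “tr”).

```python
from urllib.parse import urlencode

# Fields the XSLT-rendered types (RSS, ATOM, HTML) need in the field list.
XSLT_FIELDS = "id,title_display,abstract,publication_date"
XSLT_TYPES = ("rss", "atom", "html")


def format_params(fmt):
    """Return the response-type parameters for the given format name."""
    fmt = fmt.lower()
    if fmt == "xml":
        return []                      # XML is the default, nothing to add
    if fmt in ("json", "csv"):
        return [("wt", fmt)]
    if fmt in ("rss", "atom"):
        return [("wt", "xslt"), ("tr", fmt + ".xsl")]
    if fmt == "html":
        return [("wt", "xslt")]        # as in the HTML example above
    raise ValueError(f"unsupported format: {fmt}")


def build_url(fmt):
    """Build a match-everything query that returns the given format."""
    fl = XSLT_FIELDS if fmt.lower() in XSLT_TYPES else "id,title_display"
    params = [
        ("q", "*:*"),
        ("fq", "doc_type:full"),
        ("fl", fl),
        ("api_key", "DEMO"),           # replace with your own key
    ]
    return "http://api.plos.org/search?" + urlencode(params + format_params(fmt))


print(build_url("atom"))
```

Keeping the required field list next to the format logic makes it harder to request RSS, ATOM, or HTML without the fields the transformation needs.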
This is only the beginning. Please join the discussion; we love feedback!