Delving into subject areas with PLOS Cloud Explorer

As discussed in a previous post in the PLOS Tech blog, PLOS uses a sophisticated approach to classify research articles according to what they are about. Using machine-aided indexing, articles are associated with subject areas from a thesaurus containing over ten thousand terms. You can now explore an interactive visualization of the entire thesaurus, which uses article data from PLOS journals to show how different fields of research are interrelated and how those relationships have changed over time. Check it out: PLOS Cloud Explorer.

We made this web app as students together in a course on working with open data at the UC Berkeley School of Information last spring. We wanted to do something with open data about scholarly literature that would let both researchers and curious members of the general public explore trends in research and interactions between research topics. Naturally, we looked to PLOS as a source of open data about scientific research. PLOS is a publisher of open access journals, and its articles and metadata are all licensed under Creative Commons Attribution. PLOS also has an open search API, which provides access to full article data and metadata, including a set of subject area terms for each article that specify the position of each term within the polyhierarchy of the thesaurus. We wanted to build a tool that would let users navigate across fields and reach real articles by harnessing this rich, faceted representation of research areas that is bound to PLOS article data. When we asked about the thesaurus, Rachel Drysdale kindly provided us with a full copy; it’s now also available on GitHub.
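
To make the idea of a polyhierarchy concrete, here is a minimal Python sketch. It assumes that each article's subject terms arrive as slash-delimited paths running from a top-level discipline down to a specific term; that format, the function names, and the example paths are all illustrative assumptions, not a description of the actual API response or of our source code.

```python
from collections import defaultdict


def parse_subject_path(path):
    """Split a slash-delimited subject path into its list of terms."""
    return [term for term in path.strip("/").split("/") if term]


def build_polyhierarchy(paths):
    """Map each term to the set of its parent terms.

    A term that appears under more than one parent is what makes the
    structure a polyhierarchy rather than a plain tree.
    """
    parents = defaultdict(set)
    for path in paths:
        terms = parse_subject_path(path)
        if terms:
            parents.setdefault(terms[0], set())  # top-level term: no parent
        for parent, child in zip(terms, terms[1:]):
            parents[child].add(parent)
    return dict(parents)


# Hypothetical subject paths (illustrative only, not real thesaurus entries).
example_paths = [
    "/Biology and life sciences/Genetics/Genomics",
    "/Medicine and health sciences/Genetics/Genomics",
    "/Computer and information sciences/Data visualization",
]
hierarchy = build_polyhierarchy(example_paths)
print(sorted(hierarchy["Genetics"]))
```

In this toy data, "Genetics" ends up with two parents, which is the situation a simple tree cannot represent and the reason the same term can be reached from more than one branch.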


The fabulous complexity of PLOS’s classification of research articles hasn’t really been surfaced on the PLOS website. Although PLOS ONE has a subject area browser as part of its search interface, we found this difficult to navigate as part of an exploratory search, and started thinking of ways to add context to this kind of experience. We decided to create an interactive tree, using D3.js, that illuminates the larger structure of the relationships between research areas. As you browse the tree, graphs in the dashboard show how many articles have been published each year within the current field, and which other major disciplines those articles are also associated with. The word cloud shows which specific subject terms (the leaves of the tree) are most prevalent among articles in the selected field, and clicking on a word in the cloud takes you directly to a query of that term on the PLOS website. Early on in the making of this tool, we were inspired by a word cloud example of a specific query, and built PLOS Cloud Explorer around this notion of using a dynamic word cloud, filtered on interactive charts that provide context, to reach real documents of interest.
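
The three dashboard views described above boil down to simple counts over the articles in the selected field. The sketch below shows one way such counts could be computed. It assumes each article is a dictionary with a `year` and a list of slash-delimited `subjects` paths, and it treats the last term of each path as the leaf; the record layout, the `summarize_field` helper, and the sample data are hypothetical and are not taken from the PLOS Cloud Explorer code.

```python
from collections import Counter


def split_path(path):
    """Split a slash-delimited subject path into its list of terms."""
    return [term for term in path.strip("/").split("/") if term]


def summarize_field(articles, field):
    """Aggregate the articles tagged anywhere under `field`.

    Returns three Counters: articles per year, top-level disciplines
    those articles are associated with, and leaf terms under `field`.
    """
    per_year = Counter()
    disciplines = Counter()
    leaf_terms = Counter()
    for article in articles:
        paths = [terms for terms in map(split_path, article["subjects"]) if terms]
        if not any(field in terms for terms in paths):
            continue  # article is not in the selected field
        per_year[article["year"]] += 1
        # Sets ensure each discipline and leaf is counted once per article.
        disciplines.update({terms[0] for terms in paths})
        leaf_terms.update({terms[-1] for terms in paths if field in terms})
    return per_year, disciplines, leaf_terms


# Hypothetical article records (illustrative only).
articles = [
    {"year": 2012, "subjects": [
        "/Biology and life sciences/Genetics/Genomics",
        "/Computer and information sciences/Data visualization",
    ]},
    {"year": 2013, "subjects": ["/Biology and life sciences/Genetics/Gene expression"]},
    {"year": 2013, "subjects": ["/Physical sciences/Chemistry"]},
]
per_year, disciplines, leaf_terms = summarize_field(articles, "Genetics")
```

Here `per_year` would feed a time series graph, `disciplines` a histogram of co-occurring fields, and `leaf_terms` the word cloud weights.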

PLOS Cloud Explorer reveals the interconnectedness of the research areas represented and developed in PLOS journals. The word cloud and the histogram visualizations show that many fields of study are highly interconnected: PLOS articles are often associated with interdisciplinary research, such as work that combines Medicine and Physical Sciences. You can also explore trends in the number of articles over time for a given field (using the time series graphs), as well as trends in the collaborations among research areas (using the histogram and word cloud). We hope you enjoy exploring!

What you see in PLOS Cloud Explorer is based on data about all 126,718 articles published in PLOS journals up to July 21, 2014, and represents a snapshot of the PLOS Thesaurus in its current state of evolution. You can find our source code and documentation on GitHub.

About the authors: Anna Swigart and Colin Gerber are graduate students in the UC Berkeley School of Information. Akos Kokai is a graduate student in the Department of Environmental Science, Policy, and Management at UC Berkeley.
