Late last year we published a blog post in these pages – “PLOS and DBpedia – an experiment towards Linked Data” – where we recounted our first cautious steps in exploring how we might relate PLOS Subject Area terms to related concepts outwith the PLOS publishing platform. Delightfully, several people were interested in this attempt and contacted us with thought-provoking questions and suggestions.
Meanwhile, as a consequence of a meet-up of the Crossref Taxonomy Interest Group, we became aware of the nature.com ontologies publishing portal and, having made contact with Tony Hammond (Data Architect, Springer Nature), we met to discuss possible ways forward.
Out of this meeting a plan arose:
• The NPG ontologies are already part of the semantic web by virtue of their relationship to DBpedia and MeSH, as shown in their Subjects Ontology diagram.
• NPG wish to explore establishing linking relationships between their Subject Area terms and ours at PLOS.
• If we were to enable this link to NPG to be made, not only would that be useful in itself but, by virtue of those links, we could become part of the semantic web through NPG's onward links to DBpedia and MeSH.
• The world would then have another route into PLOS content.
So whereas our original approach explored how to relate URLs from an external database – DBpedia – to the corresponding PLOS Subject Area terms, this second approach focussed on making our own Subject Area Landing Page URLs easily accessible in machine readable form, by explicitly including them within the PLOS thesaurus data structure and posting that data structure in open repositories in the public domain.
How did we enable this? We embarked on a two-step process to surface the Subject Area Landing Pages in association with the PLOS thesaurus.
The first step was to add the Subject Area Landing Page URLs explicitly to the thesaurus:
Subject Area Landing Pages exist for all Subject Area terms that are EITHER specifically associated with a PLOS ONE article, OR lie as a Broader Term in the hierarchical path through the PLOS Taxonomy to those specific terms. This first step itself had three parts: generating the list of Subject Area Landing Page URLs, modifying the PLOS thesaurus data structure such that it could accommodate the URL data, and loading the Subject Area Landing Page URLs into the PLOS thesaurus.
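The rule above (a term has a Landing Page if it is applied to an article or sits on a Broader Term path above an applied term) can be sketched as a walk up the hierarchy. This is only an illustration, not the process we actually ran: the function name, the `broader` mapping and the placeholder term names are all assumptions made for the example.

```python
def terms_with_landing_pages(applied_terms, broader):
    """Return the applied terms plus every term on a Broader path above them.

    applied_terms: terms directly associated with at least one article.
    broader: maps a term to the list of its Broader Terms (hypothetical shape).
    """
    result = set()
    stack = list(applied_terms)
    while stack:
        term = stack.pop()
        if term in result:
            continue  # already handled, also guards against cycles
        result.add(term)
        stack.extend(broader.get(term, ()))
    return result


# Placeholder hierarchy, not taken from the real PLOS thesaurus.
broader = {
    "Applied term": ["Parent term"],
    "Parent term": ["Top term"],
    "Unused term": ["Top term"],
}
pages = terms_with_landing_pages(["Applied term"], broader)
print(sorted(pages))
```

In this toy case "Unused term" gets no Landing Page, because no article is indexed with it and nothing narrower than it is applied.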
Upon completing this phase we had loaded 9336 Subject Area Landing Pages into the PLOS thesaurus, where they could be viewed in the new “foaf:homepage” field (shown here in the MAIstro taxonomy thesaurus management application).
Having this relationship encoded in the thesaurus makes explicit the link between Subject Areas e.g. “Circadian oscillators” and their corresponding Landing Page, e.g. http://journals.plos.org/plosone/browse/circadian_oscillators
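As a rough illustration of that pairing, the sketch below derives a Landing Page URL from a term label. The rule it uses (lower-case the label, join words with underscores) is inferred from the single example above and is an assumption, as is the helper name `landing_page_url`; it makes no claim about how labels with special characters are really handled.

```python
from urllib.parse import quote

BASE_URL = "http://journals.plos.org/plosone/browse/"


def landing_page_url(term):
    """Build a Landing Page URL from a Subject Area label (assumed slug rule)."""
    slug = "_".join(term.lower().split())
    # Percent-encode anything outside the unreserved URL characters.
    return BASE_URL + quote(slug, safe="_")


print(landing_page_url("Circadian oscillators"))
```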
The second step was to generate an export of the thesaurus to be posted in the public domain in a format that is of use to those wishing to capitalize on the content or links. We chose to work with SKOS (“Simple Knowledge Organization System”) format – a standard way to represent knowledge organization systems using the Resource Description Framework (RDF).
The main relationships in this SKOS representation of the PLOS thesaurus are:
<foaf:homepage rdf:resource="Subject Area Landing Page URL"/>
<skos:broader rdf:resource="Broader Term URL"/>
<skos:narrower rdf:resource="Narrower Term URL"/>
And from the original Proof of Concept exercise:
<skos:exactMatch rdf:resource="DBpedia URL"/>
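For anyone wanting to consume these relationships, here is a minimal sketch of reading them with Python's standard library. The surrounding layout (an rdf:RDF root containing skos:Concept elements with a skos:prefLabel) is an assumption about how a SKOS RDF/XML file is commonly arranged, not a description of our actual export, and the example.org URIs are placeholders.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Hand-written sample; only the Landing Page URL comes from the post.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <skos:Concept rdf:about="http://example.org/term/circadian_oscillators">
    <skos:prefLabel xml:lang="en">Circadian oscillators</skos:prefLabel>
    <foaf:homepage rdf:resource="http://journals.plos.org/plosone/browse/circadian_oscillators"/>
    <skos:broader rdf:resource="http://example.org/term/parent"/>
    <skos:exactMatch rdf:resource="http://example.org/dbpedia-placeholder"/>
  </skos:Concept>
</rdf:RDF>"""


def resources(concept, tag):
    """Collect the rdf:resource values of all child elements named `tag`."""
    return [el.get(RDF_RESOURCE) for el in concept.findall(tag, NS)]


def read_concepts(rdf_xml):
    """Map each concept URI to its label and its four kinds of link."""
    root = ET.fromstring(rdf_xml)
    concepts = {}
    for concept in root.findall("skos:Concept", NS):
        concepts[concept.get(RDF_ABOUT)] = {
            "label": concept.findtext("skos:prefLabel", default="", namespaces=NS),
            "homepage": resources(concept, "foaf:homepage"),
            "broader": resources(concept, "skos:broader"),
            "narrower": resources(concept, "skos:narrower"),
            "exactMatch": resources(concept, "skos:exactMatch"),
        }
    return concepts


concepts = read_concepts(SAMPLE)
for uri, data in concepts.items():
    print(uri, data["label"], data["homepage"])
```

A dedicated RDF library would cope with other serializations as well; the standard-library parser is used here only to keep the sketch self-contained.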
The SKOS version of our enhanced thesaurus has now been posted at both the PLOS thesaurus GitHub site and BioPortal. With that, this first attempt at liberating the Subject Area Landing Pages alongside the thesaurus, and at posting a SKOS format as well as the time-honored XML and spreadsheet versions (all on GitHub), has drawn to a close.
As ever, there is still work to do. No well cared-for thesaurus stands still, so although we have a strategy for making additions to the foaf:homepage fields as new Subject Areas are added, we’ll need to establish a checking protocol to handle the implications for terms that are Broader than the additions. Some of these Broader Terms will stand to gain Subject Area Landing Pages as a consequence of having the new Narrower Terms. Other pre-existing but un-applied Subject Areas may also gain Landing Pages as a consequence of improvements to the RuleBase behind the Machine Aided Indexing of our articles.
Additionally, although we are reasonably happy with the structure of the relationships within the SKOS format that has been posted, there are components that may mature in future versions. The process by which the RDF file was generated included a mix of download from MAIstro and manual editing – we hope to hone this to full downloading perfection, with no manual edits necessary, going forward. Plus (so inevitably!) we must resolve a “special characters” problem that caused the Subject Area Landing Page URLs for the few terms containing special characters to drop out.
But in the meantime this work as it stands is a significant step forward. We hope it enables further progress for Linked Open Data, and we look forward to a new round of input and suggestions from those who think about the world, and the knowledge in it, in these terms.
The “we” in this post includes:
• Tony Hammond (Data Architect), Michele Pasin (Product Manager – Knowledge Graph) and Andrew Needham (Ontology Manager) – all from Springer Nature,
• Bob Kasenchak (Director of Product Development) and Mary Garcia (Systems Analyst) from Access Innovations,
• Sahar Aghajani (Software Development Supervisor) and CJ Rayhill (Chief Technology Officer), here at PLOS.