ALM, the Research of Research – Recent Developments

Article-Level Metrics (ALM) capture a broad spectrum of activity on research articles, offering a window into how researchers engage with scientific findings. As ALM matures, momentum builds, and the broader scholarly community joins the conversation, we are beginning to understand what these data mean. One important part of that effort is scholarly research on the metrics themselves. The June 23 altmetrics14 conference, part of the ACM Web Science Conference, is an important venue for presenting and discussing this work. PLOS has participated in three projects to be presented there (abstracts).

Brainstorming community needs for standards & best practices related to altmetrics

Todd Carpenter and Nettie Lagace from the National Information Standards Organization (NISO), together with one of the authors of this post (MF), will present work on Brainstorming community needs for standards & best practices related to altmetrics. This work was done as part of the first phase of the Sloan-funded NISO Alternative Assessment Metrics Project, and is summarized in a white paper that went up for public comment this Monday. The white paper lays out 25 potential action items for further work by NISO and the community, grouped into broad areas such as terminology, use cases, data quality, aggregation, and context. It was written by Martin Fenner (who chairs the NISO Alternative Assessment Metrics Project Steering Group), Todd Carpenter and Nettie Lagace, but captures the views expressed by the community in three in-person meetings, 30 personal interviews and many discussions in the Steering Group. The document is open for public comment through July 18, 2014, and we encourage everyone to contribute their thoughts.

How consistent are altmetrics providers?

Zohreh Zahedi and Rodrigo Costa from the Centre for Science and Technology Studies (CWTS) Leiden, together with one of the authors of this post (MF), investigated the consistency of ALM data across different aggregators in Analysis of ALM data across multiple aggregators: How consistent are altmetrics providers? Study of 1000 PLOS ONE publications using the PLOS ALM, Mendeley and APIs. Building on a similar study by Scott Chamberlain (2013), they found widespread discrepancies between the counts harvested by the different providers (Mendeley and PLOS among them) across a number of sources, with a focus on Mendeley, Facebook and Twitter. The results pose a serious challenge to data validity that needs to be addressed as soon as possible, and the conference is a good venue to start this discussion. In light of this, the authors call for greater transparency of data collection and provenance as well as convergence in the methods of collection.
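To make the idea of "consistency" concrete, here is a minimal sketch of how counts for the same articles could be compared across two providers. It is not the method used in the study. The DOIs, the provider names (`provider_a`, `provider_b`) and the numbers are all made up for illustration.

```python
from statistics import mean

# Hypothetical reader counts for three made-up DOIs, as two different
# aggregators might report them for the same source.
counts = {
    "10.1371/example.0001": {"provider_a": 12, "provider_b": 12},
    "10.1371/example.0002": {"provider_a": 40, "provider_b": 31},
    "10.1371/example.0003": {"provider_a": 0, "provider_b": 5},
}


def consistency_report(counts, left, right):
    """Summarize how often two providers agree and by how much they differ."""
    diffs = [abs(c[left] - c[right]) for c in counts.values()]
    return {
        "articles": len(diffs),
        "exact_agreement": sum(d == 0 for d in diffs) / len(diffs),
        "mean_abs_difference": mean(diffs),
        "max_abs_difference": max(diffs),
    }


report = consistency_report(counts, "provider_a", "provider_b")
print(report)
```

A real comparison would also need to record when each count was harvested, since counts collected on different days can differ even if both providers are correct.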

Wikipedia references across PLOS publications

The third project, from the authors of this post, explores Wikipedia references across PLOS publications. Although the importance of Wikipedia within the scholarly community is growing (Wikipedia is among the top 10 referrers to scholarly articles), little is known about referencing behavior there (how references are made, what types of articles are referenced, and so on). A preliminary view of the data shows that coverage is moderate (on par with science blogs) and international (only half the mentions of PLOS articles are in the English Wikipedia). The pattern of references is distinct from that of popular social networks and other ALM (including usage and citations). Instead, the number of Wikipedia references correlates with the number of active editors for each Wikipedia.


Category: Tech | Leave a comment

Research on the move – new mobile sites for PLOS journals

PLOS is pleased to announce new mobile websites for its suite of influential journals. The mobile journal experience has been optimized for easy browsing on small screens, with a simplified interface that highlights popular and newsworthy content. It features:

  • Prominent article titles and abstracts for quick browsing
  • Condensed article sections to make it easy to get to the right content
  • Flexible display options for search and browse results
  • Streamlined figure views with the option to open and zoom into full resolution figures

PLOS Mobile

As is sometimes necessary, this initial release does not include every bit of functionality you currently see on the full site. For example, while you can read article comments from your phone, posting new comments is not yet supported. We’ve worked hard to include those features researchers are most likely to utilize from a phone, and for content or functionality not yet optimized for phones we’ve provided links to corresponding pages on the full site.

All the articles that PLOS publishes have always been immediately free to access online, distribute and reuse; now they are available to readers wherever they roam. Let us know how you conduct research on the go!


Thesaurus evolution – a case study in “Synthetic biology”

Science does not stand still and neither does the PLOS thesaurus. With more than 10,700 Subject Area terms, we use the thesaurus to index our articles and provide useful links to related papers, enhanced search functions, and, for PLOS ONE (more than 90 articles published every day!), customizable Subject Area-based email alerts and Subject Area landing pages.

Sometimes we decide to renovate a sector of the thesaurus to better reflect the make-up of the PLOS corpus. For example, we’ve long had a Subject Area term for “Synthetic biology,” sitting beneath “Biology and life sciences.” We even have a healthy Synthetic Biology Collection. However, the Subject Area term “Synthetic biology” was being applied to only a handful of articles despite the fact that many more PLOS articles were about synthetic biology and should ideally have been indexed accordingly. Why was this?

Part of the explanation is that ‘synthetic biology’ is not a phrase that is frequently used in natural language. So whereas an article about hypertension may use the word ‘hypertension’ 26 times within the text, an article about synthetic biology might state ‘synthetic biology’ rarely, if at all. This poses a challenge to the Machine Aided Indexing process which assigns Subject Areas to articles based on the frequency of matches in the text.

The way around this is to introduce a level of abstraction to the rulebase that governs the Machine Aided Indexing. The base rules are very literal: “if I see ‘synthetic biology’ in the text I’m going to use the ‘Synthetic biology’ Subject Area term.” But there are additional words and phrases that are diagnostic of synthetic biology topics, such as “biobricks” and “Registry of Standard Biological Parts.” Adding rules for these terms – for example “if I see ‘Registry of Standard Biological Parts’ in the text I’m going to use ‘Synthetic biology’” – increases the frequency of indexing to “Synthetic biology” and thus the retrieval of relevant articles in our searches.
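As an illustration only, the difference between literal rules and added "diagnostic" rules can be sketched in a few lines of Python. This is not the actual Machine Aided Indexing rulebase: the `RULES` table, the `index_article` function and the `min_hits` threshold are assumptions made up for this example.

```python
import re
from collections import Counter

# Hypothetical rulebase: each text phrase points at a Subject Area term.
# The first rule is the literal match; the next two are diagnostic phrases.
RULES = {
    "synthetic biology": "Synthetic biology",
    "biobricks": "Synthetic biology",
    "registry of standard biological parts": "Synthetic biology",
    "hypertension": "Hypertension",
}


def index_article(text, rules=RULES, min_hits=2):
    """Return Subject Area terms whose rules match at least min_hits times."""
    lowered = text.lower()
    hits = Counter()
    for phrase, term in rules.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits[term] += len(re.findall(pattern, lowered))
    return {term: n for term, n in hits.items() if n >= min_hits}


sample = (
    "We assembled the circuit from BioBricks taken from the "
    "Registry of Standard Biological Parts. The BioBricks were sequenced."
)
literal_only = {"synthetic biology": "Synthetic biology"}

print(index_article(sample, literal_only))  # nothing matches the literal rule
print(index_article(sample))                # diagnostic phrases add up
```

The sample text never says "synthetic biology," so the literal rule alone assigns nothing, while the diagnostic phrases together are enough to assign the term.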

A second factor has to do with the hierarchical structure of the thesaurus, which matters all the more because our search functionality is designed to use this hierarchy. For example, a Subject search for “Vascular medicine,” beneath which Hypertension sits, retrieves articles indexed specifically with Hypertension, even if they have not been explicitly tagged with “Vascular medicine.” In earlier versions of the PLOS thesaurus “Synthetic biology” had no narrower terms, which limited how useful it was for retrieving relevant articles. We therefore reviewed essays about synthetic biology, scope descriptions from relevant institutional and departmental web sites, and proceedings from synthetic biology conferences, all in light of the content of our articles, and introduced new, narrower terms to sit beneath our existing “Synthetic biology” where that made sense. So we went from having the single “Synthetic biology” term to the new structure of 30 terms in one renovation. Here is what we have now:


Much of the evolution of the PLOS thesaurus is gradual, as for example when we realised that “puma” can be used as an abbreviation for “p53 upregulated modulator of apoptosis” as well as a kind of big cat, or learned that asteroids can be starfish. Dealing with these indexing missteps requires small-scale changes to specific rules. But sometimes the change needs to be more radical. Our new “Synthetic biology” sector was implemented in Ambra 2.9.12 (released March 26th, 2014). Where previously only a handful of articles were indexed with “Synthetic biology,” a Subject search across all PLOS journals now retrieves over 400 “Synthetic biology” articles, which is much more fitting for this important and developing field.
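Why narrower terms help retrieval can be shown with a small sketch of a hierarchy-aware Subject search. This is an illustration under stated assumptions, not the Ambra implementation: the narrower terms listed under “Synthetic biology,” the article ids and the function names are invented for the example.

```python
# Hypothetical slice of a thesaurus: broader term -> narrower terms.
NARROWER = {
    "Vascular medicine": ["Hypertension"],
    "Synthetic biology": ["Synthetic genomics", "Synthetic gene circuits"],
}

# Hypothetical index: article id -> Subject Area terms assigned to it.
ARTICLE_TERMS = {
    "article-1": {"Hypertension"},
    "article-2": {"Synthetic gene circuits"},
    "article-3": {"Vascular medicine"},
}


def expand(term, narrower=NARROWER):
    """Return the term together with all of its descendants."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(narrower.get(current, []))
    return found


def subject_search(term, article_terms=ARTICLE_TERMS):
    """Return articles indexed with the term or any narrower term."""
    wanted = expand(term)
    return sorted(a for a, terms in article_terms.items() if terms & wanted)


print(subject_search("Vascular medicine"))
print(subject_search("Synthetic biology"))
```

In this sketch an article tagged only with a narrower term is still found by a search for the broader one, so every narrower term added beneath “Synthetic biology” widens what a search for it can retrieve.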

For more about the work PLOS is doing with Synthetic biology see “An Invitation to Contribute to the Second Life of the Synthetic Biology Collection.”


Getting to CrossMark

This week, we launched our participation in CrossRef’s CrossMark program. It’s an exciting step for PLOS, and getting there was a learning experience we hope you’ll find interesting.

The Program

CrossMark is a service of CrossRef that is gaining traction among scholarly publishers, with more than 30 publishers to date, and nearly half a million scholarly documents. The purpose of the CrossMark logo appearing on article pages is to give researchers a consistent way to know the status of any article, from any participating publisher. When someone clicks the CrossMark logo from either the online version of the article, or the PDF, they see a popup like this one. It indicates that either the article is up to date, or that updates are available.


It’s clear that the CrossMark service is valuable for keeping content current, which assists the integrity and completeness of the scholarly record. It’s also worth highlighting that we’d like our initial CrossMark participation to be the first step toward additional exciting uses in the future. We could extend our CrossMark usage to…

  • support article versioning
  • display FundRef info
  • display info about our peer review process
  • link to related data
  • experiment with threaded publications
  • …and more

The Journey

Getting from “we want to participate in CrossMark” to “the CrossMark logo is live” was a process that took time. Seven months, if you want to know the truth! Don’t let that scare you if you’re a publisher interested in kicking off your own CrossMark participation. The main reason it took us 7 months is that we bundled the CrossMark initiative into a larger corrections handling overhaul, which included a massive data migration effort. Anyone who has been through one of these will tell you the same thing: data migrations are not for the faint of heart. And in retrospect, this bundling of initiatives was a decidedly un-Agile way to go.

So the overall initiative included overhauling our corrections handling process, which meant switching systems for inputting and publishing correction notices. This new process required system development, which in turn required documentation, training, and hands-on practice for a pretty big chunk of our staff. And then there was the data migration effort, which took a long time on its own. (None of this part of the initiative included our CrossMark program implementation.)

Then, we tackled the CrossMark piece, which was fairly straightforward in the scheme of the overall project. We added the CrossMark logo to articles: the CrossMark logo now appears on every PLOS article page on our journal sites, and on the downloadable PDFs for all newly-published articles going forward. And we updated our deposit toolchain to include the CrossMark metadata. But there were a few complications, because of the aforementioned data migration.

First, we chose to create a back-deposit of CrossMark data for our entire corpus. Over ten years of publishing equals somewhere around 110,000 articles, as well as over 3,000 migrated corrections. Naturally, things change over time. How does a person get a grasp of the minor differences between article XML generated over ten years? You can look at a few files from various periods in each year, but that’s just barely scratching the surface. You still have no clear idea of what might actually be different. A metaphorical needle in a gigantic digital haystack. So we wrote some XSL transforms, threw the whole lot at ’em, and temporarily kicked some cans down the road. We figured we’d let CrossRef’s submission results tell us if something was wrong. After sending off 110,000+ XML files (with a slight chuckle) and letting the script run for about twelve hours, we had a pretty decent success rate. After some slight tweaking, the rest were good to go as well.
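For anyone facing the same haystack, one way to get a first overview of structural drift is to list which element paths appear in which years. To be clear, this is not what we did (we relied on XSL transforms and CrossRef's submission results); it is a hedged sketch using only the Python standard library, with two tiny made-up documents standing in for real article XML.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def element_paths(xml_text):
    """Return the set of element paths (e.g. 'article/front/title')."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def paths_by_group(documents):
    """Map each element path to the groups (here: years) where it occurs."""
    seen = defaultdict(set)
    for group, xml_text in documents:
        for path in element_paths(xml_text):
            seen[path].add(group)
    return seen


# Made-up documents; real article XML would be read from files instead.
docs = [
    (2004, "<article><front><title>A</title></front></article>"),
    (2013, "<article><front><title>B</title><funding/></front></article>"),
]

all_groups = {group for group, _ in docs}
for path, groups in sorted(paths_by_group(docs).items()):
    if groups != all_groups:
        print(path, "only in", sorted(groups))
```

A report like this only shows which elements come and go over time. It says nothing about changes in attribute usage or content conventions, so it complements a trial deposit and does not replace one.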

Dealing with back-deposits for our migrated corrections was a bit dirtier, and required a little more clean-up. First they had to be re-formatted for display on our website in their new form, and then mined for the needed CrossMark deposit information before sending the XML off for deposit (thanks for that .jar file, CrossRef!). The vast majority of the work was accomplished with a small toolset: some .jar files (provided by CrossRef) and some XSLT files did most of the heavy lifting, though how you compile and prepare your corpus could vary from ours.

And now a few words about article PDFs for our CrossMark program. As we mentioned, the CrossMark logo appears on PDFs for articles we publish going forward. We chose to back-update the online versions of our articles to include full CrossMark functionality, but we decided not to update the 110,000+ downloadable PDFs for previously-published articles. It was a decision based more on our unique volume situation, and less on the process of updating the PDFs. The marking and stamping process is simple, once you have it set up. But we decided that the testing and remediation challenges associated with replacing 110,000+ active PDFs were too much to take on at this time. CrossRef leaves it up to each publisher whether to fully update its corpus or to start participating in CrossMark from a given date onward. We took a bit of a hybrid approach because we chose to add CrossMark functionality to all HTML articles, but only to PDFs for newly-published articles.

So there you have it! Overall, getting to CrossMark turned out to be a bit more of a journey than we anticipated, but we have arrived, and we’re glad we took the trip. We hope this post is useful to any of you who may be considering kicking off a CrossMark participation program of your own.


Feedback Wanted: Publishers & Data Access

Short Version: We need your help!

We have generated a set of recommendations for publishers to help increase access to data in partnership with libraries, funders, information technologists, and other stakeholders. Please read and comment on the report (Google Doc), and help us to identify concrete action items for each of the recommendations here (EtherPad).


Background and Impetus

The recent governmental policies addressing access to research data from publicly funded research across the US, UK, and EU reflect the growing need for us to revisit the way that research outputs are handled. These recent policies have implications for many different stakeholders (institutions, funders, researchers) who will need to consider the best mechanisms for preserving and providing access to the outputs of government-funded research.

The infrastructure for providing access to data is largely still being architected and built. In this context, PLOS and the UC Curation Center hosted a set of leaders in data stewardship issues for an evening of brainstorming to re-envision data access and academic publishing. A diverse group of individuals from institutions, repositories, and infrastructure development collectively explored the question:

“What should publishers do to promote the work of libraries and IRs in advancing data access and availability?”

We collected the themes and suggestions from that evening in a report: The Role of Publishers in Access to Data. The report contains a collective call to action from this group for publishers to participate as informed stakeholders in building the new data ecosystem. It also enumerates a list of high-level recommendations for how to effect social and technical change as critical actors in the research ecosystem.

We invite the community to comment on this report. Furthermore, the high-level recommendations need concrete details for implementation. How will they be realized? What specific policies and technologies are required? We have created an open forum for the community to contribute ideas, and we will incorporate the resulting suggestions into a final report for publication. Please participate in this collective discussion with your thoughts and feedback by April 24, 2014.

We need suggestions! Feedback! Comments! From Flickr by Hash Milhan


(Blog post is cross-posted on Data Pub)


The Mystery of the Missing ALM

What is ten years old and has 1.2 billion interactions a day?

No, it’s not the Large Hadron Collider. Nor is it a philosophical riddle, but rather a commonly known fact in the tech world (news here). Facebook, the largest social network, is bigger than LinkedIn, Twitter, and Google+ combined. At the same time, it is largely absent from the scholarly communications conversation, which focuses on Twitter and Mendeley.



Disambiguation with ORCID at PLOS

Here at PLOS, linking authors and publications remains a persistent problem. Trying to find that one paper by “John Kim” that you wanted to reference? The prospect of wading through numerous “John Kims” can seem daunting. Searching PLOS ONE for John Kim as an author returns four different results. Combing through search results and DOIs is not ideal. Throw ambiguous reviewer and academic editor identities into the mix and things only get worse.

Enter Open Researcher & Contributor ID (ORCID). ORCID provides a persistent digital identifier that distinguishes you from every other researcher and, through integration in key research workflows such as manuscript and grant submission, supports automated linkages between you and your professional activities ensuring that your work is recognized. The ORCID service launched in October 2012, and to date more than half a million researchers have registered for an ORCID identifier. During that time, a number of integrations were launched that supported the visibility and adoption of ORCID identifiers by the community, and we at PLOS are proud to be a platinum sponsor of ORCID and one of the more than 100 organizations who have become ORCID members.


ORCID integration chart at

Integration with the ORCID service can happen in a variety of ways, and that of course depends on the type of organization, e.g. publisher, academic institution or funder. As a publisher, the following integrations are important at PLOS:

  1. allow authors to provide their ORCID identifier when they submit a manuscript;
  2. allow authors and other contributors to link their ORCID identifier to their PLOS profile page;
  3. include the ORCID identifiers for authors in the manuscript metadata sent to services such as CrossRef and PubMed;
  4. import and verify the authorship claims made by authors in their ORCID profile for already published PLOS papers (and again send this information to CrossRef and PubMed);
  5. import other information from the ORCID service into the PLOS profile such as affiliation, publications and other research outputs not published with PLOS, and grant information.

In September 2013 PLOS announced integration #1 allowing authors to include their ORCID identifier with manuscript submissions. Today we are happy to announce integration #2, allowing everyone with a PLOS account to link their account to their ORCID identifier. This is done via the OAuth protocol and works via a three-step authentication process:

  • First, we (PLOS) authenticate ourselves with ORCID
  • Second, you (the user) authenticate yourself with ORCID, and grant PLOS the authority to read data from your ORCID profile on your behalf.
  • Finally, PLOS queries ORCID for metadata about you (the user).
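The three steps above follow the general shape of an OAuth 2.0 authorization-code flow, which can be sketched as below. This is not our production code. The endpoint URLs and the `/authenticate` scope are assumptions that should be checked against the ORCID API documentation, and the client id, secret, redirect URI and function names are hypothetical. The sketch only builds the requests and makes no network calls.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed endpoints and scope; verify against the ORCID API documentation.
AUTHORIZE_URL = "https://orcid.org/oauth/authorize"
TOKEN_URL = "https://orcid.org/oauth/token"
SCOPE = "/authenticate"


def authorization_url(client_id, redirect_uri, state):
    """URL where the user signs in at ORCID and grants the client access."""
    query = urlencode({
        "client_id": client_id,
        "response_type": "code",
        "scope": SCOPE,
        "redirect_uri": redirect_uri,
        "state": state,  # echoed back so the client can match the response
    })
    return f"{AUTHORIZE_URL}?{query}"


def token_request_body(client_id, client_secret, code, redirect_uri):
    """Form body the client POSTs to TOKEN_URL to swap the code for a token."""
    return urlencode({
        "client_id": client_id,
        "client_secret": client_secret,  # this is how the client authenticates
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    }).encode("ascii")


def api_headers(access_token):
    """Headers for later requests made on the user's behalf."""
    return {"Authorization": f"Bearer {access_token}", "Accept": "application/json"}


# Hypothetical values, for illustration only.
url = authorization_url("APP-EXAMPLE", "https://example.org/callback", "xyz123")
params = parse_qs(urlparse(url).query)
print(url)
print(params["scope"])
```

The authorization URL is where the user grants access, the token request is where the client proves its own identity with its secret, and the bearer token is what the client would then present when reading profile data.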


Authenticating with ORCID and allowing PLOS to read profile information. 

At any point in time our users will have the authority to revoke this access by de-linking their PLOS account from the ORCID service, again via OAuth.


PLOS account with linked ORCID identifier. 

Linking the PLOS account with the ORCID identifier is a very important integration step and we encourage all users to update their accounts. And if you don’t have an ORCID ID yet, you can easily create an ORCID ID in the process.

As you can see from the list of needed integrations above, there is still a lot of work to do. Not only do we have to encourage everyone to add their ORCID ID, but we also have to exchange this information with external services. It is not enough that PLOS knows what ORCID identifiers are associated with what PLOS publications, but this information has to be shared with external services such as CrossRef, PubMed, commercial indexing services such as Web of Science and Scopus, and of course ORCID. Additionally, PLOS wants to know about the authorship claims to PLOS papers made in other places, e.g. the ORCID registry.

As of last Friday, there were 551,203 live ORCID IDs, but only 121,529 (about 22%) had at least one linked work (e.g. a publication). Do your part: the more works that are linked to ORCID IDs, the richer the ORCID profiles become (associated publications, affiliation and, coming soon, grants), and the greater the assurance that you are successfully disambiguated and recognized for your hard work.


One Step Closer to Article-Level Metrics openly available for all Scholarly Content


Since PLOS embarked on the Article-Level Metrics (ALM) work in 2009, we have always imagined a future in which ALMs would be freely available regardless of publisher. Metrics would be compiled to facilitate comparisons between articles and add even greater value to the scholarly community.

PLOS built the ALM application to handle our own substantial publishing enterprise, and it has been running well for us for almost exactly 5 years now. We made the ALM software available as open source software in 2011, and last year a number of publishers started to use it for their own journals. With more publishers expressing interest in collecting and displaying the data, and with ongoing efforts to discuss altmetrics standards and best practices (in particular the NISO Altmetrics Initiative), we have seen the discussion shift from WHAT (are ALMs and should we care?) to HOW (can we implement ALM?).

As part of this increased interest in ALM, today we are launching the ALM community site. This site brings together a lot of content previously available in a number of separate places, and we plan to add more in the coming months. We particularly like the Examples section, which showcases ALM visualizations done with d3.js and R, with source code and data openly available to make it easier for people to get started using ALM data.

With the launch of the CrossRef Labs ALM application today, we get another big step closer to that goal. For the first time, the latest ALMs will be available for journal articles and other content items (book chapters, data, theses, technical reports) from thousands of scholarly and professional publishers around the globe. The publications span the entire spectrum of scholarly research, including life sciences, physical sciences, humanities, social science, etc. As this CrossRef Labs experiment is just getting started, it will take a couple of months to begin collecting data for the 11 million+ publications from 2010 onwards that are currently included. But there’s no need to wait until then. We encourage everyone to start using the data that have already been harvested, which are expanding to cover more of CrossRef’s collection on a daily basis. All data are freely available online or via API for customized, bulk requests.

We invite everyone across the scholarly research ecosystem – researchers, bibliometricians, institutions, funders, librarians, technology providers, etc. – to think big with us. Now that we have systematic data about the activity surrounding research publications, how do we turn it into useful information for discovery and evaluation? This is a work in progress, not yet a formal service. But the launch of the ALM application by CrossRef Labs is a monumental step towards making ALMs an underlying and integral part of the infrastructure that supports the research process and facilitates its progress.

When testing out the CrossRef Labs ALM application, please keep in mind that this is a lab experiment. Not only are there some differences in the available data sources for a variety of reasons, but the number of articles managed by the CrossRef Labs ALM application is larger by orders of magnitude. So please be patient as you try out this new resource. As an open source project, it is under active development, so look out for continued improvements and new types of article activity coming soon.


A new way to explore figures

We’re thrilled to announce another improvement to our journal websites at PLOS. Head over to your favorite article and open any figure to see our brand new figure lightbox that makes it easy to explore the details of very large figures. I recommend this fantastic article about thresher sharks.

You’ll notice that figures open to take advantage of your full browser window. If you use a large monitor, maximize your browser window to see the new lightbox in full effect.




For large, detailed figures you can now zoom in and pan around to study every data point or nuanced tail slap.




The full caption is still easily accessible in the lower right, while simple navigation options will help you quickly skim through a figure-heavy paper.




And to keep things snappy, the lightbox will initially load a smaller version of the figure while the full resolution image loads in the background. Once the larger version is fully loaded, it will automatically replace the smaller version, enabling the zoom and pan options mentioned above.

We’ve also added some new functionality to help you quickly assess whether an article is relevant to your research. When you’re browsing the PLOS ONE homepage, subject area browse pages, figure-based search results, or monthly issue pages, you can launch the figure lightbox for research articles without first loading the full article page. From there, you can toggle between an abstract view, the figure browser, or a reference view using the links in the upper right corner, enabling you to quickly judge whether an article is worth further reading.

We hope you find this to be helpful to your research process. As always, we’d love to hear your feedback–feel free to write us or leave your thoughts as a comment below.


Connecting all the dots – looking at the research paper & beyond


Research catalyzes research, which catalyzes more research…  



At PLOS, we’re connecting the dots between our publications and the plethora of outside content discussing the research. On every article, we now showcase the rich tapestry of activity surrounding the research paper. These items are pulled together by our staff and shared collectively alongside the article. We surface stories about the research from news media, blog commentary, and a diverse set of other contexts. As a whole, these stories enrich the research narrative and offer a vivid view into broad scholarly and public activity surrounding the article.

