Latest Developments in PLoS Article-Level Metrics

PLoS continues to expand and refine Article-Level Metrics (ALM). This suite of performance measures (including usage statistics, citations, trackbacks from blogs, bookmarks, social media coverage, and user comments and ratings) is available on every PLoS article so that authors and the scientific community can assess the impact of the research. We are also broadening our outreach activities to spread the word on ALM to more researchers, technical experts, other publishers, funders, and institutions.

A key part of the current effort is to convene scholarly metrics thought leaders to help spearhead the widespread adoption of ALM. By engaging leading authorities in metrics, and bringing them together in a working group, PLoS can better coordinate the development of ALM. The following experts serve on the ALM Technical Working Group in an advisory role to help steer the direction of PLoS ALM implementation:

  • Pedro Beltrao, University of California at San Francisco
  • Phil Bourne, University of California at San Diego
  • Bjoern Brembs, Freie Universität Berlin
  • Martin Fenner, PLoS
  • Duncan Hull, European Bioinformatics Institute
  • Cameron Neylon, Science and Technology Facilities Council Oxford
  • Heather Piwowar, NESCent, Duke University
  • Jason Priem, University of North Carolina at Chapel Hill
  • Dario Taraborelli, Wikimedia Foundation
  • Jevin West, University of Washington
  • Johan Bollen, Indiana University

Starting May 16, 2012, Martin Fenner will join PLoS as the ALM Technical Lead. Martin announced the news on his PLoS blog Gobbledygook last month. Martin has been a leading advocate of scholarly metrics and has worked on several article usage applications. As ALM Technical Lead, he will not only help with the development of the PLoS ALM application but will also lead developer outreach for the project.

Further information on all these developments is available on a new ALM website launched today. Please take a moment to check it out. We also encourage you to visit our journals and to view ALM by clicking on an article metrics tab.

Category: Publishing

New Hope – The New Platform for the PLoS Journal Websites

After five years of hosting the PLoS journals on Topaz, the PLoS development team decided earlier in the year that it was time to re-think the platform for the next five years. They came up with a new architecture, named New Hope, which leverages best practices in developing enterprise platforms, a private “cloud” of virtual servers and a distributed file system that contains multiple copies of site content.

This new environment is scalable enough to support the future growth of the journals, flexible enough to store any type of data or content, built to minimize downtime, and much easier to develop new features on. Best of all, it makes the journal websites perform much faster.

The migration to New Hope took place over a three-day period in November, and New Hope officially went into production on November 14. This migration was the culmination of months of development and testing. It was completely seamless and users experienced no downtime!

Three weeks after the migration to New Hope, we can show that the new platform really did enhance our journals’ performance. For example:

  • Average load time of this PLoS ONE article went from 4 seconds to 0.8 seconds.
  • Nightly indexing of article data from Mulgara to Solr used to take 3-6 hours. From MySQL to Solr, the indexing now takes 24 minutes.
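
The figures above can be turned into speedup factors with a few lines of arithmetic. This is only a sketch built from the numbers quoted in this post; the function name `speedup` is ours for illustration and is not part of any PLoS tooling.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Return how many times faster the 'after' measurement is."""
    return before_seconds / after_seconds


HOUR = 3600
MINUTE = 60

# Article load time: 4 seconds before, 0.8 seconds after.
page_speedup = speedup(4.0, 0.8)

# Nightly indexing: 3-6 hours before (Mulgara to Solr),
# 24 minutes after (MySQL to Solr).
index_speedup_low = speedup(3 * HOUR, 24 * MINUTE)
index_speedup_high = speedup(6 * HOUR, 24 * MINUTE)

print(f"Page load: {page_speedup:.1f}x faster")
print(f"Indexing: {index_speedup_low:.1f}x to {index_speedup_high:.1f}x faster")
```

In other words, the article page loads about five times faster, and nightly indexing runs somewhere between 7.5 and 15 times faster, depending on which end of the old 3-6 hour range you compare against.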

Warmest congratulations to the New Hope development team (and our intrepid Linux systems administrator) for building a streamlined new home for the PLoS Journals!

Category: Technology

PLoS Website Maintenance – December 21 5am-7am PST

Due to scheduled maintenance at our co-location facility, all PLoS websites will be intermittently unavailable on December 21 between 5:00am – 7:00am PST. During the downtime, you can access the PLoS journal archives via PubMed Central.

Category: Technology

Maintenance for PLoS Journal Websites on September 29 at 6pm PDT

We will update the PLoS journal websites with the Ambra 1.2 release tonight along with server upgrades. The PLoS journal websites will be down from approximately 6:00pm PDT to 7:00pm PDT. During this time, the journal websites will display a site maintenance page directing users to PubMed Central. The Ambra 1.2 release is a maintenance release to resolve some of the slow page loads that have been occurring over the past two weeks.

Category: Technology

Update of the PLoS Journal Websites to Ambra 1.1

We will update the PLoS journal websites with the Ambra 1.1 release tonight along with server upgrades. The PLoS journal websites will be down from approximately 7:30pm PDT to 9:30pm PDT. During this time, the journal websites will display a site maintenance page directing users to PubMed Central.

The features implemented in Ambra 1.1 include:

  • Support for NLM DTD 2.3 and new XSL stylesheets. The single XSL stylesheet has been split into two stylesheets: a generic XSL stylesheet to handle NLM DTD 2.3 and an XSL stylesheet specifically for Ambra.
  • Support for HTML iframes for advertising blocks.
  • Caching of CrossRef search results on the “Find this Article Online” page.
  • Fix to author and editor search facet sorting.
  • Fix to search results for author affiliation.
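
To illustrate the general idea behind caching remote search results, here is a minimal time-to-live cache sketch in Python. It is not Ambra's implementation; the class `TTLCache`, the function `fake_crossref_search` and the one-hour lifetime are all assumptions made up for this example, and no real CrossRef call is made.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """Cache results of an expensive lookup for a fixed time-to-live."""

    def __init__(self, fetch: Callable[[str], List[str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, key: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # still fresh: skip the remote call
        value = self._fetch(key)     # missing or expired: look it up again
        self._entries[key] = (now, value)
        return value


# Hypothetical stand-in for a remote lookup; it only records each call.
remote_calls: List[str] = []


def fake_crossref_search(title: str) -> List[str]:
    remote_calls.append(title)
    return [f"result for {title}"]


cache = TTLCache(fake_crossref_search, ttl_seconds=3600)
first = cache.get("An example article title")
second = cache.get("An example article title")  # served from the cache
print(len(remote_calls))
```

The second request for the same title is answered from memory, so the remote service is contacted only once until the entry expires.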

Category: Technology

PLoS ALM Data in Google Fusion Tables

Google Fusion Tables is a new Google Labs endeavor that allows people to upload data tables from spreadsheets for sharing and visualizing data online. Google provides the Fusion Tables API for programmatic access to the data content. The PLoS Article Level Metrics data from May 18, 2010 was uploaded to Google Fusion Tables and is publicly available.

The Google Fusion Table links to the PLoS Article Level Metrics data are:

You can compare this data to the PLoS ALM data that was uploaded to Many Eyes in October 2009 and put into some nice visualizations by Mike Chelen.

Category: Technology

Colo Move and Journal Site Upgrades

Two big events happened recently for the I.T. team. We moved the PLoS production servers to a new colo facility and upgraded the journal websites to the latest Ambra release.

Josh, Russ and I recently moved all of our production servers from UnitedLayer to Internet Systems Consortium (ISC) in Redwood City. Russ implemented a failover stack for the journal websites so there was no site outage during the move. Unfortunately, once the servers were moved, things did not go as smoothly as planned due to server malfunctions and network configuration issues. We were able to switch the sites from the failover stack on Sunday, May 23, but needed a few more days and some brief site outages to deal with cleanup. Josh was almost given his own office at ISC because he was onsite for a number of days. The dust has settled. We still have a few things to do at the new colo, but we don’t expect any further site outages.

The journal websites were updated last night to the release of Ambra 1.0 (“Babbage”). This release contains the new and improved search UI for advanced search and the search results. Liz has a nice writeup on the new features. If you haven’t seen the new UI yet, head over to your favorite PLoS journal and do a simple and advanced search.

The sites were slow for a few hours after the upgrade because a Yahoo crawler bombarded our sites, completely ignoring the robots exclusion standard (robots.txt) that we have in place. Upgrades require clearing a cache that holds ~1 million objects (mostly images) and it has to slowly refill over a period of ~24 hours. The sites can normally handle this traffic when the cache is filled, but the combination of the cache just starting to refill and the Yahoo crawler slamming the sites at the same time caused the sites to slow down. We have blocked the Yahoo crawler but will re-enable it once the cache fills and/or Yahoo responds to our complaints about their crawler.
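
For readers unfamiliar with the robots exclusion standard, the sketch below shows the check a well-behaved crawler is expected to make before requesting a page, using Python's standard `urllib.robotparser` module. The rules, the crawler names and the paths are invented for illustration and are not the actual PLoS robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, not the real PLoS robots.txt.
ROBOTS_TXT = """\
User-agent: ExampleCrawler
Disallow: /

User-agent: *
Disallow: /search/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A crawler that is shut out entirely.
blocked = parser.can_fetch("ExampleCrawler", "/article/123")

# Any other crawler may fetch articles but not search pages,
# and is asked to wait between requests.
article_ok = parser.can_fetch("OtherBot", "/article/123")
search_ok = parser.can_fetch("OtherBot", "/search/advanced")
delay = parser.crawl_delay("OtherBot")

print(blocked, article_ok, search_ok, delay)
```

The standard is purely advisory: nothing in robots.txt stops a crawler that chooses not to run a check like this, which is why a misbehaving crawler has to be blocked on the server side instead.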

The journal websites are stable now and barring more voodoo doll shenanigans, we shouldn’t have any other unplanned site outages or slowdowns.

Category: Technology

A new search server is powering the PLoS journal websites

Last night, we hooked up a new search server to the PLoS journal websites and now search queries are returning results at blazing fast speeds. Now that the new search server is in place, we’re working on a complete UI overhaul for search. We should complete the development in a few weeks and launch the much-improved search UI in May.

The new search server uses Solr, an open source search server originally developed by CNET Networks, with the Lucene search engine. We moved search from the Mulgara triple-store to its own server, which improved the performance of search dramatically. In fact, the new search server has improved the overall performance of the PLoS journal websites by a factor of ten. Articles that used to take 2.7 seconds to appear now take just 0.27 seconds.
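
As a rough illustration of how a client talks to a Solr server, the sketch below builds a query URL for Solr's standard `select` handler using the common `q`, `rows` and `wt` parameters. The host, port and path are placeholders for a default local install and say nothing about how the PLoS search server is actually deployed; the code only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode

# Placeholder address for a local Solr instance (an assumption).
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_solr_query(text: str, rows: int = 10) -> str:
    """Build a select URL asking Solr for JSON results."""
    params = {
        "q": text,      # the search terms
        "rows": rows,   # how many results to return
        "wt": "json",   # response format
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


url = build_solr_query("article level metrics")
print(url)
```

Because search runs as a separate HTTP service, the journal web application only needs to issue requests like this one rather than querying the triple-store directly.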

The new search server also allows us to add a lot of new features to the advanced search form and to the search results page. The next release of Ambra will add a query builder for power searches and a greatly improved search results page including article usage statistics from our Article Level Metrics project. The PLoS development team is hard at work on these new features which we will launch in May.

Category: Technology

Details on Outage and Recovery of PLoS Journal Websites

UnitedLayer, our colocation facility for the production servers, experienced an outage yesterday. From UnitedLayer: “A series of power brownouts occurred today at 2:56 PM PST due to PG&E instability related to the recent storms. Our 300KVA UPS system is not working as designed, the temporary repairs from last week did not hold. We anticipate a faulty motherboard.”

A number of our servers (all powered by the 300KVA UPS) lost power at that time. Our large disk array (2TB of storage), which is the file server for both Fedora and Mulgara, had a boot failure and refused to power up. Russ went to the colo and restarted the disk array, which went into an automatic rebuild of the disks. This took about three hours to complete. Russ then started a program that checks for disk consistency and repairs any problems in the drives. This program was still running at 8pm, and any recovery would have had to wait until it ended (many more hours). To speed up recovery, we decided to stop the program, format the drives and restore Fedora and Mulgara from a previous backup.

We estimated that it would take ~3.5 hours to restore Fedora from backup. It took ~5.5 hours. Once complete, Russ brought the systems back online at ~2:50AM PST. Big thanks to Russ for babysitting the file server the whole day/night and for bringing up the system after the backups completed.

This is the first time that we have had a major hardware malfunction in the large disk array and the first time we have had to restore from a backup. While the disaster recovery plan worked, it took much longer than expected due to the size of the Fedora storage. We will look into solutions that enable a quicker disaster recovery. We are also meeting with UnitedLayer to discuss mitigation options.

Category: Technology

PLoS Journals Outage – January 19, 2010

The PLoS journal websites are experiencing an outage due to a hardware malfunction after our colocation facility (UnitedLayer) experienced a brownout due to lightning. We’re working to resolve the problem and hope to have the journal websites back online soon.

Category: Technology