Using d3.js to visualize Article-Level Metrics over time

PLOS Article-Level Metrics (ALM) are a great dataset (available via an API and as a monthly data dump) for data visualizations. I have recently become a big fan of the d3.js JavaScript library, and have now used d3 to look at some ALM data over time.



Category: Thoughts

New version of Article-Level Metrics app released

On Tuesday we released the latest version of the PLOS Article-Level Metrics application. As always, the source code is available on GitHub. The changes in this version focus on improving API performance, making the application easier to install, and adding RSS feeds for the most popular articles by source and publication date (e.g. the most tweeted papers published in the last 7 days). See the GitHub Wiki page for more details; in the Wiki you will also find the development roadmap and the issue tracker for feature suggestions.
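A feed such as "the most tweeted papers of the last 7 days" boils down to a filter and a sort. Here is a minimal Python sketch, assuming each article carries a publication date and a per-source event count; the `Article` class and `most_popular` function are hypothetical and not taken from the ALM application, which is written in Ruby on Rails:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Article:
    doi: str
    published: date
    counts: dict  # source name -> number of events, e.g. {"twitter": 12}


def most_popular(articles, source, days, today, limit=10):
    """Articles published in the last `days` days, most events for `source` first."""
    cutoff = today - timedelta(days=days)
    recent = [a for a in articles if a.published >= cutoff]
    # Articles without data for this source count as zero events.
    recent.sort(key=lambda a: a.counts.get(source, 0), reverse=True)
    return recent[:limit]
```

In a feed, `today` would be the current date and `source` a name such as `"twitter"`; a real implementation would of course run this as a database query rather than in memory.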

Category: Snippets

The Price of Innovation – my Thoughts for Beyond the PDF

The Beyond the PDF Conference is currently taking place in Amsterdam. Unfortunately I am unable to attend in person this time (I took part in the first Beyond the PDF in January 2011), but I was watching the livestream of the Business Case panel discussion yesterday afternoon.



Category: Thoughts

Bye-bye Google Reader

Yesterday Google announced that it will shut down Google Reader on July 1. In a way this announcement didn’t surprise me, as my own use of RSS readers has declined in favor of news readers such as Flipboard and of Twitter as a discovery tool. Built-in support for RSS had also slowly been deprecated in web browsers such as Firefox (version 4, 2011) and Safari (version 6, 2012).



Category: Snippets

Additional Markdown we need in Scholarly Texts

Following up on my post last week, below is a suggested list of features that should be supported in documents written in scholarly markdown. Please provide feedback via the comments, or by editing the Wiki version I have set up here. The list covers features that go beyond the standard markdown syntax.

The goals of scholarly markdown are:

  1. to support writing complete scholarly articles,
  2. to keep the syntax no more complicated than it is today, and
  3. to avoid relying on HTML as the fallback mechanism.

In practice this means that scholarly markdown should support most, but not all, scholarly texts – documents that are heavy in math formulas, have complicated tables, etc. may be better written with LaTeX or Microsoft Word. It also means that scholarly markdown will probably contain only limited semantic markup, as this is difficult to do with a lightweight markup language and much easier with XML or a binary file format.

Cover Page

Optional metadata about a document. Typically used for title, authors (including affiliation), and publication date, but should be flexible enough to handle any kind of metadata (keywords, copyright, etc.).
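One possible cover-page syntax is a block of `Key: value` lines at the top of the file, similar to MultiMarkdown metadata. Below is a minimal sketch of a parser for such a header, assuming that a blank line ends it, that indented lines continue the previous value, and that repeated keys (e.g. several authors) are collected into a list; the function name and these rules are assumptions for illustration:

```python
def parse_metadata(text):
    """Split text into (metadata, body).

    Assumed header format: 'Key: value' lines at the top, ended by the first
    blank line. Indented lines continue the previous value, and repeated keys
    (e.g. several authors) are collected in a list.
    """
    lines = text.splitlines()
    meta = {}
    last_key = None
    for i, line in enumerate(lines):
        if not line.strip():
            # A blank line ends the header; everything after it is the body.
            return meta, "\n".join(lines[i + 1:])
        if line[:1] in (" ", "\t") and last_key is not None:
            # Indented line: continuation of the previous value.
            meta[last_key][-1] += " " + line.strip()
            continue
        key, sep, value = line.partition(":")
        if not sep:
            # Not a metadata line, so the document has no cover page.
            return {}, text
        last_key = key.strip().lower()
        meta.setdefault(last_key, []).append(value.strip())
    # The whole document was header.
    return meta, ""
```

Keeping every value as a list makes multi-author documents work without special cases, and arbitrary keys (keywords, copyright, etc.) need no changes to the parser.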

Typography

Scholarly markdown should support superscript and subscript text, and provide an easy way to enter Greek letters.
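As a sketch of what this could look like, the snippet below converts Pandoc-style `^superscript^` and `~subscript~` markup to HTML and replaces LaTeX-style names such as `\alpha` with Greek letters. The backslash syntax for Greek letters and the small lookup table are assumptions for illustration, not an existing standard:

```python
import re

# Pandoc-style ^sup^ and ~sub~ (no spaces inside the markers).
SUPERSCRIPT = re.compile(r"\^([^\s^]+)\^")
SUBSCRIPT = re.compile(r"~([^\s~]+)~")
# Hypothetical Greek-letter syntax: a backslash followed by the letter's name.
GREEK_NAME = re.compile(r"\\([A-Za-z]+)")
GREEK = {
    "alpha": "α", "beta": "β", "gamma": "γ", "delta": "δ",
    "mu": "μ", "pi": "π", "sigma": "σ", "omega": "ω",
    "Delta": "Δ", "Omega": "Ω",
}


def render_typography(text):
    """Convert super/subscript markup to HTML and \\name to Greek letters."""
    text = SUPERSCRIPT.sub(r"<sup>\1</sup>", text)
    text = SUBSCRIPT.sub(r"<sub>\1</sub>", text)
    # Unknown names (e.g. \foo) are left untouched.
    return GREEK_NAME.sub(lambda m: GREEK.get(m.group(1), m.group(0)), text)
```

For example, `H~2~O` becomes `H<sub>2</sub>O` and `\alpha-helix` becomes `α-helix`.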

Tables

Tables should work as anchors (i.e. you can link to them) and table captions should support styled text. Unless the table is very simple, tables are probably better written as CSV files with another tool, and then imported into the scholarly markdown document similar to figures.
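A minimal sketch of the CSV import step, assuming the table is emitted as a Pandoc-style pipe table with a `Table:` caption line; the `{#tbl:...}` anchor syntax is a hypothetical placeholder for whatever scholarly markdown ends up using to make tables linkable:

```python
import csv
import io


def _cell(value):
    # A literal pipe would end the cell, so escape it.
    return value.strip().replace("|", "\\|")


def csv_to_pipe_table(csv_text, caption, anchor):
    """Turn CSV text into a pipe table followed by a caption line."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(_cell(c) for c in header) + " |",
        "|" + "|".join("---" for _ in header) + "|",
    ]
    for row in body:
        lines.append("| " + " | ".join(_cell(c) for c in row) + " |")
    # Caption (styled text allowed) plus a hypothetical anchor for cross-links.
    lines += ["", f"Table: {caption} {{#{anchor}}}"]
    return "\n".join(lines)
```

The CSV file stays the single source of the data, so it can be edited in a spreadsheet and re-imported, much like a figure.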

Figures

Figures in scholarly works are separated from the text and have a figure caption (which can contain styled text). Figures should work as anchors (i.e. you can link to them). Figures can be in different file formats, including TIFF and PDF, and those formats have to be converted into web-friendly formats (e.g. PNG and SVG) when exporting to HTML.
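For the HTML export, each figure could become an HTML5 `<figure>` element with an `id` (so cross-links work) and a `<figcaption>`. A sketch, assuming a hypothetical mapping from TIFF/PDF/EPS file names to PNG/SVG; the actual image conversion would be done by a separate tool and is not shown:

```python
from html import escape
from pathlib import PurePosixPath

# Hypothetical mapping from print-oriented formats to web-friendly ones.
WEB_FORMAT = {".tif": ".png", ".tiff": ".png", ".pdf": ".svg", ".eps": ".svg"}


def figure_html(src, caption_html, anchor):
    """Render a figure as an HTML5 <figure> with an id and a <figcaption>.

    caption_html is assumed to be already-rendered (styled) caption text.
    Only the file name is changed here; converting the image itself is
    left to an external tool.
    """
    path = PurePosixPath(src)
    target = WEB_FORMAT.get(path.suffix.lower())
    web_src = str(path.with_suffix(target)) if target else src
    return (
        f'<figure id="{escape(anchor)}">\n'
        f'  <img src="{escape(web_src)}" alt="">\n'
        f"  <figcaption>{caption_html}</figcaption>\n"
        "</figure>"
    )
```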

Citations and Links

Scholarly articles typically don’t have inline links, but rather citations. The external links (both scholarly identifiers such as DOIs and regular web URLs) are collected in a bibliography at the end of the document, and the citations in the text link to this bibliography. This functionality is similar to footnotes.

Citations should include a citation key in the text (e.g. [@Smith2006] or [#Smith2006]) and a separate bibliography file in BibTeX (or RIS) format that contains references for all citations. Inserting citations and creating the bibliography can best be done with a reference manager.
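The citation step can be sketched as follows. This minimal Python version assumes Pandoc-style `[@key]` citations (several keys separated by `;`) and a numbered reference style; the dictionary stands in for the BibTeX or RIS file, since parsing those formats is beyond a short example:

```python
import re

# [@key] or [@key1; @key2]
CITATION = re.compile(r"\[(@[\w:.-]+(?:;\s*@[\w:.-]+)*)\]")


def resolve_citations(text, bibliography):
    """Replace [@key] citations with numbers in order of first appearance.

    Returns the new text and the numbered reference list.
    """
    order = []  # citation keys in order of first use

    def number(key):
        if key not in bibliography:
            raise KeyError(f"no bibliography entry for {key!r}")
        if key not in order:
            order.append(key)
        return order.index(key) + 1

    def replace(match):
        keys = [k.strip().lstrip("@") for k in match.group(1).split(";")]
        return "[" + ", ".join(str(number(k)) for k in keys) + "]"

    new_text = CITATION.sub(replace, text)
    references = [f"{i}. {bibliography[k]}" for i, k in enumerate(order, 1)]
    return new_text, references
```

Numbering by first appearance is only one citation style; an author-year style would replace `number` but keep the same parsing.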

Cross-links – i.e. links within a document – are important for scholarly texts. It should be possible to link to section headers (e.g. the beginning of the discussion section), figures and tables.
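Linking to a section header requires a stable anchor for each heading. One simple, hypothetical rule is sketched below: lowercase the heading, drop punctuation, join words with hyphens, and add a numeric suffix for duplicates. Pandoc and other tools use similar but not identical rules:

```python
import re


def heading_id(heading, used):
    """Derive a unique anchor id from a section heading.

    `used` is the set of ids already assigned in the document.
    """
    slug = re.sub(r"[^\w\s-]", "", heading.lower())  # drop punctuation
    slug = re.sub(r"[\s_]+", "-", slug).strip("-") or "section"
    candidate, n = slug, 1
    while candidate in used:
        candidate = f"{slug}-{n}"
        n += 1
    used.add(candidate)
    return candidate
```

A link to the beginning of the discussion section would then simply point to `#discussion`.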

Math

Complicated math is probably best done in a different authoring environment, but simple formulas – both inline and as block elements – should be supported by scholarly markdown.
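Simple math can reuse the dollar-sign delimiters that Pandoc already understands: `$...$` for inline formulas and `$$...$$` for block formulas. The sketch below follows Pandoc's rule that the opening `$` must be followed by a non-space character and the closing `$` must be preceded by one and not followed by a digit (so prices are left alone), and rewrites the math into the `\(...\)` and `\[...\]` delimiters that MathJax recognizes. It is a simplified illustration, not Pandoc's parser:

```python
import re

# Block math: $$ ... $$ (may span lines).
DISPLAY_MATH = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)
# Inline math: no space just inside the dollars, no digit right after the
# closing dollar, so "costs $5 and $10" is not treated as math.
INLINE_MATH = re.compile(r"\$(?!\s)([^$\n]+?)(?<!\s)\$(?!\d)")


def mark_math(text):
    """Rewrite $...$ and $$...$$ into MathJax-style delimiters."""
    text = DISPLAY_MATH.sub(lambda m: "\\[" + m.group(1).strip() + "\\]", text)
    return INLINE_MATH.sub(lambda m: "\\(" + m.group(1) + "\\)", text)
```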

Comments

Comments are important for multi-author documents and for including reviewer feedback. Comments should be linked to a particular part of a document to provide context, or attached at the end of a document for general comments. It would also be helpful to be able to “comment out” parts of a document, e.g. to mark parts that are incomplete and need more work. Revisions of a markdown document are best handled using a version control system such as git.
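To make the idea concrete, here is a sketch using a hypothetical comment syntax modeled on CriticMarkup: `{==text==}{>>comment<<}` attaches a comment to a span of text, and a bare `{>>comment<<}` is a general comment. On export, the comments are collected and removed while the commented text stays:

```python
import re

# Hypothetical syntax: {==anchored text==}{>>comment<<} and bare {>>comment<<}.
ANCHORED = re.compile(r"\{==(.+?)==\}\{>>(.+?)<<\}", re.DOTALL)
BARE = re.compile(r"\{>>(.+?)<<\}", re.DOTALL)


def strip_comments(text):
    """Return (clean text, comments).

    Each comment is a tuple (anchored text or None, comment text);
    anchored comments are listed before general ones.
    """
    comments = []

    def keep_anchor(m):
        comments.append((m.group(1), m.group(2).strip()))
        return m.group(1)  # the commented text stays in the document

    def drop(m):
        comments.append((None, m.group(1).strip()))
        return ""

    text = ANCHORED.sub(keep_anchor, text)
    text = BARE.sub(drop, text)
    return text, comments
```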

 

Category: Thoughts

A Call for Scholarly Markdown

Markdown is a lightweight markup language, originally created by John Gruber for writing content for the web. Other popular lightweight markup languages are Textile and MediaWiki. Whereas MediaWiki markup is of course popular thanks to the ubiquitous Wikipedia, Markdown seems to have gained momentum among scholars. Markdown really focuses on writing content; many features of today’s word processors (e.g. fonts, line spacing or style sheets) are just a distraction. Adding markup for document structure (e.g. title, authors or abstract), on the other hand, is overly complicated with tools such as Microsoft Word.

Fortunately or unfortunately, there are several versions (or flavors) of Markdown. The original specification by John Gruber hasn’t been updated for years. GitHub uses Markdown with some minor modifications. MultiMarkdown and Pandoc provide features important for scholarly content, e.g. citations, superscript and tables.

  • Markdown
  • GitHub Flavored Markdown
  • MultiMarkdown
  • Pandoc

The Pandoc flavor of Markdown probably comes closest to the requirements of a scholar, but still has limitations, e.g. support for metadata and tables isn’t very flexible. I propose that we as a community create a new Scholarly Markdown flavor, which takes into account most of the use cases important for scholarly content.

One of the big advantages of Markdown is that it can be translated not only to HTML but also to other formats, and Pandoc is particularly good at translating to and from many different formats. We want to make sure that Scholarly Markdown translates not only into nice Scholarly HTML (with good support for the HTML5 tags relevant for scholars), but also into Microsoft Word, LaTeX and PDF, as these are the formats typically required by manuscript tracking systems.

Some of the features required for Scholarly Markdown include:

  • Superscript and subscript
  • Highlighting text (supporting the HTML tag <mark>)
  • Captions for tables and figures (with support for the HTML tags <caption> and <figcaption>)
  • Support for document sections (the HTML5 tags <article>, <header>, <footer>, <section>)
  • Good table support
  • Math support
  • Good citation support
  • Support for comments and annotations

MultiMarkdown and Pandoc of course already support many of these features. Tables and citations are two examples where it is important not only to support them, but to support them in a non-intrusive way that doesn’t interrupt the flow of writing.

BTW, this wouldn’t be the first community flavor of Markdown. The screenwriting community has already done this with Fountain.

Category: Thoughts

Altmetrics: first we need the for what? and only then the how? OK?

Altmetrics track the impact of scholarly works in the social web. Article-Level Metrics focuses on articles, but also looks at traditional citations and usage statistics. The PLOS Article-Level Metrics project was started in 2008. The altmetrics manifesto was published in October 2010 and described the fundamental ideas. By October 2011 we had a number of altmetrics tools, fueled by the Mendeley/PLOS API programming contest. In 2012 the focus shifted from the fact that we can provide these numbers to a discussion of the many open questions. We could see this at the altmetrics12 conference in June, and even more so at the altmetrics workshop hosted by PLOS last week in San Francisco.

Altmetrics can provide a large amount of information about the post-publication activity around an article (and other scholarly content), and this is exciting, but at the same time also somewhat overwhelming and scary. Some of the things that we as a community have to figure out include standards for collecting, aggregating and displaying altmetrics data, strategies to combat attempts to game these metrics, and appropriate ways for the different organizations providing altmetrics to work together as a community. These and other topics were discussed in great detail at the PLOS altmetrics workshop, and we made good progress, not least thanks to the excellent moderation by Cameron Neylon. The third day of the workshop was a hackathon, and we were able to translate some of the ideas into prototypes of new tools.

The most important conclusion from the workshop for me personally was that we should really focus on use cases. Altmetrics should help answer questions that we can’t answer today, and despite the promise, the various altmetrics tools still have a long way to go. A case in point is the promise that altmetrics can make it easier to find relevant scholarly content. We all use social media to help us find papers and other content, but altmetrics are still not integrated into the traditional scholarly search tools. ReRank is a cool prototype developed during the hackathon last Saturday, but we are still far from having altmetrics feed directly into the relevance sorting of search results.

With these thoughts in the back of my mind, I look forward to the altmetrics session at the SpotOn London conference this Sunday afternoon. Sarah Venis from Médecins Sans Frontières (MSF) will talk about the questions that she hopes altmetrics can answer for her organization. MSF is very interested in looking beyond citations for the impact of its publications, as its primary target audience is not really the scholarly community, but rather people in need in various parts of the world. Marie Boran from the Digital Enterprise Research Institute (DERI) is interested in using altmetrics as a recommendation tool to find researchers with similar interests. Euan Adie from altmetric.com and I (technical lead for the PLOS Article-Level Metrics project) will use our respective tools to try to answer some of these questions. For me altmetrics are primarily tools to tell a good story, and that is one reason why we picked the title Altmetrics beyond the Numbers for this session. The session will then shift to an open discussion, and I hope we can get some good answers to these and other questions.

A clear focus on use cases should go a long way toward reducing that feeling of being overwhelmed by all the numbers that altmetrics can provide. If we have specific goals for which we need altmetrics, it becomes much easier to decide which numbers work best for us, what standards we need and whom to ask to collect this information. AJ Cann and Brian Kelly have written two excellent blog posts about the confusion that too many altmetrics numbers can create, and the workshop Assessing Social Media Impact during SpotOn London addresses some of these questions. Hackathons have played an important role in the history of altmetrics. I invite you to come to the SpotOn London hackathon this Saturday if you have some cool ideas and want to get started with the help of others.

Other reports from the PLOS Article-Level Metrics (aka Altmetrics) Workshop

Please let me know if you see other reports of the workshop that I have missed.

Category: Thoughts

ORCID has launched. What’s next?

Last week was busy. I went to Berlin for the launch of the Open Researcher & Contributor ID (ORCID) service. ORCID allows researchers to obtain a persistent identifier that can be used to claim publications and other scholarly works. I’m 0000-0003-1419-2405, and we put the ID (and a QR code linking to the profile on the ORCID website) on the name tags for the ORCID Outreach Meeting last Wednesday (geek alert: I also received a T-shirt with my name, ORCID and QR code).

I was invited to work with ORCID in early 2010 after writing on this blog about the initiative, which had started in November 2009 (ORCID or how to build a unique identifier for scientists in 10 easy steps). And now, after three years and a lot of work by a lot of people, ORCID is real and everyone can use the system. As a researcher, you can go to the ORCID website and register.

Obtaining a number is of course not very interesting in itself, few people get excited about the fact of having a 16-digit unique identifier. What ORCID is really about is claiming your publications and other scholarly works, and Connecting Research and Researchers is the slogan of the organization. In the ORCID system you can now claim publications found in the CrossRef database, and other work types will be added over time.
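The last of the 16 characters is a check character, computed with the ISO 7064 MOD 11-2 algorithm described in ORCID’s documentation (it can be `X`, which stands for 10). A short Python sketch that validates an identifier; my own ID from above passes the check:

```python
def orcid_check_digit(base_digits):
    """Check character (ISO 7064 MOD 11-2) for the first 15 ORCID digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True if `orcid` (e.g. '0000-0003-1419-2405') has a correct check character."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    return orcid_check_digit(digits[:15]) == digits[15].upper()
```

The check character catches typos such as a single wrong digit, which matters when identifiers are typed in by hand.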

What I’m particularly interested in is claiming research datasets. Everyone wants to give researchers better credit for the data they have produced, transformed and annotated, but data citation is still not a widespread practice. I am therefore very excited to be involved in the ORCID and DataCite Interoperability Network (ODIN), an EU-funded project that had its kickoff meeting last week in Berlin.

In the ODIN project we will work closely with DataCite, an organization that provides digital object identifiers (DOIs) for research data. One of the many things I like about ODIN is that the social sciences are one of the two disciplines where we will build a proof of concept (with the British Library; the other discipline is high-energy physics, with CERN). We also want to understand how best to link researchers, data and publications. Dryad is an ODIN project partner and obviously has a lot of experience linking biological datasets to publications, and we will discuss how to integrate ORCID identifiers into its workflow.

Unique identifiers for researchers are of course also an essential part of any work on article-level metrics. I spoke about this at the altmetrics12 conference in June, and I’m excited that we can now finally start linking things together. ImpactStory was one of the ORCID launch partners, and I demoed their ORCID integration last week in Berlin.

ScienceCard is a fork of the open source PLOS Article-Level Metrics application, and is a project I started about a year ago. ScienceCard allows researchers to list all their publications, and the metrics associated with them. With the launch of ORCID I was finally able to add one important missing piece. Through automatic lookup of the ORCID identifier and retrieval of the publications claimed in the ORCID profile, it has become much easier to create and maintain a ScienceCard profile – it shouldn’t take more than 5 minutes and a few mouse clicks (collecting all metrics takes longer because that happens in the background). I added ORCID integration to ScienceCard over the weekend, using the free public ORCID API.

ScienceCard is a great tool to explore how research impact can be collected and displayed, and I appreciate feedback in the form of feature requests and bug reports, ideally in the GitHub issue tracker of the project. This will also provide very valuable feedback for improving the PLOS Article-Level Metrics application, as both use almost the same code base. For example, the API is identical, so rplos and other tools using the PLOS ALM API can be used with ScienceCard by just changing the URL. Another example is the PLOS ALM WordPress Widget: with minor modifications it can also be used with ScienceCard, allowing researchers to display the metrics for their publications from PLOS and other sources on their blogs. The upcoming Altmetrics workshop and hackathon (November 1-3 in San Francisco) will be a great opportunity to explore this further.
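To illustrate the point about changing the URL: a client only needs its base URL to be configurable. In this sketch the path `/api/v3/articles` and the `ids` and `api_key` parameters are assumptions, and the host names are placeholders; check the API documentation of the installation you are querying:

```python
from urllib.parse import urlencode

# Assumed endpoint path and parameter names, for illustration only.
ARTICLES_PATH = "/api/v3/articles"


def alm_articles_url(base_url, dois, api_key):
    """Build an ALM API request URL.

    Only base_url differs between installations (e.g. the PLOS ALM
    server and ScienceCard); path and query stay the same.
    """
    query = urlencode({"ids": ",".join(dois), "api_key": api_key})
    return base_url.rstrip("/") + ARTICLES_PATH + "?" + query
```

Usage, with placeholder hosts and an example DOI:

```python
plos = alm_articles_url("https://alm.example.org", ["10.1371/journal.pone.0000001"], "KEY")
card = alm_articles_url("https://sciencecard.example.org", ["10.1371/journal.pone.0000001"], "KEY")
```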

Category: Thoughts

Announcing the ScienceCard Relaunch

Almost exactly a year ago (in the hackathon of the Science Online London 2011 conference) I started the ScienceCard project. ScienceCard is a fork of the Open Source PLOS Article-Level Metrics (ALM) code, personalizing the Article-Level Metrics.

A lot has happened in the last 12 months, most importantly that I started to work for PLOS as technical lead for the Article-Level Metrics project in May. In July, version 2.0 of the PLOS ALM application was released, and the code was made available on GitHub. This not only means that everyone can install their own ALM application (assuming some familiarity with Ruby and the Rails web framework), but also that we can fork the code and modify it.

Although my focus is now clearly on improving the PLOS application, it didn’t feel right to shut down the ScienceCard project. I really like the idea of personalized Article-Level Metrics (something I spoke about at the altmetrics12 conference). So I spent the last two weekends upgrading ScienceCard to the PLOS ALM 2.0 code (the source code of my fork can be found here). With the imminent launch of the Open Researcher & Contributor ID (ORCID) service next month, I dropped the functionality to get all articles by a particular person from Microsoft Academic Search. For now you can add your articles by DOI or PubMed ID (currently limited to articles found in PubMed), and you can now also add interesting articles that you did not author. This makes ScienceCard a nice tool to try out the PLOS ALM code without installing the software. Please remember that all metrics are collected in the background, so it can take a few hours until they show up for newly added articles.

ScienceCard also shows the power of open source software. Open source doesn’t simply mean free software; it means that you can modify the code if your requirements are different. For ScienceCard I added authentication via Twitter (required to add articles, as I don’t want to deal with usernames and passwords), a simple lookup by DOI or PubMed ID (something not needed if you are the publisher of the article and already have that information), and comments and likes. Article-Level Metrics is not about collecting numbers; it is about capturing the activity surrounding an article post-publication.

 

Category: Thoughts

Bye bye Nature Network, welcome SciLogs.com

The science blogging network Nature Network is moving to a new home. Today SciLogs.com launched as the new home for Nature Network bloggers. I blogged at Nature Network for three years, starting with my first blog post (Open access may become mandatory for NIH-funded research) almost exactly 5 years ago to the day. My blog moved to PLOS BLOGS in September 2010, and all my old Nature Network content can be found here at PLOS BLOGS.

Blogging at Nature Network has changed my life in many ways. Thank you Matt and Corie (and later Lou) for making this possible.

Category: Thoughts