Blogging Beyond the PDF

Four weeks ago I wrote about the Beyond the PDF workshop that is planned for January in San Diego. The goal of the workshop is to identify a set of requirements, and a group of willing participants to develop open source code to accelerate scientific knowledge sharing. The Google Group for the workshop has already seen a lot of interesting discussions. I have since had more time to think about my contribution and decided to propose the following:

Evaluate the blogging software WordPress as a platform to author, review and publish scientific manuscripts. Extend the WordPress functionality for authors and citations.

WordPress is a very interesting candidate for this because of the following features:

  1. WordPress is Open Source and can be easily modified and extended by Themes, Plugins, etc.
  2. WordPress is used by millions of users, a market much larger than the scientific software market. Many requirements for a scientific writing platform are not specific to scientific software.
  3. WordPress treats article writing as a workflow that includes collaborative writing, version control, and a document status.

Of course I’m not the first to think about WordPress in this context; the Code4Lib journal and Knowledge Blog are just two examples. To get started, I installed WordPress and a number of interesting Plugins at a new Blogging Beyond the PDF website. Phil Bourne has kindly provided material from a recent PLoS Computational Biology paper for the workshop. I formatted a first version of this paper using WordPress and the result can be seen here. There are a lot of rough edges (several tables and most references are still missing), but to me this looks good enough considering this is the result of maybe 10 hours of work. In the next six weeks I will continue to work on this example manuscript to get a better idea of the strengths and weaknesses of the WordPress platform. I also hope to learn more from the experience of others. My first impressions are below.

The Co-Authors Plus Plugin enables multiple authors per article. Each author can be linked to an author page for displaying biographical info. WordPress could be extended to include additional info such as institution or past publications. Linking the WordPress user account to the unique author identifier ORCID, and describing the role of the author in the paper (e.g. conceived and designed the experiments or analyzed the data) would be particularly interesting. Plugins such as Edit Flow can extend the workflow by adding custom status messages (e.g. resubmission), reviewer comments, and email notifications.
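To make the ORCID idea a little more concrete, here is a minimal Python sketch of what an author record with an identifier and contribution roles might look like. It is not WordPress or Co-Authors Plus code: the `Contributor` class and its field names are hypothetical, and the identifier shown is the sample iD that ORCID uses for the fictional researcher Josiah Carberry. The check-digit routine follows the ISO 7064 MOD 11-2 scheme that ORCID iDs use.

```python
from dataclasses import dataclass, field


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the hyphenated 16-character iD has a correct check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


@dataclass
class Contributor:
    """Hypothetical author record; the field names are assumptions for illustration."""
    display_name: str
    orcid: str
    roles: list = field(default_factory=list)


author = Contributor(
    display_name="Josiah Carberry",
    orcid="0000-0002-1825-0097",  # ORCID's sample iD for a fictional researcher
    roles=["conceived and designed the experiments", "analyzed the data"],
)

if not is_valid_orcid(author.orcid):
    raise ValueError(f"Invalid ORCID iD: {author.orcid}")
```

Validating the identifier before storing it would catch most typing errors, since the last character is a checksum over the first 15 digits.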

WordPress has good functionality to add figures to articles, including the option to add a caption or resize the figure. In addition, there are a good number of Plugins that help with the resizing, other manipulations and display of images.

WordPress doesn’t do tables, but several Plugins add that functionality, including WP-Table Reloaded. This Plugin adds some very interesting functions, including the option to export table data as CSV or XLS files. This makes it much easier to reuse these data, something that I find very useful.
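As a rough illustration of why a CSV export makes table data easier to reuse, the Python sketch below writes a small table to CSV text and reads it back. This is not how WP-Table Reloaded works internally; the function name `table_to_csv` and the table contents are made up for the example.

```python
import csv
import io


def table_to_csv(header, rows):
    """Serialize a header row and data rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up table contents, for illustration only.
header = ["Sample", "Condition", "Value"]
rows = [
    ["A", "control", 1.2],
    ["B", "treated, 24 h", 3.4],  # the comma forces this cell to be quoted
]

csv_text = table_to_csv(header, rows)

# Any CSV-aware tool can read the data back; note that values return as strings.
parsed = list(csv.reader(io.StringIO(csv_text)))
```

Once the data leave the article as CSV, a reader can load them into a spreadsheet or analysis script without retyping anything from the page.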

The WP-Footnotes Plugin is one of several Plugins that add footnotes to articles. Several Plugins integrate with reference managers such as CiteULike, but the functionality is still very limited compared to how most reference managers are integrated with Microsoft Word or LaTeX.


20 Responses to Blogging Beyond the PDF

  1. Peter Sefton says:

    Re links to ORCIDS – this is essential, see my post on how we might be able to provide rich semantic links that express stuff like this ORCID ID represents an author.

    I think this is potentially one way to author papers, but it would be useful to tease apart the use of WP as an authoring platform from its use as dissemination/discussion platform.

    There are several limitations with WP as an authoring tool – tables and referencing are two that are mentioned, but there are also problems with figures – sure you can upload pictures, but you don’t get an integrated drawing environment like a word processor. This is why I think a more useful authoring environment for many authors will still be an office suite with a reference manager such as Zotero. That’s not to say that WP is not going to be useful for distributing articles particularly using commenting services like

  2. There are also WordPress plugins to facilitate inline comments so that you can associate a comment with a paragraph (not just the whole post). For instance, see CommentPress and Also an example of CommentPress in action.

    A few months ago, Firefox marketed bookmarklet sets for various purposes. I think it would make a lot of sense to package and promote various workflows and plugin sets for WordPress.

  3. Carl says:

    I found this piece very interesting. I’m a graduate student researcher in Theoretical Ecology, and use WordPress for my lab notebook. In addition to figures and tables, WordPress handles LaTeX equations well, but I believe the real advantage for a lab notebook or publication lies in reproducibility. Clicking on my figures brings me to my figure library (external, in Flickr), where I can see similar results. Each figure is stamped with the code version number and links to my GitHub repository, where I can find the exact version of the code that produced the figure, verifying parameter values, etc.

    I strongly agree that a nicely integrated reference manager could improve citation of articles online. By editing my posts in an offline text editor (vim), I can access my Mendeley library to embed citations, but a WordPress plugin would be great.

    Thanks for some great suggestions and I look forward to seeing where this is headed. My lab notebook is here:

  4. Pingback: Quick Links | A Blog Around The Clock

  5. Peter says:

    Martin, very cool project. I hope you’ll write a lot about the workshop — it sounds exciting!

    I am wondering about two things.

    a) metastructure: What structure is your content in now? Is it just hidden away in some wordpress database and/or did you create some (pseudo-) meta structure that was then transformed into the blog? How is your content structured (i.e., are the different sections different elements within the meta structure?)
    I feel this aspect is the most lacking of them all — pdf is not just bad for metadata, but also for reusing material since it’s just one big blob. In that spirit:

    b) portability/reinterpretation: How easy is it to port the content to some other form of presentation (say, PDF or LaTeX for typesetting)? I think to move beyond PDF we need some format that allows re-interpretation in many different ‘form factors’ — first things that come to my mind are print/offline viewing, different screens (from smartphone to projector) and above all “remix” (i.e., citation, ‘full’ integration of data, for journalists etc).

    Carl’s features seem to allow parts of this through external tools (sweet idea btw).

    I always found typesetting on wordpress painfully restricted — complicated information does better with suitable layouts (cf. Oh, and don’t get me started on that old LaTeX-2-png-plugin — horrible quality, not accessible for visually impaired readers. Thankfully there’s a mathjax plugin in development… (sorry for the rant, I’m a mathematician after all…)

    On a side note, have you ever looked at ? That’s my portable notebook — one html/javascript file, a full wiki. It could be a good comparison; shouldn’t be too hard get the content in one (an example and the plugins are amazing (especially the mathsvg one for LaTeX…).

    Finally, looking at Figure 4/6 in the blog/article: do you know of a (picture) file format where text, rulers etc. are part of the meta data? I have heard too many stories where editor/referee/coauthor wanted a different font or color (in Fig 6) and some poor grad student had to redo everything (by hand of course…). If such information were to be separated as meta data changing one could be done independently of the other. I think that small detail could help a lot; especially with “remixing”.

  6. Peter says:

    Hm… My comment from yesterday never made it :(

  7. Peter says:

    oops. sorry about that. now I can see it…

  8. Pingback: Beyond the PDF proposed session : Bring the web to the researcher : Mainly on authoring tools « ptsefton

  9. Pingback: Citations – Carl Boettiger

  10. Pingback: Citations are links, so where is the problem? | Gobbledygook

  11. Martin Fenner says:

    Sorry guys, I was a little slow approving your comments. And thank you for your very valuable feedback.

    Several people have told me that they would rather continue using a word processor for authoring. I think that’s fine and that there will never be a single authoring tool for everybody. But it would be nice if WordPress became a viable alternative.

  12. Martin Fenner says:

    Peter and Jody,

    I like inline comments. is nice, but requires a particular theme/layout for the site. And the comments look a little bit too intrusive to me, CommentPress is similar in this regard. Inline comments require that we can link to particular sections of a paper, and that should also work from other sites or places like Twitter or FriendFeed.

  13. Martin Fenner says:

    Carl, thank you for your comment and the link to your lab notebook. I think that currently you still have to be a geek to use WordPress to write scholarly works (you mention BibTeX and vim). But the situation could be very different 12 months from now. I would never underestimate the power of a large user base and developer community.

  14. Martin Fenner says:

    Peter (Krautzberger), very good comments as always. As for metastructure, we should not underestimate what we can already do with HTML (h1, h2, p, div, span, etc.). In my example article I used h2 for the article sections and h3 for subsections. On the test website I use the WPTouch Plugin for mobile users, and of course there is an RSS feed. The PDF is generated automatically and knows about document sections and links, but obviously needs fine-tuning.

    You make a very good point about labelling images. SVG would be an image format that could do that, but I don’t know about good editors for it. You can also store image metadata (not the axis labelling, but title, caption, date of the experiment, author, etc.) in EXIF and XMP – several WordPress Plugins can read that information.

  15. Peter says:

    I don’t think html itself is a good solution (but probably that’s not what you meant). html (like PDF) is designed to display content, not to structure content — this universality is its strength but for your project its weakness.

    I think that if you do not want to rely on the raw data (in your case, the WordPress-generated MySQL database), the development of a professional CSS is critical.

    For example, in scientific writing not all paragraphs are equal; we have abstracts, introductions, descriptions of protocols etc. pp. From a displaying point of view they are all the same — simple paragraphs, nothing else should matter for a tool that displays it. The problem is, if you do not use markup that reflects this structure you will not be able to process displayed html into another form of display without loss of information. With a well designed css however, you could embed this structure easily while making it more natural from a writer’s point of view (well, unless you’re not aware that you’re writing an abstract 😉 but any LaTeX user can explain how quickly this enforced behaviour becomes a blessing).

    So on the one hand, I believe that the key (community) effort for this to work would be to generate some consensus on the structural elements we need for effective scientific writing (this will probably be split up into different levels of detail and evolving with time and subject). On the other hand, we need specialists in xml- or other mark-up languages to find ways to develop, say an initial css, so that it can be extended effectively in the future — maybe the librarians’ expertise can come in, maybe the programmers’ skills, who knows.

    Anyways, I’m looking forward to reading more about your project.

  16. Martin Fenner says:


    I wouldn’t underestimate the usefulness of HTML – it probably strikes a good balance between simplicity and structured content that can be displayed in several different ways. There are many interesting semantic tools for scientific articles (and a quasi XML standard with the NLM-DTD), but the problem is that the average user will never use them.

  17. Pingback: Wordpress for Reference Management | Gobbledygook

  18. Pingback: HTML5 or messages from beyond the PDF | Gobbledygook

  19. Pingback: WordPress | Digital Monograph Technical Landscape study #jiscPUB

  20. Pingback: WordPress [and the jiscPUB project] | ptsefton