Interview with Pablo Fernicola

The Article Authoring Tag Set of the National Library of Medicine (NLM DTD for short) defines a standardized format for new journal articles, which authors can use to submit publications to journals and to archives such as PubMed Central.[1] The Microsoft Word Article Authoring Add-in, released earlier this year, reads and writes this format. Pablo Fernicola from Microsoft first explains the Add-in in a video and then answers a few questions.

1. Can you describe what the Microsoft Word Article Authoring Add-in is and does?
There is already an ongoing transition to digital workflows for journals, as well as a nascent transition to digital distribution and consumption. Generating content that is best suited for digital distribution, and in the future for archiving, search, and semantic analysis, is going to be essential. As a community, we are not yet taking full advantage of the potential of the tools and formats currently in use, and bolting these capabilities onto existing print-centric processes is costly and inefficient, and it prevents us from realizing the benefits of the digital medium. Authoring for print still dominates many of these processes and constrains the final outcome, even when print distribution is discontinued.

It is important to realize that the content we generate today is what we will be accessing, and relying on as reference material, in the future. It is imperative that we generate content that is best suited to the way it will be consumed. Ideally, this transition should also be as non-intrusive and low-cost as possible.

The Article Authoring Add-in for Word 2007[2] is a free download that adds new capabilities to Word, focused on scholarly publishing. The overall goal of the Authoring Add-in project is to help improve the scholarly publishing process for workflows that rely on Microsoft Word or Word-generated content, covering the authoring experience, the editorial workflow, and the archiving of STM articles, with an eye towards generating content that is best suited for preservation, digital consumption, and search. A core capability of the add-in is the ability to open and save files in the NLM XML format.
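To make "open and save files in the NLM XML format" a little more concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It writes a few front-matter elements named after the NLM Journal Publishing tag set (article-meta, article-title, contrib, kwd, and so on) and reads them back. The helpers build_front and read_front are hypothetical illustrations, not part of the add-in, and the output is not validated against the DTD.

```python
import xml.etree.ElementTree as ET


def build_front(title, authors, abstract, keywords):
    """Hypothetical helper: build an <article> holding only NLM-style front matter.

    authors is a list of (surname, given_names) tuples.
    """
    article = ET.Element("article", {"article-type": "research-article"})
    meta = ET.SubElement(ET.SubElement(article, "front"), "article-meta")

    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = title

    contrib_group = ET.SubElement(meta, "contrib-group")
    for surname, given in authors:
        contrib = ET.SubElement(contrib_group, "contrib", {"contrib-type": "author"})
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given

    # The abstract text sits inside a paragraph element.
    ET.SubElement(ET.SubElement(meta, "abstract"), "p").text = abstract

    kwd_group = ET.SubElement(meta, "kwd-group")
    for keyword in keywords:
        ET.SubElement(kwd_group, "kwd").text = keyword
    return article


def read_front(xml_text):
    """Hypothetical helper: pull the same metadata back out of serialized XML."""
    meta = ET.fromstring(xml_text).find("front/article-meta")
    authors = [
        (c.findtext("name/surname"), c.findtext("name/given-names"))
        for c in meta.findall("contrib-group/contrib")
        if c.get("contrib-type") == "author"
    ]
    return {
        "title": meta.findtext("title-group/article-title"),
        "authors": authors,
        "abstract": meta.findtext("abstract/p"),
        "keywords": [k.text for k in meta.findall("kwd-group/kwd")],
    }


# "Save" the article to a string, then "open" it again.
xml_text = ET.tostring(
    build_front("Sample title", [("Doe", "Jane")], "One-sentence abstract.", ["XML", "publishing"]),
    encoding="unicode",
)
print(read_front(xml_text)["authors"])  # [('Doe', 'Jane')]
```

A real NLM file would also carry journal metadata, a DOCTYPE declaration pointing at the DTD, and the article body and back matter; those are left out to keep the round trip readable.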

2. How is the Authoring Add-in different from commercial tools such as eXtyles from Inera?[3]

The add-in differs in at least the following ways:

  • It targets both authors and editorial staff
  • It focuses on enhancing the experience within Word, both for authors and editorial staff, and not just for content but also for metadata
  • It enables a consistent experience for authors across journals, in templates and in the entry of metadata
  • It enables two-way interaction with the NLM formats (article and book formats), opening and saving NLM files within Word
  • It provides a platform on which software vendors can build
  • It builds on the transition to XML as the underlying native format in Word 2007

3. What advantages do you see for authors that submit their manuscripts in the NLM Journal Publishing XML format?
The add-in doesn't force authors to submit their articles in the NLM format, but it makes conversion to that format much easier as part of the workflow. It provides a way for authors to enter information in their articles so that this content (semantics and metadata) is preserved through the publishing workflow, ready to be saved as a valid NLM document, while they keep working in the Word user interface. I expect the more common scenario to be journals providing templates and authors submitting .docx files augmented with NLM data through the add-in, with journal or repository staff doing the conversion to the NLM format.
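One practical benefit of capturing metadata while the author is writing is that gaps can be caught before conversion rather than in round trips with the journal. The Python sketch below assumes a hypothetical house rule that every article needs a title, at least one author, an abstract, and keywords; it only checks that the corresponding NLM-style elements exist and contain text, which is much weaker than validating against the NLM DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical house rule: the front-matter fields a journal insists on.
# The element paths follow NLM naming, but the rule itself is an assumption.
REQUIRED_FIELDS = {
    "title": "front/article-meta/title-group/article-title",
    "authors": "front/article-meta/contrib-group/contrib",
    "abstract": "front/article-meta/abstract",
    "keywords": "front/article-meta/kwd-group/kwd",
}


def missing_metadata(xml_text):
    """Return the required fields that are absent or empty (not a DTD validation)."""
    root = ET.fromstring(xml_text)
    missing = []
    for field, path in REQUIRED_FIELDS.items():
        element = root.find(path)
        # An element that exists but holds no text still counts as missing.
        if element is None or not "".join(element.itertext()).strip():
            missing.append(field)
    return missing


draft = """<article><front><article-meta>
  <title-group><article-title>Sample title</article-title></title-group>
  <abstract><p>One-sentence abstract.</p></abstract>
</article-meta></front></article>"""
print(missing_metadata(draft))  # ['authors', 'keywords']
```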

4. How do you think the author submission process will change in relation to formats?
Many journals already use the NLM format as part of their publishing workflow and as their archival format. Some publishers are moving to the format now. And certainly for NIH-funded research, all articles eventually end up in the NLM format as part of the submission to PubMed Central. But I don’t know of any journals that accept the NLM format directly from authors.

We have tried to take a very end-user-centric approach in the project. We would like authors not to have to be aware of the underlying format. We don’t want authors to have to think about XML, for example; formats should be handled in the background by the authoring tools, which should simply make the authors’ work easier and their content more easily searchable and relevant.

In our work with journals, publishers, and repositories we focus heavily on interoperability based on formats and protocols commonly used in the community, not only the NLM format itself but also technologies such as SWORD[4] and OAI-ORE[5].

5. What is your job at Microsoft? What did you do before working on the Authoring Add-in?

I have always been involved in software development, both working on and managing small and large teams. Currently I am a group manager at Microsoft, in charge of running this overall project, which also includes an online service focused on the peer review process. I drive the development, the technology direction and architecture, and community engagement as they relate to scholarly publishing.[6] Before starting this effort, I worked for many years on developer platform technologies related to text, reading, graphics, and multimedia, both at Microsoft and at Apple, and I was the Program Manager in charge of the web developer platform in Internet Explorer for a couple of versions of the browser.

6. Do you plan to also release an Add-in for Microsoft Word 2008 for Macintosh?

We are investigating how best to bring the authoring-focused features to Word 2008 users. On Windows, Word 2007 is in many ways a developer platform in its own right: there are even software development kits for it, and it provides a lot of extensibility and programmability to developers. Word 2008 does not offer equivalent developer support; the strength of the Macintosh version lies in providing a great experience for its end users.

7. Do you want to talk about future plans for the Authoring Add-in?
We got a lot of feedback on version 1 of the add-in, which we made available this past July, from folks involved in the back end of the publishing workflow, such as folks at journals, publishers, and repositories, as well as from companies that provide the software tools and services that support this work (of note is the integration work we did with Design Science for their MathType package[7]).

Development work on version 2 is currently underway and we expect to make available a Technology Preview soon, with the final release of version 2 in 2009. In version 1 we focused quite a bit on the architecture and getting the basic infrastructure in place to provide support for the NLM format. In version 2 there is a greater focus on the author experience, as well as on continuing to improve the support for the format.

Some of the driving questions that we would like to address are:
* In which ways can we make the submission/upload process easier for authors?
* Can we make the author and article metadata more reliable and consistent, thereby reducing the round trips between authors and journals, as well as the cost of cleaning up the data?

Overall, the philosophy is to simplify, simplify, simplify. For authors especially, that means helping them get the content into the article while keeping the technology in the background. For staff at journals and repositories, it means providing access to all the richness of the NLM format, and the flexibility they will need to build their own solutions.

fn1. Article Authoring Tag Set

fn2. Article Authoring Add-in for Microsoft Office Word 2007

fn3. eXtyles Product Information

fn4. SWORD

fn5. Open Archives Initiative Object Reuse and Exchange

fn6. ex Scientia

fn7. MathType

This entry was posted in Interviews.

11 Responses to Interview with Pablo Fernicola

  1. Maxine Clarke says:

    I replied to this post but I got an error message when I pressed “submit”, and did not keep a copy. Two main points:
    (1) The word “folks” seems to be increasingly used these days, for “colleagues”, “staff” or other professional workers. Is it an Americanism? I personally don’t like it, as it has an informal, whimsical, country-like meaning, at least in the UK, as in “folksy”. My understanding also is that it is a collective noun, so the widespread “folks” is inaccurate/unnecessary.
    (2) The post is good although the process sounds too theoretical to say much. For example, who knows what all this web-based code will do to the journal’s editing process? Probably clashes and crashes. Perhaps by “scholarly” he means “journals that don’t have an editing process”, in which case I can see that it has a better chance of working from that point of view, at least.

  2. Martin Fenner says:

    I believe that Microsoft Word 2003 or Microsoft Word 2007 (the new XML-based format) are not the best document formats for scientific papers. I see several advantages in using the NLM DTD format, and I don’t expect more clashes and crashes. The NLM DTD is better at enforcing the required formatting (e.g. are the abstract, keywords, etc. present, are the references formatted correctly?) and makes it easier to reformat the paper for other uses (e.g. Web, PDF).

  3. Maxine Clarke says:

    I definitely see clashes and crashes if the MS format then has to go through a standard journal editing and typesetting process (before being converted for publication on the web).
    For example, the references may be formatted by the author, but need reformatting for the journal’s style, or re-ordering/re-numbering, etc.

  4. Richard P. Grant says:

    Folks,
    What is ‘Microsoft Word’? Gah, it’s people like that who make me want to learn LaTeX.

  5. Pablo Fernicola says:

    Assuming that a journal moves to NLM XML as part of submission/intake, formatting will be less of an issue, as in the future it is likely to take place as a transformation of the XML content for presentation (I will write a post on this topic shortly).
    Re-ordering would be something one could do directly in the XML, or by bringing the NLM XML into Word, making the changes, and saving back out.

  6. Martin Fenner says:

    The NLM XML is just like any other file format. Not only Microsoft Word 2007/2008, but also OpenOffice and Apple Pages/Keynote/Numbers use XML-based file formats. PubMed Central is one of the places that understands the NLM XML, e.g. using the “PMC Article Previewer”:http://www.pubmedcentral.nih.gov/utils/pv/.
    We need standards for better data exchange and integration, and the NLM XML is a good standard for journal articles. PDF is fine for published articles, but an -edible- editable format is better for articles still in the submission process.

  7. Elizabeth Slade says:

    See also “Lemon8-XML”:http://www.lemon8.org/ from the “Public Knowledge Project”:http://pkp.sfu.ca/ which has a similar aim to Microsoft’s add-in:
    “Lemon8-XML is a web-based application designed to make it easier for non-technical editors and authors to convert scholarly papers from typical word-processor editing formats such as MS-Word .DOC and OpenOffice .ODT, into publishing layout formats such as the open, industry-standard NLM Journal Publishing XML format.”

  8. Richard P. Grant says:

    Thing is, Martin, the vast majority of scientists don’t care what XML is, and they certainly don’t want another bloody plug-in. Endnote’s plugin to Word is clunky enough (although whether that’s Word’s or Endnote’s fault is moot).
    We want to write a paper and have either editors or machines extract the metadata. We don’t want to see it happen, we just want to know that it does. Strikes me that this is best achieved using heuristic methods at the journal itself.

  9. Martin Fenner says:

    Elizabeth, thanks for mentioning Lemon8-XML. I think that it is a very promising tool and “I blogged about it before”:http://network.nature.com/people/mfenner/blog/2008/06/14/my-paper-writing-dream-machine-1-0 (see the comments).
    Richard, I see that I have a hard time making my point. I do believe that the current system of submitting papers is a waste of time and money, both for the paper author and for the journal. After I finish writing a paper, I sometimes spend endless hours formatting it in journal style, mostly on the references and figures. The NLM XML standard and the Word 2007 Add-In will be a big help in this process. As Pablo pointed out in the interview, you don’t have to care about or understand XML to use the Add-In.

  10. Maxine Clarke says:

    Sounds to me as if it would be more for self-publishing than fitting into a journal workflow, or at least, a journal that (1) has a print-first workflow and (2) edits (which not all journals do).
    Martin, I don’t know about “most journals” but for example at Nature we provide a template for authors on our website which overwhelmingly they say they like and saves them time; once they have provided the template our editors do all the rest (reference style, checking and ordering included), and then our various technical systems do things such as convert into PDF and web, add links, feed into A&I services etc.

  11. Pingback: A very brief history of Scholarly HTML | Gobbledygook