Just DOI it!

With the December 18 issue, Nature started to support XMP markup in article PDFs (reported last week on the Nascent blog by Tony Hammond)[1]. XMP stands for Extensible Metadata Platform and is a technology for embedding metadata in files, including PDFs[2]. XMP was created by Adobe (with XMP support in PDF files since 2001), but it is an open standard backed by others, including Creative Commons[3]. The Digital Object Identifier (DOI) is the most important piece of information in the metadata, as the DOI provides a link to the journal publisher website where more metadata can be retrieved. XMP support in scientific PDFs is unfortunately still very uncommon and probably hasn't changed much since Pierre Lindenbaum checked last year[4].

Adding metadata to PDFs seems to be a no-brainer. We have done the same with music (ID3 tags in MP3 files) and photos (IPTC and Exif) for years, and it has been a tremendous help in organizing these files on our computers. Unfortunately, there aren't many tools that can extract the DOI or other metadata from the XMP in article PDFs. But I expect more desktop software to support XMP once XMP support in scientific articles is more widespread. We will then be able to add a journal PDF to our reference manager of choice and have the relevant metadata (including authors, title, journal and issue) filled in automatically, and many other creative uses will become possible. Until then we need tools like Papers or Mendeley that can extract metadata from PDF files without this XMP information.
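To give an idea of how little is needed to read a DOI from XMP, here is a minimal Python sketch. It assumes that the XMP packet is stored uncompressed in the PDF, so that it can be found by scanning the raw bytes for the `<?xpacket ... ?>` markers, and it does not assume which metadata property a publisher uses for the DOI: it simply looks for anything shaped like a DOI in the packet. The function names and the DOI pattern are my own illustration, not part of any library or of Nature's markup.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional

# An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
XPACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)

# Rough DOI pattern (an assumption for this sketch, not an official grammar).
DOI = re.compile(r"10\.\d{4,9}/\S+")


def find_xmp_packets(data: bytes) -> List[str]:
    """Return the XML body of every uncompressed XMP packet found in data."""
    return [
        match.group(1).decode("utf-8", errors="replace").strip()
        for match in XPACKET.finditer(data)
    ]


def extract_doi(xmp: str) -> Optional[str]:
    """Return the first DOI-like string in an XMP packet, or None."""
    try:
        root = ET.fromstring(xmp)
    except ET.ParseError:
        return None
    for element in root.iter():
        # The DOI may sit in element text or in an attribute value.
        for value in [element.text or ""] + list(element.attrib.values()):
            match = DOI.search(value)
            if match:
                return match.group(0)
    return None


def doi_from_pdf(path: str) -> Optional[str]:
    """Read a PDF file and return the first DOI found in its XMP, if any."""
    with open(path, "rb") as handle:
        data = handle.read()
    for packet in find_xmp_packets(data):
        doi = extract_doi(packet)
        if doi:
            return doi
    return None
```

Calling `doi_from_pdf("article.pdf")` (with a hypothetical file name) returns the DOI as a string, or `None` if the PDF has no readable XMP packet or the packet contains no DOI. A PDF whose metadata stream is compressed would need a real PDF library to unpack it first.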

For a more technical discussion of XMP in scientific articles, please read the set of blog posts by Tony Hammond[5],[6],[7].

fn1. XMP Labelling for Nature

fn2. Adding intelligence to media

fn3. XMP

fn4. Is there any XMP in scientific pdf? No

fn5. Metadata in PDF: 1. Strategies

fn6. Metadata in PDF: 2. Use Cases

fn7. Metadata in PDF: 3. Deployment


3 Responses to Just DOI it!

  1. Tony Hammond says:

    Hi Martin:
    > XMP support in scientific PDFs is unfortunately still very uncommon and probably hasn’t changed much since Pierre Lindenbaum checked last year
    I think you may find that the situation is somewhat improved from Pierre’s findings in 2007. :)
    We know of one other main publisher (Elsevier) who is routinely providing rich XMP in their PDFs. See, for example, a title with guest access to free text, e.g.
    The American Journal of Human Genetics
    If you look at the PDFs in the latest issue (vol. 83, num. 6), say, you will see that they contain bibliographically enriched XMP packets.
    We can only hope that other STM publishers will come on board in 2009 and further reverse Pierre’s earlier findings.

  2. Martin Fenner says:

    Tony, thanks a lot for the update. I haven’t found a list of all the publishers (or journals) that add XMP to their PDF files.

  3. Pingback: Beyond the PDF – it is time for a workshop | Gobbledygook