ORCID or how to build a unique identifier for scientists in 10 easy steps

ORCID stands for Open Researcher and Contributor ID and was announced in early December. This blog post tries to summarize some of the problems that have to be solved to develop a unique identifier for scientists.

1. Identify the problem
The Researcher Identification Primer by the Gen2Phen Knowledge Center lists some of the problems that a unique identifier for scientists tries to solve, including

  • Disambiguation of author names in the scientific literature and establishing/validating relationships between authors and publications.
  • A solid foundation for permitting and tracking online scientific contributions, such as database submissions, scientific blogging, and community curation efforts.
  • Knowledge discovery applications using some or all of the above components.

The Gen2Phen workshop co-organized by Gudmundur Thorisson in May 2009 discussed these issues in much more detail. One of several articles talking about the problem of disambiguation of author names (especially Asian author names) appeared in Nature News in February 2008. A December 2009 Nature editorial emphasized that a unique identifier for researchers will be especially valuable to track scientific contributions that are not related to authoring a paper. Phil Bourne and J. Lynne Fink also wrote about this in PLoS Computational Biology in December 2008: I Am Not a Scientist, I Am a Number. A number of tools have tried to solve this problem, but it is not possible to link the researcher identities in the many systems.
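
The name disambiguation problem can be made concrete with a few lines of Python. This is only a minimal sketch: the records, the placeholder identifiers (`ID-0001`, `ID-0002`) and the helper functions `papers_by_name` and `papers_by_id` are invented for illustration and are not part of ORCID or any existing system.

```python
from collections import defaultdict

# Hypothetical publication records: (paper, author name as printed, author identifier).
# The identifiers are invented placeholders, not real ORCID or ResearcherID values.
RECORDS = [
    ("Paper A", "J. Wang", "ID-0001"),
    ("Paper B", "J. Wang", "ID-0002"),    # a different person with the same printed name
    ("Paper C", "Jing Wang", "ID-0001"),  # same person as Paper A, name printed differently
]


def papers_by_name(records):
    """Group papers by the printed author name."""
    groups = defaultdict(list)
    for paper, name, _author_id in records:
        groups[name].append(paper)
    return dict(groups)


def papers_by_id(records):
    """Group papers by the author identifier."""
    groups = defaultdict(list)
    for paper, _name, author_id in records:
        groups[author_id].append(paper)
    return dict(groups)


if __name__ == "__main__":
    print(papers_by_name(RECORDS))
    print(papers_by_id(RECORDS))
```

Grouping by name merges two different people under "J. Wang" and separates Paper C from its author, while grouping by identifier gets both cases right, assuming the identifiers were assigned correctly in the first place.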

2. Define what you want to accomplish
Geoff Bilder gave a very good introduction to the problem at Science Online London in August and at STM Innovations in December. Both talks were similar, but the latter is available as video and PDF. He emphasized that ORCID is about Knowledge Discovery and not Access Control, and explained the terminology for subject, identifier, profile, persona and credential. Access Control is a related problem that is sometimes mixed in, but there is no requirement that a unique researcher identifier also has to provide secure access via whatever mechanism (OpenID is one solution to that problem).

3. Win support of stakeholders
Founding members of the ORCID initiative can be found on the ORCID homepage and include publishers, funders, universities, organizations and software companies. A number of important stakeholders are already part of the initiative, but support from more funders (besides the Wellcome Trust) and software companies (particularly those that build reference managers or social networking sites for scientists) would be great. Probably the biggest name not on the list is the U.S. National Library of Medicine, which runs the PubMed database of biomedical literature (the ORCID members Wellcome Trust and British Library are involved in UK PubMed Central).

4. Make decisions about the general design of the system
Some of the design decisions obviously are not set in stone at this stage. One continuing discussion is centralized vs. federated, and it looks like ORCID will be a centralized system similar to the DOI. Geoff Bilder has some good arguments for a centralized system. Another recurring theme is how much control individual researchers have over their ORCID record. Although external assertions from publishers or funders will certainly be part of ORCID, the individual researcher will have an important role, not only because of privacy concerns, but also because this is the easiest way to fix the errors that even the best automated algorithms for author assignment will produce. And it looks as if ORCID will be an extensible system that will, for example, allow publishers or social networking sites to add functionality they require. The discussion at the STM Innovations meeting in early December touches on some of these issues and is recorded as video (after the talk by David Kochalko).
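
How external assertions and researcher control could fit together is easiest to see in a small Python sketch. Everything here is an assumption made for illustration: the class names `Assertion` and `ResearcherRecord`, the `confirm` and `reject` methods, and the rule that self-assertions count as confirmed right away. None of this is taken from the actual ORCID design, which has not been published.

```python
from dataclasses import dataclass, field


@dataclass
class Assertion:
    """A claim that the researcher is the author of a work (hypothetical model)."""
    work: str
    source: str              # who made the claim, e.g. "self" or "publisher"
    confirmed: bool = False


@dataclass
class ResearcherRecord:
    """A record holding assertions from the researcher and from external parties."""
    identifier: str          # placeholder value, not a real identifier format
    assertions: list = field(default_factory=list)

    def assert_work(self, work, source):
        # Assumption: claims made by the researcher are confirmed immediately,
        # while external claims wait for the researcher to review them.
        self.assertions.append(Assertion(work, source, confirmed=(source == "self")))

    def confirm(self, work):
        # The researcher accepts an external claim.
        for assertion in self.assertions:
            if assertion.work == work:
                assertion.confirmed = True

    def reject(self, work):
        # The researcher removes a wrong claim, e.g. an algorithmic mismatch.
        self.assertions = [a for a in self.assertions if a.work != work]

    def confirmed_works(self):
        return [a.work for a in self.assertions if a.confirmed]


if __name__ == "__main__":
    record = ResearcherRecord("ID-0001")
    record.assert_work("Paper A", "self")
    record.assert_work("Paper B", "publisher")
    record.assert_work("Paper X", "publisher")  # wrongly assigned by an algorithm
    record.confirm("Paper B")
    record.reject("Paper X")
    print(record.confirmed_works())
```

The point of the sketch is that publishers and funders can add claims, but only the researcher decides which claims end up in the confirmed list, which is also the simplest way to remove a wrong assignment.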

5. Pick a name
The name Open Researcher and Contributor ID (ORCID) is obviously a combination of ResearcherID (Thomson Reuters) and Contributor ID (CrossRef). I would have preferred a simpler name, but I guess we have to get used to ORCID.

6. Build on available tools
ORCID will be based on the ResearcherID software from Thomson Reuters. From what I’ve seen, the OpenID system will not be a central part of ORCID. But ORCID certainly will be designed to work together with OpenID and other authentication mechanisms. I don’t know what Elsevier and the Scopus Author ID will contribute to ORCID.

7. Form an independent organization
In order to be adopted widely, ORCID must be run by an independent organization, and not by a single publisher, software company, research organization or funder. With the experience of running the DOI system to identify digital objects such as scientific papers, CrossRef would be one obvious candidate, but the ORCID founding members have yet to decide on that.

8. Secure financing
Starting and maintaining ORCID will obviously cost money. In my little survey about author identifiers back in April 2009, the opinions were split about who should pay for this. Journal publishers and database maintainers (referring to such databases as PubMed, Scopus or Web of Science) were the two most common answers. ORCID will make it easier for funding agencies to evaluate scientists and they might therefore also contribute to the system. Individual researchers hopefully will not have to pay for any of this, but their input in time is obviously required.

9. Promote ORCID
A Nature editorial in December was a good start to promote ORCID to a wider audience. A unique identifier for scientists will only become accepted if widely used. That’s why it is important that publishers and funders quickly adopt this service. Software companies that build interesting tools around ORCID are also critical, e.g. integration of ORCID into manuscript submission systems (including the use of ORCID for the peer reviewers) and social networking sites (including of course Nature Network). My experience with the DOI for papers (e.g. the limited support in PubMed) tells me that adoption of ORCID will be a long process.

10. Involve individual researchers
Individual researchers currently have no way to get directly involved in ORCID. But some level of involvement is critical for an author identifier to work. The best place is currently probably the LinkedIn Group Unique Identifiers for Researchers started by Cameron Neylon. But I hope we soon see ORCID discussions on Nature Network and other social networking sites. The best place on Nature Network to discuss ORCID is currently probably the Scientific Researchers and Web 2.0: Social Not Working? Forum.

15 Responses to ORCID or how to build a unique identifier for scientists in 10 easy steps

  1. Heather Etchevers says:

    Wow. Thank you so much for the update, Martin. I can only imagine how much work this informational post must have represented for you. Why is everyone making best of 2010 posts so early on in the year!? Hard acts to follow.
    I hope this is prescient, in any case; we really need the ORCID system. I look forward to #10 being implemented, as it means that many of the other hurdles will have been surmounted.

  2. Cameron Neylon says:

    Thanks for the really useful summary. My concerns remain as I have said before. If you don’t give the user complete control then I think it will be at best a login system for some journals. And I am still worried about the lack of sensitivity shown by redirecting orcid.org to a thomson-reuters URL. Not because of the actual doing of it but because it shows a real lack of awareness of some central issues around identity and the sense of ownership required to make it work.

  3. Martin Fenner says:

    I agree that it is crucial that individual researchers are involved in the implementation of the ORCID system, but they currently only have a voice through universities or institutions. We should have a forum (the LinkedIn group or something else) for concerns regarding privacy and other sensitive issues. But I’m confident that ORCID will be run by an independent organization, and not by a single company.
    As much as I like the ORCID initiative, one consequence of a working author identifier system is that the overuse of bibliometrics (Impact Factors, etc.) will only increase, as administrators will only have to push a button to get to the numbers of individual researchers.

  4. Richard Wintle says:

    Oh thank goodness. I thought from your post’s title that you meant *I* had to follow 10 steps in order to make an ORCID for myself. Not gonna happen if it’s that complicated… ;)

  5. Richard Wintle says:

    P.S. The overuse of bibliometrics you identify as an undesirable side effect will be a real boon to those of us who are obliged to gather such data for reporting to funding agencies. I’m all in favour of “push button” data gathering. Especially if it gets rid of f*rting around with PubMed and *shudder* Google Scholar.

  6. Martin Fenner says:

    I have no idea how to get an ORCID. Will my ResearcherID A-7225-2008 also work as ORCID (or is it ORCID ID)? It would be relatively easy to assign an ORCID to authors when they submit a paper, to a reviewer when they are asked for peer review or to a user of Nature Network, Mendeley or CiteULike when they start to use the service. But for all the papers that are out there already (currently about 1.5 million new ones a year)?
    Our institution currently does a bibliography of all published papers every year, and it is up to each researcher to add his publications to the database. ORCID will be great to help with CVs or publication lists for an individual, department or institution. But I get nervous when people start to add/multiply/divide the Journal Impact Factors on these publication lists to come up with a single number for each researcher.

  7. Elizabeth Moritz says:

    Great post Martin! Though, like Richard, when I first read the title I also thought I would need to follow 10 steps to make an ORCID and was a bit worried. (Also, I’m assuming most people pronounce that acronym to sound like the flower, but I keep thinking ORC-ID, as in IDs for Orcs, Tolkien style).
    Will this system be used to go back to already published materials and assign ORCIDs, would that be up to individual researchers? Or do you think this system will be primarily used to assign ORCIDs as papers are published in the future?

  8. Richard Wintle says:

  9. Martin Fenner says:

    Richard, interestingly, Microsoft is one of the founding members of ORCID. Will be interesting to see whether Google will also join. Google Scholar is so frustrating to use, I gave up using it a long time ago (you mention some of the problems with it in your post). And Google Scholar had the potential to become a really great scholarly search engine (fulltext search and all that).

  10. Richard Wintle says:

    Martin – thanks for the pointer at Scopus over at my blog (I’d forgotten about it!). Glad I’m not the only one frustrated by Scholar… I was beginning to have little niggling doubts that it might be much more powerful than I knew, and I was just totally missing the boat.
    In its defence, it’s rather good at finding hits inside all manner of weird and wonderful documents, such as PhD dissertations and the like.

  11. Martin Fenner says:

    Richard, what I don’t like about Google Scholar is that it doesn’t seem to treat references as structured data, it is almost like one big fulltext search. Web of Science and Scopus are good alternatives, but commercial. It is interesting that both these services have already implemented a unique identifier for authors.
    P.S. I recently started to like Wolfram Alpha (http://www.wolframalpha.com/), especially their iPhone app; maybe they should start offering searching of the scholarly literature.

  12. Richard Wintle says:

    Hm, that Wolfram Alpha looks a bit complicated for a bear of little brain such as myself. Agree that Scholar would benefit greatly from unique IDs tagged to each reference, but I guess that’s just not how it works. Unfortunately. The full text nature of the search is its biggest strength of course.

  13. Duncan Hull says:

    Hi Martin, thanks for a useful summary… Talking of
    _Disambiguation of author names in the scientific literature and establishing/validating relationships between authors and publication_
    You might also be interested in Author name disambiguation in MEDLINE by Vetle I. Torvik and Neil R. Smalheiser in ACM Transactions on Knowledge Discovery from Data (TKDD), Vol. 3, No. 3. (2009), pp. 1-29.
    Which describes something called Author-ity “tools for identifying Medline articles written by a particular author” – and a lot of the related work…
    Speaking of which, another related paper Understanding PubMed user search behavior through log analysis claims that *search by author name* is one of the most common queries in PubMed, which confirms how important the issue of author identification is.

  14. Richard Wintle says:

    Interesting references, Duncan. Thanks for that.
    A whole different topic might be “is the new PubMed interface an improvement over the old one?” ;)

  15. Martin Fenner says:

    Duncan, thanks for the *Author-ity* reference. Another interesting link is Interoperable identification infrastructure, a project which tries to develop an author identifier (among other things) for institutional repositories. The list of author identifier projects is pretty long, almost material for a separate blog post.
    Richard, my thoughts on the PubMed redesign are here.