Author Identifier Overview

This is the pre-print of a manuscript submitted to the Libreas journal.

Abstract

Unique identifiers for scholarly authors are still not commonly used, but they provide a number of benefits to authors, institutions, publishers, funding organizations and scholarly societies. This report gives an overview of some of the popular author identifier systems and their characteristics. The report also discusses several important issues that need to be addressed by author identifier systems, namely identity, reputation and trust.

Introduction

We have long assigned unique numbers to genes, species or stars, and have used unique identifiers for scholarly works for more than 10 years, but unique identifiers for authors are still fairly new and not yet in widespread use. Unique author identifiers are useful for the following reasons:

  1. Researchers want to find potential collaborators, and want an easier way to get credit for their scholarly activities,
  2. Institutions want to collect, showcase and often evaluate the scholarly activities of their faculty,
  3. Publishers want to simplify the publishing workflow, including peer review,
  4. Funding organizations want to simplify the grant submission workflow and want to track what happened to the research they funded, and
  5. Scholarly societies want an easier way to track the achievements of their members.

Unique identifiers for authors are less commonly used than unique identifiers for scholarly contributions, not because they are not needed, but because they are difficult to implement. In this report I want to summarize the status quo and some of the important issues that need to be addressed by an author identifier system. Throughout the text I will use the term author in the broader meaning of a creator of scholarly works; in most instances this term could be replaced by researcher, scholar or contributor.

Status quo

Some popular author identifier systems for scholarly researchers are listed in Table 1. While some systems have been around for more than 10 years, several new systems have emerged in the last three years and there clearly is an increased awareness of unique author identifiers [Bourne 2008]. The ORCID and PubMed Author ID systems have been announced, and are expected to become publicly available later this year. With the exception of a few countries with mandatory author identifiers, such as Brazil and the Netherlands, and some specific disciplines, author identifiers are still not widely used.

Table 1. Popular author identifier systems. A PDF version of the table is available here.

In addition to unique author identifiers for scholarly works, we also see the emergence of identity systems with a much broader scope. The International Standard Name Identifier (ISNI) system will cover all creators of creative works, including artists and musicians. OpenID has become the de facto standard for identification and authentication of internet users.

The overview of existing systems is not only helpful to describe the status quo, but also to understand the different approaches to author identification that these systems have taken. In the following sections I want to focus on three important aspects: identity, reputation and trust.

Identity

In its simplest form an author identifier system provides a unique identifier to a person. The identifier could be given to everybody who asks for it (as with the OpenID system), to all authors of creative works (as is intended for the ISNI system), or only to someone actively involved in scholarly work. In the latter case we have to think about the definition of scholarly work, and here two approaches are in use. One option is to assign the identifier upon graduation with a science degree, and this is what Brazil and the Netherlands are doing. The problem is that this approach might not catch all authors of scholarly works, which is why some author identifier systems, including AuthorClaim and Researcher ID, are open to registration by everybody. The other option is to assign an author identifier when someone has created a scholarly work, most commonly a scientific paper or book chapter. This is the approach taken by the ArXiv Author ID and the Scopus Author ID systems.

Until now we have talked about unique author identifiers being assigned proactively, most commonly when an author decides to get an identifier. The much more complicated situation is the retrospective assignment of unique identifiers to authors, including authors who are no longer actively doing scholarly work. Scopus Author ID is an example of a service that does name disambiguation, and ORCID is also working on name disambiguation. This retrospective assignment only works if another person, or a computer algorithm, can unambiguously identify a particular person. There are actually two problems to solve. First, different people might have the same name, a situation particularly prominent in China and Korea. Second, we have to solve the opposite problem, where different names all point to the same person. Reasons for this include name changes, e.g. through marriage, and several different spellings of the same name. The latter is common for names from countries such as China that use non-Latin scripts, but it is also a problem for countries using the Latin alphabet, e.g. because of an umlaut in a German name. Name disambiguation is inherently difficult, and the algorithms are at best 95-98% accurate.
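
To make the two disambiguation problems concrete, here is a minimal sketch in Python. It is not the method used by Scopus Author ID, ORCID or any other system mentioned here. The function names (name_key, group_by_key), the transliteration table and the choice of "family name plus first initial" as a matching key are all assumptions made purely for illustration.

```python
import unicodedata
from collections import defaultdict

# Assumption of this sketch: German umlauts and sharp s are transliterated
# to two letters. Real systems need much fuller, language-aware rule sets.
TRANSLITERATIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def name_key(family: str, given: str) -> str:
    """Return a crude matching key: folded family name plus first initial."""
    # Compose first so that "u" + combining diaeresis becomes a single "ü".
    text = unicodedata.normalize("NFC", family).lower()
    for src, dst in TRANSLITERATIONS.items():
        text = text.replace(src, dst)
    # Decompose and drop any remaining diacritics (e.g. "é" becomes "e").
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Keep letters only, so hyphens, spaces and apostrophes do not matter.
    text = "".join(ch for ch in text if ch.isalpha())
    return f"{text}_{given[:1].lower()}"


def group_by_key(records):
    """Group (family, given) pairs that share the same matching key."""
    groups = defaultdict(list)
    for family, given in records:
        groups[name_key(family, given)].append((family, given))
    return dict(groups)
```

With this key, "Müller, Hans" and "Mueller, H." fall into the same group, which addresses the different-spellings problem. At the same time "Wang, Wei" and "Wang, Wen" also share a key although they may well be different people, which is exactly the same-name problem. A key based on names alone therefore cannot be enough; real disambiguation has to draw on further evidence such as affiliations or co-authors.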

Some of the currently available unique identifier systems are not universal, but limited to a specific discipline (e.g. the ArXiv Author ID to physics, mathematics and related disciplines) or country (e.g. LATTES in Brazil or NARCIS in the Netherlands). With this approach we run into problems with interdisciplinary or multinational scholarly works. A good example would be assigning author identifiers to all publications in the multidisciplinary journals Science or Nature. We therefore also need universal identifiers, and Researcher ID, Scopus Author ID, AuthorClaim and ORCID all provide such a service. ORCID is the only service trying to associate the ORCID identifier with other existing author identifiers. This integration is needed so that established specific author identifiers such as LATTES or ArXiv Author ID can be used in parallel with universal identifiers.
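
The idea of using specific and universal identifiers in parallel can be sketched as a simple lookup structure. The following Python example is only an illustration and does not describe how ORCID or any other service stores its data. The class names, the scheme labels and all identifier values are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class AuthorRecord:
    universal_id: str  # hypothetical universal identifier
    # scheme name (e.g. "arxiv", "lattes") -> identifier in that scheme
    other_ids: Dict[str, str] = field(default_factory=dict)


class IdentifierRegistry:
    """Links discipline- or country-specific identifiers to one universal ID."""

    def __init__(self) -> None:
        self._records: Dict[str, AuthorRecord] = {}
        self._index: Dict[Tuple[str, str], str] = {}

    def link(self, universal_id: str, scheme: str, identifier: str) -> None:
        key = (scheme, identifier)
        owner = self._index.get(key)
        # A specific identifier must not point at two different people.
        if owner is not None and owner != universal_id:
            raise ValueError(f"{scheme}:{identifier} is already linked to {owner}")
        record = self._records.setdefault(universal_id, AuthorRecord(universal_id))
        record.other_ids[scheme] = identifier
        self._index[key] = universal_id

    def resolve(self, scheme: str, identifier: str) -> Optional[AuthorRecord]:
        """Find the author record for a specific identifier, if any."""
        universal_id = self._index.get((scheme, identifier))
        if universal_id is None:
            return None
        return self._records[universal_id]
```

A repository that only knows an author's discipline-specific identifier could call resolve() to find the universal identifier and, through it, the identifiers used by other systems.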

Reputation

A unique author identifier in itself has limited value. We have to add meaning to it by associating the identifier with biographic and bibliographic information: where the author works now and has worked in the past, what scholarly works the author has created and with whom, what other author identifiers point to the same person, etc. With this information we are building an author profile, and this can be done either by the system issuing the identifier, by the systems that collect scholarly contributions, or by one or more other systems. As there is currently no initiative for a single universal system that holds the scholarly record, profile information for the time being will continue to be distributed and duplicated. All author identifier systems discussed here collect profile information. The profile information is a proxy for the reputation of an author, i.e. the opinion of the scientific community.

While reputation is influenced by many factors, the information that can be collected in an author profile should ideally consist mostly of information collected from other systems using digital identifiers. For scholarly activities we have both discipline-specific identifiers (e.g. pmid for life sciences publications or gi for nucleotide sequences) assigned by individual organizations collecting this information and universal digital object identifiers (DOIs) assigned by registration agencies such as CrossRef and DataCite. Whereas most scholarly publications now have a DOI assigned to them, we are still at the beginning of routinely assigning DOIs to research datasets. We do have universal and unique identifiers for publications and research datasets, but not for the other scholarly activities that could be listed in an author profile, including but not limited to grants, awards, patents, peer review, or teaching. Most unique author identifier profiles are limited in scope to scholarly works, but LATTES, NARCIS, ORCID and PubMed Author ID also look at other scholarly contributions. AuthorClaim, VIAF, Scopus Author ID, LATTES, NARCIS and the Names Project are assigning identifiers to institutions, whereas Researcher ID, ArXiv Author ID and ORCID don’t use unique identifiers for institutions.

Not all scholarly activities of an author are public information that can be included in an author profile. Peer review is a good example of an important and valuable scholarly activity where the authors of the reviewed paper or grant do not know the identity of the reviewer. Journals and funding organizations might use unique author identifiers internally to simplify the peer review workflow, but the public author profile will probably at most list the journals and funding organizations for whom the peer review was done.

Related to reputation is provenance, which describes the record of ownership of an object. For a scholarly work provenance not only refers to its authors, but also to the place and time it was published, the other works citing it, etc. When reading a scientific paper or looking at a research dataset, we always do this in the context of its provenance, and this is obviously easier to do with unique author identifiers.

Reputation and provenance in the scholarly context are typically used for knowledge discovery and academic metrics. Author profile information collected with the help of unique author identifiers improves knowledge discovery; it becomes much easier to find other scholarly works by the same author or other authors with similar research interests. Academic metrics are increasingly used to make funding and job hiring decisions, and this is done by trying to put the reputation of an academic, department or institution into numbers. Author identifiers simplify academic metrics, but a lot of work still needs to be done about whether reputation can be put into numbers, how these numbers should be calculated, and whether this is the best approach to forecast the academic productivity of individuals or institutions.

Trust

Identity and reputation are based on trust in the claims made about authors and their scholarly contributions. Individual authors have to trust the author identifier system. Most importantly, they want to control the privacy settings of their profile information. Authors also want to know that the author identifier system is reliable and will be around for a long time to come, and that the information in the system is open, meaning that the data collected by the author identifier system can be freely accessed, exported and reused. Authors also need to trust the organization running the author identifier service, and this has historically been an issue for proprietary systems run by private companies, from Microsoft Passport as a single sign-on system for internet users to Thomson Reuters and Elsevier with their Researcher ID and Scopus Author ID services.

Other users of an author identifier system also have to trust the claims made in an author profile. This is not possible in a system that relies only on self-claims made by authors (e.g. the AuthorClaim system); it requires verification of these claims. Verification would typically come from institutions for author affiliations, publishers for scholarly publications and data centers for research datasets. Scopus Author ID is an example of a system that primarily relies on external claims. The problem with a system that only uses external claims is that these claims are much more difficult to obtain and still will never be 100% accurate.

The best trust exists in systems that use claims by both authors and external sources. This is most easily done when the author identifier is used at the time a paper, grant or dataset is submitted, and much more difficult when done retrospectively. Self-claims and external claims not only require a unique author identifier, but also a mechanism for authentication (confirm that this is really author x) and authorization (allow journal y to add publication z to the profile of author x, but not change the other publications). Authentication and authorization are not a core function of author identifier systems, and can also be provided by standard protocols such as OpenID and OAuth.
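
The combination of self-claims, external claims and authorization can be illustrated with a small Python sketch. It is a toy model and not an implementation of OpenID, OAuth or any existing author identifier service. The Profile class, the permission name "add_publication" and the identifiers "author-x", "journal-y" and "pub-z" are hypothetical, and authentication is assumed to have already happened elsewhere, so every party name passed in is taken to be genuine.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Profile:
    author_id: str
    # publication identifier -> sources that claimed it ("self" or a party name)
    publications: Dict[str, Set[str]] = field(default_factory=dict)
    # party name -> permissions the author has granted to that party
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def authorize(self, party: str, permission: str) -> None:
        """The author grants one narrow permission to an external party."""
        self.grants.setdefault(party, set()).add(permission)

    def add_publication(self, party: str, publication_id: str) -> None:
        """Record a claim; external parties need the 'add_publication' grant."""
        is_author = party == self.author_id
        if not is_author and "add_publication" not in self.grants.get(party, set()):
            raise PermissionError(f"{party} may not add publications")
        source = "self" if is_author else party
        self.publications.setdefault(publication_id, set()).add(source)

    def remove_publication(self, party: str, publication_id: str) -> None:
        """Only the author may change existing entries in this sketch."""
        if party != self.author_id:
            raise PermissionError(f"{party} may not remove publications")
        self.publications.pop(publication_id, None)

    def is_corroborated(self, publication_id: str) -> bool:
        """True when the author and at least one external source both claim it."""
        sources = self.publications.get(publication_id, set())
        return "self" in sources and len(sources) > 1
```

In this model journal-y, once authorized, can add pub-z to the profile of author-x but cannot remove anything, and a publication only counts as corroborated when both the author and an external source have claimed it.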

Conclusions

Unique identifiers for scholarly authors benefit all involved stakeholders, but are currently not common practice. A number of recent initiatives are addressing this problem and we can expect to see major progress in this area in 2011. Author identification is a complex problem and involves a large number of stakeholders who sometimes have opposing views on some of the issues that need to be addressed. Building an author identifier system is therefore not just about technical challenges; it also requires decisions about openness, privacy, collaboration, business models and other critical issues.

Disclaimer

The author is a member of the ORCID Board of Directors. The views expressed here are his personal opinion.

References

  1. Falagas ME. Unique author identification number in scientific databases: a suggestion. PLoS Medicine. 2006 May;3(5):e249. DOI: 10.1371/journal.pmed.0030249
  2. Qiu J. Scientific publishing: identity crisis. Nature. 2008 Mar;451(7180):766-7. DOI: 10.1038/451766a
  3. Aerts R. Digital identifiers work for articles, so why not for authors? Nature. 2008 Jun;453(7198):979. DOI: 10.1038/453979b
  4. Cals JW, Kotz D. Researcher identification: the right needle in the haystack. The Lancet. 2008 Jul;371(9631):2152-2153. DOI: 10.1016/S0140-6736(08)60931-9
  5. Wolinsky H. What’s in a name? EMBO Reports. 2008 Dec;9(12):1171-4. DOI: 10.1038/embor.2008.217
  6. Bourne PE, Fink JL. I am not a scientist, I am a number. PLoS Computational Biology. 2008 Dec;4(12):e1000247. DOI: 10.1371/journal.pcbi.1000247
  7. Thorisson GA. Accreditation and attribution in data sharing. Nature Biotechnology. 2009 Nov;27(11):984-985. DOI: 10.1038/nbt1109-984b
  8. Enserink M. Scientific publishing. Are you ready to become a number? Science. 2009 Mar;323(5922):1662-4. DOI: 10.1126/science.323.5922.1662
  9. Researcher Identification Primer. GEN2PHEN Knowledge Center. 2009. Available from: http://www.gen2phen.org/researcher-identification-primer
  10. Habibzadeh F, Yadollahie M. The problem of “Who”. The International Information & Library Review. 2009 Jun;41(2):61-62. DOI: 10.1016/j.iilr.2009.02.001
  11. Credit where credit is due: The Open Researcher and Contributor ID (ORCID). Nature. 2009;462(7275):825. DOI: 10.1038/462825a
  12. ORCID or how to build a unique identifier for scientists in 10 easy steps. Gobbledygook Blog. 2010. Available from: http://blogs.plos.org/mfenner
  13. Warner S. Author Identifiers in Scholarly Repositories. ArXiv. 2010. Available from: http://arxiv.org/abs/1003.1345
  14. Lane J. Let’s make science metrics more scientific. Nature. 2010 Mar;464(7288):488-9. DOI: 10.1038/464488a

Responses to Author Identifier Overview

  1. Tomas Baiget says:

    In 2007 we put IraLIS (International Registry of Authors – Links to Identify Scientists) online:
    http://iralis.org
    We address our service mainly to Hispanic authors because of the problem of using two family names, with the “important” one written first (and not at the end).
    We are an academic non-profit research group (CIEPI, International Center for Information Development and Strategy).
    We provide an author name standardization system for the E-LIS repository:
    http://eprints.rclis.org/
    but we cover all disciplines.
    We provide an IraLIS identifier and a database of all name variations used by each author.

  2. Tomas, thanks a lot for mentioning IraLIS, and for pointing out the problems that Hispanic authors have with author identifier systems that don’t have the option to provide two family names.