Evaluating Impact: What’s your number?

Flickr photo by szczel.

What’s your number? This Saturday, we (MF) will be addressing this very question at the SpotOn London conference with Marie Boran, David Colquhoun, Jean Liu and Stephen Curry. In advance of the discussion on the role of altmetrics in evaluating scientific research, we offer our views from a historical perspective.

In the last few decades, research has grown from an activity of a relatively small number of academics working alone or in small groups into a large-scale operation involving large numbers of people and large amounts of money. One consequence of this trend is that decisions about acceptance for publication, academic positions, and grant funding have become increasingly difficult to make. If you work in a relatively obscure field such as psychoceramics, you might know the work of everyone in the field, but that is no longer true for research fields such as the genetics of breast cancer. And the trend towards Big Science continues. High-energy physics provides an obvious example, as Salvatore Mele from CERN will discuss in his SpotOn London keynote: 1 boson, 50 years, 50,003 scientists: understanding our universe through global scientific collaboration and Open Access.

I. Modern peer review begins…

Around the beginning of the Higgs boson adventure, the state of scholarly communications looked quite different. Peer review was introduced to solve this problem only relatively recently. It is often overlooked that the famous Watson and Crick paper on the structure of DNA was not peer reviewed in 1953; the journal Nature implemented peer review only in the 1960s. Journal editors were increasingly overwhelmed by the sheer number of submitted papers and by the growing specialization of science, and they needed help from subject-area experts to make publication decisions. We think it likely that universities and research funders also began to rely on external peer review to help make their decisions around this time.

Flickr photo by LEOL30

Peer review does not measure the quality of research; rather, it is a subjective evaluation aimed at a specific end, whether a journal publication decision or a grant award. In the best case, the process works well. However, most universities, journals, and funders do not have the resources to do it properly. Amy Brand, Harvard University's Assistant Provost for Faculty Appointments and Information, described Harvard's faculty appointment process at the recent ALM workshop. Her talk made clear that this elaborate, labor-intensive review process works well for Harvard but does not translate easily to smaller institutions or to those without dedicated resources to govern and run it. In the worst case, peer review as a subjective evaluation can actually be harmful. Eyre-Walker and Stoletzki's recent analysis in PLOS Biology found that subjective assessments of merit correlated only weakly in the datasets they studied, and concluded that subjective assessments are an error-prone, biased, and expensive method by which to assess merit.

II. … And then bibliometrics enters…

To overcome the subjectivity and other limitations of peer review, in the 1970s we increasingly began to use bibliometrics such as citation counts. Because the quality of research cannot be measured directly, bibliometrics created a construct of impact based on how research is perceived and consumed by the scientific community. Citations in the scholarly literature served as this proxy, and higher citation counts were taken to represent higher scientific impact. This construct has two major limitations: a conceptual one (by using impact as a proxy for quality, we create undesired incentives) and a practical one (citation counts are influenced by many other factors, e.g., discipline). It has also had unintended and undesired consequences, chief among them the privileging of what this construct can measure: journal articles rather than other research outputs, and research that is more likely to be cited.

Flickr photo by magannie.

One of the first bibliometric indicators to become widely used was the Journal Impact Factor. The decision to focus on the journal rather than the individual article was rooted in practical and environmental reasons that no longer apply. Counting citations was painful manual work before digital publishing became the norm in the 1990s. We also had no norms for uniquely identifying journal articles until DOIs were introduced in the 2000s. The short citation window and the document type asymmetry were likewise driven by concerns about cost-efficiency. Today, there are many reasons why using the Journal Impact Factor for individual-level bibliometrics is a bad idea. Panelist Stephen Curry makes the case particularly well in his post Sick of Impact Factors.
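To make the short citation window and the document type asymmetry concrete, here is a minimal Python sketch of the standard two-year Journal Impact Factor: citations received in census year Y by items published in years Y-1 and Y-2, divided by the number of "citable items" (research articles and reviews) published in those two years. Citations to editorials, letters, and news items count in the numerator, but those items do not count in the denominator; citations arriving after year Y never count at all. The class name, function names, and counts are hypothetical and for illustration only; this is not the official production calculation.

```python
from dataclasses import dataclass


@dataclass
class PublicationYear:
    """Hypothetical counts for everything a journal published in one year."""
    citable_items: int         # research articles and reviews: counted in the denominator
    citations_to_citable: int  # citations received in census year Y by those citable items
    citations_to_other: int    # citations received in year Y by editorials, letters, news, etc.


def impact_factor(year_minus_1: PublicationYear, year_minus_2: PublicationYear) -> float:
    """Two-year Journal Impact Factor for census year Y.

    Only citations made during year Y count (the short citation window),
    and only items published in Y-1 and Y-2 are considered.
    """
    years = (year_minus_1, year_minus_2)
    # Numerator: citations to *all* items, including non-citable front matter.
    numerator = sum(y.citations_to_citable + y.citations_to_other for y in years)
    # Denominator: only the citable items (the document type asymmetry).
    denominator = sum(y.citable_items for y in years)
    return numerator / denominator


def citable_only_ratio(year_minus_1: PublicationYear, year_minus_2: PublicationYear) -> float:
    """The same ratio without the asymmetry: front-matter citations are excluded."""
    years = (year_minus_1, year_minus_2)
    numerator = sum(y.citations_to_citable for y in years)
    denominator = sum(y.citable_items for y in years)
    return numerator / denominator


if __name__ == "__main__":
    # Illustrative numbers only, not data for any real journal.
    y1 = PublicationYear(citable_items=100, citations_to_citable=250, citations_to_other=50)
    y2 = PublicationYear(citable_items=120, citations_to_citable=330, citations_to_other=30)
    print(f"{impact_factor(y1, y2):.2f}")       # prints 3.00 (660 / 220)
    print(f"{citable_only_ratio(y1, y2):.2f}")  # prints 2.64 (580 / 220)
```

With these made-up counts, citations to front matter alone raise the score from about 2.64 to 3.00, even though the citable content is unchanged.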

III. What next? ALM, Altmetrics, but we also need more

These two historical moments provide some background on how we arrived at the current state. But separate from this narrative, we believe that the quality of research cannot be fully captured or measured in numbers. This is true for all creative work (including literature) and should come as no surprise to anyone involved in research.

But should we stop right here, give up, and go back to our business of doing science? In the last few years, three important trends have begun to pave a way out of the current muck and change how we use bibliometrics:

  • metrics for individual articles instead of journal-based metrics
  • metrics that go beyond citations, including usage statistics and social media activity
  • metrics for research outputs other than journal articles

These trends have been summarized under the terms article-level metrics and altmetrics, and they will be the focus of the panel this Saturday. Because altmetrics is still a young discipline, more work needs to be done before altmetrics become part of mainstream bibliometrics. That said, the number of altmetrics presentations at the latest international bibliometrics conference (where the 2013 Eugene Garfield Doctoral Dissertation Scholarship was awarded to Ehsan Mohammadi for altmetrics work), as well as the July 2013 Sloan grant for NISO to develop altmetrics standards and best practices, are good indicators that the prefix "alt-" might not be meaningful much longer.

We can use altmetrics to overcome many of the shortcomings of the citation-based bibliometrics currently in use, and the PLOS Article-Level Metrics project has always been an important part of this change. We and others are working on open source tools that make it affordable for everyone to collect article-level metrics, on better tools to analyze and visualize article-level metrics, on standards for grouping metrics into categories, on standards for data collection, and more. We are confident that we will make good progress developing these new metrics in the coming years, but they will not solve the problem we started out with: the quality of research is something that can't be measured in numbers.

The PLOS Biology editorial accompanying the paper by Eyre-Walker and Stoletzki discusses the concept of merit in more detail, arguing that merit is not a single measurable quantity and that we should identify multivariate metrics more appropriate to 21st-century science. Maybe we should go one step further: altmetrics are a big improvement over traditional journal-based metrics and over subjective evaluation by a handful of peers, but at the end of the day they are still a construct that measures something other than research quality. We should therefore be painfully aware of unintended consequences, including:

  • we create incentives for researchers to do things that "count" and to avoid activities that "don't count"
  • we spend more and more time evaluating research rather than doing and discussing research

We have the opportunity to address both of these issues and to keep reconsidering how our assessment activities reflect our values as a research community. Methods of evaluation always implicitly reflect some set of ideas about what is important (and what is not). Whether those ideas are ones we would explicitly embrace or ones that disappoint us is up to all of us in the research community.

This entry was posted in Tech.