Why I love the H-index

The H-index: a small number with a big impact. First introduced by Jorge E. Hirsch in 2005, it is a relatively simple way to measure the impact of a scientist's work (Hirsch, 2005). It divides opinion: you either love it or hate it. I happen to think the H-index is a superb tool for assessing scientific impact. Of course, people are always favourable towards metrics that make them look good. So let's get this out into the open now: my H-index is 44 (I have 44 papers with at least 44 citations each) and, yes, I'm proud of it! But my love of the H-index stems from a much deeper obsession with citations.

As an impressionable young graduate student, I saw my PhD supervisor regularly check his citations. A citation means that someone used your work, or thought it relevant enough to mention in the context of their own. If a paper is never cited, and perhaps therefore also little read, was the research worth doing in the first place? I still remember the excitement of the first citation I ever received, and I still enjoy seeing new citations roll in.

The H-index: what does it mean, and how is it calculated and used?

The H-index is the largest number N such that you have N papers, each with at least N citations. So if you have three papers with at least three citations each, but you don't have four papers with at least four citations each, then your H-index is 3. Obviously, the H-index can only increase as you keep publishing papers and they are cited. But the higher your H-index gets, the harder it is to increase.
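To make the definition concrete, here is a minimal sketch of the calculation (the function and variable names are my own, not from Hirsch's paper):

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Sort citation counts in descending order, then count how many
    # papers still meet or exceed their 1-based rank.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# The example from the text: three papers with at least three citations,
# but not four papers with at least four citations.
print(h_index([5, 4, 3, 1]))  # -> 3
```

Because the sorted counts are non-increasing while the rank only grows, the first rank the count fails to meet ends the search.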

One of the ways in which I use the H-index is when making tenure recommendations. By placing a candidate's H-index within the context of those of their departmental peers, I can judge the candidate's scientific output within the context of the host institution. This is useful because it can be difficult to know what is expected at different institutions around the world. It would be negligent to look only at the H-index, so I use a range of other metrics as well, together with good old-fashioned scientific judgement of the candidate's contributions from reading their application and papers.

The m value

One of those extra metrics was also introduced by Hirsch, and is called m (Hirsch, 2005). The m-value measures the slope, or rate of increase, of the H-index over time and is, in my view, a greatly underappreciated measure. To calculate it, take the researcher's H-index and divide by the number of years since their first publication. This helps to normalise between researchers at the early and twilight stages of their careers. As Hirsch did for physicists, I broadly categorise people in computational biology according to their m-value in the table below. The boundaries correspond exactly to those used by Hirsch.

m-value (H-index per year)    Potential for scientific impact
< 1.0                         Average
1.0-2.0                       Above average
2.0-3.0                       Excellent
> 3.0                         Stellar

So post-docs with an m-value of greater than three are future science superstars and highly likely to have a stratospheric rise. If you can find one, hire them immediately!
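As a sketch of the arithmetic (the function names and band handling are my own; the table above does not say which band an exact boundary value such as 2.0 falls into, so the code below assigns boundaries to the higher band):

```python
def m_value(h_index, years_since_first_paper):
    """Hirsch's m: the H-index divided by years since first publication."""
    return h_index / years_since_first_paper

def impact_category(m):
    """Map an m-value onto the rough bands from the table above."""
    if m < 1.0:
        return "Average"
    elif m < 2.0:
        return "Above average"
    elif m < 3.0:
        return "Excellent"
    return "Stellar"

# Illustrative only: an H-index of 44 over a 20-year career
# (the career length is an assumption, not from the article).
print(impact_category(m_value(44, 20)))  # -> Excellent
```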

The H-trajectory

The graph below shows the growth of the H-index for three scientists, A, B and C, who have H-indices of 12, 15 and 16 respectively. I call these curves a researcher's H-trajectory.

If we calculate their m-values, we find that A has a value of 0.5, B 0.94 and C 1.67. So while these researchers have similar H-indices, their likelihood of future growth can be predicted from past performance. Recently, Daniel Acuna and colleagues presented a sophisticated prediction of future H-index using a number of features, such as the number of publications and the number in top journals (Acuna et al., 2012).
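An H-trajectory can be computed by replaying citations year by year. The sketch below assumes an input format of one list of citation years per paper; real citation data from Scopus, ISI or Google Scholar would need cleaning before it fits this shape:

```python
def h_trajectory(citation_years_per_paper, start_year, end_year):
    """H-index at the end of each year, given the year of every citation
    to each paper. The input format is an assumption of this sketch."""
    trajectory = []
    for year in range(start_year, end_year + 1):
        # Citations accumulated by each paper up to and including this year.
        counts = sorted((sum(1 for y in years if y <= year)
                         for years in citation_years_per_paper),
                        reverse=True)
        # H-index: number of ranks whose citation count still meets the rank.
        h = sum(1 for rank, c in enumerate(counts, start=1) if c >= rank)
        trajectory.append(h)
    return trajectory

# Toy example: two papers, cited in the years shown.
papers = [[2005, 2006, 2006], [2006, 2007]]
print(h_trajectory(papers, 2005, 2007))  # -> [1, 1, 2]
```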

As any serious citation gazer knows, the H-index has numerous potential problems. For example: researcher A, who spent time in industry, has fewer publications; citations for people with names in non-English alphabets, or with very common names, can be difficult to attribute correctly; and different fields have widely differing authorship, publication and citation patterns. But even considering all these problems, I believe the H-index is here to stay. My experience is that ranking scientists by H-index and m-value correlates very well with my own judgements about the impact of scientists I know, and indeed with the positions those scientists hold in universities around the world.

Alex Bateman is currently a computational biologist at the Wellcome Trust Sanger Institute, where he has led the Pfam database project. On November 1st, he takes up a new role as Head of Protein Sequence Resources at the EMBL-European Bioinformatics Institute (EMBL-EBI).

References

J.E. Hirsch (2005). An index to quantify an individual's scientific research output. Proc. Natl. Acad. Sci. USA 102, 16569-16572.

D.E. Acuna, S. Allesina & K.P. Kording (2012). Predicting scientific success. Nature 489, 201-202.


29 Responses to Why I love the H-index

  1. Constantin says:

    The problem with the H factor is that it is, to a considerable extent, a measure of how old you are. The m index is supposed to correct this but it can distort things by assuming a linearity that just isn’t there in the development of a scientist.

    The alternative I propose is the H5Y factor. It is the H factor, but calculated only on citations received in the past five years. This equalizes the playing field and my guess is that it is a much better predictor of performance for the next five years than H or m. Who cares what you have published thirty years ago? (unless it is still being cited, of course!)

    • abateman says:

      Most of the H-trajectory plots that I have created for active scientists do show quite a linear trend. I only showed three in my graph above, but researcher A was the only significant deviation that I found. Creating these H-trajectory plots was not as easy as I thought it was going to be. Downloading the full citation data is time consuming given the limits imposed by SCOPUS and ISI. I also found that the underlying data for citations was not nearly as clean as I expected.

    •

      Google Scholar gives exactly this statistic under the standard h-value.

  2. Ged Ridgway says:

    I agree with Constantin, and would add that the m-index is particularly unfair to those who take early career breaks, since it takes several years before the penalty of having a gap after their first few papers starts to become trivially small.

Interestingly, Google Scholar’s “My Citations” pages (e.g. see my profile link for a not-so-random example!) do calculate what Constantin proposes, an H-index computed from the last five years’ citations, though they don’t call it H5Y. Personally, though I agree that this is better than H, I still think it’s a rather biased measure of quality, which more strongly reflects quantity or length of active career.

    Funnily enough, I think Google are already using a better measure (which they call H5), but only for their journal rankings, not their author profiles, see e.g.
    scholar.google.co.uk/citations?view_op=top_venues
    This measure is the H-index for work published in the last five years, rather than just cited in the last five years, and they call this H5.

    I think it would be great if Google Citations profiles showed H5 for authors, but frustratingly, Google’s FAQ indicates that they are opposed to adding new metrics:
    http://scholar.google.com/intl/en/scholar/citations.html#citations
    But perhaps Scopus, ResearcherID, Academia.edu, ResearchGate or similar will add H5 in the future…

    • abateman says:

I agree that it is important to take account of career breaks so that we do not penalise researchers unfairly. Being able to plot the H-trajectory might help spot these. But as I mentioned in the article, these metrics should only be used as part of a wider evaluation of an individual's outputs. I tend to agree with the Google view on the proliferation of metrics: it could lead to more confusion than it solves. But H5-like measures seem like another reasonable way to normalise out the length-of-career issue.

  3. Pingback: Links 10/22/12 | Mike the Mad Biologist

  4. Subhadeep says:

    I find Google Scholar far better than ISI. It is updated more regularly and gives better representation to publications in non-English journals. I would choose it over others to calculate any sort of index.

  5. joanna k says:

    Ok, but why would the h-index given by an online calculator ever be higher than the number of publications?

  6. Nicolas Le Novere says:

I agree with Alex. I have had the same experience, whether recruiting post-docs or young group leaders, or evaluating tenure (and in one case even the head of a large institute). After 3, 10, 20 or 30 years of research, the H-index and m-value are very good for evaluating not only brilliance at one point, but also steady success. You do not hire the genius who had only one magic paper and nothing else significant. The likelihood that the magic happens again is very low.
    You have to compare with peers, though. Having been both an experimental neuroscientist and a computational modeller, I know that the citation patterns are quite different. However, when using the H-index to compare people, we are generally in a situation where we compare similar scientists.

    All that is of course a way to quickly sort out A, B or C lists, and to uncover potential problems (100 publications and an H-index of 10). After that step, you need to evaluate the candidates more attentively, using interviews etc. But interestingly, you very rarely read the publications. In the first screen you have too many of them, and in the second you do not need them any more.

    (and I am “excellent” yeah! Not “stellar” though. One delusion I have to get rid of ;-) )

  7. Walt says:

Interesting logic exercises. What about superstars who translate their work into patents and products and cannot publish due to company confidentiality, company goals, etc.? Patents are not cited anywhere close to publications. Organic chemistry journals often have low impact factors and low citation rates, as the animal studies in higher-impact journals always overshadow the original synthetic papers. A good friend of mine has an H-index of only 7 but has designed a blockbuster drug (and several other promising leads). I would trade my inflated H-index (the product of a hot, speculative field) in a minute to have his stock options, oh, and that drug that helps tens of thousands every day. Sorry to burst that bubble, H-indexers. I used to be a believer but I have now seen the light. Yep, and I would also take that “flash in the pan” invention of PCR (and the Nobel) over 50 years of high citations. One flash can have a greater impact than a thousand scientists over a thousand years.

    • DrFreddy says:

Amen. Coming from someone who was stuck too long in a company in which publishing in the open domain was a big no-no. The H-index is one number, but it is not _the_ number. Neither are Google’s variations, and so forth. For example, IQ is another number; it has its uses, but it clearly isn’t _the_ number either. Me not like metrics so much.

  8. Pingback: Friday Coffee Break « Nothing in biology makes sense!

  9. Katy says:

    Thanks for the helpful discussion. I just googled “what is a good h-index” and yours was the first thing to come up. I think as with any single statistic it has limitations, but overall is a decent reflection of output, especially for comparison with similar applicants for a position.

I’d convert your m-index to H-index/(FTE years working since first paper), which would take account of breaks/part-time working. This would particularly help women remain competitive in the context of extended periods of part-time working. So my 10 years since 1st paper would turn into 10-1-(5*0.6) = 6, so my m-index is 2 = 12/6 instead of 12/10. Woo!

    Anyway I don’t think anything will stop me checking my citations obsessively and google citations is the easiest place I’ve found to keep my publications organised.

    • abateman says:

I think that is a good suggestion. It is important to take account of career breaks when judging people's scientific output. It's not perfect to just subtract the break length or some combination of time: even during a career break your pre-break papers will still be cited, potentially increasing your H-index. But to a first approximation what you suggest makes good sense. It would be interesting to look at the H-trajectories of people who have taken a career break to see how it affects the growth of their H-index.

  10.

I like the m-value, but it has the unfortunate effect of penalising the early starter. For example, someone who publishes a paper from their Honours thesis may be penalised by 3-4 years in the denominator producing their m-value, when compared with someone who starts publishing in the third or fourth year of their PhD. So I would take Katy's idea further, and use FTE as THE denominator when calculating m, instead of years since first paper. This could include time spent as a PhD student, or not, as long as it was standardised.

    • abateman says:

Yes, that is a good point. Publishing a paper during your degree should be seen as a strong positive indicator, in my opinion, and, as you say, should not penalise the person. OK, so let's use years of FTE employment as the denominator.

  11. Suman Ghosh says:

One problem with the H-index or m-value can be how many people are actively involved in research in a particular field. For example, there are only ~186 laboratories in the whole world working on my previous field, Candida albicans. But right now I am working in cancer biology. A huge number of people are working in that field, hence the h-index will increase dramatically.

  12. abateman says:

It is best to only use the H-index for comparing people within the same field. I'm not sure that moving field is any guarantee of an increasing H-index, but it will be easier for your H-index to grow in the larger field. I guess the smart thing to do is to start in cancer biology then move to the specialist field ;-)

  13. Kingkong says:

I don’t like the h-index when it is used to rank journals, as it basically gives a statistic about the best papers in that journal. For example, if Nature has a 5-year h-index of 300, it only says something about those 300 papers and nothing about the thousands of other papers they published. Because of that, PLOS ONE has a very high h-index, I think ranked top 30, but that just reflects the number of elite papers being published there, not the tens of thousands of junk papers it publishes.

My problem with the h-index for individuals is that it does not differentiate first-author papers from contributing-author papers. A tech could be put on 50 high-impact papers over 5-6 years because he is in a super-high-impact lab making technical contributions. However, a post-doc in such a lab would have far fewer papers because he would be focusing on producing first-author papers. In the end, though, the tech would have a higher h-index. Is that a fair assessment? Also, that tech could be a post-doc in name but doing tech work. Would such a technician post-doc be at an advantage with employers who only look at h-index?

    • deepak says:

There should be a metric which weighs the author position. Typically the first author does all the work. So first authorship and final authorship should have higher weightage compared to other authorships. Personally, I think names appearing after the third author and before the final author should not have any weightage.

  14. Kwame says:

The extreme scenarios given to discount the H-index can be absurd:
    a) A tech having 50 papers! I do not know a tech who is put on 50 papers in 5 years in any lab. If such a tech exists, then he/she is a superstar tech and needs to be celebrated.
    b) Why penalise someone who publishes during their PhD with an m-index? Well, you forget that if someone publishes early, their H-index will increase because their papers start collecting citations early. So even if the denominator is increased by a few years, isn't the numerator also increased?
    c) In chemistry, there was a table of the top 500, based on H-index. All on that table were superstars by other metrics, and all the recent Nobel prize winners were on that list. There was not a single name on that list who was not famous.
    I agree that one cannot use the h-index to differentiate between an h of, say, 15 and 20. But if someone has an h of, say, 60 and another has 30, there is usually night and day between them.
    The h is here to stay.

  15. RobJ says:

None of the above addresses two key weaknesses of the h-index: self-citation and citation rings.

If you work in large collaborations and projects it is *easy* for *many* people to rack up large numbers of citations (and h-indices) by citing each other's papers and by simply appearing on lots of papers for which they have done little work. At the very least I believe citations should be a conserved quantity: one citation is one citation, and if it is to a paper with 100 authors then it should not add 1 citation to *each* of those authors' records; it should add 0.01 (or some other agreed fraction dictated by author order, such that the fractions sum to 1).

    Then, self-citations, both in the form of you citing your own paper, or any papers upon which a co-author appears citing that paper should not count.

    This would cut many h-indices down to size and be a much truer reflection of an individual’s contribution.

    What’s your normalised (by number of authors) h-index, excluding self-citations?

  16. Pingback: Reporte Ciencia UANL » Why I love the H-index

  17. Kwhak says:

Our lab publishes over 40 papers a year and has three techs contributing to almost every paper for technical work. They have a higher h than the post-docs.

But what is worse are PIs who do no work and don’t even read the paper but are still on the author list… apparently a common occurrence in high-energy physics consortiums.

  18. Pingback: Four great reasons to stop caring so much about the h-index | Impactstory blog


  20. Pingback: Blast Off! | Quantixed

  21. Sampath Parthasarathy says:

    1. Take out self citations
    2. Take out review article citations
    3. Have a negative field (topic) correction factor.
    4. Have a negative correction factor for study section member, journal editor etc.
5. Have a name and country correction factor.

    Then let us compete…

  22. Constantin says:

I strongly agree about taking out citations for review articles. They totally distort the evaluation of a scientist's worth. Reviews are cited much more highly than original articles and contribute zero to the advancement of science by their authors. Less strongly, I also agree about self-citations, because it is so hard to distinguish between genuine ones and irrelevant self-serving ones.

    As both journal editor and study section member, I can assure you that neither capacity does anything to citations. I can’t think of anyone gratuitously citing my papers so that they can get preferential treatment. This is preposterous.

Finally, topic and country corrections are much more meaningful to apply to the final use of the h-factor than to its calculation. Whether that use is promotion, tenure, a new appointment or funding, you are competing against your compatriot colleagues in the same field. Across countries, most responsible decision makers will apply a correction factor. When looking at post-doc applicants, I would rather take someone from, e.g., India with h=3 than from the US with h=5.
