To mark evolutionary biologist Megan Head’s May 20 appearance in the PLOS Science Wednesday “Ask Me Anything” (AMA) series on Reddit’s r/science, we are re-posting this April 9 interview with the author, conducted by PLOS Neuro Contributing Editor Neuroskeptic. In it, Head discusses the recent PLOS Biology research article “The Extent and Consequences of P-Hacking in Science,” on which she is lead author. Published on March 13, 2015, this paper has clearly struck a chord in the scientific community, having received over 40,000 views.
Update 5/20/15 12:00 pm… A highlight from today’s PLOS Science Wednesday AMA with Megan Head:
Question from ethanjf99: Could you provide a quick “elevator pitch” explanation that I could give to a (reasonably educated) layman (say an undergrad degree in a non-science field, or better) as to
(a) what p-hacking is
(b) why this is an issue
(c) why a non-scientist cares
(d) what questions to ask when reading, say, the latest breathless general media piece (biologists find that compound X causes cancer in lab rats) to help become a skeptical, informed reader of general science news?
Answer from Megan Head:
a) P-hacking is when researchers analyse their results in multiple ways or multiple times until they get their desired result.
b) This is an issue because it can make us think that an effect or relationship is more important than it really is.
c) This could lead to policies or recommendations based on false results. For instance, if there have been a lot of studies showing that a particular drug has no effect on a particular disease, p-hacked data might make it look like the drug helps to prevent the disease when in fact it doesn’t. Doctors would then be prescribing medicines that don’t actually work.
d) Trust research that has been replicated lots of times more than one-off studies.
To read a full transcript of the completed PLOS r/science AMA with Megan Head, click here.
Neuroskeptic In-Depth with Megan Head
By Neuroskeptic (originally posted on April 9, 2015 on PLOS Neuro Community)
I spoke with Megan Head of the Division of Evolution, Ecology and Genetics at the Australian National University in Canberra, Australia, about her PLOS Biology research article, “The Extent and Consequences of P-Hacking in Science.” Head and colleagues used automatic text parsing (text mining) to extract published p-values from the Results and Abstract sections of every open access paper in the PubMed database. Head et al. found an excess of p-values just below 0.05, the threshold that conventionally denotes statistical significance. This implies that p-hacking or other biases are acting to favour the publication of significant results.
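For readers who want to see what an “excess just below 0.05” check could look like in practice, here is a minimal sketch in Python. It assumes one simple approach: count the p-values falling in two adjacent narrow bins below the threshold and ask, with a one-sided binomial test, whether the bin nearer 0.05 holds more than half of them. The bin edges, the function names and the sample data are illustrative assumptions, not a reproduction of the paper’s analysis.

```python
from math import comb


def upper_tail_fair_binomial(successes: int, trials: int) -> float:
    """P(X >= successes) for X ~ Binomial(trials, 0.5), using exact integers."""
    favourable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favourable / 2**trials


def excess_just_below_threshold(p_values, lower=0.04, middle=0.045, upper=0.05):
    """Compare counts in (lower, middle] and (middle, upper).

    The bin edges are assumptions chosen for illustration.
    Returns (count in lower bin, count in upper bin, one-sided p-value).
    """
    lower_bin = sum(lower < p <= middle for p in p_values)
    upper_bin = sum(middle < p < upper for p in p_values)
    total = lower_bin + upper_bin
    if total == 0:
        return lower_bin, upper_bin, 1.0
    return lower_bin, upper_bin, upper_tail_fair_binomial(upper_bin, total)


# Made-up p-values, purely to show how the function is called.
example = [0.041, 0.042, 0.046, 0.047, 0.048, 0.049, 0.0495, 0.03, 0.2]
print(excess_just_below_threshold(example))
```

A small one-sided p-value here would suggest that results cluster just under the threshold more than chance alone would produce, although on its own it cannot say which practice caused the clustering.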
NS: In your study, you used text mining to automatically extract p-values from scientific publications. Previous studies have used the same approach – how does your method differ from the ones that went before?
MH: Our method is different from previous work that uses text-mining to look at p-hacking in that it extracts data from the full text (results section) of articles rather than just abstracts.
This is important for studies of p-hacking, because the p-values presented in the abstract may not be representative of p-values presented more generally.
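To give a rough idea of what extracting p-values from article text involves, here is a small Python sketch based on a regular expression. The pattern, the function name and the sample sentence are assumptions made for illustration; the actual text-mining pipeline used in the study is not described in this interview and was certainly more thorough.

```python
import re

# Hypothetical pattern: the letter "p", a comparison operator, then a decimal.
P_VALUE_PATTERN = re.compile(
    r"\bp\s*(?P<operator><=|>=|<|>|=|≤|≥)\s*(?P<value>\d*\.\d+)",
    re.IGNORECASE,
)


def extract_p_values(text: str):
    """Return (operator, value) pairs for reported p-values found in text."""
    results = []
    for match in P_VALUE_PATTERN.finditer(text):
        value = float(match.group("value"))
        # A p-value must lie between 0 and 1; discard anything else.
        if 0.0 <= value <= 1.0:
            results.append((match.group("operator"), value))
    return results


sample = (
    "Treatment increased growth (p = 0.032) but not survival "
    "(P < .001; n = 40). The fitted exp = 0.7."
)
print(extract_p_values(sample))
```

Running the same function over the Results section and over the Abstract of one paper would show whether the two sets of reported values differ, which is the concern raised in the answer above.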
NS: You found that “p-hacking is widespread throughout science” but also that “its effect seems to be weak relative to the real effect sizes being measured.” Were you surprised by either of these results?
MH: I wasn’t surprised that it was widespread. Evidence for p-hacking has been found previously in specific disciplines, and I thought there would be no reason why those disciplines would be special. Initially I was surprised that the effect of p-hacking was weak, but on reflection this makes sense. In my discipline – Evolutionary Biology – data used in meta-analyses often include data from studies where that data did not form part of the primary hypothesis. This kind of data is less likely to be hacked than data relating to the primary hypothesis of a paper. Further, I suspect that in Evolutionary Biology most p-hacking occurs for p-values very close to the significance threshold, so in reality effect sizes aren’t being altered drastically by p-hacking.
You looked at all of the Open Access papers in the PubMed database. Do you expect the situation to be any different for non-Open Access papers?
It’s hard to say. There is a lot of debate about quality control in open access journals versus the prestige of some non-open access journals; both of these factors might affect the extent of p-hacking. But actually I suspect that the result would be the same, because papers are often sent to multiple journals before they are eventually published.
When collecting data for this paper we had thought that it would be interesting to compare open access and non-open access journals; however, our inability to easily obtain data from non-open access journals made this impossible. Being able to obtain this kind of data is another advantage of open access that is often neglected.
You’re an evolutionary biologist – what led you to decide to do a study of p-hacking?
I had read lots of papers from the psychology/neuroscience literature on p-hacking and was interested to see how widespread it was and how bad the problem was in my field. The bias created by p-hacking could potentially inhibit scientific progress, so this is an important question for any field. Science is the best method we have for finding out how the world works. But we still need to be critical of our methods to ensure unbiased and rigorous results.
In your opinion, what would be the single best way to reduce or mitigate p-hacking?
I think the best way to reduce p-hacking is to educate researchers on the way that common practices may create bias. Many researchers don’t realise that the methods they employ lead to p-hacking. For example, often when researchers are asked what their sample size will be, they reply something like “I’ll do a certain number of samples and check the results to see if I need to do more.” What they mean is that if their results are significant they’ll stop, and if they are close to significant they’ll do more – this is one form of p-hacking.
I hope that our study helps to promote the issues surrounding p-hacking and also highlights that this is an issue that all disciplines should be concerned with.
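The “check the results, then decide whether to collect more” habit described in this answer can be simulated in a few lines of Python. The sketch below is an illustration under stated assumptions, not anything taken from the study: every simulated experiment has no real effect, data are drawn from a normal distribution with known unit variance, a simple z-test is used, and the researcher keeps adding samples whenever the result is not yet significant (a simplification of “close to significant”). All names and settings are hypothetical.

```python
import math
import random


def two_sided_p_value(sample):
    """Two-sided z-test p-value for a true mean of 0, assuming unit variance."""
    z = sum(sample) / math.sqrt(len(sample))
    return math.erfc(abs(z) / math.sqrt(2))


def fixed_sample_study(rng, n=50, alpha=0.05):
    """Collect all n samples first, then test once."""
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return two_sided_p_value(sample) < alpha


def optional_stopping_study(rng, start=10, step=5, max_n=50, alpha=0.05):
    """Test after every batch and stop as soon as the result is significant."""
    sample = [rng.gauss(0.0, 1.0) for _ in range(start)]
    while True:
        if two_sided_p_value(sample) < alpha:
            return True
        if len(sample) >= max_n:
            return False
        sample.extend(rng.gauss(0.0, 1.0) for _ in range(step))


def false_positive_rate(study, n_studies=2000, seed=1):
    """Share of simulated null experiments that report a significant result."""
    rng = random.Random(seed)
    return sum(study(rng) for _ in range(n_studies)) / n_studies


print("fixed sample size:", false_positive_rate(fixed_sample_study))
print("optional stopping:", false_positive_rate(optional_stopping_study))
```

Because there is no true effect in any simulated experiment, every significant result is a false positive. The fixed design should stay near the nominal 5% rate, while repeated peeking gives the data several chances to cross the threshold and so pushes the rate higher.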
Are issues around questionable research practices being much discussed in your field of evolutionary biology?
Our group has started doing work on this, but in general, no, it is not really discussed. I think most evolutionary biologists either think that this is a problem with other disciplines or accept that there are problems but think nothing can be done about it.
You point out that “Many researchers don’t realise that the methods they employ lead to p-hacking.” Who do you think bears the responsibility for educating people about this? Is it a matter for statistics lecturers, or is it a broader issue?
We should certainly be teaching students about questionable research practices like p-hacking early on. However, I wouldn’t put sole responsibility on statistics lecturers; doing that could lead to a long lag in better practice while students move through the ranks. There are plenty of opportunities to have these conversations, for example when advising on experimental design and analyses, when reading draft manuscripts, when discussing journal articles and when reviewing papers.
As you point out, these issues are often discussed in the context of psychology, but the same problems crop up widely – seemingly in almost every field that uses p-values. What do you make of the argument that the best way to stop p-hacking is to stop using p-values altogether?
I’m not convinced that ditching the p-value is the answer to the problem. Studies have shown that hacking can also occur with other metrics that are used to indicate the importance of findings – for example, effect sizes.
I think researchers need to have a better understanding of what the p-value represents and combine that information with information from other metrics presented. They also need to acknowledge the potential for bias and employ practices that reduce this bias in their research. I think taking arbitrary thresholds too seriously is also a mistake.
Neuroskeptic is a British neuroscientist who takes a skeptical look at his own field, and beyond. On Twitter: @Neuro_Skeptic
The views expressed in this post belong to the author and his interviewee and are not necessarily those of PLOS.