This week I am incredibly fortunate to be attending the Knight Science Journalism’s Medical Evidence Boot Camp. Today was Day One, led by Jennifer Croswell, MD, MPH, of the Agency for Healthcare Research and Quality, and it was thoroughly compelling: breaking down, piece by piece, how medical studies are designed, how risk is quantified (for example, the very critical difference between absolute risk and relative risk), what various statistics mean, and how clinical guidelines are created.
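To make the absolute-versus-relative-risk distinction concrete, here is a tiny Python sketch with entirely made-up numbers (a drug that cuts an event rate from 2% to 1%); none of these figures come from the boot camp itself:

```python
# Hypothetical illustration: a treatment cuts the event rate
# from 2% in the control group to 1% in the treated group.
control_risk = 0.02
treated_risk = 0.01

# Absolute risk reduction: the raw difference, 1 percentage point.
absolute_risk_reduction = control_risk - treated_risk

# Relative risk reduction: the very same change, but it sounds far
# more dramatic when reported as "50% lower risk."
relative_risk_reduction = absolute_risk_reduction / control_risk

# Number needed to treat: how many patients must be treated
# to prevent one event.
number_needed_to_treat = 1 / absolute_risk_reduction
```

The same trial can honestly be described as "halves your risk" or as "helps one patient in a hundred," which is exactly why the distinction matters.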

I would like to put all of it down in a blog post because the information is so critical to understanding medicine and being able to parse the constant influx of results and associated clamor for our attention. But alas, as is so often the case, the spirit is willing but the flesh is weak. In other words, it’s almost time for dinner. So, in lieu of a full recall, here’s a tidbit that I loved (among the many tidbits that I loved equally as much).

The *P* value. The statistic to which we’ve trained our eyes to run when presented with an abstract. Here is something I didn’t know: what the *P* stands for. Simply, probability. Speaking of probability, probably many of you already knew this. But I didn’t. I always thought, “*P* means statistically significant” but somehow never managed to connect *P* with probability. A small matter, but still.

Also, .05, you know, the number that we always consider to be the barometer of significance? It’s an arbitrary number, an agreed-upon cutoff for how unlikely the observed result would have to be, if chance alone were at work, before we call it significant.

And, here is my favorite gem: a lower *P* value does not mean that the findings are more dramatic. Here is where I risk looking foolish! I always thought that if the *P* value was really, really, really low – like .00000001 – that meant the study findings were somehow better than if the *P* value was simply .04. Nope! It doesn’t mean that at all. Both of those *P* values tell you the same thing: that the observed effect would have been unlikely to arise by chance alone. (As an example of how *P* values can be misused, the excellent Jennifer Croswell showed us an advertisement in which a company touted a *P* value of .000009 for its recent study.)

Lastly, *P* values are NOT a measure of the quality of the study. A study could have been put together in a terrible way, and the *P* value can still be statistically significant. *P* values are not a mark of trial quality, and they are not a measure of the importance of the finding. Croswell gave us the imaginary (I think it was made up…) example of a drug that lowers fever in children by .1 degree. The result was statistically significant, but is lowering a fever from 103.5 to 103.4 meaningful? No.
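To see how a trivially small effect can still produce an impressively small *P* value, here is a rough Python sketch of the fever scenario using a standard two-sample z-test; the group sizes and standard deviation are my own invented assumptions, not Croswell’s:

```python
import math

def two_sided_p_from_means(mean_a, mean_b, sd, n_per_group):
    """Two-sided p-value for a difference in group means, assuming a
    known common standard deviation and equal group sizes (z-test)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = (mean_a - mean_b) / standard_error
    return math.erfc(abs(z) / math.sqrt(2.0))

# Made-up numbers echoing the fever example: a 0.1-degree drop,
# measured in two very large groups with modest spread.
p = two_sided_p_from_means(103.5, 103.4, sd=0.5, n_per_group=10_000)
# p comes out astronomically small, yet a 0.1-degree drop is
# clinically meaningless.
```

With enough patients, almost any nonzero difference becomes "significant"; the *P* value says nothing about whether the difference is worth caring about.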

So, hopefully more to come this week, but that’s a little bit from today’s incredible wealth of information and the accompanying rich and valuable discussion.

“P values are NOT a measure of the quality of the study.”

Amen. You can come up with a beautifully low P-value, but if you assume the wrong underlying model, then your test statistic is wrong, and hence so is the P-value. Or you may have a very small sample size that doesn’t capture the full variation in the population. That leads to a wrong error model and hence a wrong P-value.

Lastly, a P-value of 0.05 means that, if there were truly no association and you repeated the experiment 100 times, you would expect a result at least this extreme about 5 times by pure chance. So, think about it: if 100 labs across the world run the same experiment on a nonexistent effect, about 5 will get a P-value of 0.05 or less just by pure chance. And those will be the results that will likely get published, whereas the others won’t. Sorry for the ramble; I posted about this last week, hence I have it fresh in my head.
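The hundred-labs point can be simulated in a few lines of Python; this is only a sketch, assuming each lab runs a simple z-test on data with no real effect at all:

```python
import math
import random

random.seed(1)  # reproducible sketch

def null_experiment_p(n_obs):
    """One lab's experiment when there is truly no effect: draw n_obs
    standard-normal measurements and test whether the mean is zero."""
    xs = [random.gauss(0.0, 1.0) for _ in range(n_obs)]
    z = (sum(xs) / n_obs) * math.sqrt(n_obs)   # SE of the mean is 1/sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value

labs = 1000
false_positives = sum(null_experiment_p(50) <= 0.05 for _ in range(labs))
# Roughly 5% of the labs reach "significance" even though
# there was nothing to find.
```

If only the "significant" 5% get written up, the published record is a collection of flukes, which is the publication-bias worry in a nutshell.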

For your further edification, a p-value is not a statistic in the true sense. It’s a “statistic” in the sense of being a number that is often reported along with research results, but a statistic is actually the number that the p-value assesses the probability of. Examples of statistics are t, z and chi-squared.

A statistic is a quantity calculated from a given sample. Mathematicians have computed many distributions (normal, chi-square, t, etc.). If you can assume your sample is such that your calculated statistic follows one of the known distributions, then you can compute the P-value from that distribution (as in: if I were to draw a value from that distribution, how often would I see something as extreme as my computed statistic just by chance?).
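As a minimal sketch of that last step, here is how a two-sided P-value falls out of the standard normal distribution in Python, assuming the z statistic really does follow that distribution:

```python
import math

def p_from_z(z):
    """Two-sided p-value for a standard-normal test statistic: the
    probability of landing at least this far from zero under the null."""
    return math.erfc(abs(z) / math.sqrt(2.0))

p_from_z(1.96)  # close to the conventional .05 cutoff
p_from_z(0.0)   # a perfectly average statistic: p = 1
```

The familiar 1.96 threshold is just the point where this tail probability crosses .05; swap in a t or chi-square distribution and the same recipe applies.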

The problem is exactly in the key question: does the computed statistic follow distribution X or not? Because if it doesn’t, the P-value is meaningless, no matter how small.

Thank you for these comments and additional insights! -Jessica
