I am delighted to offer Mind the Brain readers a guest blog written by Keith Humphreys, Ph.D., John Finney, Ph.D., Alex Sox-Harris, Ph.D., and Daniel Kivlahan, Ph.D. Drs. Humphreys, Sox-Harris, and Finney are at the Palo Alto VA and Stanford University. Dr. Kivlahan is at the Seattle VA and the University of Washington.
Follow Professor Humphreys on Twitter @KeithNHumphreys.
Image Credit: Bogdan, Wikimedia Commons
A team of scientists recently reported that states with laws permitting medical marijuana had lower rates of opioid overdose than states without such laws. In a New York Times essay, two members of the team suggested this state-level association between medical marijuana access and deaths reflects the behavior of individuals in pain:
If enough people opt to treat pain with medical marijuana instead of prescription painkillers in states where this is legal, it stands to reason that states with medical marijuana laws might experience an overall decrease in opioid painkiller overdoses and deaths.
At first blush, saying it “stands to reason” seems, well, reasonable. But in the current issue of the journal in which the study appeared, we point out that the assumption that associations based on aggregations of people (e.g., counties, cities and states) must reflect parallel relationships for individuals is a seductive logical error known as the “ecological fallacy.”
Once you understand the ecological fallacy, you will recognize it in many interpretations of and media reports about science. Here are some examples that have been reported over the years:
Such differences are counter-intuitive and therefore a bit baffling. If individuals having heart attacks who receive high quality care are far more likely to survive, doesn’t it follow that hospitals that provide higher quality care to larger percentages of their heart attack patients would have substantially lower mortality rates? (Answer: No, their results are barely better). Why don’t patterns we see in the aggregate always replicate themselves with individuals, and vice versa?
The mathematical basis for the ecological fallacy has multiple and complex aspects (our detailed explanation here), but most people find it easiest to understand when presented with a simple example. Imagine two states with 100 people each residing in them, with each state population including a comparable proportion of people in pain. Potsylvania has a loosely regulated medical marijuana system that 25% of residents access. Alabstentia, in contrast, limits access to medical marijuana so only 15% of residents can obtain it.
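| State | Outcome | Medical Marijuana User | Medical Marijuana Non-User |
|---|---|---|---|
| Potsylvania | Died of Opioid Overdose | 2 | 3 |
| Potsylvania | Did Not Die of Overdose | 23 | 72 |
| Alabstentia | Died of Opioid Overdose | 4 | 6 |
| Alabstentia | Did Not Die of Overdose | 11 | 79 |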
Ganja-loving Potsylvania has a lower opioid overdose death rate (5%) than more temperate Alabstentia (10%). Does this prove that individuals in those states who use medical marijuana lower their risk of opioid overdose death? Nope. In both states, medical marijuana users are more likely to die of a pain medication overdose than are non-users: in Potsylvania, 2 of 25 marijuana users (8%) die versus 3 of 75 non-users (4%); in Alabstentia, 4 of 15 marijuana users (26.7%) die versus 6 of 85 non-users (7.1%)!
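For readers who like to check the arithmetic themselves, here is a minimal sketch in Python that recomputes the rates from the counts given for the two imaginary states. The function and variable names are ours, chosen only for illustration.

```python
# Counts taken from the Potsylvania/Alabstentia example in the text.
# Each entry is (overdose deaths, number of people in the group).
states = {
    "Potsylvania": {"users": (2, 25), "non_users": (3, 75)},
    "Alabstentia": {"users": (4, 15), "non_users": (6, 85)},
}


def rate(deaths, total):
    """Overdose deaths as a percentage of the group."""
    return 100.0 * deaths / total


def summarize(groups):
    """Return (state-level rate, user rate, non-user rate) in percent."""
    u_deaths, u_total = groups["users"]
    n_deaths, n_total = groups["non_users"]
    overall = rate(u_deaths + n_deaths, u_total + n_total)
    return overall, rate(u_deaths, u_total), rate(n_deaths, n_total)


results = {name: summarize(groups) for name, groups in states.items()}

for name, (overall, users, non_users) in results.items():
    print(f"{name}: state {overall:.1f}%, users {users:.1f}%, "
          f"non-users {non_users:.1f}%")
```

The state with more marijuana users has the lower overall death rate, yet within each state the users die at a higher rate than the non-users. The aggregate comparison and the individual comparison point in opposite directions.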
Embracing the ecological fallacy is tempting, even to very bright people, but it must be resisted if we want to better understand the world around us. So, the next time you see a study saying, for example, that politically conservative states have higher rates of searching for sex and pornography online and want to immediately speculate about why conservative individuals are so hypocritical, pause and remember that what applies at the aggregate level does not necessarily apply to individuals. For all we know, alienated liberals in red states may just be feeling lonely and frustrated.
As described in the last issue of Mind the Brain, peaceful post-publication peer reviewers (PPPRs) were ambushed by an author and an editor. They used the usual home team advantages that journals have – they had the last word in an exchange that was not peer-reviewed.
As also promised, I will team up in this issue with Magneto to bust them.
Attacks on PPPRs threaten a desperately needed effort to clean up the integrity of the published literature.
In a later issue of Mind the Brain, I will describe an incident in which authors of a published paper had uploaded their data set, but then modified it without notice after PPPRs used the data for re-analyses. The authors then used the modified data for new analyses and then claimed the PPPRs were grossly mistaken. Fortunately, the PPPRs retained time stamped copies of both data sets. You may like to think that such precautions are unnecessary, but just imagine what critics of PPPR would be saying if they had not saved this evidence.
Until journals get more supportive of post-publication peer review, we need repeated vigilante actions, striking from Twitter, Facebook pages, and blogs. Unless readers acquire basic critical appraisal skills and take the time to apply them, they will have to keep turning to social media for credible filters of all the crap that is flooding the scientific literature.
I’ve enlisted Magneto because he is a mutant. He does not have any extraordinary powers of critical appraisal. To the contrary, he unflinchingly applies what we should all acquire. As a mutant, he can apply his critical appraisal skills without the mental anguish and physiological damage that could beset humans appreciating just how bad the literature really is. He doesn’t need to maintain his faith in the scientific literature or the dubious assumption that what he is seeing is just a matter of repeat offender authors, editors, and journals making innocent mistakes.
Humans with critical appraisal skills risk demoralization and too often shrink from the task of telling it like it is. Some who used their skills too often were devastated by what they found and fled academia. More than a few are now working in California in espresso bars and escort services.
How could I maintain the pretense of scholarly discourse when I am dealing with an author who repeatedly violates basic conventions like ensuring tables and figures correspond to what is claimed in the abstract? Or an arrogant editor who responds so nastily when his slipups are gently brought to his attention and won’t fix the mess he is presenting to his readership?
As a mere human, I needed all the help I could get in keeping my bearings amidst such overwhelming evidence of authorial and editorial ineptness. A little Shakespeare and Monty Python helped.
The statistical editor for this journal is a saucy full-gorged apple-john.
Cognitive Behavioral Techniques for Psychosis: A Biostatistician’s Perspective
Domenic V. Cicchetti, PhD, quintessential biostatistician
Domenic V. Cicchetti, You may be, as your website claims
A psychological methodologist and research collaborator who has made numerous biostatistical contributions to the development of major clinical instruments in behavioral science and medicine, as well as the application of state-of-the-art techniques for assessing their psychometric properties.
But you must have been out of “the quintessential role of the research biostatistician” when you drafted your editorial. Please reread it. Anyone armed with an undergraduate education in psychology and Google Scholar can readily cut through your ridiculous pomposity, you undisciplined sliver of wild belly-button fluff.
You make it sound like the Internet PPPRs misunderstood Jacob Cohen’s designation of effect sizes as small, medium, and large. But if you read a much-accessed article that one of them wrote, you will find a clear exposition of the problems with these arbitrary distinctions. I know, it is in an open access journal, but what you say is sheer bollocks about it paying reviewers. Do you get paid by Journal of Nervous and Mental Disease? Why otherwise would you be a statistical editor for a journal with such low standards? Surely, someone who has made “numerous biostatistical contributions” has better things to do, thou dissembling swag-bellied pignut.
More importantly, you ignore that Jacob Cohen himself said
The terms ‘small’, ‘medium’, and ‘large’ are relative . . . to each other . . . the definitions are arbitrary . . . these proposed conventions were set forth throughout with much diffidence, qualifications, and invitations not to employ them if possible.
Cohen J. Statistical power analysis for the behavioral sciences. Second edition, 1988. Hillsdale, NJ: Lawrence Erlbaum Associates. p. 532.
Could it be any clearer, Dommie?
You suggest that the internet PPPRs were disrespectful of Queen Mother Kraemer in not citing her work. Have you recently read it? Ask her yourself, but she seems quite upset about the practice of using effects generated from feasibility studies to estimate what would be obtained in an adequately powered randomized trial.
Pilot studies cannot estimate the effect size with sufficient accuracy to serve as a basis of decision making as to whether a subsequent study should or should not be funded or as a basis of power computation for that study.
Okay you missed that, but how about:
A pilot study can be used to evaluate the feasibility of recruitment, randomization, retention, assessment procedures, new methods, and implementation of the novel intervention. A pilot study is not a hypothesis testing study. Safety, efficacy and effectiveness are not evaluated in a pilot. Contrary to tradition, a pilot study does not provide a meaningful effect size estimate for planning subsequent studies due to the imprecision inherent in data from small samples. Feasibility results do not necessarily generalize beyond the inclusion and exclusion criteria of the pilot design.
A pilot study is a requisite initial step in exploring a novel intervention or an innovative application of an intervention. Pilot results can inform feasibility and identify modifications needed in the design of a larger, ensuing hypothesis testing study. Investigators should be forthright in stating these objectives of a pilot study.
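The imprecision described in these passages is easy to see in a simulation. What follows is a minimal sketch in Python, not anything taken from the papers under discussion. It assumes a true between-group effect of d = 0.2, normally distributed outcomes, 10 patients per arm for the "pilot" and 200 per arm for the "full" trial. All of these numbers are illustrative assumptions.

```python
import random
import statistics


def cohens_d(treated, control):
    """Between-group standardized mean difference using a pooled SD."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * statistics.variance(treated)
                  + (n2 - 1) * statistics.variance(control)) / (n1 + n2 - 2)
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_var ** 0.5


def simulate_estimates(n_per_arm, true_d, n_studies, seed=1):
    """Estimated d from many simulated two-arm studies of a given size."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimates = []
    for _ in range(n_studies):
        treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_arm)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        estimates.append(cohens_d(treated, control))
    return estimates


# Assumed sizes: 10 per arm for a pilot, 200 per arm for a full trial.
pilot = simulate_estimates(n_per_arm=10, true_d=0.2, n_studies=2000)
full = simulate_estimates(n_per_arm=200, true_d=0.2, n_studies=2000)

pilot_spread = statistics.stdev(pilot)
full_spread = statistics.stdev(full)

print(f"pilot: estimates range {min(pilot):.2f} to {max(pilot):.2f}, "
      f"SD {pilot_spread:.2f}")
print(f"full:  estimates range {min(full):.2f} to {max(full):.2f}, "
      f"SD {full_spread:.2f}")
```

With the same small true effect, the simulated pilots scatter from clearly negative to apparently "large" estimates, while the larger trials cluster near the true value. Picking one pilot estimate as the basis for a power calculation is a lottery.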
Dommie, although you never mention it, surely you must appreciate the difference between a within-group effect size and a between-group effect size.
Interventions do not have meaningful effect sizes, between-group comparisons do.
As I have previously pointed out
When you calculate a conventional between-group effect size, it takes advantage of randomization and controls for background factors, like placebo or nonspecific effects. So, you focus on what change went on in a particular therapy, relative to what occurred in patients who didn’t receive it.
Turkington recruited a small, convenience sample of older patients from community care who averaged over 20 years of treatment. It is likely that they were not getting much support and attention anymore, whether or not they ever were. The intervention in Turkington’s study provided that attention. Maybe some or all of any effects were due to simply compensating for what was missing from inadequate routine care. So, aside from all the other problems, anything going on in Turkington’s study could have been nonspecific.
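To make the within-group versus between-group distinction concrete, here is a small Python sketch using made-up symptom scores (lower is better). The data are hypothetical and are not from Turkington’s study or any other. Dividing the pre-to-post change by the baseline SD is only one common convention for a within-group effect size.

```python
import statistics

# Hypothetical symptom scores (lower is better); not data from any study.
treated_pre = [30, 28, 35, 32, 27, 31, 29, 33]
treated_post = [24, 23, 29, 27, 22, 25, 24, 28]
control_pre = [31, 27, 34, 33, 28, 30, 29, 32]
control_post = [26, 23, 29, 29, 23, 26, 24, 28]


def pooled_sd(a, b):
    """Pooled standard deviation of two independent groups."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return pooled_var ** 0.5


def within_group_d(pre, post):
    """Pre-to-post improvement divided by the baseline SD."""
    return (statistics.mean(pre) - statistics.mean(post)) / statistics.stdev(pre)


def between_group_d(treated, control):
    """Control minus treated at follow-up over pooled SD; positive favors treatment."""
    return (statistics.mean(control) - statistics.mean(treated)) / pooled_sd(treated, control)


d_within_treated = within_group_d(treated_pre, treated_post)
d_within_control = within_group_d(control_pre, control_post)
d_between = between_group_d(treated_post, control_post)

print(f"within-group d, treated: {d_within_treated:.2f}")
print(f"within-group d, control: {d_within_control:.2f}")
print(f"between-group d at follow-up: {d_between:.2f}")
```

Both arms improve a great deal over time in this toy example, so each within-group effect size looks impressive. The between-group effect size, which is the only one that credits the treatment rather than attention, time, or regression to the mean, is far smaller.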
Recall that in promoting his ideas that antidepressants are no better than acupuncture for depression, Irving Kirsch tried to pass off within-group as equivalent to between-group effect sizes, despite repeated criticisms. Similarly, long-term psychodynamic psychotherapists tried to use effect sizes from wretched case series for comparison with those obtained in well conducted studies of other psychotherapies. Perhaps you should send such folks a call for papers so that they can find an outlet in Journal of Nervous and Mental Disease with you as a Special Editor in your quintessential role as biostatistician.
Douglas Turkington’s call for a debate
Professor Douglas Turkington: “The effect size that got away was this big.”
Doug, as you requested, I sent you a link to my Google Scholar list of publications. But you still did not respond to my offer to come to Newcastle and debate you. Maybe you were not impressed. Nor did you respond to Keith Laws’ repeated requests to debate. Yet you insulted internet PPPR Tim Smits with the taunt,
[Image: Turkington’s email to Tim Smits]
You congealed accumulation of fresh cooking fat.
I recommend that you review the recording of the Maudsley debate. Note how the moderator Sir Robin Murray boldly announced at the beginning that the vote on the debate was rigged by your cronies.
Do you really think Laws and McKenna got their asses whipped? Then why didn’t you accept Laws’ offer to debate you at a British Psychological Society event, after he offered to pay your travel expenses?
High-Yield Cognitive Behavioral Techniques for Psychosis Delivered by Case Managers…
Dougie, we were alerted that bollocks would follow with the “high yield” of the title. Just what distinguishes this CBT approach from any other intervention to justify “high yield” except your marketing effort? Certainly, not the results you have obtained from an earlier trial, which we will get to.
Where do I begin? Can you dispute what I said to Dommie about the folly of estimating effect sizes for an adequately powered randomized trial from a pathetically small feasibility study?
I know you were looking for a convenience sample, but how did you get from Newcastle, England to rural Ohio and recruit such an unrepresentative sample of 40 year olds with 20 years of experience with mental health services? You don’t tell us much about them, not even a breakdown of their diagnoses. But would you really expect that the routine care they were currently receiving was even adequate? Sure, why wouldn’t you expect to improve upon that with your nurses? But what would you be demonstrating?
The PPPR boys from the internet made noise about Table 2, made passing reference to the totally nude Figure 5, and noted how claims in the abstract had no apparent relationship to what was presented in the results section, and how nowhere did you provide means or standard deviations. But they did not get to Figure 2. Notice anything strange?
Despite what you claim in the abstract, none of the outcomes appear significant. Did you really mean standard errors of the mean (SEMs), not standard deviations (SDs)? The people to whom I showed the figure did not think so.
If your goal is to emphasize small and unimportant differences in your data, show your error bars as SEM, and hope that your readers think they are SD.
If your goal is to cover up large differences, show the error bars as the standard deviations for the groups, and hope that your readers think they are standard errors.
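The reason the choice of error bar matters so much is that the standard error of the mean is just the standard deviation divided by the square root of the sample size. A minimal Python sketch, using hypothetical scores that are not taken from any study:

```python
import math
import statistics

# Hypothetical outcome scores for one group; not data from any study.
scores = [12, 15, 9, 20, 14, 11, 17, 13, 16, 13]

n = len(scores)
mean = statistics.mean(scores)
sd = statistics.stdev(scores)   # spread of the individual scores
sem = sd / math.sqrt(n)         # uncertainty about the group mean itself

print(f"n = {n}, mean = {mean:.2f}")
print(f"error bar drawn as SD:  {mean - sd:.2f} to {mean + sd:.2f}")
print(f"error bar drawn as SEM: {mean - sem:.2f} to {mean + sem:.2f}")
```

With 10 observations, the SEM bar is roughly a third the length of the SD bar. A reader who mistakes one for the other will badly misjudge either the variability among patients or the precision of the group mean.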
Why did you expect to be able to talk about effect sizes of the kind you claim you were seeking? The best meta-analysis suggests an effect size of only .17 with blind assessment of outcome. Did you expect that unblinding assessors would lead to that much more improvement? Oh yeh, you cited your own previous work in support:
That intervention improved overall symptoms, insight, and depression and had a significant benefit on negative symptoms at follow-up (Turkington et al., 2006).
Let’s look at Table 1 from Turkington et al., 2006.
A consistent spinning of results
Don’t you just love those three digit significance levels that allow us to see that p = .099 for overall symptoms meets the apparent criteria of p < .10 in this large sample? Clever, but it doesn’t work for depression with p = .128. But you have a track record of being sloppy with tables. Maybe we should give you the benefit of the doubt and ignore the table.
But Dougie, this is not some social priming experiment with college students getting course credit. This is a study that took up the time of patients with serious mental disorder. You left some of them in the squalor of inadequate routine care after gaining their consent with the prospect that they might get more attention from nurses. And then with great carelessness, you put the data into tables that had no relationship to the claims you were making in the abstract. Or in your attempts to get more funding for future such ineptitude. If you drove your car like you write up clinical trials, you’d lose your license, if not go to jail.
The 2014 Lancet study of cognitive therapy for patients with psychosis
Forgive me that I missed until Magneto reminded me that you were an author on the, ah, controversial paper
Morrison, A. P., Turkington, D., Pyle, M., Spencer, H., Brabban, A., Dunn, G., … & Hutton, P. (2014). Cognitive therapy for people with schizophrenia spectrum disorders not taking antipsychotic drugs: a single-blind randomised controlled trial. The Lancet, 383(9926), 1395-1403.
But with more authors than patients remaining in the intervention group at follow up, it is easy to lose track.
You and your co-authors made some wildly inaccurate claims about having shown that cognitive therapy was as effective as antipsychotics. Why, by the end of the trial, most of the patients remaining in follow up were on antipsychotic medication. Is that how you obtained your effectiveness?
But neither a retraction nor even a formal expression of concern has appeared.
Maybe matters can be left as they now are. In the social media, we can point to the many problems of the article like a clogged toilet warning that Journal of Nervous and Mental Disease is not a fit place to publish – unless you are seeking exceedingly inept or nonexistent editing and peer review.
Vigilantes can periodically tweet Tripadvisor style warnings, like
Now, Dommie and Dougie, before you again set upon some PPPRs just trying to do their jobs for little respect or incentive, consider what happened this time.
Special thanks are due to Magneto, but Jim Coyne has sole responsibility for the final content. It does not necessarily represent the views of PLOS blogs or other individuals or entities, human or mutant.
Some of those involved in the Twitter exchange banded together in writing a letter to the editor.
Smits, T., Lakens, D., Ritchie, S. J., & Laws, K. R. (2014). Statistical errors and omissions in a trial of cognitive behavior techniques for psychosis: commentary on Turkington et al. The Journal of Nervous and Mental Disease, 202(7), 566.
Lakens explained in his blog
Now I understand that getting criticism on your work is never fun. In my personal experience, it very often takes a dinner conversation with my wife before I’m convinced that if people took the effort to criticize my work, there must be something that can be improved. What I like about this commentary is that it shows how Twitter is making post-publication reviews possible. It’s easy to get in contact with other researchers to discuss any concerns you might have (as Keith did in his first Tweet). Note that I have never met any of my co-authors in real life, demonstrating how Twitter can greatly extend your network and allows you to meet interesting and smart people who share your interests. Twitter provides a first test bed for your criticisms to see if they hold up (or if the problem lies in your own interpretation), and if a criticism is widely shared, can make it fun to actually take the effort to do something about a paper that contains errors.
It might be slightly weird that Tim, Stuart, and myself publish a comment in the Journal of Nervous and Mental Disease, a journal I guess none of us has ever read before. It also shows how Twitter extends the boundaries between scientific disciplines. This can bring new insights about reporting standards from one discipline to the next. Perhaps our comment has made researchers, reviewers, and editors who do research on cognitive behavioral therapy aware of the need to make sure they raise the bar on how they report statistics (if only so pesky researchers on Twitter leave you alone!). I think this would be great, and I can’t wait until researchers from another discipline point out statistical errors in my own articles that I and my closer peers did not recognize, because anything that improves the way we do science (such as Twitter!) is a good thing.
Hindsight: If the internet group had been the original reviewers of the article…
The letter was low key and calmly pointed out obvious errors. You can see it here. Tim Smits’ blog Don’t get all psychotic on this paper: Had I (or we) Been A Reviewer (HIBAR) describes what had to be left out to keep within the word limit.
Table 2 had lots of problems –
The confidence intervals were suspiciously wide.
The effect sizes seemed too large for what the modest sample size should yield.
The table was inconsistent with information in the abstract.
Neither the table nor the accompanying text had any test of significance or any reporting of means and standard deviations.
Confidence intervals for two different outcomes were identical, yet one had the same value for its effect size as its lower bound.
Figure 5 was missing labels and definitions on both axes, rendering it uninterpretable. Duh?
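Several of these red flags can be caught with arithmetic that any reviewer could do. Below is a rough Python sketch of such a check. It assumes the effect sizes are standardized mean differences and uses the usual large-sample approximation for their standard error. The function names, the 15% tolerance, and the example values are our own illustrative choices and are not the figures from the article’s Table 2.

```python
import math


def approx_ci_for_d(d, n1, n2, z=1.96):
    """Large-sample 95% CI for a standardized mean difference d."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2)))
    return d - z * se, d + z * se


def flag_problems(d, lower, upper, n1, n2, tolerance=0.15):
    """Return reasons a reported (d, lower, upper) triple looks inconsistent."""
    problems = []
    # A point estimate should sit inside its interval, not on the edge.
    if not lower < d < upper:
        problems.append("point estimate is not strictly inside its interval")
    # The reported width should roughly match what the sample size implies.
    exp_lower, exp_upper = approx_ci_for_d(d, n1, n2)
    expected_width = exp_upper - exp_lower
    if abs((upper - lower) - expected_width) > tolerance * expected_width:
        problems.append("interval width does not match the sample size")
    return problems


# Hypothetical reported values; not the figures from the article's Table 2.
suspicious = flag_problems(d=0.50, lower=0.50, upper=3.10, n1=20, n2=20)
plausible = flag_problems(d=0.50, lower=-0.13, upper=1.13, n1=20, n2=20)

print("suspicious row:", suspicious)
print("plausible row: ", plausible)
```

A check like this does not say what the correct numbers are. It only says that a reported effect size, its interval, and the sample size cannot all be right at once, which is reason enough to ask the authors for means and standard deviations.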
The authors of the letter were behaving like a blue helmeted international peacekeeping force, not warriors attacking bad science.
But you don’t send peacekeeping troops into an active war zone.
In making recommendations, the Internet group did politely introduce the R word:
We believe the above concerns mandate either an extensive correction, or perhaps a retraction, of the article by Turkington et al. (2014). At the very least, the authors should reanalyze their data and report the findings in a transparent and accurate manner.
Fair enough, but I doubt the authors of the letter appreciated how upsetting this reasonable advice was or anticipated what reaction would be coming.
A response from an author of the article and a late night challenge to debate
The first author of the article published a reply
Turkington, D. (2014). The reporting of confidence intervals in exploratory clinical trials and professional insecurity: a response to Ritchie et al. The Journal of Nervous and Mental Disease, 202(7), 567.
He seemed to claim that he had re-examined the study data and that:
The findings were accurately reported.
A table of means and standard deviations was unnecessary because of the comprehensive reporting of confidence intervals and p-values in the article.
The missing details from the figure were self-evident.
The group who had assembled on the internet was not satisfied. An email exchange with Turkington and the editor of the journal confirmed that Turkington had not actually re-examined the raw file data, but only a summary with statistical tables.
The group requested the raw data. In a subsequent letter to the editor, they would describe Turkington as providing the data in a timely manner, but the exchange between them was anything but cordial. Turkington at first balked, saying that the data were not readily available because the statistician had retired. He nonetheless eventually provided the data, but not before first sending off a snotty email –
[Image: Turkington’s email to Tim Smits]
Tim Smits declined:
Thanks for providing the available data as quick as possible. Based on this and the tables in the article, we will try to reconstruct the analysis and evaluate our concerns with it.
With regard to your recent invitation to “slaughter” me at Newcastle University, I politely want to decline that invitation. I did not have any personal issue in mind when initiating the comment on your article, so a personal attack is the least of my priorities. It is just from a scientific perspective (but an outsider to the research topic) that I was very confused/astonished about the lack of reporting precision and what appears to be statistical errors. So, if our re-analysis confirms that first perception, then I am of course willing to accept your invitation at Newcastle university to elaborate on proper methodology in intervention studies, since science ranks among the highest of my priorities.
When I later learned of this email exchange, I wrote to Turkington and offered to go to Newcastle to debate either as Tim Smits’ second or to come alone. Turkington asked me to submit my CV to show that I wasn’t a crank. I complied, but he has yet to accept my offer.
A reanalysis of the data and a new table
Smits, T., Lakens, D., Ritchie, S. J., & Laws, K. R. (2015). Correcting Errors in Turkington et al. (2014): Taking Criticism Seriously. The Journal of Nervous and Mental Disease, 203(4), 302-303.
The group reanalyzed the data and the title of their report leaked some frustration.
We confirmed that all the errors identified by Smits et al. (2014) were indeed errors. In addition, we observed that the reported effect sizes in Turkington et al. (2014) were incorrect by a considerable margin. To correct these errors, Table 2 and all the figures in Turkington et al. (2014) need to be changed.
The sentence in the Abstract where effect sizes are specified needs to be rewritten.
A revised table based on their reanalyses was included:
Given the recommendation of their first letter was apparently dismissed –
To conclude, our recommendation for the Journal and the authors would now be to acknowledge that there are clear errors in the original Turkington et al. (2014) article and either accept our corrections or publish their own corrigendum. Moreover, we urge authors, editors, and reviewers to be rigorous in their research and reviewing, while at the same time being eager to reflect on and scrutinize their own research when colleagues point out potential errors. It is clear that the authors and editors should have taken more care when checking the validity of our criticisms. The fact that a rejoinder with the title “A Response to Ritchie et al. [sic]” was accepted for publication in reply to a letter by Smits et al. (2014) gives the impression that our commentary did not receive the attention it deserved. If we want science to be self-correcting, it is important that we follow ethical guidelines when substantial errors in the published literature are identified.
Sound and fury signifying nothing
Publication of their letter was accompanied by a blustery commentary from the statistical editor for the journal full of innuendo and pomposity.
He suggested that the team assembled on the internet
reanalyzed the data of Turkington et al. on the basis that it contained some serious errors that needed to be corrected. They also reported that the statistic that Turkington et al. had used to assess effect sizes (ESs) was an inappropriate metric.
Well, did Turkington’s table contain errors and was the metric inappropriate? If so, was a formal correction or even retraction needed? Cicchetti reproduced the internet group’s table, but did not immediately offer his opinion. So, the uncorrected article stands as published. Interested persons downloading it from behind the journal’s paywall won’t be alerted to the controversy.
Instead of dealing with the issues at hand, Cicchetti launched into an irrelevant lecture about Jacob Cohen’s arbitrary designation of effect sizes as small, medium, or large. Anything he said had already appeared more clearly and accurately in an article by Daniel Lakens, one of the internet group authors. Cicchetti cited that article, but only as a basis for libeling the open access journal in which it appeared.
To be perfectly candid, the reader needs to be informed that the journal that published the Lakens (2013) article, Frontiers in Psychology, is one of an increasing number of journals that charge exorbitant publication fees in exchange for free open access to published articles. Some of the author costs are used to pay reviewers, causing one to question whether the process is always unbiased, as is the desideratum. For further information, the reader is referred to the following Web site: http://www.frontiersin.org/Psychology/fees.
Cicchetti further chastised the internet group for disrespecting the saints of power analysis.
As an additional comment, the stellar contributions of Helena Kraemer and Sue Thiemann (1987) were noticeable by their very absence in the Smits et al. critique. The authors, although genuinely acknowledging the lasting contributions of Jacob Cohen to our understanding of ES and power analysis, sought to simplify the entire enterprise
Jacob Cohen is dead and cannot speak. But good Queen Mother Helena is very much alive and would surely object to being drawn into this nonsense. I encourage Cicchetti to ask what she thinks.
Ah, but what about the table based on the re-analyses of the internet group that Cicchetti had reproduced?
The reader should also be advised that this comment rests upon the assumption that the revised data analyses are indeed accurate because I was not privy to the original data.
Actually, when Turkington sent the internet group the study data, he included Cicchetti in the email.
The internet group experienced one more indignity from the journal that they had politely tried to correct. They had reproduced Turkington’s original table in their letter. The journal sent them an invoice for 106 euros because the table was copyrighted. It took a long email exchange before this billing was rescinded.
Science Needs Vigilantes
Imagine a world where we no longer depend on a few cronies of an editor to decide once and forever the value of a paper. This would replace the present order in which much of the scientific literature is untrustworthy, where novelty and sheer outrageousness of claims are valued over robustness.
Imagine we have constructed a world where post publication commentary is welcomed and valued. Data are freely available for reanalysis and the rewards are there for performing those re-analyses.
We clearly are not there yet and certainly not with this flawed article. The sequence of events that I have described has so far not produced a correction of a paper. As it stands, the paper concludes that nurses can and should be given a brief training that will allow them to effectively treat patients with severe and chronic mental disorder. This paper encourages actions that may put such patients and society at risk because of ineffectual and neglectful treatment.
The authors of the original paper and the editor responded with dismissal of the criticisms, ridicule, and, the editor at least, libeling open access journals. Obviously, we have not reached the point at which those willing to re-examine and, if necessary, re-analyze data are appropriately respected and protected from unfair criticism. The current system of publishing gives authors who have been questioned and editors who are defensive of their work, no matter how incompetent and inept it may be, the last word. But there is always the force of social media: tweets and blogs.
The critics were actually much too kind and restrained in a critique narrowly based on re-analyses. They ignored so much about
The target paper as an underpowered feasibility study being passed off as a source of estimates of what a sufficiently sized randomized trial would yield.
The continuity between the mischief done in this article with tricks and spin in the past work of the author Turkington.
The laughably inaccurate lecture of the editor.
The lowlife journal in which the article was published.
These problems deserve a more unrestrained and thorough trashing. Journals may not yet be self-correcting, but blogs can do a reasonable job of exposing bad science.
Science needs vigilantes, because of the intransigence of those pumping crap into the literature.
Coming up next
In my next issue of Mind the Brain I’m going to team up with Magneto. You may recall I previously collaborated with him and Neurocritic to scrutinize some junk science that Jim Coan and Susan Johnson had published in PLOS One. Their article crassly promoted to clinicians what they claimed was a brain-soothing couples therapy. We obtained an apology and a correction in the journal for undeclared conflict of interest.
But that incident left Magneto upset with me. He felt I did not give sufficient attention to the continuity between how Coan had slipped post hoc statistical manipulations into the PLOS article to get positive results and what he had done in a past paper with Richard Davison. Worse, I had tipped off Jim Coan about our checking his work. Coan launched a pre-emptive tirade against post-publication scrutiny, his now infamous Negative Psychology rant. He focused his rage on Neuroskeptic, not Neurocritic or me, but the timing was not a coincidence. He then followed up by denouncing me on Facebook as the Chopra Deepak of skepticism.
I still have not unpacked that oxymoronic statement and decided if it was a compliment.
OK, Magneto, I will be less naïve and more thorough this round. I will pass on whatever you uncover.
Check back if you just want to augment your critical appraisal skills with some unconventional ones or if you just enjoy a spectacle. If you want to arrive at your own opinions ahead of time, email Douglas Turkington at firstname.lastname@example.org for a PDF of his paywalled article. Tell him I said hello. The offer of a debate still stands.
Sundquist, J., Lilja, Å., Palmér, K., Memon, A. A., Wang, X., Johansson, L. M., & Sundquist, K. (2014). Mindfulness group therapy in primary care patients with depression, anxiety and stress and adjustment disorders: randomised controlled trial. The British Journal of Psychiatry.
was previously reviewed in Mental Elf. You might want to consider their briefer evaluation before beginning mine. I am going to be critical not only of the article, but the review process that got it into British Journal of Psychiatry (BJP).
I am an Academic Editor of PLOS One,* where we have the laudable goal of publishing all papers that are transparently reported and not technically flawed. Beyond that, we leave decisions about scientific quality to post-publication commentary of the many, not a couple of reviewers whom the editor has handpicked. Yet, speaking for myself, and not PLOS One, I would have required substantial revisions or rejected the version of this paper that got into the presumably highly selective, even vanity journal BJP**.
The article is paywalled, but you can get a look at the abstract here and write to the corresponding author for a PDF at Jan.email@example.com
As always, examine the abstract carefully when you suspect spin, but expect that you will not fully appreciate the extent of spin until you have digested the whole paper. This abstract declares
Mindfulness-based group therapy was non-inferior to treatment as usual for patients with depressive, anxiety or stress and adjustment disorders.
“Non-inferior” meaning ‘no worse than routine care?’ How could that null result be important enough to get into a journal presumably having a strong confirmation bias? The logic sounds just like US Senator George Aiken famously proposing getting America out of the war it was losing in Vietnam by declaring America had won and going home.
There are hints of other things going on, like no reporting of how many patients were retained for analysis or whether there were intention-to-treat analyses. And then the weird mention of outcomes being analyzed with “ordinal mixed models.” Have you ever seen that before? And finally, do the results hold for patients with any of those disorders or only a particular sample of unknown mix and maybe only representing those who could be recruited from specific settings? Stay tuned…
What is a non-inferiority trial and when should one conduct one?
The objective of non-inferiority trials is to compare a novel treatment to an active treatment with a view to demonstrating that it is not clinically worse with regard to a specified endpoint. It is assumed that the comparator treatment has been established to have a significant clinical effect (against placebo). These trials are frequently used in situations where use of a superiority trial against a placebo control may be considered unethical.
Noninferiority trials (NIs) have a bad reputation. Consistent with a large literature, a recent systematic review of NI HIV trials found the overall methodological quality to be poor, with a high risk of bias. The people who brought you CONSORT saw fit to develop special reporting standards for NIs so that misuse of the design in the service of getting publishable results is more readily detected. You might want to download the CONSORT checklist for NI and apply the checklist to the trial under discussion. Right away, you can see how deficient the reporting is in the abstract of the paper under discussion.
Basically, an NI RCT commits investigators and readers to accepting null results as support for a new treatment because it is no worse than an existing one. Suspicions are immediately raised as to why investigators might want to make that point.
Conflicts of interest could be a reason. Demonstration that the treatment is as good as existing treatments might warrant marketing of the new treatment or dissemination into existing markets. There could be financial rewards or simply promoters and enthusiasts favoring what they would find interesting. Yup, some bandwagons, some fads and fashions in psychotherapy are in large part due to promoters simply seeking the new and different, without evidence that a treatment is better than existing ones.
Suspicions are reduced when the new treatment has other advantages, like greater acceptability or a lack of side effects, or when the existing treatments are so good that an RCT of the new treatment with a placebo-control condition would be unethical.
We should evaluate whether the authors have an adequate rationale for doing an NI RCT, rather than relying on the conventional test of whether the null hypothesis of no difference between the intervention and a control condition can be rejected. Suitable support would be a strong record of efficacy for a well defined control condition. It would also help if the trial were pre-registered as NI, quieting concerns that it was declared as such after peeking at the data.
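To make the logic of a noninferiority comparison concrete, here is a minimal sketch in Python. It assumes a continuous outcome where higher scores are better, two independent groups, and a normal approximation. The function name, the margin, and the scores are hypothetical illustrations, not anything taken from the paper.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def noninferiority_check(new, control, margin, alpha=0.025):
    """Return (difference, lower_bound, is_noninferior) for higher-is-better scores.

    The new treatment is declared non-inferior only if the lower bound of the
    one-sided (1 - alpha) confidence interval for mean(new) - mean(control)
    lies above -margin.
    """
    diff = mean(new) - mean(control)
    # Standard error of the difference between two independent means.
    se = sqrt(stdev(new) ** 2 / len(new) + stdev(control) ** 2 / len(control))
    # Normal approximation; a t quantile would suit small samples better.
    z = NormalDist().inv_cdf(1 - alpha)
    lower = diff - z * se
    return diff, lower, lower > -margin


# Made-up scores, purely for illustration.
new_scores = [10, 12, 11, 13, 12, 11, 10, 12]
control_scores = [11, 12, 12, 13, 11, 12, 11, 12]

# The same data "pass" with a generous margin and "fail" with a strict one.
print(noninferiority_check(new_scores, control_scores, margin=2.0)[2])  # True
print(noninferiority_check(new_scores, control_scores, margin=1.0)[2])  # False
```

The point of the sketch is that the verdict hinges on the margin. That is why a margin chosen, or a design declared, after seeing the data is so worrying.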
The first things I noticed in the methods section…trouble
The recruitment procedure is strangely described, but seems to indicate that the therapists providing mindfulness training were present during recruitment, probably weren’t blinded to group assignment, and conceivably could influence it. The study thus does not have clear evidence of an appropriate randomization procedure and initial blinding. Furthermore, the GPs administering concurrent treatment also were not blinded and might take group assignment into account in subsequent prescribing and monitoring of medication.
During the recruitment procedure, GPs assessed whether medication was needed and made prescriptions before randomization occurred. We will need to see – we are not told in the methods section – but I suspect a lot of medication is being given to both intervention and control patients. That is going to complicate interpretation of results.
In terms of diagnosis, a truly mixed group of patients was recruited. Patients experiencing stress or adjustment reactions were thrown in with patients who had mild or moderate depression or anxiety disorders. Patients were excluded who were considered severe enough to need psychiatric care.
Patients receiving any psychotherapy at the start of the trial were excluded, but the authors ignored whether patients were receiving medication.
This appears to be a mildly distressed sample that is likely to show some recovery in the absence of any treatment. The authors’ failure to control for the medication patients received is going to be a big problem later. Readers won’t be able to tell whether any improvement in the intervention condition is due to its more intensive support and encouragement that results in better adherence to medication.
The authors go overboard in defending their use of multiple overlapping measures and overboard in praising the validity of their measures. For instance, the Hospital Anxiety and Depression Scale (HADS) is a fatally flawed instrument, even if still widely used. I considered the instrument dead in terms of reliability and validity, but like Elvis, it is still being cited.
Play Elvis is Dead at http://tinyurl.com/p78pzcn
Okay, the authors claim these measures are great, and attach clinical importance to cut points that others no longer consider valid. But then, why do they decide that the scales are ordinal, not interval? Basically, they are saying the scales are so bad that the difference from one number to the next higher or lower can’t be considered equal across the scale. This is getting weird. If the scales are as good as the authors claim, why do the authors take the unusual step of treating them as psychometrically inadequate?
I know, I’m getting technical to the point that I risk losing some readers, but the authors are setting readers up to be comfortable with a decision to focus on medians, not mean scores – making it more difficult to detect any differences between the mindfulness therapy and routine care. Spin, spin!
There are lots of problems with the ill described control condition, treatment as usual (TAU). My standing gripe with this choice is that TAU varies greatly across settings, and often is so inadequate that at best the authors are comparing whether mindfulness therapy is better than some unknown mix of no treatment and inadequate treatment.
We know enough about mindfulness therapy at this point to not worry about whether it is better than nothing at all. We should instead be focusing on whether it is better than another active treatment and whether its effectiveness is due to particular factors. The authors state that most of the control patients were receiving CBT, but don’t indicate how they knew that, except for case records. Notoriously, a lot of the therapy done in primary care that is labeled by practitioners as CBT does not pass muster. I would be much more comfortable with some sort of control over what patients were receiving in the control arm, or at least better specification.
I’m again trying to avoid getting very technical here, but point out for those who have a developed interest in statistics, that there were strange things going on.
Particular statistical analyses (depending on group medians rather than means) are chosen that are less likely to reveal differences between the intervention and control groups than the parametric statistics that are typically used.
Complicated decisions justify throwing away data and then using multivariate techniques to estimate what the data were. The multivariate techniques require assumptions that are not tested.
The power analysis is not conducted to detect differences between groups, but to be able to provide a basis for saying that mindfulness does not differ from routine care. Were the authors really interested in that question rather than whether mindfulness is better than routine care in initially designing a study and its analytic plan? Without pre-registration, we cannot know.
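How much a power analysis of this kind depends on the chosen margin can be sketched with the textbook normal-approximation formulas for comparing two means. The function names, the standard deviation, and the margins below are illustrative assumptions, not values from the paper.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_superiority(delta, sd, alpha=0.05, power=0.80):
    """Patients per arm to detect a true mean difference `delta` (two-sided test)."""
    z = NormalDist().inv_cdf
    return ceil(2 * (sd * (z(1 - alpha / 2) + z(power)) / delta) ** 2)


def n_per_group_noninferiority(margin, sd, alpha=0.025, power=0.80):
    """Patients per arm to rule out a deficit of `margin` (one-sided test),
    assuming the two treatments are truly equal."""
    z = NormalDist().inv_cdf
    return ceil(2 * (sd * (z(1 - alpha) + z(power)) / margin) ** 2)


# Detecting a half-SD benefit versus ruling out a half-SD or a full-SD deficit.
print(n_per_group_superiority(delta=0.5, sd=1.0))
print(n_per_group_noninferiority(margin=0.5, sd=1.0))
print(n_per_group_noninferiority(margin=1.0, sd=1.0))
```

Under these assumptions, doubling the margin cuts the required sample to roughly a quarter. A generous margin therefore lets a small trial claim noninferiority, which is one more reason to want the margin fixed in a pre-registered plan.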
There are extraordinary revelations in table 1, baseline characteristics.
The intervention and control group initially differed for two of the four outcome variables before they even received the intervention. Thus, intervention and control conditions are not comparable in important baseline characteristics. This is in itself a risk of bias, but also raises further questions about the adequacy of the randomization procedure and blinding.
We are told nothing about the distribution of diagnoses across the intervention and control group, which is very important in interpreting results and considering what generalizations can be made.
Most patients in both the intervention and control groups were receiving antidepressants, and about a third of those in either condition were receiving a “tranquilizer” or had missing data for that variable.
Signals that there is something amiss in this study are growing stronger. Given the mildness of disturbance and high rates of prescription of medication, we are likely dealing with a primary care sample where medications are casually distributed and poorly monitored. Yet, this study is supposedly designed to inform us whether adding mindfulness to this confused picture produces outcomes that are not worse.
Table 5 adds to the suspicions. There were comparable, significant changes in both the intervention and control condition over time. But we can’t know if that was due to the mildness of distress or effectiveness of both treatments.
Twice as many patients assigned to mindfulness dropped out of treatment, compared to those assigned to routine care. Readers are given some information about how many sessions of mindfulness patients attended, but not the extent to which they practiced mindfulness.
We are told
The main finding of the present RCT is that mindfulness group therapy given in a general practice setting, where a majority of patients with depression, anxiety, and stress and adjustment disorders are treated, is non-inferior to individual-based therapy, including CBT. To the best of our knowledge, this is the first RCT performed in a general practice setting where the effect of mindfulness group therapy was compared with an active control group.
Although a growing body of research has examined the effect of mindfulness on somatic as well as psychiatric conditions, scientific knowledge from RCT studies is scarce. For example, a 2007 review…
It’s debatable whether the statement was true in 2007, but a lot has happened since then. Recent reviews suggest that mindfulness therapy is better than nothing and better than inactive control conditions that do not provide comparable levels of positive expectations and support. Studies are accumulating that indicate mindfulness therapy is not consistently better than active control conditions. Differences become less likely when the alternative treatments are equivalent in positive expectations conveyed to patients and providers, support, and intensity in terms of frequency and amount of contact. Resolving this latter question of whether mindfulness is better than reasonable alternatives is now critical, and this study provides no relevant data.
An Implications section states
Patients who receive antidepressants have a reported remission rate of only 35–40% [41]. Additional treatment is therefore needed for non-responders as well as for those who are either unable or unwilling to engage in traditional psychotherapy.
The authors are being misleading to the point of being irresponsible in making this statement in the context of discussing the implications of their study. The reference is to the American STAR*D treatment study, which dealt with a very different, more chronically and unremittingly depressed population.
An appropriately referenced statement about primary care populations like the one from which this study recruited would point to the lack of diagnosis on which prescription of medication was based, unnecessary treatment with medication of patients who would not be expected to benefit from it, and poor monitoring and follow-up of patients who could conceivably benefit from medication if appropriately monitored. The statement would reflect the poor state of routine care for depression in the community, but would undermine claims that the control group received an active treatment specified well enough to allow any generalizations about the efficacy of mindfulness.
This RCT has numerous flaws in its conduct and reporting that preclude making any contribution to the current literature about mindfulness therapy. What is extraordinary is that, as a null trial, it got published in BJP. Maybe its publication in its present form represents incompetent reviewing and editing, or maybe a strategic, but inept decision to publish a flawed study with null findings because it concerns the trendy topic of mindfulness and GPs to whom British psychiatrists want to reach out.
An RCT of mindfulness psychotherapy is attention-getting. Maybe the BJP is willing to sacrifice trustworthiness of the interpretation of results for newsworthiness. BJP will attract readership it does not ordinarily get with publication of this paper.
What is most fascinating is that the study was framed as a noninferiority trial and therefore null results are to be celebrated. I challenge anyone to find similar instances of null results for a psychotherapy trial being published in BJP except in the circumstances that make a lack of effect newsworthy because it suggests that investment in the dissemination of a previously promising treatment is not justified. I have a strong suspicion that this particular paper got published because the results were dressed up as a successful demonstration of noninferiority.
I would love to see the reviews this paper received, almost as much as any record of what the authors intended when they planned the study.
Will this be the beginning of a trend? Does BJP want to encourage submission of noninferiority psychotherapy studies? Maybe the simple explanation is that the editor and reviewers do not understand what a noninferiority trial is and what it can conceivably conclude.
Please, some psychotherapy researcher with a null trial sitting in the drawer, test the waters by dressing the study up as a noninferiority trial and submitting it to BJP.
How bad is this study?
The article provides a non-intention-to-treat analysis of a comparison of mindfulness to an ill specified control condition that would not qualify as an active condition. The comparison does not allow generalization to other treatments in other settings. The intervention and control conditions had significant differences in key characteristics at baseline. The patient population is ill-described in ways that do not allow generalization to other patient populations. The high rates of co-treatment with antidepressants and tranquilizers are a confound that precludes determination of any effects of the mindfulness therapy. We don’t know if there were any effects, or if both the mindfulness therapy and control condition benefited from the natural decline in distress of a patient population largely without psychiatric diagnoses. Without a control group like a waiting list, we can’t tell if these patients would have improved anyway. I could go on but…
This study was not needed and may be unethical
The accumulation of literature is such that we need less mindfulness therapy research, not more. We need comparisons with well specified active control groups that can answer the question of whether mindfulness therapy offers any advantage over alternative treatments, not only in efficacy, but in the ability to retain patients so they get an adequate exposure to the treatment. We need mindfulness studies with cleverly chosen comparison conditions that allow determination of whether it is the mindfulness component of mindfulness group therapy that has any effectiveness, rather than relaxation that mindfulness therapy shares with other treatments.
To conduct research in patient populations, investigators must have hypotheses and methods with the likelihood of making a meaningful contribution to the literature commensurate with all the extra time and effort they are asking of patients. This particular study fails this ethical test.
Finally, the publication of this null trial as a noninferiority trial pushes the envelope in terms of the need for preregistration of design and analytic plans for trials. If authors are going to claim a successful demonstration of non-inferiority, we need to know that is what they set out to do, rather than just being stuck with null findings they could not otherwise publish.
*DISCLAIMER: This blog post presents solely the opinions of the author, and not necessarily PLOS. Opinions about the publishability of papers reflect only the author’s views and not necessarily an editorial decision for a manuscript submitted to PLOS One.
Last month, Netflix released Season 3 of House of Cards. In light of this, I am reposting a blog I wrote about the second season of the series last year: “Claire Underwood From Netflix’s House of Cards: Narcissistic Personality Disorder?”
Last month I used the character of Frank Underwood as a “case study” to illustrate the misunderstood psychiatric diagnosis of Antisocial Personality Disorder, and many of you asked: Well, what about his wife, Claire?
Good question! You asked, and so today I will do my best to answer.
SPOILER ALERT: For those of you who have not yet watched all of Season 2, consider yourself warned.
Clinical lore would certainly support that Claire, herself, must have a personality disorder of some kind – a sort of fatal attraction, where a couple is drawn to each other because there is something in their personality patterns which is complementary and reciprocal.
She does appear to have mastered the art of turning a blind eye to Frank’s more antisocial exploits. She is a highly intelligent woman, and she must have some inkling that her husband may be involved in the death of Zoe Barnes and Peter Russo. But if she has an inkling, she does not show it.
Claire, from what we know, does not engage in outright antisocial behavior. Unlike Frank, she has not murdered anyone and we have not seen her engage in very reckless or impulsive outbursts.
However, she rarely shows emotion—her smiles seem fake, her laugh empty, and her expressions are bland. She is more restrained and guarded than Frank, and she does not reveal her inner thoughts to the viewer the way Frank does so it is much harder to know what could be going on in her mind.
Still, I think I have seen enough to venture forth with an assertion that she may have a Narcissistic Personality Disorder.
What is Narcissistic Personality Disorder?
A pervasive pattern of grandiosity, need for admiration, and lack of empathy beginning by early adulthood and present in a variety of contexts, as indicated by five (or more) of 9 criteria.
Below are the five criteria that I think apply to Claire:
1) Has a sense of entitlement (i.e. unreasonable expectations of especially favorable treatment or automatic compliance with his or her expectations)
She expected Galloway to take the blame for the photos that were leaked and eventually claim it was all a “publicity stunt,” thus ruining his own reputation and image. She expressed no regret that her ex-lover was cornered into having to do this, on her behalf, and no remorse that it almost ruined his life and his relationship with his fiancée. She was entitled to this act because she is “special” and expects that people will “fall on their swords” for her.
2) Is interpersonally exploitative (i.e. takes advantage of others to achieve his or her own ends)
Claire manipulates the first lady, Tricia Walker, into believing Christina (a White House aide) is interested in the president. She pretends to be a friend, wangles her way into becoming the first lady’s confidant, and persuades her to enter couples therapy with the president. All of this is actually part of an elaborate plan to help Frank take the President down so that he can become president and she (Claire) can usurp Tricia as first lady.
Another example: Claire is pressured by the media into revealing that she once had an abortion, but she lies and states that the unborn child was a result of rape (presumably to save political face). Again, she shows no remorse about her lie and instead profits from it, gaining much sympathy and public support.
3) Lacks empathy: is unwilling to recognize or identify with the feelings and needs of others
This was best seen in the way Claire deals with her former employee Gillian Cole’s threat of a lawsuit – she pulls a few strings and threatens the life of Gillian’s unborn baby. In fact, in addition to the obvious lack of empathy was the simmering rage she had toward Gillian for daring to cross her. Again, entitlement, narcissistic rage, and a lack of empathy would explain that evil threat she made, to Gillian’s face, about the baby.
4) Is often envious of others or believes that others are envious of him or her
I think part of the reason Claire was so angry at Gillian was because, deep down, she was envious of her pregnancy. We know that, in parallel, Claire is consulting a doctor about becoming pregnant and is told that her chances are slim. This is such a narcissistic injury to Claire that she directs her rage at Gillian. I don’t think she was even consciously aware of how envious she is of Gillian for being pregnant.
Another example would be the look on her face when Galloway indicates he is madly in love with his fiancée and wishes to make a life with her. For a second her face darkens – a flash of jealous rage – which then translates to indifference and almost pleasure at his eventual public humiliation.
5) Shows arrogant, haughty behaviors or attitudes
Correct me if I am wrong, but Claire just does not appear to be that warm or genuine and has an almost untouchable air about her. Furthermore, we only ever see her with people who work for her (i.e. have less power than her) or with people more powerful than her (i.e. whose power she wants for herself). Other than Frank, where are her equals? Her oldest friends and colleagues? Her family? People who might not be influenced by her title or power?
One last comment – in Season 2 Claire certainly comes across as more ruthless and power hungry than the Claire in Season 1—whether she is now showing her true colors and is dropping her facade or just becoming more lost in Frank’s world and hence looking more like him is unclear to me…
Linking neuroscience research, psychological disorders, health and well-being. The Mind the Brain team follows. Click for full bios.
Shaili Jain MD is a Psychiatrist at the Veterans Affairs Palo Alto Health Care System and Stanford Univ School of Medicine Researcher & Clinical Professor,@shailijainmd.
Adrian Preda MD is a Psychiatrist and Health Sciences Professor of Psychiatry and Human Behavior at the University of California Irvine School of Medicine, @adrianpreda.
James Coyne, PhD, is a Clinical Health Psychologist, and a Professor of Health Psychology at University Medical Center, Groningen, the Netherlands where he teaches scientific writing and critical thinking. He is skeptical about hype and hokum in media representations of psychology, other science and medicine. @Coyneoftherealm