Spinning makes null results a virtue to be celebrated…and publishable.
An article reporting an RCT of group mindfulness therapy
Sundquist, J., Lilja, Å., Palmér, K., Memon, A. A., Wang, X., Johansson, L. M., & Sundquist, K. (2014). Mindfulness group therapy in primary care patients with depression, anxiety and stress and adjustment disorders: randomised controlled trial. The British Journal of Psychiatry.
was previously reviewed in Mental Elf. You might want to consider their briefer evaluation before beginning mine. I am going to be critical not only of the article, but the review process that got it into British Journal of Psychiatry (BJP).
I am an Academic Editor of PLOS One,* where we have the laudable goal of publishing all papers that are transparently reported and not technically flawed. Beyond that, we leave decisions about scientific quality to post-publication commentary of the many, not a couple of reviewers whom the editor has handpicked. Yet, speaking for myself, and not PLOS One, I would have required substantial revisions or rejected the version of this paper that got into the presumably highly selective, even vanity journal BJP**.
As always, examine the abstract carefully when you suspect spin, but expect that you will not fully appreciate the extent of spin until you have digested the whole paper. This abstract declares
Mindfulness-based group therapy was non-inferior to treatment as usual for patients with depressive, anxiety or stress and adjustment disorders.
“Non-inferior” meaning ‘no worse than routine care?’ How could that null result be important enough to get into a journal presumably having a strong confirmation bias? The logic sounds just like US Senator George Aiken famously proposing getting America out of the war it was losing in Vietnam by declaring America had won and going home.
There are hints of other things going on, like no reporting of how many patients were retained for analysis or whether there were intention-to-treat analyses. And then the weird mention of outcomes being analyzed with “ordinal mixed models.” Have you ever seen that before? And finally, do the results hold for patients with any of those disorders or only a particular sample of unknown mix and maybe only representing those who could be recruited from specific settings? Stay tuned…
What is a non-inferiority trial and when should one conduct one?
An NHS website explains
The objective of non-inferiority trials is to compare a novel treatment to an active treatment with a view of demonstrating that it is not clinically worse with regards to a specified endpoint. It is assumed that the comparator treatment has been established to have a significant clinical effect (against placebo). These trials are frequently used in situations where use of a superiority trial against a placebo control may be considered unethical.
Noninferiority trials (NIs) have a bad reputation. Consistent with a large literature, a recent systematic review of NI HIV trials found the overall methodological quality to be poor, with a high risk of bias. The people who brought you CONSORT saw fit to develop special reporting standards for NIs so that misuse of the design in the service of getting publishable results is more readily detected. You might want to download the CONSORT checklist for NI and apply it to the trial under discussion. Right away, you can see how deficient the reporting is in the abstract of the paper.
Basically, an NI RCT commits investigators and readers to accepting null results as support for a new treatment because it is no worse than an existing one. Suspicions are immediately raised as to why investigators might want to make that point.
Conflicts of interest could be a reason. Demonstration that the treatment is as good as existing treatments might warrant marketing of the new treatment or dissemination into existing markets. There could be financial rewards or simply promoters and enthusiasts favoring what they would find interesting. Yup, some bandwagons, some fads and fashions in psychotherapy are in large part due to promoters simply seeking the new and different, without evidence that a treatment is better than existing ones.
Suspicions are reduced when the new treatment has other advantages, like greater acceptability or a lack of side effects, or when the existing treatments are so good that an RCT of the new treatment with a placebo-control condition would be unethical.
We should evaluate whether there is an adequate rationale for authors doing an NI RCT, rather than relying on the conventional test of whether the null hypothesis of no difference between the intervention and a control condition can be rejected. Suitable support would be a strong record of efficacy for a well-defined control condition. It would also help if the trial were pre-registered as NI, quieting concerns that it was declared as such after peeking at the data.
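For readers who want to see the logic of a noninferiority claim spelled out, here is a minimal sketch. It assumes a normal approximation and an outcome scale where higher scores mean worse symptoms; the function name, the margin, and every number in the usage note are hypothetical illustrations, not values taken from the paper.

```python
from statistics import NormalDist


def noninferiority_check(diff, se, margin, alpha=0.025):
    """Illustrative noninferiority decision (not the paper's analysis).

    diff   : estimated difference, new treatment minus comparator,
             on a scale where higher = worse (assumption)
    se     : standard error of that difference
    margin : prespecified noninferiority margin (hypothetical here)
    alpha  : one-sided significance level
    """
    z = NormalDist().inv_cdf(1 - alpha)
    lower = diff - z * se
    upper = diff + z * se
    return {
        "ci": (lower, upper),
        # Noninferior if the new treatment is at most `margin` worse.
        "non_inferior": upper < margin,
        # Superior only if the whole interval lies below zero.
        "superior": upper < 0,
    }


# Same data, two different margins (all numbers made up).
generous = noninferiority_check(diff=0.5, se=1.0, margin=3.0)
strict = noninferiority_check(diff=0.5, se=1.0, margin=2.0)
```

With these made-up inputs the confidence interval spans zero, so a conventional test would call the result null. Whether it counts as "noninferior" depends entirely on the margin: the generous margin passes and the strict one does not. That is why it matters that the margin and the NI framing be fixed before anyone looks at the data.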
- The recruitment procedure is strangely described, but seems to indicate that the therapists providing mindfulness training were present during recruitment, probably weren’t blinded to group assignment, and conceivably could influence it. The study thus does not have clear evidence of an appropriate randomization procedure and initial blinding. Furthermore, the GPs administering concurrent treatment also were not blinded and might take group assignment into account in subsequent prescribing and monitoring of medication.
- During the recruitment procedure, GPs assessed whether medication was needed and made prescriptions before randomization occurred. We will need to see – we are not told in the methods section – but I suspect a lot of medication is being given to both intervention and control patients. That is going to complicate interpretation of results.
- In terms of diagnosis, a truly mixed group of patients was recruited. Patients experiencing stress or adjustment reactions were thrown in with patients who had mild or moderate depression or anxiety disorders. Patients were excluded who were considered severe enough to need psychiatric care.
- Patients receiving any psychotherapy at the start of the trial were excluded, but the authors ignored whether patients were receiving medication.
This appears to be a mildly distressed sample that is likely to show some recovery in the absence of any treatment. The authors’ not controlling for the medication received is going to be a big problem later. Readers won’t be able to tell whether any improvement in the intervention condition is due to its more intensive support and encouragement resulting in better adherence to medication.
- The authors go overboard in defending their use of multiple overlapping measures and overboard in praising the validity of their measures. For instance, the Hospital Anxiety and Depression Scale (HADS) is a fatally flawed instrument, even if still widely used. I considered the instrument dead in terms of reliability and validity, but like Elvis, it is still being cited.
Okay, the authors claim these measures are great, and attach clinical importance to cut points that others no longer consider valid. But then, why do they decide that the scales are ordinal, not interval? Basically, they are saying the scales are so bad that the differences from one number to the next higher or lower for pairs of items can’t be considered equal. This is getting weird. If the scales are as good as the authors claim, why do the authors take the unusual step of treating them as psychometrically inadequate?
I know, I’m getting technical to the point that I risk losing some readers, but the authors are setting readers up to be comfortable with a decision to focus on medians, not mean scores – making it more difficult to detect any differences between the mindfulness therapy and routine care. Spin, spin!
There are lots of problems with the ill described control condition, treatment as usual (TAU). My standing gripe with this choice is that TAU varies greatly across settings, and often is so inadequate that at best the authors are comparing whether mindfulness therapy is better than some unknown mix of no treatment and inadequate treatment.
We know enough about mindfulness therapy at this point to not worry about whether it is better than nothing at all, but should be focusing on whether it is better than another active treatment and whether its effectiveness is due to particular factors. The authors state that most of the control patients were receiving CBT, but don’t indicate how they knew that, except for case records. Notoriously, a lot of the therapy done in primary care that is labeled by practitioners as CBT does not pass muster. I would be much more comfortable with some sort of control over what patients were receiving in the control arm, or at least better specification.
I’m again trying to avoid getting very technical here, but point out for those who have a developed interest in statistics, that there were strange things going on.
- Particular statistical analyses (depending on group medians, rather than means) are chosen that are less likely to reveal differences between intervention and control group than the parametric statistics that are typically used.
- Complicated decisions justify throwing away data and then using multivariate techniques to estimate what the data were. The multivariate techniques require assumptions that are not tested.
- The power analysis is not conducted to detect differences between groups, but to be able to provide a basis for saying that mindfulness does not differ from routine care. Were the authors really interested in that question rather than whether mindfulness is better than routine care in initially designing a study and its analytic plan? Without pre-registration, we cannot know.
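To make the last point concrete, here is a rough sketch of how a noninferiority power calculation differs from the usual one. It uses the standard normal-approximation sample size formula for comparing two means; the standard deviation, margins, and function name are assumptions chosen for illustration, not figures from the paper.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sd, delta, alpha=0.025, power=0.80):
    """Approximate patients per arm for a two-group comparison of means.

    For a superiority trial, `delta` is the smallest difference worth detecting.
    For a noninferiority trial (assuming the true difference is zero),
    `delta` is the noninferiority margin. `alpha` is one-sided.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (sd * (z_alpha + z_beta) / delta) ** 2)


# Hypothetical scale with SD of 4 points.
wide_margin = n_per_group(sd=4, delta=2)    # generous margin: fewer patients
narrow_margin = n_per_group(sd=4, delta=1)  # strict margin: many more patients
```

The arithmetic is the same in both designs; what changes is what `delta` stands for. In a noninferiority design the required sample shrinks quickly as the margin widens, so a generous margin makes "success" cheap. Unless the margin was fixed in a registered protocol, readers cannot rule out that it was chosen to fit the sample at hand.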
There are extraordinary revelations in table 1, baseline characteristics.
- The intervention and control group initially differed for two of the four outcome variables before they even received the intervention. Thus, intervention and control conditions are not comparable in important baseline characteristics. This is in itself a risk of bias, but also raises further questions about the adequacy of the randomization procedure and blinding.
- We are told nothing about the distribution of diagnoses across the intervention and control group, which is very important in interpreting results and considering what generalizations can be made.
- Most patients in both the intervention and control groups were receiving antidepressants, and about a third in either condition were receiving a “tranquilizer” or had missing data for that variable.
Signals that there is something amiss in this study are growing stronger. Given the mildness of disturbance and high rates of prescription of medication, we are likely dealing with a primary care sample where medications are casually distributed and poorly monitored. Yet, this study is supposedly designed to inform us whether adding mindfulness to this confused picture produces outcomes that are not worse.
Table 5 adds to the suspicions. There were comparable, significant changes in both the intervention and control condition over time. But we can’t know if that was due to the mildness of distress or effectiveness of both treatments.
Twice as many patients assigned to mindfulness dropped out of treatment, compared to those assigned to routine care. Readers are given some information about how many sessions of mindfulness patients attended, but not the extent to which they practiced mindfulness.
We are told
The main finding of the present RCT is that mindfulness group therapy given in a general practice setting, where a majority of patients with depression, anxiety, and stress and adjustment disorders are treated, is non-inferior to individual-based therapy, including CBT. To the best of our knowledge, this is the first RCT performed in a general practice setting where the effect of mindfulness group therapy was compared with an active control group.
Although a growing body of research has examined the effect of mindfulness on somatic as well as psychiatric conditions, scientific knowledge from RCT studies is scarce. For example, a 2007 review…
It’s debatable whether the statement was true in 2007, but a lot has happened since then. Recent reviews suggest that mindfulness therapy is better than nothing and better than inactive control conditions that do not provide comparable levels of positive expectations and support. Studies are accumulating that indicate mindfulness therapy is not consistently better than active control conditions. Differences become less likely when the alternative treatments are equivalent in positive expectations conveyed to patients and providers, support, and intensity in terms of frequency and amount of contact. Resolving this latter question of whether mindfulness is better than reasonable alternatives is now critical, and this study provides no relevant data.
An Implications section states
Patients who receive antidepressants have a reported remission rate of only 35–40%.41 Additional treatment is therefore needed for non-responders as well as for those who are either unable or unwilling to engage in traditional psychotherapy.
The authors are being misleading to the point of being irresponsible in making this statement in the context of discussing the implications of their study. The reference is to the American STAR*D treatment study, which dealt with a very different, more chronically and unremittingly depressed population.
An appropriately referenced statement about primary care populations like the one from which this study recruited would point to the lack of diagnosis on which prescription of medication was based, unnecessary treatment with medication of patients who would not be expected to benefit from it, and poor monitoring and follow-up of patients who could conceivably benefit from medication if appropriately monitored. The statement would reflect the poor state of routine care for depression in the community, but would undermine claims that the control group received an active treatment specified well enough to allow any generalizations about the efficacy of mindfulness.
This RCT has numerous flaws in its conduct and reporting that preclude making any contribution to the current literature about mindfulness therapy. What is extraordinary is that, as a null trial, it got published in BJP. Maybe its publication in its present form represents incompetent reviewing and editing, or maybe a strategic, but inept decision to publish a flawed study with null findings because it concerns the trendy topic of mindfulness and GPs to whom British psychiatrists want to reach out.
An RCT of mindfulness psychotherapy is attention-getting. Maybe the BJP is willing to sacrifice trustworthiness of the interpretation of results for newsworthiness. BJP will attract readership it does not ordinarily get with publication of this paper.
What is most fascinating is that the study was framed as a noninferiority trial and therefore null results are to be celebrated. I challenge anyone to find similar instances of null results for a psychotherapy trial being published in BJP except in the circumstances that make a lack of effect newsworthy because it suggests that investment in the dissemination of a previously promising treatment is not justified. I have a strong suspicion that this particular paper got published because the results were dressed up as a successful demonstration of noninferiority.
I would love to see the reviews this paper received, almost as much as any record of what the authors intended when they planned the study.
Will this be the beginning of a trend? Does BJP want to encourage submission of noninferiority psychotherapy studies? Maybe the simple explanation is that the editor and reviewers do not understand what a noninferiority trial is and what it can conceivably conclude.
Please, some psychotherapy researcher with a null trial sitting in the drawer, test the waters by dressing the study up as a noninferiority trial and submitting it to BJP.
How bad is this study?
The article provides a non-intention-to-treat analysis of a comparison of mindfulness to an ill-specified control condition that would not qualify as an active condition. The comparison does not allow generalization to other treatments in other settings. The intervention and control conditions had significant differences in key characteristics at baseline. The patient population is ill-described in ways that do not allow generalization to other patient populations. The high rates of co-treatment with antidepressants and tranquilizers are a confound that precludes determination of any effects of the mindfulness therapy. We don’t know if there were any effects, or if both the mindfulness therapy and control condition benefited from the natural decline in distress of a patient population largely without psychiatric diagnoses. Without a control group like a waiting list, we can’t tell if these patients would have improved anyway. I could go on but…
This study was not needed and may be unethical
The accumulation of literature is such that we need less mindfulness therapy research, not more. We need comparisons with well specified active control groups that can answer the question of whether mindfulness therapy offers any advantage over alternative treatments, not only in efficacy, but in the ability to retain patients so they get an adequate exposure to the treatment. We need mindfulness studies with cleverly chosen comparison conditions that allow determination of whether it is the mindfulness component of mindfulness group therapy that has any effectiveness, rather than relaxation that mindfulness therapy shares with other treatments.
To conduct research in patient populations, investigators must have hypotheses and methods with the likelihood of making a meaningful contribution to the literature commensurate with all the extra time and effort they are asking of patients. This particular study fails this ethical test.
Finally, the publication of this null trial as a noninferiority trial pushes the envelope in terms of the need for preregistration of design and analytic plans for trials. If authors are going to claim a successful demonstration of non-inferiority, we need to know that is what they set out to do, rather than just being stuck with null findings they could not otherwise publish.
*DISCLAIMER: This blog post presents solely the opinions of the author, and not necessarily PLOS. Opinions about the publishability of papers reflect only the author’s views and not necessarily an editorial decision for a manuscript submitted to PLOS One.
**I previously criticized the editorial process at BJP, calling for the retraction of a horribly flawed meta-analysis of the mental health effects of abortion written by an American antiabortion activist. I have pointed out how another flawed review of the efficacy of long-term psychodynamic psychotherapy represented duplicate publication. But both of these papers were published under the last editor. I still hope that the current editor can improve the trustworthiness of what is published at BJP. I am not encouraged by this particular paper, however.