We all hope that an alternative can be found to treating persons with schizophrenia with only modestly effective antipsychotic medication that has serious side effects. Persons with schizophrenia and their families deserve an alternative.
That is why reports of this study in the media were greeted with such uncritical enthusiasm. That hope may even have been the motive for the results of the study getting hyped and distorted, from exaggerated claims attributed to the authors to further amplification in the media.
But at the present time, CBT has not yet been shown to provide an effective alternative, and it has not yet been shown to have an effect equivalent to antipsychotic medication. The results of this trial do not at all change this unfortunate situation. And to promote CBT as if it had been shown to be an effective alternative would be premature and inaccurate, if not a cruel hoax.
This is the second of two posts about a quite significant, but not earth shattering Lancet study of CBT for persons with unmedicated schizophrenia.
In this continuation of my previous post at Mind the Brain, I will discuss
- The missed opportunity the investigators had to make a more meaningful comparison between CBT and supportive counseling or befriending.
- The thorny problem of separating the effects of CBT from the effects of the medication that most patients still in follow-up at the end were receiving in both conditions.
- The investigators’ bad decision to include the final follow-up assessment in calculating effect sizes for CBT, a point at which results would have to be made up for most participants.
- Lancet’s failure to enforce preregistration and how the investigative group may have exploited this in putting a spin on this trial.
- The inappropriate voodoo statistics used to put a spin on this trial.
- How applying the investigators’ own standards would lead us to conclude that this trial did not produce usable results.
- What we can learn from this important but modest, exploratory study, and what more the investigators need to tell us about it so that we can learn all we can.
At my secondary blog, Quick Thoughts, I distributed blame for the distorted media coverage of this study. There is a lot to go around. I faulted the authors for exaggerations in their abstract and inaccurate statements to the media. I also criticized media that parroted other inaccurate media coverage without journalists bothering to actually read the article. Worse, a competition seemed to develop to see who could generate the most outrageous headlines. In the end, both Science and The Guardian owe consumers an apology for their distorted headlines. Click here to access this blog post and the interesting comments that it generated.
Why didn’t the investigators avoid an unresolvable interpretive mess by providing a more suitable control/comparison condition?
In my previous PLOS Mind the Brain post, I detailed how a messy composite of very different treatment settings was used for treatment as usual (TAU). This ruled out meaningful comparisons with the intervention group.
This investigator group should have provided structured supportive counseling as a conventional comparison/control condition. Or maybe befriending.
The sober-minded accompanying editorial in Lancet said as much.
Such a comparison group would have ensured uniform compensation for any inadequacies in the support available in the background TAU. Such a control group would also address the question whether any effects of CBT are due to nonspecific factors shared with supportive counseling. It would have allowed an opportunity to test whether the added training and expense of CBT is warranted by advantages over the presumably simpler and less expensive supportive counseling.
Why wasn’t this condition included? Tony Morrison’s interview with Lancet is quite revealing of the investigative group’s thinking that led to rejecting a supportive counseling comparison/control condition. You can find the interview here, and if you go to it, listen to what Professor Morrison is saying about halfway into the 12 minute interview.
What follows is a rough, inexact transcription of what was said. You can compare it to the actual audiotape and decide whether I am introducing any inaccuracies.
The interviewer stated that he understood that patients often cannot tell the difference between supportive counseling and CBT, and conceded he struggled with this as well.
Tony Morrison agreed that there were good reasons why people sometimes struggle to distinguish the two because there is a reasonable amount of overlap.
The core components of a supportive counseling approach are also required elements of CBT: developing a warm and trusting relationship with someone, and being empathic and nonjudgmental. CBT is also a talking therapy in a collaborative relationship.
The difference is that there are more specific elements to CBT that are not represented within a supportive counseling approach.
One of those elements is that CBT is based on a cognitive behavioral model that assumes it is not the psychotic experiences that people have that are problematic, but people’s way of responding to these experiences. If people have a distressing explanation for hearing voices, that is bound to be associated with considerable distress. CBT helps patients develop more accurate and less distressing appraisal.
The interviewer noted there was no placebo given in this trial and the effects of placebo can be quite large. What were the implications of not using placebo?
Tony Morrison agreed that the interviewer was quite right that the effects of placebo are quite large in trials of treatment of people with schizophrenia.
He stated that it was difficult to do a placebo-controlled randomized trial with respect to psychological therapies because the standard approach to a comparison group other than treatment as usual would be something like supportive counseling or befriending, both of which might be viewed as having active ingredients like a good supportive relationship.
Not difficult to do a placebo-controlled randomized trial in this sense, Tony, but perhaps difficult to show that CBT has any advantage.
So, it sounds as though a nonspecific supportive counseling approach was rejected as a comparison/control group because it would reduce the possibility of finding significant effects for CBT. It is a pity that a quite mixed set of TAU conditions was chosen instead.
This decision introduced an uninterpretable mess of complexity, including fundamental differences in the treatment as usual depending on participant personal characteristics, with the likelihood that participants would simply be thrown out of some of the TAU.
Ignoring is not enough: what was to be done with the patients receiving medication?
When the follow-up was over, most remaining participants in both groups (10 out of 17) had received medication. That leaves us uncertain whether any effects of participants receiving CBT can be distinguished from effects of taking medication.
This confounding of psychotherapy and medication cannot readily be corrected in such a small sample.
We know from other studies that nonadherence is a serious problem with antipsychotic medication. The investigative group emphasized this as the rationale for the trial. But we do not know which participants in each group received antipsychotic medication.
For instance, in the control condition, were patients receiving medication found mainly in the richly resourced early intervention sites? Or were they from the poorer traditional community sites where support for adherence would be lower because contact is lower? And, importantly, how did it come about that patients supposedly receiving only CBT began taking this medication? Was it under different circumstances and with different support for adherence than when medication was given in the TAU?
With a considerably larger sample, multivariate modeling methods would allow a sensitivity analysis, with participants in both groups subdivided between those receiving and those not receiving antipsychotics. That would not undo the fact that the patients who ended up taking medication had not been randomized to do so. But it would nonetheless have been informative. Yet, with only 17 patients remaining in each group at the last follow-up, such methods would be indecisive and even inappropriate.
It is a plausible hypothesis that patients enrolled in a trial offering CBT without antipsychotic medication would, if they later accepted the medication, be more adherent. But this hypothesis cannot be tested in the study, nor explored within the limited information that was provided to readers.
A decision not to follow some of the patients after the end of the intervention.
The investigators indicated that limited resources prevented following many of the patients beyond the intervention. If so, the investigators should have simply ended the main analyses at the point beyond which follow-up became highly selective. I challenge anyone to find a precedent in the literature where investigators stopped follow-up, but then averaged outcomes across all assessment periods, including one where most participants were not even being followed!
The most reliable and revealing approach to this problem would be to calculate effect sizes for the last point at which an effort was made to follow all participants, the end of the intervention. That would be consistent with the primary analysis for almost all other trials in the psychotherapy and pharmacological literatures. It would also avoid problems in estimating effect sizes for the last follow-up, when the data for most patients would have to be made up.
But if that were done, it would have been obvious that this was a null trial with no significant effects for CBT.
The failure of Lancet to enforce preregistration for this trial.
Preregistration of the design of trials including pre-specification of outcomes came about because of the vast evidence that many investigators do not report key aspects of their original design and do not report key outcomes if they are not favorable to the intervention. Ben Goldacre has taught us not to trust pharmaceutical drug trials that are not preregistered, and neither should we trust psychotherapy trials that are not.
Preregistration, if it is uniformly enforced, provides a safeguard of the integrity of results of trials, reducing the possibility of investigators redefining primary outcomes after results are known.
Preregistration is now a requirement for publication in many journals, including Lancet.
Guidelines for Lancet state:
We require the registration of all interventional trials, whether early or late phase, in a primary register that participates in WHO’s International Clinical Trial Registry Platform (see Lancet 2007; 369: 1909-11). We also encourage full public disclosure of the minimum 20-item trial registration dataset at the time of registration and before recruitment of the first participant (see Lancet 2006; 367: 1631-35).
This trial was not truly preregistered. The required registration of this trial (http://www.controlled-trials.com/ISRCTN29607432/) occurred after recruitment had already started. The “preregistration” appeared at the official website on October 21, 2010, yet recruitment started on February 15, 2010. Presumably a lot could be learned in that time period, and adjustments made in the protocol. We are just not told.
Even then, the preregistration failed to commit the investigators to which primary outcome, evaluated at which time point, would serve as the main evaluation for the trial. That too defeats the purpose of preregistration.
The overall PANSS score is designated in the registration as the primary outcome, but no particular time point is selected, allowing the investigators some wiggle room. How much? The PANSS is assessed six times, and then there is the overall mean, which the authors preferred, bringing a total of seven assessments to choose from.
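To give a rough sense of how much wiggle room seven candidate analyses provide, here is a minimal sketch of the chance that at least one of them reaches p < .05 when there is no true effect. It assumes the seven analyses are independent, which is my simplifying assumption and certainly not true of repeated PANSS assessments of the same patients, so treat the result as an illustration of the direction of the problem rather than an estimate for this trial. The function name is hypothetical.

```python
def chance_of_any_false_positive(n_choices: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n_choices independent tests is
    'significant' at level alpha when no true effect exists.

    Independence is an illustrative assumption; correlated assessments
    of the same patients would give a smaller, but still inflated, figure.
    """
    return 1.0 - (1.0 - alpha) ** n_choices


if __name__ == "__main__":
    for k in (1, 7):
        print(f"{k} candidate analyses: {chance_of_any_false_positive(k):.3f}")
```

Under that assumption, leaving the time point unspecified raises the nominal 5% false-positive rate to roughly 30%.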
The preregistration also indicates that an effect size of .8 is expected. That is quite unrealistic and unprecedented in the existing literature, including meta-analyses. Claiming such a large effect size justifies having a smaller sample. That means that the trial will be highly underpowered from the get-go in terms of being able to generate reliable effect sizes.
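The link between the assumed effect size and the sample size can be sketched with the standard normal-approximation formula for comparing two group means. This is a back-of-the-envelope calculation, not the investigators' actual power analysis, which I have not seen. The two-sided alpha of .05, the power of 80%, and the smaller comparison effect size of 0.3 are all my assumptions for illustration; only the .8 comes from the registration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per group for a two-group comparison
    of means, using the normal approximation:

        n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2

    A t-test based calculation would give a slightly larger number.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    # 0.8 is the effect size stated in the registration;
    # 0.3 is a hypothetical, more modest effect chosen only for contrast.
    for d in (0.8, 0.3):
        print(f"d = {d}: about {n_per_group(d)} per group")
```

Because the required sample grows with the inverse square of the effect size, assuming .8 rather than something more modest cuts the required sample several-fold, and a trial sized for .8 has little chance of reliably detecting a smaller true effect.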
Mumbo-jumbo, voodoo statistics used to create the appearance of significant effects.
In order to average outcomes across all assessment points, multivariate statistics were used to invent data for the majority of patients who were no longer being followed at the last assessment. Recall that that was when most patients already had been lost to follow-up. It is also when the largest differences appeared between the intervention and control group. Those differences seem to have been due to inexplicable deterioration in the minority of control patients still around to be assessed. The between-group differences were thus due to the control group looking bad, not any impressive results for the intervention group.
Multivariate statistics cannot work magic with a small sample with so few participants remaining at the last follow-up.
The Lancet article reported a seemingly sophisticated plan for data analysis:
Covariates included site, sex, age, and the baseline value of the relevant outcome measure. Use of these models allowed for analysis of all available data, in the assumption that data were missing at random, conditional on adjustment for centre, age, sex, and baseline scores. The missing-at-random assumption seems to be the most realistic, in view of the planned variation in maximum follow-up times and the many other factors likely to affect drop-out; additionally, the assumption is routinely used in analyses of data from longitudinal trials.
Surely, you jest.
Anyone smart enough to write this kind of text is smart enough to know that it is an absurd plan for analyzing data in which many patients will not be followed after the end of intervention and with such a small sample size to begin with. It preserves the illusion of a required intent-to-treat analysis in the face of most participants having been lost to follow-up.
Sure, multilevel analysis allows compensation for the loss of some participants from follow-up, but it requires a much larger sample. Furthermore, the necessary assumption that the participants who were not available are missing at random is neither plausible nor testable within a small sample. Again, one can test the assumption that missingness is at random, but that would require a starting sample size at least four or five times as large. Surely the statistician for this project knew that.
And then there is the issue of including control for the four covariates in analyses for which there are only 17 participants in each of the two groups at the end data point being analyzed.
As I noted in my last blog post, from the start, there were such huge differences among participants that summary statistics based on all of them could not be applied to individual participants or subgroups.
- Participants in the control group came from very different settings with which their personal characteristics were associated. Particular participants came from certain settings and got treated differently.
- Most participants were no longer around at the final assessment, but we do not know how that is related to personal characteristics.
- Most participants who were around in both the intervention and control group had accepted medication, and this is not random.
- There was a strange deterioration going on in the control group.
Yup, the investigators are asking us to believe that being lost to follow-up was random, and so participants that were still around could be randomly replaced with participants who had been lost, without affecting the results.
With only 17 participants per group, we cannot even assess whether the intended effects of randomization had occurred, in terms of equalizing baseline differences between intervention and control groups. We do not even have the statistical power to detect whether baseline differences between the two groups might determine differences in the smaller numbers still available.
We know from lots of other clinical trials that when you start with as few patients as this trial did, uncontrolled baseline differences can still prove more potent than any intervention. That is why many clinical trialists refuse to accept any studies with fewer than 35 to 50 participants per cell.
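A minimal sketch of why small cells leave so much room for chance baseline differences: under successful randomization, the standardized difference between two group means on a baseline variable has a standard error of roughly sqrt(2/n), so a sizable imbalance is far more likely with 17 per group than with 50. The normal approximation and the half-a-standard-deviation threshold are my illustrative assumptions, not figures from the trial, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def chance_of_baseline_imbalance(n_per_group: int, threshold_sd: float = 0.5) -> float:
    """Approximate probability that two randomized groups of equal size
    differ by more than threshold_sd standard deviations on a single
    baseline variable purely by chance (normal approximation).
    """
    # Standard error of the standardized mean difference under the null.
    standard_error = sqrt(2.0 / n_per_group)
    return 2.0 * (1.0 - NormalDist().cdf(threshold_sd / standard_error))


if __name__ == "__main__":
    # 17 matches the per-group number remaining at the last follow-up;
    # 35 and 50 are the per-cell minimums mentioned above.
    for n in (17, 35, 50):
        print(f"n = {n} per group: {chance_of_baseline_imbalance(n):.3f}")
```

This covers only one baseline variable. With several prognostic variables in play, the chance that at least one is noticeably imbalanced is higher still.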
Overall, this is sheer voodoo, statistical malpractice that should have been caught by the reviewers at Lancet. But it does make for putting an impressive spin on an otherwise null trial.
Judging the trial by the investigators’ own standards
A meta-analysis by the last author Paul Hutton and colleagues argued that trials of antipsychotic medication with more than 20% attrition do not produce usable data. Hutton et al. cite some authorities:
Medical epidemiologists and CONSORT statement authors Kenneth Schulz and David Grimes, writing in The Lancet in 2002, stated: _a trial would be unlikely to successfully withstand challenges to its validity with losses of more than 20% [Sackett et al., 2000]_
But then, in what would be a damning critique of the present study, Hutton et al. declare:
Although more sophisticated ways of dealing with missing continuous data exist, all require data normally unavailable to review authors (e.g. individual data or summary data for completers only). No approach is likely to produce credible results when more than half the summary outcome data are missing.
Then there is the YouTube presentation from 2012 from first author Tony Morrison himself. He similarly argued that if a trial retains only half of the participants who initially enrolled, most of the resulting data are made up and results are not credible.
The 2012 presentation by Morrison also dismisses any mean differences between active medication and a placebo of less than 15 points as clinically insignificant.
Okay, if we accept these criteria, what do we say about a difference in this CBT trial that is claimed to be only 6.5 after the loss of most participants enrolled in the study, putting aside for a moment the objection that even this is an exaggerated estimate?
We should have known ahead of time what can and cannot be learned from a small, underpowered exploratory study.
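The three criteria just described can be written down as a simple checklist. This is only a sketch of the reasoning: the thresholds (20% attrition, half the data missing, a 15-point difference) and the 6.5-point difference are the ones quoted above, but the attrition value passed in below is a placeholder standing in for "most participants" and is not a figure taken from the paper. The function and key names are hypothetical.

```python
def judge_by_stated_criteria(attrition: float, mean_difference: float) -> dict:
    """Apply the standards quoted above to a trial.

    attrition: fraction of enrolled participants lost (0.0 to 1.0).
    mean_difference: between-group difference in points on the outcome scale.
    """
    return {
        # Schulz and Grimes, as cited by Hutton et al.: losses above 20%
        # make a trial unlikely to withstand challenges to its validity.
        "withstands_validity_challenge": attrition <= 0.20,
        # Hutton et al.: no approach is credible when more than half
        # of the summary outcome data are missing.
        "credible_summary_data": attrition <= 0.50,
        # Morrison's 2012 presentation: differences under 15 points
        # are dismissed as clinically insignificant.
        "clinically_significant": mean_difference >= 15.0,
    }


if __name__ == "__main__":
    # 0.51 is a hypothetical placeholder for "more than half lost".
    verdict = judge_by_stated_criteria(attrition=0.51, mean_difference=6.5)
    for criterion, passed in verdict.items():
        print(f"{criterion}: {'pass' if passed else 'fail'}")
```

Any attrition figure above one half gives the same verdict on all three counts.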
In a pair of now classic methodological papers [1,2], esteemed clinical trialist Helena Kraemer and her colleagues defended the value, indeed the necessity of small preliminary exploratory, feasibility studies before conducting larger clinical trials.
A pilot study can be used to evaluate the feasibility of recruitment, randomization, retention, assessment procedures, new methods, and implementation of the novel intervention. A pilot study is not a hypothesis testing study. Safety, efficacy and effectiveness are not evaluated in a pilot. Contrary to tradition, a pilot study does not provide a meaningful effect size estimate for planning subsequent studies due to the imprecision inherent in data from small samples. Feasibility results do not necessarily generalize beyond the inclusion and exclusion criteria of the pilot design.
However, they warned against accepting effect sizes from such trials because they are underpowered. It would be unfair to judge the efficacy of an intervention based on negative findings with a grossly underpowered trial. Yet, it would be just as unacceptable to judge an intervention favorably on the result of unexpected positive findings when the sample size was small. Such positive findings typically do not replicate and can easily be the result of chance, unmeasured baseline differences between intervention and control groups, and flexible rules of analysis and interpretation by investigators. And Kraemer and her colleagues did not even deal with small clinical trials in which most participants were no longer available for follow-up.
Even if this trial cannot tell us much about the efficacy of CBT for persons with unmedicated schizophrenia, we can still learn a lot, particularly if the investigators give us information that they had promised in their preregistration.
The preregistration promised information important for evaluating the trial that is not delivered in the Lancet paper. Importantly, we are given no information from the log that recorded all the treatments received by the control group and the intervention group.
This information is essential to evaluating whether group differences are really due to receiving or not receiving the intervention as intended, or influenced by uncontrolled treatment, including medication and selective dropout. This information could shed light on when and how patients accepted antipsychotic medication and whether the conditions were different between groups.
We cannot torture the data from this study to reveal whether or not CBT was efficacious. But we can learn much about what would need to be done differently in a larger trial if results were to be interpretable. Given what we learned from this trial, what would have to be done about participants deciding to take antipsychotic medication after randomization? Surely we could not refuse them that option.
More information would expose just how difficult it is to find suitable participants and community settings and enroll them in such a study. Certainly the investigators should have learned not to rely on such diverse settings as they did in the present study.
Hopefully in the next trial, investigators will have the courage to really test whether CBT has clinically significant advantages over supportive therapy or befriending. There would be risk to investigator egos and bragging rights, but that would be compensated by the prospect of producing results that persons with schizophrenia and their families, as well as clinicians and policymakers, could believe.
This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution 3.0 Unported License.