Once again I interrupt my planned sequence of blog posts to cover a controversial study. I had intended to continue a discussion of claims about long-term psychodynamic psychotherapy being superior to shorter forms of therapy. Among other things, I would have told how some psychoanalysts have complained that I am part of a conspiracy of cognitive behavior therapists intent on discrediting legitimate claims about the effectiveness of psychodynamic psychotherapy. I would enter my plea that I am not and I have never been a cognitive behavior therapist [evidence here, here, and here], despite being proud of having written an article with Aaron T Beck and my cordial relationship with this rare scholar who remained so magnanimous in the face of my criticism of his work.
All that will have to wait, but please read on. You will see me saying things that, within the context of the uproar concerning a recent study of CBT in Lancet, might be misconstrued as evidence that I am part of another plot, this time to discredit CBT.
Oh well, you will just have to judge the consistency of the standards that I apply to what is claimed to be “evidence,” whether the claims come from psychoanalysts, cognitive behavior therapists, or sources entirely different.
The story was prompted by a meta-analysis of 50 trials that found unimpressive outcomes of CBT with patients with schizophrenia.
CBT did have a small benefit in treating delusions and hallucinations – which is what the therapy was originally developed to target.
But the researchers said even this small effect disappeared when only studies using ‘blind testing’ were taken into account – this is where researchers do not know which group of patients are receiving the therapy.
Yet in February, the headline was
It was prompted by a report of a single randomized trial in Lancet in which loss to follow-up left only 17 participants per group from an initial total randomization of 74. That is fewer than the number of authors on the Lancet paper.
From the BBC story:
Prof Tony Morrison, director of the psychosis research unit at Greater Manchester West Mental Health Foundation Trust, said: “We found cognitive behavioural therapy did reduce symptoms and it also improved personal and social function and we demonstrated very comprehensively it is a safe and effective therapy.”
It worked in 46% of patients, approximately the same as for antipsychotics – although a head-to-head study directly comparing the two therapies has not been made.
Obviously, somebody at BBC does not listen to the Bayesian R&B group, Honey Cone, who could have told them
One monkey don’t stop no show [Youtube video here].
You don’t revise expectations derived from 50 trials on the basis of a single trial crippled by high loss to follow-up. If we agree that a particular finding is a priori unlikely, based on past research, then the outcome of a small-sample study is never convincing.
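For readers who want to see the Bayesian logic in numbers, here is a toy calculation. It is only a sketch: it assumes a normal prior and a normal likelihood, and every input (the prior mean and standard error standing in for the accumulated trials, and the estimate and standard error standing in for one small trial) is hypothetical, not taken from the meta-analysis or from the Lancet paper.

```python
def posterior_normal(prior_mean, prior_se, trial_mean, trial_se):
    """Combine a normal prior with one normal trial estimate.

    Each source is weighted by its precision (1 / standard error squared),
    so an imprecise small trial gets very little weight.
    """
    prior_precision = 1.0 / prior_se ** 2
    trial_precision = 1.0 / trial_se ** 2
    post_precision = prior_precision + trial_precision
    post_mean = (prior_precision * prior_mean
                 + trial_precision * trial_mean) / post_precision
    post_se = post_precision ** -0.5
    return post_mean, post_se


# Hypothetical inputs, chosen only for illustration.
prior_mean, prior_se = 0.10, 0.05   # small pooled effect, tightly estimated
trial_mean, trial_se = 0.60, 0.35   # large effect from one small, noisy trial

post_mean, post_se = posterior_normal(prior_mean, prior_se, trial_mean, trial_se)
print(f"prior mean {prior_mean:.2f} -> posterior mean {post_mean:.2f} "
      f"(posterior SE {post_se:.3f})")
```

With these made-up inputs, the posterior mean stays close to the prior. Different assumptions would give different numbers, but the direction of the argument is the same: one noisy trial moves a well-supported expectation only slightly.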
But then soon afterwards the headline disappeared, replaced by
What should consumers think, especially those who are facing difficult choices about whether they or their family members should accept antipsychotic medication with modest efficacy and obvious side effects?
The change in the BBC headline reflected post-publication peer review, an intense debate on social media concerning the results and significance of this trial. The exchanges on Twitter, Facebook, and blogs were polarized, often fueled by commentators, even recognized experts, who demonstrated they were unfamiliar with the article in Lancet beyond its press coverage and abstract.
I uploaded a preliminary assessment of the trial at my secondary blog, Quick Thoughts. You can read here about the US$500 wager I offered, modeled after the £50 bet that an author of the Lancet study, Paul Hutton, had publicly made and lost. I subsequently withdrew the bet because of no takers. And you can read here about how a troll hacked my blog post and spread hostile comments about me across other blog sites before being blocked. Just goes to show the passion aroused by this study.
Meanwhile, the crowdsourced review of the Lancet paper on social media proceeded inefficiently, with often inaccurate claims being made about the study by both supporters and critics. But the debate nonetheless gradually uncovered and amplified serious concerns about the article.
BBC reacted with the changed headline and Lancet reacted by inviting blogger Keith Laws to submit a letter to the editor. A number of us banded together to write some letters after some intensely probing back-channel exchanges.
I continued to reread the Lancet article and its press coverage. I listened to an audio tape interview with Tony Morrison, the lead author of the Lancet paper. I viewed a revealing YouTube video of his 2012 keynote presentation at the British Psychological Society Division of Clinical Psychology. I also benefited from extended discussions with Keith Laws, Peter McKenna, Sameer Jauhar, and especially Henry Strick van Linschoten. If you have time, scroll down and read the astute comments that Henry made at Mental Elf. You will see just how much he is the inspiration for some of the ideas expressed in this blog post.
But in fairness to Henry and everyone else whom I consulted in writing this post, I have sole responsibility for any inaccuracies in what follows:
My Take on the Trial and its Reporting
The abstract on which so many relied for the understanding of the study is misleading.
At the end of the intervention, there were no differences in terms of primary outcome between CBT and the control group.
But the trial really did not produce usable data concerning the efficacy of cognitive behavior therapy for patients with unmedicated schizophrenia because of:
- An unusually mixed group of patients participating in the study.
- An inappropriately constructed control group that neither represents conditions in routine care nor allows meaningful comparisons with an active intervention.
- Substantial loss to follow-up from an already small exploratory study.
- A decision of the investigator team to abort long-term follow-up but proceed with data analysis as if this decision had not been made.
- A substantial number of patients in both the intervention and control group receiving antipsychotic medication, including those in the control group who showed the greatest improvement.
An elaborated “preregistration” of the trial, which should have dictated crucial features of the design and explicitly planned analyses, did not do so, and actually occurred after data collection had begun.
At various times and places, the investigators have made comments about other trials which, if applied to their Lancet study, would require that they concur with the decision that the results are not usable.
The study’s investigator team owes a few things to the professional community and the lay public.
But don’t accept my word without considering whether I actually can substantiate my points. Please read on.
The abstract states:
Findings. 74 individuals were randomly assigned to receive either cognitive therapy plus treatment as usual (n=37), or treatment as usual alone (n=37). Mean PANSS total scores were consistently lower in the cognitive therapy group than in the treatment as usual group, with an estimated between-group effect size of −6·52 (95% CI −10·79 to −2·25; p=0·003). We recorded eight serious adverse events: two in patients in the cognitive therapy group (one attempted overdose and one patient presenting risk to others, both after therapy), and six in those in the treatment as usual group (two deaths, both of which were deemed unrelated to trial participation or mental health; three compulsory admissions to hospital for treatment under the mental health act; and one attempted overdose).
You can find a description of the primary outcome, the PANSS (Positive and Negative Syndrome Scale) here.
Here is the full report of the actual study and here is the extended registration of the trial, and a link to the brief formal registration. Please click on it to enlarge.
Pay particular attention to
- The shrinking number of patients actually followed up across the study, shown at the bottom.
- The fluctuation in the mean outcomes across the course of the study, particularly the marked deterioration that occurred in the control group between 12 and 18 months.
- And the large standard deviation in outcomes for the control group at the end of the study.
A misleading abstract
What is said in abstracts is crucially important because many people form opinions about the content of an article solely on the basis of the abstract.
When abstracts appear in electronic bibliographies like PubMed, only a small minority of those who view the abstract actually proceed to view the full article.
Most persons alerted to an article by media coverage cannot access anything more than the abstract of the article because of a pay wall.
There are often discrepancies between findings as they are reported in abstracts and what is actually contained in the results sections. CONSORT has developed standards for what is reported in abstracts, but the standards are unfortunately usually ignored.
It has been shown that the exaggerations in media coverage can often be traced to hype and otherwise misleading claims in abstracts.
The abstract of the Lancet article is misleading in a number of crucial ways.
While the abstract states that 74 participants were randomized, it does not indicate the small minority (17 in each group) who were available for the last assessment. That is the more important figure for interpreting findings.
The abstract describes results in terms of a “between group effect size.” Most readers would assume that such a term refers to a standardized mean difference between groups. If that were so, this 6.52 would be an extraordinarily large effect size. In the psychotherapy literature, it would be comparable only to the effect (6.9) claimed for long-term psychodynamic therapy in a meta-analysis, which has been sharply criticized as exaggerated and miscalculated.
Readers would normally expect “mean group differences” to be described as exactly that, not as “effect sizes.”
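The distinction between a raw mean difference and a standardized one can be sketched in a few lines. The only figure taken from the abstract is the reported difference of −6.52 PANSS points. The pooled standard deviations are hypothetical values picked for illustration, since the abstract does not supply one, and the function name is mine.

```python
def standardized_mean_difference(raw_difference, pooled_sd):
    """Convert a raw between-group difference into SD units (Cohen's d style)."""
    if pooled_sd <= 0:
        raise ValueError("pooled_sd must be positive")
    return raw_difference / pooled_sd


# From the abstract: estimated between-group difference in PANSS total points.
raw_difference = -6.52

# Hypothetical pooled SDs; the real value is not given in the abstract.
hypothetical_pooled_sds = [10.0, 15.0, 20.0]

smd_by_sd = {
    sd: standardized_mean_difference(raw_difference, sd)
    for sd in hypothetical_pooled_sds
}

for sd, smd in smd_by_sd.items():
    print(f"assumed pooled SD {sd:>4.1f} -> standardized difference {smd:.2f}")
```

Under any of these assumed spreads, the standardized value is a fraction of one standard deviation, nowhere near 6.52. That is why labeling a raw point difference an “effect size” invites misreading.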
Readers would also expect that in an abstract for a randomized trial, outcomes would be reported for the end of the intervention. Earlier outcomes might be unfair because the intervention had not yet been delivered in its full intensity and later would be unfair because of a possible decay in effect once participants are no longer being exposed to the intervention. That is, unless authors knew their results before deciding what to report.
As Keith Laws noted in his blog, the differences between the intervention and control group immediately after the end of the intervention were not significant.
CBT group showed a reduction from 70.24 to 57.95 = 12.29 points
TAU group showed a reduction from 73.27 to 63.26 = 10.01 points
So, although you would never know from the abstract, at the end of the nine-month intervention there were no significant differences between CBT and treatment as usual. In interpreting this nonsignificant difference, it might be useful to know that in other contexts, the lead author of the Lancet study has declared differences between treatments of less than 15 points on the PANSS to be clinically insignificant.
It is exceedingly odd for authors to construct an overall effect size for primary outcomes by averaging from three months into the intervention until nine months later. I cannot find reports of other studies where this has been done.
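The arithmetic behind those end-of-intervention figures is easy to check. The sketch below uses only the four means quoted above and the 15-point threshold attributed to the lead author. It is a simple, unadjusted difference in mean change, not a reconstruction of the statistical model used in the paper, and the variable names are mine.

```python
# Mean PANSS total scores quoted above (baseline and end of intervention).
cbt_baseline, cbt_end = 70.24, 57.95
tau_baseline, tau_end = 73.27, 63.26

# Threshold for clinical significance attributed to the lead author.
clinical_threshold = 15

# Change within each group, rounded to the precision of the reported means.
cbt_change = round(cbt_baseline - cbt_end, 2)
tau_change = round(tau_baseline - tau_end, 2)

# Unadjusted difference between the two groups' mean changes.
difference_in_change = round(cbt_change - tau_change, 2)

print(f"CBT change: {cbt_change}")
print(f"TAU change: {tau_change}")
print(f"Difference in change: {difference_in_change}")
print(f"Below the {clinical_threshold}-point threshold: "
      f"{difference_in_change < clinical_threshold}")
```

The two groups' mean improvements differ by about 2.28 points, a small fraction of the 15-point threshold.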
It is also misleading to make such calculations when there is so much loss to follow-up for the later assessments that outcomes have to be estimated for most patients. To take a point made by the lead author of the study in another context, that would mean that data for most patients are being made up. I will come back later to that.
Basically we are dealing with an abstract spun to portray the trial as being a strongly positive trial, when there were no significant differences between groups at the end of the intervention. The misleading comments to the BBC that the trial showed CBT to be “very comprehensively” effective are rooted in a confusing portrayal of results of the study in the abstract.
The unusual mixture of participants in the study.
Like a lot of things about the study, we’re not given sufficient details about the participants who enrolled in the study. But we are told enough to see that it is an unusually mixed sample. Important things that might be said about some participants would not apply to others.
Eligible participants aged 16–65 years were in contact with mental health services, and either met International Classification of Diseases–tenth revision (ICD-10) criteria for schizophrenia, schizoaffective disorder, or delusional disorder, or met entry criteria for an early intervention for psychosis service (operationally defined with the Positive and Negative Syndrome Scale [PANSS]) to allow for diagnostic uncertainty in early phases of psychosis.
To understand what this means, you need to know something about schizophrenia and also about the early intervention for psychosis services in the UK.
“Schizophrenia is a chronic, severe, and disabling mental disorder characterized by deficits in thought processes, perceptions, and emotional responsiveness” with a peak age of onset from 20 to 24 years of age. Recipients of early intervention for psychosis services in the UK often do not yet have a diagnosis and are not yet receiving medication. Such persons would be overrepresented in the younger age range of the sample, which goes down to 16 years of age. You have to consult the study protocol to learn that 59% of the sample came from these specialized services.
On the other hand, participants at the other end of the age spectrum for the study, which goes up to 65 years of age, would likely have had a diagnosis of schizophrenia for decades and have received medication for much of the time since diagnosis. If they are eligible for a study requiring that they not have been taking antipsychotic medications for the past six months, they are going to be unusual and quite different from the younger participants in the study who might never have been offered antipsychotic medication.
We are not told enough about how participants of different ages fared in this study or even whether there was differential drop out.
We do not know how many of these participants did not have a diagnosis of schizophrenia or schizophrenia spectrum disorder, how many had never had medication, or, among those who had been on medication, how many stopped because of horrific experiences.
We do know from the mean baseline PANSS scores that this is a sample with only mild to moderate psychotic symptoms, as has been noted in the accompanying editorial in Lancet.
With a 59:41 split between very different recruitment settings, this is particularly important information. Actually, measures of central tendency like means or medians can become misleading in the context of such a mixed group of participants sharply divided on some related characteristics. Just think about the typical participant in a physical anthropology study being characterized as female with a mean penis length of 4.4 cm. Of course, that would be ridiculous and misleading, but that is what happens when you try to characterize the typical person in such a mixed sample with single figures.
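A toy example shows how a single average can describe nobody in a sharply divided sample. Only the 59:41 split comes from the study. The two subgroup scores (55 and 85) are invented for illustration, and for simplicity every member of a subgroup is given the same score.

```python
from statistics import mean

# Hypothetical scores: 59 participants from one setting, 41 from the other.
# The 59:41 split is from the study; the values 55 and 85 are made up.
early_intervention_scores = [55] * 59
other_services_scores = [85] * 41

all_scores = early_intervention_scores + other_services_scores
overall_mean = mean(all_scores)

# How far is the closest actual participant from the "typical" value?
closest_gap = min(abs(score - overall_mean) for score in all_scores)

print(f"overall mean: {overall_mean:.1f}")
print(f"closest participant is {closest_gap:.1f} points from the mean")
```

In this made-up sample the overall mean falls between the two clusters, so the “average participant” corresponds to no actual participant in either setting.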
The bottom line is that the mean or modal participant in this study does not represent the typical person with a diagnosis of schizophrenia or schizophrenia spectrum disorder in the community. It is not just a matter of the sample being unrepresentative, but of it being heterogeneous in a way that begs for breaking down both outcomes and loss to follow-up by age, previous experience with antipsychotic medication, diagnosis, etc. That must be done for any meaningful generalizations to persons in the community or for comparisons to results obtained for participants in other studies, and most notably for comparisons to persons with schizophrenia receiving neuroleptics that the authors of the study seem so eager to talk about.
All of the missing information about which I am complaining is readily available to the authors and could be made available to readers, especially when controversial claims were going to be made about the outcome of the study.
Compared to what? The control group does not allow meaningful comparisons
The study compared participants randomized to CBT combined with treatment as usual (CBT + TAU) versus TAU alone.
The study was conducted at two sites, one in Manchester and one in Newcastle. Participants at both sites randomized to the comparison/control group of TAU were in one of two very different settings, which provided very different experiences.
Treatment as usual was variable across both sites, although both were chosen partly because these regions had some comprehensive early intervention services. In practice, participants within these services received regular care-coordination and psychosocial interventions, including the offer of family interventions, whereas individuals from other community-based services often received little more than irregular contact with care coordinators, and many of these participants were discharged by their clinical teams during the trial for non-attendance or continued reluctance to accept medicine.
Although the assignment to CBT versus TAU was random, the assignment to one or the other type of TAU was not. Participants being treated in the comprehensive early intervention services were quite different from those treated in other community-based services in terms of age, likelihood of a diagnosis of schizophrenia, previous exposure to antipsychotic medication, and other important features. If you knew something about these characteristics of particular patients, you probably could predict quite well which type of control condition they were receiving, starting with younger patients being more likely to get the enhanced TAU and older patients being more likely to be getting care in conventional settings, where they might be discharged and get no care if they refused medication.
We are told very little in the Lancet article about the experiences of control/comparison participants in comprehensive early intervention services versus other kinds of community-based services.
We do not know about differences in outcomes or even about the nature of services received.
We do not know how many of the patients assigned to conventional community-based services were either immediately or later discharged for refusal of medication.
We do not even know if there was differential drop out between the two types of treatment settings or how the minority of participants still available for follow-up at the end of the study may have differed according to what kind of TAU they were receiving.
Tony Morrison told an interviewer from Lancet that the early intervention services offered a variety of treatments including supportive counseling, family therapy, and even CBT. Although there was blinding of outcome assessments for this trial, it is highly unlikely that there was blinding within treatment settings as to whether participants were assigned to CBT + TAU or TAU alone. We do not know how being assigned to CBT affected receipt of other services.
Again, these complexities add to a very confusing picture and make group differences exceedingly difficult to interpret.
The between-group effect sizes for CBT were calculated for differences found between participants assigned to CBT + TAU versus those assigned to TAU alone. The goal is to be able to make statements about the advantage of adding CBT to TAU that can be generalized beyond the study. Yet, the outcomes recorded for TAU are some composite of outcomes for what could be considered an enhanced TAU combined with what could be considered inadequate TAU or even no TAU, because of the likelihood that participants refusing medication would be simply discharged.
Recall that discussion we had about the fictional composite participant in the study and how generalizations about this composite might not adequately characterize individual participants. Well, things just got even more complicated. We are now talking about a composite control comparison TAU that is associated with participant characteristics in nonrandom ways.
Overall outcomes for TAU might not adequately characterize outcomes for participants with specific important characteristics.
Overall outcomes for TAU in this particular study would not adequately generalize to TAU in the general community.
Recall that Manchester and Newcastle were chosen as sites for the study because of the availability of special, enhanced comprehensive early intervention services.
Investigators and consumers of their reports of clinical trials need to consider what is accomplished by selection of a TAU as the comparison/control condition.
In this particular study, TAU sometimes involved exposure to a rich set of services. If we compared patients in this context alone, we might be able to make some statements about whether adding CBT has an advantage. However, other participants were assigned to a TAU that actually represented no treatment or quite inadequate treatment. In this context, we might only be learning whether the nonspecific factors associated with CBT or any credible treatment, such as a continuing relationship, positive expectations, and accountability, made a difference in outcome. Apparent effects of CBT may simply be due to the correction of deficiencies in the TAU that could have been accomplished by a number of less intensive and presumably less expensive means.
At some point, the interpretation of the differences found in the study between intervention and control comparison groups involves a great deal of speculation and assumptions, many of which cannot be tested, and certainly not within the limits of the information provided in the Lancet article.
Why did the investigators not anticipate this unresolvable interpretive mess and avoid it by providing a more suitable control comparison condition?
We will start with this important question in my next blog post. The authors attempted to stack the deck in terms of finding a superiority of CBT by choosing a particular comparison control condition, but their efforts ultimately prove self-defeating.
But for now, I think I am progressing in building a case that
- The abstract of the Lancet article is missing vital basic information and otherwise misleading.
- Claims are premature and exaggerated about this study having produced decisive information about the value of providing CBT to unmedicated persons with schizophrenia.
- Readers of the Lancet article are denied information that is crucial in understanding what went on in this trial and its implications for messages to the community and for future research.
I welcome your comments in the interim.