On bitopertin and sharing data from clinical trials

While modern antipsychotics do an OK job of improving the positive symptoms of schizophrenia – such as auditory hallucinations or fixed delusional ideas – there is a lot left to be desired in terms of how these medications – or any psychotropics, for that matter – work against negative symptoms. The negative symptoms of schizophrenia, which include deficits in a variety of mental functions such as willpower and the ability to enjoy things, express emotions, or associate with others, have been an elusive target from a treatment perspective. In other words, available medications for schizophrenia might help a patient deal with “the voices,” but, because they do not also improve that patient’s negative symptoms, his ability to return to a meaningful and productive life is not substantially helped.

A promising new medication for negative symptoms — almost

Not surprisingly, any intervention that promises to improve negative symptoms makes the news. This was exactly what happened with bitopertin, a new Roche medication aimed at negative symptoms, which generated enough enthusiasm during its preliminary testing (Phase I and II clinical trials) to move to Phase III trials.

So, one wonders, why didn’t a brand new bitopertin study (ClinicalTrials.gov identifier NCT00616798), with a solid design and impressive action on negative symptoms, get any attention – from the media, psychiatrist gurus, patient advocates, anyone? The study, which just came out in the flagship psychiatric journal JAMA Psychiatry, generated no press or media coverage that I am aware of.

Actually, this rather surprising silent treatment is not an accident.

It turns out that this study was preempted by a press announcement Roche made this past January stating that bitopertin did not live up to its promise in Phase III testing. Meaning that… the just-published JAMA Psychiatry study — the very study which generated enough enthusiasm to justify the seemingly well-deserved promotion bitopertin received further down the pipeline — is old, outdated news.

To summarize: on the one hand, great positive results from a just-published study, which, as it turns out, are actually quite old. On the other hand, negative results in Phase III testing, announced at a news conference, but with no publication to date.

The result: bitopertin is at the center of what is essentially a skewed and likely misleading state of affairs – at least from the point of view of official scientific validation via publication in peer reviewed journals.

How did we get here?

It seems that this bitopertin situation is the predictable result of a series of unfortunate events including:

  1. delays in reporting the NCT00616798 study results: according to the official clinicaltrials.gov data, the study was completed in February 2010, more than 3 1/2 years prior to submission for publication (original submission date August 30, 2013)
  2. a lengthy peer review process: a bit shy of half a year (original submission August 30, 2013; final revision received and accepted January 24, 2014)
  3. no shared public data for any of the relevant studies that have been completed.

There is a fairly straightforward solution to prevent such complications: opening up the data from clinical trials to the public. Europe is already pursuing an initiative to make reporting of all clinical trials data mandatory. This is an important step forward. But real progress will not occur unless and until patient-level data are made available to the general public as close in time as possible to the date of data collection completion. Adrian Preda, M.D.


Claire Underwood From Netflix’s House of Cards: Narcissistic Personality Disorder?

Last month I used the character of Frank Underwood as a “case study” to illustrate the misunderstood psychiatric diagnosis of Antisocial Personality Disorder, and many of you asked: Well, what about his wife, Claire?

Good question!  You asked, and so today I will do my best to  answer.


SPOILER ALERT: For those of you who have not been on a streaming binge and watched all of Season 2 yet, consider yourself warned. 


Image: Netflix

Clinical lore would certainly support the idea that Claire herself must have a personality disorder of some kind – a sort of fatal attraction, where a couple is drawn together because there is something complementary and reciprocal in their personality patterns.

She does appear to have mastered the art of turning a blind eye to Frank’s more antisocial exploits.  She is a highly intelligent woman, and she must have some inkling that her husband may be involved in the deaths of Zoe Barnes and Peter Russo.  But if she has an inkling, she does not show it.

Claire, from what we know, does not engage in outright antisocial behavior.  Unlike Frank, she has not murdered anyone and we have not seen her engage in very reckless or impulsive outbursts.

However, she rarely shows emotion—her smiles seem fake, her laugh empty, her expressions bland.  She is more restrained and guarded than Frank, and she does not reveal her inner thoughts to the viewer the way Frank does, so it is much harder to know what could be going on in her mind.

Still, I think I have seen enough to venture forth with an assertion that she may have a Narcissistic Personality Disorder.


What is Narcissistic Personality Disorder?


A pervasive pattern of grandiosity, need for admiration, and lack of empathy beginning by early adulthood and present in a variety of contexts, as indicated by five (or more) of 9 criteria.


Below are the five criteria that I think apply to Claire:


1) Has a sense of entitlement (i.e. unreasonable expectations of especially favorable treatment or automatic compliance with his or her expectations)


Image: Netflix

She expected Galloway to take the blame for the photos that were leaked and eventually claim it was all a “publicity stunt,” thus ruining his own reputation and image.  She expressed no regret that her ex-lover was cornered into doing this on her behalf, and no remorse that it almost ruined his life and his relationship with his fiancée. She felt entitled to this act because she is “special” and expects that people will “fall on their swords” for her.


2) Is interpersonally exploitative (i.e. takes advantage of others to achieve his or her own ends)


Claire manipulates the first lady, Tricia Walker, into believing Christina (a White House aide) is interested in the president. She pretends to be a friend, wangles her way into becoming the first lady’s confidant, and persuades her to enter couples therapy with the president.  All of this is actually part of an elaborate plan to help Frank take the President down so that he can become president and she (Claire) can usurp Tricia as first lady.

Another example: Claire is pressured by the media into revealing that she once had an abortion, but she lies and states that the unborn child was a result of rape (presumably to save political face).  Again, she shows no remorse about her lie and instead profits from it, gaining much sympathy and public support.


3) Lacks empathy: is unwilling to recognize or identify with the feelings and needs of others


Image: Netflix

This was best seen in the way Claire deals with her former employee Gillian Cole’s threat of a lawsuit – she pulls a few strings and threatens the life of Gillian’s unborn baby.  In fact, in addition to the obvious lack of empathy, there was the simmering rage she had toward Gillian for daring to cross her.  Again, entitlement, narcissistic rage, and a lack of empathy would explain that evil threat she made, to Gillian’s face, about the baby.


4) Is often envious of others or believes that others are envious of him or her


I think part of the reason Claire was so angry at Gillian was because, deep down, she was envious of her pregnancy.  We know that, in parallel, Claire is consulting a doctor about becoming pregnant and is told that her chances are slim.  This is such a narcissistic injury to Claire that she directs her rage at Gillian.  I don’t think she was even consciously aware of how envious she was of Gillian for being pregnant.

Another example would be the look on her face when Galloway indicates he is madly in love with his fiancée and wishes to make a life with her.  For a second her face darkens – a flash of jealous rage – which then gives way to indifference and almost pleasure at his eventual public humiliation.


5) Shows arrogant, haughty behaviors or attitudes 


Image: Netflix

Correct me if I am wrong, but Claire just does not appear to be that warm or genuine, and she has an almost untouchable air about her. Furthermore, we only ever see her with people who work for her (i.e. have less power than she does) or with people more powerful than her (i.e. whose power she wants for herself). Other than Frank, where are her equals? Her oldest friends and colleagues? Her family? People who might not be influenced by her title or power?


One last comment – in Season 2 Claire certainly comes across as more ruthless and power-hungry than the Claire of Season 1. Whether she is now showing her true colors and dropping her facade, or just becoming more lost in Frank’s world and hence looking more like him, is unclear to me…


I suppose we will find out in Season 3!


Are meta analyses conducted by professional organizations more trustworthy?


Updated April 18, 2014 (See below)

A well-done meta-analysis is the new gold standard for evaluating psychotherapies. Meta-analyses can overcome the limitations of any single randomized controlled trial (RCT) by systematically integrating results across studies and identifying and contrasting outliers. Meta-analyses have the potential to resolve inevitable contradictions in findings among trials. But meta-analyses are constrained by the quality and quantity of available studies. The validity of meta-analyses also depends on the level of adherence to established standards in their conduct and reporting, as well as the willingness of those doing a meta-analysis to concede the limits of available evidence and refrain from going beyond it.
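To make “systematically integrating results across studies” concrete, here is a minimal sketch of fixed-effect inverse-variance pooling, the arithmetic at the core of most meta-analyses. The effect sizes and variances below are hypothetical, chosen purely for illustration; they are not from any study discussed in this post.

```python
import math

# Fixed-effect inverse-variance pooling (hypothetical numbers, for illustration).
# Each study contributes an effect size d and its variance v; more precise
# studies (smaller v) get proportionally more weight.
studies = [(0.30, 0.04), (0.55, 0.09), (0.10, 0.02)]  # (d, v) per study

weights = [1.0 / v for _, v in studies]
pooled_d = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))
```

Note how the pooled estimate is pulled toward the most precise study. This is also why a couple of large but confounded trials can dominate a meta-analysis: their weights swamp everything else.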

Yet meta-analytic malpractice is widespread. Authors with agendas strive to make their point more strongly than the evidence warrants. I have shown misuse of meta-analysis to claim that long-term psychoanalytic psychotherapy (LTPP) is more effective than briefer alternatives. And then there is the claim of a radical American antiabortionist, made via a meta-analysis in the British Journal of Psychiatry, that abortion accounts for much of the psychiatric disturbance among women of childbearing age.

Funnel Plot

Meta-analyses often come with intimidating statistical complexity and bewildering graphic displays of results. What are consumers to do when they have neither the time nor the ability to interpret findings for themselves? Is there particular reassurance in a meta-analysis having been commissioned by a professional organization? Does being associated with a professional organization certify a meta-analysis as valid?

That is the question I am going to take up in this blog post. The article I am going to be discussing is available here.

Hart, S. L., Hoyt, M. A., Diefenbach, M., Anderson, D. R., Kilbourn, K. M., Craft, L. L., … & Stanton, A. L. (2012). Meta-analysis of efficacy of interventions for elevated depressive symptoms in adults diagnosed with cancer. Journal of the National Cancer Institute, 104(13), 990-1004.

In the abstract, the authors declare

 Our findings suggest that psychological and pharmacologic approaches can be targeted productively toward cancer patients with elevated depressive symptoms. Research is needed to maximize effectiveness, accessibility, and integration into clinical care of interventions for depressed cancer patients.

Translation: The evidence for the efficacy of psychological interventions for cancer patients with elevated depressive symptoms is impressive enough to justify dissemination of these treatments and integration into routine cancer care. Let’s get on with the rollout.

The authors did a systematic search, identifying

  • 7700 potentially relevant studies, narrowed down to
  • 350 full-text articles that they reviewed, from which they selected
  • 14 trials from 15 published reports for further analysis;
  • 4 studies lacked the data for calculating effect sizes, even after attempts to contact the authors, so
  • 10 studies were at first included, but
  • 1 then had to be excluded as an extreme outlier in its claimed effect size, leaving
  • 9 studies to be entered into the meta-analysis, one of which yielded 2 effect sizes.

The final effect sizes entered into the meta-analysis were 6 for what the authors considered psychotherapy, drawn from 5 different studies, and 4 pharmacologic comparisons. I will concentrate on the 6 psychotherapy effect sizes that came from the five different studies. You can find links to their abstracts or the actual studies here.

Why were the authors left with so few studies? They had opened their article claiming over 500 unique trials of psychosocial interventions for cancer patients since 2005, of which 63% involved RCTs. But most evaluations of psychosocial interventions do not recruit patients with sufficient psychological distress or depressive symptoms to register an improvement. Where does that leave claims that psychological interventions are evidence-based and effective? The literature is exceedingly mixed as to whether psychosocial interventions benefit cancer patients, at least those coming to clinical trials. So the authors are left to make do with these few studies that recruited patients on the basis of heightened depressive symptoms.

Independently evaluating the evidence

Three of the 6 effect sizes classified as psychotherapeutic—including the 2 contributing most of the patients to the meta-analysis—should have been excluded.

The three studies (1, 2, 3) evaluated collaborative care for depression, which involves substantial reorganization of systems of care, not just providing psychotherapy. Patients assigned to the intervention groups of each of these studies received more medication and better monitoring. In the largest study, the low-income patients assigned to the control group had to pay for care out of pocket, whereas care was free for patients assigned to the intervention group. Not surprisingly, patients assigned to the intervention group got more and better care, including medication management. There was also a lot more support and encouragement being offered to the patients in the intervention conditions. In these three studies, improvement specifically due to psychotherapy, and not something else, cannot be separated out.

I have done a number of meta-analyses and systematic reviews of collaborative care for depression. I do not consider such wholesale systemic interventions as psychotherapy, nor am I aware of other articles in which collaborative care has been treated as such.

Eliminating the collaborative care studies leaves effect sizes from only 2 small studies (4, 5).

One (4) contributed  2 effect sizes based on comparisons of 29 patients receiving cognitive behavior therapy (CBT) and 23 receiving supportive therapy to the same 26-patient no-treatment control group. There were problems in the way this study was handled.

  • The authors of the meta-analysis considered the supportive therapy group as an intervention, but supportive therapy is almost always considered a comparison/control group in psychotherapy studies.
  • The supportive therapy had better outcomes than CBT. If the supportive therapy were re-classified as a control comparison group, the CBT would have had a negative effect size, not the positive one that was entered into the meta-analysis.
  • Including two effect sizes from the same study violates the standard assumption of meta-analysis that all of the effect sizes being entered are independent.

Basically, the authors of the meta-analysis are counting the wait-list control group twice in what was already a small number of effect sizes. Doing so strengthened the authors’ case that the evidence for psychotherapeutic intervention for depressive symptoms among cancer patients is strong.
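The point about reclassifying the supportive-therapy arm is easiest to see with a toy calculation. The numbers below are hypothetical, not taken from the study; they only illustrate that an effect size describes a comparison, so the very same CBT outcome yields a positive or a negative d depending on which arm serves as the comparator.

```python
# Hypothetical end-of-treatment depression scores (lower = better) and a
# shared pooled SD; none of these numbers come from the actual study.
cbt_mean, supportive_mean, waitlist_mean, pooled_sd = 14.0, 12.0, 16.0, 5.0

def cohens_d(treatment_mean, comparison_mean, sd):
    # Positive d means the treatment beat the comparison group
    # (because lower scores are better here).
    return (comparison_mean - treatment_mean) / sd

d_cbt_vs_waitlist = cohens_d(cbt_mean, waitlist_mean, pooled_sd)      # positive
d_cbt_vs_supportive = cohens_d(cbt_mean, supportive_mean, pooled_sd)  # negative
```

With these toy numbers, CBT looks effective against the waitlist (d = +0.4) but harmful against supportive therapy (d = -0.4), which is exactly the sign flip at stake in the reclassification.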

The final study (5) involved 45 patients randomly assigned to either problem-solving therapy or a waitlist control, but results for only 37 patients were available for analysis.  The study had a high risk of bias because the analyses were not intention-to-treat.  It was seriously underpowered, with less than a 50% probability of detecting a positive effect even if one were present.
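A back-of-the-envelope power calculation illustrates the point. This is a sketch under assumptions: a normal approximation to the t-test, a medium true effect of d = 0.5, and an 18/19 split across arms (my assumption; the study reports only the total of 37 analyzed patients).

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Assumed scenario: arms of 18 and 19 patients, true effect d = 0.5,
# two-sided alpha = .05 (critical z = 1.96).
n1, n2, d, z_alpha = 18, 19, 0.5, 1.96
se = math.sqrt(1.0 / n1 + 1.0 / n2)  # standard error of d (normal approx.)
power = norm_cdf(d / se - z_alpha)   # probability of a significant result
```

Under these assumptions the power comes out around a third, comfortably below the 50% threshold mentioned above: a coin flip would be more likely to "detect" the effect.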

Null findings are likely with such a small study. And if the authors had reported null findings, the study would likely not have been published, because being too small to detect an effect is a reasonable criticism. So we are more likely to find positive results from such small studies in the published literature, but they probably will not be replicated in larger studies.

Once we eliminate the three interventions misclassified as psychotherapy and deal with use of the waitlist control group of one study counted twice as a comparator, we are left with only two small studies. Many authorities suggest this is insufficient for a meta-analysis, and it would certainly not serve as the basis for the sweeping conclusions which these authors wish to draw.

How the authors interpreted their findings

The authors declare that they find psychotherapeutic interventions to be

reliably superior in reducing depressive symptoms relative to control conditions.

They offer reassurance that they have checked for publication bias. They should have noted that tests for publication bias are low-powered and not meaningful with such small numbers of studies.

They then suddenly offer this startling conclusion, without citation or further explanation:

The fail-safe N (the number of unpublished studies reporting statistically nonsignificant results needed to reduce the observed effect to statistical nonsignificance) of 106 confirms the relative stability of the observed effect size.

What?!  Suppose we accept the authors’ claim that they have five psychotherapeutic intervention effect sizes, not the two that I claim. How can they claim that there would have to be 106 null studies hiding in desk drawers to unseat their conclusion? Note that they already had to exclude five studies from consideration, four because they could not obtain basic data from them, and one because the effects claimed for problem-solving therapy were too strong to be credible. So this is a trimmed-down group of studies.

In another of my blog posts I indicated that clinical epidemiologists, as well as the esteemed Cochrane Collaboration, reject the validity of the fail-safe N. There I summarize some good arguments against it.  But just think about it: on the face of it, do you think the results are so strong that it would take over 100 negative studies to change our assessment? This is a nonsensical bluff intended to create false confidence in the authors’ conclusion.
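For readers unfamiliar with the statistic, Rosenthal’s fail-safe N is simple arithmetic, which is part of why it inflates so easily. The sketch below uses hypothetical per-study z values, not the actual study results, just to show the mechanics.

```python
# Rosenthal's fail-safe N: the number of unpublished null (z = 0) studies
# needed to drag the combined one-tailed p above .05.
# The z values below are hypothetical, for illustration only.
z_scores = [2.1, 2.4, 1.9, 2.8, 2.2]
k = len(z_scores)
z_crit = 1.645  # one-tailed alpha = .05

fail_safe_n = (sum(z_scores) ** 2) / (z_crit ** 2) - k  # about 43 here
```

Because the numerator squares the sum of the z scores, a handful of modestly significant results yields a large fail-safe N almost automatically. That is one reason clinical epidemiologists consider the statistic uninformative rather than reassuring.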

The authors perform a number of subgroup analyses that they claim show CBT to be superior to problem-solving therapy (PST). But the subgroup analyses are inappropriate. For CBT, they take the effect sizes from two small studies in which the intervention and control groups differed only in whether they received the therapy. For PST, they take the effect sizes from the very different, large collaborative care interventions that involved changing whole systems of care. Patients assigned to the intervention group got a lot more than just psychotherapy.

There is no basis for making such comparisons. The collaborative care studies, as I noted, involve not only providing PST to some of the patients, but also medication management and free treatment, when patients in the control condition – who were low income – had to pay for it and so received little. There are just too many confounds here. Recall from my previous blog posts that effect sizes do not characterize a treatment but rather a treatment in comparison to a control condition. The effect sizes that the authors cite are invalid for PST, and the conditions of the collaborative care studies versus the small CBT studies are just too different.



Is you or ain’t you a meta-analysis organized by the Society of Behavioral Medicine?

 The authors wish to acknowledge the Society of Behavioral Medicine and its Evidence-Based Behavioral Medicine Committee, which organized the authorship group…Society of Behavioral Medicine, however, has not commissioned, supervised, sanctioned, approved, overseen, reviewed, or exercised editorial control over the publication’s content. Accordingly, any views of the authors set forth herein are solely those of the authors and not of the Society of Behavioral Medicine.

Let’s examine this denial in the context of other information. The authors included a recent president of SBM and other members of the leadership of the organization, including one person who would soon be put forward as a presidential candidate.

The Spring/Summer 2012 SBM newsletter states

 The Collaboration between the EBBM SIG and the EBBM Committee (Chair: Paul B. Jacobsen, PhD) provided peer review throughout the planning process. At least two publications in high impact journals have already resulted from the work.

One of the articles to which the newsletter refers is the meta-analysis of interventions for depressive symptoms. I was a member of the EBBM Committee during the writing of this article. This and the earlier meta-analyses were inside jobs done by the SBM leadership. A number of the authors are advocates of screening for distress. Naysayers and skeptics on the EBBM Committee were excluded.

The committee did not openly solicit authors for this meta-analysis in its meetings, nor discuss its progress. When I asked David Mohr, one of the eventual authors, why the article was not being discussed in the meetings, he said that the discussions were being held by telephone.

Notably missing from the authors of this meta-analysis is Paul Jacobsen, who was head of the Evidence-Based Behavioral Medicine Committee during its writing. He has published meta-analyses and is arguably more of an expert on psychosocial intervention in cancer care than almost any of the authors. Why was he not among them? He is given credit only for offering “suggestions regarding the conceptualization and analysis” and for providing “peer review.”

It would have been exceedingly awkward if Jacobsen had been listed as an author. His CV notes that he was the recipient of $10 million from Pfizer to develop means of assuring the quality of care provided by oncologists. So, he would have had to make a declaration of conflict of interest on a meta-analysis from SBM evaluating psychotherapy and antidepressants for cancer patients. That would not have looked good.

Just before the article was submitted for publication, I received a request from one of the authors asking permission for me to be mentioned in the acknowledgments. I was taken aback because I had never seen the manuscript and I refused.

I know, as Yogi Berra would say, we’re heading for déjà vu all over again. In earlier blog posts (1, 2) I criticized a flawed meta-analysis done by this group concerning psychosocial interventions for pain. When I described that meta-analysis as “commissioned” by SBM, I immediately got a call from the president asking me for a correction. I responded by posting a link to an email by one of the authors describing that meta-analysis, as well as this one, as “organized” by SBM.

So, we are asked to believe the article does not represent the views of SBM, only the authors, but these authors were hand-picked and include some of the leadership of SBM. Did the authors take off their hats as members of the governance of SBM during the writing of the paper?

The authors are not a group of graduate students who downloaded some free meta-analysis software. There were strong political considerations in their selection, but as a group they have experience with meta-analyses. The Journal of the National Cancer Institute (JNCI) is not some mysterious fly-by-night journal that is not indexed in ISI Web of Science. To the contrary, it is a respected, high-impact journal (JIF = 14.3).

As with the meta-analysis of long-term psychoanalytic psychotherapy with its accompanying editorial in JAMA, followed by publication of a clone in the British Journal of Psychiatry, we have to ask: did the authors have privileged access to publishing in JNCI with minimal peer review? Could just anyone have gotten such a meta-analysis accepted there? After all, there are basic, serious misclassifications of the studies that provided most of the patients included in the meta-analysis of psychotherapeutic intervention. There are patently inappropriate comparisons of different therapies delivered in very different studies, some without the basis of random assignment. I speak for myself, not PLOS One, but if, as an Academic Editor, I had received such a flawed manuscript, I would have recommended immediately sending it back to the authors without it going out for review.

Imagine that this meta-analysis were written/organized/commissioned/supported by pharmaceutical companies

What we have is an exceedingly flawed meta-analysis that reaches a seemingly foregone conclusion promoting the dissemination and implementation of services by the members of the organization from which it came. The authors rely on an exceedingly small number of studies, bolstered by the recruitment of some that are highly inappropriate for addressing the question of whether psychotherapy improves depressive symptoms among cancer patients. Yet the authors’ conclusions are a sweeping endorsement of psychotherapy in this context, unqualified by any restrictions. It is a classic use of meta-analysis for marketing purposes – for branding the services being offered, not scientific evaluation. We will see more of these in future blog posts.

If the pharmaceutical industry had been involved, the risk of bias would have been obvious and skepticism would have been high.

But we are talking about a professional organization, not the pharmaceutical industry. We can see that the meta-analysis was flawed, but we should also consider whether that is because it was written with a conflict of interest.

There are now ample demonstrations that practice guidelines produced by professional organizations often serve their members’ interests at the expense of evidence. Formal standards have been established for evaluating the process by which these organizations produce guidelines. When applied to particular guidelines, the process and outcome often come up short.

So, we need to be just as skeptical about meta-analyses produced by professional organizations as we are about those produced by the pharmaceutical industry. No, Virginia, we cannot relax our guard, just because a meta-analysis has been done by a professional organization.

If this example does not convince you, please check out a critique of another one written/organized/commissioned/supported by the same group (1, 2).

UPDATE (April 18, 2014)

An alert reader scrutinized the meta-analysis after reading my blog post and found something quite interesting in its table of efficacy comparisons. What you see is that every comparison worked out extraordinarily well, too well.

The problem is, of course, that these comparisons are inappropriate, as discussed in the blog post. The comparisons hinge upon studies being misclassified as psychotherapy when they were actually complex collaborative care interventions, as well as upon comparisons of problem-solving therapy with cognitive behavior therapy when the patients receiving problem-solving therapy were not randomized to it. Rather, they were randomized to a condition in which the intervention patients got a combination of free medication, careful medication management, and the option of problem-solving therapy, whereas the control-group patients had to pay for treatment and received substantially less of any kind. This is clearly meta-analysis malpractice of the highest order.

See my discussion of an exchange of letters with the authors here. Go and comment yourself about this study at PubMed Commons here.


Keeping zombie ideas about personality and health a-walkin’: A teaching example

Reverse engineer my criticisms of this article and you will discover a strategy to turn your own null findings into a publishable paper.

Here’s a modest little study with null findings, at least before it got all gussied up for publication. It has no clear-cut clinical or public health implications. Yet it is valuable as a teaching example showing how such studies get published. That’s why I found it interesting enough to blog about at length.


van de Ven, M. O., Witteman, C. L., & Tiggelman, D. (2013). Effect of Type D personality on medication adherence in early adolescents with asthma. Journal of Psychosomatic Research, 75(6), 572-576. Abstract available here and fulltext here.

As I often do, I am going to get quite critical in this blog post, maybe even making some readers wince. But if you hang in there, you will see some strategies, widely used throughout personality, health, and positive psychology, for publishing negative results as if they were positive. Your critical skills will be sharpened, and you will also be able to reverse engineer my criticisms to get papers with null findings published.

Read on and you’ll see things that the reviewers at the Journal of Psychosomatic Research apparently did not see, nor the editors, though they should have.  I have emailed the editors inviting them to join this discussion, and I expect them to respond. I have had lots of dealings with them and actually find them to be quite reasonable fellows. But peer review is imperfect, and one of the good things about blogging is that it gives me the space to call out when it fails us.

The study examined whether some measures of negative emotion predicted adherence in early adolescents with asthma. A measure of negative affectivity (sample item: “I often make a fuss about unimportant things”) and what was termed social inhibition (sample item “I would rather keep other people at a distance”) were examined separately and when combined in a categorical measure of Type D personality (the D in Type D stands for distress).

Type D personality studies were once flourishing, even getting coverage in Time and Newsweek and discussion by Dr. Oz.  The claim was that a Type D personality predicted death among congestive heart failure patients so well that clinicians should begin screening for it. Type D was supposed to be a stable personality trait, so it was never clear what clinicians could do with the information from screening. But I will be discussing in a later blog post why the whole area of research can itself be declared dead because of fundamental, inescapable problems in the conception and measurement of Type D. When I do that, I will draw on an article co-authored with Niels de Voogd, “Are we witnessing the decline effect in the Type D personality literature?”

John Ioannidis provided an approving commentary on my paper with Niels, with the provocative title “Scientific inbreeding and same-team replication: Type D personality as an example.” Among the ideas attributable to Ioannidis are that most positive findings are false, and that most “discoveries” are subsequently proven to be false or at least exaggerated. He calls for greater value to be given to replication, rather than discovery.

Yet in his commentary on our paper, he uses the Type D personality literature as a case example of how the replication process can go awry: a false credibility for a hypothesis is created by false replications. He documented significant inbreeding among investigators of Type D personality: a quite small number of connected investigators are associated with studies with statistically improbable positive findings. And then he introduced some concepts that can be used to understand processes by which the small group could have undue influence on replication attempts by others:

… Obedient replication, where investigators feel that the prevailing school of thought is so dominant that finding consistent results is perceived as a sign of being a good scientist and there is no room for dissenting results and objections; or obliged replication, where the proponents of the original theory are so strong in shaping the literature and controlling the publication venues that they can largely select and mold the results, wording, and interpretation of studies eventually published.

Ioannidis’ commentary also predicted that regardless of any merits, our arguments would be studiously ignored and even suppressed by proponents of Type D personality. Vested interests use the review process to do that with articles that are inconvenient and embarrassing. Reviewing manuscripts has its advantages in terms of controlling the content of what is ultimately published.

Don’t get me wrong. Niels and I really did not expect everyone to immediately stop doing Type D research just because we published this article. After all, a lot of data have already been collected. In Europe, where most Type D personality data get collected, PhD students are waiting to publish their Type D articles in order to complete their dissertations.

We were very open to having Type D personality researchers point out why we were wrong, very wrong, and even stupidly wrong. But that is not what we are seeing. Instead, it is like our article never appeared, with little trace of it in terms of citations even in, ah, Journal of Psychosomatic Research, where our article and Ioannidis’ commentary appeared. According to ISI Web of Science, our article has been cited a whopping 6 times overall as of April 2014. And there have been lots more Type D studies published since our article first appeared.

Anyway, the authors of the study under discussion adopted what has become known as the “standardized method” (that means that they don’t have to justify it) for identifying “categorical” Type D personality. They took their two continuous measures of negative affectivity and social inhibition and split (dichotomized) them. They then crossed them, creating a four cell, 2 x 2 matrix.

[Chart 1: a 2 × 2 matrix crossing high/low negative affectivity with high/low social inhibition]

Next, they selected out the high/high quadrant for comparison to the three other groups combined as one.

[Chart 2: the high/high quadrant versus the other three quadrants combined]

So, the authors made the “standardized” assumption that only the difference between a high/high group and everyone else was interesting. That means that persons who are low/low will be treated just the same as persons who are high in negative affectivity and low in social inhibition. Those who were low in negative affectivity but high in social inhibition are simply treated the same as those who are low on both variables. The authors apparently did not even bother to check– no one usually does– whether some of the people who were high in negative affectivity and low in social inhibition actually had higher scores on negative affectivity than those assigned to the high/high group.
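To see concretely what this classification throws away, here is a minimal sketch in Python – invented scores for eight hypothetical subjects, not the study’s data – of the “standardized method” and the check the authors apparently skipped:

```python
import statistics

# Hypothetical scores for eight subjects (NOT the study's data).
na = [10, 30, 21, 14, 28, 9, 25, 17]   # negative affectivity
si = [12, 8, 21, 19, 26, 11, 7, 24]    # social inhibition

# Dichotomize each continuous measure at an arbitrary cut (here, the median).
na_cut = statistics.median(na)
si_cut = statistics.median(si)

# "Type D" = the high/high quadrant; everyone else is lumped into one group,
# whether they are low/low, high NA/low SI, or low NA/high SI.
type_d = [(a > na_cut) and (b > si_cut) for a, b in zip(na, si)]

# The check that no one usually does: a "non-Type D" subject can out-score
# a "Type D" subject on negative affectivity itself.
max_na_non_d = max(a for a, d in zip(na, type_d) if not d)
min_na_d = min(a for a, d in zip(na, type_d) if d)
print(max_na_non_d > min_na_d)  # True: the subject with NA = 30 lands outside Type D
```

In this toy example, the subject with the highest negative affectivity score of all ends up in the “not Type D” pile because of a middling social inhibition score.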

I have been doing my own studies and reviews of personality and abnormal behavior for decades. I am not aware of any other example where personality types are created in which the high/high group is compared to everybody else lumped together. As we will see in a later blog, there are lots of reasons not to do this, but for Type D personality, it is the “standardized” method.

Adherence was measured twice in this study. At one point we readers are told that negative emotion variables were also assessed twice, but the second assessment never comes up again.

The abstract concludes that

categorical Type D personality predicts medication adherence of adolescents with asthma over time, [but] dimensional analyses suggest this is due to negative affectivity only, and not to the combination of negative affectivity and social inhibition.

Let’s see how Type D personality was made to look like a predictor and what was done wrong to achieve this. Table 2 is just below.

[Table 2]

Some interesting things about Table 2 that reviewers apparently missed:

  • At time T1, adherence was not related to negative affectivity, social inhibition, or Type D personality. There is not much prediction going on here.
  • At time T2, adherence was related to the earlier measured negative affectivity, but not to social inhibition or Type D personality.

Okay, if the authors were searching for significant associations, we have one, and only one, here. But why should we ignore the failure of personality variables to predict adherence measured at the same time and concentrate on the prediction of later adherence? Basically, the authors have examined 2 × 3 = 6 associations, and seem to be getting ready to make a fuss about the one that proved significant, even though it was not predicted to stand alone.

Most likely this statistical significance is due to chance – it certainly was not replicated in the same-time assessments of negative affectivity and adherence at T1. But this association seems to be the only basis for claiming that one of these negative emotion variables is actually a predictor.

  • Adherence at time T2 is strongly predicted by adherence at time T1.

The authors apparently don’t consider this particularly interesting, but it is the strongest association in the data set. They want instead to predict change in adherence from T1 to T2 from trait negative emotion. But why should change in the relatively stable adherence be predicted by negative emotion when negative emotion does not predict adherence measured at the same time?

We need to keep in mind that these adolescents have been diagnosed with asthma for a while. They are being assessed for adherence at two arbitrary time points. There is no indication that something happened in between those points that might strongly affect their adherence. So, we are trying to predict fluctuations in a relatively stable adherence from a trait, not any upward or downward spiral.

Next, some things we are not told that might further change our opinions about what the authors say is going on in their study.

Like pulling a rabbit out of a hat, the authors suddenly tell us that they measured self-reported depressive symptoms. The introduction explains that this article is about negative affectivity, social inhibition, or Type D personality, but only mentions depression in passing. So, depression was never given the explanatory status that the authors give to these other three variables. Why not?

Readers should have been shown the correlation of depression with the other three negative emotion variables. We could expect from a large literature that the correlation is quite high, probably as high as their respective reliabilities allow—as good, or as bad as it gets.
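The ceiling alluded to here is the classic attenuation bound: two scales cannot correlate, in observed scores, beyond what their reliabilities permit. A quick sketch, with illustrative reliabilities that I am assuming for the sake of the example (the study does not report these):

```python
import math

def max_observable_r(rel_a: float, rel_b: float) -> float:
    """Upper bound on the observed correlation between two scales,
    given their reliabilities: r_max = sqrt(rel_a * rel_b)."""
    return math.sqrt(rel_a * rel_b)

# Two negative-emotion scales with, say, alpha = .85 and .80 can correlate
# at most ~.82, even if they measure exactly the same underlying construct.
print(round(max_observable_r(0.85, 0.80), 2))  # 0.82
```

So an observed correlation anywhere near that bound is consistent with the two “different” measures tapping one and the same thing.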

There is no particular reason why this study could not have focused on depressive symptoms as predictors of later adherence, but maybe that story would not have been so interesting, in terms of results.

Actually, most of the explanations offered in the introduction as to why measures of negative emotion should be related to adherence would seem to apply to depression. Just go back to the explanations and substitute depression for whatever variable is being discussed. See, doesn’t depression work just as well?

One of the problems in using measures of negative emotion to predict other things is that these measures are related so much to each other that we can’t count on them to measure only the variable we are trying to emphasize and not something else.

Proponents of Type D personality like these authors want to assert that their favored variable does something that depression does not do in terms of predictions. But in actual data sets, it may prove tough to draw such distinctions because depressive symptoms are so highly correlated with components of Type D.

Some previous investigators of negative emotion have thrown up their hands in despair, complaining about the “crud factor” or “big mess” of intercorrelated measures of negative emotion ruining their ability to test their seemingly elegant ideas about supposedly distinctly different negative emotion variables. When one of the first Type D papers was published,   an insightful commentary complained that the concept was entering an already crowded field of negative emotion variables and asked whether we really needed another one.

In this study, the authors measured depressive symptoms with the self-report Hospital Anxiety and Depression Scale (HADS). The name of the scale suggests that it separately measures anxiety and depression. Who can argue with the authority of a scale’s name? But using a variety of simple and complicated statistical techniques, like different variants of factor analysis, investigators have not been able to show consistently that the separate subscales for anxiety and depression actually measure something different from each other – or that the two scales should not be combined into a general measure of negative emotion/distress.

So talk about measuring “depressive symptoms” with the HADS is wrong, or at least inaccurate. But there are a lot of HADS data sets out there, and so it would be inconvenient to acknowledge what we said in the title of another Journal of Psychosomatic Research article,

The Hospital Anxiety and Depression Scale (HADS) is dead, but like Elvis, there will still be citings.

Back to this article, if readers had gotten to see the basic correlations of depression with the other variables in Table 2, we might have seen how high the correlation of depression was with negative affectivity. This would have sent us off in a very different direction than the authors took.

To put my concerns in simple form,  data that are available to the authors but hidden from the readers’ view probably do not allow making the clean kind of distinctions that the authors would need to make if they are going to pursue their intended storyline.

But, uh, measures of depressive symptoms show up all the time in studies of Type D personality. Think of such studies as if they were rigged American wrestling matches. Depressive symptoms are the heel (or rudo in lucha libre) – a foil that always shows up looking like a mean and threatening contender, but almost always loses to the face, Type D personality, which is intended to win. Read on and find out how supposedly head-to-head comparisons are rigged so this dependably happens.

The authors eventually tell us that they assessed (1) asthma duration, (2) control, and (3) severity. But we were not allowed to examine whether any of these variables were related to the other variables in Table 2. So, we cannot see whether it is appropriate to consider them as “control variables” – or, more accurately, confounds.

There is good reason to doubt that these asthma variables are suitable “control variables” or candidates for a confounding variable in predicting adherence.

First, for asthma control to serve as a “control variable” we must assume that it is not an effect of adherence. If it is, it makes no sense to try to eliminate asthma control’s influence on adherence with statistics. It sure seems logical that if these teenagers adhere well to what they are supposed to do to deal with their asthma, asthma control will be better.

Simply put, if we can reasonably suspect that asthma control is a daughter of adherence, we cannot keep treating it as if it is the mother that needs to be controlled in order to figure out what is going on. So there is first a theoretical or simple logical objection to treating asthma control as a “control” variable.

Second, authors are not free to simply designate whatever variables they would like as control variables and throw them into multiple regression equations to control a confound. This is done all the time in the published literature, but it is WRONG!

Rather, authors are supposed to check first and determine whether two conditions are met. The candidate control variable should be significantly related to the predictor variables – in the case of this study, asthma control should be shown to be associated with one or all of the negative emotion variables. And the authors would also have to show that it was related to the outcome, subsequent adherence. If both conditions are not met, the variable should not be included as a control variable.

Reviewers should have insisted on seeing these associations among asthma duration, control, severity, and adherence. While the reviewers were at it, they should have required that the correlations be available to other readers, if the article is to be published.

We need to move on. I am already taxing readers’ patience with what is becoming a longread. But if I have really got you hooked into thinking about the appropriateness of controlling for particular confounds, you can digress to a wonderful slide show telling more.

So far, we have examined a table of basic correlations, not finding some things that we really need in order to decide what is going on here, and the authors seem to be getting into trouble. But multivariate analyses will be brought in to save the effort.

The magic of misapplied multivariate regression.

The authors deftly save their storyline and get a publishable paper with “significant” findings in two brief paragraphs:

The decrease in adherence between T1 and T2 was predicted by categorical Type D personality (Table 3), and this remained significant after controlling for demographic and clinical information and for depressive symptoms. Adolescents with a Type D personality showed a larger decrease in adherence rates from T1 to T2 than adolescents without a Type D personality.


The results of testing the dimensions NA and SI separately as well as their interaction showed that there was a main effect of NA on changes in adherence over time (Table 4), and this remained significant after controlling for demographic and clinical information and for depressive symptoms. Higher scores on NA at T1 predicted a stronger decrease in adherence over time. Neither SI nor the interaction between NA and SI predicted changes in adherence.

Wow! But before we congratulate the authors and join in the celebration, we should note a few things. From now on in the article, they are going to be discussing their multivariate regressions, not the basically null findings obtained with the simple bivariate correlations. But these regression equations do not undo the basic findings from the bivariate correlations: Type D personality did not predict adherence; it only appears to do so in the context of some arbitrary and ill-chosen covariates. Yet now they can claim that Type D won the match fair and square, without cheating.

But don’t get down on these authors. They probably even believe their results. They were merely following the strong precedent of what almost everybody else seems to do in the published literature. They did not get caught by the reviewers or editors of Journal of Psychosomatic Research.

Whatever happened to depressive symptoms as a contender for predicting adherence? They were not let into the ring until after Type D personality and its components had secured the match. These other variables got to do all the predicting they could do, and only then were depressive symptoms allowed into the ring. That is what happens when you have highly correlated variables and manipulate the match by picking which one goes first.

And there is a second trick guaranteeing that Type D will win over depressive symptoms. Recall that to be labeled Type D personality, research subjects had to be high on negative affectivity and high on social inhibition. Scoring high on two (imperfectly reliable) measures of negative emotion usually bests scoring high on only one (imperfectly reliable) measure. If the authors had used two measures of depressive symptoms, they could have had a more even match.
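The entry-order trick is easy to demonstrate. In the sketch below (fabricated toy data, ordinary least squares via numpy, not the study’s analysis), depression predicts adherence very well on its own; but because it is nearly collinear with the Type D score, whichever variable is entered first claims the shared variance and the latecomer appears to add nothing:

```python
import numpy as np

def r2(y, predictors):
    """R-squared from an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

# Toy data: two highly correlated negative-emotion measures, one outcome.
type_d_score = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
depression   = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1])
adherence    = np.array([1.2, 2.1, 2.8, 4.2, 4.9, 6.1, 7.2, 7.8])

r2_dep_alone = r2(adherence, [depression])        # a strong contender on its own...
r2_typed     = r2(adherence, [type_d_score])
increment    = r2(adherence, [type_d_score, depression]) - r2_typed
print(r2_dep_alone > 0.9, increment < 0.05)       # ...until it enters the ring second
```

With correlated predictors, the incremental variance “explained” by the second entrant is almost guaranteed to look negligible, whichever variable you choose to handicap.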

The big question: so what?

Type D personality is not so much a theory as a tried-and-true method for getting flawed analyses published. Look at what the authors of this paper said about it in their introduction and discussion. They really did not present a theory, but rather cited precedent and made some unsubstantiated speculations about why past results may have been obtained.

Any theory about Type D personality and adherence really does not make predictions with substantial clinical and public health implications. Think about it: if this study had worked out as the authors intended, what difference would it have made? Type D personality is supposedly a stable trait, and so the authors could not have proposed psychological interventions to change it. That has been done and does not work in other contexts.

What, then, could authors have proposed, other than that more research is needed? Should the mothers of these teenagers be warned that their adolescents had Type D personality and so might have trouble with their adherence? Why not just focus on the adherence problems, if they are actually there, and not get caught up in blaming the teens’ personality?

But Type D has been thung.

Because the authors have been saying in lots of articles that they have been studying Type D, it is tough to get heard saying “No, pal, you have been studying statistical mischief. Type D does not exist except as statistical mischief.” Type D has been thung, and who can undo that?

Thing (v). to thing, thinging.   1. To create an object by defining a boundary around some portion of reality separating it from everything else and then labeling that portion of reality with a name.

One of the greatest human skills is the ability to thing. We are thinging beings. We thing all the time.


Yes, yes, you might think, but we are not really “thinging.” After all trees, branches and leaves already existed before we named them. We are not creating things we are just labeling things that already exist. Ahhh…but that is the question. Did the things that we named exist before they were named? Or more precisely, in what sense did they exist before they were named, and how did their existence change after they were named?

…And confused part-whole relationships become science and publishable.

Once we have convincingly thung Type D personality, we can fool ourselves and convince others about there being a sharp distinction with the similarly thung “depressive symptoms.”

Boundaries between concepts are real because we make them so, just like the boundary between Canada and the United States, even if particular items are arbitrarily assigned to one or the other questionnaire. Without our thinging, we would not so easily forget that the various items come from the same “crud factor” or “big mess” and could have been lumped or split in other ways.

Part-whole relationships become entities interacting with entities in the most sciencey and publishable ways. See for instance

Romppel, et al. (2012). Type D personality and persistence of depressive symptoms in a German cohort of cardiac patients. Journal of Affective Disorders, 136(3), 1183-1187.

which compares the effectiveness of Type D as a screening tool with established measures of depressive symptoms (measured with the, ugh, HADS) for predicting subsequent HADS depression.

Lo and behold, Type D personality works and we have a screening measure on our hands! Aside from the other advantages that I noted for Type D as a predictor, the negative affectivity items going into the Type D categorization are phrased as if they refer to enduring characteristics, whereas items on the HADS are phrased to refer to the last week.

Let us get out of the mesmerizing realm of psychological assessment. Suppose we ask a question about whether someone ate meatballs last week or whether they generally eat meatballs. Which question would you guess better predicts meatball consumption over the next year?

And then there is

Michal, et al. (2011). Type D personality is independently associated with major psychosocial stressors and increased health care utilization in the general population. Journal of Affective Disorders, 134(1), 396-403.

Which finds in a sample of 2495 subjects that

Individuals with Type D had an increased risk for clinically significant depression, panic disorder, somatization and alcohol abuse. After adjustment for these mental disorders Type D was still robustly associated with all major psychosocial stressors. The strongest associations emerged for feelings of social isolation and for traumatic events. After comprehensive adjustment Type D still remained associated with increased help seeking behavior and utilization of health care, especially of mental health care.

The main limitation is the reliance on self-report measures and the lack of information about the medical history and clinical diagnosis of the participants.

Yup, the study relied on self-report questionnaires in multivariate analyses, not interview-based diagnoses, and the measure of “depression” or “depressive symptoms” asked about the last 2 weeks.

Keeping zombie ideas awalkin’

How did this study of negative emotion and adherence get published with basically null findings? With chutzpah, and by the authors following the formulaic Type D personality strategy for getting published. This study did not really obtain significant findings, but the precedent of many studies of Type D personality was available to support claims that the authors achieved a conceptual replication, even if not an empirical one. And these claims were very likely evaluated by members of the Type D community making similar claims. In his commentary, Ioannidis pointed to how null Type D findings are gussied up with “approached significance” or, better, “was independently related to blah, blah, when x, y, and z are controlled.”

Strong precedents are often confused with validity, and the availability of past claims relaxes the standards for making subsequent claims.

The authors were only doing what authors try to do: their damnedest to get their article published. Maybe the reviewers were from the Type D community and, able to cite the authority of hundreds of studies, were only doing what the community tries to do – keep the cheering going for the power of Type D personality and add another study to the hundreds. But where were the editors of Journal of Psychosomatic Research?

Just because the journal published our paper, for which we remain grateful, I do not assume that they will require authors who submit new papers to agree with us. But you would think, if the editors are committed to the advancement of science, they would request that authors of manuscripts at least relate their findings to the existing conversation, particularly the one in the Journal of Psychosomatic Research. Authors should dispute our paper before going about their business. If it does not happen in this journal, how can we expect it to happen elsewhere?



Bambi meets Godzilla: Independent evaluation of the superiority of long-term psychodynamic therapy

“As I was saying before I was so rudely interrupted…”— William Connor

This is the second post in a two-part series about claims made in meta-analyses in JAMA and more recently elsewhere that long-term psychodynamic therapy is superior to shorter psychotherapies. This post was intended to be uploaded a while ago. But it got shelved until now because I felt the need to respond to hyped and distorted media coverage of a study in Lancet of CBT for persons with schizophrenia, with two posts examining what turned out to be an essentially null study.

That digression probably helped to change the tide of opinion about that Lancet study. When I first posted about it, media coverage was dominated by wild claims about CBT having been shown to be as effective as antipsychotic medication. Coverage in Science was headlined “Schizophrenia: Time to Flush the Meds?” Alternative perspectives were largely limited to a restrained, soft-toned note of skepticism at Mental Elf and louder complaints at Keith Laws’ Dystopia.

Here and elsewhere, I challenged the media coverage. I showed that this study actually had null findings of no difference between CBT and treatment as usual at the end of the intervention. Despite all the previous enthusiasm shown for the study back then, no one is now responding to my request on Twitter to come forward if they still believe the results that were claimed for the study. A single holdout has been persisting with comments at my other blog site, but he increasingly seems like Hiroo Onoda, still wandering in the jungle after the battle is lost.

Stay tuned for what could be Keith Laws’ and my article at the new Lancet Psychiatry and likely a slew of letters at Lancet.

There is certainly more about the JAMA meta-analysis worth writing about. It was accompanied by a gushy editorial and praised by people like Peter Kramer, who argued we can’t argue with the authority of JAMA. Yet when I took a look at the JAMA paper, it proved to be

a bizarre meta-analysis compar[ing] 1053 patients assigned to LTPP to 257 patients assigned to a control condition, only 36 of whom were receiving an evidence-based therapy for their condition.

My Mind the Brain post drew heavily on a critique I co-authored with Aaron T Beck, Brett Thombs, and others.

In this post I will continue describing what happened next:

  • Leichsenring and Rabung responded to critics, dodging basic criticisms and complaining that those who reject their claims are bringing in biases of their own.

Yup, I was accused of being part of a plot of advocates of cognitive therapy trying to beat down legitimate claims that long-term psychoanalysis and psychodynamic therapy are superior. Those who thought I was part of a plot against cognitive therapy during my analyses of the Lancet study, please note.

  • Leichsenring and Rabung renewed their claims in another meta-analysis in British Journal of Psychiatry for which 10 of the 11 studies were already included in the JAMA meta-analysis.

When you read my account of this recycling, you will probably wonder why they were allowed to do it. I will give some reasons that do not reflect well on that journal. But then Harvard Review of Psychiatry offered continuing education credits to those who wanted to learn how to interpret a meta-analysis from yet another recycling.

  • The long term psychodynamic/psychoanalytic community responded approvingly, echoing Leichsenring and Rabung’s assessment of skeptics.
  • The important question of whether long-term psychoanalytic psychotherapy is better than shorter term therapies got an independent evaluation by another group, which included the world-class meta-analyst and systematic reviewer, John Ioannidis.

When Ioannidis offers a critical analysis of conventional wisdom, it is generally worth paying attention.

Responses from Leichsenring and Rabung and the LTPP community to criticism

I’ve been a skeptical critic long enough not to expect that authors will agree with criticism or that they will substantially modify their conclusions, no matter how incisive the criticisms are. But when there have been such obvious miscalculations as Leichsenring and Rabung made, miscalculations that inflated results in their favor, I would expect at least some admission of error and adjustment of conclusions, even if not a retraction.

None was forthcoming in their response to the criticisms of the JAMA paper or an extended follow-up.

They admitted no computational error, not even for a claim that in one analysis, the effect size for LTPP was 6.9. They did concede that their effect size estimates were different and even larger than “commonly assessed.” But there was apparently nothing wrong with that, even if readers might be expecting something comparable to what is typically done in meta analyses.

Overall, one would never get a sense from Leichsenring and Rabung’s response that they had published one of the wildest examples of meta-analysis malpractice to be found in any recent high-impact journal. The results were not in any way comparable to what a conventional, well-done meta-analysis would produce. [Click here if you want a more detailed analysis.]

Leichsenring and Rabung’s response to our extended critique was also strange. For instance, we had offered the criticism that the small group of studies entered into the meta-analysis was so heterogeneous that any effect size could not be generalized back to any one of the studies or treatments going into the meta-analysis. To this they responded:

Heterogeneity of control conditions and diagnoses (p. 210) are part of the discussion about effectiveness vs. efficacy, which is nearly as old as psychotherapy research itself. Although theoretically important for internal validity, the attempt to create truly homogeneous yet clinically relevant study populations leads psychotherapy ad absurdum.

Really? I guess the Cochrane Collaboration has been getting it all wrong in being so concerned about heterogeneity.
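For readers who want the mechanics: heterogeneity is not a philosophical quibble but something you compute before pooling. A hedged sketch, with toy effect sizes and variances that are entirely made up (not the ones in their meta-analysis), of Cochran’s Q and the I² statistic:

```python
# Toy standardized mean differences and their variances (NOT from the paper).
effects   = [0.2, 1.1, 0.4, 6.9, 0.3]   # note one wildly outlying "effect size"
variances = [0.04, 0.05, 0.03, 0.10, 0.04]

weights = [1 / v for v in variances]             # inverse-variance weights
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))  # Cochran's Q
df = len(effects) - 1
i2 = max(0.0, (q - df) / q) * 100    # % of variability beyond what chance allows

print(f"I^2 = {i2:.0f}%")  # far above the ~75% conventionally called "considerable"
```

When I² sits near the ceiling like this, the pooled estimate describes none of the individual studies, which was exactly our point.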

But basically Leichsenring and Rabung’s message was “why are you picking on us and our meta-analysis and not someone else?” They implied we had undisclosed conflicts of interest in criticizing them:

It is quite ironic that the paper of Bhar et al. is published in close proximity to an editorial dealing with the unmasking of special interest groups [10], which are obviously not limited to somatic medicine and the pharmaceutical industry.

Really, Falk and Sven? Is it a conflict of interest for anyone but a psychoanalyst to have an opinion about a slick sell job of a meta analysis?

I joined with other colleagues because I was deeply offended by your flouting of standards, your abuse of statistics, your stubborn refusal to acknowledge that you had made mistakes. All in the prestigious JAMA, and with a fawning editorial that becomes a further source of offense. I wrote and continue to write about your work because I want to cultivate a gag response in others.

The authors went on to accuse us of being in some sort of a plot, of our joining

ranks with an interesting and surprising movement of others [9], who publish comments with relatively low empirical novelty but quite harsh language towards the Leichsenring and Rabung article in other journals, let alone internet blogs and pamphlets.

As of March 2014, the JAMA article has racked up over 180 citations according to ISI Web of Science, 454 according to Google Scholar. This unusually large discrepancy reflects in part proponents of psychoanalytic therapy making greater use of chapters in books, rather than peer-reviewed articles. Across the LTPP literature, Leichsenring and Rabung’s blatant miscalculations of effect sizes are being uncritically accepted and praised. Recurrent themes get amplified in repetition. Although skepticism is expressed about LTPP being evaluated in RCTs and meta-analyses, the contradictory argument is made, usually in the same article, that the science is solid, effects are equal to or larger than those for evidence-based therapies, and critics and doubters get slammed as having ulterior motives and undisclosed conflicts of interest.

A Psychology Today counterpoint from a University of Colorado Medical School psychologist to my complaint about the miscalculated effect sizes took the hype to a whole new level:

Indeed, the within-group effect sizes for long-term psychodynamic therapy were quite large (as a rough example: if psychiatric symptoms were SAT scores and long-term psychodynamic therapy were an SAT training program, the average student would expect to increase their score by somewhere around 90-180 points on each section).

You would think that this analogy was a cause for skepticism, but no.

Once is not enough

Eighteen months after publication of the JAMA article, Leichsenring and Rabung published another meta-analysis in British Journal of Psychiatry. Ten of the 11 studies entered into it were already in the JAMA article. The article’s title identified it as an “update.”

The sole study “updating” the JAMA meta-analysis was a decade old and had been excluded from the JAMA analyses: Bateman and Fonagy compared an 18-month “mentalization-based” therapy to structured clinical management, “a counseling model closest to a supportive approach with case management, advocacy support, and problem-oriented psychotherapeutic interventions (p.357).” The comparison treatment was neither manualized nor evidence-based. The study added little except further statistical and clinical heterogeneity and confusion.

Leichsenring and Rabung concluded from the redone meta-analysis:

Results suggest that LTPP is superior to less intensive forms of psychotherapy in complex mental disorders.

Leichsenring and Rabung have continued to turn out redundant meta-analyses. Leichsenring’s article in Harvard Review of Psychiatry offers continuing education credit.

Learning Objectives: After participating in this educational activity, the reader should be better able to evaluate the empirical evidence for pre/post changes in psychoanalysis patients with complex mental disorders, and assess the limitations of the meta-analysis.

Updates of meta-analyses are justified by the passage of time and the accumulation of relevant new studies. There was neither in the case of the BJP article nor these other meta-analyses. The BJP editor should have recognized the manuscript as an attempt at duplicate publication, aimed at extending a publicity effort into new venues, not a publication justified by new science.

Critics predictably responded to the reanalysis. Only one of the Rapid Responses left on the journal website made it into the print edition, and it was met a month later by an invited editorial in BJP from Jeremy Holmes, author of Introduction to Psychoanalysis, who was not subject to the same strict word limits:

Leichsenring & Rabung found that long-term psychodynamic psychotherapies (LTPPs) produced large within-group effect sizes (average 0.8–1.2) comparable with those achieved by other psychotherapy modalities; that gains tended to accumulate even after therapy has finished, in contrast to non-psychotherapeutic treatments; and that a dose–effect pattern was present, with longer therapies producing greater and more sustained improvement.

The issue of insurance reimbursement was pushed with the reassurance:

Although expensive, psychodynamic psychiatry is able in some circumstances to ‘pay for itself’, thanks to offset costs of other expenses (medication, hospital stays, welfare payments, etc).

This extravagant claim was based on the single study added by Leichsenring and Rabung in their BJP reanalysis, in which LTPP was compared to 18 months of “mentalization-based therapy” with case management, advocacy support, and problem-oriented psychotherapeutic interventions.

Elsewhere, Thombs and colleagues had noted a trial excluded by Leichsenring and Rabung because it was too short. That trial found outcomes comparable between a mean of 232 LTPP sessions, at an estimated cost of $29,000 to $40,600 according to the authors, and 9.8 sessions of a nurse-delivered solution-focused therapy at a cost of $735 to $980.

How did this recycling get into British Journal of Psychiatry? It is a matter of conjecture, but the editor at the time was Peter Tyrer, a practicing psychoanalyst. He is also a devout Catholic who gave space to American antiabortionists claiming that a significant portion of the mental health problems of women of childbearing age was due to abortion. He resisted a storm of criticism of the antiabortionists’ meta-analysis, which disproportionately featured their own flawed studies. He even recruited Ben Goldacre to manage the resulting crisis. Goldacre, however, joined with the critics in denouncing the meta-analysis.

These are only two examples, but they are extraordinary. Maybe Tyrer had some sort of imperial sense of editorial discretion. But there are rules…

Too many flawed meta-analyses by the same authors

A recent BMJ article has noted the prevalence of multiple meta-analyses of the same literature and cautioned that such clusters of meta-analyses often come from the same group.

Reanalyses from the same authors need special justification and risk perpetuating the same problems from one meta-analysis to the next. If inadequacies, including miscalculated and biased, unconventional effect sizes, require a reanalysis, it should probably be done by another group.

Moreover, if the inadequacies of the earlier analyses by particular authors are the rationale for conducting another meta-analysis, the problems that led to that decision should be made explicit. Arguably, if the problems are sufficient to require a reanalysis, the earlier analyses should either be retracted or no longer be uncritically cited. Instead, we have a pattern of laudatory self-citation, minimization of any difficulties, and repeated meta-analyses lending false authority.

An independent reanalysis

The Dutch Health Care Insurance Board (CVZ) provided partial funding for an independent reanalysis of the evidence concerning the efficacy of LTPP. All the authors were Dutch, except for John Ioannidis, the Greek-American author of numerous well-executed meta-analyses and systematic reviews. Some have proved game-changing, like “Why Most Published Research Findings Are False.” Richard Smith, former editor of BMJ, endorses Ioannidis as

a brilliant researcher who has done more than anybody to identify serious problems with the publishing of science.

The resulting meta-analysis is superb and a great example for graduate students to evaluate with objective criteria such as PRISMA or AMSTAR.

The authors defined LTPP as having at least 40 sessions and continuing for at least one year. This differs from Leichsenring and Rabung’s requirement of at least 50 sessions, but the authors noted that weekly sessions may total fewer than 50 in a year, allowing for patient and therapist vacations and missed sessions.

The authors struggled with the poor quality of the studies they were able to identify, but came up with an excellent solution: a sensitivity analysis. The meta-analysis was conducted with the poor-quality studies included and then again with them excluded. As it turned out, whether these studies were included did not influence the results.
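The logic of that include/exclude sensitivity check can be sketched in a few lines. The effect sizes, variances, and quality flags below are invented for illustration, and simple fixed-effect (inverse-variance) pooling stands in for the reanalysis' actual, more elaborate methods:

```python
# Hypothetical studies: (effect size d, variance, judged poor quality?)
# All numbers are invented; only the pooling logic matters here.
studies = [
    (0.15, 0.04, False),
    (0.05, 0.06, False),
    (0.12, 0.09, True),   # poor-quality study
    (0.10, 0.05, False),
    (0.10, 0.12, True),   # poor-quality study
]

def pooled_effect(trials):
    """Fixed-effect pooled estimate: inverse-variance weighted mean."""
    numerator = sum(d / var for d, var, _ in trials)
    denominator = sum(1.0 / var for _, var, _ in trials)
    return numerator / denominator

# Pool once with every study, once with the poor-quality studies dropped.
d_all = pooled_effect(studies)
d_filtered = pooled_effect([s for s in studies if not s[2]])

print(f"all studies:          d = {d_all:.2f}")
print(f"poor quality removed: d = {d_filtered:.2f}")
```

If the two pooled estimates barely move, as in this toy case, the conclusion is robust to the inclusion of the weaker studies; a large shift would mean the result hinges on them.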

They explicitly rejected the validity of effect sizes calculated on the basis of within-group differences:

To reliably assess the effectiveness of any treatment, it is necessary to evaluate its outcomes compared to a control group. The change in severity or intensity of a mental disorder over time cannot be attributed solely to the treatment that took place during that time, unless the treatment is controlled for. This is especially so with long-term treatments where the course of symptoms may change (more or less) spontaneously over time, even in personality disorders that were previously thought to be stable and incurable, such as borderline personality disorder.
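The point can be made concrete with a toy calculation on invented symptom scores. The within-group effect size credits the therapy with all improvement, including spontaneous recovery that the control group experiences too; the between-group effect size does not:

```python
# Toy illustration (all numbers hypothetical): why within-group effect
# sizes overstate treatment effects whenever untreated patients also improve.
import statistics as st

# Invented symptom scores (higher = worse), before and after treatment
ltpp_pre,    ltpp_post    = [30, 28, 32, 29, 31], [18, 13, 23, 15, 21]
control_pre, control_post = [30, 29, 31, 28, 32], [20, 16, 24, 16, 24]

def mean(xs):
    return sum(xs) / len(xs)

# Within-group d: pre-to-post change divided by the pre-treatment SD.
# It attributes ALL change to therapy, including spontaneous recovery.
d_within = (mean(ltpp_pre) - mean(ltpp_post)) / st.stdev(ltpp_pre)

# Between-group d: difference in change scores divided by their pooled SD.
ltpp_change    = [a - b for a, b in zip(ltpp_pre, ltpp_post)]
control_change = [a - b for a, b in zip(control_pre, control_post)]
n1, n2 = len(ltpp_change), len(control_change)
pooled_sd = (((n1 - 1) * st.variance(ltpp_change)
            + (n2 - 1) * st.variance(control_change)) / (n1 + n2 - 2)) ** 0.5
d_between = (mean(ltpp_change) - mean(control_change)) / pooled_sd

print(f"within-group d:  {d_within:.1f}")   # huge, echoing the hyped figures
print(f"between-group d: {d_between:.1f}")  # modest advantage over control
```

The same data yield a within-group d near 7 and a between-group d under 1, which is exactly why the reanalysis insisted on controlled comparisons.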

Results were unambiguous and negative, in terms of the efficacy of LTPP:

The recovery rate of various mental disorders was equal after LTPP or various control treatments, including treatments without a specialized psychotherapy component. Similarly, no statistically significant differences were found for the domains target problems, general psychiatric problems, personality pathology, social functioning, overall effectiveness or quality of life.


Control conditions were heterogeneous and frequently of low quality, e.g. without a specialized psychotherapy component. If anything, this suggests that LTPP is often compared against relatively ineffective “straw man” comparator… LTPP comparisons to specialized non-psychodynamic treatments, like dialectical behavior therapy and schema-focused therapy, suggest that LTPP might not be particularly effective.

The bottom line is that available evidence suggests that LTPP is not worthwhile, at least in terms of the conventional ways of evaluating therapies. The authors noted that many of the studies made comparisons between LTPP and a control condition, which is inappropriate if the critical question is whether LTPP is superior to other psychotherapies.

The authors included a provocative quote from Freud expressing doubt whether LTPP really produces much change:

One has the impression that one ought not to be surprised if it should turn out in the end that the difference between a person who has not been analyzed and the behavior of a person after he has been analyzed is not so thorough-going as we aim at making it and as we expect and maintain it to be (Freud, 1937/1961).

Bewildered yet? Tips for evaluating other meta-analyses

I have been encouraging a healthy skepticism about the quality and credibility of articles published in even the most prestigious high-impact journals. The push to secure insurance coverage for LTPP has produced a lot of bad science: both poorly done clinical trials intended to prove rather than test the efficacy of LTPP, and horrific meta-analyses that take bizarre steps to ensure LTPP über alles.

It is troubling to see that bad science repeatedly gets into prestigious, high-impact journals with its flaws brazenly displayed. Not only that, it is accompanied by laudatory editorials in various efforts to block and neutralize criticism. Anyone who has participated in the debate described in this blog has to be aware of an old boy network of aging psychoanalysts and their patients that seeks to control the evidence available concerning LTPP. Few will articulate readers’ dilemma as clearly as Peter Kramer, but many readers suffer a discomfort knowing something is wrong with these clinical trials and meta-analyses, while being intimidated by the sheer authority of JAMA, British Journal of Psychiatry, and Harvard Review of Psychiatry. After all, if there is such consensus about the efficacy of LTPP, who can argue?

Reviewing meta-analyses of long-term psychoanalytic and psychodynamic psychotherapies reinforces some points that I have made and will be making in future blog posts.

  • We can’t necessarily decide on the authority and credibility of articles solely on the basis of the journals in which they appear, and an accompanying editorial does not necessarily give added reassurance.
  • We have standards such as CONSORT for guiding the reporting of clinical trials, and the Cochrane Collaboration risk-of-bias criteria for judging whether a trial is at risk of bias.
  • We similarly have standards such as PRISMA for guiding the organization, conduct, and reporting of meta-analyses and systematic reviews, and AMSTAR for their evaluation.

It’s a good idea to familiarize yourself with these readily available standards.

  • We should be wary of the use of meta-analysis for propaganda and marketing of particular interventions and services. Conflicts of interest are not confined to the usual consideration of whether there is industry support for a clinical trial or meta-analysis.
  • We should be concerned about multiple meta-analyses coming from the same group of authors. We need to ask what justification there is for multiple publications in different journals, and be alert to uncritical self-citation.

But what if one does not have the time or inclination to scrutinize bad meta-analyses with formal rating scales? That is certainly true of a lot of consumers, whether they be clinicians, policymakers, or patients trying to make intelligent decisions about whether they really need long-term treatment. I think there are some basic things to look for that can serve as a first screen of meta-analyses, so that a decision can be made whether to accept them, dismiss them, or subject them to further evaluation. The screen can be seen as a good rule-out tool and perhaps the first stage of a two-stage process involving a further look, including going back to the original studies and other relevant meta-analyses.

When I pick up a meta-analysis, these are the first things I look for, all of which are conspicuously missing in Leichsenring and Rabung:

  1. Is there a reasonable number of reasonably sized trials making head-to-head comparisons between the two types of treatments that are being pitted against each other?
  2. Did the authors rely on conventional between-group calculation of effect sizes, rather than jumping to biased and easily misinterpreted within-group effect sizes?
  3. Aside from technical questions of statistical heterogeneity, does the lumping and splitting of both intervention and comparison groups make sense in terms of similarities and differences in interventions, patient characteristics, and clinical context?

It has taken years of discussion for all of the problems of Leichsenring and Rabung’s meta-analysis to become apparent, but I think their failure to meet basic criteria should have been apparent in a close read of their JAMA paper taking less than 30 minutes. An effect size of 6.9? Come on!
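A back-of-envelope check shows why an effect size of 6.9 strains credulity. Under a standard normal-theory assumption, d determines how completely the treated and untreated distributions separate; the conversions below use Cohen's U3 and the probability of superiority:

```python
# What a standardized effect size of d = 6.9 would actually imply,
# assuming normally distributed outcomes with equal variances.
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

for d in (0.2, 0.5, 0.8, 6.9):
    # Cohen's U3: share of controls scoring below the average treated patient
    u3 = normal_cdf(d)
    # Probability a random treated patient outscores a random control
    superiority = normal_cdf(d / sqrt(2.0))
    print(f"d = {d:>3}: U3 = {u3:.12f}, P(superiority) = {superiority:.7f}")
```

Even d = 0.8, conventionally labeled a large effect, leaves substantial overlap between groups; d = 6.9 would mean essentially every treated patient does better than essentially every untreated one, a degree of separation unheard of in psychotherapy research.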
