Salvaging psychotherapy research: a manifesto

NOTE: Additional documentation and supplementary links and commentary are available at What We Need to Do to Redeem Psychotherapy Research.

Fueling Change in Psychotherapy Research with Greater Scrutiny and Public Accountability

John Ioannidis’s declarations that most positive findings are false and that most breakthrough discoveries are exaggerated or fail to replicate apply as much to psychotherapy as they do to biomedicine.

We should take a few tips from Ben Goldacre’s Bad Pharma and clean up the psychotherapy literature, paralleling what is being accomplished with pharmaceutical trials. Sure, much remains to be done to ensure the quality and transparency of drug studies and to get all of the data into public view. But the psychotherapy literature lags far behind and is far less reliable than the pharmaceutical literature.

As it now stands, the psychotherapy literature does not provide a dependable guide to policy makers, clinicians, and consumers attempting to assess the relative costs and benefits of choosing a particular therapy over others. If such stakeholders uncritically depend upon the psychotherapy literature to evaluate the evidence-supported status of treatments, they will be confused or misled.

Psychotherapy research is scandalously bad.

Many RCTs are underpowered, yet consistently obtain positive results by redefining the primary outcomes after results are known. The typical RCT is a small, methodologically flawed study conducted by investigators with strong allegiances to one of the treatments being evaluated. Which treatment is preferred by investigators is a better predictor of the outcome of the trial than the specific treatment being evaluated.

Many positive findings are created by spin: a combination of confirmatory bias, flexible rules of design, analysis, and reporting, and significance chasing.

Many studies considered positive, including those that become highly cited, are basically null trials repackaged by cherry-picking: results for the primary outcome are ignored, while post-hoc analyses of secondary outcomes and subgroup analyses are emphasized. Spin starts in abstracts, where the results reported are almost always positive.

The bulk of psychotherapy RCTs involve comparisons between a single active treatment and an inactive or neutral control group such as wait list, no treatment, or “routine care,” which is typically left undefined and in which exposure to treatment of adequate quality and intensity is not assured. At best, these studies can tell us whether a treatment is better than doing nothing at all, or better than patients expecting treatment because they have enrolled in a trial and not getting it (nocebo).


Meta-analyses of psychotherapy often do not qualify conclusions by grade of evidence, ignore clinical and statistical heterogeneity, inadequately address investigator allegiance, downplay the domination by small trials with statistically improbable rates of positive findings, and ignore the extent to which positive effect sizes occur mainly in comparisons between active and inactive treatments.

Meta-analyses of psychotherapies are strongly biased toward concluding that treatments work, especially when conducted by those who have undeclared conflicts of interest, including developers and promoters of treatments that stand to gain financially from their branding as “evidence-supported.”

Overall, meta-analyses depend too heavily on underpowered, flawed studies conducted by investigators with strong allegiances to a particular treatment or to finding that psychotherapy is in general efficacious. When controls are introduced for risk of bias or investigator allegiance, effects greatly diminish or even disappear.

Conflicts of interest associated with authors having substantial financial benefits at stake are rarely disclosed in the studies that are reviewed or the meta-analyses themselves.

Designations of Treatments as Evidence-Supported

There are low thresholds for professional groups such as the American Psychological Association Division 12 or governmental organizations such as the US Substance Abuse and Mental Health Services Administration (SAMHSA) declaring treatments to be “evidence-supported.” Seldom are any treatments deemed ineffective or harmful by these groups.

Professional groups have conflicts of interest in wanting their members to be able to claim the treatments they practice are evidence-supported, while not wanting to restrict practitioner choice with labels of treatment as ineffective. Other sources of evaluation like SAMHSA depend heavily and uncritically on what promoters of particular psychotherapies submit in applications for “evidence supported status.”

"Everybody has won, and all must have prizes." Chapter 3 of Lewis Carroll's Alice's Adventures in Wonderland

“Everybody has won, and all must have prizes.” Chapter 3 of Lewis Carroll’s Alice’s Adventures in Wonderland

The possibility that there are no consistent differences among standardized, credible treatments across clinical problems is routinely ridiculed as the “dodo bird verdict” and rejected without systematic consideration of the literature for particular clinical problems. Yes, some studies find differences between two active, credible treatments in the absence of clear investigator allegiance, but these are unusual.

The Scam of Continuing Education Credit

Requirements that therapists obtain continuing education credit are intended to protect consumers from outdated, ineffective treatments, but there is inadequate oversight of the scientific quality of what is offered. Bogus treatments are promoted with pseudoscientific claims. Organizations like the American Psychological Association (APA) prohibit groups of their members from making statements protesting the quality of what is being offered, and the APA continues to allow CE credit for bogus and unproven treatments like thought field therapy and somatic experiencing.

Providing opportunities for continuing education credit is a lucrative business for both accrediting agencies and sponsors. In the competitive world of workshops and trainings, entertainment value trumps evidence. Training in delivery of manualized evidence-supported treatments has little appeal when alternative trainings emphasize patient testimonials and dramatic displays of sudden therapeutic gain in carefully edited videotapes, often with actors rather than actual patients.

Branding treatments as evidence supported is used to advertise workshops and trainings in which the particular crowd-pleasing interventions that are presented are not evidence supported.

Those who attend Acceptance and Commitment Therapy (ACT) workshops may see videotapes in which the presenter cries with patients, recalling his own childhood. They should ask themselves: “Entertaining, moving perhaps, but is this an evidence-supported technique?”

Psychotherapies with some support from evidence are advocated for conditions for which there is no evidence for their efficacy. What would be disallowed as “off label applications” for pharmaceuticals is routinely accepted in psychotherapy workshops.

We Know We Can Do Better

Psychotherapy research has achieved considerable sophistication in design, analyses, and strategies to compensate for missing data and elucidate mechanisms of change.

Psychotherapy research lags behind pharmaceutical research, but nonetheless has CONSORT recommendations and requirements for trial preregistration, including specification of primary outcomes; completion of CONSORT checklists to ensure basic details of trials are reported; and preregistration of meta-analyses and systematic reviews at sites like PROSPERO, as well as completion of the PRISMA checklist for adequacy of reporting of meta-analyses and systematic reviews.

Declarations of conflicts of interest are rare, and exposure of authors who routinely fail to disclose conflicts of interest is even rarer.

Departures from preregistered protocols in published reports of RCTs are common, and there is little checking of abstracts for discrepancies from the results that were actually obtained or promised in preregistration. Adherence to these requirements is inconsistent and incomplete. There is little likelihood that noncompliant authors will be held accountable, and there is a high incentive to report positive findings in order for a study to be published in a prestigious journal such as the APA’s Journal of Consulting and Clinical Psychology (JCCP). Examining the abstracts of papers published in JCCP gives the impression that trials are almost always positive, even when seriously underpowered.

Psychotherapy research is conducted and evaluated within a club, a mutual admiration society in which members are careful not to disparage others’ results or enforce standards that they themselves might want relaxed when it comes to publishing their own research. There are rivalries between tribes like psychodynamic therapy and cognitive behavior therapy, but criticism is suppressed within each tribe, and strenuous efforts are made to create the appearance that members of the tribe only do what works.

Reform from Without

Journals and their editors have often resisted changes such as adoption of CONSORT, structured abstracts, and preregistration of trials. The Communications and Publications Board of the American Psychological Association made APA one of the last major holdout publishers to endorse CONSORT, and initially provided an escape clause that CONSORT applied only to articles explicitly labeled as randomized trials. The board also blocked a push by the Editor of Health Psychology for structured abstracts that reliably reported the details needed to evaluate what had actually been done in trials and what results were obtained. In both instances, the board was most concerned about the implications for the major outlet for clinical trials among its journals, the Journal of Consulting and Clinical Psychology.

Although generally not outlets for psychotherapy trials, the journals of the Association for Psychological Science (APS) show signs of being even worse offenders in terms of ignoring standards and committing to confirmatory bias. For instance, it takes a reader a great deal of probing to discover that a high-profile paper by Barbara Fredrickson in Psychological Science was actually a randomized trial, and further detective work to discover that it was a null trial. There is no sign that a CONSORT checklist was ever filed for the study. And despite Fredrickson using the spun Psychological Science trial report to promote her workshops, there is no conflict of interest declared.

The new APS journal Clinical Psychological Science shows signs of even more selective publication and confirmatory bias than the APA journals, producing newsworthy articles to the exclusion of null and modest findings. There will undoubtedly be a struggle between APS and APA clinical journals for top position in the hierarchy, publishing only papers that are attention grabbing, even if flawed, while leaving the publishing of negative trials and failed replications to journals considered less prestigious.

If there is to be reform, pressures must come from outside the field of psychotherapy, from those without vested interest in promoting particular treatments or the treatments offered by members of professional organizations. Pressures must come from skeptical external review by consumers and policymakers equipped to understand the games that psychotherapy researchers play in creating the appearance that all treatments work, but that the dodo bird verdict is dead.

Specific journals are reluctant to publish criticism of their own publishing practices. If we cannot at first get our concerns published in the offending journals, we can rely on blogs and Twitter to call out editors and demand explanations of lapses in peer review and upholding of quality.

We need to raise stakeholders’ levels of skepticism, disseminate critical appraisal skills widely, and provide for their application in evaluating exaggerated claims and methodological flaws in articles published in prestigious, high-impact journals. Bad science in the evaluation of psychotherapy must be recognized as the current norm, not an anomaly.

We could get far by enforcing rules that we already have.

We need to continually expose journals’ failures to enforce rules about preregistration, disclosure of conflicts of interest, and discrepancies between published clinical trials and their preregistration.

There are too many blatant examples of investigators failing to deliver what they promised in preregistration, registering after trials have started to accrue patients, and reviewers apparently never checking whether the primary outcomes and analyses promised in trial registration are actually delivered.

Editors should

  • Require an explicit statement of whether the trial has been registered and where.
  • Insist that reviewers consult trial registration, including modifications, and comment on any deviation.
  • Explicitly label registration dated after patient accrual has started.

CONSORT for abstracts should be disseminated and enforced. A lot of hype and misrepresentation in the media starts with authors’ own spin in the abstract. Editors should insist that the main analyses for the preregistered primary outcome be presented in the abstract and highlighted in any interpretation of results.

No longer should underpowered, exploratory pilot feasibility studies be passed off as RCTs when they achieve positive results. An orderly sequence of treatment development should occur before conducting what are essentially Phase 3 randomized trials.

Here as elsewhere in reforming psychotherapy research, there is something to be learned from drug trials. A process of intervention development that establishes the feasibility and basic parameters of a clinical trial needs to precede Phase 3 randomized trials, but such preliminary studies cannot be expected to serve as Phase 3 trials or to provide effect sizes for the purposes of demonstrating efficacy or comparison to other treatments.

Use of wait list, no treatment, and ill-defined routine care should be discouraged as control groups. For clinical conditions for which there are well-established treatments, head-to-head comparisons should be conducted, as well as studies including control groups that might elucidate mechanism. A key example of the latter would be structured, supportive therapy that controls for attention and positive expectation. There is little to be gained by further accumulation of studies in which the efficacy of the preferred treatment is assured by comparison to a lame control group that lacks any conceivable element of effective care.

Evaluations of treatment effects should take into account prior probabilities suggested by the larger literature concerning comparisons between two active, credible treatments. The well-studied depression treatment literature suggests some parameters: effect sizes associated with a treatment are greatly reduced when comparisons are restricted to credible, active treatments and better-quality studies, and when controls are introduced for investigator allegiance. It is unlikely that initial claims about a breakthrough treatment exceeding the efficacy of existing treatments will be sustained in larger studies conducted by investigators independent of developers and promoters.

Disclosure of conflict of interest should be enforced and nondisclosure identified in correction statements and further penalized. Investigator allegiance should be considered in assessing risk of bias.

Developers of treatments and persons with significant financial gain from a treatment being declared “evidence-supported” should be discouraged from conducting meta-analyses of their own treatments.

Trials should be conducted with sample sizes adequate to detect at least moderate effects. When positive findings from underpowered studies are published, readers should scrutinize the literature for similarly underpowered trials that achieve similarly positive effects, a telltale sign of publication bias.

Meta-analyses of psychotherapy should incorporate techniques for detecting p-hacking and excess significance, to evaluate whether the pattern of significant findings exceeds what is statistically probable.
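
One such technique is a test for excess significance in the spirit of Ioannidis and Trikalinos: compare the observed number of positive trials with the number expected given the trials’ power. The sketch below uses only the Python standard library; the trial counts and the assumed per-trial power are hypothetical illustrations, not figures from any meta-analysis discussed here.

```python
import math

def prob_at_least(k_trials: int, power: float, observed: int) -> float:
    """Binomial tail: probability of observing >= `observed` significant
    results among k_trials trials that each have the given true power."""
    return sum(math.comb(k_trials, j) * power**j * (1 - power)**(k_trials - j)
               for j in range(observed, k_trials + 1))

# Hypothetical literature: 19 small trials, each with roughly 35% power
# to detect the pooled effect, yet 17 of them report significant results.
p_excess = prob_at_least(k_trials=19, power=0.35, observed=17)
print(f"P(>=17 of 19 positive | power = .35) = {p_excess:.1e}")
```

A literature like this one, in which nearly every underpowered trial comes up positive, is statistically improbable unless negative trials are missing or results were selectively reported.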

Adverse events and harms should routinely be reported, including lost opportunity costs such as failure to obtain more effective treatment.

We need to shift the culture of doing and reporting psychotherapy research, away from praising exaggerated claims about treatments and faux evidence generated to promote opportunities for therapists and their professional organizations. It is much more praiseworthy to provide robust, sustainable, even if more modest, claims and to call out hype and hokum in ways that preserve the credibility of psychotherapy.



The alternative is to continue protecting psychotherapy research from stringent criticism and enforcement of standards for conducting and reporting research. We can simply allow the branding of psychotherapies as “evidence supported” to fall into appropriate disrepute.


Critical analysis of a meta-analysis of a treatment by authors with financial interests at stake


Update, June 2, 2014: This blog post has been updated to respond to information provided in comments, as well as my examination, prompted by those comments, of the membership of the International Scientific Advisory Committee for Triple P Parenting.

The case appears stronger than first thought that this is a thoroughly flawed meta-analysis conducted by persons with substantial but undeclared financial interests in portraying Triple P Parenting as “evidence-supported” and effective. Moreover, they rely heavily on their own studies and those conducted by others with undeclared conflicts of interest. The risk of bias is high as a result of the inclusion of nonrandomized trials in the meta-analysis, likely selective exclusion of contrary data, likely spinning of the data that is presented, and reliance on small, methodologically flawed studies whose authors have undeclared conflicts of interest. But don’t take my word for it. Read on and see if you are persuaded too.

I recommend clinicians, policymakers, and patients not make decisions about Triple P Parenting on the basis of results of this meta-analysis.

This meta-analysis should serve as a wake-up call for greater enforcement of existing safeguards on generating and integrating evidence concerning the efficacy of psychological interventions, as well as a heightened suspicion that what is observed here is more widespread in the literature and true of evaluations of other treatments.

Do you have the time and expertise to interpret a meta-analysis of a psychological treatment? Suppose you don’t, but you are a policymaker, researcher, clinician, or potential patient. You have to decide whether the treatment is worthwhile. Can you rely on the abstract in a prestigious clinical psychology journal?

What if the authors had substantial financial gains to be made from advertising their treatment as “evidence-supported”?

In this blog post I am going to provide a critical analysis of a meta-analysis of Triple P Parenting Programs (3P) published by its promoters. The authors have a lucrative arrangement with their university [updated] for sharing profits from dissemination and implementation of 3P products. As is typical for articles authored by those benefiting financially from 3P, these authors declare no conflict of interest.

You can read this post with a number of different aims.

(1) You can adjust your level of suspicion when you encounter an abstract of a meta-analysis by authors with a conflict of interest. Be even more suspicious when you aren’t informed by the authors and have to find out on your own.

(2) You can decide how much credence to give to claims in meta-analyses simply because they appear in prestigious journals.

(3) You can decide whether you have adequate skills to independently evaluate claims in meta-analysis.

(4) Or you can simply read the post to pick up some tips and tricks for scrutinizing a meta-analysis.

You will see how much work I had to do; decide whether it would have been worth doing yourself. This article appears in a prestigious, peer-reviewed journal. You’d think, “Surely the reviewers would have caught obvious misstatements or simple hanky-panky and either rejected the manuscript or demanded revision.” But if you have been reading my posts, you are probably no longer surprised by gross lapses in the oversight provided by peer review.

This post will identify serious shortcomings in a particular meta-analysis. However, this article made it through peer review. After my analysis, I will encourage you to consider how this article reflects on some serious problems in the literature concerning psychological treatment. Maybe there is even a bigger message here.

The Clinical Psychology Review article I will be critiquing is behind a paywall. You can request a PDF from the senior author at . You can also consult an earlier version of this meta-analysis placed on the web with a label indicating it was under review at another journal. The manuscript was longer than the published article, but most of the differences between the two, other than shortening, are cosmetic. Robin Kok has now highlighted the manuscript to indicate overlap with the article. Thanks, Robin.

Does this abstract fairly represent the conduct and conclusions of the meta-analysis?

This systematic review and meta-analysis examined the effects of the multilevel Triple P-Positive Parenting Program system on a broad range of child, parent and family outcomes. Multiple search strategies identified 116 eligible studies conducted over a 33-year period, with 101 studies comprising 16,099 families analyzed quantitatively. Moderator analyses were conducted using structural equation modeling. Risk of bias within and across studies was assessed. Significant short-term effects were found for: children’s social, emotional and behavioral outcomes (d = 0.473); parenting practices (d =0.578); parenting satisfaction and efficacy (d = 0.519), parental adjustment (d = 0.340); parental relationship (d = 0.225) and child observational data (d = 0.501). Significant effects were found for all outcomes at long-term including parent observational data (d = 0.249). Moderator analyses found that study approach, study power, Triple P level, and severity of initial child problems produced significant effects in multiple moderator models when controlling for other significant moderators. Several putative moderators did not have significant effects after controlling for other significant moderators. The positive results for each level of the Triple P system provide empirical support for a blending of universal and targeted parenting interventions to promote child, parent and family wellbeing.

On the face of it, wow! This abstract seems to provide a solid endorsement of 3P based on a well done meta-analysis. Note the impressive terms: “multiple search strategies,”… “moderator analyses”… “Structural equation modeling,”… “risk of bias.”

I have to admit that I was almost taken. But then I noticed a few things.

The effect sizes were unusually high for psychological interventions, particularly for treatments delivered

  • in the community with low intensity.
  • by professionals with typically low levels of training.
  • to populations that are mandated to treatment and are often socially disadvantaged and unprepared to do what is required to benefit from treatment.
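
To get a feel for what d values of this size would mean if taken at face value, a between-group Cohen’s d can be translated into the common-language effect size: the probability that a randomly chosen treated family does better than a randomly chosen control family, computed as Φ(d/√2). This is just a standard conversion applied to the d values quoted in the abstract, not any analysis the authors performed:

```python
import math

def common_language_effect_size(d: float) -> float:
    """P(random treated case outscores random control case) for Cohen's d,
    assuming normally distributed outcomes with equal variances."""
    z = d / math.sqrt(2.0)                      # Phi(d / sqrt(2))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Effect sizes as reported in the meta-analysis abstract.
for label, d in [("child outcomes", 0.473),
                 ("parenting practices", 0.578),
                 ("parental relationship", 0.225)]:
    print(f"{label}: d = {d:.3f} -> CLES = {common_language_effect_size(d):.2f}")
```

Even a d of 0.473 implies that a treated family outperforms a control family only about 63% of the time, and the question pursued below is whether even that figure is inflated.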

The abstract indicates moderator analyses were done. Most people familiar with moderator analyses would be puzzled that results for individual moderators were not reported except where they turned up in a multivariate analysis. Whether a variable survives a multivariate model is not the same question as whether there was a moderator effect for that particular variable.

The abstract mentions attention to “risk of bias,” but does not report what was found. It also leaves you wondering how promoters can assess the risk of bias of studies evaluating a psychological treatment when they themselves are the investigators in a substantial number of the trials. Of course, they would say they conduct and report their intervention studies very well. Might they have a risk of bias in giving themselves high marks?

So, as I started to read the meta-analysis, there was a penalty flag on the playing field. Actually, a lot of them.

I had to read this article a number of times. I had to make a number of forays into the original literature to evaluate the claims being made. Read on and discover what I found, but here are some teasers:

Two of the largest trials of 3P ever done were excluded, despite claims of comprehensiveness. Both have been interpreted as negative trials.

One of the largest trials ever done involved one of the authors of the meta-analysis. A description of the trial was published while the trial was ongoing. Comparison of this description to the final report suggests highly selective and distorted reporting of basic outcomes and misrepresentation of the basic features of the design. Ouch, a credibility problem!

Very few of the other evaluation studies were preregistered, so we do not know whether they too have such a high risk of bias. But we now have evidence that it is unwise to assume they don’t.

The meta-analysis itself was preregistered in PROSPERO (CRD42012003402). Preregistration is supposed to specify in advance which analyses will and will not be reported in the published meta-analysis. Yet in this meta-analysis, the authors failed to report what was promised and provided analyses other than those promised in answering key questions. This raises concerns about hypothesizing after results are known (HARKing). Ouch!

The authors claim impressive effects for a variety of outcomes. But these supposedly different outcomes are actually highly intercorrelated. These variables could not have all been the primary outcomes in a given trial. People who do clinical trials and meta-analyses worry about these sorts of selective reporting of outcomes from a larger pool of variables.

There is a high likelihood that the original trials entered into the meta-analysis selectively reported positive outcomes from a larger pool of candidate variables. Because these trials do not typically have registered protocols, we do not know what the designated primary outcome was, and so we have to be particularly careful in accepting multiple outcomes from the same trial. Yet the authors of this meta-analysis are basically bragging about doing multiple, non-independent re-analyses of data from the same trials. This is not cool. In fact, it is very fishy.

The abstract implies a single meta-analysis was done. Actually, the big population-level studies were analyzed separately, contrary to what was promised in the protocol. We should have been told at the outset that this would be done, especially because the authors also claim to have tested for any bias from including overly small studies (at least one with only six patients) alongside larger ones, and these population-level studies are the largest. So the authors cannot fully evaluate the very bias that they themselves have identified as a threat to the validity of the findings.

The meta-analyses included lower quality, nonrandomized trials, lumping them together with the higher quality evidence of randomized trials. Bad move. Adding poor quality data does not strengthen conclusions from stronger studies.

Inclusion of nonrandomized trials also negates any risk-of-bias assessment of the randomized trials. Judged by the same standards applied to the randomized trials, the nonrandomized trials all have a high risk of bias. It is no wonder the authors did not disclose the results of the risk-of-bias assessment mentioned in the abstract. Such an assessment makes no sense when nonrandomized trials that would score poorly have been included, and it makes no sense to examine the risk of bias in the randomized trials without applying the same standards to the nonrandomized trials with which they are being integrated.

Combining effect sizes from nonrandomized trials with those of randomized trials involved some voodoo statistics that were consistently biased in the direction of producing better outcomes for 3P. Basically, the strategies used to construct effect sizes from nonrandomized trials exaggerate the effects of 3P.

The bottom line: “Houston, we’ve got a problem.” Or two, or three.

Introduction: is this meta-analysis the biggest and best ever?

In the introduction, the authors are not shy in their praise for 3P. They generously cite themselves as the authority for some of the claims. Think about how it would look if, instead of citing themselves, they had simply said “we think” and “we conclude” rather than using citations that imply someone else thought or found the same thing. Most importantly, they claim that this meta-analysis of 3P is the most comprehensive ever, and they criticize past meta-analyses for being incomplete.

a compilation of systematic review and meta-analysis, based on more than double the number of studies included any prior meta-analysis of triple P or other parenting interventions, provides a timely opportunity to examine the impact of a single, theoretically-integrated system of parenting support on the full range of child, parent and family outcome variables

But it is not a single meta-analysis; it is a couple of meta-analyses with some inexplicable and arbitrary lumping and splitting of studies, some of quite poor quality, that past reviewers would have left out.

This meta-analysis includes more studies than recent past meta-analyses because it accepts low-quality data from nonrandomized trials and combines data from huge trials with studies of as few as six patients. It may be bigger, but it is unlikely to give a better answer as to whether 3P works, because the authors have done unusual things to accommodate poor-quality data.

Authorities about how to do a meta-analysis agree it should represent a synthesis of the evidence with careful attention to the quality of that evidence. That involves distinguishing between conclusions based on best evidence, and those dependent on introducing poor quality evidence.

The introduction misrepresents other authors as agreeing with the strategy used in this meta-analysis. In particular, if I were Helena Kraemer, I would be very upset that the authors suggest I endorsed a strategy of “the more studies, the better, even if it means integrating data from poor-quality studies,” or any number of other things they imply she would agree with:

all possible evaluation designs were included to provide the most comprehensive review of triple P evidence, and to avoid exclusion and publication bias (Sica, 2006; Kraemer et al. 1998)…. To provide the most comprehensive meta-analytic assessment of triple P studies, inclusion-based approach was adopted (Kraemer et al. 1998).

It would help if readers knew that this meta-analysis was published in response to another meta-analysis that disputes the level and quality of evidence that 3P is effective. In a story I have recounted elsewhere, promoters (or someone in direct contact with them) of 3P blocked publication of this other meta-analysis in Clinical Psychology Review and tried to intimidate the author, Philip Wilson, M.D. PhD. His work nonetheless got published elsewhere and, amplified by my commentary on it, set off a lot of reevaluation of the validity of the evidence for 3P. It stimulated discussion about undisclosed conflicts of interest and caused at least one journal editor to announce greater vigilance about nondisclosure.

It is interesting to compare what Wilson says about 3P with how the authors portray his work in this paper. If you did not know better, you would conclude that Wilson’s most important finding was a moderate effect for 3P, but that he was limited by considering too few studies and too narrow a range of outcomes.

Actually, Wilson made well-reasoned and well-described decisions to focus on primary outcomes, not secondary ones, and on randomized trials only. He concluded:

In volunteer populations over the short term, mothers generally report that Triple P group interventions are better than no intervention, but there is concern about these results given the high risk of bias, poor reporting and potential conflicts of interest. We found no convincing evidence that Triple P interventions work across the whole population or that any benefits are long-term. Given the substantial cost implications, commissioners should apply to parenting programs the standards used in assessing pharmaceutical interventions.

In the manuscript that was posted on the web, these authors cite my paper without indicating its findings, and they cite my criticism of them without fairly portraying what I actually said. In the published meta-analysis, they dropped the citation. They instead cite my earlier paper, in which, they say, I “claimed” that clinical trials with fewer than 35 participants per condition have less than a 50% probability of detecting a moderate effect even if one is present.

No. If these authors had bothered to consult power analysis tables, they would see that this is not merely my “claim”; it is what every power analysis table shows. And my point was that if most small published trials are positive, as they usually are, there must be publication bias, with negative trials missing.

What I say in the abstract of my critique of 3P is:

Applying this [at least 35 participants in the smallest cell] criterion, 19 of the 23 trials identified by Wilson et al. were eliminated. A number of these trials were so small that it would be statistically improbable that they would detect an effect even if it were present. We argued that clinicians and policymakers implementing Triple P programs incorporate evaluations to ensure that goals are being met and resources are not being squandered.
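The arithmetic behind that criterion is easy to reproduce. As a sketch (using scipy, with d = 0.5 as the conventional “moderate” effect and a two-sided α of .05; the cell sizes shown are my own illustrative choices), here is the power of a two-sample t-test at several per-group sample sizes:

```python
import numpy as np
from scipy import stats

def power_two_sample_t(d, n_per_group, alpha=0.05):
    """Power of a two-sided two-sample t-test when the true effect size is d."""
    df = 2 * n_per_group - 2
    ncp = d * np.sqrt(n_per_group / 2)        # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
    # probability of landing in either rejection region
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

for n in (15, 25, 35, 64):
    print(n, round(power_two_sample_t(0.5, n), 2))
```

Roughly: 25 per group gives well under a coin flip’s chance of detecting a moderate effect, 35 per group barely clears 50%, and about 64 per group are needed for the conventional 80% power.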

So, most of the available evidence for 3P from randomized trials comes from small trials with a high risk of bias, including trials conducted by people with financial interests at stake. The 3P promoters are trying to refute this finding by introducing nonrandomized trials, largely done with their own involvement.

Promoters of 3P have also come under withering criticism from Professor Manuel Eisner of Cambridge University. Reading this article, you would think his main gripe was that developers of 3P are sometimes involved in studies and that needs to be taken into account as a moderator variable.

I invite you to compare the summary in the article to what Professor Eisner says here and here.

The authors set about to counter Professor Eisner by introducing 3P “developer involvement” as a moderator variable. The problem is that Eisner is complaining about undisclosed conflicts of interest. We know that there is a lot of developer involvement by consulting the authorship of papers, but we do not know to what extent conflict of interest exists beyond that, i.e., persons doing trials getting financial benefit from promoting 3P products. But conflicts of interest statements almost never accompany trials of 3P, even when developers are involved.

I sent a couple of emails to Nina Heinrichs, a German investigator who has been involved in a number of studies, conducted a key meta-analysis, and spent considerable time with 3P developers in Australia. I asked her directly if she met criteria for needing to disclose a conflict of interest. She has not yet replied (I will correct this statement if she responds and indicates she does meet criteria for a conflict of interest). [Update: Dr. Heinrichs is listed as a member of the Triple P Parenting International Scientific Advisory Committee, as are a number of other authors of 3P intervention trials who fail to disclose conflicts of interest. This meets criteria for having a conflict to declare. But these trials were not coded as positive for developer involvement in this meta-analysis, and so this designation is not a proxy for conflict of interest.] So, we have a moderator variable that cannot be independently checked, but about which there is considerable suspicion. And it does not correspond to the undisclosed conflicts of interest about which critics of 3P complain.

In the introduction we see the authors using a recognizable tactic: misciting key sources and ignoring others in order to suggest there is some consensus behind their assessment of 3P and their decisions in conducting the meta-analysis. Basically, they are claiming unfounded authority from the literature by selective and distorted citation, attributing to sources agreement that is not there and misrepresenting what actually is there. Steven Greenberg has provided an excellent set of tools for detecting when this is being done. I encourage you to study his work and acquire these tools.

We will see more distorted citation in this meta-analysis article as we proceed into the method section. Note, however, that use of this powerful technique and having access to the information it provides requires going back to original sources and looking for differences between what is said in them and what is said about them in the article. This takes time that you might not want to commit, but it is vital to get at some issues.

Methods: Explaining what was done and why

I will try not to bore you with technical details nor get you lost as I almost did in the authors’ complex and contradictory description of how they decided to do the meta-analysis. But I will highlight a few things so that you can get a flavor. I am sure that if you go back you can find more such things.

The methods section starts with reassurance that the protocol for the meta-analysis is preregistered, suggesting that we should relax and assume that the protocol was followed. Actually, the published paper deviates from the protocol in treating randomized trials with active control groups as nonrandomized and by keeping the population-level intervention separate.

Those of you who have been following my blog posts may recall my saying again and again that interventions do not have effect sizes; only comparisons do. When these authors take the data for the intervention condition out of randomized trials, they destroy the benefit of the design and use statistics that exaggerate the effects of the treatment.

The authors make a spirited defense of including nonrandomized trials. They justify the voodoo statistics they used to calculate effect sizes from trials without control groups by citing classic work by Scott Morris and Richard DeShon, but I am quite confident that these authorities would be annoyed at being invoked as justification.

Basically, if you use data from intervention groups without a comparison, you overestimate the effect of the intervention, because you take all the change that occurs and attribute it to the intervention rather than to the passage of time or nonspecific factors. If there is any hope of getting around this, you have to have an estimate of what change would occur in the absence of intervention, and test the assumptions behind that estimate.

With the population served by 3P, this is a particular problem, because many of these families come into treatment at a time of crisis, and it is only with a control group that you can estimate how much their problems would decline anyway without treatment. Similarly, if you take an intervention group and drop its comparator, you do not get to see what is specific to the intervention.
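A short simulation makes the regression-to-the-mean problem concrete. All numbers here are invented for illustration (a test-retest correlation of .6 and recruitment from the top quartile of baseline distress); the point is only that a crisis-selected group “improves” on its own, with no treatment whatsoever:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
r = 0.6  # assumed test-retest correlation of the symptom measure

# Stable distress plus occasion-specific noise at two time points (SD = 1).
stable = rng.normal(0, np.sqrt(r), n)
pre  = stable + rng.normal(0, np.sqrt(1 - r), n)
post = stable + rng.normal(0, np.sqrt(1 - r), n)  # no treatment at all

# Families enter the study during a crisis: top quartile of baseline distress.
crisis = pre > np.quantile(pre, 0.75)
improvement = (pre[crisis] - post[crisis]).mean()  # in population-SD units

print(round(improvement, 2))
```

In this setup the untreated, crisis-selected group improves by about half a standard deviation purely through regression to the mean, an “effect” a single-group pre-post design would credit to the intervention.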

We already know that much of the evidence for the efficacy of psychological interventions comes from comparing them to weak control conditions like waiting lists, or treatment as usual that amounts to no care or thoroughly inadequate care. The authors of this meta-analysis suppressed information that would have allowed us to examine whether that was the case for 3P.

This is all very bad form and serves to inflate the effect sizes obtained by the authors.

The methods section seems to give an impressive account of thoughtful selection criteria and thorough searches. It indicates that published as well as unpublished papers will be included in the meta-analysis.

The preregistration does not indicate that population-level intervention trials will be kept separate, but that is what is done. If you do not look carefully, you will not find the two brief disclosures of this in the method section.

The methods section indicates that seven different outcomes will be considered. This may sound comprehensive, but it presents a number of problems. Clinical trials have only one or two declared primary outcomes and need to be evaluated on the basis of whether there are changes in those outcomes. Many evaluations of psychological treatment, and especially of 3P, administer a battery of potential outcome variables and leave the choice of which to report until after the outcomes have been analyzed. That is why there is such a push to preregister protocols for trials and commit investigators to one or two outcomes. You can click on the image below depicting child outcomes and 3P to enlarge it. Despite many of these trials being conducted by the same investigators, note the wide variety of outcomes. Without preregistration, we do not know whether all of the outcomes are reported in a particular trial, or whether the one that is emphasized was originally primary.

measures 3supplement.-page-0

Seven outcome categories are a lot, especially when five depend on parent self-report. So we are dealing with multiple outcomes from the same trial, often from the same respondent, often highly intercorrelated, with no independent validation as to whether these outcomes were designated primary in advance or picked from a larger array after analyses had begun. There is a high risk of confirmatory bias.
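The size of this risk is easy to demonstrate. The sketch below uses invented numbers (seven outcomes intercorrelated at r = 0.5, 35 families per arm, and a treatment with no true effect on anything) to estimate how often at least one outcome comes out “significant” by chance when you are free to pick after the fact:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, r, n, sims = 7, 0.5, 35, 2000
cov = np.full((k, k), r) + (1 - r) * np.eye(k)  # equicorrelated outcomes

hits = 0
for _ in range(sims):
    treat = rng.multivariate_normal(np.zeros(k), cov, n)  # null: no effect
    ctrl  = rng.multivariate_normal(np.zeros(k), cov, n)
    pvals = stats.ttest_ind(treat, ctrl).pvalue           # one test per outcome
    hits += (pvals < 0.05).any()

print(hits / sims)  # familywise false-positive rate
```

Even with the outcomes correlated, the familywise false-positive rate comes out several times the nominal 5%, which is exactly why trials are supposed to commit to one or two primary outcomes in advance.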

Preferable would be the strategy adopted by Wilson and others of centering on one outcome, child adjustment, and evaluating 3P on that basis. Even here, though, we run into a problem. Child outcomes can be social, emotional, or behavioral, and we do not know which was primary.

You may not be bothered by all of this, but consider the possibilities of picking the outcomes that make 3P look the best and using that as a basis for committing millions of dollars to funding it, when it may not be effective.

Art Garfunkel’s Mr Shuck ‘N Jive


Results: shuckin and jivin’

One of the first things you should do when you get the results of a meta-analysis is check the analysis of heterogeneity to determine whether the group of studies can be integrated to produce a valid summary effect size. The authors report two measures, I² and Q, both of which indicate considerable heterogeneity, particularly for the very important child outcomes. This should be like a warning light flashing on your dashboard: you should pull over and see what is wrong. All that the authors do is recompute these measures for the individual levels 1–5 of the intervention. Because a number of these levels have only a few trials, measures of heterogeneity look better but are useless because of low power. The problem is considered solved, except that the authors have missed an important opportunity to throw up the hood on their meta-analysis and see what is going wrong.
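For readers who want to see what these dashboard lights actually measure, here is a minimal computation of Cochran’s Q and Higgins’ I² from study-level effect sizes and variances; the numbers are invented for illustration and are not taken from this meta-analysis:

```python
import numpy as np

def heterogeneity(effects, variances):
    """Cochran's Q and I^2 around a fixed-effect pooled estimate."""
    effects = np.asarray(effects, dtype=float)
    w = 1 / np.asarray(variances, dtype=float)   # inverse-variance weights
    pooled = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - pooled) ** 2)      # Cochran's Q
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100            # % of variation beyond chance
    return q, i2

# Hypothetical studies: small trials with big effects, one large near-null trial.
effects   = [0.9, 0.8, 1.1, 0.05]
variances = [0.20, 0.25, 0.30, 0.01]
q, i2 = heterogeneity(effects, variances)
print(round(q, 1), round(i2, 1))
```

By convention, I² above 50% is read as substantial heterogeneity; in this invented example, a cluster of small positive trials sitting alongside one large near-null trial produces exactly that pattern.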

The authors also conduct moderator analyses and find that study design (nonrandomized versus randomized), investigator involvement, and having a small number of participants all strongly contribute to more positive findings. Again, the authors should have been greatly troubled by this. Instead, they do two things to get rid of this potential embarrassment. First, they emphasize not whether individual moderator variables have a significant impact, but whether any impact survives inclusion in a multiple regression equation with the other moderator variables. Second, they emphasize that even in conditions where one of these moderator variables puts 3P at a disadvantage, effects are still significant.

These tactics suggest they are really interested in making the case that 3P is effective, not in examining all the relevant data and potentially rethinking some of the decisions that artificially improve the appearance of 3P being effective.

The authors separate out what they identify as the three large-scale population trials of 3P. These are extremely expensive trials that adopt a public health approach to reducing child behavior problems. All three were done by the authors, and all are appraised in glowing terms.

The problem is that we have an independent check on the accuracy of what is reported in the one American trial. As I have detailed elsewhere, an earlier paper published while the trial was ongoing gives specifics of its design, including primary outcomes. It does not agree with what was reported in the article providing data for the meta-analysis.

Prinz, R. J., Sanders, M. R., Shapiro, C. J., Whitaker, D. J., & Lutzker, J. R. (2009). Population-based prevention of child maltreatment: The US Triple P system population trial. Prevention Science, 10(1), 1-12.

Then there is the problem of the two large trials missing from the meta-analysis, both of which have been interpreted as negative. One of them is published, and you can find it here.

Little, M., Berry, V., Morpeth, L., Blower, S., Axford, N., Taylor, R., … & Tobin, K. (2012). The impact of three evidence-based programmes delivered in public systems in Birmingham, UK. International Journal of Conflict and Violence, 6(2), 260-272.

The other is unpublished. It was conducted by investigators who have published other trials, and it could readily have been identified either by contacting them or from the considerable press coverage that it received. [Update: One of the authors is a member of the International Scientific Advisory Committee for Triple P Parenting.]

Schönenberger, M., Schmid, H., Fäh, B., Bodenmann, G., Lattmann, U. P., Cina, A., et al. (2006). Projektbericht “Eltern und Schule stärken Kinder” (ESSKI); Ein Projekt zur Förderung der Gesundheit bei Lehrpersonen, Kindern und Eltern und zur Prävention von Stress, Aggression und Sucht – Ergebnisse eines mehrdimensionalen Forschungs- und Entwicklungsprojekts im Bereich psychosoziale Gesundheit in Schule und Elternhaus

Both the media release and the research report can be downloaded from the website of the ESSKI project. Subsequent analyses have cast doubt [Updated] on whether there were positive findings, and the trial has remained unpublished.

Taken together, we have

  • Serious doubts about the validity of reports of the only US study.
  • No way of independently checking the validity of the two other studies conducted by the authors of the meta-analysis.
  • The unexplained absence of two negative trials.

Update 6/2/2014, 7:51 am: The authors bolster the case for the strength of their findings by bringing in the so-called failsafe N to argue that hundreds of unpublished studies would have to be sitting out there in desk drawers to reverse their conclusions.

Orwin’s failsafe N was as follows for each outcome: child SEB outcomes = 246, parenting practices = 332, parenting satisfaction and efficacy = 285, parental adjustment = 174, parental relationship = 79, child observations = 76. It is highly unlikely that such large numbers of studies with null results exist, indicating the robustness of the findings to publication bias. For parent observations, Orwin’s failsafe N could not be computed as the overall effect size was below 0.10, the smallest meaningful effect size.
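Orwin’s failsafe N is simple arithmetic, which is part of why it reassures less than it appears to. It is the number of unretrieved studies, assumed to average some effect (usually zero), needed to drag the pooled mean effect down to a criterion value (here 0.10, the “smallest meaningful effect size” the authors use). A sketch with entirely made-up inputs:

```python
def orwin_failsafe_n(k, d_obs, d_criterion, d_missing=0.0):
    """Number of missing studies (averaging d_missing) needed to pull the
    overall mean effect of k retrieved studies down to d_criterion (Orwin, 1983)."""
    return k * (d_obs - d_criterion) / (d_criterion - d_missing)

# Hypothetical: 30 retrieved studies averaging d = 0.45, criterion d = 0.10.
print(orwin_failsafe_n(30, 0.45, 0.10))         # missing studies average zero
print(orwin_failsafe_n(30, 0.45, 0.10, -0.10))  # missing studies mildly negative
```

Note how the required number of missing studies is cut roughly in half once those studies are allowed to be mildly negative rather than exactly null, and that the formula takes no account at all of the quality or heterogeneity of the retrieved studies.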

Bringing in failsafe N in an evaluation of an effect size is still a common tactic in psychology, but the practice is widely condemned, and the Cochrane Collaboration specifically recommends against it as producing unreliable and invalid results.

I have discussed this in other blog posts, but let me point to some key objections. The first and most devastating is that their analyses have not provided an estimate of effect sizes in well-done, adequately sized trials. Rather, they produced a biased estimate based on poor-quality studies that should not have been included, including nonrandomized studies. Second, they have not dispensed with the heterogeneity that they found in the published studies, and so cannot generalize to whatever studies remain unpublished. Third, they assume that the unpublished studies are only null findings, whereas some of them could actually have demonstrated that 3P has a negative effect, particularly in comparison to active control groups, which were suppressed in this meta-analysis.

I am confident that clinical epidemiologists and those doing meta-analyses with biomedical data would reject out of hand this argument for the strength of 3P effects from failsafe N.


Despite all these objections, the article ends with a glowing conclusion, suitable for dissemination to funding agencies and posting on the websites promoting 3P.

The evolution of a blended system of parenting support involving both universal and targeted elements has been built on a solid foundation of ongoing research and development, and the testing of individual components comprising the intervention. The present findings highlight the value of an integrated multilevel system of evidence-based parenting programs and raise the real prospect that a substantially greater number of children and parents can grow up in nurturing family environments that promote children’s development capabilities throughout their lives.

I suggest that you now go back to the abstract and reevaluate whether you accept what it says.

The problems with this meta-analysis are serious but reflect larger problems in the evaluation of psychological treatments

I have identified serious problems with this meta-analysis, but we need to keep in mind that it got published in a respectable journal. That could only have happened if reviewers let it through. Finding it in the literature points to more pervasive problems in the research evaluating psychological treatments. Just as this meta-analysis does not provide valid estimates of the effectiveness of 3P, other meta-analyses are done by persons with conflicts of interest, and many of the studies entered into meta-analyses are conducted by persons with conflicts of interest and selectively report their data.

Journals have policies requiring disclosure of conflicts of interest under such circumstances, but at least in the psychological literature, you can find only a few examples of these policies being enforced.

The authors registered their protocol for conducting this meta-analysis but then did not adhere to it in important ways. Obviously, reviewers did not pay attention. It is actually uncommon for meta-analyses of psychological treatments to be preregistered, but any benefit of recommending or even requiring this is lost if reviewers do not check whether authors delivered what they promised.

A considerable proportion of the clinical trials included in the meta-analysis involved either one of the authors or another person with financial interests at stake. I did not do a thorough check, but in reviewing this literature I found no examples of conflicts of interest being declared. Again, journals have policies requiring disclosure, but these policies are worthless without enforcement.

Funding agencies increasingly require preregistration of the protocols for clinical trials before the first patient is enrolled, including their key hypotheses and primary outcomes. Yet, other than a few like PLOS One, most journals do not require that protocols be available or alert reviewers to the need to consult them. Authors are free to assess many candidate outcomes and decide which will be primary after examining their data. The bottom line is that we cannot rule out selective reporting of outcomes with a strong confirmatory bias. Meta-analyses of the existing literature on psychological treatments may offer exaggerated estimates of their efficacy because they integrate selectively reported outcomes.

The authors of this meta-analysis need to be called out on their failure to declare conflicts of interest, and the claims they make for the efficacy of 3P should be dismissed, particularly when they are at odds with other assessments not done by people with financial interests at stake. However, we should take this blatant failure of peer review and editorial oversight as an opportunity to demand reform. Until those reforms are achieved, some of which involve simply enforcing existing rules, the literature evaluating psychological treatments is suspect, and especially its meta-analyses.


Patients’ perspectives on schizophrenia: A Beautiful Mind, the movie


John Forbes Nash, Jr. photographed by Peter Badge

An old movie by now, Ron Howard’s highly awarded 2001 “A Beautiful Mind” gracefully stands the test of time among great movies about severe mental illness that illustrate the patient’s perspective.

This biographical movie shows the life story of John Forbes Nash, Jr., a Nobel Memorial Prize Laureate in Economics.

Nash, a brilliant mathematician, was diagnosed with schizophrenia in his early thirties. The movie follows Nash from his starting at Princeton as a young graduate student, to his late sixties, ending with Nash leaving the auditorium in Stockholm after accepting the Nobel Prize.

Directed by Ron Howard, starring Russell Crowe as John Nash, and winner of four Oscars and four Golden Globes, A Beautiful Mind is in a category of its own.

There are few movies that manage to win universal applause while also providing an accurate, non-romanticized record of the mental angst and despair that are part of severe mental illness.

A Beautiful Mind manages to do all of the above in an elegant and deeply touching manner. Furthermore, the movie offers a unique insider view of what schizophrenia feels like from the patient’s perspective. The movie not only illustrates what hallucinations are but also the quid pro quo relationship between hallucinated perceptions and a delusional interpretation of reality. The world as seen through Nash’s eyes in the midst of his psychosis is a terrifyingly confusing and threatening place. Things are connected in ways that are meant to be secret. Nash believes that he has both been given permission and has the unique ability to see these connections. The expected secrecy is a good explanation of why no one else can see what he sees.

The Reality of Illusion

In a sense, we all fabricate our realities. The way we know that our brand of “reality” is indeed real is by confronting it with the “realities” of other people out there. Family, friends, co-workers, the neighbor next door… Do our realities match? Bingo. Chances are that we are seeing it as it is; that our subjective reality matches the objective reality, whatever that is.

Nash is an able builder of his own reality. He lives in a complex world, full of abstract meaning and symbols, which may be invisible to the untrained eye. But lack of visibility does not equal nonexistence. That is his first challenge.

Further, it just so happens that part of Nash’s reality is genuinely secret. Thus, Nash does not have the benefit of those checks and balances that most of us have; he misses out on the opportunity to confront and match his subjective perceptions, to recalibrate and fine-tune his reality check. This is his second challenge.

Add to that grave disappointment rooted in tremendous expectations, fertilizing a mind whose strength is in moving fluidly between concepts that do not seem even remotely related. A mind that remains organized in the most disorganized way, so to speak.

And that is when John Nash’s tenuous reality breaks down into paranoia and hallucinations.

Of note, while auditory hallucinations are frequent, the visual hallucinations that carry on a good part of the movie are in fact rare in schizophrenia. In Nash’s own words:

Something like [visual hallucinations] this may appear in the movie [A Beautiful Mind], but of course the delusion pattern in the movie is going to be somewhat different, because the person sees things that aren’t there. It’s not just ideas or the idea of a plot or something, he thinks he sees certain persons. I never saw anything.

As the story unfolds there is no surprise to see that Nash only finds further confirmation of his paranoid fears in the fact “they” are successful in locking him up.

The plot thickens in a gut-sickening way. Viewers themselves are bewildered and traumatized by the many overlapping and self-concealing layers of whatever reality is. Being inside Nash’s mind is no treat, even when it happens from the comfort and safety of the spectator’s seat.

What is it “really” that a good treatment for schizophrenia does? 

We further understand why many psychiatric patients decide not to take their medications. Following insulin-shock therapy and a full course of antipsychotic medications, Nash’s beautiful mind seems shattered to pieces. It is as if a majestic, extravagantly colored, unreal bird, part flamingo, part bird of paradise, who once flew close to the sun, suddenly had his long wings clipped. The bird once soaring aimlessly in the depths of the sky is now orderly but also mindlessly dragging his feet in the mud.

It is important to note that this rather dire perspective on the effects of treatment is in fact not supported by John Nash’s own recollection of that period.

And it did happen that when I had been long enough hospitalized that I would finally renounce my delusional hypotheses and revert to thinking of myself as a human of more conventional circumstances and return to mathematical research. In these interludes of, as it were, enforced rationality, I did succeed in doing some respectable mathematical research.

The movie bluntly documents the devastating effects of schizophrenia on Nash’s personal and professional life. The struggle over whether to take medications, the challenge of facing one’s demons without medication support, the predictable yet still heart-wrenching return of the nightmarish hallucinations, the ambiguous relationship with the hallucinated reality itself, despite repeated reality “infusions” provided by family, friends, and doctors – we see all of it in its terrible glory. And we get to understand.

The End?

A Beautiful Mind closes with an aging Nash who, after painstakingly realigning the pieces of his life’s puzzle, is honored by his fellow Princeton professors, and who sees three of his old hallucinated friends/tormentors as he leaves the auditorium in Stockholm after receiving his Nobel Prize.

After being treated early on with medications, John Nash’s course of schizophrenia is one of partial recovery. Nash eventually chooses to stay off medications – a decision that leads to “25 years of partially deluded thinking,” in Nash’s own words.

Nash does attain a stable state off medication, but it appears that his symptoms do not completely go away – instead they decrease in intensity and go into partial remission. This illustrates that the prognosis of schizophrenia is not universally dire and that stability without medications can be achieved in specific cases. The movie also makes the counterpoint that Nash lost many years of productive life to untreated psychosis before he reached a final state of relative stability.

Further reading:

1. “John F. Nash Jr. – Biographical”. Nobel Media AB 2013. Web. 30 May 2014

Note: For an educational question and answer version of this post you can go here.


Using “Nashville” to Demonstrate The Truth about Flashbacks

I think it would be fair to say that most mental health professionals groan when watching how mental illness is represented on TV shows. Too often, popular culture portrays the lives of individuals living with mental illness (or the symptoms of psychiatric disorders) in a one-dimensional way that lacks nuance or, worse, is outright misleading and only serves to perpetuate the many myths and misperceptions about psychiatric illness that already exist in our society.


Image Credit: ABC


Last month, whilst watching episode 19 of Season 2 of the hit TV show Nashville, I could not help but be pleasantly surprised at the show’s careful depiction of the psychiatric symptom known as a “flashback.” In this episode the musical prodigy Scarlett O’Connor (played by the Australian actress Clare Bowen) experiences a flashback of childhood trauma whilst performing live on stage in front of tens of thousands of people.


Whilst the term “flashback” is used loosely in everyday culture to describe casual recollections of memories from earlier in life, mental health professionals have a different definition. We use the word flashback to describe a phenomenon in which an individual experiences involuntary recurrent memories. The experience is often sudden and usually a powerful re-experiencing of a past event or elements of it. The term is used particularly when the memory is so intense that the person “relives” the experience, and this reliving contributes to a serious disruption in the person’s life.

From my perspective as a Posttraumatic Stress Disorder (PTSD) expert, I find this psychiatric symptom particularly fascinating, as it is commonly associated with the traumatic experiences at the root of PTSD.


Definition of flashback


In the DSM V, references to flashbacks are found under the heading that describes PTSD; i.e., it is a symptom typically associated with PTSD. A flashback is an example of a dissociative reaction – i.e., the individual feels or acts as if the traumatic events were recurring. Such reactions occur on a continuum, with the most extreme expression being a complete loss of awareness of one’s present surroundings.

A related quote from DSM V: (please note, this text refers to the definition of flashback as it pertains to adults)


“The individual may experience dissociative states that last from a few seconds to several hours or even days, during which components of the event are relived and the individual behaves as if the event were occurring at that moment. Such events occur on a continuum from brief visual or other sensory intrusions about part of the traumatic event without loss of reality orientation, to complete loss of awareness of present surroundings.  These episodes, often referred to as “flashbacks” are typically brief but can be associated with prolonged distress and heightened arousal”


There are surprisingly few empirical publications on flashbacks and even fewer on their phenomenology. Flashbacks are a defining feature of posttraumatic stress disorder (PTSD), yet there have been few studies of their neural basis. The study of flashbacks is nonetheless becoming important, as they are given an increasingly prominent role in the diagnosis of the disorder.

Indeed, the precise definition or clinical nature of a flashback remains a matter of debate, even amongst neuroscientists and mental health professionals, and this was part of the reason I was so impressed with the depiction in Nashville.


What does it feel like to have a flashback?


If you were experiencing a flashback of a traumatic experience it could be so realistic that it feels as though you are living through the experience all over again. You experience the flashback in your mind, but may also feel the emotions and physical sensations – fear, sweating, smells, sounds, and pain – associated with the original trauma.

Flashbacks are often triggered by some kind of reminder of the original trauma; it can be something as simple as a sensory experience associated with the original trauma e.g. the scent of a particular perfume, the feel of raindrops on a wet day, or a sudden loud street noise.


I have witnessed patients having flashbacks of psychologically traumatic memories, and the depiction in Nashville was pretty authentic.


The Nashville episode was classic not only for the scenes that depicted the actual flashback, as experienced by Scarlett O’Connor; the narrative leading up to the terrible event was equally compelling and authentic.  In the weeks and months leading up to the “meltdown” on stage, we see a Scarlett who is working hard on a serious music album; she writes about her complicated, and often tortured, relationship with her mother.  Whilst her creative output is good, the process has been stressful and she has isolated herself from her closest friends and her maternal uncle—the key emotional supports in her life.  She starts to take illegally diverted prescription stimulants to help her stay awake so she can finish the album; this, combined with the stresses and strains of an intensive tour schedule, starts to take its toll.

Photo by ABCNetwork/YouTube (screen capture)


The “trigger” for Scarlett is the surprise arrival of her mother when she is on tour in San Francisco. Her mother, Beverly, is the main perpetrator of her childhood abuse.  Initially Beverly is civil, but we soon start to see the side of her that is emotionally abusive and physically intimidating toward Scarlett. Scarlett starts to experience brief flashbacks of childhood physical abuse and neglect, when she was locked in cupboards for hours at a time with no food, water, or access to a bathroom. Startled by the flashbacks and under pressure to perform on stage, she starts to consume alcohol to “deal” with the flashbacks.


This is common amongst individuals experiencing symptoms of posttraumatic stress – they “self medicate” their symptoms with alcohol or illicit drugs.  Unfortunately, intoxication often contributes to an overall worsening of symptoms, and this is what happens to Scarlett.


The weeks of stress, lack of sleep, and abuse of prescription stimulants combined with the pressure of performing on stage and a grueling tour schedule make her susceptible to experiencing mental health distress. The arrival of her mother in “real life,” who is the perpetrator of the childhood abuse, combined with Scarlett’s alcohol intoxication triggers a horrifically intense flashback which, unfortunately, occurs whilst she is on stage.

The episode is particularly valuable because it shows the experience of a flashback from Scarlett’s perspective. She is no longer the country music star performing on stage for a live audience; she is a seven-year-old girl, locked in a closet, terrified for her life as she listens to her mother, full of rage, ranting and raving outside the door.  And Scarlett behaves accordingly, retreating from light and noise and eventually curling up under her grand piano.

There are problems with the storyline: the episode is tactlessly titled “Crazy,” and Scarlett’s brief “hospitalization,” during which she is admitted to “get everything out of her system,” returns the storytelling to the familiar levels of inaccurate and overly simplistic portrayals of mental illness that I am used to groaning and moaning about.  Nonetheless, when it comes to mental health issues in TV shows, Scarlett’s performance scene was a refreshingly truthful depiction of a flashback, and one that I had nothing to groan about.


Are meta-analyses done by promoters of psychological treatments as tainted as those done by Pharma?

We would not waste time with a meta-analysis from Pfizer claiming the superiority of its antidepressant, particularly when it is a meta-analysis of trials mostly done by Pfizer. Bah, just another advertisement. What, the review was published? And without a declaration of conflict of interest? We should be outraged and doubt the integrity of the review process.

If we were not distrustful already, Ben Goldacre’s Bad Pharma taught us to distrust a drug company’s evaluation of its own product. Most randomized drug trials are industry funded. In these trials, the sponsor’s drug almost always triumphs. Overall, industry funded placebo-controlled trials of psychotropic medication are five times more likely to get positive results than those free of such funding.

My colleagues and I have successfully pushed for formalizing what was previously informal and inconsistent: in conducting a meta-analysis, the source of funding for an RCT should routinely be noted in the evaluation using the Cochrane Collaboration risk of bias criteria. Unless this risk of bias is flagged, authors of meta-analyses are themselves at risk of unknowingly laundering studies tainted by conflict of interest and coming up with seemingly squeaky clean effect sizes for the products of industry.

Of course, those effect sizes will be smaller if industry funded trials are excluded. And maybe the meta analysis would have come to a verdict of “insufficient evidence” if they are excluded.

My colleagues and I then took aim at the Cochrane Collaboration itself. We pointed out that this had been done only inconsistently in past Cochrane reviews. Shame on them.

They were impressed, set about fixing things, and then gave us the Bill Silverman Award. Apparently the Cochrane Collaboration is exceptionally big on someone pointing out when they are wrong, and so they reserve a special award for whoever does it best in any given year.

Bill Silverman was a founding member of the Cochrane Collaboration and pointed out that lots of people were making supposedly evidence-based statements that were wrong. That is why some sort of effort like the Cochrane Collaboration was needed. Silverman was a certified troublemaker. Our getting the award certifies us as having made trouble. I am taking that as a license to make some more.

Meta-analyses everywhere, but not a critical thought…

What is accepted as necessary for drug trials is routinely ignored for trials of psychotherapy treatments and meta-analyses integrating their results. Investigator allegiance has been identified as one of the strongest predictors of outcomes, regardless of the treatment that is being evaluated. But this does not get translated into enforcement of disclosures of conflict of interests by authors or by readers having heightened skepticism.

Yet, we routinely accept claims of superiority of psychological treatments made by those who profit from such advertisements. Meta-analyses allow them to make even bigger claims than single trials. And journals accept such claims and pass them on without conflict of interest statements. We seldom see any protesting letters to the editor. Must we conclude that no one is bothered enough to write?

Meta-analyses of psychological treatments with undisclosed conflicts of interest are endemic. We already know investigator allegiance is a better predictor of the outcome of a trial than whatever is being tested. But this embarrassment is explained away in terms of investigators’ enthusiasm for their treatment. More likely, results are spurious, inflated, or spun.  There is a high risk of bias associated with investigators having a dog in the fight. And a great potential for throwing the match with flexible rules of data selection, analysis, and interpretation (DS, A, and I). And then there is hypothesizing after results are known (HARKing).
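The inflation produced by flexible outcome selection is easy to demonstrate. Here is a minimal simulation sketch (the parameters are hypothetical, not drawn from any actual trial): if investigators of a trial of a completely ineffective treatment measure several outcomes and report whichever one reaches significance, the nominal 5% false-positive rate climbs sharply.

```python
import math
import random

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def false_positive_rate(n_trials=2000, n_per_arm=50, n_outcomes=3, seed=1):
    """Fraction of simulated *null* trials declared 'positive' when the
    reported 'primary' outcome is whichever of n_outcomes reaches p < .05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        for _ in range(n_outcomes):  # outcomes simulated as independent
            a = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
            b = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
            diff = sum(a) / n_per_arm - sum(b) / n_per_arm
            z = diff / math.sqrt(2.0 / n_per_arm)  # known-variance z statistic
            p = 2.0 * (1.0 - normal_cdf(abs(z)))   # two-sided p-value
            if p < 0.05:
                hits += 1
                break
    return hits / n_trials
```

With three independent outcomes, the expected rate is 1 − 0.95³ ≈ 14%, nearly triple the advertised 5%, and that is before adding subgroup analyses or flexible exclusion rules.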

It is scandalous enough that the investigators can promote their psychological products by doing their own clinical trials, but they can go further, they can do meta-analysis. After all, everybody knows that a meta-analysis is a higher form of evidence, a stronger claim, than an individual clinical trial.

Because meta-analyses are considered the highest form of evidence for interventions, they provide an important opportunity for promoters to brand their treatments as evidence-supported. Such branding is potentially worth millions in terms of winning contracts from governments for dissemination and implementation, consulting, training, and sales of materials associated with the treatment. Readers need to be informed of potential conflicts of interest of the authors of meta-analyses in order to make independent evaluations of claims.

And the requirement of disclosure ought to apply to reviewers who act as gatekeepers for what gets into the journals. Haven’t thought of that? I think you soon will. Read on.

This will be the first of two posts concerning the unreliability of a meta-analysis done by authors benefiting from the perception of their psychological treatment, Triple P Parenting Programs (3P), as “evidence-supported.” By the end of the second post, you will see how much work it takes to determine just how unreliable the meta-analysis is and how inflated an estimate it provides of the efficacy and effectiveness of 3P. Maybe you will learn to look for a few clues in examining other meta-analyses. But at least you will have learned to be skeptical.

No conflict of interest statement was included with the article. Furthermore, the bulk of the participants included were from studies in which investigators got financial benefits from a positive outcome, including, very often, one or more authors of this particular meta-analysis.

Nothing was said in the article, but an elaborate statement was made in the preregistration of the meta-analysis at the International Prospective Register of Systematic Reviews (PROSPERO), CRD42012003402. [Click on statement to enlarge.]

conflict of interest

Many journals, like PLOS One, routinely remind editors and reviewers to consult preregistrations and be alert to any discrepancies between the details of the preregistration and the published article, which are common and need to be clarified by the authors. Apparently this was not done.

It is not that Clinical Psychology Review lacks a policy concerning conflict of interest.

All authors are requested to disclose any actual or potential conflict of interest including any financial, personal or other relationships with other people or organizations within three years of beginning the submitted work that could inappropriately influence, or be perceived to influence, their work. See also Further information and an example of a Conflict of Interest form can be found at:

Interlude: What we learn from the example of 3P, we can apply elsewhere

In the early days of the promotion of acceptance and commitment therapy (ACT), small-scale studies produced positive results at a statistically improbable rate because of DS, A, and I and HARKing. Promoters of ACT then conducted a meta-analysis that was cited in claims made to Time Magazine that ACT was superior to established treatments. Yet, in the short time since I blogged about this, it has become apparent that ACT is not superior to other credible, structured therapies. Promoters’ praise of ACT continues to depend disproportionately on small, methodologically inadequate studies that, very often, they themselves conducted.

But promoters of ACT really do not need to worry. They are off doing workshops and selling merchandise. In the workshops, they often show dramatic interventions, like crying with patients, that have not been demonstrated to be integral to any efficacy of ACT but that are sure crowd-pleasers.

And their anecdotes and testimonials often involve applications of ACT that are off label, i.e., not supported by studies of similar patients with similar complaints. They have gotten the point that workshops are not primarily about “how to” and “when to” but about drama and entertainment. They have learned from Dr. Phil.

But no one’s going to revoke their uber alles evidence-supported branding.

In a future post, I will show the same phenomenon is beginning to be seen with mindfulness therapy. There was already an opening shot at this in my secondary blog.

I am sure there are other examples. What is different about 3P is the literature is huge and the issue of conflict of interest has now been so squarely placed on the table.

In this and the next post, I will show meta-analysis as self-promotion on a grand scale. And I will begin my analysis with a juicy tale of misbehavior by editors, reviewers, and authors.

I hope the story will anger those interested in the integrity of the literature concerning psychological treatments, be they promoters, policymakers, taxpayers footing the bill for treatment, or consumers who are often receiving treatment because it is mandated.

Why you should not be reading meta-analyses…unless …

Analyses of altmetrics suggest that when people find an interesting study with an internet search, most get only as far as the abstract, not downloading and reading the article. But chances are that they form opinions about those articles based solely on reading the abstract.

In the case of meta-analyses, forming an independent opinion can take multiple reads and the application of critical skills guided by a healthy dose of skepticism. And maybe going back to the original studies.

If you do not have the time, the skills, and the skepticism, you should not be reading meta-analyses. And you should not be accepting what you find in abstracts.

Should you stop reading meta-analyses?

Suspend judgment. Let me lay out for you a critical analysis and see if you would be willing to undertake it yourself. Or simply walk away, asking “WTF, why bother?”


Triple P Parenting is described by its developers

The Triple P – Positive Parenting Program is one of the most effective evidence-based parenting programs in the world, backed up by more than 30 years of ongoing research. Triple P gives parents simple and practical strategies to help them confidently manage their children’s behaviour, prevent problems developing and build strong, healthy relationships. Triple P is currently used in 25 countries and has been shown to work across cultures, socio-economic groups and in all kinds of family structures.

They proudly proclaim its branding as evidence supported–

No other parenting program in the world has an evidence base as extensive as that of Triple P. It is number one on the United Nations’ ranking of parenting programs, based on the extent of its evidence base.

An ugly and incredible story

Along came Phil Wilson, MD, PhD. He tried to express skepticism, but he got slapped down.

He sent off a manuscript describing a meta-analysis to Clinical Psychology Review. The manuscript noted how little evidence there was for the efficacy of 3P that was not tainted by investigator conflict of interest.

It is reasonable to presume that someone associated with the promoters of 3P was invited to review his manuscript.

  • The author’s employer was contacted and informed that Wilson had written a manuscript critical of 3P.
  • The manuscript was savaged.
  • Promoters of 3P sent Wilson some studies that had been published after the period covered by his meta-analysis.
  • The journal refused Wilson’s request for ascertainment of whether reviewers had disclosed conflicts of interest.

Wilson filed a formal complaint with the Committee on Publication Ethics (COPE). You can read it here and the letter from COPE to Clinical Psychology Review here.

Wilson’s rejected manuscript was nonetheless published in BMC Medicine. I praised it in a blog post, but noted that it had been insufficiently tough on the quality and quantity of the studies cited in the widely touted branding of 3P as evidence-supported. With Linda Kwakkenbos, I then reworked the blog post into an article that appeared in BMC Medicine.

Wilson’s meta-analysis

The meta-analysis is well reasoned and carefully conducted, but scathing in its conclusion:

In volunteer populations over the short term, mothers generally report that Triple P group interventions are better than no intervention, but there is concern about these results given the high risk of bias, poor reporting and potential conflicts of interest. We found no convincing evidence that Triple P interventions work across the whole population or that any benefits are long-term. Given the substantial cost implications, commissioners should apply to parenting programs the standards used in assessing pharmaceutical interventions.

My re-evaluation

My title says it all: Triple P-Positive Parenting programs: the folly of basing social policy on underpowered flawed studies.

My abstract

Wilson et al. provided a valuable systematic and meta-analytic review of the Triple P-Positive Parenting program in which they identified substantial problems in the quality of available evidence. Their review largely escaped unscathed after Sanders et al.’s critical commentary. However, both of these sources overlook the most serious problem with the Triple P literature, namely, the over-reliance on positive but substantially underpowered trials. Such trials are particularly susceptible to risks of bias and investigator manipulation of apparent results. We offer a justification for the criterion of no fewer than 35 participants in either the intervention or control group. Applying this criterion, 19 of the 23 trials identified by Wilson et al. were eliminated. A number of these trials were so small that it would be statistically improbable that they would detect an effect even if it were present. We argued that clinicians and policymakers implementing Triple P programs incorporate evaluations to ensure that goals are being met and resources are not being squandered.

You can read the open access article, but here is the crux of my critique

Many of the trials evaluating Triple P were quite small, with eight trials having less than 20 participants (9 to 18) in the smallest group. This is grossly inadequate to achieve the benefits of randomization and such trials are extremely vulnerable to reclassification or loss to follow-up or missing data from one or two participants. Moreover, we are given no indication how the investigators settled on an intervention or control group this small. Certainly it could not have been decided on the basis of an a priori power analysis, raising concerns of data snooping [14] having occurred. The consistently positive findings reported in the abstracts of such small studies raise further suspicions that investigators have manipulated results by hypothesizing after the results are known (harking) [15], cherry-picking and other inappropriate strategies for handling and reporting data [16]. Such small trials are statistically quite unlikely to detect even a moderate-sized effect, and that so many nonetheless get significant findings attests to a publication bias or obligatory replication [17] being enforced at some points in the publication process.
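The arithmetic behind this critique can be sketched quickly. Below is a back-of-envelope power calculation using a normal approximation to the two-sample t-test (a simplification; the sample sizes and effect size are illustrative, taken from the ranges discussed above, not from any specific 3P trial):

```python
import math
from statistics import NormalDist

def power_two_sample(n_per_group, d, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means
    (normal approximation to the t-test) for standardized effect size d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)   # ~1.96 for alpha = .05
    ncp = d * math.sqrt(n_per_group / 2.0)  # expected z under the alternative
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)
```

For a moderate effect of d = 0.5, a trial with 18 participants per group has power of roughly 0.32, and with 9 per group it drops below 0.20; only around 35 per group does power creep past 50%. A literature in which such trials are consistently “positive” is a literature with a problem.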

In response, promoters of 3P circulated on the Internet a manuscript that was labeled as being under review at Monographs of the Society for Research in Child Development. The manuscript is here. You can see that it explicitly cited Wilson’s meta-analysis and my commentary, but our main points are not recognizable.

Captured from Google Scholar

Click to enlarge


Many journals, including those of the American Psychological Association, expressly forbid circulating a manuscript labeled as being under review at that particular journal. Perhaps that contributed to the rejection of the manuscript. But it was resubmitted to Clinical Psychology Review and published—without any conflict of interest statement.

Here is the abstract of the abortive Monographs submission.

This systematic review and meta-analysis examined the effects of the multilevel Triple P-Positive Parenting Program system on a broad range of child, parent and family outcomes. Multiple search strategies identified 116 eligible studies conducted over a 33-year period, with 101 studies comprising 16,099 families analyzed quantitatively. Effect sizes for controlled and uncontrolled studies were combined using a random effects model for seven different outcomes. Moderator analyses were conducted using structural equation modeling. Risk of bias within and across studies was assessed. Significant short-term effects were found for: children’s social, emotional and behavioral outcomes (d = 0.473); parenting practices (d = 0.578); parenting satisfaction and efficacy (d = 0.519), parental adjustment (d = 0.340); parental relationship (d = 0.225) and child observational data (d = 0.501). Significant effects were found for all outcomes at long-term including parent observational data (d = 0.249). Separate analyses on available father data found significant effects on most outcomes. Moderator analyses found that study approach, study power, Triple P level, and severity of initial child problems produced significant effects in multiple moderator models when controlling for other significant moderators. Several putative moderators did not have significant effects after controlling for other significant moderators, including country, child developmental disability, child age, design, methodological quality, attrition, length of follow-up, publication status, and developer involvement. The positive results for each level of the Triple P system provided empirical support for a blending of universal and targeted parenting interventions to promote child, parent and family wellbeing.

It is instructive to compare the above abstract to what was subsequently published in Clinical Psychology Review. Presumably, there should be some differences arising from the manuscript having been peer-reviewed. The original manuscript was submitted as a monograph, and so its length had to be cut to conform to the limits of Clinical Psychology Review. But if you compare the two, some obvious flaws in the original manuscript were retained in the published version. Obviously, peer review was deficient in not noticing them and otherwise left little mark on the final, published version.

Unfortunately, the subsequent Clinical Psychology Review article is behind a paywall, but you can write to the senior author and request a PDF. Here is the abstract:

This systematic review and meta-analysis examined the effects of the multilevel Triple P-Positive Parenting Program system on a broad range of child, parent and family outcomes. Multiple search strategies identified 116 eligible studies conducted over a 33-year period, with 101 studies comprising 16,099 families analyzed quantitatively. Moderator analyses were conducted using structural equation modeling. Risk of bias within and across studies was assessed. Significant short-term effects were found for: children’s social, emotional and behavioral outcomes (d = 0.473); parenting practices (d =0.578); parenting satisfaction and efficacy (d = 0.519), parental adjustment (d = 0.340); parental relationship (d = 0.225) and child observational data (d = 0.501). Significant effects were found for all outcomes at long-term including parent observational data (d = 0.249). Moderator analyses found that study approach, study power, Triple P level, and severity of initial child problems produced significant effects in multiple moderator models when controlling for other significant moderators. Several putative moderators did not have significant effects after controlling for other significant moderators. The positive results for each level of the Triple P system provide empirical support for a blending of universal and targeted parenting interventions to promote child, parent and family wellbeing.

Impressive, hey? Certainly the effect sizes are, and they contrast sharply with Wilson’s. If you had not been sensitized by my blog post, would you be inclined simply to accept the conveyed conclusion that the meta-analysis provides resounding support for 3P?

You would never know it from the abstract, but effect sizes from nonrandomized studies were combined with effect sizes from RCTs. And RCTs with head-to-head comparisons between 3P and other active treatments had the comparison groups dropped, so that these studies were no longer treated as RCTs.
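Why does mixing uncontrolled studies into the pool matter so much? A minimal sketch of random-effects pooling (DerSimonian-Laird, the standard estimator; all effect sizes below are hypothetical, not taken from the 3P literature) shows how a few uncontrolled pre-post studies with inflated apparent effects drag the pooled estimate upward:

```python
def dersimonian_laird(effects, variances):
    """Pool effect sizes with the DerSimonian-Laird random-effects model.
    Returns (pooled effect, estimated between-study variance tau^2)."""
    k = len(effects)
    w = [1.0 / v for v in variances]                       # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                     # heterogeneity estimate
    w_star = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2

# Three hypothetical RCTs with modest effects (d) and equal variances...
rct_d, _ = dersimonian_laird([0.20, 0.30, 0.25], [0.04] * 3)
# ...then the same RCTs pooled together with two uncontrolled pre-post
# studies whose apparent effects are inflated.
mixed_d, _ = dersimonian_laird([0.20, 0.30, 0.25, 0.80, 0.90], [0.04] * 5)
# Pooled d rises from 0.25 (RCTs only) to 0.49 (RCTs + uncontrolled).
```

In this toy example, adding two uncontrolled studies nearly doubles the pooled effect size, without a single additional randomized comparison. The random-effects machinery happily averages in whatever it is given; the inflation comes entirely from what the analysts chose to include.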

I do hope that you form your own opinion by obtaining a copy of the full article. You can also compare it to the PROSPERO registration.

Within 10 days of today, May 20, 2014, I will be posting my critique and we can compare notes.

A much briefer version of this blog post was previewed at my secondary blog. It was narrowly focused on the scandalous events concerning Wilson’s manuscript and not the larger context of how we view meta-analyses conducted by authors who have vested financial interest.
