“Strong evidence” for a treatment evaporates with a closer look: Many psychotherapies are similarly vulnerable.

Note: BMC Medicine subsequently invited a submission based on this blog post, which has since been published as an open-access article:

Coyne, J. C., & Kwakkenbos, L. (2013). Triple P-Positive Parenting programs: the folly of basing social policy on underpowered flawed studies. BMC Medicine, 11(1), 11.

Promoters of Triple P parenting enjoy opportunities that developers and marketers of other “evidence-supported” psychosocial interventions and psychotherapies only dream of. With a previously uncontested designation as strongly supported by evidence, Triple P is being rolled out by municipalities, governmental agencies, charities, and community-based programs worldwide. These efforts generate lots of cash from royalties and license fees, training, workshops, and training materials, in addition to the prestige of being able to claim that an intervention has navigated the treacherous path from RCT to implementation in the community.

With hundreds of articles extolling its virtues, dozens of randomized trials, and consistently positive systematic reviews, the status of the Triple P parenting intervention as evidence-supported would seem beyond being unsettled by yet another review. Some of the RCTs are quite small, but there are also population-level evaluations, including one involving 7,000 children from child protective services. Could this be an instance in which it should be declared “no further research necessary”? Granting agencies have declined to fund further evaluation of other interventions on the basis of much smaller volumes of seemingly less unanimous data.

But the weaknesses revealed in a recent systematic review and meta-analysis of Triple P by Philip Wilson and his Scottish colleagues show how apparently strong evidence can evaporate when it is given a closer look. Other apparently secure “evidence-supported” treatments undoubtedly share these weaknesses, and the review provides a model of where to look. But when I took a careful look, I discovered that Wilson and colleagues glossed over a very important weakness in the body of evidence for Triple P. They noted it, but didn’t dwell on it. So the weakness in the body of evidence for Triple P is much greater than a reader might conclude from Wilson and colleagues’ review.

WARNING! Spoiler ahead. At this point, readers might want to download the article and form their own impressions before reading on and discovering what I found. If so, they can click on this link and access the freely available, open-access article.

Wikipedia describes Triple P as

a multilevel parenting intervention with the main goal of increasing the knowledge, skills, and confidence of parents at the population level and, as a result, reduce the prevalence of mental health, emotional, and behavioral problems in children and adolescents. The program is a universal preventive intervention (all members of the given population participate) with selective interventions specifically tailored for at risk children and parents.

A Triple P website for parents advertises

the international award winning Triple P – Positive Parenting Program®, backed by over 25 years of clinically proven, world wide research, has the answers to your parenting questions and needs. How do we know? Because we’ve listened to and worked with thousands of parents and professionals across the world. We have the knowledge and evidence to prove that Triple P works for many different families, in many different circumstances, with many different problems, in many different places!

The Triple P website for practitioners declares

As an individual practitioner or a practitioner working within an organisation you need to be sure that the programs you implement, the consultations you provide, the courses you undertake and the resources you buy actually work.

Triple P is one of the only evidence-based parenting programs available worldwide, founded on over 30 years of clinical and empirical research.

Disappearing positive evidence

In taking stock of Triple P, Wilson and colleagues applied objective criteria in a way that readily allows independent evaluation of their results.

They identified 33 eligible studies, almost all of them reporting that Triple P has positive effects on child adjustment.

  • Most of the 33 studies involved media-recruited families, so that participants in the trials were self-selected and likely more motivated than clients referred from community services or involuntarily receiving treatment mandated by child protection agencies.
  • 31/33 studies compared Triple P interventions with waiting-list or no-treatment comparison groups. This suggests that Triple P may be better than doing nothing with these self-referred families, but it doesn’t control for simply providing attention, support, and feedback. The better outcomes for families getting Triple P versus wait list or no treatment may reflect families assigned to these control conditions registering their disappointment at not getting what they had sought in answering the media ads.
  • In contrast, the two studies involving an active control group showed no differences between groups.
  • The trials evaluating Triple P typically administered a battery of potential outcomes, and there is no evidence in any trial that particular measures were chosen ahead of time as the primary outcomes. There was considerable inconsistency among studies using the same instruments in decisions about which subscales were reported and emphasized. Not declaring outcomes ahead of time provides a strong temptation for selective reporting of outcomes: investigators analyze the data, decide which measures put Triple P in the most favorable light, and declare those outcomes post hoc to be primary (see the simulation sketch after this list).
  • Selective reporting of outcomes occurred in the abstracts of these studies. Only 4/33 abstracts reported any negative findings, and 32/33 abstracts were judged to give a more favorable picture of the effects of Triple P than the results sections supported.
  • Most papers reported only maternal assessments of child behavior, and the small number of studies that obtained assessments from fathers did not find positive treatment effects from the fathers’ perspective. This may simply indicate the detachment and obliviousness of the fathers, but it may also point to a bias in the reports of mothers, who had made more of an investment in getting treatment.
  • Comparisons of intervention and control groups beyond the duration of the intervention were only possible in five studies. So, positive results may be short-lived.
  • Of the three trials that tested population-level effects of Triple P, two were not randomized trials but had quasi-experimental designs with significant intervention-control group differences at baseline. The third trial reported a reduction in child maltreatment, but examination of the results indicates that this was due to an unexplained increase in child maltreatment in the control area, not a decrease in the intervention area.
  • Thirty-two of the 33 eligible studies were authored by Triple P-affiliated personnel, but only two had a conflict-of-interest statement. Not only is there a strong possibility of investigator allegiance affecting the reported outcomes of trials; there are also undeclared conflicts of interest.
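
To make the selective-outcome-reporting problem concrete, here is a minimal simulation sketch in Python. The parameters (ten outcomes correlated at r = .3, 20 participants per arm) are my own assumptions for illustration, not values from any Triple P trial. It shows how often a trial with no true treatment effect can still declare at least one “significant” primary outcome post hoc:

```python
# Sketch: inflation of false positives when the "primary" outcome is
# chosen post hoc from a battery of measures. All numbers hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_outcomes, n_per_arm, r = 5000, 10, 20, 0.3
# moderately correlated outcome battery, no true treatment effect
cov = np.full((n_outcomes, n_outcomes), r) + (1 - r) * np.eye(n_outcomes)

hits = 0
for _ in range(n_trials):
    treat = rng.multivariate_normal(np.zeros(n_outcomes), cov, n_per_arm)
    ctrl = rng.multivariate_normal(np.zeros(n_outcomes), cov, n_per_arm)
    pvals = [stats.ttest_ind(treat[:, j], ctrl[:, j]).pvalue
             for j in range(n_outcomes)]
    hits += min(pvals) < 0.05   # "best" outcome declared primary post hoc

print(f"Null trials with a 'significant' primary outcome: {hits / n_trials:.0%}")
# With these assumptions, far above the nominal 5% -- roughly 25-35%.
```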

The dominance of small, underpowered studies

Wilson and colleagues noted a number of times in their review that many of the trials are small, but they did not dwell on how many, how small, or with what implications. My colleagues and I have adopted a lower limit of 35 participants in the smallest group for inclusion of trials in meta-analyses. The rationale is that any trial smaller than this does not have even a 50% probability of detecting a moderate-sized effect, if one is present. Small trials are subject to publication bias: results that do not reach statistical significance tend not to get published, because the trial was insufficiently powered to obtain a significant effect. On the other hand, when significant results are obtained, they are greeted with great enthusiasm precisely because the trials are so small. Small trials, when combined with flexible rules for deciding when to stop a trial (often based on a peek at the data), failure to specify primary outcomes ahead of time, and flexible rules for analyses, can usually be made to appear to yield positive findings, but findings that will not be replicated. Small studies are vulnerable to outliers and sampling error, and randomization does not necessarily equalize groups on differences that can prove crucial in determining results. Combining published small trials in a meta-analysis does not address these problems, because of publication bias and because all or many of the trials share the same methodological problems.
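
That 35-per-group threshold is easy to check. Here is a quick sketch using statsmodels’ power routines, under the conventional assumptions of a two-sided independent-samples t-test at alpha = .05, with Cohen’s d = 0.5 standing in for a “moderate” effect:

```python
# Power of a two-arm trial to detect d = 0.5 at alpha = .05 (two-sided).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n in (35, 18, 9):   # the threshold, and the smallest cells in the review
    power = analysis.power(effect_size=0.5, nobs1=n, alpha=0.05,
                           ratio=1.0, alternative='two-sided')
    print(f"n = {n:>2} per group: power = {power:.2f}")
# n = 35: ~0.54; n = 18: ~0.30; n = 9: ~0.17
```

With 35 per group, power just clears 50%; with the 9-to-18 cells common in the Triple P literature, it falls to roughly 17-30%, which is why a run of consistently positive small trials is itself a red flag.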

What happens when we apply this exclusion criterion of <35 participants in the smallest group to the Triple P trials? Looking at Table 2 in Wilson and colleagues’ review, we see that 20/23 of the individual papers included in the meta-analyses are excluded. Many of the trials are quite small, with eight trials having fewer than 20 participants (9 to 18) in the smallest group. Such trials should be statistically quite unlikely to detect even a moderate-sized effect, and that so many nonetheless report significant findings attests to a publication bias. Think of it: with such small cell sizes, the arbitrary addition or subtraction of a single participant can alter results. Figure 2 in the review provides the forest plot of effect sizes for two of the key outcome measures reported in Triple P trials. Small trials account for the outlier strongest finding, but also for the weakest, underscoring sampling error. Meta-analyses attempt to control for the influence of small trials by introducing weights, but this strategy fails when the bulk of the trials are small. Again examining Figure 2, we see that even with the weights, small trials still account for over 83% of the contribution to the overall effect size. Of the three trials that are not underpowered, two have nonsignificant effects entered into the meta-analysis. The confidence interval for the one moderate-sized trial that is positive barely excludes zero (lower bound .06).
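
To see why weighting cannot rescue a literature in which 20 of 23 trials are underpowered, here is a sketch of fixed-effect inverse-variance pooling. The per-arm sample sizes and effect sizes below are hypothetical, chosen only to mirror the review’s 20/23 split and the familiar pattern of small-study effects; they are not the review’s data:

```python
# Hypothetical illustration of inverse-variance weighting when most
# trials are small. Small trials report inflated effects (d = 0.6);
# the three adequately powered trials find d = 0.1.
import numpy as np

n = np.array([9, 10, 12, 12, 14, 15, 16, 16, 18, 18,
              20, 20, 22, 24, 25, 26, 28, 30, 32, 34,   # 20 small trials
              35, 54, 96])                               # 3 adequate trials
d = np.where(n < 35, 0.6, 0.1)

var_d = 2 / n + d**2 / (4 * n)   # approximate variance of Cohen's d
w = 1 / var_d                    # inverse-variance weights

pooled = np.sum(w * d) / np.sum(w)
small_share = w[n < 35].sum() / w.sum()
print(f"pooled d = {pooled:.2f}")                             # ~0.44
print(f"weight carried by small trials = {small_share:.0%}")  # ~68%
```

Even though the adequately powered trials find almost nothing, the pooled estimate lands near the small trials’ inflated value, because the small trials collectively carry most of the weight.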

Wilson and colleagues pointed to serious deficiencies in the body of evidence supporting the efficacy of Triple P parenting programs, but once we exclude underpowered trials, there is little evidence left.

Are Triple P parenting programs ready for widespread dissemination and implementation?

Rollouts of the kind that Triple P is now undergoing are expensive and consume resources that will not be available for alternatives. Yet, critical examination of the available evidence suggests little basis for assuming that Triple P parenting programs will have benefits commensurate with their cost.

In contrast to the self-referred families studied in randomized trials, the families in the community are likely to be more socially disadvantaged, more often headed by a single parent, and often coming to treatment only because of pressure or even mandated attendance. Convenience samples of self-referred participants are acceptable in the early stages of evaluation of an intervention, but ultimately the most compelling evidence must come from participants more representative of the population who will be treated in the community.

Would other evidence supported interventions survive this kind of scrutiny?

Triple P parenting interventions have the apparent support of a large literature that is unmatched in size by most treatments claiming to be evidence supported. In a number of articles and blog posts, I have shown that other treatments claimed to be evidence supported often have only weak evidence. Similar to Triple P, other treatments are largely evaluated by investigators who have vested financial and professional interests in demonstrating their efficacy, in studies that are underpowered and at high risk of bias, notably in the failure to specify which of the many outcomes assessed are primary. Similar to Triple P, psychotherapies routinely get labeled as having strong evidence based solely on studies that involve comparisons with no-treatment or waitlist controls. Effect sizes exaggerate the advantage of these therapies over patients simply getting nonspecific, structured opportunities for attention, support, and feedback under conditions of positive expectations. And, finally, similar to what Wilson and colleagues found for Triple P, there are often large gaps between the way findings are depicted in the abstracts of reports of RCTs and what can be learned from the results sections of the actual articles.

In a recent blog post, I also showed that American Psychological Association Division 12 (Clinical Psychology) had designated Acceptance and Commitment Therapy (ACT) as having strong evidence for efficacy in hospitalized psychotic patients, only to have that designation removed when I demonstrated that the basis for this judgment was two small, flawed, null trials. Was that shocking or even surprising? Stay tuned.

In coming blog posts, I will demonstrate problems with claims of other treatments being evidence-based, but hopefully this post provides readers with tools to investigate for themselves.

This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution 3.0 Unported License.


13 Responses to “Strong evidence” for a treatment evaporates with a closer look: Many psychotherapies are similarly vulnerable.

  1. june conway beeby says:

    Scientific brain research has unequivocally demonstrated that schizophrenia, manic depression and related disorders are biological diseases of the brain.

    Families cannot prevent these diseases of the brain nor can they cause them.
    So any advice, suggestion, or finger-pointing at families to convince them they have the power to prevent serious mental illnesses in their offspring is quack advice.

    It’s time families made it a point to learn more about the diseases that affect their loved ones so they will no longer be easy targets for mistaken or useless suggestions for treatment.

    A reading of well-known science writer Jay Ingram’s book “Fatal Flaws: How a Misfolded Protein Baffled Scientists and Changed the Way We Look at the Brain” is a perfect way to understand the new knowledge.

    We are living in an age where science can reliably inform us about the reality of serious mental illnesses. We should no longer swallow what author and psychologist Steven Pinker calls “the usual social science rat’s nest of confounded variables.”

    We must become wise parents to protect our mentally ill loved ones from the false prophets.

  2. JDS says:

    In a commentary article, the arguments of Wilson et al. are themselves exposed as weak.
    Can you give a reaction?

    • James Coyne PhD says:

      I will be limited in my comments because BMC Medicine has invited me to publish a commentary on both the Wilson article and the reply from Sanders et al., who, by virtue of their professional and financial benefits from the promotion of Triple P, have a serious conflict of interest in struggling to protect a positive evaluation of Triple P from critics’ skepticism and scrutiny.
      I was quite disappointed in the reply from the Triple P group in terms of its defensiveness and evasiveness. Their main complaints with Wilson et al. seem to be (a) that the critical systematic review and meta-analysis were not pre-registered; (b) that the deficiencies in the Triple P literature that Wilson and colleagues identified are shared by other treatments claimed to be evidence-supported; and (c) that Wilson and colleagues ignored the literature not published in English.
      My response is that Wilson and colleagues were sufficiently explicit and transparent in their procedures and criteria that anyone, including Sanders and colleagues, can readily replicate them. I challenge Sanders and colleagues to do so and see if they come up with a contradictory assessment. I’ll bet they do not. Secondly, the argument that “everyone does it” does not let proponents of Triple P off the hook, particularly when they make such extravagant claims on their websites around the world. Finally, it is quite common for meta-analyses and systematic reviews to be limited to what is available in English. If Sanders et al. can draw upon non-English articles to refute the strong conclusions of Wilson and colleagues, I welcome them to do that. Having done systematic reviews both limited to English and conducted without that restriction, I don’t recall any conclusions that were unseated by consideration of a non-English literature, particularly when interventions have originated and are so heavily promoted in English-speaking countries.
      However, both Wilson and colleagues and Sanders and colleagues ignored what for me was the most important issue. Wilson and colleagues were far too restrained and gentle in their critique, avoiding the most devastating criticism of the Triple P literature. Namely, as I’ve explained in my blog post, the evidence base for promoting and disseminating Triple P is largely dependent on small, grossly underpowered, and flawed studies. When I set a reasonable criterion of limiting consideration to studies that had more than a 50% probability of detecting a moderate-sized effect if it were there, the evidence for Triple P disappeared into thin air. Eight of the Triple P trials had only 9 to 18 participants per cell. This is grossly inadequate to achieve the benefits of randomization, and such trials are extremely vulnerable to reclassification of, loss to follow-up of, or missing data from a single participant. That such small studies consistently report positive findings in their abstracts is statistically improbable and strongly indicates gross confirmatory bias. Consistent positive findings reported in the abstracts of such small studies raise suspicions that investigators have manipulated results by data snooping, HARKing (hypothesizing after the results are known), cherry-picking, and other inappropriate strategies for handling and reporting data. As Wilson and colleagues showed, you just cannot trust what you read in the abstracts of clinical trials evaluating Triple P.
      I think it is incumbent upon Sanders and colleagues to demonstrate, perhaps drawing on non-English sources, that there is a basis other than the flawed, underpowered studies for the extravagant claims on Triple P websites, or to bring the claims on the websites into better congruence with the best evidence.
      Overall, I find Sanders et al.’s unjustified dismissal of Wilson and colleagues’ critique to be a clear demonstration that we cannot depend on authors with serious conflicts of interest and vested interests in particular treatments to offer an objective response to criticism.
      The fundamental issue is that communities around the world may be pinning their hopes and investing their scarce resources in the expectation that Triple P will eliminate the problems of child adjustment related to poor parenting. I think that Wilson and colleagues have done a public service by raising the possibility that these hopes are misplaced and the resources are being squandered. Certainly, I do not think that an investment in Triple P on a population basis is justified without a substantial commitment of resources to evaluating whether the program is indeed effective in populations with which it has not been adequately evaluated.

  3. R Schappin says:

    Unfortunately, the “belief” in Triple P is still widespread. We recently conducted an independent RCT of Triple P level 3, in which Triple P was not more effective than a wait-list control group. However, publishing these results is difficult; even from high-impact journals we get biased reviews based on the outcome of our trial (such as suggestions to frame our results as exploratory). As long as negative results of studies remain difficult to publish, the problems with labeling interventions as ‘evidence based’ will persist.

    • James Coyne PhD says:

      I have received emails from other groups who complain of bias and discrimination in reviews of their trials of Triple P with null or weak findings. Given the positive findings that are statistically improbable but nonetheless obtained in published trials, I think there is a strong confirmatory bias at play here. I think there is a culture of obligatory replication, a concept introduced by John Ioannidis, operating in the Triple P literature.

      I suggest that you do two things. First, exercise your right to request exclusion of reviewers whom you feel may be biased, including based on their conflicts of interest. Second, submit to a journal that is quite receptive to null findings and failures to replicate, such as PLOS One.

      Good luck!

  4. JDS says:

    Thank you for your clear answer. I am looking forward to your article in BMC Medicine, which, I hope, will be available on PubMed soon.

    To be honest, not just for the important issue of evidence and the unmasking of the financial motives behind such a pervasive parenting method, but also because of my disbelief in the behavior-centered approach Triple P seems to stand for…

  5. Pingback: Raising kids by Triple P, not that sure if it works… | From experience to meaning…

  6. Bruce Thyer says:

    I edit a bimonthly journal called “Research on Social Work Practice.” It has the highest impact factor of the ‘real’ social work journals and about 7,500 subscribers. It is produced by Sage Publications and is in its 21st year of publication.

    RSWP WELCOMES well-designed studies with negative or null results. Anyone having problems publishing such a study should feel free to contact me about the possibility of submitting it to RSWP.

    Negative studies can be immensely valuable. Look, for example, at the numerous studies which failed to document an association between childhood vaccinations and autism. Or that Facilitated Communication is not an effective method to help persons with autism communicate.

    • Bravo, Bruce! And facilitated communication with autism not being effective is exactly the kind of negative finding that needs this kind of exposure.

  7. Pingback: Some thoughts on Triple P and evidence based practice | Dr Jackie Kirkham

    • This is a wonderful, heartfelt response to my blog post. The image of being required to warn low-income minority parents to keep their children away from the family swimming pool when it is unattended, as has to be done with Triple P, is absolutely precious and will stick with me.

