Reanalysis: No health benefits found for pursuing meaning in life versus pleasure

NOTE: After I wrote this blog post, I received via PNAS the reply from Steve Cole and Barbara Fredrickson to our article. I did not have time to thoroughly digest it, but will address it in a future blog post. My preliminary impression is that their reply is, ah…a piece of work. For a start, they attack our mechanical bitmapping of their data as an unvalidated statistical procedure. But calling it a statistical procedure is like Sarah Palin calling Africa a country. And they again assert the validity of their scoring of a self-report questionnaire without documentation. As seen below, I had already offered to donate $100 to charity if they can produce the unpublished analyses that justified this idiosyncratic scoring. The offer stands. They claim that our factor analyses were inappropriate because the sample size was too small, but we used their data, which they claimed to have factor analyzed. Geesh. But more on their reply later.

Our new PNAS article questions the reliability of results and interpretations in a high profile previous PNAS article.

Fredrickson, Barbara L., Karen M. Grewen, Kimberly A. Coffey, Sara B. Algoe, Ann M. Firestine, Jesusa M. G. Arevalo, Jeffrey Ma, and Steven W. Cole. “A functional genomic perspective on human well-being.” Proceedings of the National Academy of Sciences 110, no. 33 (2013): 13684-13689.

 

[Image: from the Oakland Journal, http://theoaklandjournal.com/oaklandnj/health-happiness-vs-meaning/ (http://tinyurl.com/lpbqqn6)]

Was the original article a matter of “science” made for press release? Our article raises issues concerning the gullibility of the scientific community and journalists regarding claims of breakthrough discoveries from small studies with provocative but fuzzy theorizing and complicated methodologies and statistical analyses that apparently even the authors themselves do not understand.

  •  Multiple analyses of original data do not find separate factors indicating striving for pleasure versus purpose
  • Random number generators yield best predictors of gene expression from the original data

[Warning, numbers ahead. This blog post contains some excerpts from the results section that contain lots of numbers and require some sophistication to interpret. I encourage readers to at least skim these sections, to allow independent evaluation of some of the things that I will say in the rest of the blog.]

A well-orchestrated media blitz for the PNAS article had triggered my skepticism. The Economist, CNN, The Atlantic Monthly and countless newspapers seemingly sang praise in unison for the significance of the article.

Maybe the research reported in PNAS was, as one of the authors, Barbara Fredrickson, claimed, a major breakthrough in behavioral genomics, a science-based solution to an age-old philosophical problem of how to lead one’s life. Or, as she later claimed in a July 2014 talk in Amsterdam, the PNAS article provided an objective basis for moral philosophy.

Maybe it showed

People who are happy but have little to no sense of meaning in their lives—proverbially, simply here for the party—have the same gene expression patterns as people who are responding to and enduring chronic adversity.

Skeptical? Maybe you are paying too much attention to your conscious mind. What does it know? According to author Steve Cole

What this study tells us is that doing good and feeling good have very different effects on the human genome, even though they generate similar levels of positive emotion… “Apparently, the human genome is much more sensitive to different ways of achieving happiness than are conscious minds.”

Or maybe this PNAS article was an exceptional example of the kind of nonsense, pure bunk, you can find in a prestigious journal.

Assembling a Team.

I blogged about the PNAS article. People whom I have yet to meet expressed concerns similar to mine. We began collaborating, overcoming considerable differences in personal style but taking advantage of complementary skills and background.

It all started with a very tentative email exchange with Nick Brown. He brought on his co-author from his American Psychologist article demolishing the credibility of a precise positivity ratio, Harris Friedman. Harris in turn brought on Doug McDonald to examine Fredrickson and Cole’s claims that factor analysis supported their clean distinction between two forms of well-being with opposite effects on health.

Manoj Samanta found us by way of my blog post and then a Google search that took him to Nick and Harris’s article with Alan Sokal. Manoj cited my post in his own blog. When Nick saw it, he contacted Manoj. Manoj was working in genomics, attempting to map the common genomic basis for the evolution of electric organs in fish from around the world, but was a physicist in recovery. He was delighted to work with a couple of guys who had co-authored a paper with his hero from grad school, Alan Sokal. Manoj interpreted Fredrickson and Cole’s seemingly unnecessarily complicated approach to genomic analysis. Nick set off to deconstruct and reproduce Cole’s regression analyses predicting genomic expression. He discovered that Cole’s procedure generated statistically significant (but meaningless) results from over two-thirds of the thousands of ways of splitting the psychometric data. Even using random numbers produced huge numbers of junk results.

The final group was Nick, Doug, Manoj, Harris, and myself. Others came and went from our email exchanges, some accepting our acknowledgment in the paper, while others asked us explicitly not to acknowledge them.

The team gave an extraordinarily careful look at the article, noting its fuzzy theorizing and conceptual deficiencies, but we did much more than that. We obtained the original data and asked the authors of the original paper about their complex analytic methods. We then reanalyzed the data, following their specific advice. We tried alternative analyses and even re-did the same analyses with randomly generated data. Overall, our hastily assembled group performed and interpreted thousands of analyses, more than many productive labs do in a year.

The embargo on our paper in PNAS is now off.

I can report our conclusion that

Not only is Fredrickson et al.’s article conceptually deficient, but more crucially statistical analyses are fatally flawed, to the point that their claimed results are in fact essentially meaningless.

A summary of our PNAS article is available here and the final draft is here.

Fuzzy thinking creates theoretical and general methodological problems

Fredrickson et al. claimed that two types of strivings for well-being, eudaimonic and hedonic, have distinct and opposite effects on physical health, by way of “molecular signaling pathways” or genomic expression, despite an unusually high correlation between two supposedly different variables. I had challenged the authors about the validity of their analyses in my earlier blog post and then in a letter to PNAS, but got blown off. Their reply dismissed my concerns, citing analyses that they have never shown, either in the original article or the reply.

In our article, we noted a subtlety in the distinction between eudaimonia and hedonia.

Eudaimonic well-being, generally defined (including by Fredrickson et al.) in terms of tendencies to strive for meaning, appears to be trait-like, since such striving for meaning is typically an ongoing life strategy.

Hedonic well-being, in contrast, is typically defined in terms of a person’s (recent) affective experiences, and is state-like; regardless of the level of meaning in one’s life, everyone experiences “good” and “bad” days.

The problem is

If well-being is a state, then a person’s level of well-being will change over time and perhaps at a very fast rate.  If we only measure well-being at one time point, as Fredrickson et al. did, then unless we obtain a genetic sample at the same time, the likelihood that the well-being score will actually accurately reflect level of genomic expression will be diminished if not eliminated.

In an interview with David Dobbs, Steven Cole seems to suggest an irreversibility to the changes that eudaimonic and hedonic strivings produce:

“Your experiences today will influence the molecular composition of your body for the next two to three months,” he tells his audience, “or, perhaps, for the rest of your life. Plan your day accordingly.”

Hmm. Really? Evidence?

Eudaimonic and hedonic well-being constructs may have a long history in philosophy, but empirically separating them is an unsolved problem. And taken together, the two constructs by no means capture the complexity of well-being.

Is a scientifically adequate taxonomy of well-being on which to do research even possible? Maybe, but doubts are raised when one considers the overcrowded field of well-being concepts available in the literature—

General well-being, subjective well-being, psychological well-being, ontological well-being, spiritual well-being, religious well-being, existential well-being, chaironic well-being, emotional well-being, and physical well-being—along with the various constructs which are treated as essentially synonymous with well-being, such as self-esteem, life-satisfaction, and, lest we forget, happiness.

No one seems to be paying attention to this confusing proliferation of similar constructs and how they are supposed to relate to each other. But in the realm of negative emotion, the problem is well known and variously referred to as the “big mush” or “crud factor”. Actually, there is a good deal of difficulty separating out positive well-being concepts from their obverse concepts, negative well-being.

Fredrickson and colleagues found that eudaimonic and especially hedonic well-being were strongly, but negatively, related to depression. Their measure of depression qualified as a covariate or confound for their analyses, but somehow disappeared from further consideration. If it had been retained, it would have further reduced the analyses to gobbledygook. Technically speaking, the residual of hedonia-controlling-for-(highly correlated)-eudaimonia-and-depression does not even have a family resemblance to hedonia and is probably nonsense.

Fredrickson et al. measured well-being with what they called the Short Flourishing Scale, AKA and better known in the literature as the Mental Health Continuum-Short Form (MHC-SF).

We looked and we were not able to identify any published evidence of a two factor solution in which distinct eudaimonic and hedonic well-being factors adequately characterized MHC-SF data.

The closest thing we could find was

Keyes et al. (10) referred to these groupings of hedonic and eudaimonic items as “clusters,” an ostensibly neutral term that seems to deliberately avoid the word “factor.”

However, his split of the MHC-SF items into hedonic and eudaimonic categories appears to have been made mainly to allow arbitrary classification of persons as “languishing” versus “flourishing.” Yup, positive psychology is now replacing the stigma of conventional psychology’s deficiency model of depressed versus not depressed with a strength model of languishing versus flourishing.

In contrast to the rest of the MHC-SF literature, Fredrickson et al. referred – implicitly in their original PNAS paper, and then explicitly in their reply to my PNAS letter – to a factor analysis yielding two distinct factors (“Hedonic” and “Eudaimonic”), corresponding to Keyes’ languishing versus flourishing diagnoses (i.e., items SF1–SF3 for Hedonic and SF4–SF14 for Eudaimonic).

The data from Fredrickson et al. were mostly in the public domain. After getting further psychometric data from Fredrickson’s lab, we set off on a thorough reanalysis that should have revealed whatever basis for their claims there might be.

In exploratory factor analyses, which we ran using different extraction (e.g., principal axis, maximum likelihood) and rotation (orthogonal, oblique) methods, we found two factors with eigenvalues greater than 1 with all items producing a loading of .50 on at least one factor.

That’s lots of analyses, but results were consistent:

Examination of factor loading coefficients consistently showed that the first factor was comprised of elevated loadings from 11 items (SF1, SF2, SF3, SF4, SF5, SF9, SF10, SF11, SF12, SF13, and SF14), while the second factor housed high loadings from 3 items (SF6, SF7, and SF8).
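For readers who want to poke at this themselves, here is a minimal sketch of the kind of exploratory factor analysis involved, in Python with the factor_analyzer package. It is a sketch under assumptions, not our actual code: mhc stands for a hypothetical pandas DataFrame of the 80 participants’ responses on columns SF1 through SF14, and the extraction/rotation pair shown is just one of the combinations we tried.

# Hedged sketch: exploratory factor analysis of the MHC-SF items.
# Assumes `mhc` is a pandas DataFrame with columns SF1..SF14 (one row
# per participant). Illustrative only, not the original analysis code.
import pandas as pd
from factor_analyzer import FactorAnalyzer

def run_efa(mhc: pd.DataFrame, method: str = "ml", rotation: str = "oblimin"):
    fa = FactorAnalyzer(n_factors=2, method=method, rotation=rotation)
    fa.fit(mhc)
    eigenvalues, _ = fa.get_eigenvalues()  # Kaiser criterion: retain eigenvalues > 1
    loadings = pd.DataFrame(fa.loadings_, index=mhc.columns,
                            columns=["Factor1", "Factor2"])
    return eigenvalues, loadings

# Vary extraction and rotation, as we did:
# for method in ("ml", "principal"):
#     for rotation in ("varimax", "oblimin"):
#         print(run_efa(mhc, method, rotation)[1].round(2))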

[Figure: the MHC-SF items and their factor loadings]

If this is the factor structure Fredrickson and colleagues claim, eudaimonic well-being would have to be the last three items. But look at them in the figure, and particularly look at the qualification below. The items seem to reflect living in a particular kind of environment that is safe and supportive of people like the respondent. Actually, these results seem to lend support to my complaint that positive psychology is mainly for rich people: to flourish, one must live in a special environment. If you languish, it is your fault.


Okay, we did not find much support for the claims of Fredrickson and colleagues, but we gave them another chance with a confirmatory factor analysis (CFA). With this analysis, we would not be looking for the best solution, only learning whether either a one- or a two-factor model is defensible.

For the one-factor model, goodness-of-fit statistics indicated grossly inadequate fit (χ² = 227.64, df = 77, GFI = .73, CFI = .83, RMSEA = .154).  Although the equivalent statistics for the correlated two-factor model were slightly better, they still came out as poor (χ² = 189.40, df = 76, GFI = .78, CFI = .87, RMSEA = .135).
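For anyone wanting to reproduce this kind of model comparison, here is a minimal sketch in Python using the semopy structural equation modeling package. Again an illustration under assumptions, not our actual code (we used dedicated SEM software): mhc is the same hypothetical DataFrame as in the EFA sketch above, and the item assignments follow Fredrickson et al.’s claimed split.

# Hedged sketch: one-factor versus correlated two-factor CFA with semopy.
# `mhc`: hypothetical pandas DataFrame with columns SF1..SF14, as above.
import semopy

one_factor = "WellBeing =~ " + " + ".join(f"SF{i}" for i in range(1, 15))
two_factor = """
Hedonic    =~ SF1 + SF2 + SF3
Eudaimonic =~ SF4 + SF5 + SF6 + SF7 + SF8 + SF9 + SF10 + SF11 + SF12 + SF13 + SF14
Hedonic    ~~ Eudaimonic
"""

for desc in (one_factor, two_factor):
    model = semopy.Model(desc)
    model.fit(mhc)
    stats = semopy.calc_stats(model)  # includes chi2, DoF, GFI, CFI, RMSEA
    print(stats[["DoF", "chi2", "GFI", "CFI", "RMSEA"]])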

Thus, even though our findings tended to support the view that well-being is best represented as at least a two dimensional construct, we did not confirm Fredrickson et al.’s claim (6) that the MHC-SF produces two factors conforming to hedonic and eudaimonic well-being.

Hey Houston, we’ve got a problem.

As Ryff and Singer (15) put it, “Lacking evidence of scale validity and reliability, subsequent work is pointless” (p. 276).

Maybe we should have thrown in the towel. But if Fredrickson and colleagues could nonetheless proceed to multivariate analyses relating the self-report data to genomic expression, we decided that we would follow the same path.

[Cartoon: from Hilda Bastian]

Relating self-report data to genomic expression: Random can be better

Fredrickson et al.’s analytic approach to genomic expression seemed unnecessarily complicated. They repeated regression analyses 53 times (which we came to call RR53), regressing each of 53 genes of interest on eudaimonic and hedonic well-being and a full range of confounding/control variables. Recall that they had only 80 participants. This approach leaves lots of room for capitalizing on chance.
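Pieced together from the article’s supporting information and the authors’ advice, the procedure amounts to something like the sketch below. To be clear, this is our reconstruction with hypothetical column names, not Cole’s code: each gene is regressed separately, and the 53 sign-adjusted coefficients are then pooled and tested against zero.

# Hedged reconstruction of the RR53 procedure. `df` is a hypothetical
# DataFrame with one row per participant: well-being scores, covariates,
# and the 53 CTRA gene expression values.
import statsmodels.api as sm
from scipy import stats

def rr53(df, gene_cols, signs, covariate_cols):
    # signs[i] is +1 for genes expected to be up-regulated, -1 for down-regulated.
    X = sm.add_constant(df[["hedonic", "eudaimonic"] + covariate_cols])
    hed, eud = [], []
    for gene, sign in zip(gene_cols, signs):
        fit = sm.OLS(df[gene], X).fit()
        hed.append(sign * fit.params["hedonic"])
        eud.append(sign * fit.params["eudaimonic"])
    # The fateful step: one-sample t-tests across the 53 coefficients,
    # treating correlated genes as if they were independent observations.
    return stats.ttest_1samp(hed, 0.0), stats.ttest_1samp(eud, 0.0)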

So, why not simply regress

the scores for hedonic and eudaimonic well-being on the average expression of the 53 genes of interest, after changing the sign of the values of those genes that were expected to be down-regulated. [?]

After all, the authors had said

[T]he goal of this study is to test associations between eudaimonic and hedonic well-being and average levels of expression of specific sets of genes” (p. 1)

We started with our simpler approach.

We conducted a number of such regressions, using different methods of evaluating the “average level of expression” of the 53 CTRA genes of interest (e.g., taking the mean of their raw values, or the mean of their z-scores), but in all cases the model ANOVA was not statistically significant.
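In the same hypothetical notation as the RR53 sketch above, the naive analysis reduces to a single model:

# Hedged sketch of our simpler analysis: one composite, one regression.
import statsmodels.api as sm

def naive_regression(df, gene_cols, signs):
    # Sign-adjust the down-regulated genes, then average into one composite
    # per participant (we also tried averaging z-scores instead of raw values).
    composite = (df[gene_cols] * signs).mean(axis=1)
    X = sm.add_constant(df[["hedonic", "eudaimonic"]])
    fit = sm.OLS(composite, X).fit()
    return fit.f_pvalue  # overall model ANOVA; never significant in our runs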

Undaunted, we next applied the RR53 regression procedure to see whether it could, in contrast to our simpler “naive” approach, yield such highly significant results with the factors we had derived.

You can read the more technical description of our procedures in our article and its supplementary materials, but our results were

The t-tests for the regression coefficients corresponding to the predictor variables of interest, namely hedonic and eudaimonic well-being, were almost all non-significant (p > .05 in 104 out of 106 cases; mean p = .567, SD = 0.251), and in the two remaining cases (gene FOSL1, for both “hedonic,” p = .047, and “eudaimonic,” p = .030), the overall model ANOVA was not statistically significant (p = .146).

We felt that drawing any substantive conclusions from these coefficients would be inappropriate.

Nonetheless, we continued….

We…created two new variables, which we named PWB (corresponding to items SF1–SF5 and SF9–SF14) and EPSE (corresponding to items SF6–SF8).  When we applied Fredrickson et al.’s regression procedure using these variables as the two principal predictor variables of interest (replacing the Hedonic and Eudaimonic factor variables), we discovered that the “effects” of this factor pair were about twice as high as those for the Hedonic and Eudaimonic pair (PWB: up-regulation by 13.6%, p < .001; EPSE: down-regulation by 18.0%, p < .001; see Figures 3 and 4 in the Supporting Information).

Wow, if we accept statistical significance over all other considerations, we actually did better than Fredrickson et al.

Taken seriously, it suggests that the participants’ genes are not only expressing “molecular well-being” but even more vigorously, some other response that we presume Fredrickson et al. might call “molecular social evaluation.”

Or we might conclude that living in a particular kind of environment is good for your genomic expression.

But we were skeptical about whether we could give substantive interpretations of any kind and so we went wild, using the RR53 procedure with every possible way of splitting up the self-report data. Yup, that is a lot of analyses.

Excluding duplicates due to symmetry, there are 8,191 possible such combinations.  Of these, we found that 5,670 (69.2%) gave statistically significant results using the method described on pp. 1–2 of Fredrickson et al.’s Supporting Information (7) (i.e., the t-tests of the fold differences corresponding to the two elements of the pair of pseudo-factors were both significant at the .05 level), with 3,680 of these combinations (44.9% of the total) having both components significant at the .001 level.

Furthermore, 5,566 combinations (68.0%) generated statistically significant pairs of fold difference values that were greater in magnitude than Fredrickson et al.’s (6, figure 2A) Hedonic and Eudaimonic factors.
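Where does 8,191 come from? It is simply the number of ways to split 14 items into two non-empty, unlabeled groups: (2^14 − 2) / 2 = 8,191. A sketch of the enumeration we looped the RR53 procedure over (hypothetical code, as above):

# Enumerate every split of the 14 MHC-SF items into two non-empty
# pseudo-factors, counting each split once (group labels don't matter).
from itertools import combinations

ITEMS = [f"SF{i}" for i in range(1, 15)]

def all_splits(items):
    first_item, rest = items[0], items[1:]
    # Fix the first item in group one, so each split is generated once.
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            second = tuple(x for x in rest if x not in extra)
            if second:  # both pseudo-factors must be non-empty
                yield (first_item,) + extra, second

assert sum(1 for _ in all_splits(ITEMS)) == 8191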

While one possible explanation of these results is that differential gene expression is associated with almost any factor combination of the psychometric data, with the study participants’ genes giving simultaneous “molecular expression” to several thousand factors which psychologists have not yet identified, we suspected that there might be a more parsimonious explanation.

But we did not stop there. Bring on the random number generator.

As a further test of the validity of the RR53 procedure, we replaced Fredrickson et al.’s psychometric data (6) with random numbers (i.e., every item/respondent cell was replaced by a random integer in the range 0–5) and re-ran the R program.  We did this in two different ways.

First, we replaced the psychometric data with normally-distributed random numbers, such that the item-level means and standard deviations were close to the equivalent values for the original data.  With these pseudo-data, 3,620 combinations of pseudo-factors (44.2%) gave a pair of fold difference values having t-tests significantly different from zero at the .05 level; of these, 1,478 (18.0% of the total) were both statistically significant at the .001 level.  (We note that, assuming independence of up- and down-regulation of genes, the probability of the latter result occurring by chance with random psychometric data if the RR53 regression procedure does indeed identify differential gene expression as a function of psychometric factors, ought to be—literally—one in a million, i.e. 0.001², rather than somewhere between one in five and one in six.)

Second, we used uniformly-distributed random numbers (i.e., all “responses” were equally likely to appear for any given item and respondent).  With these “white noise” data, we found that 2,874 combinations of pseudo-factors (35.1%) gave a pair of fold difference values having t-tests statistically significantly different from zero at the .05 level, of which 893 (10.9% of the total) were both significant at the .001 level.

Finally, we re-ran the program once more, using the same uniformly distributed random numbers, but this time excluding the demographic data and control genes; thus, the only non-random elements supplied to the RR53 procedure were the expression values of the 53 CTRA genes.  Despite the total lack of any information with which to correlate these gene expression values, the procedure generated 2,540 combinations of pseudo-factors (31.0%) with a pair of fold difference values having t-tests statistically significantly different from zero at the .05 level, of which 235 (2.9% of the total) were both significant at the .001 level.

Thus, in all cases, we obtained far more statistically significant results using Fredrickson et al.’s methods (6) than would be predicted by chance alone for truly independent variables (i.e., .05² × 8,191 ≈ 20), even when the psychometric data were replaced by meaningless random numbers.

To try to identify the source of these puzzling results, we ran simple bivariate correlations on the gene expression variables, which revealed moderate to strong correlations between many of them, suggesting that our significant results were mainly the product of shared variance across criterion variables.  We therefore went back to the original psychometric data, and “scrambled” the CTRA gene expression data, reassigning each cell value for a given gene to a participant selected at random, thus minimizing any within-participants correlation between these values.  When we re-ran the regressions with these data, the number of statistically significant results dropped to just 44 (0.54%).
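The two data-destruction tests are easy to sketch in the same hypothetical notation (numpy stands in for whatever generator one prefers):

# Hedged sketches of the sanity checks: random "responses" and
# scrambled gene expression values. `df` and column lists as above.
import numpy as np

rng = np.random.default_rng()

def randomize_psychometrics(df, item_cols):
    # White-noise data: every integer 0-5 equally likely in every cell.
    out = df.copy()
    out[item_cols] = rng.integers(0, 6, size=(len(df), len(item_cols)))
    return out

def scramble_genes(df, gene_cols):
    # Permute each gene independently across participants, destroying the
    # within-participant correlations that drove the spurious "significance."
    out = df.copy()
    for g in gene_cols:
        out[g] = rng.permutation(out[g].to_numpy())
    return out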

The punchline

To summarize: even when fed entirely random psychometric data, the RR53 regression procedure generates large numbers of results that appear, according to these authors’ interpretation, to establish a statistically significant relationship between self-reported well-being and gene expression.  We believe that this regression procedure is, simply put, totally lacking in validity.  It appears to be nothing more than a mechanism for producing apparently statistically significant effects from non-significant regression coefficients, driven by a high degree of correlation between many of the criterion variables.

Despite exhaustive efforts, we could not replicate the authors’ simple factor structure differentiating hedonic versus eudaimonic well-being, upon which their genomic analyses so crucially depended. Then we showed that the complicated RR53 procedure turned random nonsense into statistically significant results. Poof, there is no there there (as Gertrude Stein once said about Oakland, California) in their paper, no evidence of “molecular signaling pathways that transduce positive psychological states into somatic physiology,” just nonsense.

How, in the taxonomy of bad science, do we classify this slipup and the earlier one in American Psychologist? Poor methodological habits, run-of-the-mill scientific sloppiness, innocent probabilistic error, injudicious hype, or simply unbridled enthusiasm with an inadequate grasp of methods and statistics?

Play nice and avoid the trap of negative psychology?

Our PNAS article exposed the unreliability of the results and interpretation offered in a paper claimed to be a game-changing breakthrough in our understanding of how positive psychology affects health by way of genomic expression. Science is slow and incomplete in self-correcting. But corrections, even of outright nonsense, seldom garner the attention of the original error. It is just not as newsworthy to find that claims of minor adjustments in everyday behavior modifying gene expression are nonsense as it is to make unsustainable claims in the first place.

Given the rewards offered by media coverage and even prestigious journals, authors can be expected to be incorrigible in terms of giving in to the urge to orchestrate media attention for ill understood results generated by dubious methods applied in small samples. But the rest of the scientific community and journalists need to keep in mind that most breakthrough discoveries are false, unreplicable, or at least wildly exaggerated.

The authors were offered a chance to respond to my muted and tightly constrained letter to PNAS. Cole and Fredrickson made references to analyses they have never presented and offered misinterpretations of the literature that I cited. I consider their response disingenuous and dismissive of any dialogue. I am willing to apologize for this assessment if they produce the factor analyses of the self-report data to which they pointed. I will even donate $100 to the American Cancer Society if they can produce it. I doubt they will.

Concerns about the unreliability of the scientific and biomedical literature have risen to the threshold of precipitating concern from the director of NIH, Francis Collins. On the other hand, a backlash has called out critics for encouraging a negative psychology and warned us to temper our criticism. The evidence offered for the excesses of critics includes “’voodoo correlation’ claims, ‘p-hacking’ investigations, websites like Retraction Watch, Neuroskeptic, [and] a handful of other blogs devoted to exposing bad science”, and we are cautioned that “moral outrage has been conflated with scientific rigor.” We are told we are damaging the credibility of science with criticism and that we should engage authors in clarification rather than criticize them. But I think our experience with this PNAS article demonstrates just how much work it takes to deconstruct outrageous claims based on methods and results that authors poorly understand but nonetheless promote in social media campaigns. Certainly, there are grounds for skepticism based on prior probabilities, and to be skeptical is not cynical. But is it not cynical to construct the pseudoscience of a positivity ratio and then a faux objective basis for moral philosophy?


Distress: the 6th vital sign for cancer patients?

A third or more of cancer patients experience significant psychological distress following their diagnosis. Yet a much smaller proportion receives any psychosocial services.

This situation is counter to recommendations from a number of sources, including the US Institute of Medicine report, Cancer Care for the Whole Patient: Meeting Psychosocial Health Needs. The gap between the proportion of patients experiencing distress and those getting services has led to widespread calls for implementing routine screening for distress. The assumption is that the breakdown between being distressed and getting services lies in patients not getting identified and referred to appropriate services.

There have been intense national and international campaigns by professional organizations first to recommend implementation of screening and then to mandate it as a condition of accreditation of cancer care settings. Increasingly, campaigns are organized around the slogan “distress is the sixth vital sign.” Promoters have sufficient political clout to get the slogan into the titles of articles in scientific journals, often with leaders of the field of psycho-oncology lending support as co-authors.

Holland, J. C., & Bultz, B. D. (2007). The NCCN guideline for distress management: a case for making distress the sixth vital sign. Journal of the National Comprehensive Cancer Network, 5(1), 3-7.

Bultz, B. D., & Johansen, C. (2011). Screening for distress, the 6th vital sign: where are we, and where are we going? Psycho-Oncology, 20(6), 569-571.

Watson, M., & Bultz, B. D. (2010). Distress, the 6th vital sign in cancer care. Psycho-oncologie, 4(3), 159-163.

The push to make distress accepted as the sixth vital sign for cancer patients is modeled after efforts to get pain designated as the fifth vital sign for general medical patients. In the late 1980s, it was recognized that pain management was grossly inadequate, with many patients’ pain going undetected and untreated. Pain was designated the fifth vital sign in general medical settings, with guidelines set for assessment and improved referral and treatment. These guidelines were first simply recommendations, but they grew to be mandated, with performance goals and quality of care indicators established to monitor their implementation in particular settings.

What is distress? Maybe not what you think

Distress was originally defined narrowly in terms of stress and anxiety and depressive symptoms. Early screening recommendations were for quick assessment with a distress thermometer modeled after the simple pain rating scale adopted in the campaign for pain as the fifth vital sign. Efforts were made to validate distress thermometers in terms of their equivalence to longer self-report measures of anxiety and depression, and in turn, interview-based measures of psychiatric disorder. Ratings on a distress thermometer were not seen as a substitute for longer assessments, but rather as a screening for whether longer assessments were needed. Patients who screened above a cutpoint on a distress thermometer would be identified for further assessment.

“Distress” was the preferred term, rather than stress or anxiety or depressive symptoms, because it was viewed as more neutral and not necessarily indicating that someone who wanted services had a psychiatric disorder. It was believed that many cancer patients did not seek services out of fear of being stigmatized, and this vague term might avoid raising that barrier to treatment.

However, the definition has now been extended to include a much broader range of needs and concerns. The widely cited NCCN definition of distress is

An unpleasant emotional experience of a psychological, social and/or spiritual nature which extends on a continuum from normal feelings of vulnerability, sadness and fears to disabling problems such as depression, anxiety, panic, social isolation and spiritual crisis.

The emphasis is still on quick screening, but the distress thermometer is typically supplemented with a checklist of common problems. Particular items vary across recommendations for screening, but the checklist is meant to be a brief supplement to the distress thermometer. Canadians who have taken the lead in promoting distress as the sixth vital sign recommend a Modified Problem Checklist (PCL).

[Figure: the Modified Problem Checklist]

This list contains the 7 most common practical problems in our settings (accommodation, transportation, parking, drug coverage, work/school, income/finances, and groceries) and 13 psychosocial problems (burden to others, worry about family/friends, talking with family, talking with medical team, family conflict, changes in appearance, alcohol/drugs, smoking, coping, sexuality, spirituality, treatment decisions, and sleep). Participants indicate the presence or absence of each problem in the preceding week.

Despite the range of problems being assessed, the assumption is still that the predominant form of services that patients will seek as a result of being screened is some form of psychological counseling or psychotherapy.

Interventions usually assume one of four common forms: psychoeducation, cognitive-behavioural training (group or individual), group supportive therapy, and individual supportive therapy.

Support by professional organizations for screening of distress is quite widespread. Well-organized proponents exercise their political clout in making expression of enthusiasm for screening an obligatory condition for publishing in psycho-oncology journals. Dissenting critics and embarrassing data can be summarily sandbagged. Even when embarrassing data makes it through the review process, authors may be surprised to discover that their articles are accompanied by disparaging commentaries, to which they have not been offered an opportunity to respond. I have been publishing for over thirty years, and I have never before encountered the level of bullying that I have seen with papers concerning screening.

Yet, promotion of implementing routine screening for distress is based only on the consensus of like-minded professionals, not the systematic evaluation of evidence that medical screening generally requires. Screening tests are common in medicine. The standard criterion for adopting a recommendation for screening is that it can be shown to improve patient outcomes beyond simply making available to patients and their providers the resources that are accessed by screening.

It is not sufficient that screening address a serious problem; it must also lead to improved outcomes. For instance, it is now recognized that although prostate cancer is a common and threatening disease, routine screening for prostate cancer in men without symptoms is likely to lead to overtreatment and side effects such as incontinence and loss of sexual functioning, without any improvement in overall survival. Campaigns to promote prostate specific antigen (PSA) testing have largely been discontinued. More generally, Choosing Wisely committees are systematically reevaluating the evidence for common forms of screening and withdrawing recommendations. Some well-established screening procedures are losing their recommendation.

We undertook a systematic review of the literature concerning screening for distress and failed to find sufficient evidence to recommend it.


Because of the lack of evidence of beneficial effects of screening cancer patients for distress, it is premature to recommend or mandate implementation of routine screening.

Other reviews have come to more favorable evaluations of screening, arrived at by including nonrandomized trials or trials that chose the number of referrals made as a surrogate outcome in place of distress being reduced. However, for a patient getting referred to experience reduced distress, a complex pathway has to be crossed, starting with acceptance and completion of the referral and including receiving appropriate treatment at an appropriate level of intensity and frequency. Referrals, particularly when they are to outside the cancer care setting, are notoriously black holes. There are few systematic data concerning the fate of referrals, but the general consensus is that many, perhaps most, are not completed.

My colleagues and I also noted that the professional recommendations for screening have not been developed according to the standard procedures for making such recommendations. Development of professional guidelines is supposed to follow an orderly process of

  • Assembling a diverse and representative group of relevant stakeholders.
  • Systematically reviewing the literature with transparency in search criteria and method of synthesis and interpretation.
  • Publication of preliminary recommendations for public comment.
  • Revision of preliminary recommendations taking into account feedback.

In contrast, recommendations for screening have simply been released by professional groups assembled on the basis of previously known loyalty to the conclusion and with inattention to the lack of evidence showing that screening would improve outcomes. Notably missing from the “consensus” groups recommending screening are patients, frontline clinicians who will have to implement screening, and professionals from the community who would have to receive and evaluate referrals, notably primary-care physicians in most countries.

Nonetheless, the push for routine screening of cancer patients for psychological distress continues to gain momentum, with goals being set in many settings for the minimal proportion of cancer patients who must be screened.

The slogan “distress is the sixth vital sign” is not a testable empirical statement of the order of “depression is a significant risk factor for cardiac mortality.” Rather, it is best evaluated in terms of the use of terms, particularly “vital sign,” and whether adopting the slogan improves patient outcomes. Let us look at the notion of a vital sign and then at the outcome of efforts to adopt pain as the fifth vital sign.

What is a vital sign?

According to Wikipedia

Vital signs are measures of various physiological statistics, often taken by health professionals, in order to assess the most basic body functions. Vital signs are an essential part of a case presentation. The act of taking vital signs normally entails recording body temperature, pulse rate (or heart rate), blood pressure, and respiratory rate, but may also include other measurements. Vital signs often vary by age.

In an excellent discussion of vital signs, Lucy Hornstein states

If someone other than the patient can’t see, hear, palpate, percuss, or measure it, it’s a symptom. Anything that can be perceived by someone else is a sign.

And

Vital signs are measured…and yield numeric results. Normal ranges are defined; values that fall outside those normal ranges are described with specific words (eg, bradycardia, tachypnea, hypothermia, hypertension).

A cough is a sign, but a headache is a symptom. Not that headaches are less “real,” they are just different.

With the standard definitions, pain cannot be a vital sign, only a symptom. The labeling of pain as a vital sign is metaphoric and involves twisting the definition of a sign.

Something I will note now, but come back to later in the blog: patients cannot really argue against the results of assessment of a vital sign. If repeated blood pressure readings indicate diastolic pressure above an established cutoff, it is not left for the patient to argue “no, I do not have hypertension.” The notion is that a vital sign is objective and not subject to validation by patient self-report, although what to do about the vital sign may be interpreted in terms of other data about the patient, some of it from self-report. The point is that calling a measurement a “vital sign” claims particular authority for the results.

There have been numerous proposals for designating other vital signs in need of routine assessment and recording in medical records. These include (here I have extended the list from Wikipedia):

Pain as the fifth vital sign

Okay, so it is only metaphorically that pain can be considered a fifth vital sign. But what was nonetheless accomplished by designating it so in terms of improving its management? What can be learned that can be applied to designating distress as the sixth vital sign?

The pain-as-fifth-vital-sign (P5VS) campaign simply started with the publicizing of high levels of untreated pain in medical populations, which in turn led to consensus statements and mandated screening, similar to what is now occurring with declaring distress the sixth vital sign.

1992. The US Agency for Healthcare Research and Quality (AHRQ) issued guidelines documenting that half of surgical patients did not receive adequate post-surgical pain medication. The guidelines declared patients had a “right” to adequate pain measurement.

1995. American Pain Society (APS) issued a landmark consensus statement with guidelines for a quality improvement approach towards the treatment of acute and cancer pain [10], expanding upon its 1989 guidelines [11] for the treatment of pain. In his Presidential Address, Dr. James Campbell presented the idea of evaluating pain as a vital sign.

1998. The Veterans Health Administration (VHA) launched a national P5VS initiative to improve pain management for its patients, starting with documentation in the electronic medical record of assessment of patients’ self-report of pain. It required use of a Numeric Rating Scale (NRS) in all clinical encounters.

P5VS campaigns were based on professional consensus, not appraisal of best evidence. When studies were later conducted, results did not demonstrate that the fifth vital sign campaign improved patient outcomes.

The title of a study of 300 consecutive patients before the start of the VA initiative and 300 afterwards says it all:

Measuring Pain as the 5th Vital Sign Does Not Improve Quality of Pain Management

The study examined 7 process indicators of the quality of pain management and failed to identify any improvement in:

  • Subjective provider assessment (49.3% before versus 48.7% after).
  • Occurrence of a pain exam (26.3% versus 26.0%).
  • Orders to assess pain (11.7% versus 8.3%).
  • New pain medication prescribed (8.7% versus 11.0%).
  • Change in existing pain medication prescription (6.7% versus 4.3%).
  • Other pain treatment (11.7% versus 13.7%).
  • Recording of follow-up plans (10.0% versus 8.7%).

The initiative required that “patients indicating a score of four or above on a pain scale should receive comprehensive pain assessment and prompt intervention…”

In a subsample of patients who reported substantial pain:

  • 22% had no attention to pain documented in the medical record of the visit at which they reported it.
  • 52% received no new therapy for pain at that visit.
  • 27% had no further assessment documented.

Our investigation of the P5VS…initiative at a single VA site has shown that the routine documentation of pain levels, even with system-wide support and broad-based provider education, was ineffective in improving the quality of care.

[Image: Carly Simon, “Haven’t Got Time for the Pain” (1974)]

What was going wrong? It appears that front-line clinicians making the pain assessments lacked the time or skills to take effective action. Pain assessments were typically conducted in encounters occurring for other reasons. Furthermore, pain assessments were collected outside of a clinical interaction in which the context or history could be discussed, likely leading to invalid ratings. When patients were asked only to rate current pain, it was unknown whether they took into account how much the pain bothered them, whether it was acute or chronic, or whether it reflected any change from past levels, all meaningful considerations. Other clinicians in the system either did not receive the ratings in a timely fashion or lacked the context to interpret the ratings.

Other studies [1, 2] similarly demonstrated that the P5VS campaign was by itself ineffective:

One potential reason for the insensitivity of routine pain screening in these studies is that all were conducted in outpatient primary and specialty care settings where chronic persistent or intermittent pain is much more common than acute pain. Routine pain screening that focuses on pain intensity “now” may not be sufficiently sensitive to detect important chronic pain that occurs episodically or varies with activity. In the VA primary care, the vast majority of pain problems are longstanding in nature, so sensitivity for chronic pain is important for any pain screening strategy in this setting.

The mandate that unrelieved pain must be addressed soon led to ineffective, inappropriate treatment that was not based on proper assessment. There was an increase in diagnostic tests that only confirmed existence of pain that was difficult to treat, notably chronic back pain. In the decade after the campaign to reduce pain was launched, costs of treating chronic back pain escalated without any demonstrated improvement in patient outcomes.

The guidelines had been promulgated with claims that addiction to opiates prescribed for acute pain was rare. But the evidence for that was only a brief 1980 letter in the New England Journal of Medicine indicating only four instances in treatment of 12,000 patients.

The campaign to improve pain control using routine ratings had an effect unanticipated by its proponents.

The dispensing of opioids has almost doubled, according to National Health and Nutrition Examination Survey data indicating that from 1988 to 1994 a total of 3.2% of Americans reported using opioids for pain, whereas from 2005 to 2008 a total of 5.7% reported use.

This significant increase has been associated with serious consequences, including an estimated 40 deaths per day due to prescription opioids.

Put simply

Improving pain care may require attention to all aspects of pain management, not just screening.

The pain as the fifth vital sign campaign involved mandated assessment of pain with a simple numerical scale at every clinical encounter, regardless of the reason for visit. The rating was typically obtained in the absence of any effort to talk to patients or to examine them in an effort to determine what were likely the multiple sources of their pain, its history, and their goals. Indeed, collecting these simple ratings may have become a substitute for having such discussions. The number on the rating scale came to characterize the patient for purposes of clinical decision-making, and may have led to overtreatment including escalating prescription of pain medication.

Is it a good idea to consider distress the sixth vital sign?

Like pain, distress is not a vital sign by the conventional definition of the term. Yet to label it as such suggests that there is some sort of objective procedure involved in collecting ratings on a distress thermometer from patients.

Generally ignored in the promotion of screening is that most patients who indicate distress above established thresholds do not wish to receive a psychosocial service that they are not already receiving. Current guidelines for screening do not have a requirement of asking patients whether they have any need for services. Instead, their response to the distress thermometer is used to tell them that they need intervention, with an emphasis on counseling.

When asked directly, most distressed patients reject the need for psychosocial services that they are not already getting, often outside the cancer center.  A rather typical study found that 14% definitely and an additional 29% maybe wanted to talk with a professional about their problems. Patients variously report

  • Already receiving services.
  • Believing they can solve the problems themselves.
  • Concentrating on treating their physical illness takes precedence over receiving psychosocial and supportive services.
  • Services being offered to them are not needed, timely, or what they preferred.

A heightened score on a distress thermometer is a poor indication of whether patients are interested in receiving services that are listed on a screening sheet. Most do not want to receive a service, but most receiving services are not distressed. Think about it, looking at the problems listed on the screening form above. Many of the problems would be endorsed without patients having a heightened distress score. This poses a dilemma for any interpretation of a score on a distress thermometer as if it were a vital sign.

Overall, thinking about distress as the sixth vital sign creates the illusion that a score on a distress thermometer is an authoritative, objective, standalone indicator, much like a blood pressure reading. Actually, scores on a distress thermometer need to be discussed and interpreted. If distress is taken too seriously as the sixth vital sign, there is a risk that patients who do not meet the cutoff for clinically significant distress will be denied an opportunity to discuss the problems for which they might otherwise seek help.

My colleagues and I undertook a study where we used results of screening for distress to attempt to recruit a sample of patients for an intervention trial evaluating problem-solving therapy as a way of reducing distress. It proved to be a discouragingly inefficient process.

  • We screened 970 cancer patients, of whom 423 were distressed, and, of these, 215 indicated a need for services. However, only 36 (4%) consented to participate in the intervention study.
  • 51% of the distressed patients reported having no need for psychosocial services and an additional 25% were already receiving services for their needs.
  • Overall, we had to screen 27 patients in order to recruit a single patient, with 17 hours of time required for each patient recruited (the arithmetic is sketched below).
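A quick check on the funnel arithmetic (figures from the bullets above; the derived totals are my own multiplication):

# Recruitment funnel from our screening study.
screened, distressed, wanted_help, consented = 970, 423, 215, 36
print(round(100 * consented / screened, 1))  # ~3.7% of those screened enrolled
print(round(screened / consented))           # ~27 patients screened per recruit
print(17 * consented)                        # implied total: 612 staff hours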

Consider what could have been accomplished if those 17 hours per recruited patient were used instead to talk to patients who had indicated they wanted to talk to someone about their problems.

Designating distress as the sixth vital sign lends a false objectivity and validity to a procedure that has not been demonstrated to improve patient outcomes. It is an advertising slogan that is likely to prove ineffective and misdirect resources, just as the P5VS campaign did.

 

 


Is dying a mental health issue?

Should a dying patient talking to a psychiatrist be diagnosed with adjustment disorder related to death?

 Dare we ask: Could impending death not be sufficiently psychologically distressing for patients to routinely benefit from psychotherapy?

Patients aware that they are dying often wish to talk to someone about their predicament. Should psychiatrists and psychologists be the first-line professionals for such discussions? After all, aren’t many dying patients experiencing substantial psychological distress? Specialty mental health professionals would have the skills to differentially diagnose this distress and offer appropriate targeted treatment. And what about their ability to identify and address issues of suicidality?

Or should discussions be left to clergy and pastoral counselors, especially for patients of faith?

Raising these questions could easily get us into philosophical and spiritual issues where we might feel excused from having to bring in evidence. But a relatively large-scale and well-designed study has given us a relevant answer. Maybe not the answer that the investigators hoped to find, and they probably will not like my further interpretation. And, sure, it is only one randomized trial, but the results seem to speak exceptionally clearly about a number of issues.

The study was negative, but it demonstrates just how much a well-designed negative trial can reveal.

The study appeared in the prestigious Lancet Oncology (Journal impact factor [JIF] = 21.12), along with a commentary.

Chochinov, H. M., Kristjanson, L. J., Breitbart, W., McClement, S., Hack, T. F., Hassard, T., & Harlos, M. (2011). Effect of dignity therapy on distress and end-of-life experience in terminally ill patients: a randomised controlled trial. The Lancet Oncology, 12(8), 753-762.


The randomized trial was conducted with a large sample (441) of patients recruited from hospices and home-based palliative care in Australia, Canada, and the United States. Investigators compared their preferred intervention, dignity therapy (165 patients), to two comparison conditions – standard palliative care (140 patients) and client-centred care (136 patients).

Stated goal

This study represents the first randomized control trial of Dignity Therapy. We set out to determine if this novel psychotherapeutic approach would significantly outperform standard care, or Client Centered Care…on various measures of psychological, existential and spiritual distress.

Primary outcome

The primary outcome was distress, which sounds simple enough, but it was measured 22 different ways. No one variable was designated as the primary outcome by which the effectiveness or ineffectiveness of dignity therapy would be judged. There were also 23 different secondary outcomes, single items evaluating patients’ experience of participating in the intervention.

The three conditions

Dignity Therapy provides patients with life-limiting illnesses an opportunity to speak about things that matter most to them. The conversations are recorded and transcribed to provide the basis for a document that patients can bequeath to individuals of their choosing.

[Table: the Dignity Therapy question framework]

Patients were shown the Dignity Therapy question framework [see Table above] and asked to consider what they might wish to speak about during their Dignity Therapy session(s); this initial introduction to, and explanation of, Dignity Therapy took about 30 minutes. Within a few days, or as soon as a second meeting could be arranged, the therapist used the question framework to help elicit the patient’s recollections, hopes, wishes for loved ones, lessons learned, and things they wanted remembered by those they were about to leave behind. Dignity Therapy is flexible enough to accommodate participant preferences and choices regarding content, but the ethos of questioning targets those things that might enhance a sense of meaning, purpose, continued sense of self, and overall sense of dignity. All Dignity Therapy sessions were audio-taped; these sessions usually took about 60 minutes. Upon completion, the audio-recording was transcribed verbatim and the transcript edited, to provide a clear and readable narrative. This transcript or ‘generativity document’ was returned to the patient…[and] read to them in its entirety to ensure that no errors of omission or commission needed to be addressed (this final session usually took about 30 minutes)…The final version of the generativity document was then given to the patient, to be passed along to a designated recipient of their choosing.

Therapists for the dignity therapy were psychologists, psychiatrists, or experienced palliative care nurses.

Generativity is a key concept in dignity therapy.

Generativity, or the ability to guide the next generation, encompasses how patients might find strength or comfort in knowing that they will leave behind something lasting and transcendent after death.

Some readers may recognize the similarity of the investigators’ concept of generativity and Erik Erikson’s life stage of generativity versus stagnation.

In Standard Palliative Care, patients had access to a full range of palliative care support services, including specialist palliative care physicians and nurses with expertise in pain and symptom management, social workers, clergy, and mental health professionals.

There were not, however, any components of standard palliative care directly comparable to dignity therapy.

Client Centred Care was a supportive psychotherapeutic approach, in which the therapist guides patients through discussions focusing on here and now issues. These might include their illness and symptoms and what is being done to address symptom distress. However, in order to keep this condition distinct from dignity therapy, the therapist did not encourage discussion of issues of meaning and purpose. If these topics came up, the therapist redirected the conversation back to illness-related issues.

The therapist for client-centred care was a research nurse.

What was found.

No evidence was found that dignity therapy reduced distress across 22 different measures of distress, including overall scores and subscale scores from an array of measures.

The 23 items representing secondary outcomes showed only a few differences between the three groups, no more than would be expected by chance. We should be careful about the statistically significant results that were obtained, but perhaps patients were expressing appreciation that they had been randomized to the specialized treatment, as well as having the document to leave for family members. On only five items was dignity therapy found to be superior to both standard palliative care and client-centred care.
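As a rough check on “no more than would be expected by chance” (my arithmetic, assuming each of the 23 items was compared across three pairwise contrasts at α = .05):

23 items × 3 pairwise contrasts × .05 ≈ 3.5 spurious “significant” differences

So a handful of isolated significant items is roughly what a null set of outcomes would produce.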

The investigators’ interpretation.

Despite the beneficial effects of Dignity Therapy, its ability to mitigate outright distress, such as depression, desire for death or suicidality, has yet to be proven. However, there is currently ample evidence supporting its clinical application for patients nearing death, as a means of enhancing their end-of-life experience.

While floor effects precluded our ability to demonstrate significant differences across study arms, our secondary outcomes revealed substantive benefits of Dignity Therapy. Using a post-study survey, patients who received Dignity Therapy were significantly more likely to report benefits, in terms of finding it helpful, improving their quality of life, their sense of dignity; changing how their family might see or appreciate them, and deeming it helpful to their family, compared to the other study arms.

The investigators were obviously passionate about their intervention and looked exhaustively for evidence of the efficacy of dignity therapy. They really did not find any.

What I liked about this trial.

I liked the unusually large sample size and the inclusion of two different control/comparison groups that allowed answering different kinds of questions. This is certainly not an underpowered pilot study being passed off as a full randomized trial. The Standard Palliative Care group allowed determination of whether dignity therapy offered anything beyond routine care. And a reader definitely gets the sense that routine care in this study was neither absent nor inadequate, as it is in so many studies. The Client Centred Care condition pitted the investigators' preferred intervention against a lower intensity intervention that provided support, but without an element that the investigators considered key to their intervention.

The intervention followed a structured standardized protocol. Standardized training was provided, along with group supervision and review of transcripts of recordings of actual sessions to ensure fidelity of delivery.

The study recruited from a variety of settings and had an excellent uptake from patients who were offered an opportunity to participate.

Patients were randomized with the investigators blinded as to group assignment.

What I liked less.

The investigators administered a full battery of potential outcome measures including total and subscale scores of

The Functional Assessment of Chronic Illness Therapy Spiritual Well-Being Scale, Patient Dignity Inventory, Hospital Anxiety and Depression Scale, items from the Structured Interview for Symptoms and Concerns, Quality of Life Scale, and modified Edmonton Symptom Assessment Scale.

These measures tend to be moderately to highly correlated, and so having this battery available for assessment of outcome represents considerable redundancy. By not designating one or two measures as primary outcomes, the investigators set the stage for selecting a primary outcome after they knew the results, risking confirmatory bias and capitalization on chance.

The investigators had to contend with substantial loss of patients, not unexpectedly, because these are palliative care patients, but they did not avail themselves of any of a number of ways to provide estimates for the missing data, and so analyses were not intention-to-treat.

There is little reason to believe that this changed the results, however, because of the floor effect that they noted.

I did not like the positive spin put on the null findings of this trial. Confirmatory bias was clear in the abstract and extended to the discussion. The structured abstract conceded a lack of effect on distress, but cherry-picked positive findings to emphasize out of an overall null set of secondary outcomes. The interpretation section of the structured abstract offered:

Although the ability of dignity therapy to mitigate outright distress, such as depression, desire for death or suicidality, has yet to be proven, its benefits in terms of self-reported end-of-life experiences support its clinical application for patients nearing death.

The laudatory accompanying commentary.

The commentary on this article is an embarrassingly obvious attempt to spin and refute results the commentator did not want to accept.

Commentaries in peer-reviewed journals are typically not rigorously peer-reviewed. They tend to be written with an agenda, by reviewers who nominated themselves to write them. I have previously blogged about an unusual commentary in which the writer sought to destroy the credibility of an article she did not like, despite it having been accepted for publication. Most often, however, commentaries are laudatory and written with an awareness that the authors of the commentaries can get away with praise that would not survive peer review.

The commentator sought damage control in the face of an utter lack of significant findings. She tried to undermine the validity of an RCT for studying psychosocial interventions. It is true that clinical trials restrict participation to patients with sufficient cognitive intactness, but so does any talk therapy. Her criticism that the sample was not heterogeneous and representative was contradicted by the demonstration that most patients who were approached participated. Differences in baseline variables are a threat to the validity of results of an RCT, but one of the strengths of such a design is that it serves to minimize such differences. And the commentator gave no evidence that baseline differences between groups undermined the validity of this particular trial.

The commentator raised issues about the standardization of the treatment across settings, but ignored the efforts made to ensure fidelity of delivery. The commentator further ignored the efforts of the investigators to control for nonspecific factors by inclusion of both a standard care and a client centred/comparison control condition. Maybe the richness of standard care precludes finding any effect for the addition of dignity therapy, but that is a valuable finding: it suggests that a specialized dignity therapy is not needed. Finally, the commentator's suggestion that outcomes may not have been adequately measured is bewildering in the face of the investigators administering a battery of 22 measures for the primary outcome and 23 measures for secondary outcomes. What does she think is missing?

Given the effort to positively spin solidly null findings in both the article and the commentary, one has to ask under what conditions the investigators and commentator would be willing to concede that this is not a promising line of treatment development.

Why I consider this study important.

Although one does not make a definitive judgment on the basis of a single trial, there is little here to encourage further consideration of dignity therapy as an evidence-based therapy.

The investigators made the testable assumption that end-of-life is a time of considerable psychological distress. They designed an intervention to be delivered by mental health professionals to relieve that distress. They evaluated their intervention in an adequately powered randomized trial, but failed to find any evidence of an effect. The most likely reason is that this large representative sample of palliative care patients was simply not distressed enough to benefit from a mental health intervention.

One could argue that the intervention was not sufficiently intense or mental health oriented, but low baseline distress would preclude a more intense intervention succeeding. There is a floor effect going on, as the investigators recognize.
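The floor effect is easy to demonstrate with a toy simulation. Every number below is an assumption chosen for illustration, not data from this trial: suppose distress is scored 0 to 10, most patients start near the bottom of the scale, and the treatment genuinely lowers latent distress by one point.

```python
# Toy illustration of a floor effect; all numbers are assumptions, not trial data.
import numpy as np

rng = np.random.default_rng(0)
n = 100  # arbitrary arm size for illustration

# Latent distress is low for most patients. Observed scores cannot go below
# the scale floor of 0, so low latent values are clipped.
latent_control = rng.normal(1.5, 2.0, n)
latent_treated = rng.normal(1.5, 2.0, n) - 1.0  # a genuine 1-point benefit

obs_control = np.clip(latent_control, 0, 10)
obs_treated = np.clip(latent_treated, 0, 10)

# The observed difference comes out much smaller than the true 1-point latent
# effect, because patients already at the floor have no room left to improve.
print(f"observed group difference: {obs_control.mean() - obs_treated.mean():.2f}")
```

In expectation, the observed difference here is only about two thirds of the true latent effect, and it shrinks further the closer the starting mean sits to the floor. A sample recruited without regard to distress simply has little room to show benefit.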

Basic pilot work would have revealed the surprisingly low levels of distress in this population. A specialized mental health intervention may not be warranted with patients who are not specifically selected for distress. Furthermore, the minority of palliative care patients who do show clinically significant distress may not benefit from an intervention like this. They would probably require something more intensive and specifically aimed at reducing distress, with evidence of the intervention having worked in other populations.

But the assumption that cancer patients suffer substantial psychological distress, and patients in palliative care particularly so, is so well entrenched that it is difficult to challenge. Certainly, if investigators applied for an NIH grant and stated that the end of life is not a time of great psychological distress, they would have risked a low priority for funding.

Giving dying patients a chance to talk: Going from a feature of routine care to a treatment

The investigators failed to find that their structured intervention, dignity therapy, offered benefits beyond routine care or a client centred care that had key elements removed: therapists for this condition were not allowed to discuss meaning or purpose.

Suppose investigators had found a therapy mainly provided by psychiatrists and psychologists yielded substantial reductions in distress. These findings would have reinforced existing views of palliative care patients as having high levels of distress. They would further have demonstrated the inadequacy of both good quality routine care and a client centred care modeled after what pastoral counselors offer in providing dying patients with opportunities to discuss their “recollections, hopes, wishes for loved ones; lessons learned and things they wanted remembered by those they were about to leave behind.”

Dignity therapy would be on its way to being an evidence-based treatment with demonstrated ability to reduce significant psychological distress.

And this would propel our moving what had been a feature of routine care to a formal treatment that is of necessity a billable procedure, which ultimately requires a diagnosis: psychiatrists and psychologists do not provide treatment without a diagnosis of a disorder. But with what diagnosis?


The low levels of psychological distress suggest little likelihood of substantial psychiatric disorder in this population. There is already a suitable diagnostic category that is loose and limited in its validity, but that serves administrative purposes: adjustment disorder. Mental health professionals seeking to document the diagnosis of patients being treated with dignity therapy, but lacking any other formal psychiatric disorder, could always bill for adjustment reaction related to death. That mental health professionals wanted to treat the patient and the patient wanted treatment would in itself be taken as an indication of distress and clinically significant impairment. And the cause of the disorder the dignity therapy targeted is impending death.

Making a diagnosis of adjustment disorder is often a convenient resolution of the compensation problem for mental health professionals: they are not offering treatment in the absence of a diagnosis, they are treating adjustment disorder.

Psychiatrists and psychologists offer treatment in "sessions" that are of necessity of limited duration, typically a half hour to 50 minutes. They do so with a certain time pressure, following schedules that allow only small intervals between patients and no running over, even if the discussion is intense and productive.

In contrast, pastoral counselors do not require diagnoses for what they provide to patients. While there may be heavy demands on their time, they do not typically operate with rigidly timed "sessions" and will stay with the patient as long as needed, within practical limits. This particular trial of dignity therapy fails to find any evidence that what they do is less efficacious than what a psychologist or psychiatrist provides.

Much of the content of dignity therapy will differ according to the religious faith of patients. It would seem that pastoral counselors could address these issues with greater authority and knowledge.

As an agnostic, I approached working with pastoral counselors in cancer care with skepticism as to whether they would take advantage of patients' dying days to win them back to their faith. I never saw any of that happen. Rather, all the pastoral counselors I have ever seen have a deep respect for patients' level of commitment to their faith – or lack thereof. I would characterize what they do as providing a presence for patients' talking about things that matter to them, nondirectively guiding the conversation forward, but without imposition of their own views.

How do we evaluate what pastoral counselors do? If what they provide is not considered treatment, there is no issue of whether it is evidence-based treatment. It does not require that patients be distressed in order to be eligible for a discussion, nor that the distress be resolved for the discussion to be “successful.”

An example from primary care illustrates some of the "crisis" of medical care in the United States and in many other countries, a crisis that extends to the cancer care settings in which dignity therapy would be implemented.

http://minnesota.cbslocal.com/2012/10/19/questions-can-trigger-split-visit-charge-at-docs-office/

As detailed in a CBS news story, a woman sought a physical examination from her primary care physician, after not receiving regular care for some time. She came prepared with a long list of health questions for which she would seek answers in her appointment. The physician obtained reimbursement from insurance for the physical by billing for all of the specific tests and procedures that had been done. However, he billed the woman by the minute for the discussion. A commentator in the news segment discussing this said, in effect, "physicians get paid for doing procedures, not for whether they solve problems and not for talking about it."

Cancer care is expensive, which means lucrative for both for-profit and not-for-profit settings that can obtain reimbursement. There is increasing emphasis on procedures that are billable to third-party payers and on efficient use of providers' time. Provision of basic emotional support and time for patients to discuss their concerns are endangered features of routine cancer care. But we should be careful about efforts to preserve these features by making them billable mental health procedures. That entails an inevitable need for a diagnosis and for rationing talk time, restricting it to those patients having a mental health related disorder.

More generally, psychotherapy intervention trials in cancer care do not typically attract patients with clinically significant distress in great numbers. Across trials, only about a third of the patients are clinically distressed. That is about the same as what you find in primary care waiting rooms.

Correspondingly, RCTs of psychotherapy for cancer patients often fail to show a benefit because the patient samples that are recruited are insufficiently distressed overall. Why are the patients there, then?

It is likely that with the increasing scarcity of talk time in routine care, patients are simply seeking a safe place where they will be listened to, and can express and reflect on their feelings, not necessarily solve problems or reduce distress. That can be an entirely valid goal in itself. But problems arise when these discussions are of necessity provided as treatment with mental health professionals. Issues of cost effectiveness and efficacy arise, for which formal evidence is required. And such treatment is typically in short supply, with long waiting lists.

This RCT of dignity therapy came about because a psychiatrist passionate about what he provides for palliative care patients developed and evaluated a structured mental health treatment. Maybe it is not all bad that the trial was negative. We now have a greater chance of preserving supportive elements of palliative care, including time for patients to talk about their concerns, and can hold off rationing them.

 

Category: cancer, death and dying, evidence-supported, palliative care, psychotherapy

Neurobalm: the pseudo-neuroscience of couples therapy

Special thanks to Professor Keith Laws, blogger at LawsDystopiaBlog, and especially the pseudonymous Neurocritic for their helpful comments. But any excesses or inaccuracies are entirely my own responsibility.

 

You may be more able to debunk bad neuroscience than you think.

In my last blog post, I began critically examining whether emotionally focused couples therapy (EFT) could be said to soothe the brains of wives who had received it.

Claims were made in a peer-reviewed article available here and amplified in a University of Ottawa press release that EFT was a particularly potent form of couples therapy. An fMRI study supposedly demonstrated how EFT changed the way the brain encoded threatening situations.

True love creates resilience, turning off fear and pain in the brain

OTTAWA, May 1, 2014— New research led by Dr. Sue Johnson of the University of Ottawa’s School of Psychology confirms that those with a truly felt loving connection to their partner seem to be calmer, stronger and more resilient to stress and threat.

In the first part of the study, which was recently published in PLOS ONE, couples learned how to reach for their lover and ask for what they need in a “Hold Me Tight” conversation. They learned the secrets of emotional responsiveness and connection.

The second part of the study, summarized here, focused on how this also changed their brain. It compared the activation of the female partner’s brain when a signal was given that an electric shock was pending before and after the “Hold Me Tight” conversation.

The experiment explored three different conditions. In the first, the subject lay alone in a scanner knowing that when she saw a red X on a screen in front of her face there was a 20% chance she would receive a shock to her ankles. In the second, a male stranger held her hand throughout the same procedure. In the third, her partner held her hand. Subjects also pressed a screen after each shock to rate how painful they perceived it to be.

Before the “Hold Me Tight” conversation, even when the female partner was holding her mate’s hand, her brain became very activated by the threat of the shock — especially in areas such as the inferior frontal gyrus, anterior insula, frontal operculum and orbitofrontal cortex, where fear is controlled. These are all areas that process alarm responses. Subjects also rated the shock as painful under all conditions.

However, after the partners were guided through intense bonding conversations (a structured therapy titled Emotionally Focused Couple Therapy or EFT), the brain activation and reported level of pain changed —under one condition. While the shock was again described as painful in the alone and in the stranger hand holding conditions (albeit with some small change compared to before), the shock was described as merely uncomfortable when the husband offered his hand. Even more interesting, in the husband hand-holding condition, the subject’s brain remained calm with minimal activation in the face of threat.

These results support the effectiveness of EFT and its ability to shape secure bonding. The physiological effects are exactly what one would expect from more secure bonding. This study also adds to the evidence that attachment bonds and their soothing impact are a key part of adult romantic love. Results shed new light on other positive findings on secure attachment in adults, suggesting the mechanisms by which safe haven contact fosters more stability and less reactivity to threat.

You can find my succinct deconstruction of the press release here.

I invite you to carefully read the article, or my last blog post and this one. This should prepare you to detect some important signs that this press release is utter nonsense, designed to mislead and falsely impress clinicians to whom EFT workshops and trainings are marketed. For instance, where in the procedures described in the PLOS One article is there any indication of the "Hold Me Tight" conversation? But that is just the start of the nonsense.

The PLOS One article ends with the claim that this “experiment” was conducted with a rigor comparable to a randomized clinical trial. Reading the article or these blog posts, you should also be able to see that this claim too is utter nonsense.

In my last blog post, I showed a lack of compelling evidence that EFT was better than any other couples treatment. To the extent that EFT has been evaluated at all, the studies are quite small and all supervised by promoters of EFT. Couples in the EFT studies are recruited to be less maritally dissatisfied than in other couples therapy research, and there is some evidence that improvement in marital functioning does not persist after therapy ends.

I called attention to the neuroscientist Neurocritic’s caution against expecting fMRI studies to reveal much about the process or effectiveness of psychotherapy that we do not know already.

Of course, we should expect some effects of psychotherapy to be apparent in pre-post therapy fMRI studies. But we should also expect the same of bowling or watching a TV series for an equivalent amount of time. Does an fMRI really tell us much more than what we can observe in couples' behavior or what they report after therapy? And without a comparison group, studies are not particularly revealing.

The larger problem looming in the background is authors intentionally or unintentionally intimidating readers with glib interpretations of neuroscience. Few readers feel confident in their ability to interpret such claims, especially the therapists to whom author Susan Johnson’s workshops are promoted.

This blog post could surprise you.

Maybe it will reassure you that you possess basic critical faculties with which you can debunk the journal article – if you are willing to commit the time and energy to reading and rereading it with skepticism.

I would settle, however, for leaving you thoroughly confused and skeptical about the claims in the PLOS One article. There are lots of things that do not make sense and that should be confusing if you think about them.

Confusion is a healthy reaction, particularly if the alternative is gullibility and being persuaded by pseudoscience.

I begin by ignoring that this was specifically an fMRI study. Instead, I will look at some numbers and details of the study that you can readily discover. Maybe you would have to look some things up on the Internet, but many of you could replicate my efforts.

In the text below, I have inserted some numbers in brackets. If you click on them, you will be taken to a secondary blog site where there are some further explanations.

The 23 wives for whom data were reported are an unrepresentative and highly select subsample of the 666 wives in couples expressing an interest in response to advertisements for the study.

With such a small number of participants:

  •  Including or excluding one or two participants can change results [1]. There is some evidence this could have occurred after initial results were known [2].
  • Any positive significant findings are likely to be false, and of necessity, significant findings will be large in magnitude, even when they are false positives [3] (see the sketch below).
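The second point, sometimes called the significance filter, can be checked with a generic simulation under the null hypothesis. This sketches the general statistical phenomenon, not a reanalysis of the study's data:

```python
# Generic sketch of the "significance filter": with n = 23 and no true effect,
# whatever reaches p < .05 is necessarily large in magnitude.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n, sims = 23, 20000

significant_rs = []
for _ in range(sims):
    x = rng.normal(size=n)
    y = rng.normal(size=n)          # null: x and y are unrelated
    r, p = pearsonr(x, y)
    if p < 0.05:
        significant_rs.append(abs(r))

# With n = 23, |r| must exceed about 0.41 just to reach p < .05, so every
# null "discovery" looks like a large effect.
print(f"fraction significant under the null: {len(significant_rs)/sims:.3f}")
print(f"mean |r| among significant results:  {np.mean(significant_rs):.2f}")
```

The first number hovers near .05 by construction; the second comes out around 0.46, comfortably in "medium-to-large effect" territory despite there being nothing to find.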

The sample was restricted to couples experiencing only mild to moderate marital dissatisfaction. So the study sample was less dissatisfied with their marriages, and not comparable to those recruited by other research groups for couples intervention studies.

Given the selection procedure, it was impossible for the authors to obtain a sample of couples with the mean levels of marital dissatisfaction that they reported for baseline assessments.

They stated that they recruited couples with the criterion that their marital dissatisfaction initially be between 80 and 96 on the DAS. They then report that the initial mean DAS score was 81.2 (SD=14.0). Impossible. [4]
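Why impossible? A bounded variable cannot have an arbitrarily large standard deviation. By Popoviciu's inequality on variances, for scores confined to an interval [a, b] = [80, 96]:

```latex
\operatorname{Var}(X) \le \frac{(b-a)^2}{4}
\quad\Longrightarrow\quad
\operatorname{SD}(X) \le \frac{96 - 80}{2} = 8.
```

Even allowing for the small-sample Bessel correction (a factor of sqrt(n/(n-1)), about 1.02 for n = 23), the maximum attainable SD is just over 8. A reported SD of 14.0 cannot coexist with an 80-96 inclusion range; either the criterion or the descriptive statistics are misstated.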

Yup, and this throws into doubt all the other results that are reported, especially when the authors find they need to explain results that did not appear, as expected, as simple differences between pre- and post-EFT fMRI, but only as a complex interaction between pre/post fMRI and initial DAS scores.

Couples therapy was continued until some vaguely defined clinical goal had been achieved. None of the details one would expect in a scientific paper were presented for how it was decided that this was enough therapy.

We were not told who decided, by what criteria, or with what interrater reliability the judgments were made. We do know that Susan Johnson, CEO of the nonprofit and for-profit companies promoting EFT, supervised all the therapy and the study.

Basically, Dr. Johnson was probably able to prolong the therapy and the follow-up fMRI assessment until she believed that the wives' responses would make the therapy look good. And with no further follow-up, she implies that "how the brain processes threat" had been changed, without any evidence as to whether changes in fMRI persisted or were transient.

This might be fine for the pseudo-magic of a workshop presentation, but it is unacceptable for a peer-reviewed article from which readers are supposed to be able to arrive at an independent judgment. And it is far removed from the experimental control of a clinical trial in which the timing of follow-up assessments is fixed.

Randomized clinical trials take this kind of control away from investigators and put it into the design and the phenomenon being studied so that maybe investigators can be proved incorrect.

The amount of therapy that these wives received (M = 22.9, range = 13-35) was substantially more than what was provided in past EFT outcome studies. Whatever therapeutic gains were observed in the sample could not be expected to generalize to past studies. [5]

Despite the therapy that they had received and despite the low levels of marital dissatisfaction with which they had begun, the average couple finishing the study still qualified for entering it. [6]

There is no explanation given for why only the wives' data are presented. No theoretical or clinical rationale is given for not studying husbands or presenting their data as well. [7]

A great deal is made of whether particular results are statistically significant or not. However, keep in mind that the sample size was very small and the seemingly sharp distinction between significant and nonsignificant is arbitrary. Certainly, the difference between results characterized as significant and those characterized as nonsignificant is not itself statistically significant. [8]

And, as we will see, much is being made of small differences that did not occur for all wives, only those who began with the lowest marital satisfaction.

The number of statistical tests conducted was many times the number of women in the study. The authors do not indicate all the analyses they conducted and selectively reported a subset of them, so there was considerable room for capitalizing on chance.

Multiple statistical tests in a small sample, without adjustment for there being so many tests, is a common complaint about small fMRI studies, but this study is a particularly bad example. Happy cherrypicking!
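To put "well over 100" uncorrected tests in perspective, here is the generic arithmetic. The round number of 100 tests is my assumption, since the article does not report how many were actually run:

```python
# Generic family-wise error arithmetic; 100 tests is an assumed round number.
alpha, n_tests = 0.05, 100

expected_false_positives = alpha * n_tests     # expected under the null
fwer = 1 - (1 - alpha) ** n_tests              # P(at least one false positive)
bonferroni_threshold = alpha / n_tests         # corrected per-test threshold

print(f"expected false positives: {expected_false_positives}")         # 5.0
print(f"P(at least one false positive): {fwer:.3f}")                   # ~0.994
print(f"Bonferroni per-test threshold: {bonferroni_threshold}")        # 0.0005
```

Five spurious "findings" are expected even when nothing whatsoever is going on, and at least one is a near certainty. A Bonferroni-style correction would demand p < .0005 per test, a bar that reported p values in the .01 to .05 range do not come close to clearing.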

The article and Johnson's promotional materials make much of differences that were observed in fMRI data collected before and after therapy. But the article never reports results actually testing these differences. This is an important discovery. Let's stop and explore it.

The article leads off its presentation of the fMRI results with

The omnibus test of EFT and handholding on all voxels activated in the original Coan et al. handholding study indicated a significant interaction between EFT, handholding and DAS, F (2, 72.6) = 3.6, p= .03 (Alone x EFT x DAS b= 10.3, SE =3.7; Stranger x EFT x DAS b = 2.5, SE =3.3).

What is oddly missing here is any test of the simple interaction between EFT (before versus after therapy) and handholding, i.e., EFT x handholding. The authors do not tell us whether the overall effects of handholding (partner versus stranger versus alone) differed before versus after completion of EFT, but that is the difference they want to discuss.

Basically, the authors only report interactions between EFT and handholding as qualified by level of initial marital satisfaction.

So? The authors proposed the simple hypothesis that receiving EFT would affect fMRI results in a situation involving threat of pain. They are about to do a very large number of statistical tests and they want to reassure the reader that they are not capitalizing on chance.

For reassurance, they need an interaction between EFT and handholding in the omnibus test. Apparently they did not get it. What they end up doing is going back and forth between whatever few statistical tests are significant among the well over 100 tests that they conducted for pre-/post-fMRI findings. When most of those tests proved nonsignificant, they went to a more complex interaction in which fMRI results were qualified by wives' level of marital satisfaction.

This is a classic fishing expedition, with a high probability that many of the fish should be thrown back as false positives. And the authors do not even have the fishing license that the significant omnibus results they hoped for would have provided.

The article makes repeated references to following up and replicating an earlier study by one of the authors, Jim Coan. That study involved only 16 women selected for higher marital satisfaction, so much so that they were called "supercouples" in press coverage of the study. You can find Neurocritic's critique of that study here.

The levels of marital satisfaction for the two small samples were discontinuous with each other—any couple eligible for one would be disqualified from the other by a wide margin. Most of the general population of married people would fall in between these two studies in level of marital satisfaction. And any reference, such as these authors make, to findings for women with low marital satisfaction in the Coan study is bunk. The highly select sample in the Coan study did not have any women with low marital satisfaction.

The two samples are very different, but neither study presented data in a way that allowed direct comparison with the other. Both studies departed from transparent, conventional presentation of data. Maybe the results for the original Coan study were weak as well and were simply covered up. That is suggested in the Neurocritic blog post.

But the problem is worse than that. The authors claim that they preselected the regions of interest (ROIs) based on the results that Coan obtained with his sample of 16 women. If you take the trouble to examine Table 1 from this article and compare it to Coan's results, you will see that some of the areas of the brain they are examining did not produce significant results in Coan's study. More evidence of a fishing expedition.

It is apparent that the authors changed their hypotheses after seeing the data. They did not expect changes in the stranger condition and scrambled to explain these results. If you jump to the Discussion section concerning fMRI results for the stranger condition, you get a lot of amazing post-hoc gobbledygook as the authors try to justify the results they obtained. They should simply have admitted that their hypothesis was not confirmed.


Figure 2. Point estimates of percent signal change graphed as a function of EFT (pre vs. post) by handholding (alone, stranger, partner) and DAS score.

The graphic representations in Figures 2 and 4 were produced by throwing away two thirds of the available data [9]. Yup. Each line represents results for two wives. It is unclear what interpretation is possible, except that it appears that after throwing away all these data, differences between pre- and post-therapy were not apparent for the group that started with higher marital satisfaction. The line is nearly flat in the partner condition, which the authors consider so important.

We do not want to make too much of these graphs because they are based on so few wives. But they do seem to suggest that not much was happening for women with higher marital satisfaction to begin with. And this may be particularly true for the responses when they were holding the hand of their partner. Yikes!


In looking at the graphical representations of the self-report data in Figure 1 and the fMRI data in Figures 3 and 5, pay particular attention to the bracketing +/- zones, not just the heights of the bar graphs. Some of the brackets overlap or nearly do, and you can see that small differences are being discussed.

And, oh, the neuroscience….

It is helpful to know something about fMRI studies to go much further in evaluating this one. But I can provide you with some light weaponry for dispensing with common nonsense.

First, beware of multiple statistical tests from small samples. The authors reassure us that their omnibus test reduced that threat, but they did not report the relevant results and probably did not obtain the results they needed for reassurance. And the results they expected for the omnibus test would not have been much reassurance anyway; they would still be largely capitalizing on chance. The authors also claim that they were testing regions of interest (ROIs), but if you take a careful look, they were testing other regions of the brain as well, and they generally did not replicate much of Coan's findings from his small study.

Second, beware of suggestions that particular complex mental functions are localized in single regions of the brain, so that a difference for that mental function can be inferred from a specific finding for that region. The tendency of investigators to lapse into such claims has been labeled the new phrenology, phrenology being the 19th-century pseudoscience of reading character from bumps on the skull. The authors of this study lead us into this trap when they attempt, in the discussion section, to explain findings they did not expect.

Third, beware of glib interpretations that a particular region of the brain is activated in terms of meaning that certain mental processes are occurring. It is often hard to tell what activation means. More activity can mean that more mental activity is occurring or it can mean the same mental activity requires more effort.

Fourth, beware of investigators claiming that changes in activation observed in fMRI data represent changes in the structure of the brain or mental processes (in this case, the authors’ claim that processing of threat had been changed). They are simply changes in activity and they may or may not persist and they may or may not be compensated by other changes. Keep in mind the brain is complex and function is interconnected.

Overall, the fMRI results were weak, inconsistent, and obscured by the authors' failure to report simple pre-post differences in any straightforward fashion. And what is presented really does not allow direct comparison between the earlier Coan study and the present one.

The authors started with the simple hypothesis that fMRI assessments conducted before and after EFT would show changes in wives' responses to threat of pain relative to whether their hand was being held by their partner, a stranger, or no one. Results were inconsistent, and the authors were left struggling with findings that, after a course of EFT, among other things, the wives were more comfortable with their hands being held by a stranger and less comfortable being alone. And overall, results that they expected to be simply an effect of the wives getting EFT were actually limited to wives who got EFT but who had the lowest marital satisfaction to begin with.

We could continue our analysis by getting into the specific areas of brain functioning for which significant results were or were not obtained. That is dubious business because so many of the results are likely to be due to chance. If we nonetheless continue, we have to confront post-hoc gobbledygook efforts to explain results like

In the substantia nigra/red nucleus, threat-related activity was generally greater during stranger than partner handholding, F (1, 47.4) = 6.5, p = .01. In the vmPFC, left NAcc, left pallidum, right insula, right pallidum, and right planum polare, main effects of EFT revealed general decreases from pre- to post- therapy in threat activation, regardless of whose hand was held, all Fs (1, 41.1 to 58.6) > 3.9, all ps < .05.

Okay, now we have started talking about seemingly serious neuroscience and fMRIs and you are confused. But you ought to be confused. Even a neuroscientist would be confused, because the authors are not providing a transparent presentation of their findings, only a lot of razzle dazzle designed to shock and awe, not really inform.

Magneto, the BS-fighting superhero summoned by Neurocritic

In an earlier blog post concerning the PLOS One study, Neurocritic detected nonsense and announced that Magneto, a BS-fighting superhero, was being summoned. But even mighty Magneto was thwarted by the confused presentation of ambiguous results and the absence of knowledge of what other results had been examined but suppressed because they did not support the story the authors wanted to tell.

I’m not sure that I understand this formulation, or that a dissociation between behavioral self-report and dACC activity warrants a reinterpretation of EFT’s therapeutic effects. Ultimately, I don’t feel like a BS-fighting superhero either, because it’s not clear whether Magneto has effectively corrected the misperceptions and overinterpretations that have arisen from this fMRI research.

Some of you may be old enough to recall Ronald Reagan doing advertisements for General Electric on television. He would always end with "Progress is our most important product." We have been trying to make sense of neuroscience data being inappropriately used to promote psychotherapy, and have had to deal with all the confusion, contradictory results, and outright cover-up in an article in PLOS One. To paraphrase Reagan, "Confusion is our most important product." If you are not confused, you don't sufficiently grasp what is being done in the PLOS One article and the press coverage and promotional video.

Category: Couples therapy, neuroscience, Uncategorized

Soothing psychotherapists’ brains with NeuroBalm

Promoters of Emotionally Focused Psychotherapy offer sciencey claims with undeclared conflicts of interest, cherry picked evidence, and bad science.

The temptation exists for researchers and clinicians to search for the strongest and most provocative version of their knowledge, which will create greatest publicity. The appeal is great; oversell and over-dramatize the result and attention will follow. — Jay Lebow, Editor, Family Process

Pity the poor therapists. They want to do the best for their clients. They are required to get CE credits for licensure and renewal. But how do they choose their CE courses? With workshop promoters hawking approved courses in thought field therapy and somatic experiencing therapies, therapists can understand that professional organizations' approval is no guarantee that what they will learn is evidence-supported or that it will mostly help, rather than hurt, their clients.

Worse, few therapists have the research background minimally necessary to interpret the sometimes wild claims made by promoters of workshops. They are unprepared to evaluate impressively sciencey claims that are being made for treatments. And what is more sciencey than neuroscience?

Psychotherapy is an inherently uncertain, subjective process. Isolated in sessions with clients, therapists do not have ready ways to monitor what is going on with confidence and decide moment to moment if it is helpful. Even when psychotherapy is manualized, done by the book, there is lots of uncertainty as to what is to be done when, to whom, whether it is done effectively, and how to follow-up.

Neuroscience seems to hold the promise of reducing some of that uncertainty. Exploitative hucksters make lots of money from therapists and their clients with claims that they can use neuroscience to monitor and direct the process of psychotherapy with precision. The hucksters play on the belief that changes in neural functioning can somehow serve to get more at what is “really” going on in therapy, beyond and, if necessary, in sharp contradiction of what therapists observe and clients report.

Enter workshop promoter Susan Johnson. As told in the New York Times, she claims her emotionally focused therapy (EFT)

can help couples break out of patterns, “interrupting and dismantling these destructive sequences and then actively constructing a more emotionally open and receptive way of interacting.” She aims to transform relationships “using the megawatt power of the wired-in longing for contact and care that defines our species,” and offers various exercises to restore trust.

Wow! If we could only monitor that interrupting and dismantling and the megawatt power of the “wired-in longing” with neuroscience.

In this blog post I discuss an article in PLOS One in which psychotherapist Johnson teams up with neuroscientist Jim Coan to claim they can do just that.

Ultimately, our handholding paradigm has provided a unique opportunity to test some of the proposed mechanisms of social support in general, and EFT in particular, all at the level of brain function, in vivo.

It is a terrible article, starting with its undisclosed conflicts of interest: Johnson is using the article to promote her psychotherapy products. And when we get past that, the article is a shameless display of cherry-picked evidence and poor psychotherapy research. We can learn from it as such.


But wait, hold on! Think of me like the greeter at the local Macy's department store who sprays you with free cologne or maybe rubs your hands with a soothing balm. Before we get into discussing the article, you can get a free sample of the Neurobalm right here that is being used to promote this psychotherapy product. See, no, feel for yourself. This is best appreciated wearing high-quality earphones to do the wonderful soundtrack justice.

Disclaimer: As you can already tell, I find this article outrageous and I am just getting warmed up in explaining how and why.  I am a PLOS One Academic Editor and I have gone on record insisting that promoters of psychotherapy be held to the same standards as the pharmaceutical companies in having to disclose apparent conflicts of interest. And now I have encountered an undisclosed conflict in the very journal where I work for free to provide a small bit of the oversight of the quality and integrity of what readers find there.

Oversight of conflicts of interest is far from perfect, especially when it depends on author disclosure. And oversight of the 24,000 articles published in PLOS One last year cannot be expected to be perfect.  But PLOS One has numerous tools to be self-correcting, especially when faced with undisclosed conflicts of interest. Unlike the journals Prevention Science and Clinical Psychology Review that I have been recently complaining about, PLOS One asks every author about potential conflicts of interest and every article published in PLOS One has an explicit declaration. And unlike these other two journals, PLOS One has explicit, orderly procedures for responding to apparent non-disclosures. An editor like myself, just like any reader, can make a complaint, and PLOS One will evaluate whether an inquiry to authors is necessary in order to decide what further action to take.

The opinions I am going to express here are my own, and not necessarily those of the journal or other members of the editorial board. Thankfully, at Mind the Brain, bloggers are free to speak out for themselves without censorship or even approval from the sponsoring journal. Remember what happened at Psychology Today and how I came to blog here.

The full text of the open access article is available here.

Abstract

Social relationships are tightly linked to health and well-being. Recent work suggests that social relationships can even serve vital emotion regulation functions by minimizing threat-related neural activity. But relationship distress remains a significant public health problem in North America and elsewhere. A promising approach to helping couples both resolve relationship distress and nurture effective interpersonal functioning is Emotionally Focused Therapy for couples (EFT), a manualized, empirically supported therapy that is strongly focused on repairing adult attachment bonds. We sought to examine a neural index of social emotion regulation as a potential mediator of the effects of EFT. Specifically, we examined the effectiveness of EFT for modifying the social regulation of neural threat responding using an fMRI-based handholding procedure. Results suggest that EFT altered the brain’s representation of threat cues in the presence of a romantic partner. EFT-related changes during stranger handholding were also observed, but stranger effects were dependent upon self-reported relationship quality. EFT also appeared to increase threat-related brain activity in regions associated with self-regulation during the nohandholding condition. These findings provide a critical window into the regulatory mechanisms of close relationships in general and EFT in particular.

Before co-authoring this PLOS One article with Susan Johnson, Jim Coan published a closely related 2006 study in Psychological Science. Coan received lots of press coverage, even before the article was available on the Internet. The blogger Neurocritic critiqued the press coverage and then followed up with a blog post critiquing the present PLOS One article.

Neurocritic provides some generally useful wisdom concerning interpreting statements about neural imaging, psychotherapy, and relationships. I think most neuroscientists would agree with him. If you are a therapist, you might want to bookmark his blog post for future reference when you feel slathered with Neurobalm from psychotherapy workshop gurus.

An extended excerpt from Neurocritic

Can neuroscience illuminate the nature of human relationships? Or does it primarily serve as a prop to sell self-help books? The neurorelationship cottage industry touts the importance of brain research for understanding romance and commitment. But any knowledge of the brain is completely unnecessary for issuing take-home messages like tips on maintaining a successful marriage.

In an analogous fashion, we can ask whether successful psychotherapy depends on having detailed knowledge of the mechanisms of “neuroplasticity” (a vague and clichéd term). Obviously not (or else everyone’s been doing it wrong). Of course the brain changes after 12 sessions of psychotherapy, just as it changes after watching 12 episodes of Dexter. The important question is whether knowing the pattern of neural changes (via fMRI) can inform how treatment is administered. Or whether pre-treatment neuroimaging can predict which therapy will be the most effective.

However, neuroimaging studies of psychotherapy that have absolutely no control conditions are of limited usefulness. We don’t know what sort of changes would have happened over an equivalent amount of time with no intervention. More importantly, we don’t know whether the specific therapy under consideration is better than another form of psychotherapy, or better than going bowling once a week.

Problems start with the article's title

Soothing the threatened brain: Leveraging contact comfort with Emotionally Focused Therapy.

This title titillates the unwary but triggers an alert among even open minded skeptics.

Some of you may recall the tips I gave for writing titles in the Colon Theory of Titles. I suggested that if you reserve one side of the colon in a title for keywords, you might use the other side to have a little fun attracting interest in your article.

Coyne, J. C., & van Sonderen, E. (2012). The Hospital Anxiety and Depression Scale (HADS) is dead, but like Elvis, there will still be citings. Journal of Psychosomatic Research, 73(1), 77-78.

Or the more outrageous

Krauth, S. J., Coulibaly, J. T., Knopp, S., Traoré, M., N’Goran, E. K., & Utzinger, J. (2012). An in-depth analysis of a piece of shit: distribution of Schistosoma mansoni and hookworm eggs in human stool. PLOS Neglected Tropical Diseases, 6(12), e1969.

Fair fun. But the problem with Soothing the Threatened Brain is that many therapists think there is something more profound about "soothing the brain" than about soothing the wife or her heart or her emotions. And this target audience is all too ready to believe that there is something special about promoters of emotionally focused therapy claiming it soothes the threatened brain. EFT is better than other marital therapies because it works on the wife's brain, not just the couple. Other therapies only do relationships or wives, but EFT does brains.

If you think I am being too tough on therapists, you can do an informal experiment. Strike up conversations with a few therapists about how they understand the abstract and title of this article, or the dramatic video based on the study. I tried this, and though some expressed skepticism, they really did not feel competent to argue with a peer-reviewed article or a video of an fMRI assessment.

Appearance of conflict of interest

Competing interests: The authors have declared that no competing interests exist.

Funding. This research was supported in part by the International Centre for Excellence in Emotionally Focused Therapy (ICEEFT), a not-for-profit corporation whose mission includes the scientific evaluation of EFT. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Additional funding was provided by a National Institute of Mental Health grant, Award Number R01MH080725, awarded to JAC. No additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

This does not ring true. The website of the International Centre for Excellence in Emotionally Focused Therapy lists Susan Johnson as founder and director. Not only did Susan Johnson design the study and interpret the data, she provided supervision of the therapy and somehow decided the particular time point at which women who had received the intervention would get their fMRI assessment. As psychotherapy research, this is bizarre and breaks down any benefits of experimental control — the investigator with strong allegiance to her treatment gets to pick when outcome assessments are done, rather than having preset times of assessment.

The website for Johnson’s for-profit Ottawa Couples and Family Institute indicates that it shares the same physical space and administrative staff as ICEEFT. The nonprofit corporation serves a number of marketing functions, including maintaining a referral list of therapists who have completed sufficient trainings to obtain the certification, as well as granting permission to otherwise unqualified persons to participate in workshops and get certification so they can practice in the community, often without licensure. Depending on the country or state, people who attend trainings can offer EFT for a fee without having a license or any regulation as long as they do not adopt a title that is regulated and licensed in that particular jurisdiction. This can be a matter of calling themselves a coach or counselor, depending on the jurisdiction.

This is a rather standard arrangement in the therapy training business, whereby profit-making activities flow from ostensibly nonprofit certification that extends the market for trainings.

If readers were informed of financial interests at stake….

If a candid conflict of interest statement had been provided, readers would have been better prepared to independently and skeptically evaluate the claims, starting in the introduction, and the colorful and unscientific language throughout.

For instance, the authors declare in the introduction

Early research suggested that EFT was superior to behavioral marital therapy [20], and a more recent meta-analysis [21] concluded that 70–73% of couples who undergo EFT are no longer relationally distressed at the end of therapy – at an average effect size of d= 1.3.

The evidence of superiority [20] refers to a 1986 study with 15 couples each assigned to EFT or behavioral marital therapy. It was a small underpowered study that can be discounted by its high risk of bias, including the developers testing their own therapy.

Let us get real. Accumulated psychotherapy studies suggest that it is quite unrealistic to expect that a comparison of 15 couples receiving a particular therapy versus 15 couples who were on a waiting list will yield a significant finding. There are also lots of studies suggesting only modest differences between credible, active, structured therapies like EFT versus behavioral marital therapy. It is highly unlikely that such findings would be obtained by honest and transparent reporting of well-done psychotherapy research by anyone without a dog in the fight.
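"Quite unrealistic" can be quantified with a quick power sketch. The effect sizes below are hypothetical conventions (Cohen's medium and large), not estimates from the EFT literature:

```python
# Power sketch for 15 couples per arm; the effect sizes are assumptions for
# illustration, not estimates from the EFT literature.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.5, 0.8):  # Cohen's conventional medium and large effects
    power = analysis.power(effect_size=d, nobs1=15, alpha=0.05)
    print(f"d = {d}: power = {power:.2f}")
# d = 0.5: power is roughly 0.25 -- a significant result is mostly luck
# d = 0.8: power is roughly 0.56 -- a coin flip even for a large true effect
```

With 15 couples per arm, even a genuinely large effect gives the study little better than coin-flip odds of detection, so a literature in which every tiny trial from the developers' own lab comes up positive is itself suspicious.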

The “more recent meta-analysis [21]” refers to a 1999 poor quality review and meta-analysis also conducted by developers of EFT.

The meta-analysis is worth a look. Seven studies were included. All were done by developers of EFT or as dissertations under their supervision. Three studies are nonrandomized trials, one with only seven couples. All of the randomized trials have 16 or fewer couples assigned to EFT, and so have less than a 50% probability of detecting a positive result if it is present. All of the articles were identified as positive studies.

Essentially this is a poor quality meta-analysis of what should have been left as pilot studies conducted by promoters of a therapy in their own lab. The meta-analysis lacks many of the formal features of meta-analyses, including forest plots and assessments of risk of bias. The overall effect size of 1.31 is improbably high, and the failsafe N of 49 studies that would have to be unpublished to unseat a positive evaluation of EFT highlights the absurdity of invoking that statistic. If we took failsafe N seriously, we would have to accept that there would have to be almost as many unpublished null trials as there are couples in the published studies.
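For readers unfamiliar with the statistic: Rosenthal's failsafe N estimates how many unpublished null-result studies would have to sit in file drawers to drag a combined result below significance. In its standard form, for k published studies with standard normal deviates Z_i:

```latex
N_{\text{fs}} \;=\; \frac{\left(\sum_{i=1}^{k} Z_i\right)^{2}}{z_{\alpha}^{2}} - k,
\qquad z_{\alpha} = 1.645 \ \ (\text{one-tailed } \alpha = .05).
```

The statistic assumes the hidden studies average exactly zero effect, which is one reason it is widely regarded as too lenient. Taking it seriously here means positing 49 completed-but-never-published null trials against only 7 published ones, a hidden literature seven times the size of the visible one.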

It is instructive to compare the assessment of EFT by its developers with a more detached consideration, from about the same time, by a group organized by the American Psychological Association to evaluate the evidence-supported status of psychotherapies.

The APA group tells you things that somehow got missed in this review of EFT in PLOS One.

It is important to note that the interventions were restricted to moderately distressed couples because the investigators were concerned that EFT might not be optimal for extremely distressed couples.

Promoters of a psychotherapy rarely lose a comparison between the intervention they are rooting for and a rival comparison-control, particularly in a grossly underpowered study. However, that is exactly what occurred in one of the studies included in the meta-analysis, where EFT was bested by strategic therapy at follow-up. This led the APA group to declare systemic therapy "possibly efficacious." I doubt this kind of upset has ever happened elsewhere in formal evaluations of psychotherapy research. Of course, the APA group's rules are kind of loopy and I would not give this evaluation too much credence. Nonetheless, the APA group goes on:

This difference between treatments resulted from the couples in the EFT treatment experiencing significant relapse during the follow-up period. The investigators noted that couples in this study were much more distressed than couples in the Johnson and Greenberg study, which might account for the differences seen in the two studies at follow-up. They cautioned that, with severely distressed couples, time-limited EFT might not be powerful enough to create sufficient intimacy to maintain posttest gains.

Compare this to what Susan Johnson says in the PLOS article:

 Moreover, EFT treatment gains realized among distressed couples at high risk for relapse are stable over two- and three- year assessment periods [22,23]. *

Art Garfunkel's Mr Shuck 'N Jive http://tinyurl.com/k7wbwo4

The discrepancy can be explained by picking and choosing particular timepoints from particular tiny psychotherapy follow-up studies with highly selected, unrepresentative patients. Come on, Susan, you're shucking us. This has little resemblance to finding best evidence; you are just finding evidence to sell your psychotherapy.

The APA group also noted some differences between the waitlist control groups in the EFT studies and those in the behavioral marital therapy studies reviewed by BMT's originator, Neil Jacobson:

Whereas 50% of James’s [an EFT study done by a dissertation student] waiting list couples improved without treatment, the waiting list couples in the BMT studies reviewed by Jacobson et al. showed an improvement rate of only 13.5%.

So, even the waitlist control groups do better in the EFT versus BMT studies.

Johnson continues her overview of the literature in the PLOS One introduction.

EFT has also been successfully applied to couples in which one or both partners are coping with a history of childhood sexual abuse [28,29], major depression [30,31], and even breast cancer [32].

You are shucking us again, Susan. What constitutes being "successfully applied"? These are not randomized controlled studies. For instance, the application to breast cancer involved only two patients. You are hardly in a position to crow about this. Shame on you.

When I read the introduction to a scientific article, I expect a much more nuanced, balanced consideration of the existing literature that leads up to the research question of the particular study. What we get in this introduction in no way resembles that. Rather, an author with undeclared conflicts of interest is shamelessly hawking her psychotherapy product.

But stay tuned. In Part Two of this blog post I will offer a detailed critique of the methodology and interpretation of the actual study. It would be great if readers read the open access PLOS One article ahead of my next post and were prepared with their own interpretations, and maybe even to dispute mine.

*The EFT literature and apparently what is said in workshops provide strong claims about outcomes that are echoed in the advertisements of therapists who get certified in EFT. For instance, the website of a Philadelphia-based therapist claims

[Image: EFT ad]

This is either an exaggeration or outright fraud if it is supposed to represent the likelihood of a positive outcome of a couple coming to this therapy.

Category: Conflict of interest, evidence-supported, mental health care, neuroscience, psychotherapy