Re-examining Ellen Langer’s classic study of giving plants to nursing home residents

 Memories of famous studies do not serve us well

A journalist emailed me a request for an interview about the work of Ellen Langer. I was busy hopping from Leipzig to Istanbul and on to Groningen, with unreliable Internet access, so I suggested that we communicate by email instead of Skype.

I recalled Langer’s classic study being portrayed as demonstrating that nursing home residents given a plant to take care of lived longer than residents whose plants were tended by the staff. I had not reread the study recently, but I was sure that this was an oversimplification of what was done in the study and of its findings. Yet I knew this was still the way the study was widely cited in the media and in undergraduate psychology textbooks. I did not recall much doubt being expressed.

From my current perspective, we should always be skeptical about small psychosocial studies claiming effects on mortality, especially when the studies were not planned with mortality as the primary outcome, or with the appropriate participant selection, equivalence of baseline characteristics between groups, and experimental controls. These are needed to ensure that the only differences between experimental and control conditions are attributable to group assignment.

There is a long history of claims about mortality found in small studies not holding up under scrutiny. The consistency with which this happens should set the prior probabilities with which we approach the next such claim. Very often, apparent positive results can be explained by data having been manipulated to support the conclusion. Even when we cannot see precisely how the data were manipulated, the apparent effects cannot be replicated.

Surely there may yet be a replicable finding lurking somewhere out there.

But I operate with the principle “Beware of apparent evidence of improved survival from small [underpowered] psychosocial studies not designed to look for that particular effect.” I begin examining a new claim with a healthy dose of skepticism. And we should always ask: by what plausible, established, or testable biological mechanism would such an effect be expected?

Death is a biomedical outcome. I continue to be amazed at how people, even professionals, who would dismiss post hoc claims from tiny studies that a medical intervention extends life nonetheless go gaga when the intervention is psychosocial. I’ve come to appreciate that a link between manipulating the mind and extending life is a holy grail that keeps being pursued, despite a history of nothing being found.

I’ve debunked claims like those of Drs. David Spiegel and Fawzy Fawzy [sic] about psychological interventions extending the lives of cancer patients. These claims didn’t hold up under careful scrutiny. And the findings claimed in the original studies could not be replicated in larger, better designed studies.

Why should we care? Isn’t keeping hope alive a good thing? Claims about the mind triumphing over the body are potentially harmful not only because they mislead cancer patients about their ability to control their health outcomes. More importantly, patients are left with the mistaken belief that if they succumb to cancer, they and their loved ones can blame them for not exerting appropriate mind control.

In my email to the journalist, I expressed skepticism and was eventually quoted:

The study that arguably made Langer’s name — the plant study with nursing-home patients — wouldn’t have “much credibility today, nor would it meet the tightened standards of rigor,” says James Coyne, professor emeritus of psychology at the University of Pennsylvania medical school and a widely published bird dog of pseudoscience. (Though, as Coyne also acknowledges, “that is true of much of the work of the ’70s, including my own concerning depressed persons depressing others.”) Langer’s long-term contributions, Coyne says, “will be seen in terms of the thinking and experimenting they encouraged.”

The journalist had told me his article would appear in the New York Times at the end of October. At first, I didn’t bother to check, but then I saw the video of the CBS Morning News extended interview with Langer. It was shocking.

I learned of her claims, based on an unpublished study, that she could lower the blood glucose levels of women with diabetes by manipulating their sense of time; that she had lowered women’s blood pressure by giving them a haircut and coloring; and that she now had wild plans to attempt to shrink the tumors of women with metastatic breast cancer.

The New York Times article had a title that warned of hype and hokum ahead –

What if Age Is Nothing but a Mind-Set?

The Times article described Langer’s intervention:

Langer gave houseplants to two groups of nursing-home residents. She told one group that they were responsible for keeping the plant alive and that they could also make choices about their schedules during the day. She told the other group that the staff would care for the plants, and they were not given any choice in their schedules. Eighteen months later, twice as many subjects in the plant-caring, decision-making group were still alive than in the control group.

To Langer, this was evidence that the biomedical model of the day — that the mind and the body are on separate tracks — was wrongheaded.

The study was conducted in the 1970s, in the days of showman research like that of Phil Zimbardo (whom Langer had as a professor) and Stanley Milgram. Many of the social psychological studies of that period were more attempts at dramatic demonstration of the investigators’ ideas than rigorously controlled experiments, and they don’t hold up to scrutiny. You can find thorough debunkings of Zimbardo’s Stanford prison experiment, which had only 24 student guards and prisoners, as well as of Milgram’s obedience studies, but perceptions of such work seem immune to being unsettled.

In re-examining Langer’s 1970s nursing home study, we shouldn’t chastise her for not anticipating later developments like CONSORT or the doubts now expressed about strong post hoc claims made from underpowered studies. But neither should we now uncritically accept such flawed, older studies as evidence, particularly when it comes to physical health effects and mortality. So I’m going to look at the validity of currently citing Ellen Langer’s work, not chastise her for what she didn’t do in the 70s.

You can find a PDF here of

Rodin, J., & Langer, E. J. (1977). Long-term effects of a control-relevant intervention with the institutionalized aged. Journal of Personality and Social Psychology, 35(12), 897.

The study is described as a follow-up of an earlier study, although the relationship between the two articles is more complex and of interest in itself. You can find a PDF here of

Rodin, J., & Langer, E. (1976). The effect of choice and enhanced personal responsibility for the aged: A field experiment in an institutional setting. Journal of Personality and Social Psychology, 34(2), 191-198.

The intervention

The intervention was more complex than simply giving nursing home residents plants to tend.

A later independent attempt at replication describes Langer’s original intervention in succinct and accurate terms [but you can click here to get the full details in Langer’s article]:

In their study, an intervention designed to encourage aged nursing home residents to feel more control and responsibility for day-to-day events was used. One group of residents was exposed to a talk delivered by the hospital administrator emphasizing their responsibility for themselves. A second group heard a communication that stressed the staff’s responsibility for them as patients. These communications were bolstered by offering to subjects in the experimental group plants that they could tend, whereas residents in the comparison group were given plants that were watered by the staff.

Follow the numbers!: Scrutinizing how Langer got from the first study to the second

The original 1976 study reported

There were 8 males and 39 females in the responsibility-induced condition (all fourth-floor residents) and 9 males and 35 females in the comparison group (all second-floor residents). Residents who were either completely bedridden or judged by the nursing home staff to be completely noncommunicative (11 on the experimental floor and 9 on the comparison floor) were omitted from the sample. Also omitted was one woman on each floor, one 40 years old and the other 26 years old, due to their age. Thus, 91 ambulatory adults, ranging in age from 65 to 90, served as subjects.

However, the statistical analyses reported in Table 1 of the article were based on 24 residents in the responsibility-induced condition and 28 in the comparison condition. A footnote explained

All of the statistics for the self-report data and the interviewers’ ratings are based on 45 subjects (25 in the responsibility-induced group and 20 in the comparison group), since these were the only subjects available at the time of the interview.

The 1977 follow-up study reported

Twenty-six of the 52 were still in the nursing home and were retested. Twelve had died, and 14 had been transferred to other facilities or had been discharged. The differences between treatment conditions in mortality are considered in a subsequent section. The groups did not differ in transfer or discharge rate. Only 9 other persons from the original sample of 91 were available for retesting. Since they had incomplete nurses’ ratings in the first study, they are only included in follow-up analyses not involving change scores in nurses’ evaluations. Almost all of the participants now lived in different rooms, since the facility had completed a more modern addition 13 months after the experimental treatment.

The 1977 follow-up study supplemented these data with a new control group of residents who had not participated in the original study.

We also evaluated a small control group of patients who had not participated in the first study due to a variety of scheduling problems. Five had previously lived on the same floor as subjects in the responsibility-induced condition, and 4 lived on the same floor as the comparison group. All were now living in the new wing. The average length of time in the nursing home was 3.9 years, which was not reliably different for the three groups.

Mortality data

The 1977 follow-up study reported 18-month follow-up data, with 7 deaths among the 47 residents (15%) in the responsibility-induced intervention group and 13 deaths (30%) in the composite comparison group. These proportions were subjected to an arcsine transformation, and the difference was described as statistically significant (z = 3.14, p < .01).

The Erratum

An erratum statement corrected the z-score to z = 1.74, p<.10.

The outcome is therefore only marginally significant, and a more cautious interpretation of the mortality findings than originally given is necessary.
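For readers who want to see where the corrected figure comes from, the arcsine (angular) transformation test for two proportions is easy to reproduce. This is my own illustrative sketch in Python, not the authors’ analysis; the size of the composite comparison group (about 44) is an assumption consistent with the reported 30% death rate.

```python
import math

def arcsine_two_proportion_z(deaths1, n1, deaths2, n2):
    """z-test for two proportions via the arcsine (angular) transformation:
    phi = 2*arcsin(sqrt(p)); z = (phi2 - phi1) / sqrt(1/n1 + 1/n2)."""
    phi1 = 2 * math.asin(math.sqrt(deaths1 / n1))
    phi2 = 2 * math.asin(math.sqrt(deaths2 / n2))
    return (phi2 - phi1) / math.sqrt(1 / n1 + 1 / n2)

# 7 of 47 deaths in the responsibility-induced group vs. 13 of ~44 in the
# composite comparison group (44 is my assumption, consistent with "30%").
print(round(arcsine_two_proportion_z(7, 47, 13, 44), 2))  # roughly 1.7, in line with the corrected z = 1.74
```

Nothing in that calculation comes anywhere near the originally reported z = 3.14.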

The APA electronic bibliographic resource PsycINFO does not indicate the existence of the erratum when the study is accessed, nor is there an indication at the journal. According to Google Scholar, Langer’s nursing home study has been cited 848 times, but the erratum only 6 times. Surveying accounts of this study in the scientific literature and social media, I see no acknowledgment that the differences were not statistically significant. Sure, it was a small study, but we can’t assume that the accumulation of more participants would have preserved the 2:1 difference between groups. Highly unlikely.

Attempted Replication

Richard Schulz and Barbara Hartman Hanusa followed up on Schulz’s earlier study investigating the effects of manipulations of control and predictability on the physical and psychological well-being of nursing home residents. Like Langer’s study, the examination of mortality was post hoc.

The original Schulz study was a randomized field experiment involving four groups:

(1) A control-enhanced condition in which the nursing home residents could control the frequency and length of visits from college student volunteers.

(2) A yoked group of residents who got the same frequency and length of visits, but without having control.

(3) A predict condition in which the residents were told when the volunteers would visit, but could not control these visits and were not informed how long they would last.

(4) A no-visit control condition.

Mortality data were examined in the follow-up article:

Two persons in the predict group and one person in the control-enhanced group died prior to the 24-month follow-up. A fourth person, also in the control-enhanced group, died between the 30- and 42-month follow-up. Fisher’s exact probability test was used to analyze these data (Siegel, 1956). Combining the no-treatment with the random group and the predict with the control-enhanced group yields a marginally significant Fisher’s exact probability of .053.

Like Langer’s study, this one involved post hoc lumping and splitting of groups. The marginally significant result would change radically with the addition or removal of a single participant.
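To see just how fragile a marginal Fisher’s exact result is, here is a sketch using scipy. The combined group sizes (about 20 residents each) are my assumption, not reported detail; they are consistent with the reported one-sided p = .053.

```python
from scipy.stats import fisher_exact

# Rows: deaths, survivors. Group sizes of ~20 per combined group are assumed.
table = [[4, 16],   # predict + control-enhanced: 4 deaths
         [0, 20]]   # no-treatment + yoked (random): 0 deaths
_, p = fisher_exact(table, alternative="greater")
print(round(p, 3))  # ~0.053

# Remove a single death and the result is far from significant.
_, p = fisher_exact([[3, 17], [0, 20]], alternative="greater")
print(round(p, 3))  # ~0.115
```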

The collective memory of Langer’s study

The Langer study continues to be hailed as a breakthrough, both in Langer’s own thinking and in the larger field, for demonstrating the power of predictability and control in mind-body interventions seeking effects on mortality.

Aside from not achieving significant effects, the study is deficient by contemporary standards in numerous ways. The mortality effect did not arise in a randomized trial, nor even in the original sample, but involves a post hoc construction of a comparison group. There is a high risk of bias in terms of a lack of

  • Allocation concealment.
  • Blinding of investigators and probably of the nurse raters.
  • Equalization of baseline characteristics in the two groups.
  • Rigorous experimental control of what happened between randomization and final follow-up.

There were noteworthy changes in conditions in the nursing home in the interim.

The complex intervention is not reducible to simply giving nursing home residents responsibilities for plants.

The study nonetheless makes claims about a biomedical outcome, mortality. Even in 1978, the study would have been judged deficient if the intervention had been biomedical, rather than psychosocial. It certainly would not have gotten as much attention either then or now.

Ellen Langer’s incessant self-promotion has had no small part in preserving and extending the claims of a mortality effect. Note that the study is widely referred to as the Langer study rather than the Rodin and Langer study.

Langer does not provide any plausible mechanism by which effects on mortality could have occurred. Overall, she asserts that the intervention manipulated responsibility and control, but even if that is the principal psychological effect, it is unclear how this would translate into better all-cause mortality in a population already afflicted by diverse life-threatening conditions.

We have to be careful about the tooth fairy science involved in trying to test mechanisms for effects we don’t even know exist. But whatever occurred in this study is unlikely to be tied to the residents having to take care of plants. Yet that is one of the features that is so enticing. All of us have anecdotes about older people being kept alive by having to care for their dog or cat. But even if there is some validity to these observations, it’s unlikely the beneficial effects of having a pet are tied to the power of attitude rather than to the modification of health-related behaviors.

The original 1976 study ended with

The practical implications of this experimental demonstration are straightforward. Mechanisms can and should be established for changing situational factors that reduce real or perceived responsibility in the elderly. Furthermore, this study adds to the body of literature (Bengston, 1973; Butler, 1967; Leaf, 1973; Lieberman, 1965) suggesting that senility and diminished alertness are not an almost inevitable result of aging. In fact, it suggests that some of the negative consequences of aging may be retarded, reversed, or possibly prevented by returning to the aged the right to make decisions and a feeling of competence.

Not so straightforward, and these noble sentiments are not empirically advanced by the actual results of these studies, particularly the follow-up study. Yet I have no confidence that debunking will have any effect on how the study has been mythologized. Some people have a need for such examples of mind triumphing over bodily limitations, even if the examples are not true.


Positive psychology interventions for depressive symptoms

I recently talked with a junior psychiatrist about whether she should undertake a randomized trial of positive psychology interventions with depressed primary care patients. I had concerns about whether positive psychology interventions would be acceptable to clinically depressed primary care patients, or whether they would be off-putting and even detrimental.

Going back to my first publication almost 40 years ago, I’ve been interested in the inept strategies that other people adopt to try to cheer up depressed persons. The risk of positive psychology interventions is that depressed primary care patients would perceive the exercises as more ineffectual pressures on them to think good thoughts, be optimistic and snap out of their depression. If depressed persons try these exercises without feeling better, they are accumulating more failure experiences and further evidence that they are defective, particularly in the context of glowing claims in the popular media of the power of simple positive psychology interventions to transform lives.  Some depressed people develop acute sensitivity to superficial efforts to make them feel better. Their depression is compounded by their sense of coercion and invalidation of what they are so painfully feeling. This is captured in the hilarious Ren & Stimpy classic

Happy Helmet Joy Joy song video

Something borrowed, something blue

By positive psychology interventions, my colleague and I didn’t have in mind techniques that positive psychology borrowed from cognitive therapy for depression. Ambitious positive psychology school-based interventions like the UK Resilience Program incorporate these techniques. They have been validated for use with depressed patients as part of Beck’s cognitive therapy, but are largely ineffective when used with nonclinical populations that are not sufficiently depressed to register an improvement. Rather, we had in mind interventions and exercises that are distinctly positive psychology.

Dr. Joan Cook, Dr. Beck, and Jim Coyne

I surveyed the positive psychology literature to get some preliminary impressions, forcing myself to read the Journal of Positive Psychology and even the Journal of Happiness Studies. I sometimes had to take breaks and go see dark movies as an antidote, such as A Most Wanted Man and The Drop, both of which I heartily recommend. I will soon blog about the appropriateness of positive psychology exercises for depressed patients. But this post concerns a particular meta-analysis that I stumbled upon. It is open access and downloadable anywhere in the world. You can obtain the article and form your own opinions before considering mine or double check mine:

Bolier, L., Haverman, M., Westerhof, G. J., Riper, H., Smit, F., & Bohlmeijer, E. (2013). Positive psychology interventions: a meta-analysis of randomized controlled studies. BMC Public Health, 13(1), 119.

I had thought this meta analysis just might be the comprehensive, systematic assessment of the literature for which I searched. I was encouraged that it excluded positive psychology interventions borrowed from cognitive therapy. Instead, the authors sought studies that evaluated

the efficacy of positive psychology interventions such as counting your blessings [29,30], practicing kindness [31], setting personal goals [32,33], expressing gratitude [30,34] and using personal strengths [30] to enhance well-being, and, in some cases, to alleviate depressive symptoms [30].

But my enthusiasm was dampened by the wishy-washy conclusion prominently offered in the abstract:

The results of this meta-analysis show that positive psychology interventions can be effective in the enhancement of subjective well-being and psychological well-being, as well as in helping to reduce depressive symptoms. Additional high-quality peer-reviewed studies in diverse (clinical) populations are needed to strengthen the evidence-base for positive psychology interventions.

Can be? With apologies to Louis Jordan, is they or ain’t they effective? And just why is additional high-quality research needed to strengthen conclusions? Because there are only a few studies or because there are many studies, but mostly of poor quality?

I’m so disappointed when authors devote the time and effort that meta-analysis requires and then beat around the bush with such wimpy, noncommittal conclusions.

A first read alerted me to some bad decisions that the authors had made from the outset. Further reads showed me how effects of these decisions were compounded by the poor quality of the literature of which they had to make sense.

I understand the dilemma the authors faced. The positive psychology intervention literature  has developed in collective defiance of established standards for evaluating interventions intended to benefit people and especially interventions to be sold to people who trust they are beneficial. To have something substantive to say about positive psychology interventions, the authors of this meta analysis had to lower their standards for selecting and interpreting studies. But they could have done a better job of integrating acknowledgement of problems in the quality of this literature into their evaluation of it. Any evaluation should come with a prominent warning label about the poor quality of studies and evidence of publication bias.

The meta-analysis

Meta-analyses involve (1) systematic searches of the literature; (2) selection of studies meeting particular criteria; and (3) calculation of standardized effect sizes to allow integration of results of studies with different measures of the same construct. Conclusions are qualified by (4) quality ratings of the individual studies and by (5) calculation of the overall statistical heterogeneity of the study results.
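As an illustration of step (3), a standardized effect size for a single study is typically computed from the group means, standard deviations, and sample sizes. The sketch below uses hypothetical numbers and a common large-sample approximation for the variance; it is my own illustration, not taken from the meta-analysis itself.

```python
import math

def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Cohen's d (standardized mean difference) with pooled SD, plus a
    common large-sample approximation of its sampling variance."""
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2)
                          / (n_t + n_c - 2))
    d = (mean_c - mean_t) / sd_pooled          # lower depression score = better
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return d, var_d

# Hypothetical post-treatment depression scores, for illustration only.
print(cohens_d(mean_t=14.0, sd_t=6.0, n_t=50, mean_c=16.0, sd_c=6.5, n_c=50))
```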

The authors searched

PsychInfo, PubMed and the Cochrane Central Register of Controlled Trials, covering the period from 1998 (the start of the positive psychology movement) to November 2012. The search strategy was based on two key components: there should be a) a specific positive psychology intervention, and b) an outcome evaluation.

They also found additional studies by crosschecking references of previous evaluations of positive psychology interventions.

To be selected, a study had to

  • Be developed within the theoretical tradition of positive psychology.
  • Be a randomized controlled study.
  • Measure outcomes of subjective well-being (such as positive affect), personal well-being (such as hope), or depressive symptoms (such as the Beck Depression Inventory).
  • Have results reported in a peer-reviewed journal.
  • Provide sufficient statistics to allow calculation of standardized effect sizes.

I’m going to focus on evaluation of interventions in terms of their ability to reduce depressive symptoms. But I think my conclusions hold for the other outcomes.

The authors indicated that their way of assessing the quality of studies (scored 0 to 6) was based on a count derived from an adaptation of the risk-of-bias items of the Cochrane Collaboration. I’ll discuss their departures from the Cochrane criteria later, but these authors’ six criteria were

  • Adequacy of concealment of randomization.
  • Blinding of subjects to which condition they had been assigned.
  • Baseline comparability of groups at the beginning of the study.
  • Whether there was an adequate power analysis or  at least 50 participants in the analysis.
  • Completeness of follow up data: clear attrition analysis and loss to follow up < 50%.
  • Handling of missing data: the use of intention-to-treat analysis, as opposed to analysis of only completers.

The authors used two indicators to assess heterogeneity

  • The Q-statistic. When significant, it calls for rejection of the null hypothesis of homogeneity and indicates that the true effect size probably varies from study to study.
  • The I²-statistic, which is a percentage indicating the study-to-study dispersion of effect sizes due to real differences, beyond sampling error.

[I know, this is getting technical, but I will try to explain as we go. Basically, the authors estimated the extent to which the effect size they obtained could generalize back to the individual studies. When individual studies vary a great deal, an overall effect size for a set of studies can be very different from that of any individual intervention. So without figuring out the nature of this heterogeneity and resolving it, the effect sizes do not adequately represent individual studies or interventions.]
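For the statistically inclined, both indicators follow directly from the study-level effect sizes and their variances. A minimal sketch, with hypothetical numbers chosen only for illustration:

```python
import numpy as np

def q_and_i2(effects, variances):
    """Cochran's Q and Higgins' I^2 under a fixed-effect model."""
    y = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)   # inverse-variance weights
    y_bar = np.sum(w * y) / np.sum(w)              # pooled effect size
    q = np.sum(w * (y - y_bar) ** 2)               # Cochran's Q
    df = len(y) - 1
    i2 = max(0.0, (q - df) / q) * 100.0            # % dispersion beyond sampling error
    return q, i2

# Hypothetical effect sizes and their variances, for illustration only.
print(q_and_i2([0.10, 0.20, 0.90, 0.15], [0.04, 0.05, 0.06, 0.03]))
```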

One way of reducing heterogeneity is to identify outlier studies that have much larger or smaller effect sizes than the rest. These studies can simply be removed from consideration or sensitivity analyses can be conducted, in which analyses are compared that retain or remove outlier studies.

The authors expected big differences across the studies, and so adopted a criterion for retaining a study of a Cohen’s d (the standardized difference between intervention and control group) of up to 2.5. That is huge. The average psychological intervention for depression differs from a waitlist or no-treatment group by d = .62, but from another active treatment by only d = .20. How could these authors think that even an effect size of 1.0 could be expected for positive psychology interventions with largely nonclinical populations? They are at risk of letting in a lot of exaggerated and nonreplicable results. But stay tuned.

The authors also examined the likelihood that there was publication bias in the studies that they were able to find, using funnel plots, Orwin’s fail-safe number, and the trim-and-fill method. I will focus on the funnel plot because it is graphic, but the other approaches provide similar results. The authors of this meta-analysis state

A funnel plot is a graph of effect size against study size. When publication bias is absent, the observed studies are expected to be distributed symmetrically around the pooled effect size.
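The logic is easy to see in a simulation. The sketch below generates small studies around a modest true effect and then “publishes” mainly the significant or larger ones; the resulting plot is asymmetric in exactly the way the authors describe. This is my own illustration, not the authors’ data.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Simulate 60 two-arm studies around a true effect of d = 0.2.
n_per_arm = rng.integers(10, 150, size=60)
se = np.sqrt(2.0 / n_per_arm)                    # approximate standard error of d
d = rng.normal(0.2, se)

# Crude publication filter: significant results, or larger studies, get published.
published = (d / se > 1.96) | (n_per_arm > 60)

plt.scatter(d[published], se[published])
plt.gca().invert_yaxis()                         # precise (large) studies at the top
plt.axvline(0.2, linestyle="--")                 # true effect
plt.xlabel("Effect size (d)")
plt.ylabel("Standard error")
plt.title("Asymmetry: small negative studies are missing")
plt.show()
```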

Hypothetical funnel plot indicating bias

Results

At the end of the next two sections, I will conclude that the authors were overly generous in their evaluation of positive psychology interventions. The quality of the available studies precludes deciding whether positive psychology interventions are effective. But don’t accept this conclusion without my documenting my reasons for it. Please read on.

The systematic search identified 40 articles presenting the results of 39 studies. The overall quality ratings of the studies were quite low [see Table 1 in the article], with a mean score of 2.5 (SD = 1.25). Twenty studies were rated as low quality (<3), 18 as medium quality (3-4), and one received a rating of 5. The studies with the lowest quality had the largest effect sizes (Table 4).

Fourteen effect sizes were available for depressive symptoms. The authors report an overall small effect size of positive psychology interventions on depressive symptoms of .23. Standards for evaluating effect sizes are arbitrary, but this one would generally be considered small.

There were multiple indications of publication bias, including the funnel plot of these effect sizes, and it was estimated that 5 negative findings were missing. According to the authors

Funnel plots were asymmetrically distributed in such a way that the smaller studies often showed the more positive results (in other words, there is a certain lack of small insignificant studies).

When the effect sizes for the missing studies were imputed (estimated), the adjusted overall effect size for depressive symptoms was reduced to a nonsignificant .19.

To provide some perspective, let’s examine what an effect size of approximately .20 means. It corresponds to a 56% probability (as opposed to a 50/50 chance) that a person assigned to a positive psychology intervention would be better off than a person assigned to the control group.
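That 56% figure is the “common language effect size,” or probability of superiority. Under an assumption of normally distributed outcomes it is simply Φ(d/√2), which is easy to verify:

```python
import math
from scipy.stats import norm

def probability_of_superiority(d):
    """P(a randomly chosen treated person does better than a randomly
    chosen control), assuming normal outcomes: Phi(d / sqrt(2))."""
    return norm.cdf(d / math.sqrt(2))

print(round(probability_of_superiority(0.20), 2))  # ~0.56
```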

Created by Kristoffer Magnusson. http://rpsychologist.com/d3/cohend/

But let’s give a closer look to a forest plot of the studies with depressive symptoms as an outcome.

As can be seen in the figure below, each study has a horizontal line in the forest plot and most have a square box in the middle. The line represents the 95% confidence interval for the standardized mean difference between the positive psychology intervention and its control group, and the darkened square is the mean difference.

Forest plot of the studies with depressive symptoms as an outcome (see the article)

Note that two studies, Fava (2005) and Seligman, study 2 (2006) have long lines with an arrow at the right, but no darkened squares. The arrow indicates the line for each extends beyond what is shown in the graph. The long line for each indicates wide confidence intervals and imprecision in the estimated effect. Implications? Both studies are extreme outliers with large, but imprecise estimates of effect sizes. We will soon see why.

There are also vertical lines in the graph. One is marked 0,00 and indicates no difference between the intervention and control group. If the line for an individual study crosses it, the difference between the intervention and control group was not significant.
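If you want to see how such a display is built, a forest plot is nothing more than one confidence interval per study drawn around a vertical line of no effect. The numbers below are hypothetical, purely to show the mechanics, not values from the meta-analysis:

```python
import matplotlib.pyplot as plt

# Hypothetical (label, estimate, lower CI, upper CI) tuples for illustration.
studies = [("Study A", 0.10, -0.15, 0.35),
           ("Study B", 0.90,  0.05, 1.75),   # a small study: wide, imprecise interval
           ("Study C", 0.20, -0.05, 0.45),
           ("Study D", 0.15, -0.20, 0.50)]

for i, (name, est, lo, hi) in enumerate(studies):
    plt.plot([lo, hi], [i, i], color="black")    # 95% confidence interval
    plt.plot(est, i, marker="s", color="black")  # point estimate
plt.axvline(0.0, linestyle="--")                 # line of no difference
plt.yticks(range(len(studies)), [s[0] for s in studies])
plt.xlabel("Standardized mean difference")
plt.show()
```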

Among the things to notice are:

  • Ten of the 14 effect sizes available for depressive symptoms cross the 0,00 line, indicating that the individual effect sizes were not significant.
  • The four lines that don’t cross this line, and therefore had significant effects, were Fava (2005), Hurley, Mongrain, and Seligman (2006, study 2).

Checking Table 2 for characteristics of the studies, we find that Fava compared 10 people receiving the positive psychology intervention to a control group of 10. Seligman had 11 people in the intervention group and 9 in the control group. Hurley is listed as comparing 94 people receiving the intervention to 99 controls. But I checked the actual study, and these numbers represent a substantial loss of participants from the 151 intervention and 164 control participants who started the study. Hurley lost 39% of participants by the Time 2 assessment and analyzed only completers, without intent-to-treat analyses or imputation (which would have been inappropriate anyway because of the high proportion of missing data).

I cannot make sense of Mongrain’s studies being counted as positive. A check with Table 1 indicates that 4 studies with Mongrain as an author were somehow combined. Yet, when I looked them up, one study reported no significant differences between intervention and control conditions for depression, with the authors explicitly indicating that they failed to replicate Seligman et al. (2006). A second study reports

In terms of depressive symptoms, no significant effects were found for time or time x condition. Thus, participant reports of depressive symptoms did not change significantly over time, or over time as a function of the condition that they were assigned to.

A third study reported significant effects for completers, but nonsignificant effects in multilevel modeling analyses that attempted to compensate for attrition. The fourth study again failed to find, in multilevel analyses attempting to compensate for attrition, that the decline in depressive symptoms over time was a function of the group to which participants were assigned.

So, Mongrain’s studies should not be counted as having a positive effect size for depressive symptoms unless perhaps we accept a biased completer analysis over multilevel modeling. We are left with Fava and Seligman’s quite small studies and Hurley’s study relying on completer analyses without adjustment for substantial attrition.

By the authors’ ratings, the quality of these studies was poor. Fava and Seligman both scored 1 out of 6 in the quality assessments. Hurley scored 2. Mongrain scored 4, and the other negative studies had a mean score of 2.6. So, any claim from individual studies that positive psychology interventions have an effect on depressive symptoms depends on two grossly underpowered studies and another study with analysis of only completers in the face of substantial attrition. And the positive studies tend to be of lower quality.

But the literature concerning positive psychology interventions is worse than it first looks.

The authors’ quality ratings are too liberal.

  • Item 3, Baseline comparability of groups at the beginning of the study, is essential if effect sizes are to be meaningful. But it becomes meaningless if such grossly underpowered studies are included. For instance, it would take a large difference in the baseline characteristics of Fava’s 8 intervention versus 8 control participants to reach significance. That there were no significant differences in baseline characteristics is very weak assurance that individual or combined baseline characteristics did not account for any differences that were observed.
  • Item 4, Whether there was an adequate power analysis or at least 50 participants in the analysis, can be met in either of two ways. But we don’t have evidence that the power analyses were conducted prior to the trial, and having at least 50 participants does not reduce bias if there is substantial attrition.
  • Item 5, Completeness of follow-up data (clear attrition analysis and loss to follow-up < 50%), allows studies with substantial loss to follow-up to score positive. Hurley’s loss of over a third of the participants who were randomized rules out generalization of results back to the original sample, much less an effect size that can be integrated with those of other studies that did not lose so many participants.

The authors of this meta analysis chose to “adapt,” rather than simply accept the validated Cochrane Collaboration risk of bias assessment. Seen here, one Cochrane criterion is whether the randomization procedure is described in sufficient detail to decide that the intervention and control group would be comparable except for group assignment. These studies typically did not provide sufficient details of any care having been taken to ensure this or any details whatsoever except that the study was randomized.

Another criterion is whether there is evidence of selective outcome reporting. I would not score any of these studies as demonstrating that all outcomes were reported. The issue is that authors can assess participants with a battery of psychological measures, and then pick those that differed significantly between groups to be highlighted.

The Cochrane Collaboration includes a final criterion, “other sources of bias.” In doing meta-analyses of psychological intervention studies, considering investigator allegiance is crucial because the intervention for which the investigator is rooting almost always does better. My group’s agitation about financial conflicts of interest has won us the Bill Silverman award from the Cochrane Collaboration. The Collaboration is now revising its “other sources of bias” criterion so that conflicts of interest are taken into account. Some authors of articles about positive psychology interventions profit immensely from marketing positive psychology merchandise. I am not aware of any of the studies included in the meta-analysis having disclosures of conflict of interest.

If you think I am being particularly harsh in my evaluation of positive psychology interventions, you need only to consult my numerous other blog posts about meta analyses and see the consistency with which I apply standards. And I have not even gotten to my pet peeves in evaluating intervention research – overly small cell size and “control groups” that are not clear on what is being controlled.

The number of participants in some of these studies is so small that the intended effects of randomization cannot be assured and any positive findings are likely to be false positives. If the number of participants in either the intervention or control group is less than 35, there is less than a 50% probability of detecting a moderate-sized positive effect, even if it is actually there. Put differently, there is a more than 50% probability that any significant finding will be a false positive. Inclusion of studies with so few participants undermines the validity of the other quality ratings. We cannot tell why Fava or Seligman did not have one more or one fewer participant. These are grossly underpowered studies, and adding or dropping a single participant in either group could substantially change results.
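The power problem is easy to quantify. Here is a sketch using statsmodels for a conventional two-group comparison and a moderate effect of d = 0.5; the exact figures depend on these assumptions, but the pattern is clear:

```python
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()
for n_per_group in (20, 35, 50, 64):
    p = power.power(effect_size=0.5, nobs1=n_per_group, alpha=0.05,
                    ratio=1.0, alternative="two-sided")
    print(n_per_group, round(p, 2))
# With fewer than roughly 35 participants per group, power to detect
# d = 0.5 drops to about one half or less.
```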

Then there is the question of control groups. While some studies simply indicate a waitlist, others had an undefined treatment as usual or no treatment, and a number of others indicate “placebo,” apparently following Seligman et al.’s (2005):

Placebo control exercise: Early memories. Participants were asked to write about their early memories every night for one week.

As Mongrain correctly noted, this is not a “placebo.” Seligman et al. and the studies modeled after it failed to include any elements of positive expectation, support, or attention that are typically provided in conditions labeled “placebo.” Mongrain and her colleagues attempted to provide such elements in their control condition, and perhaps this contributed to their negative findings.

A revised conclusion for this meta-analysis

Instead of the wimpy conclusion of the authors presented in their abstract, I would suggest acknowledgment that

The existing literature does not provide robust support for the efficacy of positive psychology interventions for depressive symptoms. The absence of evidence is not necessarily evidence of an absence of an effect. However, more definitive conclusions await better quality studies with adequate sample sizes and suitable control of possible risk of bias. Widespread dissemination of positive psychology interventions, particularly with glowing endorsements and strong claims of changing lives, is premature in the absence of evidence they are effective.

Can the positive psychology intervention literature be saved from itself?

Studies of positive psychology interventions are conducted, published, and evaluated in a gated community where vigorous peer review is neither sought nor apparently effective in identifying and correcting major flaws in manuscripts before they are published. Many within the positive psychology movement find this supportive environment an asset, but it has failed to produce a quality literature demonstrating that positive interventions can indeed contribute to human well-being. Positive psychology intervention research has been insulated from widely accepted standards for doing intervention research. There is little evidence that any of the manuscripts reporting the studies were submitted with completed CONSORT checklists, which are now required by most journals. There is little evidence of awareness of Cochrane risk of bias assessment or of steps being taken to reduce bias.

In what other area of intervention research are claims for effectiveness so dependent on such small studies of such low methodological quality published in journals in which there is only limited independent peer review and such strong confirmatory bias?

As seen on its Friends of Positive Psychology listserv, the positive psychology community is averse to criticism, even constructive criticism from within its ranks. There is dictatorial one-person rule on the listserv. Dissenters routinely vanish without any due process or notice to the rest of the listserv community, much like disappearances under a Latin American dictatorship.

There are many in the positive psychology movement who feel that the purpose of positive psychology research is to uphold the tenets of the movement and to show, not test, the effectiveness of its interventions for changing lives. Investigators who want to evaluate positive psychology interventions need to venture beyond the safety and support of the Journal of Positive Psychology and the Journal of Happiness Studies to seek independent peer review, informed by widely accepted standards for evaluating psychological interventions.


Is there benefit to adding psychotherapy to antidepressants?


Special thanks to Don Klein, MD and Bruce Thyer, PhD for helpful discussions, but all opinions expressed are the author’s alone.

Is there any benefit to adding psychotherapy to well-managed treatment with antidepressants? This clinically important question was addressed in a large-scale, exceptionally well-resourced study.

Despite appearing in the respected JAMA Psychiatry, the article will not get the attention it deserves. Its results are complex and nuanced. I had to read it carefully a number of times, along with its accompanying editorial to grasp its full significance. However, the study’s disappointing, downright disconcerting findings will keep it from getting widely disseminated.

There was no press release for the study and very little press coverage so far. One of the few mentions in the media is balanced, once you get past the hyped title, and includes quotes from the lead author:

“We know they both work so you assume when you put them together it’s going to work better,” says lead author Steven D. Hollon of the psychology department of Vanderbilt University in Nashville, Tennessee.

He would have liked to see that additive effect for the whole group of depressed patients, but for about two thirds of patients, adding cognitive therapy didn’t matter, Hollon said.

Imagine a study evaluating the benefit of adding antidepressant medication to well-delivered cognitive therapy, with similarly disappointing results. That study would be well publicized (“Depressed persons don’t need meds if they are getting adequate therapy”), in part because of the cognitive therapy lobby, but also because the message resonates with the anti-medication side in the antidepressant wars. Unlike the results of the present study, these hypothetical results would be trumpeted because they are consistent with entrenched opinions.

The silence greeting the article has much in common with supporters of a soccer team not wanting to discuss a disappointing loss. Opinions about antidepressants and psychotherapy are as partisan as loyalties to soccer teams. There is nothing sinister going on here. But it makes for a poor progression from the availability of evidence to changes in practice.

In this post, I will examine some of the specifics of the study and their broader implications. There are some sobering things to be learned. Among them:

  • State-of-the-art treatment combining antidepressants and cognitive therapy continued over a long period of time leaves many patients still depressed.
  • Adding psychotherapy does not improve outcomes for many patients if they are already receiving well-managed, personalized treatment with antidepressants.
  • Whatever cognitive therapy contributes might be achieved cheaply and more simply with supportive therapy or enhanced clinical management of the antidepressants.
  • Therapists need guidance as to what to do when manualized psychotherapy is not having its intended effect, including how to inform patients and discuss the options with them.

But to begin such discussions we need to dive into the details of the methods and the particular interventions being evaluated. And bring in what we already know about treatment of depression, particularly the gross inadequacies in routine care.

The abstract to the paper is available here. As with other papers behind pay walls, you will have to access this one through a University library or email the corresponding author, steven.d.hollon@vanderbilt.edu. The excellent editorial by Michael Thase is also behind a pay wall, but you can email him at thase@mail.med.upenn.edu.

Finally, the registration of the trial is available here.

The study

The objective of the study was

To determine the effects of combining cognitive therapy (CT) with ADM [antidepressant medication] vs ADM alone on remission and recovery in major depressive disorder (MDD).

Overall design

The trial design was exceptionally complex. It involved providing acute treatment for up to 18 months, removal of patients who did not meet criteria for remission within 18 months, and transitioning of the remaining patients into continuation treatment.

Acute treatment lasted until the patient met the criteria for remission, defined as 4 consecutive weeks of minimal symptoms; continuation treatment lasted to the point of recovery, defined as another 26 consecutive weeks without relapse. Patients did not need to maintain the symptom levels required for remission to meet the criteria for recovery. Participants who experienced relapse during continuation were required to meet remission criteria again before they were eligible to meet the criteria for recovery. Patients who did not meet the symptomatic criteria for remission within 18 months of treatment were removed from the study and referred for other treatment, as were patients who did not meet criteria for recovery within 36 months. Patients who met only the symptomatic criterion for remission at month 18 (or recovery at month 36) continued treatment until it was determined whether they also met the temporal criteria. Thus, up to 19 months were allowed for remission and up to 42 months for recovery.
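To make the remission criterion concrete, here is a minimal sketch of how one might score time to remission from weekly symptom ratings. This is my own illustration of the stated definition (4 consecutive weeks of minimal symptoms), not the investigators’ code.

```python
def weeks_to_remission(weekly_minimal, run_length=4):
    """Return the 1-based week at which remission begins, defined here as the
    start of the first run of `run_length` consecutive weeks of minimal
    symptoms; None if the criterion is never met.

    `weekly_minimal` is a list of booleans: True = minimal symptoms that week.
    """
    run = 0
    for week, minimal in enumerate(weekly_minimal, start=1):
        run = run + 1 if minimal else 0
        if run == run_length:
            return week - run_length + 1
    return None

# A hypothetical symptom course: the remission criterion is first met at week 6.
course = [False, False, True, False, False, True, True, True, True, False]
print(weeks_to_remission(course))  # 6
```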

Because this was a Phase 4 trial, the investigators assumed that the efficacy of both ADM and CT had already been established, so that the focus could be on whether these two efficacious treatments could be usefully combined. All patients received antidepressant treatment, and half were randomized to receive cognitive therapy as well. There was no pill placebo or other comparison group. The decision not to have a condition controlling for attention and support makes sense, but it introduces ambiguity into the interpretability of the ultimate results, as we will see.

The trial registration

The registration is entitled “Preventing the Recurrence of Depression With Drugs and Psychotherapy” and occurred after the first patients had begun accruing, not before. The title of the registration is discrepant with the actual published study, which does not mention prevention and downplays recurrence as an outcome.

The patient population had to have recurrent or chronic major depressive disorder. The exclusion criteria were a current diagnosis of a psychotic disorder, a history of nonaffective psychotic disorder, substance abuse during the last three months requiring detoxification, and a schizotypal, antisocial, or borderline personality disorder.

There were three primary outcomes declared:

  1.  Time to remission
  2. Time to recovery
  3. Time to recurrence

Psychopharmacotherapy

All patients received acute treatment until they met criteria for remission. Continuation treatment was provided until the point of recovery…Dosages were raised as rapidly as possible and kept at maximum tolerated levels for at least 4 weeks. Treatment in patients who exhibited only a partial response was augmented with additional medications, and treatment in those who showed minimal response (or little additional response following augmentation) was switched to another ADM. Most patients were given multiple trials with easier-to-manage selective serotonin reuptake inhibitors or serotonin-norepinephrine reuptake inhibitors before treatment was switched to more difficult-to-manage tricyclic antidepressants or monoamine oxidase inhibitors.

So, unlike many ADM trials, this one involved providers being able to switch between antidepressants or add additional medications, not just adjust dosage. This is impressive, state-of-the-art, algorithm-based treatment, involving regularly assessing patient outcomes and making decisions about intensifying or changing treatments, based on set rules. You can find more about algorithm-based treatment here.

Cognitive Therapy

The therapists met weekly for 90 minutes at each site to review cases, with onsite supervision provided by 3 of the authors (R.J.D., P.R.Y., and S.D.H.). The therapists followed the procedures outlined in the original treatment manual for CT of depression, augmented when indicated for patients with comorbid Axis II disorders. The protocol called for 50-minute sessions to be held twice weekly for at least the first 2 weeks, at least weekly thereafter during acute treatment, and then at least monthly during continuation. Therapists were free to vary the session frequency to meet the needs of the patient.

This too is state-of-the-art treatment. The three supervisors, including the first author and principal investigator Steve Hollon, have been very involved in the promotion of cognitive therapy for depression and could be expected to provide expert implementation and supervision.

Results

Patients treated with antidepressants alone had a recovery rate of 62.5%, which was raised to 73.5% among those who received CT as well.

  • There were no differences in remission rate between patients assigned to ADM alone (60.3% by month 12) and those assigned to combined treatment (63.6%).
  • Fewer patients assigned to combined treatment dropped out, and this group also had fewer adverse events, which the authors attribute to their spending less time in an episode of depression.
  • Recall that the trial registration indicated the study was supposedly aimed at preventing relapse. You have to search to find that there were no differences between the two groups in relapse: 80 relapses among 54 patients retained in the ADM-alone group versus 71 relapses among 48 patients in the combined group. Note the modest size of the samples of the two groups for which risk of relapse could even be calculated.

These are not impressive results for CT. The authors performed post hoc subgroup analyses in which they found no effect on rate of recovery for the two-thirds of patients with less severe or less chronic depression, but a sizable effect for the remaining patients. Basically, the number needed to treat (NNT) was 3 in this subgroup of patients with severe, nonchronic depression [Update 10/19/2014: corrected from the earlier "chronic"]; see the arithmetic sketch after the bullets below. That is impressive, but it needs to be replicated, because the analyses were post hoc and underpowered. Such effects tend to be weaker or to disappear altogether when replication is attempted in a larger sample.

  • There were still no differences for remission in these subgroup analyses.
  • Recall that patients with schizotypal, antisocial, or borderline personality disorders were excluded. But those with other personality disorders took longer to recover than did patients without a personality disorder.
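The number needed to treat mentioned above is just the reciprocal of the absolute difference in recovery rates. A short illustration using the overall rates reported in the article (the subgroup rates themselves are not reproduced here):

```python
# Overall recovery: 62.5% with ADM alone vs. 73.5% with ADM plus CT.
arr_overall = 0.735 - 0.625          # absolute difference, about 0.11
nnt_overall = 1 / arr_overall        # about 9 patients treated per additional recovery
print(round(nnt_overall))

# An NNT of 3 in the severe, nonchronic subgroup implies an absolute
# difference of roughly 1/3, i.e., about 33 percentage points, in that subgroup.
```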

My interpretation of the results

Four things to keep in mind as we begin discussion of some unsettling results:

  1. The study design does not address whether antidepressants add anything to what is achieved with cognitive therapy. To do so would require a study in which all patients receive cognitive therapy and only some were randomized to antidepressants.
  2. This study did not have an inert control group such as wait list or no treatment and so any naturalistically occurring recovery in the absence of treatment gets attributed to the treatments. To some unknown degree, apparent effects of treatment are actually naturally occurring recovery that would have occurred in the absence of treatment.
  3. The study also does not have a psychotherapeutic control group like supportive therapy. We cannot know whether any benefits achieved by adding cognitive therapy could have been obtained with a less intensive treatment, like supportive counseling or therapy, or even simple encouragement and support for adherence.
  4. The study involved an extraordinary amount of patient-provider contact time. It is unfortunate that exactly how much is not documented, but this is relevant to evaluating the cost-effectiveness of prolonged treatment of depression in the absence of improvement.

The glass-is-half-empty interpretation of the study is that even when given state-of-the-art treatment that is more intensive and long-term than is typical, a quarter of depressed patients do not achieve remission or recovery. The quality and intensity of both pharmacological management and psychotherapy far exceeds what is routinely available in the community and probably what is even reimbursable by insurance.

Routine treatment for depression in the community is quite poor. The mean number of visits in a year for persons with a diagnosis of depression is only eight. Most depression treatment is with antidepressant medication, and most medication is given in non-mental health medical care settings, like primary care. Most primary care patients discontinue ADM treatment shortly after starting. Only 20-30% of depressed persons being treated exclusively in primary care settings receive adequate care and follow-up. Said differently, 40% of depressed patients are administered treatment with little or no benefit over what would be obtained by remaining on a wait list, representing about 20% of the total cost of treating depression.

The treatment offered in this study could be seen as a Rolls-Royce. If so, routine care remains a bicycle with a flat tire.

Some of the problems of routine care lie in poor reimbursement and provider indifference to practice guidelines. The guidelines for meds minimally require a follow-up visit in 5 to 7 weeks to determine whether progress is being made, adherence and patient education are adequate, and whether any adjustment or change in medication is needed. That does not typically happen.

But another part of the problem lies in patients’ perception that such an investment in time and effort is not worth the benefits they received. That may also reflect the inadequacies of the care they get, but it is a cost/benefit analysis that could lead patients to refuse more intensive treatment.

The bottom line is that, even doing the best we can, treatment for depression will leave many patients dissatisfied and still depressed. We need to be careful not to mislead depressed patients about what they can expect.

More details are needed about how much treatment was provided to whom in this study. It is such an ambitious and costly study, and unlikely to be done again anytime soon, but it leaves a lot of questions unanswered. We could at least begin to formulate some hypotheses if we knew more about what went on.

For instance, what did the cognitive therapists do, and what specific interventions did they provide, in seeing patients so regularly for so long in the absence of apparent benefit? Were the therapists even aware of the lack of benefit? The therapists surely had to improvise in going well beyond what is indicated in the manual, which is mostly adapted to shorter-term therapy. Did they simply resort to being supportive?

We cannot rule out that any benefits of cognitive therapy in the study are simply due to nonspecific support, reinforcement of positive expectations, and encouragement to adhere to medication. The cognitive therapy had impressively credentialed and carefully supervised therapists. But was this required for the effects that were obtained?

Finally, providers managing medication in the present study relied on algorithms to make decisions about whether and when to make changes in medication, including switching, augmenting, or simply changing dosage. Many of the specific algorithms do not have strong empirical validation, but the notion that at some point clinicians have a responsibility to change what they are doing does have empirical support. For instance, there are practice guidelines recommending that if positive clinical change is not evident after around five weeks, the current treatment should be re-examined.

Manualized psychotherapy has guidelines for what to do within the therapeutic model when change is not occurring. But there are typically no guidelines as to when medication should be suggested, when a different therapeutic approach or referral to another therapist should be offered, or when therapy should be terminated as futile.

Certainly we can conceive of situations where such judgments are warranted, but there is almost no discussion of them in the psychotherapy literature. In the case of cognitive therapy for depression, observational data derived from clinical trials could be used to suggest when change should be occurring and, if it is not, the likelihood that it will occur later. Out of respect for patient autonomy and informed consent, I think it is incumbent on psychotherapists to evaluate the evidence they have and come up with provisional recommendations for switching or stopping treatment that can be empirically tested.

Category: Uncategorized | 16 Comments

How to critique claims of a “blood test for depression”

Special thanks to Ghassan El-baalbaki and John Stewart for their timely assistance. Much appreciated.

“I hope it is going to result in licensing, investing, or any other way that moves it forward…If it only exists as a paper in my drawer, what good does it do?” – Eva Redei, PhD, first author.

Media coverage of an article in Translational Psychiatry uniformly passed on the authors’ extravagant claims in a press release from Northwestern University that declared that a simple blood test for depression had been found. That is, until I posted a critique of these claims at my secondary blog. As seen on Twitter, the tide of opinion suddenly shifted and considerable skepticism was expressed.

I am now going to be presenting a thorough critique of the article itself. More importantly, I will be pointing to how, with some existing knowledge and basic tools, many of you can learn to critically examine the credibility of such claims, which will inevitably arise again in the future. Biomarkers for depression are a hot topic, and John Ioannidis has suggested that this means exaggerated claims about flawed studies are more likely to be the result than real progress.

The article can be downloaded here and the Northwestern University press release here. When I last blogged about this article, I had not seen the 1:58 minute video that is embedded in the press release. I encourage you to view it before my critique and then view it again if you believe that it has any remaining credibility. I do not know where the dividing line is between unsubstantiated claims about scientific research and sheer quackery, but this video tests the boundaries, when evaluated in light of the evidence actually presented in the article.

I am sure that many journalists, medical and mental health professionals, and laypersons were intimidated by the mention of “blood transcriptomic biomarkers” in the title of this peer-reviewed article. Surely, the published article had survived evaluation by an editor and reviewers with better, more relevant expertise than our own. What is there for an unarmed person to argue about?

Start with the numbers and basic statistics

Skepticism about the study is encouraged by a look at the small numbers of patients involved in the study, which was limited to

  • 64 total participants: 32 depressed patients from a clinical trial and 32 controls.
  • 5 patients were lost from baseline to follow-up.
  • 5 more were lost by the 18-week blood draw, leaving
  • 22 remaining patients –
  • 9 classified as in remission, 13 not in remission.

The authors were interested in differences in 20 blood transcriptomic biomarkers in 2 comparisons: the 32 depressed patients versus 32 controls, and the 9 patients who remitted at the end of the trial versus the 13 who did not. The authors committed themselves to looking for a clinically significant difference or effect size, which, they tell readers, is defined as .45. We can use a program readily available on the web for a power analysis, which indicates the likelihood of obtaining a statistically significant result (p < .05) for any one of these biomarkers, if differences existed between depressed patients and controls or between the patients who improved in the study versus those who did not. Before even putting these numbers into the calculator, we would expect the likelihood to be low because of the size of the sample.

We find that there is only a power of 0.426 for finding one of these individual biomarkers significant, even if it really distinguishes between depressed patients and controls, and a power of 0.167 for finding a significant difference in the comparison of the patients who improved versus those who did not.
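For readers who want to check numbers like these themselves, here is a minimal sketch of the same calculation, assuming the comparisons are simple two-group t-tests with a two-sided alpha of .05; the statsmodels library is just one convenient stand-in for the web calculator.

```python
# A rough reproduction of the power figures above, assuming simple
# two-sample t-tests, a two-sided alpha of .05, and the authors'
# "clinically significant" effect size of d = .45.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# 32 depressed patients vs 32 controls
power_patients_vs_controls = analysis.power(effect_size=0.45, nobs1=32, ratio=1.0, alpha=0.05)

# 9 remitted vs 13 non-remitted patients
power_remitted_vs_not = analysis.power(effect_size=0.45, nobs1=9, ratio=13/9, alpha=0.05)

print(round(power_patients_vs_controls, 3))  # approximately 0.43
print(round(power_remitted_vs_not, 3))       # approximately 0.17
```

Any off-the-shelf power calculator should give essentially the same answers.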

Bottom line is that this is much too small a sample to address the questions in which the authors are interested – less than 50-50 for identifying a biomarker that actually distinguished between depressed patients and controls and less than 1 in 6 in finding a biomarker actually distinguishing those patients who improved versus those who did not. So, even if the authors really have stumbled upon a valid biomarker, they are unlikely to detect it in these samples.

But there are more problems. For instance, it takes a large difference between groups to achieve statistical significance with such small numbers, so any significant result will be quite large. Yet, with such small numbers, statistical significance is unstable: dropping or adding a few or even a single patient or control or reclassifying a patient as improved or not improved will change the results. And notice that there was some loss of patients to follow-up and to determining whether they improved or not. Selective loss to follow-up is a possible explanation of any differences between the patients considered improved and those who are not considered improved. Indeed, near the end of the discussion, the authors note that patients who were retained for a second blood draw differed in gene transcription from those who did not. This should have tempered claims of finding differences in improved versus unimproved patients, but it did not.
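To make the instability concrete, here is a rough simulation, not a reanalysis of the authors’ data: assume a true effect of d = .45 and groups of 9 and 13, and look at what happens to the effects that happen to reach significance.

```python
# A rough illustration, not a reanalysis of the authors' data: with a true
# effect of d = 0.45 and groups of 9 vs 13, significant results are rare,
# and the effects that do reach significance are grossly inflated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n1, n2, d_true, n_sims = 9, 13, 0.45, 10_000
significant_effects = []

for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n1)
    b = rng.normal(d_true, 1.0, n2)
    t, p = stats.ttest_ind(a, b)
    if p < 0.05:
        # Record the observed effect size only when it reaches significance,
        # i.e. only when it would get reported as a "finding".
        pooled_sd = np.sqrt(((n1 - 1) * a.var(ddof=1) +
                             (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2))
        significant_effects.append(abs(b.mean() - a.mean()) / pooled_sd)

print(f"proportion significant: {len(significant_effects) / n_sims:.2f}")   # roughly 0.17
print(f"average |d| when significant: {np.mean(significant_effects):.2f}")  # far above 0.45
```

This is the winner’s curse: a sample this small only “detects” an effect when sampling error has made it look much larger than it really is, and that inflated estimate is precisely what fails to replicate.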

So what I am getting at is that this small sample is likely to produce strong results that will not be replicated in other samples. But it gets still worse –

Samples of 32 depressed patients and 32 controls chosen because they match on age, gender, and race – as they were selected in the current study – can still differ on lots of variables. The depressed patients are probably more likely to be smokers and to be neurotic. So the authors may only be isolating blood transcriptomic biomarkers associated with innumerable such variables, not depression.

There can be single, unmeasured variables that are the source of any differences or some combination of multiple variables that do not make much difference by themselves, but do so when they are together present in a sample. So,  in such a small sample a few differences affecting a few people can matter greatly. And it does no good to simply do a statistical test between the two groups, because any such test is likely to be underpowered and miss influential differences that are not by themselves so extremely strong that they meet conditions for statistical significance in a small sample.
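As a toy example of how confounding can do the work, suppose (the numbers here are invented purely for illustration) that smoking shifts some transcript by about 1.5 standard deviations and that smokers make up half the depressed group but only a tenth of the controls. A marker that knows nothing about depression will then separate the groups a substantial share of the time.

```python
# Toy confounding example with invented numbers: the "biomarker" responds
# only to smoking, yet it often separates depressed patients from controls
# because the groups differ in smoking rates.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, n_sims, significant = 32, 10_000, 0

for _ in range(n_sims):
    smokes_dep = rng.random(n) < 0.50    # hypothetical smoking rate, patients
    smokes_ctrl = rng.random(n) < 0.10   # hypothetical smoking rate, controls
    marker_dep = rng.normal(0.0, 1.0, n) + 1.5 * smokes_dep
    marker_ctrl = rng.normal(0.0, 1.0, n) + 1.5 * smokes_ctrl
    if stats.ttest_ind(marker_dep, marker_ctrl).pvalue < 0.05:
        significant += 1

print(f"spuriously 'significant' in {significant / n_sims:.0%} of samples")  # on the order of half
```

None of this says that smoking actually drove the reported differences; the point is only that, with 32 per group, modest imbalances on variables like this can masquerade as a depression signal.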

The authors might be tempted to apply some statistical controls – they actually did in a comparison of the nine versus 13 patients – but that would only compound the problem. Use of statistical controls requires much larger samples, and would likely produce spurious – erroneous – results in such a small sample. Bottom line is that the authors cannot rule out lots of alternative explanations for any differences that they find.

The authors nonetheless claim that 9 of the 20 biomarkers they examined distinguish depressed patients and 3 of these distinguish patients who improve. This is statistically improbable and unlikely to be replicated in subsequent studies.

And then there is the sampling issue. We are going to come back to that later in the blog, but just consider how random or systematic differences can arise between this sample of 32 patients versus 32 controls and what might be obtained with another sampling of the same or a different population. The problem is even more serious when we get down to the 9 versus 13 comparison of patients who completed the trial. A different intervention or a different sample or better follow-up could produce very different results.

So, just looking at the number of available patients and controls, we should not expect much good science to come out of this study, which is chasing significance levels to define its results. I think that many persons familiar with these issues would simply dismiss this paper out of hand after looking at these small numbers.

The authors were aware of the problems in examining 20 biomarkers in such small comparisons. They announced that they would commit themselves to adjusting significance levels for multiple comparisons. With such low ratios of participants in the comparison groups to variables examined, this remains a dubious procedure. However, when this correction eliminated any differences between the improved and unimproved patients, they simply ignored having done the procedure and went on to discuss the results as significant. If you return to the press release and the video, you will see no indication that the authors had applied a procedure that eliminated their ability to claim the results as significant. By their own standards, they are crowing about being able to distinguish ahead of time which patients will improve when they did not actually find any biomarkers that did so.
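For readers unfamiliar with what such a correction does, here is a minimal sketch with invented p-values for 20 markers (they are illustrative only, not taken from the paper): a Bonferroni correction asks each test to clear .05/20 = .0025, and results that looked “significant” at .05 routinely fail that bar.

```python
# Illustrative only: invented p-values for 20 biomarker comparisons,
# not values taken from the paper.
from statsmodels.stats.multitest import multipletests

p_values = [0.004, 0.012, 0.03, 0.045, 0.06, 0.08, 0.11, 0.15, 0.20, 0.25,
            0.30, 0.35, 0.40, 0.45, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95]

# Uncorrected: anything under .05 gets called "significant"
print(sum(p < 0.05 for p in p_values))   # 4 markers

# Bonferroni: each test must clear .05 / 20 = .0025
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")
print(int(reject.sum()))                 # 0 markers survive
```

Having announced this standard, the authors were obliged to live with it; instead, results that did not survive the correction were presented as if they had.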

What does the existing literature tell us we should expect?

Our skepticism aroused, we might next want to go to Google Scholar and search for topics such as genetics depression, biomarkers depression, blood test depression, etc. [Hint: when you put a set of terms into the search box and click, then pull down the menu on the far right to get an advanced search.]

I could say this takes 25 minutes because that is how much time I spent, but that would be misleading. I recall a jazz composer who claimed to have written a song in 25 minutes. When the interviewer expressed skepticism, the composer said, “Yeah, 25 minutes and 25 years of experience.” I had the advantage of knowing what I was looking for.

The low heritability of liability for MDD implies an important role for environmental risk factors. Although genotype X environment interaction cannot explain the so-called ‘missing heritability’,52 it can contribute to small effect sizes. Although genotype X environment studies are conceptually attractive, the lessons learned from the most studied genotype X environment hypothesis for MDD (5HTTLPR and stressful life event) are sobering.

And

Whichever way we look at it, and whether risk variants are common or rare, it seems that the challenge for MDD will be much harder than for the less prevalent more heritable psychiatric disorders. Larger samples are required whether we attempt to identify associated variants with small effect across average backgrounds or attempt to enhance detectable effects sizes by selection of homogeneity of genetic or environmental background. In the long-term, a greater understanding of the etiology of MDD will require large prospective, longitudinal, uniformly and broadly phenotyped and genotyped cohorts that allow the joint dissection of the genetic and environmental factors underlying MDD.

[Update suggested on Twitter by Nese Direk, MD] A subsequent even bigger search for the elusive depression gene reported

We analyzed more than 1.2 million autosomal and X chromosome single-nucleotide polymorphisms (SNPs) in 18 759 independent and unrelated subjects of recent European ancestry (9240 MDD cases and 9519 controls). In the MDD replication phase, we evaluated 554 SNPs in independent samples (6783 MDD cases and 50 695 controls)…Although this is the largest genome-wide analysis of MDD yet conducted, its high prevalence means that the sample is still underpowered to detect genetic effects typical for complex traits. Therefore, we were unable to identify robust and replicable findings. We discuss what this means for genetic research for MDD.

So, there is not much encouragement for the present tiny study.

baseline gene expression may contain too much individual variation to identify biomarkers with a given disease, as was suggested by the studies’ authors.

Furthermore it noted that other recent studies had identified markers that either performed poorly in replication studies or were simply not replicated.

Again, not much encouragement for the tiny present study.

[According to Wiktionary, omics refers to related measurements or data from such interrelated fields as genomics, proteomics, transcriptomics, and others.]

The report came about because of numerous concerns expressed by statisticians and bioinformatics scientists concerning the marketing of gene expression-based tests by Duke University. The complaints concerned the lack of an orderly process for validating such tests and the likelihood that these tests would not perform as advertised. In response, the IOM convened an expert panel, which noted that many of the studies that became the basis for promoting commercial tests were small, methodologically flawed, and relied on statistics that were inappropriate for the size of the samples and the particular research questions.

The committee came up with some strong recommendations for discovering, validating, and evaluating such tests in clinical practice. By these evidence-based standards, the efforts of the authors of the Translational Psychiatry article are woefully inadequate, and it is irresponsible to jump from such a preliminary, modest-sized study, without replication in an independent sample, to the claims they are making to the media and possible financial backers.

Given that the editor and reviewers of Translational Psychiatry nonetheless accepted this paper for publication, they should be required to read the IOM report. And all of the journalists who passed on ridiculous claims about this article should also read the IOM book.

If we google the same search terms, we come up with lots of press coverage of work previously claimed to be breakthroughs. Almost none of these claims pan out in replication, despite the initial fanfare. Failures to replicate are much less newsworthy than false discoveries, but once in a while a statement of resignation makes it into the media. For instance,

Depression gene search disappoints

Looking for love biomarkers in all the wrong places

The existing literature suggests that the investigators face a difficult task: looking for what is probably a weak signal, amid a lot of false positives and a lot of noise. Their task would be simpler if they had a well-defined, relatively homogeneous sample of depressed patients, so that these patients would be relatively consistent in whatever signal they each gave.

By those criteria, the investigators chose what was probably the worst possible sample. They obtained their small sample of 32 depressed patients from a clinical trial comparing face-to-face with Internet-delivered cognitive behavioral therapy, with participants recruited from primary medical care.

Patients identified as depressed in primary care are a very mixed group. Keep in mind that the diagnostic criteria require that five of nine symptoms be present for at least two weeks. Many depressed patients in primary care have only five or six symptoms, which are mild and ambiguous. For instance, most women experience sleep disturbance in the weeks after giving birth to an infant, but probing readily reveals that their sleep is being disturbed by the infant. Similarly, one cardinal symptom of depression is the loss of the ability to experience pleasure, but that is a confusing item for primary care patients, who do not understand that the loss is supposed to be in the capacity to experience pleasure, rather than in being able to do the things that previously gave them pleasure.

And two weeks is not a long time. It is conceivable that symptoms can be maintained that long in a hostile, unsupportive environment but immediately dissipate when the patient is removed from that environment.

Primary care physicians, if they even adhere to diagnostic criteria, are stuck with the challenge of making a diagnosis based on patients having the minimal number of symptoms, with the required  symptoms often being very mild and ambiguous in themselves.

So, depression in primary care is inherently noisy and unlikely to give a clear signal for a single biomarker or even a few. It is likely that if a biomarker ever became available, many patients considered depressed now would not have it. And what would we make of patients who had the biomarker but did not report symptoms of depression? Would we overrule them and insist that they were really depressed? Or what about patients who exhibited classic symptoms of depression but did not have the biomarker? Would we tell them they are merely miserable and not depressed?

The bottom line is that depression in primary care can be difficult to diagnose, and doing so requires a careful interview or maybe the passage of time. In Europe, many guidelines discourage aggressive treatment of mild to moderate depression, particularly with medication. Rather, the suggestion is to wait a few weeks with vigilant monitoring of symptoms, while encouraging the patient to try less intensive interventions, like increased social involvement or behavioral activation. Only if those interventions fail to make a difference and symptoms fail to resolve with the passage of time should a diagnosis and initiation of treatment be considered.

Most researchers agree that rather than looking to primary care, we should look to more severe depression in tertiary care settings, like inpatient or outpatient psychiatry. Then maybe go back and see the extent to which these biomarkers are found in a primary care population.

And then there is the problem of how the investigators defined depression. They did not make a diagnosis with a gold-standard, semi-structured interview, like the Structured Clinical Interview for DSM Disorders (SCID) administered by trained clinicians. Instead, they relied on a rigid, simple interview, the Mini International Neuropsychiatric Interview (MINI), more like a questionnaire, administered by bachelor-level research assistants. This would hardly pass muster with the Food and Drug Administration (FDA). The investigators had scores available on the interviewer-administered Hamilton Depression Scale (HAM-D) to measure improvement, but instead relied on the self-report Patient Health Questionnaire (PHQ-9). The reason why they chose this instrument is not clear, but it would again not pass muster with the FDA.

Oh, and finally, the investigators talk about a possible biomarker predicting improvement in psychotherapy. But most of the patients in this study were also receiving antidepressant medication. This means we do not know whether the improvement was due to the psychotherapy or the medication, whereas the general hope for a biomarker is that it can distinguish which patients will respond to one treatment versus the other. The bottom line is that this sample is hopelessly confounded when it comes to predicting response to psychotherapy.

Why get upset about this study?

I could go on about other difficulties in the study, but I think you get the picture: this is not a credible study, nor one that can serve as the basis for a search for a blood-based biomarker for depression. It is simply absurd to present it as such. But why get upset?

  1. Publication of such low-quality research, and high-profile attempts to pass it off as strong evidence, damage the credibility of all evidence-based efforts to establish the efficacy of diagnostic tools and treatments. This study adds to the sense that much of what we read in the scientific journals, and what is echoed in the media, is simply exaggerated or outright false.
  2. Efforts to promote this article are particularly pernicious in suggesting that primary care physicians can make diagnoses of depression without carefully interviewing patients. The physicians do not need to talk to the patients; they can simply draw blood or give out questionnaires.
  3. Implicit in the promotion of these results as evidence for a blood test for depression is the assumption that depression is a biological phenomenon, strongly influenced by gene expression, not the environment. Aside from being patently wrong and inconsistent with available evidence, this assumption leads to an overreliance on biomedical treatments.
  4. Wide dissemination of the article’s and press release’s claims serves to reinforce laypersons’ and clinicians’ belief in the validity of commercially available blood tests of dubious value. These tests can cost as much as $475 per administration, and there is no credible evidence, by IOM standards, that they perform better than simply talking to patients.

At the present time, there is no strong evidence that antidepressants are on average superior in their effects on typical primary care patients relative to, say, interpersonal psychotherapy (IPT). IPT assumes that regardless of how depression comes about, patient improvement can be achieved by understanding and renegotiating significant interpersonal relationships. All of the trash talk from these authors contradicts this evidence-based assumption. Namely, they are suggesting that we may soon be approaching an era in which even the mild and moderate depression of primary care can be diagnosed and treated without talking to the patient. I say bollocks, and shame on the authors, who should know better.

Category: blood test, Conflict of interest, depression, genomics, omics, primary care | Tagged , , | 33 Comments

The Top Eleven Ways to Tell that a Journal is Fake

I am delighted to offer Mind the Brain readers a guest blog written by my colleague, Eve Carlson, Ph.D.  Eve Carlson is a clinical psychologist and researcher with the National Center for PTSD and the U.S. Department of Veterans Affairs, VA Palo Alto Health Care System.  Her research focuses on assessment of trauma exposure and responses, and she has developed measures of PTSD, dissociation, trauma exposure, self-destructive behavior, affective lability, and risk for posttraumatic psychological disorder.  Her research has been funded by the National Institute of Mental Health (U.S.) and the Dept. of Veterans Affairs (U.S.) and recognized by awards from the International Society for Traumatic Stress Studies and the International Society for the Study of Trauma and Dissociation.  Her publications include books on trauma assessment and trauma research methodology and numerous theoretical and research articles.  She has served as President and a member of the Board of Directors for the International Society for Traumatic Stress Studies and on the editorial boards of several journals.

 

The Top  Eleven Ways to Tell that a Journal is Fake

Eve Carlson, Ph.D.

Past President, International Society for Traumatic Stress Studies

If you have ever published a scholarly paper, your email inbox is probably peppered with invitations to submit papers to new journals with plausible-sounding names.  Many people dismiss these emails as spam, but with all one hears about the impending death of paper journals, who knows what is next in the wild, wild West of open source publishing?  And if you venture to click on a link to the journal, you may well see a web page boasting about a journal editor who is a prominent name in your field, an editorial board that includes several luminaries, instructions for authors, and legitimate-looking articles.  With the “publish or perish!” pressure still going strong, what’s an academic to do?

I recently stumbled into an “investigation” of a new, online, open source journal in the course of my service as a leader of a professional society.  When I was president of an international professional society, a new journal began soliciting submissions under a name very similar to that of our Society’s journal, “Journal of XXX”.  The Society feared that the new journal, called “Journal of XXX Disorders and Treatment”, would be mistaken for an offshoot of the original.  I saw the names of colleagues I knew on the editorial board and skimmed some of the opinion pieces posted online and assumed it was a new experiment in open source publishing. But when I contacted the colleagues and began asking questions, it quickly became apparent that this journal had no editor, that editorial board members were recruited via spam emails to authors of published articles, that the journal appeared to follow no standard publishing practices, and that most editorial board members had observed irregularities that made them suspect the journal was not legitimate.  Once informed of the problems and put in communication with one another, 16 of the 19 editorial board members resigned en masse.

 

Based on actual experiences looking into three questionable open source journals, you can tell a journal is fake when…

 

1)  Searching in the box marked “Search this journal” on the journal web page for the name of an author of an article in a recent issue of the journal does not return any hits.

 

2)  Clicking on a link like this medline on the journal web site leads to the spoof site www.Medline.com.

 

3)  No specific person is identified as the editor of the journal or the person who appears to be identified as the journal’s Editor on the web site says he is not the editor.

 

4)  Google Maps searches for the address of the journal show that its headquarters is a suburban bungalow.

[Google Maps screenshot]

 

5)  You cannot find articles from a bio-medical journal when you search PubMed.  [You can check by searching for the journal title here]

 

6)  The journal’s mission on its home page is described in vague, generic terms such as “To publish the most exciting research with respect to the subjects of XXXXXX.”

 

7)  When you call the local phone number for the journal office listed on the web page, any of these happen:  1. No one answers. 2. Someone answers “hello?” on what sounds like a cell phone and hangs up as soon as they hear you speaking.  3. The call is forwarded to the 800 phone bank for the publisher, and the person on the other end cannot tell you the name of the editor of the journal.

 

8)  PubMed Central refuses to accept content from a publisher’s bio-medical journals and DHHS sends a “cease and desist” letter to the publisher.

 

9)  The journal’s publisher posts online a legal notice warning a blogger who writes about the publisher that he is on a “perilous journey” and is exposing himself to “serious legal implications including criminal cases lunched (sic) again you in INDIA and USA” and directing him to pay the publisher $1 billion in damages.  Check out the legal notice here.

 

10)  The journal issues and posts online certificates, with hearts around the border, that certify you as “the prestigious editorial board member of [name of journal here].”

[Certificate image]

 

11)  The journal posts “interviews” with members of its editorial board that appear to be electronic questionnaires with comical responses to interviewer questions such as:

[Screenshots of editorial board “interviews”]

CORRECTION: The site www.medline.com is real, not a spoof site.

 

 

Category: Commentary, mental health care, Psychiatry, research | Tagged , , | 6 Comments