NIMH Biomarker Porn: Depression, Daughters, and Telomeres Part 1

Does having to cope with their mother’s depression REALLY inflict irreversible damage on daughters’ psychobiology and shorten their lives?


A recent BMJ article revived discussion of responsibility for hyped and distorted coverage of scientific work in the media. The usual suspects, self-promoting researchers, are passed over, and their university press releases are implicated instead.

But university press releases are not distributed without authors’ approval.  Exaggerated statements in press releases are often direct quotes from authors. And don’t forget the churnalizing journalists and bloggers who uncritically pass on press releases without getting second opinions.  Gary Schwitzer remarked:

Don’t let news-release-copying journalists off the hook so easily. It’s journalism, not stenography.

In this two-part blog post, I’ll document this process of amplification of the distortion of science from article to press release to subsequent coverage. In the first installment, I’ll provide a walkthrough commentary and critique of a flawed small study of telomere length among daughters of depressed women published in the prestigious Nature Publishing Group journal, Molecular Psychiatry. In the second, I will compare the article and press release to media coverage, specifically the personal blog of NIMH Director Thomas Insel.

I warn the squeamish that I will whack some bad science and outrageous assumptions with demands for evidence and pelt the study, its press release, and Insel’s interpretation with contradictory evidence.

I’m devoting a two-part blog post to this effort. Bad science with misogynist, mother-bashing assumptions is being touted by the Director of NIMH as an example to be followed. When he speaks, others pay attention because he sets funding priorities. Okay, Dr. Insel, we will listen up, but we will do so skeptically.

A paper that shares an author with the Molecular Psychiatry paper was criticized by Daniel Engber for delivering

A mishmash of suspect stats and overbroad conclusions, marshaled to advance a theory that’s both unsupported by the data and somewhat at odds with existing research in the field.

The criticism applies to this paper as well.

But first, we need to understand some things about telomere length…

What is a Telomere?

Telomeres are caps on the ends of every chromosome. They protect the chromosome from losing important genes or sticking to other chromosomes. They become shorter every time the cell divides.

I have assembled some resources in a post at Science-Based Medicine:

Skeptic’s Guide to Debunking Claims about Telomeres in the Scientific and Pseudoscientific Literature

As I say in that blog, there are many exaggerated and outright pseudoscientific claims about telomere length as a measure of “cellular aging” and therefore how long we’re going to live.

I explain the concepts of biomarker and surrogate endpoint, which are needed to understand the current fuss about telomeres. I show why the evidence is against routinely accepting telomere length as a biomarker or surrogate endpoint for accelerated aging and other health outcomes.

I note

  • A recent article in the American Journal of Public Health claimed that drinking 20 ounces of carbonated (but not noncarbonated) sugar-sweetened drinks was associated with shortened telomere length “equivalent to an approximately 4.6 additional years of aging.” So, the effect of drinking soda on life expectancy is supposedly equivalent to what we know about smoking’s effect.
  • Rubbish. Just ignore the telomere length data and directly compare the effects of drinking 20 ounces of soda to the effects of smoking on life expectancy. There is no equivalence. The authors confused differences in what they thought was a biomarker with differences in health outcomes and relied on some dubious statistics. The American Journal of Public Health soda study was appropriately skewered in a wonderful Slate article, which I strongly recommend.
  • Claims are made for telomere length as a marker for the effects of chronic stress and risk of chronic disease. Telomere length has a large genetic component and is correlated with age. When appropriate controls are introduced, correlations among telomere length, stress, and health outcomes tend to disappear or are sharply reduced.
  • A 30-year birth cohort study did not find an association between exposure to stress and telomere length.
  • Articles from a small group of investigators claim findings about telomere lengths that do not typically get reproduced in larger, more transparently reported studies by independent groups. This group of investigators tends to have or have had conflicts of interest in marketing of telomere diagnostic services, as well as promotion of herbal products to slow or reverse the shortening of telomere length.
  • Generally speaking, reproducible findings concerning telomere length require large samples with well-defined phenotypes, i.e., individuals having well-defined clinical presentations of particular characteristics, and we can expect associations to be small.
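To make that last point concrete, here is a minimal back-of-envelope sketch in Python (my own calculation, not anything reported in the studies discussed) of roughly how many participants are needed to reliably detect correlations of the modest size telomere studies typically report:

```python
# A minimal sketch, assuming a two-sided test at alpha = .05 with 80% power.
# The effect sizes are illustrative assumptions, not estimates from any study.
import numpy as np
from scipy import stats

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a correlation r (Fisher z approximation)."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    return int(np.ceil(((z_alpha + z_beta) / np.arctanh(r)) ** 2 + 3))

for r in (0.10, 0.20, 0.50):
    print(f"r = {r:.2f} -> n ≈ {n_for_correlation(r)}")
# Small associations (r around .10) require many hundreds of participants;
# a sample of about 90 girls can only reliably detect much larger effects.
```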

Based on what I have learned about the literature concerning telomere length, I would suggest

  • Beware of small studies claiming strong associations between telomere length and characteristics other than age, race, and gender.
  • Beware of studies claiming differences in telomere length arising in cross-sectional research or in the short term if they are not reproduced in longitudinal, prospective studies.

A walk-through commentary and critique of the actual article

Gotlib, I. H., LeMoult, J., Colich, N. L., Foland-Ross, L. C., Hallmayer, J., Joormann, J., … & Wolkowitz, O. M. (2014). Telomere length and cortisol reactivity in children of depressed mothers. Molecular Psychiatry.

Molecular Psychiatry is a pay-walled journal, but a downloadable version of the article is available here.

Conflict of Interest Statement

The authors report no conflict of interest. However, in the soda article published December 2014, one of the authors of the present paper, Jun Lin, disclosed being a shareholder in Telomere Diagnostics, Inc., a telomere measurement company. Links at my previous blog post take you to “Telomeres and Your Health: Get the Facts” at the website of that company. There you find claims that herbal products based on traditional Chinese medicine can reduce the shortening of telomeres.

Jun Lin has a record of outrageous claims. For instance, another article claimed that normal women whose minds wander may be losing four years of life, based on the association between self-reported mind wandering and telomere length. So, if we pit this claim against what is known about the effects of smoking on life expectancy, women can extend their lives almost as much by better paying attention as by quitting smoking.

Hmm, I don’t know if we have undeclared conflict of interest here, but we certainly have a credibility problem.

The Abstract

Past research shows that distorted and exaggerated media portrayals of studies are often already evident in the abstracts of journal articles. Authors engage in a lot of cherry-picking and spin to strengthen the case that their work is innovative and significant.

The opening sentence of the abstract to this article is a mashup of wild claims about telomere length in depression and risk for physical illnesses. But I will leave commenting until we reach the introduction, where the identical statement appears with elaboration and a single reference to one of the author’s work.

The abstract goes on to state

Both MDD and telomere length have been associated independently with high levels of stress, implicating dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis and anomalous levels of cortisol secretion in this relation.

When I showed this to a pioneer in the study of the HPA axis, he remarked:

If you can find coherence in this from the Abstract you are smarter than I am…The phrase dysregulation of the HPA axis has been used to support more hand waving than substance.

The abstract ends with

This study is the first to demonstrate that children at familial risk of developing MDD are characterized by accelerated biological aging, operationalized as shortened telomere length, before they had experienced an onset of depression; this may predispose them to develop not only MDD but also other age-related medical illnesses. It is critical, therefore, that we attempt to identify and distinguish genetic and environmental mechanisms that contribute to telomere shortening.

This breathless editorializing about the urgency of pursuing this line of research is not tied to the actual methods and results of the study. “Accelerated biological aging” and “predispose to develop… other age-related medical illnesses” are not a summary of the findings of the study, but only dubious assumptions.

Actually, the evidence for telomere length as a biomarker of aging is equivocal and does not meet the American Federation for Aging Research criteria. A large-scale prospective study did not find that telomere length predicted onset of diabetes or cardiovascular disease.

And wait until we examine whether the study had reproducible results concerning either shorter telomeres and depression or telomeres being related to cortisol reactivity.

The introduction

The 6-paragraph introduction packs in a lot of questionable assumptions backed by a highly selective citation of the literature.

A growing body of research demonstrates that individuals diagnosed with major depressive disorder (MDD) are characterized by shortened telomere length, which has been posited to underlie the association between depression and increased rates of medical illness, including cardiovascular disease, diabetes, metabolic syndrome, osteoporosis and dementia (see Wolkowitz et al.1 for a review).

Really? A study co-authored by Wolkowitz and cited later in the introduction actually concluded

telomere shortening does not antedate depression and is not an intrinsic feature. Rather, telomere shortening may progress in proportion to lifetime depression exposure.

“Exposure” = personal experience of being depressed. This would seem to undercut the rationale for examining telomere shortening in young girls who have not yet become depressed.

But more importantly, neither the Molecular Psychiatry article nor the Wolkowitz review acknowledges the weakness of evidence for

  • Depression being characterized by shortened telomere length.
  • The association of depression and medical illness in older persons representing a causal role for depression that can be modified by prevention or treatment of depression in young people.
  • Telomere length observed in the young underlying any association between depression and medical illnesses when they get old.

Wolkowitz’s “review” is a narrative, nonsystematic review. The article assumes at the outset that depression represents “accelerated aging” and offers a highly selective consideration of the available literature.

In neither it nor the Molecular Psychiatry article are we told

  • Some large-scale studies with well-defined phenotypes fail to find associations between telomeres and depressive disorder or depressive symptoms. One large-scale study co-authored by Wolkowitz found weak associations between depression and telomere length, too small to be detected in the present small sample. Any apparent association may well be spurious.
  • The American Heart Association does not consider depression as a (causal) risk factor for cardiovascular disease, but as a risk marker because of a lack of the evidence needed to meet formal criteria for causality. Depression after a heart attack predicts another heart attack. However, our JAMA systematic review revealed a lack of evidence that screening cardiac patients for depression and offering treatment reduces their likelihood of having another heart attack or improves their survival. An updated review confirmed our conclusions.
  • The association between recent depressive symptoms and subsequent dementia is evident even at very low levels of symptoms, suggesting that it reflects residual confounding and reverse causation involving other risk factors, including poor health and functioning. I published a commentary in the British Medical Journal that criticized the claim that we should begin intervening for even low symptoms of depression in order to prevent dementia. I suggested that we would be treating a confound and that it would be unlikely to make a difference in outcomes.

I could go on. Depression causally linked to diabetes via differences in telomere length? Causing osteoporosis? You gotta be kidding. I demand quality evidence. The burden of evidence is on anyone who makes such wild claims.

Sure, there is lots of evidence that if people have been depressed in the past, they are more likely to get depressed again when they have a chronic illness. And their episodes of depression will last longer.

In general, there are associations between depression and the onset and outcome of chronic illness. But the simple, unadjusted association is typically seen at low levels of symptoms, and increases with age and the accumulation of other risk factors and physical co-morbidities. People who are older, already showing signs of illness, or who have poor health-related behaviors tend to get sicker and die. Statistical control for these factors reduces or eliminates the apparent association of depressive symptoms with illness outcomes. So, we are probably not dealing with depression per se. If you are interested in further discussion of this, see my slide presentation:

Negative emotion and health: why do we keep stalking bears, when we only find scat in the woods?

I explain risk factors (like bears) versus risk markers (like scat) and why shooting scat does not eliminate the health risk posed by bears.

I doubt that many people familiar with the literature believe that associations among telomeres and depression, depression and the onset of chronic illness, and telomeres and chronic illness are such that a case could be made for telomere length in young girls being importantly related to physical disease in their mid- and late life. This is science fiction being falsely presented as evidence-based.

The authors of the Molecular Psychiatry paper are similarly unreliable when discussing “dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis and anomalous levels of cortisol secretion.” You would think that they are referring to established biomarkers for risk of depression. Actually, most biological correlates of depression are modest, nonspecific to depression, and state, not trait-related – limited to when people are actually depressed. Consider what the relevant findings on cortisol actually look like:

MDD and ND [nondepressed] individuals exhibited similar baseline and stress cortisol levels, but MDD patients had much higher cortisol levels during the recovery period than their ND counterparts.

We did not find the expected main effects of maternal depression on children’s cortisol  reactivity.

The authors also misrepresent a directly relevant study that examined cortisol secretion in the saliva of adolescents as a predictor of subsequent development of depression. It actually found that no baseline cortisol measure predicted development of depression except the cortisol awakening response.

In general, cortisol secretion is more related to stress than to clinical depression. One study concluded

The hypothalamic—pituitary—adrenal axis is sensitive to social stress but does not mediate vulnerability to depression.

What is most outrageous about the introduction, however, is the specification of the pathway between having a depressed mother and shortened telomere length:

The chronic exposure of these children to this stress as a function of living with mothers who have experienced recurrent episodes of depression could represent a mechanism of accelerated biologic aging, operationalized as having shorter telomere length.

Recognize the argument that is being set up: having to deal with the mothers’ depression is a chronic stressor for the daughters, which sets up irreversible processes before the daughters even become depressed themselves, leading to accelerated aging, chronic illness, and early death. We can ignore all the characteristics, including common social factors, that the daughters share with their mothers and that might be the source of any of the daughters’ problems.

This article is a dream paper for the lawyers of men seeking custody of their children in a divorce: “Your honor, sole custody for my client is the children’s only hope, if it is not already too late. His wife’s depression is irreversibly damaging the children, causing later sickness and early death. I introduce as evidence an article by Ian Gotlib that was endorsed by the Director of the National Institute of Mental Health…

Geraldine Downey and I warned about this trap in a classic review, Children of Depressed Parents, cited 2,300 times according to Google Scholar and still going strong. We noted that depressed mothers and their children share a lot of uncharted biological, psychological, and environmental factors. But we also found that among the strongest risk factors for maternal depression are marital conflict, other life events generated by the marriage and husband, and a lack of marital support. These same factors could contribute to any problems in the children. Actually, the husband could be a source of child problems. Ignoring these possibilities constitutes a “consistent, if unintentional, ‘mother-bashing’ in the literature.”

The authors have asked readers to buy into a reductionist delusion. They assume that some biological factors in depression are so clearly established that they can serve as biomarkers, and that the transmission of any risk for depression associated with having a depressed mother is by way of irreversible damage to telomeres. We can forget about any other complex social and psychological processes going on; the mothers’ depression is stressing the daughters, and we can single out a couple of biological variables to examine this.

Methods

The Methods section lacks basic details necessary to evaluate the appropriateness of what was done and the conclusions drawn from any results. Nonetheless, there is good reason to believe that we are dealing with a poorly selected sample of daughters from poorly selected mothers.

We’re not told much about the mothers except that they have experienced recurrent depression during the childhood of the daughters. We have to look to other papers coming out of this research group to discover how these mothers were probably identified. What we see is that they are a mixed group, in part drawn from outpatient settings and in part from advertisements in the community.

Recall that identification of biological factors associated with depression requires well-defined phenotypes. The optimal group to study would be patients with severe depression. We know that depression is highly heterogeneous and that “depressed” people in the community who are not in specialty treatment are likely to just barely meet criteria. We are dealing with a milder disorder that is less likely to be characterized by any of the biological features of more severe disorder. Social factors likely play more of a role in their misery. In many countries, medication would not be the first line of treatment.

Depression is a chronic, remitting, recurrent disorder with varying severity, both in its overall course and in particular episodes. It typically has its onset in adolescence or early adulthood. By the time women have daughters who are 10 to 14 years old, they are likely to have had multiple episodes. But in a sample selected from the community, these episodes may have been mild and not necessarily treated, nor even noticeable to the daughters. The bottom line is we should not be too impressed with the label “recurrent depression” without better documentation of the length, severity, and associated impairment of functioning.

Presumably the depressed mothers in the study were selected because they were currently depressed. That makes it difficult to separate out enduring factors in the mothers and their social context from those that are tied to the women currently being depressed. And because we know that most biological factors associated with depression are state dependent, we may be getting a picture of the biology of these women – and their daughters, for that matter – that is skewed relative to other times.

Basically, we are dealing with a poorly selected sample of daughters from a poorly selected sample of mothers with depression. The authors are not telling us crucial details that we need to understand any results they get. Apparently they are not measuring relevant variables, and they have too small a sample to apply statistical controls anyway. As I said about another small study making claims for a blood test for depression, these authors are

Looking for love biomarkers in all the wrong places.

Recall that I also said that results from small samples like this one often conflict with results from larger epidemiologic studies with better defined phenotypes. I think we can see here how that happens. The small sample consists only of daughters who have a depressed mother, but who have not yet become depressed themselves and have low scores on a child depression checklist. Just how representative is the sample? What proportion of daughters of depressed women at this age would meet these criteria? How are they similar to or different from daughters who have already become depressed? Do the differences lie in their mothers or in the daughters or both? We can’t address any of these questions, but they are highly relevant. That’s why we need more large clinical epidemiologic studies and fewer small studies of poorly defined samples. Who knows what selection biases are operating?

Searching the literature for what this lab group was doing in other studies in terms of mother and daughter recruitment, I came across a number of small studies of various psychological and psychobiological characteristics of the daughters. We have no idea whether the samples are overlapping or distinct. We have no idea about how the results of these other modest studies confirm or contradict results of the present one. But integrating their results with the results of the present study could have been a start in better understanding it.

As noted in my post at Science-Based Medicine, the methods section of the Molecular Psychiatry article itself gives a sense of the unreliability of single assessments of telomeres. Read the description of the assay of telomere length in the article to see how the authors had to rely on multiple measurements, and how unreliable any single assessment is. Look at the paragraph beginning

To control for interassay variability…

This description reflects the more general problems in the comparability of assessments of telomeres across individuals, samples, and laboratories, problems that preclude recommending telomere length as a biomarker or surrogate outcome with any precision.
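For readers who want to see what “interassay variability” looks like in practice, here is a minimal sketch with hypothetical replicate values; the participant labels and T/S ratios below are made up for illustration, not taken from the paper:

```python
# A minimal sketch, assuming hypothetical triplicate T/S ratios per participant.
import numpy as np

replicates = {
    "participant_A": [1.02, 0.91, 1.10],
    "participant_B": [0.84, 0.97, 0.88],
    "participant_C": [1.15, 1.01, 1.22],
}

for pid, values in replicates.items():
    values = np.asarray(values)
    cv = values.std(ddof=1) / values.mean() * 100  # coefficient of variation, in %
    print(f"{pid}: mean T/S = {values.mean():.2f}, CV = {cv:.1f}%")
# When the assay's coefficient of variation is of the same order as the
# between-group difference being claimed, a single measurement tells you
# little about any one child's "biological age."
```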

Results and Interpretation

As in the methods, the authors fail to supply basic details of the results and leave us having to trust them. There is a striking lack of simple descriptive statistics and bivariate relations, i.e., simple correlations. But we can see signs of unruly, difficult to tame data and spun statistics. And in the end, there are real doubts that there is any connection in these data between telomeres and cortisol.

The authors report a significant difference in telomere length between the daughters of depressed women and daughters in the control group. Given how the data had to be preprocessed, I would really like to see a scatter plot and examine the effects of outliers before I came to a firm conclusion. With only 50 daughters of depressed mothers and 40 controls, differences could have arisen from the influence of one or two outliers.
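Here is the kind of sensitivity check I would want to see, sketched with simulated values because the raw data are not available; the group means, spreads, and seed below are assumptions, not the study’s numbers:

```python
# A minimal sensitivity-check sketch with simulated (not real) T/S ratios.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2014)
risk = rng.normal(loc=0.93, scale=0.15, size=50)     # hypothetical "at-risk" daughters
control = rng.normal(loc=1.00, scale=0.15, size=40)  # hypothetical comparison group

def welch_p(a, b):
    return stats.ttest_ind(a, b, equal_var=False).pvalue

trimmed = np.sort(risk)[2:]  # set aside the two shortest telomere values
print("all values:            p = %.3f" % welch_p(risk, control))
print("two extremes removed:  p = %.3f" % welch_p(trimmed, control))
# If the verdict flips once a couple of observations are set aside, the
# "significant" group difference is resting on very thin ice.
```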

We are told that the two groups of young girls did not differ in Tanner scores, i.e., self-reported signs of puberty. If the daughters of depressed women had indeed endured “accelerated aging,” wouldn’t it be reflected in Tanner scores? The authors, and for that matter Insel, seem to take this accelerated aging thing quite literally.

I think we have another seemingly large difference coming from a small sample – a sample that, given past findings, would be statistically unlikely to yield such a difference. I could be convinced by these data of group differences in telomere length, but only if the findings were replicated in an independent, adequately sized sample. And I still would not know what to make of them.

The authors fuss about anticipating a “dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis and anomalous levels of cortisol secretion.” They indicate that the cortisol data were highly skewed and had to be tamed by winsorizing, i.e., substituting arbitrary values for outliers. We are not told for how many subjects this was done or from which group they came. The authors then engaged in some fancy multivariate statistics, “a piecewise linear growth model to fit the quadratic nature of the [winsorized] data.” We need to keep in mind that multilevel modeling is not a magic wand for transforming messy data. Rather, it involves assumptions that need to be tested, not assumed. We get no evidence of the assumptions being tested, and the sample size is so small that they could not be tested reliably.
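For readers unfamiliar with winsorizing, here is a minimal sketch with hypothetical cortisol values (not the study’s data) showing what the procedure actually does to the numbers being modeled:

```python
# A minimal sketch of winsorizing, using made-up cortisol values with one extreme score.
import numpy as np
from scipy.stats.mstats import winsorize

cortisol = np.array([3.1, 3.4, 3.8, 4.0, 4.2, 4.5, 4.9, 5.2, 6.0, 18.7])
tamed = winsorize(cortisol, limits=[0.0, 0.1])  # cap the top 10% at the next-highest value

print("raw mean:        %.2f" % cortisol.mean())
print("winsorized mean: %.2f" % np.asarray(tamed).mean())
# The extreme value is silently replaced, not removed. Readers should be told
# how many values were altered and in which group, because that choice can
# drive the results of the subsequent growth models.
```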

The authors found no differences in baseline cortisol secretion. Moreover, they found no differences in distress recovery for telomere length, group (depressed versus nondepressed mother), or the group by telomere interaction. They found no effect for group or the group by telomere interaction, but they did find a just significant (p = .042) main effect for telomere length on cortisol reactivity. This would not seem to offer much support for dysregulation of the HPA axis or anomalous levels of cortisol secretion associated with group membership (having a depressed versus nondepressed mother). If we are guided by the meta-analysis of depression and cortisol secretion, the authors should have obtained a group difference in recovery, which they did not. I really doubt this is reproducible in a larger, independent sample with transparently reported statistics.

Recognize what we have here: prestigious journals like Molecular Psychiatry have a strong publication bias in requiring statistical significance. Authors therefore must chase and obtain statistical significance. There is a minuscule difference between a p value of .042 and one of .06 – or .07, for that matter – particularly in the context of multivariate statistics applied to skewed and winsorized data. The difference is well within the error of messy measurements. Yet if the authors had obtained p = .06 or p = .07, we probably wouldn’t get to read their story, at least not in Molecular Psychiatry.*
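A quick simulation makes the point about a p value of .042 versus one of .06; the effect size and group sizes below are assumptions chosen only to illustrate how unstable p-values are in studies of this size, not estimates from the paper:

```python
# A minimal sketch: how much the p-value of an exact replication bounces around
# when the true effect is modest and the sample is small.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n_per_group, true_d = 5000, 45, 0.3  # assumed effect size and group size

pvals = np.array([
    stats.ttest_ind(rng.normal(0.0, 1.0, n_per_group),
                    rng.normal(true_d, 1.0, n_per_group)).pvalue
    for _ in range(n_sims)
])

print("share of replications with p < .05:", (pvals < 0.05).mean())
print("25th-75th percentile of p-values:  ", np.percentile(pvals, [25, 75]))
# An observed p just under .05 says little about whether an independent
# replication will land under .05 again.
```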

Stay tuned for my next installment in which I compare results of this study to the press release and coverage in Insel’s personal blog.  I particularly welcome feedback before then.

*For a discussion of whether “The number of p-values in the psychology literature that barely meet the criterion for statistical significance (i.e., that fall just below .05) is unusually large,” see Masicampo and LaLande (2012) and Lakens (2015).


Will following positive psychology advice make you happier and healthier?

Smile or Die, the European retitling of Barbara Ehrenreich’s realist, anti-positive-psychology book Bright-Sided: How Positive Thinking Is Undermining America, captures the threat of some positive psychology marketers’ advice: if you do not buy what we sell, you will face serious consequences to your health.

Barbara Fredrickson, along with co-authors including Steven Cole, makes the threat that if we simply pursue pleasure in our lives rather than meaning, there will be dire consequences for our immune system by way of effects on genomic expression.

People who are happy but have little-to-no sense of meaning in their lives have the same gene expression patterns as people who are enduring chronic adversity.

A group consisting of Nick Brown, Doug McDonald, Manoj Samanta, Harris Friedman and myself obtained and reanalyzed the data on which Fredrickson et al based their claim. We concluded:

Not only is Fredrickson et al.’s article conceptually deficient, but more crucially statistical analyses are fatally flawed, to the point that their claimed results are in fact essentially meaningless.

In workshops, books, and lucrative talks to corporate gatherings, Fredrickson promises that practicing the loving-kindness meditation she markets will send you on an upward spiral of physical and mental health that ends who knows where.

My co-authors – this time, Nick Brown, Harris Friedman and James Heathers– and I examined her paper and obtained her data. Re-analyses found no evidence that loving-kindness meditation improved physical health. The proxy measure for physical health in this study – cardiac vagal tone – is not actually reliably related to objective measures of physical health and probably wouldn’t be accepted in other contexts. And it was not affected by loving-kindness meditation anyway.


The simplest interpretation of Fredrickson’s interrelated and perhaps overlapping studies of loving-kindness meditation is that lots of people drop out from follow-up and any apparent effect of the meditation is actually due to unexplained deterioration in the control group. And though data concerning the participants’ practice of meditation were collected, none were presented concerning whether participants assigned to meditation actually practiced it or how doing so affected physical and mental health outcomes. Why were the data collected if they were not going to be reported? They could be used to address the crucial question of whether actually practicing meditation affects health and well-being.

Another queen of positive psychology advice, Sonja Lyubomirsky, proclaims in a highly cited paper:

The field of positive psychology is young, yet much has already been accomplished that practitioners can effectively integrate into their daily practices. As our metaanalysis confirms, positive psychology interventions can materially improve the wellbeing of many.

I showed these claims are based on a faulty meta-analysis of methodologically-poor studies. In addition to Lyubomirsky’s highly-cited meta-analysis, I examined a more recent and better meta-analysis by Bolier and colleagues. It showed that the smaller and poorer-quality a study of positive psychology interventions is, the stronger the effect size. With the more recent studies included in Bolier’s meta-analysis, I concluded:

The existing literature does not provide robust support for the efficacy of positive psychology interventions for depressive symptoms. The absence of evidence is not necessarily evidence of an absence of an effect. However, more definitive conclusions await better quality studies with adequate sample sizes and suitable control of possible risk of bias. Widespread dissemination of positive psychology interventions, particularly with glowing endorsements and strong claims of changing lives, is premature in the absence of evidence they are effective.

I’m quite confident that this conclusion holds for effects on positive affect and general well-being as well.

Actually, when Lyubomirsky attempted to demonstrate the efficacy she claims for positive psychology interventions, she obtained null results but relegated her findings to a book chapter that was not peer reviewed. Yet, her marketing of the claim that positive psychology interventions improve well-being continues undaunted and gets echoed in the most recent papers coming out of the positive psychology community, such as:

Robust evidence exists that positive psychology interventions are effective in enhancing well-being and ameliorating depression.

Advice gurus claim that practicing positive psychology interventions will lead to health and well-being without a good scientific basis. But another literature attempts to identify small changes in everyday and laboratory behavior that can have lasting benefits. These studies are not explicitly evaluating interventions, but the claim is that they identify small behaviors with potentially big implications for well-being and happiness.

Let’s start with an example from the Wall Street Journal (WSJ):

Walk this way: Acting happy can make it so

Research shows people can improve their mood with small changes in behavior

Elizabeth Dunn, Associate Professor of Psychology at the University of British Columbia, provides an orientation:

“There are these little doses of social interactions that are available in our day” that can brighten our mood and create a sense of belonging. “I don’t think people recognize this.”

The article starts with a discussion of work by Johannes Michalak from the Department of Psychology and Psychotherapy at Witten Herdecke University, Germany. In one study, 30 depressed psychiatric inpatients were randomized to instructions to sit in either a slumped (n = 15) or an upright (n = 15) position and then completed a memory test. The idea is that an emotion like depression is embodied. Adopting a slumped posture should increase a depressive negative bias in recall. The abstract of the original article reports:

Upright-sitting patients showing unbiased recall of positive and negative words but slumped patients showing recall biased towards more negative words.

Michalak conducted another study in which the gait of 39 college students was manipulated with biofeedback so as to simulate either being depressed or nondepressed as they walked on a treadmill. During the period on the treadmill, the experimenter read 40 words to them and they were tested for recall. The abstract of the original study reports:

The difference between recalled positive and recalled negative words was much lower in participants who adopted a depressed walking style as compared to participants who walked as if they were happy.

As would be expected with such small sample sizes, results were weak. Analyses were unnecessarily complicated. It’s not clear that effects would persist if more basic statistics were presented. For instance, did patients assigned to the “depressive” slump condition in the first study recall fewer positive words, more negative words, both, or neither? Certainly in the second study with college students, there were no differences in recall of positive words and only small differences in recall of negative words. Claims in the abstract were based on the construction of a more complicated composite positive variable.

Michalak is following a familiar strategy in the positive psychology literature – indeed, one that is more widely followed in psychology: if you cannot obtain positive findings in straightforward, simple analyses, then (1) adopt flexible rules of analysis, such as selective introduction of covariates and making up new composite variables; (2) don’t report the simple statistics and analyses in tables where readers could check them; and (3) spin your results in the abstract, because that is what most readers will rely on in deciding what your study found.

Michalak claims that these studies point to manipulation of the embodiment of depression as a means of treating depression:

There is a mutual influence between mood and body and movement…There might be specific types of movements that are specific characteristics of depression and this feeds the lower mood. So it’s a vicious cycle.

Presumably, with this as a premise, depressed patients could obtain a clinically significant improvement in mood if they sat up straight and walked faster. Maybe, but this is at best speculative and premature. Michalak does not directly test the take-away message the author of the WSJ article wants to give: even if you are not depressed, you can improve your mood by sitting up straight and adopting what Michalak calls a “happy walking style.”

To be fair to Michalak, he may be pumping up the strength and significance of his findings and promoting himself a bit. But unlike the rest of the authors discussed in the WSJ article, he has not yet prematurely turned scientific papers of modest significance and strength into press releases, TED talks, and positive psychology products like books and workshops.

But let’s turn to the work of Nicholas Epley that is next described in the article. Epley is a Professor of Behavioral Science, University of Chicago Booth School of Business and author of Mindwise: How We Understand What Others Think, Believe, Feel, and Want. Epley does not have a TED talk, but got mentioned in the Business Blog of the Financial Times as not needing one. And he’s available through the Washington Speakers Bureau, whose website proclaims it is “connecting you with the world’s greatest minds”.

According to the WSJ article:

“I used to sit in quiet solitude on the train,” Dr. Epley said. “I don’t anymore. I know now from our data that learning something interesting about the person sitting next to me would be more fun than pretty much anything else I’d be doing then,” he said.

This is a reference to his article

Epley, N., & Schroeder, J. (2014). Mistakenly seeking solitude. Journal of Experimental Psychology: General, 143(5), 1980.

The study actually involves trains, buses, and taxis, with different participants given instructions to either connect with strangers, remain disconnected, or commute as normal.

In the train experiment, a composite measure was substituted for a simpler measure of whether these various strategies made participants happier:

To obtain an overall measure of positivity, we first calculated positive mood (happy minus sad), then standardized positive mood and pleasantness, and then averaged those two measures into a single index.
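Here is a minimal reconstruction of the kind of composite that passage describes, with made-up ratings rather than Epley and Schroeder’s data, to show how much researcher discretion goes into such an index:

```python
# A minimal sketch, assuming hypothetical 0-7 mood ratings for five commuters.
import numpy as np

happy    = np.array([6.0, 5.0, 7.0, 4.0, 6.5])
sad      = np.array([1.0, 2.0, 1.5, 3.0, 1.0])
pleasant = np.array([5.5, 4.0, 6.0, 3.5, 6.0])

def standardize(x):
    return (x - x.mean()) / x.std(ddof=1)

positive_mood = happy - sad                                            # step 1: difference score
positivity = (standardize(positive_mood) + standardize(pleasant)) / 2  # steps 2-3: z-score and average
print(np.round(positivity, 2))
# Difference scores, z-scoring, and averaging are each defensible, but together
# they make it hard to know what a given difference on the index means, or to
# compare it with a plain "how happy were you?" rating.
```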

[Excerpt of Figure 1 from Epley and Schroeder, study 1: mean positivity by condition. Error bars represent the standard error around the mean of each condition.]

A one-way analysis of variance indicated a significant difference among the three conditions (p < .05), explaining a modest 6% of participants’ variation in the composite measure of mood. However, consulting Figure 1 in the article suggests the effect lay in the difference between instructions to remain disconnected and the other two conditions, not between connecting with strangers and commuting as normal. See the excerpt of Figure 1 above.

Results were somewhat stronger when the experiment was replicated on buses (p = .02), with 10% of the variance in participant mood explained by the condition to which participants were assigned.

When the experiment involved talking to a taxi driver, significant results were obtained (p <.01). But this time, pairwise differences between conditions were tested and there was no significant difference between the connecting and the control condition, only between the control condition and the condition in which participants were instructed not to talk to the taxi driver.
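The distinction matters: a significant omnibus test does not tell you which conditions differ. Here is a minimal sketch with simulated data loosely matching the pattern described, in which the disconnected condition does worse and the connect and control conditions look alike; the numbers are assumptions, not the published data:

```python
# A minimal sketch: omnibus ANOVA followed by pairwise Tukey comparisons on
# simulated "positivity" scores for three commuting conditions.
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(7)
connect    = rng.normal(0.10, 1.0, 35)
control    = rng.normal(0.05, 1.0, 35)
disconnect = rng.normal(-0.45, 1.0, 35)

print("omnibus one-way ANOVA:", stats.f_oneway(connect, control, disconnect))

scores = np.concatenate([connect, control, disconnect])
labels = ["connect"] * 35 + ["control"] * 35 + ["disconnect"] * 35
print(pairwise_tukeyhsd(scores, labels))
# If only the "disconnect versus the rest" contrasts are reliable, the finding
# is about the cost of forced silence, not a benefit of talking to strangers.
```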

This may not be rocket science, but it is apparently worthy of press releases, media coverage, and positive psychology products. The results are overall weak and may even disappear in straightforward analyses with simple measures of happiness. The most robust interpretation I could construct is that if someone asks you to refrain from talking to others on the train or the bus, or even to a taxi driver, you should probably ignore them. I offer this advice for free, and have no intention of presenting it in a TED talk with unattributed anecdotes.

Re-enter Elizabeth Dunn, Associate Professor of Psychology at the University of British Columbia. Dr. Dunn is the author of Happy Money: The Science of Smarter Spending and has presented a TED talk. She is available as a speaker through the Lavin Agency, which, according to its website, is “making the world a smarter place.” The WSJ article reports on:

Sandstrom, G. M., & Dunn, E. W. (2013). Is efficiency overrated? Minimal social interactions lead to belonging and positive affect. Social Psychological and Personality Science.

Participants were instructed either to avoid any unnecessary conversation with the barista at Starbucks and simply be efficient in getting their coffee, or to:

“have a genuine interaction with the cashier—smile, make eye contact to establish a connection, and have a brief conversation.”

The journal article reports that participants instructed to make a “genuine connection” had more positive affect and less negative affect than those instructed to avoid unnecessary conversation. Unfortunately, unlike the Epley experiment, we are not given any comparison with a control condition, which would’ve clarified whether the effect was primarily due to instructing participants to have a “genuine connection” or to avoid conversation.

Then there are Professor Dunn’s student Jordi Quoidbach’s chocolate experiments, which have been promoted not only in this WSJ article, but in Dunn’s op-ed in the New York Times, “Don’t indulge. Be happy.”

The first of the studies was:

Quoidbach, J., Dunn, E. W., Petrides, K. V., & Mikolajczak, M. (2010). Money Giveth, Money Taketh Away: The Dual Effect of Wealth on Happiness. Psychological Science.

The study involved priming participants with a reminder of wealth – a photo of a large stack of Euro bills – or a similar photo that was blurred beyond recognition in the control condition. The 40 participants were then instructed to eat a piece of chocolate and complete a follow-up questionnaire.

As seen in the other studies, simple analyses were suppressed in favor of a more complex analysis. Namely, preliminary examination of the data revealed that female participants savored chocolate more than males. So, rather than a simple t-test, analyses of covariance were conducted with gender and prior attitude toward chocolate as control variables. Note that there were only 20 participants per group to begin with, so results of these multivariate analyses are quite dubious. Participants primed with the money photo spent less time eating the piece of chocolate and were rated by observers as enjoying it less.
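To see the analytic fork in the road, here is a minimal sketch with simulated data (not Quoidbach et al.’s) contrasting the simple comparison with the covariate-adjusted one; every variable name and value below is an assumption made for illustration:

```python
# A minimal sketch: a simple two-group model versus an ANCOVA with covariates,
# fitted to made-up data for 40 hypothetical participants.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "condition": ["money"] * 20 + ["control"] * 20,
    "female": rng.integers(0, 2, 40),
    "chocolate_attitude": rng.normal(5, 1.5, 40),
    "savoring_time": rng.normal(45, 15, 40),  # hypothetical seconds spent eating
})

simple = smf.ols("savoring_time ~ condition", data=df).fit()
ancova = smf.ols("savoring_time ~ condition + female + chocolate_attitude", data=df).fit()
print("simple model p:", round(simple.pvalues["condition[T.money]"], 3))
print("ANCOVA p:      ", round(ancova.pvalues["condition[T.money]"], 3))
# Neither model is wrong in itself; the problem is choosing which one to report
# after peeking at the data, with too few participants per cell to estimate the
# covariate adjustments stably.
```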

Studies involving priming participants with seemingly irrelevant but suggestive stimuli such as this one are now held in low regard, and some feel they have contributed to the crisis of confidence in social psychology. Nobel Prize winner Daniel Kahneman suggests the lack of replicability of social priming research is “a train wreck looming” for social psychology. Results are often too good to be true and cannot be replicated. Quoidbach and colleagues cite studies by Vohs (1, 2) as supporting the validity of the manipulation; however, the two primary studies by Vohs could not be independently replicated.

Overall, this is an underpowered study with results that probably depended on flexible analyses rather than simple ones. We would probably ignore it, except that it appeared in the prestigious journal Psychological Science and has been hyped in the media and positive psychology products.

The second chocolate study was:

Quoidbach, J., & Dunn, E. W. (2013). Give It Up: A Strategy for Combating Hedonic Adaptation. Social Psychological and Personality Science, 4(5), 563-568.

The study involved asking 64 participants to eat two pieces of chocolate in two lab sessions separated by a week. Analyses were based on the 55 who showed up for the second session. Participants had been randomized to one of three conditions: restricted access (n = 16), in which participants were told not to buy any chocolate until the next lab session; abundant access (n = 18), in which participants were given two pounds of chocolate and told to eat as much as they comfortably could before the next lab session; and a control condition (n = 21), in which no explicit instructions were given. In the second lab session, all participants were given a second piece of chocolate.

Once again we have a small study in which the authors deny readers an opportunity to examine simple statistical tests of what should be a simple hypothesis: that restricted versus free access has an effect on enjoyment of a piece of chocolate. Instead of a simple one-way analysis of variance, the authors looked at their data and decided to do an (unnecessarily) more complex analysis of covariance. Nonetheless, we can still see that in pairwise comparisons, the restricted access group differed from both the abundant access and control groups. Yet there were no differences between the abundant access instructions and having no instructions at all for the week between sessions.

The authors did not provide readers with appropriate analyses of group differences in changes in overall positive affect between the two sessions. Nonetheless, within-group t-tests revealed a decline in overall positive affect only for the abundance condition.

So, another small study in which positive results probably depended on tricky flexible analyses. We would not be discussing this if it were not in a relatively prestigious journal, discussed in the WSJ, and written about by one of the authors in an op-ed piece in the New York Times. I invite your comparison of my analysis to the hyped presentation and exaggerated significance for the study claimed in the op-ed.

This blog post ends quite differently than I originally intended. I wanted to take some highly promoted findings in the positive psychology literature about the effects of small things on overall well-being. I looked to a WSJ article reporting findings from basically prestigious journals with recognizably big-name promoters of positive psychology.

I had expected that positive psychology people out promoting their work and selling their products could surely come up with some unambiguous findings. I could then discuss how we could decide whether to attempt to translate those findings into strategies in our everyday lives and whether we could expect them to be sustained with any lasting impact on our well-being. Unfortunately, I didn’t get that far. Findings turned out to be not particularly positive despite being presented as such. That became an interesting story in itself, even if I will still have to search for robust findings from the positive psychology literature in order to discuss the likelihood that following “scientific” positive psychology advice will make us happier overall.

Despite heavily-marketed claims to the contrary, positive psychology interventions do not consistently improve mental or physical health and well-being. The myth that these interventions are efficacious is perpetuated by a mutually-admiring, self-promotional collective that protects its claims from independent peer review and scrutiny.

As with the positive psychology intervention literature, it is a quick leap from the authors submitting a manuscript to a peer-reviewed journal to making claims in the media, including op-ed pieces in the New York Times, and then releasing products like workshops and books that are lavishly praised by other members of the positive psychology community.

It is apparently too much to expect that positive psychology advice givers will take time out from their self-promotion to replicate what are essentially pilot studies before hitting the road and writing op eds again. And too much to expect that the Association of Psychological Science journals Psychological Science and Social Psychological and Personality Science will insist on transparent reporting of adequately powered studies as a condition for publication.

The incentives for scientifically sound positive psychology advice just aren’t there.

Special thanks to the Skeptical Cat, who is smarter, more independent, and less easily led than the Skeptical Dog.

 


Pay $1000 to criticize a bad ‘blood test for depression’ article?

No way; call for retraction.

Would you pay $1,000 for the right to criticize bad science in the journal in which it originally appeared? That is what it costs to participate in postpublication peer review at the online Nature Publishing Group (NPG) journal, Translational Psychiatry.

Damn, NPG is a high-fashion brand, but peer review is quite fallible, even at an NPG journal. Should we have to pay to point out the flawed science that even NPG inevitably delivers? You’d think we were doing them a favor in terms of quality control.

Put differently, should the self-correction on which scientific progress so thoroughly depends require critics be willing to pay, presumably out of their own personal funds? Sure, granting agencies now reimburse publication costs for the research they fund, but a critique is unlikely to qualify.

Take another perspective: Suppose you have a small data set of patients for whom you have blood samples. The limited value of the data set was further compromised by substantial, nonrandom loss to follow-up. But you nonetheless want to use it to solicit industry funding for a “blood test for depression.” Would you be willing to pay a premium of $3,600-$3,900 to publish your results in a prestigious NPG journal, with the added knowledge that it would be insulated from critics?

I was curious just who would get so worked up about an article that they would pay $1,000 to complain.

So, I put Translational Psychiatry in PUBLICATION NAME at Web of Science. It yielded 379 entries. I then applied the restriction CORRESPONDENCE and that left only two entries.

Both were presenting original data and did not even cite another article in Translational Psychiatry.  Maybe the authors were trying to get a publication into an NPG journal on the cheap, at a discount of $2,600.

It appears that nobody has ever published a letter to the editor in Translational Psychiatry. Does that mean that there has never ever been anything about which to complain? Is everything we find in Translational Psychiatry perfectly trustworthy?

I recently posted at Mind the Brain and elsewhere about a carefully-orchestrated media campaign promoting some bad science published in Translational Psychiatry. An extraordinary publicity effort disseminated a Northwestern University press release and video to numerous media outlets. There was an explicit appeal for industry funding for the development of what was supposedly a nearly clinic-ready inexpensive blood test for depression.

The Translational Psychiatry website where I learned of these publication costs displays the standard NPG message, which is made a mockery of by a paywall that effectively blocks critics:

“A key strength of NPG is its close relationship with the scientific community. Working closely with scientists, listening to what they say, and always placing emphasis on quality rather than quantity, has made NPG the leading scientific publisher at finding innovative solutions to scientists’ information needs.”

The website also contains the standard NPG assurances about authors’ disclosures of conflicts of interest:

“The statement must contain an explicit and unambiguous statement describing any potential conflict of interest, or lack thereof, for any of the authors as it relates to the subject of the report”

The authors of this particular paper declared:

“EER is named as an inventor on two pending patent applications, filed and owned by Northwestern University. The remaining authors declare no conflict of interest.”

Does this disclosure give readers much clarity concerning the authors’ potential financial conflict of interest? Check out this marketing effort exploiting the Translational Psychiatry article.

Northwestern Researchers Develop RT-qPCR Assay for Depression Biomarkers, Seek Industry Partners

I have also raised questions about a lack of disclosures of conflicts of interest from promoters of Triple P Parenting. The developers claimed earlier that their program was owned by the University of Queensland, so there was no conflict of interest to declare. Further investigation  of the university website revealed that the promoters got a lucrative third of proceeds. Once that was revealed, a flood of erratum notices disclosing the financial conflicts of interest of Triple P promoters followed – at least 10 so far. For instance

[Image: one of the Triple P erratum notices disclosing promoters’ financial conflicts of interest]

How bad is the bad science?

You can find the full Translational Psychiatry article here. The abstract provides a technical but misleading summary of results:

“Abundance of the DGKA, KIAA1539 and RAPH1 transcripts remained significantly different between subjects with MDD and ND controls even after post-CBT remission (defined as PHQ-9 <5). The ROC area under the curve for these transcripts demonstrated high discriminative ability between MDD and ND participants, regardless of their current clinical status. Before CBT, significant co-expression network of specific transcripts existed in MDD subjects who subsequently remitted in response to CBT, but not in those who remained depressed. Thus, blood levels of different transcript panels may identify the depressed from the nondepressed among primary care patients, during a depressive episode or in remission, or follow and predict response to CBT in depressed individuals.”

This was simplified in a press release that echoed in shamelessly churnalized media coverage. For instance:

“If the levels of five specific RNA markers line up together, that suggests that the patient will probably respond well to cognitive behavioral therapy, Redei said. “This is the first time that we can predict a response to psychotherapy,” she added.”

The unacknowledged problems of the article began with the authors having only 32 depressed primary-care patients at baseline, whose diagnostic status had not been confirmed by gold-standard semi-structured interviews conducted by professionals.

But the problems get worse. The critical comparison of patients who recovered with cognitive behavioral therapy versus those who did not occurred in the subsample of nine recovered versus 13 unrecovered patients remaining after a loss to follow-up of 10 patients. Baseline results for the 9 + 13 = 22 patients in the follow-up sample did not even generalize back to the original full sample. How, then, could the authors argue that the results apply to the 23 million or so depressed patients in the United States? Well, they apparently felt they could better generalize back to the original sample, if not the United States, by introducing an analysis of covariance that controlled for age, race, and sex. (For those of you who are tracking the more technical aspects of this discussion, contemplate the implications of controlling for three variables in a between-groups comparison of nine versus 13 patients. Apparently the authors believed that readers would accept the adjusted analyses in place of the unadjusted analyses, which had obvious problems of generalizability. The reviewers apparently accepted this.)
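To spell out that parenthetical, here is a back-of-envelope count of parameters versus patients, assuming each of the three covariates costs one parameter (race coded with more categories would only make things worse):

```python
# A minimal sketch of the degrees-of-freedom arithmetic for the adjusted comparison.
n_patients = 9 + 13                 # recovered vs. unrecovered at follow-up
parameters = 1 + 1 + 3              # intercept + group effect + age, race, sex
residual_df = n_patients - parameters
print(f"patients: {n_patients}, parameters estimated: {parameters}, residual df: {residual_df}")
# Each covariate adjustment is being estimated from a handful of patients per
# group, so small changes in one or two patients can reverse the conclusion.
```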

Finally, treatment with cognitive behavior therapy was confounded with uncontrolled treatment with antidepressants.

I won’t discuss here the other problems of the study noted in my earlier blog posts. But I think you can see that these are analyses of a small data set truly unsuitable for publication in Translational Psychiatry and serving as a basis for seeking industry funding for a blood test for depression.

As I sometimes do, I tried to move from blog posts about what I considered problematic to a formal letter to the editor to which the authors would have an opportunity to reply. It was then that I discovered the publication costs.

So what are the alternatives to a letter to the editor?

Letters to the editor are a particularly weak form of post-publication peer review. There is little evidence that they serve as an effective self-correction mechanism for science. Letters to the editor seldom alter the patterns of citations of the articles about which they complain.

Even if I paid the $1,000 fee, I would only have been entitled to 700 words to make my case that this article is scientifically flawed and misleading. I’m not sure that a similar fee would be required from the authors to reply. Maybe responding to critics is part of the original package that they purchased from NPG. We cannot tell from what appears in the journal, because the necessity of responding to a critic has not yet arisen.

It is quite typical across journals, even those not charging for a discussion of published papers, to limit the exchanges to a single letter per correspondent and a single response from the authors. And the window for acceptance of letters is typically limited to a few weeks or months after an article has appeared. While letters to the editor are often peer-reviewed, replies from authors typically do not receive peer review.

A different outcome, maybe

I recently followed up my blogging about the serious flaws of a paper published in PNAS by Fredrickson and colleagues with a letter to the editor. They in turn responded. Compare the two letters and you will see why an uninformed reader might infer that only confusion had been generated by either of them. But stay tuned…

The two letters would have normally ended any exchange.

However, this time my co-authors and I thoroughly re-analyzed the Fredrickson et al data and PNAS allowed us to publish our results. This time, we did not mince words:

“Not only is Fredrickson et al.’s article conceptually deficient, but more crucially statistical analyses are fatally flawed, to the point that their claimed results are in fact essentially meaningless.”

In the supplementary materials, we provided in excruciating detail our analytic strategy and results. The authors’ response was again dismissive and confusing.

The authors next refused our offer of an adversarial collaboration in which both parties would lay out responses to each other, with a mediator, in order to allow readers to reach some resolution. However, the strengths of our arguments and reanalysis – which included thousands of regression equations, some with randomly generated data – are such that others are now calling for a retraction of the original Fredrickson and Cole paper. If that occurs, it would be an extraordinarily rare event.

The limits that journals impose on post-peer-review commentary severely constrain the ability of science to self-correct.

The Reproducibility Project: Psychology is being widely hailed as a needed corrective for the crisis of credibility in science. But replications of studies such as this one, involving pre-post sampling of genomic expression from an intervention trial, are costly and unlikely to be undertaken. And why attempt a “replication” of findings that have no merit in the first place? After all, the authors’ results for baseline assessments did not even replicate in the baseline results of the patients still available at follow-up. That suggests the findings are unstable and that attempts at replication would be futile.

The PLOS journals have introduced the innovation of allowing comments to be placed directly on the journal article’s webpage, with their existence acknowledged on the article itself. Anyone can respond and participate in a post-publication peer review process that can go on for the life of interest in a particular article. The next stage in furthering post-publication peer review is for such comments to be indexed, citable, and counted in traditional metrics as well as altmetrics. This would recognize citizen scientists’ contributions to cleaning up what appears to be a high rate of false positives and outright nonsense in the current literature.

PubMed Commons offers the opportunity to post comments on any of the over 23 million entries in PubMed, expanding the PLOS initiative to all journals, even those of the Nature Publishing Group. Currently, the only restriction is that anyone attempting to place a comment must have authored at least one of those 23,000,000+ entries, even a letter to the editor. This represents progress.

But similar to the PLOS initiative, PubMed Commons will get more traction when it can provide conventional academic credit (countable citations) to contributors identifying and critiquing bad science. Currently, authors can get credit for putting bad science into the literature that no one can get for helping to get it recognized as such.

So, the authors of this particular article have made indefensibly bad claims about having made substantial progress toward developing an inexpensive blood test for depression. It’s not unreasonable to assume their motive is to cultivate financial support from industry for further development. What’s a critic to do?

In this case, the science is bad enough, and the damage to the public’s and professionals’ perception of the state of the science of a ‘blood test for depression’ great enough, that a retraction is warranted. Stay tuned – unless Nature Publishing Group requires a $1,000 payment for investigating whether an article warrants retraction.

Postscript: As I was finishing this post, I discovered that the journals published by the Modern Language Society require payment of a $3,000 membership fee to publish a letter to the editor in one of their journals. I guess they need to keep the discussion within the club.

Views expressed in this blog post are entirely those of the author and not necessarily those of PLOS or its staff.

Special thanks to Skeptical Cat.


Failing grade for highly cited meta-analysis of positive psychology interventions

The many sins of Sin and Lyubomirsky

I recently blogged about Linda Bolier and colleagues’ meta-analysis of positive psychology interventions [PPIs] in BMC Public Health. It is the new kid on the block. Sin and Lyubomirsky’s meta-analysis is accepted as the authoritative summary of the evidence and has been formally identified by Web of Science as among the top 1% of cited papers in psychology and psychiatry for 2009, with 187 citations according to Web of Science and 487 citations according to Google Scholar.

This meta-analysis ends on a resoundingly positive note:

Do positive psychology interventions effectively boost well-being and ameliorate depression? The overwhelming evidence from our meta-analysis suggests that the answer is ‘‘yes.’’ The combined results of 49 studies revealed that PPIs do, in fact, significantly enhance WB, and the combined results of 25 studies showed that PPIs are also effective for treating depressive symptoms. The magnitude of these effects is medium-sized (mean r =.29 for WB, mean r= .31 for depression), indicating that not only do PPIs work, they work well.

According to Sin and Lyubomirsky, the strength of the evidence justifies PPIs being disseminated and implemented in the community:

The field of positive psychology is young, yet much has already been accomplished that practitioners can effectively integrate into their daily practices. As our metaanalysis confirms, positive psychology interventions can materially improve the wellbeing of many.

The authors also claimed to have dispensed with concerns that clinically depressed persons may be less able to benefit from PPIs.  Hmm…

In this blog post I will critically review Sin and Lyubomirsky’s meta-analysis, focusing on the effects of PPIs on depressive symptoms, as I did in the earlier blog post concerning Bolier and colleagues’ meta-analysis. As the title of this blog post suggests, I found the Sin and Lyubomirsky meta-analysis misleading, falling far short of accepted standards for doing and reporting meta-analyses. I hope to convince you that authors who continue to cite this meta-analysis are either naïve, careless, or eager to promote PPIs in defiance of the available evidence. And I will leave you with the question of what its uncritical acceptance and citation says about the positive psychology community’s standards.

Read on and I will compare and contrast the Sin and Lyubomirsky and Bolier and colleagues’ meta-analyses, and you will get a chance to see how to grade a meta-analysis using the validated checklist, AMSTAR.

[If you are interested in using AMSTAR yourself to evaluate the Sin and Lyubomirsky and Bolier and colleagues’ meta-analyses independently, this would be a good place to stop and get the actual checklist and the article explaining it.]

The Sin and Lyubomirsky meta-analysis

The authors indicate the purpose of the meta-analysis was to

Provide guidance to clinical practitioners by answering the following vital questions:

  • Do PPIs effectively enhance WB and ameliorate depression relative to control groups and, if so, with what magnitude?
  • Which variables—with respect to both the characteristics of the participants and the methodologies used—moderate the effectiveness of PPIs?

Similar to Bolier and colleagues, this meta-analysis focused primarily on interventions

aimed at increasing positive feelings, positive behaviors, or positive cognitions, as opposed to ameliorating pathology or fixing negative thoughts or maladaptive behavior patterns.

However, Sin and Lyubomirsky’s meta-analysis was less restrictive than Bolier et al. in including interventions such as mindfulness, life review therapy, and forgiveness therapy. These approaches were not developed explicitly within the positive psychology framework, even if they’ve been appropriated by positive psychology.

Positive psychologists have a bad habit of selectively claiming older interventions as their own, as they did with specific interventions from Aaron T Beck’s cognitive therapy for depression. We need to ask if what is considered effective in “positive psychology interventions” is new and distinctly positive psychology or if what is effective is mainly what is old and borrowed from elsewhere.

Sin and Lyubomirsky’s meta-analysis also differs from Bolier et al. in including nonrandomized trials, although that is nowhere explicitly acknowledged. Sin and Lyubomirsky included studies in which what was done to student participants depended on what classrooms they were in, not on their being individually randomized. Lots of problems are introduced. For instance, any pre-existing differences associated with students being in particular classrooms get attributed to the participants having received PPIs. One should not combine studies with randomization by individual with studies in which the intervention depended on being in a particular classroom, unless perhaps a check has been made statistically of whether they can be considered in the same class of interventions.

[I know, I’m getting into technical details that casual readers of the meta-analysis might want to ignore, but the validity of the authors’ conclusions depends on such details. Time and time again, we will see Sin and Lyubomirsky not providing them.]

Using AMSTAR

If authors have done a meta-analysis and want to submit it to a journal like PLOS One, they must accompany their submission with a completed PRISMA checklist. That is to allow the editor and reviewers to determine whether the authors have provided the basic details needed for them, and for future readers, to evaluate for themselves what was actually done. PRISMA is a checklist about transparency in reporting; it does not evaluate the appropriateness or competency of what authors do. Authors can do a meta-analysis badly and still score points on PRISMA, because readers have the details to see that for themselves.

In contrast, AMSTAR evaluates both what is reported and what was done. So authors don’t get points for transparently reporting a meta-analysis that was done inappropriately. And unlike a lot of checklists, the items of AMSTAR have been externally validated.

One final thing before we start: you can add up the number of items for which a meta-analysis meets AMSTAR criteria, but a higher score does not indicate that one meta-analysis is better than another. That’s because some items are more important than others in terms of what the authors of a meta-analysis have done and whether they’ve given enough details to readers. So two meta-analyses may get the same moderate score using AMSTAR, but may differ in whether the items they failed to meet are fatal to their being able to make a valid contribution to the literature.

Some of the problems of Sin and Lyubomirsky’s meta-analysis revealed by AMSTAR

5. Was a list of studies (included and excluded) provided?

While a list of the included studies was provided, there was no list of excluded studies. It is puzzling, for instance, why Barbara Fredrickson et al.’s (2008) study of loving kindness meditation with null findings is never mentioned. The study is never identified as a randomized trial in the original article, but is subsequently cited by Barbara Fredrickson and many others within positive psychology as such. That’s a serious problem with the positive psychology literature: you never know when an experimental manipulation is a randomized trial or whether a study will later be cited as evidence of the effectiveness of positive psychology interventions.

Most of the rest of the psychological intervention literature adheres to CONSORT, and one of the first requirements is that articles indicate either in their title or abstract that a randomized trial is being discussed. So, when it comes to a meta-analysis of PPIs, it is particularly important to know what studies were excluded so that readers can judge how that might have affected the effect size that was obtained.

6. Were the characteristics of the included studies provided?

Sin and Lyubomirsky’s Table 1 is incomplete and misleading in reporting characteristics of the included studies. It doesn’t indicate whether or not studies involved randomization. It is misleading in indicating which studies selected for depression, because it lumps together studies of mildly depressed students, selected on the basis of self-report questionnaires and not necessarily clinically depressed, with studies of patients with more severe depression who met criteria for a formal clinical diagnosis. The table indicates sample size, but it is not total sample size that matters most; it is the size of the smallest group, whether intervention or control. A number of positive psychology studies have a big imbalance in the size of the intervention versus the control group. So there may be a seemingly sufficient number of participants in a study, but the size of the control group can leave it underpowered, with the suspicion that effect sizes were exaggerated.
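To see why the size of the smallest group is what matters, here is a minimal sketch with illustrative numbers (not taken from any included study): for the same total N of 60 and the same medium effect size, a 30/30 split has considerably more power than a 50/10 split.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
effect_size = 0.5   # a medium standardized mean difference, chosen purely for illustration

# Balanced design: 30 per arm (total N = 60)
balanced = analysis.power(effect_size=effect_size, nobs1=30, ratio=1.0, alpha=0.05)

# Unbalanced design: 50 in the intervention arm, 10 in the control arm (same total N)
unbalanced = analysis.power(effect_size=effect_size, nobs1=50, ratio=10 / 50, alpha=0.05)

print(f"balanced 30/30:   power = {balanced:.2f}")    # roughly 0.48
print(f"unbalanced 50/10: power = {unbalanced:.2f}")  # roughly 0.29, despite the same total N
```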

7. Was the scientific quality of the included studies assessed and documented?

Sin and Lyubomirsky made no effort to evaluate the quality of the included studies! That is a serious, fatal flaw.

On this basis alone, I would judge either that the meta-analysis somehow evaded adequate peer review or that the editor of Journal of Clinical Psychology and the reviewers of this particular paper were incompetent. Certainly this problem would not have been missed at PLOS One, and I would hope that other journals would readily pick it up.

Bolier and colleagues explained their rating system and presented its application in evaluating the individual trials included in the meta-analysis. Readers had the opportunity to examine the rating system and its application. We were able to see that the studies evaluating positive psychology interventions tend to be of low quality. We can also see that the studies producing the largest effect sizes tend to be those of the lowest quality and small size.

I was somewhat critical of Bolier and colleagues in an earlier blog because they liberalized the quality rating scales in order to even be able to conduct a meta-analysis. Nonetheless, they were transparent enough to allow me to make that independent evaluation. Because we have their ratings available, we can extrapolate to the studies included in Sin and Lyubomirsky and be warned that this analysis is likely to provide an overly positive evaluation of PPIs. But we have to go outside of what Sin and Lyubomirsky provide.

8. Was the scientific quality of the included studies used appropriately in formulating conclusions?

AMSTAR indicates

The results of the methodological rigor and scientific quality should be considered in the analysis and the conclusions of the review, and explicitly stated in formulating recommendations.

Sin and Lyubomirsky could not take quality into account in interpreting their meta-analysis because they did not rate quality. And so they did not give readers a chance to use quality ratings to evaluate the studies independently for themselves. We are now further into the realm of fatal flaws. We know from other sources that much of the “evidence” for positive psychology interventions comes from small, underpowered studies likely to produce exaggerated estimates of effects. If this is not taken into account, conclusions are invalid.

9. Were the methods used to combine the findings of studies appropriate?

AMSTAR indicates

For the pooled results, a test should be done to ensure the studies were combinable, to assess their homogeneity (i.e. Chi-squared test for homogeneity, I²). If heterogeneity exists a random effects model should be used and/or the clinical appropriateness of combining should be taken into consideration (i.e. is it sensible to combine?).

Sin and Lyubomirsky used an ordinary chi-squared test and found

the set of effect sizes was heterogeneous (χ²(23) = 146.32, one-tailed p < 2 × 10⁻¹⁹), indicating that moderators may account for the variation in effect sizes.

[I’ll try to be as non-technical as possible in explaining a vital point. Do try to struggle through this, rather than simply accepting my conclusion that this one statistic alone indicates a meta-analysis seriously in trouble. Think of it like a warning message on your car dashboard that should compel you to immediately pull over to the side of the road, shut off the engine, and call a tow truck.]

Tests for heterogeneity basically tell you whether there are enough similarities among the effect sizes of individual studies to warrant combining them. A test for heterogeneity examines whether the likelihood of too much variation can be rejected within certain limits. The Cochrane collaboration specifically warns against relying on an ordinary chi-squared test of heterogeneity, because it is low powered in situations where the studies vary greatly in sample size, with some of them being small. The Cochrane collaboration instead presents alternatives derived from the chi-square that quantify inconsistency in effect sizes, such as Q and I². Sin and Lyubomirsky didn’t use either of these, but instead used the standard chi-square, which is prone to miss problems of inconsistency between studies.

But don’t worry, the results are so wild that serious problems are indicated anyway. Look above at the significance of the chi-square that Sin and Lyubomirsky report. Have you ever seen anything so highly significant: p < .0000000000000000002?
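To put a number on that inconsistency, here is a minimal sketch converting the chi-square that Sin and Lyubomirsky report into the I² statistic the Cochrane collaboration recommends, using only the figures quoted above:

```python
from scipy import stats

Q, df = 146.32, 23   # heterogeneity chi-square and degrees of freedom reported by Sin and Lyubomirsky

# Higgins' I^2: the proportion of variation in effect sizes due to heterogeneity rather than chance
i_squared = max(0.0, (Q - df) / Q) * 100
p_value = stats.chi2.sf(Q, df)

print(f"I^2 = {i_squared:.1f}%")   # about 84%, conventionally labeled "considerable" heterogeneity
print(f"p = {p_value:.1e}")        # vanishingly small, consistent with the reported p < 2e-19
```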

Rather than panicking like they should have, Sin and Lyubomirsky simply proceeded to examine moderators of effect size and concluded that most of them did not matter for depressive symptoms, including initial depression status of participants and whether participants individually volunteered to be in the study, rather than being assigned because they were in a particular classroom.

Sin and Lyubomirsky’s moderator analyses are not much help in figuring out what was going wrong. If they had examined the quality of the studies and sample size, they would have gotten on the right path. But they really don’t have many studies, and so they can’t carefully examine these factors. They are basically left with a very serious warning not to proceed, but they do so anyway. Once again, where the hell were the editor and reviewers when they could have saved Sin and Lyubomirsky from embarrassing themselves and misleading readers?

10. Was the likelihood of publication bias assessed?

AMSTAR indicates

An assessment of publication bias should include a combination of graphical aids (e.g., funnel plot, other available tests) and/or statistical tests (e.g., Egger regression test).

Bolier and colleagues provided a funnel plot of effect sizes that gave a clear indication that small studies with negative or null effects were somehow missing from the studies they had selected for the meta-analysis. Readers with some familiarity with meta-analysis can interpret it for themselves.

Sin and Lyubomirsky did no such thing. Instead they used Rosenthal’s fail-safe N to give readers the false reassurance that hundreds of unpublished null studies of PPIs would have to be lurking in file drawers in order for their glowing assessment to be unseated. Perhaps they should be forgiven for using fail-safe N, because they acknowledged Rosenthal as a consultant. But outside of psychology, experts on meta-analysis reject fail-safe N as providing false reassurance.
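For readers who have not met it, Rosenthal’s fail-safe N is computed from the summed z scores of the included studies. A minimal sketch with invented z values (Sin and Lyubomirsky’s study-level statistics are not reproduced here) shows why the formula almost always returns a comfortingly huge number:

```python
import numpy as np

# Invented study-level z scores for illustration; even modest ones sum quickly.
z_scores = np.array([2.1, 1.8, 2.5, 1.6, 2.0, 1.9, 2.3, 1.7])
k = len(z_scores)

# Rosenthal (1979): number of null studies needed to drag the combined one-tailed p above .05.
# 2.706 is the square of the one-tailed critical z of 1.645.
fail_safe_n = (z_scores.sum() ** 2) / 2.706 - k
print(f"fail-safe N = {fail_safe_n:.0f} unpublished null studies")   # large even for 8 modest studies
```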

11. Was the conflict of interest stated?

AMSTAR indicates

Potential sources of support should be clearly acknowledged in both the systematic review and the included studies.

Lyubomirsky had already published The How of Happiness:  A New Approach to Getting the Life You Want. Its extravagant claims prompted a rare display of negativity from within the positive psychology community, an insightful negative review from the editor of Journal of Happiness Studies.

Conflict of interest among the authors of the actual studies, many of whom are also involved in the sale of positive psychology products, was ignored. We certainly know from analyses of studies conducted by pharmaceutical companies that the prospect of financial gain tends to lead to exaggerated effect sizes. Indeed, my colleagues and I were awarded the Bill Silverman award from the Cochrane collaboration for alerting the collaboration to its lack of attention to conflict of interest as a formal indicator of risk of bias. The collaboration is now in the process of revising its risk of bias tool to incorporate conflict of interest as a consideration.

Conclusion

Sin and Lyubomirsky provide a biased and seriously flawed assessment of the efficacy of positive psychology interventions. Anyone who uncritically cites this paper is either naïve, careless, or bent on presenting a positive evaluation of positive psychology interventions in defiance of available evidence. Whatever limitations I pointed out in the meta-analysis of Bolier and colleagues, I prefer it to this one. Yet just watch. I predict Sin and Lyubomirsky will continue to be cited without acknowledging Bolier and colleagues. If so, it will add to lots of other evidence of the confirmatory bias and lack of critical thinking within the positive psychology community.

Postscript

Presumably if you’re reading this postscript, you’ve read through my scathing analysis. But I had already noticed that something was wrong in an initial 15-minute casual reading of the meta-analysis, undertaken after completing my blog post about Linda Bolier and colleagues. Among the things I noted were:

  1. In their introduction, Sin and Lyubomirsky made positive statements about the efficacy of PPIs based on two underpowered, flawed studies (Fava et al., 2005; Seligman et al., 2006 ) that were outliers in Bolier and colleagues’ analyses. Citing these two studies as positive evidence suggests both prejudgment and a lack of application of critical skills that foreshadowed what followed.
  2. Their method section gave no indication of attention to quality of studies they were going to review. Bad, bad.
  3. Their method section declared that they would use one-tailed tests for the significance of effect sizes. Since the 1950s, psychologists have consistently relied on two-tailed tests. Unwary readers might not notice that a one-tailed test at p < .05 corresponds to p < .10 with the more customary two-tailed test applied to the same results (see the short sketch after this list). Reliance on one-tailed tests is almost always an indication of a bias toward finding significant results or an attempt to mislead readers.
  4. The article included no forest plot that would have allowed a quick assessment of the distribution of effect sizes, whether they differed greatly, and whether some were outliers. As I showed in an earlier blog post, Bolier and colleagues’ inclusion of a forest plot, along with the details in their Table 1, allowed a quick assessment that the overall effect size for positive psychology interventions was strongly influenced by outlier small studies of poor methodological quality.
  5. The wild chi-square concerning heterogeneity was glossed over.
  6. The resoundingly positive assessment of positive psychology interventions that opens the discussion was subsequently contradicted by acknowledgment of some, but not the most serious, limitations of the meta-analysis. Other conclusions in the discussion section were not based on any results of the meta-analysis.
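Here is the short sketch promised in point 3 above; the z value is invented purely for illustration:

```python
from scipy import stats

z = 1.70                                 # an invented test statistic, just significant one-tailed
p_one_tailed = stats.norm.sf(z)          # about .045, "significant" at the one-tailed .05 level
p_two_tailed = 2 * stats.norm.sf(z)      # about .089, not significant by the customary two-tailed standard

print(f"one-tailed p = {p_one_tailed:.3f}")
print(f"two-tailed p = {p_two_tailed:.3f}")
```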

I speak only for myself, and not for the journal PLOS One or the other Academic Editors. I typically take 15 minutes or so to decide whether to send a paper out for review. My perusal of this one would have led to sending it back to the authors, requesting that they attempt to adhere to basic standards for conducting and reporting meta-analyses before even considering resubmission to me. If they did resubmit, I would check again before sending it out to reviewers. We need to protect reviewers and subsequent readers from meta-analyses that are not only poorly conducted, but that lack transparency while promoting interventions with undisclosed conflicts of interest.

 

 


Re-examining Ellen Langer’s classic study of giving plants to nursing home residents

 Memories of famous studies do not serve us well

A journalist emailed me a request for an interview about the work of Ellen Langer. I was busy hopping from Leipzig to Istanbul and on to Groningen with undependable Internet quality. So I suggested that we communicate by email instead of Skype.

I recalled Langer’s classic study being portrayed as demonstrating that nursing home residents provided with a plant to take care of lived longer than if they had a plant tended by the staff. I had not reread the study recently, but was sure that that was an oversimplification of what was done in the study and its findings. Yet, I knew this was still the way the study was widely cited in the media and undergraduate psychology textbooks. I did not recall much doubt being expressed.

From my current perspective, we should always be skeptical about small psychosocial studies claiming effects on mortality, especially when the studies were not actually planned with mortality as the primary outcome, or with the appropriate participant selection, uniformity of baseline characteristics between groups, and experimental controls needed to guarantee that the only differences between experimental and control conditions are attributable to group assignment.

There is a long history of claims about mortality found in small studies not holding up under scrutiny. The consistency with which this happens should provide the prior probabilities with which we approach the next such claim. Very often, apparently positive results can be explained by the data having been manipulated to support such a conclusion. Even when we cannot see precisely how the data were manipulated, apparent effects cannot be replicated.

Surely there may yet be a replicable finding lurking somewhere out there.

But I operate with the principle “Beware of apparent evidence of improved survival from small [underpowered] psychosocial studies not designed to look for that particular effect.” I begin examining a new claim with a healthy dose of skepticism. And we should always ask: by what plausible, established or testable biological mechanism would such an effect be expected?

Death is a biomedical outcome. I continue to be amazed at how people, even professionals, who would be dismissive of post hoc claims from tiny studies that a medical intervention extends life nonetheless go gaga when the intervention is psychosocial. I’ve come to appreciate that finding a link between manipulating the mind and extending life is a holy grail that keeps getting pursued, despite a history of nothing being found.

I’ve debunked claims like those of Drs. David Spiegel and Fawzy Fawzy [sic] about psychological interventions extending the lives of cancer patients. These claims didn’t hold up under careful scrutiny. And the findings claimed for the original studies could not be replicated in larger, better-designed studies.

Why should we care? Isn’t keeping hope alive a good thing? Claims about the mind triumphing over the body are potentially harmful not only because they mislead cancer patients about their ability to control their health outcomes. More importantly, if patients succumb to cancer, they and their loved ones are left to blame the patients for not exerting appropriate mind control.

In my email to the journalist, I expressed skepticism and was eventually quoted:

The study that arguably made Langer’s name — the plant study with nursing-home patients — wouldn’t have “much credibility today, nor would it meet the tightened standards of rigor,” says James Coyne, professor emeritus of psychology at the University of Pennsylvania medical school and a widely published bird dog of pseudoscience. (Though, as Coyne also acknowledges, “that is true of much of the work of the ’70s, including my own concerning depressed persons depressing others.”) Langer’s long-term contributions, Coyne says, “will be seen in terms of the thinking and experimenting they encouraged.”

The journalist had told me his article would appear in the New York Times at the end of October. At first, I didn’t bother to check, but then I saw the video of the CBS Morning News extended interview with Langer. It was shocking.

I learned of her claims from an unpublished study that she could lower the blood glucose levels of women with diabetes by manipulating their sense of time, and that she had lowered women’s blood pressure by giving them a haircut and coloring. And now she had wild plans to attempt to shrink the tumors of women with metastatic breast cancer.

The New York Times article had a title that warned of hype and hokum ahead –

What if Age Is Nothing but a Mind-Set?

The Times article described Langer’s intervention:

Langer gave houseplants to two groups of nursing-home residents. She told one group that they were responsible for keeping the plant alive and that they could also make choices about their schedules during the day. She told the other group that the staff would care for the plants, and they were not given any choice in their schedules. Eighteen months later, twice as many subjects in the plant-caring, decision-making group were still alive than in the control group.

To Langer, this was evidence that the biomedical model of the day — that the mind and the body are on separate tracks — was wrongheaded.

The study was conducted in the 70s, in the days of showman research like that of Phil Zimbardo (whom Langer had as a professor) and Stanley Milgram. Many of the social psychological studies of that period were more carefully staged attempts at dramatic demonstration of the investigators’ ideas than rigorously controlled experiments. And they don’t hold up to scrutiny. You can find thorough debunkings of the Zimbardo Stanford prison experiment, which had only 24 student guards and prisoners, as well as of Milgram’s obedience study, but the perceptions of such work seem immune to being unsettled.

In re-examining Langer’s 1970s nursing home study, we shouldn’t chastise her for not anticipating later developments like CONSORT or doubts being expressed about strong post-hoc claims made from underpowered studies. But we should not now be uncritically accepting such flawed, older studies as evidence, particularly when it comes to physical health effects and mortality. So, I’m going to be looking at the validity of currently citing Ellen Langer’s work, not chastising her for what she didn’t do in the 70s.

You can find a PDF here of

Rodin, J., & Langer, E. J. (1977). Long-term effects of a control-relevant intervention with the institutionalized aged. Journal of Personality and Social Psychology, 35(12), 897.

The study is described as a follow-up of an earlier study, although the relationship between the two articles is more complex and of interest in itself. You can find a PDF here of

Rodin, J., & Langer, E. (1976). The effect of choice and enhanced personal responsibility for the aged: A field experiment in an institutional setting. Journal of Personality and Social Psychology, 34(2), 191-198.

The intervention

The intervention was more complex than simply giving nursing home residents plants to tend.

A later independent attempt at replication describes Langer’s original intervention in succinct and accurate terms [but you can click here to get the full details in Langer’s article]:

In their study, an intervention designed to encourage aged nursing home residents to feel more control and responsibility for day-to-day events was used. One group of residents was exposed to a talk delivered by the hospital administrator emphasizing their responsibility for themselves. A second group heard a communication that stressed the staff’s responsibility for them as patients. These communications were bolstered by offering to subjects in the experimental group plants that they could tend, whereas residents in the comparison group were given plants that were watered by the staff.

Follow the numbers!: Scrutinizing how Langer got from the first study to the second

The original 1976 study reported

There were 8 males and 39 females in the responsibility-induced condition (all fourth-floor residents) and 9 males and 35 females in the comparison group (all second-floor residents). Residents who were either completely bedridden or judged by the nursing home staff to be completely noncommunicative (11 on the experimental floor and 9 on the comparison floor) were omitted from the sample. Also omitted was one woman on each floor, one 40 years old and the other 26 years old, due to their age. Thus, 91 ambulatory adults, ranging in age from 65 to 90, served as subjects.

However, statistical analyses reported in Table 1 of the article were based on 24 residents being in the responsibility-induced condition and 28 in the comparison condition. A footnote explained

All of the statistics for the self-report data and the interviewers’ ratings are based on 45 subjects (25 in the responsibility-induced group and 20 in the comparison group), since these were the only subjects available at the time of the interview.

The 1977 follow-up study reported

Twenty-six of the 52 were still in the nursing home and were retested. Twelve had died, and 14 had been transferred to other facilities or had been discharged. The differences between treatment conditions in mortality are considered in a subsequent section. The groups did not differ in transfer or discharge rate. Only 9 other persons from the original sample of 91 were available for retesting. Since they had incomplete nurses’ ratings in the first study, they are only included in follow-up analyses not involving change scores in nurses’ evaluations. Almost all of the participants now lived in different rooms, since the facility had completed a more modern addition 13 months after the experimental treatment.

The 1977 follow-up study supplemented these data with a new control group of residents who had not participated in the original study.

We also evaluated a small control group of patients who had not participated in the first study due to a variety of scheduling problems. Five had previously lived on the same floor as subjects in the responsibility-induced condition, and 4 lived on the same floor as the comparison group. All were now living in the new wing. The average length of time in the nursing home was 3.9 years, which was not reliably different for the three groups.


Mortality data

The 1977 follow-up study reported 18-month follow-up data, with 7 deaths among the 47 residents (15%) in the responsibility-induced intervention group and 13 deaths (30%) in the composite comparison group. These data were subjected to an arcsine transformation of the frequencies and described as statistically significant (z = 3.14, p < .01).

The Erratum

An erratum statement corrected the z-score to z = 1.74, p<.10.

The outcome is therefore only marginally significant, and a more cautious interpretation of the mortality findings than originally given is necessary.
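The corrected figure is straightforward to reproduce from the reported frequencies. In the minimal sketch below, the comparison-group size of 44 is my assumption (it makes 13 deaths about 30%); the standard arcsine test for two proportions then gives a z close to the erratum’s value rather than the original one.

```python
import numpy as np

# Deaths / group sizes as reported; the comparison-group n of 44 is my assumption (13/44 is about 30%)
d1, n1 = 7, 47     # responsibility-induced condition
d2, n2 = 13, 44    # composite comparison group

# Arcsine (angular) transformation test for the difference between two proportions
phi1 = 2 * np.arcsin(np.sqrt(d1 / n1))
phi2 = 2 * np.arcsin(np.sqrt(d2 / n2))
z = abs(phi1 - phi2) * np.sqrt(n1 * n2 / (n1 + n2))

print(f"z = {z:.2f}")   # about 1.7, in line with the erratum's z = 1.74 rather than the original z = 3.14
```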

The APA electronic bibliographic resource, PsycINFO, does not indicate the existence of the erratum when the study is accessed, nor is there any indication at the journal. According to Google Scholar, Langer’s nursing home study has been cited 848 times, but the erratum only 6 times. Surveying accounts of this study in the scientific literature and social media, I see no acknowledgment that there were no significant differences. Sure, it was a small study, but we cannot assume that accumulating more participants would have preserved the 2:1 difference between groups. Highly unlikely.

Attempted Replication

Richard Shultz and Barbara Hartman Hanusa followed up on Shultz’s earlier study investigating effects of manipulations of control and predictability on the physical and psychological well-being of nursing home residents. Like Langer’s study, the examination of mortality was post-hoc.

The original Shultz study was a randomized field experiment involving four groups:

(1) A control-enhanced condition in which the nursing home residents could control the frequency and length of visits from college student volunteers.

(2) A yoked group of residents who got the same frequency and length of visits, but without having control.

(3) A predict condition in which the residents were told when the volunteers would visit, but could not control these visits and were not informed how long they would last.

(4) A no-visit control condition.

Mortality data were examined in the follow-up article:

Two persons in the predict group and one person in the control-enhanced group died prior to the 24-month follow-up. A fourth person, also in the control-enhanced group, died between the 30- and 42-month follow-up.4 Fisher’s exact probability test was used to analyze these data (Siegel, 1956). Combining the no-treatment with the random group and the predict with the control-enhanced group yields a marginally significant Fisher’s exact probability of .053.

Like Langer’s study, this one involved post-hoc construction of lumped and split off groups. The marginally significant results would be radically changed by addition or removal of a single participant.
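That fragility is easy to demonstrate. In the minimal sketch below, the group sizes are my assumption (roughly 10 residents per original group, so about 20 per combined group; the figures are not stated in the excerpt above). Fisher’s exact test then reproduces the reported p of .053 and shows how it shifts when a single death is moved or removed.

```python
from scipy.stats import fisher_exact

# 2x2 table: [deaths, survivors] for the combined predict/control-enhanced group
# versus the combined no-treatment/yoked group; group sizes of 20 each are assumed.
reported = [[4, 16], [0, 20]]
_, p = fisher_exact(reported, alternative="greater")
print(f"as reported: one-tailed p = {p:.3f}")        # about .053, matching the article

# One fewer death in the combined treatment groups...
_, p = fisher_exact([[3, 17], [0, 20]], alternative="greater")
print(f"one death fewer: p = {p:.3f}")               # about .115, no longer even marginally significant

# ...or a single death in the comparison groups instead
_, p = fisher_exact([[4, 16], [1, 19]], alternative="greater")
print(f"one comparison-group death: p = {p:.3f}")    # about .17
```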

The collective memory of Langer’s study

The Langer study continues to be hailed as a breakthrough, both in Langer’s own thinking and in the larger field, in terms of the powers of predictability and control in mind-body interventions seeking effects on mortality.

Aside from not achieving significant effects, the study is deficient by contemporary standards in numerous ways. The mortality effect did not arise in a randomized trial, nor even in the original sample, but involves a post-hoc construction of a comparison group. There is a high risk of bias in terms of a lack of

  • Allocation concealment.
  • Blinding of investigators and probably nurse raters.
  • Equalization of baseline characteristics in the two groups.
  • Rigorous experimental control of what happened between randomization and final follow-up.

There were noteworthy changes in conditions in the nursing home in the interim.

The complex intervention is not reducible to simply giving nursing home residents responsibilities for plants.

The study nonetheless makes claims about a biomedical outcome, mortality. Even in 1978, the study would have been judged deficient if the intervention had been biomedical, rather than psychosocial. It certainly would not have gotten as much attention either then or now.

Ellen Langer’s incessant self-promotion has had no small part in preserving and extending the claims of mortality effect.  Note that the study is widely referred to as the Langer study rather than the Rodin and Langer study.

Langer does not provide any plausible mechanism by which effects on mortality could have occurred. Overall, she asserts that the intervention manipulated responsibility and control, but even if that is the principal psychological effect, it is unclear how this would translate into better all-cause mortality in a population already afflicted by diverse life-threatening conditions.

We have to be careful about the tooth fairy science involved in trying to test mechanisms for effects we don’t even know exist. But whatever occurred in this study is unlikely to be tied to the residents having to take care of plants. Yet that is one of the features that is so enticing. All of us have anecdotes about older people being kept alive by having to care for their dog or cat. But even if there is some validity to these observations, it’s unlikely the beneficial effects of having a pet are tied to the power of attitude rather than to the modification of health-related behaviors.

The original 1976 study ended with

The practical implications of this experimental demonstration are straightforward. Mechanisms can and should be established for changing situational factors that reduce real or perceived responsibility in the elderly. Furthermore, this study adds to the body of literature (Bengston, 1973; Butler, 1967; Leaf, 1973; Lieberman, 1965) suggesting that senility and diminished alertness are not an almost inevitable result of aging. In fact, it suggests that some of the negative consequences of aging may be retarded, reversed, or possibly prevented by returning to the aged the right to make decisions and a feeling of competence.

Not so straightforward, and these noble sentiments are not empirically advanced by the actual results of these studies, particularly the follow-up study. Yet I have no confidence that debunking will have any effect on how the study has been mythologized. Some people have a need for such examples of mind triumphing over bodily limitations, even if the examples are not true.
