How to critique claims of a “blood test for depression”

Special thanks to Ghassan El-baalbaki and John Stewart for their timely assistance. Much appreciated.

“I hope it is going to result in licensing, investing, or any other way that moves it forward…If it only exists as a paper in my drawer, what good does it do?” – Eva Redei, PhD, first author.

Media coverage of an article in Translational Psychiatry uniformly passed along the authors’ extravagant claims in a press release from Northwestern University that declared that a simple blood test for depression had been found. That is, until I posted a critique of these claims at my secondary blog. As seen on Twitter, the tide of opinion suddenly shifted, and considerable skepticism was expressed.

I am now going to present a thorough critique of the article itself. More importantly, I will be pointing to how, with some existing knowledge and basic tools, many of you can learn to critically examine the credibility of such claims, which will inevitably arise in the future. Biomarkers for depression are a hot topic, and John Ioannidis has suggested that in such hot fields, exaggerated claims based on flawed studies are more likely to be the result than real progress.

The article can be downloaded here and the Northwestern University press release here. When I last blogged about this article, I had not seen the 1:58 minute video that is embedded in the press release. I encourage you to view it before my critique and then view it again if you believe that it has any remaining credibility. I do not know where the dividing line is between unsubstantiated claims about scientific research and sheer quackery, but this video tests the boundaries, when evaluated in light of the evidence actually presented in the article.

I am sure that many journalists, medical and mental health professionals, and laypersons were intimidated by the mention of “blood transcriptomic biomarkers” in the title of this peer-reviewed article. Surely, the published article had survived evaluation by an editor and reviewers with relevant expertise. What is there for an unarmed person to argue about?

Start with the numbers and basic statistics

Skepticism about the study is encouraged by a look at the small numbers of patients involved in the study, which was limited to

  • 64 total participants: 32 depressed patients from a clinical trial and 32 controls.
  • 5 patients were lost from baseline to follow-up.
  • 5 more were lost by the 18-week blood draw, leaving
  • 22 remaining patients –
  • 9 classified as in remission, 13 not in remission.

The authors were interested in differences in 20 blood transcriptomic biomarkers in 2 comparisons: the 32 depressed patients versus 32 controls, and the 9 patients who remitted by the end of the trial versus the 13 who did not. The authors committed themselves to looking for a clinically significant difference or effect size, which, they tell readers, is defined as .45. We can use a program readily available on the web for a power analysis, which indicates the likelihood of obtaining a statistically significant result (p < .05) for any one of these biomarkers, if differences existed between depressed patients and controls or between the patients who improved in the study and those who did not. Before even putting these numbers into the calculator, we would expect the likelihood to be low because of the size of the sample.

We find that there is only a power of 0.426 for finding one of these individual biomarkers significant, even if it really distinguishes between depressed patients and controls and a power of 0.167 for finding a significant difference in the comparison of the patients who improved versus those who did not.
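
If you want to check these figures yourself, the web calculator is not the only route. Here is a minimal sketch using Python’s statsmodels package, which implements the same two-sample t-test power calculation and reproduces the numbers above to within rounding.

```python
# Power for a two-sample t-test at the authors' own effect size (d = 0.45).
# A sketch using statsmodels; the web calculator mentioned above gives
# essentially the same values.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# 32 depressed patients versus 32 controls
power_cases_controls = analysis.power(effect_size=0.45, nobs1=32,
                                      alpha=0.05, ratio=1.0)

# 9 remitted versus 13 non-remitted patients (ratio = nobs2 / nobs1)
power_remission = analysis.power(effect_size=0.45, nobs1=9,
                                 alpha=0.05, ratio=13 / 9)

print(f"32 vs 32: power = {power_cases_controls:.3f}")  # about 0.43
print(f"9 vs 13:  power = {power_remission:.3f}")       # about 0.17
```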

Bottom line is that this is much too small a sample to address the questions in which the authors are interested – less than 50-50 odds of identifying a biomarker that actually distinguished depressed patients from controls, and less than 1 in 6 odds of finding a biomarker that actually distinguished the patients who improved from those who did not. So, even if the authors really have stumbled upon a valid biomarker, they are unlikely to detect it in these samples.

But there are more problems. For instance, it takes a large difference between groups to achieve statistical significance with such small numbers, so any significant result will be quite large. Yet, with such small numbers, statistical significance is unstable: dropping or adding a few or even a single patient or control, or reclassifying a patient as improved or not improved, will change the results. And notice that there was some loss of patients to follow-up and to determining whether they improved or not. Selective loss to follow-up is a possible explanation of any differences between the patients considered improved and those not considered improved. Indeed, near the end of the discussion, the authors note that patients who were retained for a second blood draw differed in gene transcription from those who were not. This should have tempered claims of finding differences in improved versus unimproved patients, but it did not.

So what I am getting at is that this small sample is likely to produce strong results that will not be replicated in other samples. But it gets still worse –

Samples of 32 depressed patients and 32 controls chosen because they match on age, gender, and race – as they were selected in the current study – can still differ on lots of variables. The depressed patients are probably more likely to be smokers and to be neurotic. So the authors may only be isolating blood transcriptomic biomarkers associated with innumerable such variables, not depression.

There can be single, unmeasured variables that are the source of any differences, or some combination of multiple variables that do not make much difference by themselves, but do so when they are present together in a sample. So, in such a small sample, a few differences affecting a few people can matter greatly. And it does no good to simply do a statistical test between the two groups, because any such test is likely to be underpowered and miss influential differences that are not by themselves so extremely strong that they meet conditions for statistical significance in a small sample.

The authors might be tempted to apply some statistical controls – they actually did in a comparison of the nine versus 13 patients – but that would only compound the problem. Use of statistical controls requires much larger samples, and would likely produce spurious – erroneous – results in such a small sample. Bottom line is that the authors cannot rule out lots of alternative explanations for any differences that they find.

The authors nonetheless claim that 9 of the 20 biomarkers they examined distinguish depressed patients and 3 of these distinguish patients who improve. This is statistically improbable and unlikely to be replicated in subsequent studies.
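
A little arithmetic shows why. The back-of-envelope sketch below is mine, not the authors’, and it uses the power figures computed earlier: under the null hypothesis, 20 tests at p < .05 should produce about one false positive, and 9 “hits” at a per-marker power of 0.426 would require essentially all 20 markers to be genuine.

```python
# Back-of-envelope check on the 9-of-20 claim (my arithmetic, not the authors').
from scipy.stats import binom

n_tests, alpha = 20, 0.05

# Expected false positives if none of the 20 biomarkers is real
print(n_tests * alpha)              # 1.0

# Probability of 9 or more significant results by chance alone
print(binom.sf(8, n_tests, alpha))  # ~2e-7: not plausibly luck

# Expected hits if ALL 20 biomarkers were genuine, at power ~0.426
print(n_tests * 0.426)              # ~8.5: 9 hits implies all 20 are real
```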

And then there is the sampling issue. We are going to come back to that later in the blog, but just consider how random or systematic differences can arise between this sample of 32 patients versus 32 controls and what might be obtained with another sampling of the same or a different population. The problem is even more serious when we get down to the 9 versus 13 comparison of patients who completed the trial. A different intervention or a different sample or better follow-up could produce very different results.

So, just looking at the number of available patients and controls, we are not expecting much good science to come out of a study that relies on significance levels to define its results. I think that many persons familiar with these issues would simply dismiss this paper out of hand after looking at these small numbers.

The authors were aware of the problems in examining 20 biomarkers in such small comparisons. They announced that they would commit themselves to adjusting significance levels for multiple comparisons. With such low ratios of participants in the comparison groups to variables examined, this remains a dubious procedure. Nonetheless, the authors commit themselves to it. However, when it eliminated any differences between the improved and unimproved patients, they simply ignored having done this procedure and went on to discuss results as significant. If you return to the press release and the video, you can see no indication that the authors had applied a procedure that eliminated their ability to claim results as significant. By their own standards, they are crowing about being able to distinguish ahead of time patients who will improve versus those who will not when they did not actually find any biomarkers that did so.
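
For concreteness, here is a small sketch of the kind of adjustment the authors committed to and then abandoned. The p-values are hypothetical stand-ins, not the paper’s actual values, and the choice of Bonferroni and Benjamini-Hochberg methods is mine for illustration; the point is how quickly “significant” findings evaporate once the correction is actually applied.

```python
# Illustration of a multiplicity adjustment over 20 biomarker comparisons.
# These p-values are hypothetical stand-ins, NOT the values from the paper:
# 9 of the 20 fall below .05 before correction, echoing the authors' claim.
import numpy as np
from statsmodels.stats.multitest import multipletests

p_values = np.array([0.001, 0.004, 0.008, 0.012, 0.020, 0.028, 0.035, 0.041,
                     0.048, 0.06, 0.09, 0.12, 0.15, 0.19, 0.25, 0.33, 0.48,
                     0.61, 0.77, 0.92])

print((p_values < 0.05).sum(), "raw p-values below .05")           # 9

# Bonferroni: a result must reach p < .05/20 = .0025 to survive
reject_bonf, _, _, _ = multipletests(p_values, alpha=0.05,
                                     method="bonferroni")
print(reject_bonf.sum(), "survive Bonferroni")                     # 1

# Benjamini-Hochberg FDR control, a less conservative alternative
reject_bh, _, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
print(reject_bh.sum(), "survive Benjamini-Hochberg")               # 2
```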

What does the existing literature tell us we should expect?

Our skepticism aroused, we might next want to go to Google Scholar and search for topics such as genetics depression, biomarkers depression, blood test depression, etc. [Hint: when you put a set of terms into the search box and click, then pull down the menu on the far right to get an advanced search.]

I could say this takes 25 minutes, because that is how much time I spent, but that would be misleading. I recall a jazz composer who claimed to have written a song in 25 minutes. When the interviewer expressed skepticism, the composer said, “Yeah, 25 minutes and 25 years of experience.” I had the advantage of knowing what I was looking for.

The low heritability of liability for MDD implies an important role for environmental risk factors. Although genotype X environment interaction cannot explain the so-called ‘missing heritability’,52 it can contribute to small effect sizes. Although genotype X environment studies are conceptually attractive, the lessons learned from the most studied genotype X environment hypothesis for MDD (5HTTLPR and stressful life event) are sobering.

And

Whichever way we look at it, and whether risk variants are common or rare, it seems that the challenge for MDD will be much harder than for the less prevalent more heritable psychiatric disorders. Larger samples are required whether we attempt to identify associated variants with small effect across average backgrounds or attempt to enhance detectable effects sizes by selection of homogeneity of genetic or environmental background. In the long-term, a greater understanding of the etiology of MDD will require large prospective, longitudinal, uniformly and broadly phenotyped and genotyped cohorts that allow the joint dissection of the genetic and environmental factors underlying MDD.

[Update suggested on Twitter by Nese Direk, MD] A subsequent even bigger search for the elusive depression gene reported

We analyzed more than 1.2 million autosomal and X chromosome single-nucleotide polymorphisms (SNPs) in 18 759 independent and unrelated subjects of recent European ancestry (9240 MDD cases and 9519 controls). In the MDD replication phase, we evaluated 554 SNPs in independent samples (6783 MDD cases and 50 695 controls)…Although this is the largest genome-wide analysis of MDD yet conducted, its high prevalence means that the sample is still underpowered to detect genetic effects typical for complex traits. Therefore, we were unable to identify robust and replicable findings. We discuss what this means for genetic research for MDD.

So, there is not much encouragement for the present tiny study.

baseline gene expression may contain too much individual variation to identify biomarkers with a given disease, as was suggested by the studies’ authors.

Furthermore it noted that other recent studies had identified markers that either performed poorly in replication studies or were simply not replicated.

Again, not much encouragement for the tiny present study.

[According to Wiktionary, omics refers to related measurements or data from such interrelated fields as genomics, proteomics, transcriptomics, and other fields.]

The report came about because of numerous concerns expressed by statisticians and bioinformatics scientists concerning the marketing of gene expression-based tests by Duke University. The complaints concerned the lack of an orderly process for validating such tests and the likelihood that these tests would not perform as advertised. In response, the IOM convened an expert panel, which noted that many of the studies that became the basis for promoting commercial tests were small, methodologically flawed, and relied on statistics that were inappropriate for the size of the samples and the particular research questions.

The committee came up with some strong recommendations for discovering, validating, and evaluating such tests in clinical practice. By these evidence-based standards, the efforts of the authors of the Translational Psychiatry article are woefully inadequate, and it is irresponsible to jump from such a preliminary, small study, without replication in an independent sample, to the claims they are making to the media and possible financial backers.

Given that the editor and reviewers of Translational Psychiatry nonetheless accepted this paper for publication, they should be required to read the IOM report. And all of the journalists who passed on ridiculous claims about this article should also read the IOM book.

If we google the same search terms, we come up with lots of press coverage of work previously claimed as breakthroughs. Almost none of these claims panned out in replication, despite the initial fanfare. Failures to replicate are much less newsworthy than false discoveries, but once in a while a statement of resignation makes it into the media. For instance,

Depression gene search disappoints


Looking for love biomarkers in all the wrong places

The existing literature suggests that the investigators have a difficult task: looking for what is probably a weak signal, with a lot of false positives, in the context of a lot of noise. Their task would be simpler if they had a well-defined, relatively homogeneous sample of depressed patients, so that the patients would be relatively consistent in whatever signal they each gave.

By those criteria, the investigators chose probably the worst possible sample. They obtained their small sample of 32 depressed patients from a clinical trial comparing face-to-face with Internet cognitive behavioral therapy in a sample recruited from primary medical care.

Patients identified as depressed in primary care are a very mixed group. Keep in mind that the diagnostic criteria require that five of nine symptoms be present for at least two weeks. Many depressed patients in primary care have only five or six symptoms, which are mild and ambiguous. For instance, most women experience sleep disturbance in the weeks after giving birth to an infant. But probing readily reveals that their sleep is being disturbed by the infant. Similarly, one cardinal symptom of depression is the loss of the ability to experience pleasure, but that is a confusing item for primary care patients, who do not understand that the loss is supposed to be of the ability to experience pleasure, rather than of the ability to do things that previously gave them pleasure.

And two weeks is not a long time. It is conceivable that symptoms can be maintained that long in a hostile, unsupportive environment but immediately dissipate when the patient is removed from that environment.

Primary care physicians, if they even adhere to diagnostic criteria, are stuck with the challenge of making a diagnosis based on patients having the minimal number of symptoms, with the required symptoms often being very mild and ambiguous in themselves.

So, depression in primary care is inherently noisy, in terms of its inability to give a clear signal for a single biomarker or a few. It is likely that if a biomarker ever became available, many patients considered depressed now would not have the biomarker. And what would we make of patients who had the biomarker but did not report symptoms of depression? Would we overrule them and insist that they were really depressed? Or what about patients who exhibited classic symptoms of depression but did not have the biomarker? Would we tell them they are merely miserable and not depressed?

The bottom line is that depression in primary care can be difficult to diagnose, and to do so requires a careful interview or maybe the passage of time. In Europe, many guidelines discourage aggressive treatment of mild to moderate depression, particularly with medication. Rather, the suggestion is to wait a few weeks with vigilant monitoring of symptoms, encouraging the patient to try less intensive interventions, like increased social involvement or behavioral activation. Only if those interventions fail to make a difference and the symptoms fail to resolve with the passage of time should a diagnosis and initiation of treatment be considered.

Most researchers agree that rather than looking to primary care, we should look to more severe depression in tertiary care settings, like inpatient or outpatient psychiatry. Then maybe go back and see the extent to which these biomarkers are found in a primary care population.

And then there is the problem of how the investigators defined depression. They did not make a diagnosis with a gold-standard, semi-structured interview, like the Structured Clinical Interview for DSM Disorders (SCID), administered by trained clinicians. Instead, they relied on a rigid, simple interview, the Mini International Neuropsychiatric Interview, more like a questionnaire, that was administered by bachelor-level research assistants. This would hardly pass muster with the Food and Drug Administration (FDA). The investigators had available scores on the interviewer-administered Hamilton Depression Scale (HAM-D) to measure improvement, but instead relied on the self-report Patient Health Questionnaire (PHQ-9). The reason they chose this instrument is not clear, but it would again not pass muster with the FDA.

Oh, and finally, the investigators talk about a possible biomarker predicting improvement in psychotherapy. But most of the patients in this study were also receiving antidepressant medication. This means we do not know if the improvement was due to the psychotherapy or the medication, but the general hope for a biomarker is that it can distinguish which patients will respond to one versus the other treatment. The bottom line is that this sample is hopelessly confounded when it comes to predicting response to the psychotherapy.

Why get upset about this study?

I could go on about other difficulties in the study, but I think you get the picture: this is not a credible study, nor one that can serve as the basis for a search for a blood-based biomarker for depression. It is simply absurd to present it as such. But why get upset?

  1. Publication of such low quality research, and high profile attempts to pass it off as strong evidence, damage the credibility of all evidence-based efforts to establish the efficacy of diagnostic tools and treatments. This study adds to the sense that much of what we read in the scientific journals, and what is echoed in the media, is simply exaggerated or outright false.
  2. Efforts to promote this article are particularly pernicious in suggesting that primary care physicians can make diagnoses of depression without careful interviewing of patients. The physicians do not need to talk to the patients; they can simply draw blood or give out questionnaires.
  3. Implicit in the promotion of their results as evidence for a blood test for depression is the assumption that depression is a biological phenomenon, strongly influenced by genetic expression, not the environment. Aside from being patently wrong and inconsistent with available evidence, this leads to an overreliance on biomedical treatments.
  4. Wide dissemination of the article and the press release’s claims serves to reinforce laypersons’ and clinicians’ belief in the validity of commercially available blood tests of dubious value. These tests can cost as much as $475 per administration, and there is no credible evidence, by IOM standards, that they perform better than simply talking to patients.

At the present time, there is no strong evidence that antidepressants are on average superior in their effects on typical primary care patients relative to, say, interpersonal psychotherapy (IPT). IPT assumes that regardless of how depression comes about, patient improvement can be achieved by understanding and renegotiating significant interpersonal relationships. All of the trash talk of these authors contradicts this evidence-based assumption. Namely, they are suggesting that we may soon be approaching an era where even the mild and moderate depression of primary care can be diagnosed and treated without talking to the patient. I say bollocks, and shame on the authors, who should know better.


The Top Eleven Ways to Tell that a Journal is Fake

I am delighted to offer Mind the Brain readers a guest blog written by my colleague, Eve Carlson, Ph.D.  Eve Carlson is a clinical psychologist and researcher with the National Center for PTSD and the U.S. Department of Veterans Affairs, VA Palo Alto Health Care System.  Her research focuses on assessment of trauma exposure and responses, and she has developed measures of PTSD, dissociation, trauma exposure, self-destructive behavior, affective lability, and risk for posttraumatic psychological disorder.  Her research has been funded by National Institute for Mental Health (U.S.) and the Dept. of Veterans Affairs (U.S.) and recognized by awards from the International Society for Traumatic Stress Studies and the International Society for the Study of Trauma and Dissociation.  Her publications include books on trauma assessment and trauma research methodology and numerous theoretical and research articles.  She has served as President and a member of the Board of Directors for the International Society for Traumatic Stress Studies and on the editorial boards of several journals.

 

The Top Eleven Ways to Tell that a Journal is Fake

Eve Carlson, Ph.D.

Past President, International Society for Traumatic Stress Studies

If you have ever published a scholarly paper, your email inbox is probably peppered with invitations to submit papers to new journals with plausible-sounding names. Many people dismiss these emails as spam, but with all one hears about the impending death of paper journals, who knows what is next in the wild, wild West of open access publishing? And if you venture to click on a link to the journal, you may well see a web page boasting about a journal editor who is a prominent name in your field, an editorial board that includes several luminaries, instructions for authors, and legitimate-looking articles. With the “publish or perish!” pressure still going strong, what’s an academic to do?

I recently stumbled into an “investigation” of a new, online, open access journal in the course of my service as a leader of a professional society. When I was president of an international professional society, a new journal with a name very similar to our Society’s journal – “Journal of XXX” – began soliciting submissions. The Society feared that the new journal, called “Journal of XXX Disorders and Treatment”, would be mistaken for an offshoot of the original. I saw the names of colleagues I knew on the editorial board, skimmed some of the opinion piece articles posted online, and assumed it was a new experiment in open access publishing. But when I contacted the colleagues and began asking questions, it quickly became apparent that this journal had no editor, editorial board members were acquired via spam emails to authors of published articles, the journal appeared to follow no standard publishing practices, and most editorial board members had observed irregularities that made them suspicious that the journal was not legitimate. Once informed of the problems observed and put in communication with one another, 16 of the 19 editorial board members resigned en masse.

 

Based on actual experiences looking into three questionable open access journals, you can tell a journal is fake when…

 

1)  Searching in the box marked “Search this journal” on the journal web page for the name of an author of an article in a recent issue of the journal does not return any hits.

 

2)  Clicking on a link like this medline on the journal web site leads to the spoof site www.Medline.com.

 

3)  No specific person is identified as the editor of the journal or the person who appears to be identified as the journal’s Editor on the web site says he is not the editor.

 

4)  Google Maps searches for the address of the journal show that its headquarters is a suburban bungalow.


 

5)  You cannot find articles from a bio-medical journal when you search PubMed.  [You can check by searching for the journal title here]

 

6)  The journal’s mission on its home page is described in vague, generic terms such as “To publish the most exciting research with respect to the subjects of XXXXXX.”

 

7)  When you call the local phone number for the journal office listed on the web page, any of these happen:  1. No one answers. 2. Someone answers “hello?” on what sounds like a cell phone and hangs up as soon as they hear you speaking.  3. The call is forwarded to the 800 phone bank for the publisher, and the person on the other end cannot tell you the name of the editor of the journal.

 

8)  PubMed Central refuses to accept content from a publisher’s bio-medical journals and DHHS sends a “cease and desist” letter to the publisher.

 

9)  The journal’s publisher posts online a legal notice warning a blogger who writes about the publisher that he is on a “perilous journey” and is exposing himself to “serious legal implications including criminal cases lunched (sic) again you in INDIA and USA” and directs him to pay the publisher $1 billion in damages. Check out the legal notice here.

 

10)  The journal issues and posts online certificates with hearts around the border that certify you as “the prestigious editorial board member of [name of journal here].”


 

11)  The journal posts “interviews” with members of its editorial board that appear to be electronic questionnaires with comical responses to interviewer questions such as:

[screenshots of two editorial board “interviews”]

CORRECTION: The site www.medline.com is real, not a spoof site.

 

 


Psychosocial care focuses too much on young, attractive patients successfully coping with cancer.

 The YAVIS bias reconsidered: Young, attractive, verbal, intelligent, successfully coping with cancer preferred.

For related slide presentations, see

Rethinking, rebuilding psychosocial care for cancer patients.

Why screening cancer patients for distress will increase disparities in psychosocial services.

William Schofield’s provocative book, Psychotherapy: The Purchase of Friendship, was written while I was in high school, but it was still being debated in bars and smoke-filled rooms when I was in graduate school. It continued to be discussed in my seminars when I was an Assistant Professor at the University of California, Berkeley.

Fifty years after its publication, Schofield’s book is a bit dated and probably not discussed much in the training of mental health professionals just entering the field. But the book has unrecognized relevance to understanding inequalities or social disparities in the psychosocial care for cancer patients. And the ideas of the book might be used to generate some caution about what to expect in efforts to reduce the considerable gap between the proportion of cancer patients who report heightened psychological distress and the minority who actually get psychosocial services.

Basically,

  • Young, early-stage breast cancer patients are overrepresented among cancer patients who use psychosocial services. They are very receptive to psychological counseling, and their distress tends to diminish over time, regardless of whether they receive counseling.
  • Most cancer patients who are psychologically distressed do not receive psychological services. That is considered a problem by organizations promoting routine screening of cancer patients for distress. Screening is considered the solution because it is presumed to increase detection of distress and referrals.
  • But it would be a big mistake to assume that the cancer patients who are distressed but not receiving services are just like the young, early-stage breast cancer patients who predominate among service users. Most probably, these patients would not be as interested in counseling. Many have complicated medical comorbidities and social problems that require something else.

Closing the gap between proportion of cancer patients who are distressed and those who receive services is probably not going to involve a lot of mental health counselors or therapists.

We need to better respect and reward the unsung heroes of psychosocial care for cancer patients.

These are the points that I am going to address, and if that is all that interests you, you can stop here. But I encourage you to read on and learn about the YAVIS bias; similar biases in the portrayal of cancer patients in the media; and the difficulties in dealing with the meetable but unmet needs of cancer patients who do not fit the stereotype of young, early-stage breast cancer patients. And just who are the unsung heroes of psychosocial care?

Ideal psychotherapy patients: Young, attractive, verbal, intelligent, and successful (YAVIS)

YAVIS is an acronym for young, attractive, verbal, intelligent, and successful. Schofield’s book was more philosophical than empirical, but he did cite his study of over 300 mental health professionals in which he asked about “the kind of patient with whom you feel you are efficient and effective in your therapy” (p. 130). Most preferred married women, aged 20-40, with at least some college and a professional/managerial occupation. On this basis, Schofield railed

What is there in the general theory of psychodynamics or psychotherapy to suggest that the neurosis of a 50 year-old commercial fisherman with an eighth-grade education will be more resistant to psychological help than a symptomatically comparable 35 year-old, college-trained artist?

It seems likely that there are pressures toward a systematic selection of patients, pressures that are perhaps subtle and unconscious in part and that, in part, reflect theoretical biases common to all psychotherapists. These selective forces tend to restrict the efforts of the bulk of social workers, psychologists, and psychiatrists to clients who present the “YAVIS” syndrome — clients who are youthful, attractive, verbal, intelligent, and successful.

The Purchase of Friendship and the notion of a YAVIS bias were much more popular with grad students and junior faculty than with the senior faculty who correctly recognized an attack on their way of doing things. George von Hilsheimer got it right in noting that

In 1966 it was already a cliché that the patients who did best in psychotherapy were those who did not need it. The YAVIS criterion was an inside joke. Young, attractive, vital, intelligent, successful individuals benefit best from psychotherapy. In other words, the patients we work best with are the ones who need us least.*

Ouch! Sarcastic, yes, but von Hilsheimer was stumbling upon a finding in health services research so robust that it is considered a law, “the inverse care law.”

The availability of good medical care tends to vary inversely with the need for it in the population served.

Decades of efforts to address social inequalities in access to mental health treatment, as well as recent mental health parity and health insurance reform have increased access to psychotherapy. But inequalities still persist, associated with being older, being a member of a minority group, and having low income and low educational attainment.

In the US, anyone who tries to find mental health treatment in the community for a medical patient quickly discovers that many private practice clinical psychologists and psychiatrists reject low income patients who have only Medicaid or Medicare insurance. They prefer patients wealthy enough to pay out-of-pocket, and many do not make exceptions. Often the only option for non-YAVIS patients is a public community mental health center. Patients who are referred there can be left feeling quite discouraged and disrespected.

And even in countries with more universally available health insurance and less inequality in income, like Scandinavia and the Netherlands, the inverse care law still holds.

A YAVIS bias in the psychosocial care of cancer patients.

Younger, early-stage breast cancer patients disproportionately receive psychosocial services. Because of early detection of their cancer and modest increases in the effectiveness of treatments, these women tend to have excellent prognoses and are increasingly unlikely to die from their condition. They tend to have heightened emotional distress because cancer represents more of a threat to age-related goals, like raising small children and remaining employed.

Fortunately, much of this distress resolves within a few months of diagnosis. Six months after their diagnosis, women with early-stage breast cancer overall have less distress than women of a comparable age without cancer selected from a primary care waiting room. The chart below comes from a study of over 500 breast cancer patients, almost all with early-stage disease. The study replicates other findings that from a third to a half of breast cancer patients initially have clinically significant levels of psychological distress. But it goes beyond the initial adjustment to breast cancer and traces the course of that distress, breaking it down into groups identified by cluster analysis of their trajectories of distress.

[Chart: trajectories of distress among breast cancer patients]

One implication of this chart is that even among the minority of breast cancer patients who are distressed after diagnosis, the trajectory of their distress is such that it will be difficult to show that psychological intervention has any advantage over simply remaining in routine care. Think of these trajectories as representing the control groups in randomized trials of psychological interventions for distress among cancer patients.

The YAVIS bias in media portrayal of cancer patients.

Most cancers are detected after age 65. It is simply not true that the typical cancer patient is a young, attractive woman with young children. But that is the typical portrayal of cancer in the media. A study of the portrayal of breast cancer in popular magazines found

In 84% of vignettes (144 of 172), women were diagnosed before 50 years of age; in 47% (80 of 172), women were diagnosed before 40 years of age. On the basis of the age-specific incidence of breast cancer in the United States, the expected percentages would be 16% and 3.6%, respectively.

The vignettes included 25 women who were reported on primarily because they were celebrities or otherwise newsworthy. For example, several articles discussed a group of breast cancer survivors who climbed Mount Acongagua in South America in 1995.28–30 The ages of women in the newsworthy vignettes ranged from 21 years (a former Miss Oklahoma31) to 66 years (Virginia Kelley, mother of former U.S. president Bill Clinton32), and the mean age did not significantly differ from that in the remaining vignettes (42 and 41 years, respectively).

From Burke et al., 2001

The table to the left contrasts portrayals of cancer in the media with actual risk of cancer data derived from the National Cancer Institute.

Portraying breast cancer as threatening the childbearing and sexuality of young women has proven a very effective fundraising and political strategy. But it has also had adverse effects on the direction of research funds and specialized services toward breast cancer patients versus other cancers or other diseases. Somewhat dated statistics that probably still hold:

The National Cancer Institute (NCI) devoted $572.4 million researching breast cancer in 2007. Other National Institutes of Health (NIH) funding for breast cancer boosted the total spent on the disease to $705 million. Plus, the Department of Defense operates its own breast cancer research outfit at a cost of another $138 million in fiscal 2008.

By way of comparison, in 2007 the NCI spent $226.9 million studying lung cancer, the leading cancer killer in the U.S., and $73.3 million studying pancreatic cancer, which kills nearly as many patients as breast cancer, usually within a year of diagnosis. Cardiovascular disease, the biggest killer of both men and women, received $381 million.

As NBC News succinctly put it

In the world of cancer charities and government funding, breast cancer is queen.

The YAVIS dominate clinical trials of interventions designed to reduce psychological distress among cancer patients.

Younger, early-stage breast cancer patients predominate in intervention trials of psychosocial intervention for distress.

When researchers recruit patients for randomized trials evaluating interventions for psychological distress, they do not usually specify that patients have to be distressed as an inclusion criterion. Only a small minority of studies actually require that patients have some minimal level of distress.

As a result, studies typically end up with samples of women with early-stage breast cancer who, as a group, have insufficient distress to register an improvement. From the trials I have looked at carefully, I estimate that only about a third or so of the women enrolled are distressed.

Psychosocial studies of cancer are often biased toward younger patients, with the modal patient recruited to intervention trials being a young woman with early-stage breast cancer, high functioning, with a good prognosis, and with a level of psychological distress so low that it creates a floor effect for identifying any efficacy.

This creates an odd situation: it is difficult to demonstrate in a controlled study that interventions have clinically significant effects on the distress levels of typical cancer patients. The typical study does not show an effect, but illusions can be created by

  • Concentrating on subgroup analyses, although if only the subgroup of distressed patients is considered, the analyses will be underpowered.
  • Selecting outcome measures after results are known and focusing on the one with the best outcomes.
  • Other flexible rules of design and analysis.

A number of psychosocial intervention trials for cancer patients widely cited as positive actually had null findings, but were well spun to appear positive and be cited as such. I rail about this often, and will not get into it here, except to cite two prime examples [1, 2] which I have discussed elsewhere [3, 4].

Why are early breast cancer patients drawn to psychological counseling treatments if they are not distressed? For the same reason that they are disproportionately drawn to support groups. They are probably not there to reduce distress, but to get support and an opportunity to talk about their situation. Such opportunities are increasingly scarce in cancer care driven by the need for billable procedures and time-efficiency. If patients want to talk to someone on a regular basis, they have to go into counseling or to a support group, even if they are not primarily seeking reduction of distress.

The distress of early breast cancer patients is quite genuine. Cancer poses real challenges to completion of life stage tasks that are very important to them. Their being diagnosed with cancer threatens their spouses and children. But they mostly soon learn these are manageable issues.

Breast cancer patients versus lung cancer patients

We have to be careful about making comparisons between patients with different kinds of cancers, and about implying there is something devious and unfair in patients with one type of cancer getting services that patients with another type do not. But there are striking disparities between breast cancer and lung cancer.

  • Lung cancer causes more deaths than the next three most common cancers combined – colon, breast, and pancreatic.
  • In the late 1980s, lung cancer surpassed breast cancer to become the leading cause of cancer deaths in women.
  • Lung cancer patients receive almost none of the attention that breast cancer patients get in terms of their psychosocial needs.
  • Lung cancer patients have among the highest levels of psychological distress, yet lung cancer patients are the lowest utilizers of support groups and similar services.

When is the last time you saw mobs of people running through a park seeking a cure for lung cancer?

As I have mentioned before, the prevailing notion in the psycho-oncology literature is that we need to screen distressed cancer patients not receiving psychological services, show them their scores on the distress thermometer, and convince them to accept a referral. Simple as that.  But we need to be careful about generalizing to lung cancer patients from the breast cancer patients who were more likely to receive services. One study found that

  • 57% of distressed lung cancer patients who did not access mental health services did not perceive the need for help.
  • The most prevalent patient-reported barrier to mental health service use is wanting to manage emotional concerns independently (58%).
  • 75% of patients preferred talking to a primary care physician if they were to have an emotional concern.
  • 42% had received mental health services prior to their lung cancer diagnosis.
  • 33% reported that they had received mental health services since their lung cancer diagnosis.
  • Only a few patients without a history of mental health service use accessed these services following the lung cancer diagnosis.

So, most lung cancer patients do not think of their predicament as a mental health issue, unless they have a previous history of using mental health services.

Older cancer patients

Older cancer patients actually experience less distress than younger patients.

But when older patients are distressed, their emotional state is more likely to reflect physical comorbidities and limitations in their functioning. Distress among older cancer patients is also more likely to represent psychiatric disorder, for which they will have long-term risk of relapse and recurrence. And older patients pose more difficulties in obtaining the kind of monitoring of treatment and follow-up that any depressive or anxiety disorder they may have requires.

We need to keep in mind that most diagnosable psychiatric conditions are chronic and recurring, with a first episode in adolescence or early adulthood. By the time older patients get diagnosed with cancer, any risk for depression or anxiety disorders has probably already expressed itself, and there have probably been some episodes of treatment.

But having a physical condition like cancer means that subsequent episodes of depression or anxiety are going to be more frequent and last longer. The issue is not initial detection, but arranging continued care and follow-up in general medical care, where routine care is quite inadequate. So, why concentrate our efforts on detection and referral, if it only means that more patients will get inadequate care in the community?

Older patients experience their cancer in the context of other physical conditions and limits on their physical functioning. These problems may get noticed in repeated visits to a cancer center for treatment of their cancer, but are difficult to treat in such settings. And yet, being diagnosed with cancer and being treated in a specialized cancer center often cuts older patients off from their primary medical care and reprioritizes their goals as dealing with their cancer.

In health services research terms, a diagnosis of cancer often transfers patients out of their usual healthcare and into specialized cancer care, where the goals of medical care are reorganized.


Addressing other physical conditions requires complex care coordination. The figure to the left represents the kind of complex care coordination that a 71-year-old patient actually needed. The lines represent telephone calls, emails, and patient visits. But if you want to see a more dramatic representation of this process, click on this link, where you will see an animation of how this communication evolved over 80 days.

Older patients tend to be poorer than younger patients, because of inadequacies in retirement savings and pensions. Many older patients grew up in an economic system that did not expect them to live so long. As a result of being poorer and old, they are likely to have complex needs for social services, but also to have more difficulties accessing these services.

Again, while these problems are likely to be noticed in repeated visits at a cancer center, this is not where these needs can most readily be met. We are back to the issue of complex care coordination, unless we are going to ignore these issues.

The medical and social needs of older patients are not readily resolved in time-limited sessions of counseling. And these patients are probably less interested in discussing the emotional experience of having such problems than in getting them resolved.

But psychological counseling is what is being advocated by those who promote routine screening and referral. The assumption is that screening will identify needs that can be met with counseling. For instance, the European Partnership Action Against Cancer Policy Statement on Multidisciplinary Cancer Care states

In addressing other care objectives, patients should always have ready access to counselling for psychosocial support; patient distress is particularly important and should be screened for from diagnosis onwards.

And from one of the leading advocates of screening:

Interventions usually assume one of four common forms: psychoeducation, cognitive-behavioural training (group or individual), group supportive therapy, and individual supportive therapy.

The unsung heroes of psychosocial care for cancer patients

Resolution of the needs of non-YAVIS cancer patients requires discussion, negotiation, and follow-up, often with multiple frustrations in dealing with bureaucracies and with medical care and social services outside the immediate cancer setting.

These tasks are left to social workers and nurses, who cannot bill for them as counseling. Consequently, in cash-conscious, profit-centered cancer care, the activity of talking to patients, and those who provide the talk, are devalued. Think of it:

A social worker or nurse who accurately records the three hours spent arranging home visits and Meals on Wheels for a housebound, inarticulate, older, widowed cancer patient

Versus

A similar amount of time spent by the same professional delivering mindfulness therapy to well-insured patients who were quite able to pay out of pocket anyway.

Who is more important? Who is most at risk when cutbacks in staffing and funding are considered?

*Quoted here.


Reanalysis: No health benefits found for pursuing meaning in life versus pleasure

NOTE: After I wrote this blog post, I received via PNAS the reply from Steve Cole and Barbara Fredrickson to our article. I did not have time to thoroughly digest it, but will address it in a future blog post. My preliminary impression is that their reply is, ah…a piece of work. For a start, they attack our mechanical bitmapping of their data as an unvalidated statistical procedure. But calling it a statistical procedure is like Sarah Palin calling Africa a country. And they again assert the validity of their scoring of a self-report questionnaire without documentation. As seen below, I had already offered to donate $100 to charity if they can produce the unpublished analyses that justified this idiosyncratic scoring. The offer stands. They claim that our factor analyses were inappropriate because the sample size was too small, but we used their data, which they claimed to have factor analyzed. Geesh. But more on their reply later.

Our new PNAS article questions the reliability of results and interpretations in a high profile previous PNAS article.

Fredrickson, Barbara L., Karen M. Grewen, Kimberly A. Coffey, Sara B. Algoe, Ann M. Firestine, Jesusa MG Arevalo, Jeffrey Ma, and Steven W. Cole. “A functional genomic perspective on human well-being.” Proceedings of the National Academy of Sciences 110, no. 33 (2013):   13684-13689.

 

From the Oakland Journal: http://theoaklandjournal.com/oaklandnj/health-happiness-vs-meaning/ (http://tinyurl.com/lpbqqn6)

Was the original article a matter of “science” made for press release? Our article poses issues concerning the gullibility of the scientific community and journalists regarding claims of breakthrough discoveries from small studies with provocative, but fuzzy theorizing and complicated methodologies and statistical analyses that apparently even the authors themselves do not understand.

  • Multiple analyses of the original data do not find separate factors indicating striving for pleasure versus purpose
  • Random number generators yield the best predictors of gene expression from the original data

[Warning, numbers ahead. This blog post contains some excerpts from the results section that contain lots of numbers and require some sophistication to interpret. I encourage readers to at least skim these sections, to allow independent evaluation of some of the things that I will say in the rest of the blog.]

A well-orchestrated media blitz for the PNAS article had triggered my skepticism. The Economist, CNN, The Atlantic Monthly and countless newspapers seemingly sang praise in unison for the significance of the article.

Maybe the research reported in PNAS was, as one of the authors, Barbara Fredrickson, claimed, a major breakthrough in behavioral genomics, a science-based solution to the age-old philosophical problem of how to lead one’s life. Or, as she later claimed in a July 2014 talk in Amsterdam, maybe the PNAS article provided an objective basis for moral philosophy.

Maybe it showed

People who are happy but have little to no sense of meaning in their lives—proverbially, simply here for the party—have the same gene expression patterns as people who are responding to and enduring chronic adversity.

Skeptical? Maybe you are paying too much attention to your conscious mind. What does it know? According to author Steve Cole

What this study tells us is that doing good and feeling good have very different effects on the human genome, even though they generate similar levels of positive emotion… “Apparently, the human genome is much more sensitive to different ways of achieving happiness than are conscious minds.”

Or maybe this PNAS article was an exceptional example of the kind of nonsense, pure bunk, you can find in a prestigious journal.

Assembling a Team.

I blogged about the PNAS article. People whom I have yet to meet expressed concerns similar to mine. We began collaborating, overcoming considerable differences in personal style but taking advantage of complementary skills and background.

It all started with a very tentative email exchange with Nick Brown. He brought on his co-author from his American Psychologist article demolishing the credibility of a precise positivity ratio, Harris Friedman. Harris in turn brought on Doug McDonald to examine Fredrickson and Cole’s claims that factor analysis supported their clean distinction between two forms of well-being with opposite effects on health.

Manoj Samanta found us by way of my blog post and then a Google search that took him to Nick and Harris’ article with Alan Sokal. Manoj cited my post in his own blog. When Nick saw it, he contacted him. Manoj was working in genomics, attempting to map the common genomic basis for the evolution of electric organs in fish from around the world, but he was a physicist in recovery. He was delighted to work with a couple of guys who had co-authored a paper with his hero from grad school, Alan Sokal. Manoj interpreted Fredrickson and Cole’s seemingly unnecessarily complicated approach to genomic analysis. Nick set off to deconstruct and reproduce Cole’s regression analyses predicting genomic expression. He discovered that Cole’s procedure generated statistically significant (but meaningless) results from over two-thirds of the thousands of ways of splitting the psychometric data. Even using random numbers produced huge numbers of junk results.
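
Nick’s finding is an instance of a general phenomenon that is easy to demonstrate. The toy simulation below is mine, not his actual reanalysis: regress thousands of pure-noise “gene expression” values on a random “psychometric” predictor, and at p < .05 you will reliably manufacture hundreds of “significant” associations.

```python
# Toy demonstration (not Nick's actual reanalysis) of how a random predictor
# manufactures "significant" gene-expression results under mass testing.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_subjects, n_genes = 80, 5000

expression = rng.normal(size=(n_subjects, n_genes))  # pure-noise "expression"
predictor = rng.normal(size=n_subjects)              # random "psychometric" score

p_values = np.array([
    stats.linregress(predictor, expression[:, g]).pvalue
    for g in range(n_genes)
])

hits = (p_values < 0.05).sum()
print(f"{hits} of {n_genes} genes 'significant' at p < .05")  # expect ~250 (5%)
```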

The final group was Nick, Doug, Manoj, Harris, and myself. Others came and went from our email exchanges, some accepting our acknowledgment in the paper, while others asked us explicitly not to acknowledge them.

The team gave an extraordinarily careful look at the article, noting its fuzzy theorizing and conceptual deficiencies, but we did much more than that. We obtained the original data and asked the authors of the original paper about their complex analytic methods. We then reanalyzed the data, following their specific advice. We tried alternative analyses and even re-did the same analyses with randomly generated data. Overall, our hastily assembled group performed and interpreted 1000s of analyses, more than many productive labs do in a year.

The embargo on our paper in PNAS is now off.

I can report our conclusion that

Not only is Fredrickson et al.’s article conceptually deficient, but more crucially statistical analyses are fatally flawed, to the point that their claimed results are in fact essentially meaningless.

A summary of our PNAS article is available here and the final draft is here.

Fuzzy thinking creates theoretical and general methodological problems

Fredrickson et al. claimed that two types of strivings for well-being, eudaimonic and hedonic, have distinct and opposite effects on physical health, by way of “molecular signaling pathways” or genomic expression, despite an unusually high correlation for two supposedly different variables. I had challenged the authors about the validity of their analyses in my earlier blog post and then in a letter to PNAS, but got blown off. Their reply dismissed my concerns, citing analyses that they have never shown, either in the original article or in the reply.

In our article, we noted a subtlety in the distinction between eudaimonia and hedonia.

Eudaimonic well-being, generally defined (including by Fredrickson et al.) in terms of tendencies to strive for meaning, appears to be trait-like, since such striving for meaning is typically an ongoing life strategy.

Hedonic well-being, in contrast, is typically defined in terms of a person’s (recent) affective experiences, and is state-like; regardless of the level of meaning in one’s life, everyone experiences “good” and “bad” days.

The problem is

If well-being is a state, then a person’s level of well-being will change over time and perhaps at a very fast rate.  If we only measure well-being at one time point, as Fredrickson et al. did, then unless we obtain a genetic sample at the same time, the likelihood that the well-being score will actually accurately reflect level of genomic expression will be diminished if not eliminated.

In an interview with David Dobbs, Steven Cole seems to suggest an irreversibility to the changes that eudaimonic and hedonic strivings produce:

“Your experiences today will influence the molecular composition of your body for the next two to three months,” he tells his audience, “or, perhaps, for the rest of your life. Plan your day accordingly.”

Hmm. Really? Evidence?

Eudaimonic and hedonic well-being constructs may have a long history in philosophy, but empirically separating them is an unsolved problem. And taken together, the two constructs by no means capture the complexity of well-being.

Is a scientifically adequate taxonomy of well-being on which to do research even possible? Maybe, but doubts are raised when one considers the overcrowded field of well-being concepts available in the literature—

General well-being, subjective well-being, psychological well-being, ontological well-being, spiritual well-being, religious well-being, existential well-being, chaironic well-being, emotional well-being, and physical well-being—along with the various constructs which are treated as essentially synonymous with well-being, such as self-esteem, life-satisfaction, and, lest we forget, happiness.

No one seems to be paying attention to this confusing proliferation of similar constructs and how they are supposed to relate to each other. But in the realm of negative emotion, the problem is well known and variously referred to as the “big mush” or “crud factor”. Actually, there is a good deal of difficulty separating out positive well-being concepts from their obverse concepts, negative well-being.

Fredrickson and colleagues found that eudaimonic and especially hedonic well-being were strongly, but negatively, related to depression. Their measure of depression qualified as a covariate or confound for their analyses, but somehow disappeared from further consideration. If it had been retained, it would have further reduced the analyses to gobbledygook. Technically speaking, the residual of hedonia-controlling-for (highly correlated)-eudaimonia-and-depression does not even have a family resemblance to hedonia and is probably nonsense.

Fredrickson et al. measured well-being with what they called the Short Flourishing Scale, better known in the literature as the Mental Health Continuum-Short Form (MHC-SF).

We looked and we were not able to identify any published evidence of a two factor solution in which distinct eudaimonic and hedonic well-being factors adequately characterized MHC-SF data.

The closest thing we could find was

Keyes et al. (10) referred to these groupings of hedonic and eudaimonic items as “clusters,” an ostensibly neutral term that seems to deliberately avoid the word “factor.”

However, Keyes' split of the MHC-SF items into hedonic and eudaimonic categories appears to have been made mainly to allow arbitrary classification of persons as "languishing" versus "flourishing." Yup, positive psychology is now replacing the stigma of conventional psychology's deficiency model of depressed versus not depressed with a strength model of languishing versus flourishing.

In contrast to the rest of the MHC-SF literature, Fredrickson et al. referred to a factor analysis – implicitly in their original PNAS paper, and then explicitly in reply to my PNAS letter – yielding two distinct factors ("Hedonic" and "Eudaimonic"), corresponding to Keyes' languishing versus flourishing diagnoses (i.e., items SF1–SF3 for Hedonic and SF4–SF14 for Eudaimonic).

The data from Fredrickson et al. were mostly in the public domain. After getting further psychometric data from Fredrickson's lab, we set off on a thorough reanalysis that should have revealed whatever basis for their claims there might be.

In exploratory factor analyses, which we ran using different extraction (e.g., principal axis, maximum likelihood) and rotation (orthogonal, oblique) methods, we found two factors with eigenvalues greater than 1, and all items produced a loading of at least .50 on one of the factors.
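For readers who want to try this themselves, here is a minimal sketch of that kind of exploratory factor analysis in Python, using the factor_analyzer package. The CSV file and its SF1–SF14 columns are hypothetical stand-ins for the MHC-SF data, and the 'minres' extraction method stands in for principal-axis factoring, which it closely approximates.

```python
# Sketch of an exploratory factor analysis of the 14 MHC-SF items,
# assuming a CSV with columns SF1..SF14 (hypothetical file/column names).
import pandas as pd
from factor_analyzer import FactorAnalyzer

items = pd.read_csv("mhc_sf_items.csv")   # one row per respondent

# How many eigenvalues of the correlation matrix exceed 1?
fa = FactorAnalyzer(rotation=None)
fa.fit(items)
eigenvalues, _ = fa.get_eigenvalues()
print("eigenvalues > 1:", (eigenvalues > 1).sum())

# Two-factor solutions under different extraction and rotation methods
for method in ("minres", "ml"):             # ~principal axis vs. maximum likelihood
    for rotation in ("varimax", "oblimin"):  # orthogonal vs. oblique
        fa = FactorAnalyzer(n_factors=2, method=method, rotation=rotation)
        fa.fit(items)
        loadings = pd.DataFrame(fa.loadings_, index=items.columns,
                                columns=["F1", "F2"])
        print(method, rotation)
        print(loadings.round(2))
```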

That’s lots of analyses, but results were consistent:

Examination of factor loading coefficients consistently showed that the first factor was comprised of elevated loadings from 11 items (SF1, SF2, SF3, SF4, SF5, SF9, SF10, SF11, SF12, SF13, and SF14), while the second factor housed high loadings from 3 items (SF6, SF7, and SF8).

If this is the factor structure Fredrickson and colleagues claim, eudaimonic well-being would have to be the last three items. But look at them in the figure on the left and particularly look at the qualification below. The items seem to reflect living in a particular kind of environment that is safe and supportive of people like the respondent. Actually, these results seem to lend support to my complaint that positive psychology is mainly for rich people: to flourish, one must live in a special environment. If you languish, it is your fault.

Okay, we did not find much support for the claims of Fredrickson and colleagues, but we gave them another chance with a confirmatory factor analysis (CFA). With this analysis, we would not be looking for the best solution, only learning whether either a one- or a two-factor model is defensible.

For the one-factor model, goodness-of-fit statistics indicated grossly inadequate fit (χ2 = 227.64, df = 77, GFI = .73, CFI = .83, RMSEA = .154).  Although the equivalent statistics for the correlated two-factor model were slightly better, they still came out as poor (χ2 = 189.40, df = 76, GFI = .78, CFI = .87, RMSEA = .135).
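Readers can check the RMSEA values from the reported chi-squares and degrees of freedom using the standard formula RMSEA = sqrt((χ² − df) / (df·(N − 1))). A minimal sketch, assuming N is the roughly 80 participants of the original study; the small discrepancies from the reported values presumably reflect the exact N used:

```python
# Recompute RMSEA from the reported chi-square and df,
# assuming N = 80 (approximately the original study's sample size).
from math import sqrt

def rmsea(chi2, df, n):
    # RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1)))
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

print(round(rmsea(227.64, 77, 80), 3))  # one-factor model: ~0.157 (reported .154)
print(round(rmsea(189.40, 76, 80), 3))  # two-factor model: ~0.137 (reported .135)
```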

Thus, even though our findings tended to support the view that well-being is best represented as at least a two-dimensional construct, we did not confirm Fredrickson et al.'s claim (6) that the MHC-SF produces two factors conforming to hedonic and eudaimonic well-being.

Hey Houston, we’ve got a problem.

As Ryff and Singer (15) put it, “Lacking evidence of scale validity and reliability, subsequent work is pointless” (p. 276).

Maybe we should have thrown in the towel. But if Fredrickson and colleagues could nonetheless proceed to multivariate analyses relating the self-report data to genomic expression, we decided that we would follow the same path.

Relating self-report data to genomic expression: Random can be better

Fredrickson et al.'s analytic approach to genomic expression seemed unnecessarily complicated. They repeated regression analyses 53 times (which we came to call RR53), regressing each of 53 genes of interest on eudaimonic and hedonic well-being and a full range of confounding/control variables. Recall that they had only 80 participants. This approach leaves lots of room for capitalizing on chance.
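To make the procedure concrete, here is a minimal sketch of an RR53-style loop in Python with statsmodels. The file name, column names, and covariates are hypothetical stand-ins, not the authors' actual code.

```python
# Sketch of the RR53-style procedure: one regression per gene, each
# regressing that gene's expression on the two well-being scores plus
# control variables. All file and column names here are hypothetical.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("fredrickson_data.csv")   # 80 rows in the original study
gene_cols = [c for c in df.columns if c.startswith("CTRA_")]  # 53 genes
# covariates assumed already numerically coded
X = sm.add_constant(df[["hedonic", "eudaimonic", "age", "sex", "bmi"]])

coefs = {}
for gene in gene_cols:                     # 53 separate regressions
    fit = sm.OLS(df[gene], X).fit()
    coefs[gene] = fit.params[["hedonic", "eudaimonic"]]

coefs = pd.DataFrame(coefs).T   # 53 pairs of coefficients to mine for "effects"
```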

So, why not simply regress

the scores for hedonic and eudaimonic well-being on the average expression of the 53 genes of interest, after changing the sign of the values of those genes that were expected to be down-regulated. [?]

After all, the authors had said

[T]he goal of this study is to test associations between eudaimonic and hedonic well-being and average levels of expression of specific sets of genes” (p. 1)

We started with our simpler approach.

We conducted a number of such regressions, using different methods of evaluating the “average level of expression” of the 53 CTRA genes of interest (e.g., taking the mean of their raw values, or the mean of their z-scores), but in all cases the model ANOVA was not statistically significant.
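For contrast, here is a sketch of that simpler "naive" approach, continuing the hypothetical names from the sketch above; the choice of which genes to sign-flip is purely illustrative.

```python
# Sketch of the simpler test: regress each well-being score on the average
# expression of the 53 genes, after flipping the sign of those genes
# expected to be down-regulated. Names continue the hypothetical ones above.
import pandas as pd
import statsmodels.api as sm

signs = pd.Series(1.0, index=gene_cols)
signs.iloc[34:] = -1.0   # hypothetical: treat these genes as down-regulated

df["avg_expression"] = (df[gene_cols] * signs).mean(axis=1)
for outcome in ("hedonic", "eudaimonic"):
    fit = sm.OLS(df[outcome], sm.add_constant(df[["avg_expression"]])).fit()
    print(outcome, "overall model p =", round(fit.f_pvalue, 3))
```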

Undaunted, we next applied the RR53 regression procedure to see whether it could, in contrast to our simpler “naive” approach, yield such highly significant results with the factors we had derived.

You can read the more technical description of our procedures in our article and its supplementary materials, but our results were

The t-tests for the regression coefficients corresponding to the predictor variables of interest, namely hedonic and eudaimonic well-being, were almost all non-significant (p > .05 in 104 out of 106 cases; mean p = .567, SD = 0.251), and in the two remaining cases (gene FOSL1, for both “hedonic,” p = .047, and “eudaimonic,” p = .030), the overall model ANOVA was not statistically significant (p = .146).

We felt that drawing any substantive conclusions from these coefficients would be inappropriate.

Nonetheless, we continued….

We…created two new variables, which we named PWB (corresponding to items SF1–SF5 and SF9–SF14) and EPSE (corresponding to items SF6–SF8).  When we applied Fredrickson et al.’s regression procedure using these variables as the two principal predictor variables of interest (replacing the Hedonic and Eudaimonic factor variables), we discovered that the “effects” of this factor pair were about twice as high as those for the Hedonic and Eudaimonic pair (PWB: up-regulation by 13.6%, p < .001; EPSE: down-regulation by 18.0%, p < .001; see Figures 3 and 4 in the Supporting Information).

Wow, if we accept statistical significance over all other considerations, we actually did better than Fredrickson et al.

Taken seriously, this suggests that the participants' genes are not only expressing "molecular well-being" but also, even more vigorously, some other response that we presume Fredrickson et al. might call "molecular social evaluation."

Or we might conclude that living in a particular kind of environment is good for your genomic expression.

But we were skeptical about whether we could give substantive interpretations of any kind and so we went wild, using the RR53 procedure with every possible way of splitting up the self-report data. Yup, that is a lot of analyses.
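The enumeration itself is simple: splitting 14 items into two non-empty groups gives 2^13 − 1 = 8,191 distinct combinations once mirror images are excluded. A sketch, with the RR53 regressions left as a hypothetical wrapper function:

```python
# Enumerate every split of the 14 MHC-SF items into two non-empty
# "pseudo-factors". Fixing SF1 in group A avoids double-counting
# mirror-image splits, giving 2**13 - 1 = 8,191 distinct combinations.
items = [f"SF{i}" for i in range(1, 15)]

splits = []
for mask in range(2**13 - 1):   # the excluded all-ones mask would empty group B
    group_a = [items[0]] + [items[i + 1] for i in range(13) if mask >> i & 1]
    group_b = [it for it in items if it not in group_a]
    splits.append((group_a, group_b))

assert len(splits) == 8191
# for a, b in splits:
#     run_rr53(a, b)   # hypothetical wrapper around the RR53 regressions
```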

Excluding duplicates due to symmetry, there are 8,191 possible such combinations.  Of these, we found that 5,670 (69.2%) gave statistically significant results using the method described on pp. 1–2 of Fredrickson et al.’s Supporting Information (7) (i.e., the t-tests of the fold differences corresponding to the two elements of the pair of pseudo-factors were both significant at the .05 level), with 3,680 of these combinations (44.9% of the total) having both components significant at the .001 level.

Furthermore, 5,566 combinations (68.0%) generated statistically significant pairs of fold difference values that were greater in magnitude than Fredrickson et al.’s (6, figure 2A) Hedonic and Eudaimonic factors.

While one possible explanation of these results is that differential gene expression is associated with almost any factor combination of the psychometric data, with the study participants’ genes giving simultaneous “molecular expression” to several thousand factors which psychologists have not yet identified, we suspected that there might be a more parsimonious explanation.

But we did not stop there. Bring on the random number generator.

As a further test of the validity of the RR53 procedure, we replaced Fredrickson et al.'s psychometric data (6) with random numbers (i.e., every item/respondent cell was replaced by a random integer in the range 0–5) and re-ran the R program.  We did this in two different ways.

First, we replaced the psychometric data with normally-distributed random numbers, such that the item-level means and standard deviations were close to the equivalent values for the original data.  With these pseudo-data, 3,620 combinations of pseudo-factors (44.2%) gave a pair of fold difference values having t-tests significantly different from zero at the .05 level; of these, 1,478 (18.0% of the total) were both statistically significant at the .001 level.  (We note that, assuming independence of up- and down-regulation of genes, the probability of the latter result occurring by chance with random psychometric data if the RR53 regression procedure does indeed identify differential gene expression as a function of psychometric factors, ought to be—literally—one in a million, i.e. 0.001², rather than somewhere between one in five and one in six.)

Second, we used uniformly-distributed random numbers (i.e., all "responses" were equally likely to appear for any given item and respondent).  With these "white noise" data, we found that 2,874 combinations of pseudo-factors (35.1%) gave a pair of fold difference values having t-tests statistically significantly different from zero at the .05 level, of which 893 (10.9% of the total) were both significant at the .001 level.

Finally, we re-ran the program once more, using the same uniformly distributed random numbers, but this time excluding the demographic data and control genes; thus, the only non-random elements supplied to the RR53 procedure were the expression values of the 53 CTRA genes.  Despite the total lack of any information with which to correlate these gene expression values, the procedure generated 2,540 combinations of pseudo-factors (31.0%) with a pair of fold difference values having t-tests statistically significantly different from zero at the .05 level, of which 235 (2.9% of the total) were both significant at the .001 level.

Thus, in all cases, we obtained far more statistically significant results using Fredrickson et al.'s methods (6) than would be predicted by chance alone for truly independent variables (i.e., .05² × 8191 ≈ 20), even when the psychometric data were replaced by meaningless random numbers.  To try to identify the source of these puzzling results, we ran simple bivariate correlations on the gene expression variables, which revealed moderate to strong correlations between many of them, suggesting that our significant results were mainly the product of shared variance across criterion variables.

We therefore went back to the original psychometric data, and "scrambled" the CTRA gene expression data, reassigning each cell value for a given gene to a participant selected at random, thus minimizing any within-participants correlation between these values.  When we re-ran the regressions with these data, the number of statistically significant results dropped to just 44 (0.54%).
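Both diagnostics are easy to reproduce in outline. A sketch, again continuing the hypothetical names from the earlier sketches:

```python
# Sketch of the two diagnostics: (1) replace every psychometric response
# with a random integer 0-5, and (2) scramble each gene's expression values
# across participants, breaking the within-participant correlations among
# the criterion variables. Names continue the hypothetical ones above.
import numpy as np

rng = np.random.default_rng(2014)
sf_items = [f"SF{i}" for i in range(1, 15)]

df_random = df.copy()
df_random[sf_items] = rng.integers(0, 6, size=df[sf_items].shape)
# re-running RR53 on df_random still yields thousands of "significant" pairs

df_scrambled = df.copy()
for gene in gene_cols:
    df_scrambled[gene] = rng.permutation(df[gene].to_numpy())
# re-running RR53 on df_scrambled collapses the count to a few dozen
```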

The punchline

To summarize: even when fed entirely random psychometric data, the RR53 regression procedure generates large numbers of results that appear, according to these authors' interpretation, to establish a statistically significant relationship between self-reported well-being and gene expression.  We believe that this regression procedure is, simply put, totally lacking in validity.  It appears to be nothing more than a mechanism for producing apparently statistically significant effects from non-significant regression coefficients, driven by a high degree of correlation between many of the criterion variables.

Despite exhaustive efforts, we could not replicate the authors' simple factor structure differentiating hedonic versus eudaimonic well-being, upon which their genomic analyses so crucially depended. Then we showed that the complicated RR53 procedure turned random nonsense into statistically significant results. Poof, there is no there there (as Gertrude Stein once said of Oakland, California) in their paper: no evidence of "molecular signaling pathways that transduce positive psychological states into somatic physiology," just nonsense.

How, in the taxonomy of bad science, do we classify this slipup and the earlier one in American Psychologist? Poor methodological habits, run-of-the-mill scientific sloppiness, innocent probabilistic error, injudicious hype, or simply unbridled enthusiasm with an inadequate grasp of methods and statistics?

Play nice and avoid the trap of negative psychology?

Our PNAS article exposed the unreliability of the results and interpretation offered in a paper claimed to be a game-changing breakthrough in our understanding of how positive psychology affects health by way of genomic expression. Science is slow and incomplete in self-correcting. But corrections, even of outright nonsense, seldom garner the attention given the original error. It is just not as newsworthy to find that claims of minor adjustments in everyday behavior modifying gene expression are nonsense as it was to make the unsustainable claims in the first place.

Given the rewards offered by media coverage and even prestigious journals, authors can be expected to be incorrigible in giving in to the urge to orchestrate media attention for ill-understood results generated by dubious methods applied in small samples. But the rest of the scientific community and journalists need to keep in mind that most breakthrough discoveries are false, unreplicable, or at least wildly exaggerated.

The authors were offered a chance to respond to my muted and tightly constrained letter to PNAS. Cole and Fredrickson made references to analyses they have never presented and offered misinterpretations of the literature that I cited. I consider their response disingenuous and dismissive of any dialogue. I am willing to apologize for this assessment if they produce the factor analyses of the self-report data to which they pointed. I will even donate $100 to the American Cancer Society if they can produce them. I doubt they will.

Concerns about the unreliability of the scientific and biomedical literature have risen to the threshold of precipitating concern from the director of NIH, Francis Collins. On the other hand, a backlash has called out critics for encouraging a "negative psychology" and warned us to temper our criticism. Cited as evidence of the excesses of critics are "'voodoo correlation' claims, 'p-hacking' investigations, websites like Retraction Watch, Neuroskeptic, [and] a handful of other blogs devoted to exposing bad science", and we are cautioned that "moral outrage has been conflated with scientific rigor." We are told we are damaging the credibility of science with criticism and that we should engage authors in clarification rather than criticize them. But I think our experience with this PNAS article demonstrates just how much work it takes to deconstruct outrageous claims based on methods and results that authors poorly understand but nonetheless promote in social media campaigns. Certainly, there are grounds for skepticism based on prior probabilities, and to be skeptical is not cynical. But is it not cynical to construct the pseudoscience of a positivity ratio and then a faux objective basis for moral philosophy?

Category: genomics, hedonia, positive psychology

Distress – the 6th vital sign for cancer patients?

A third or more of cancer patients experience significant psychological distress following their diagnosis. Yet a much smaller proportion receives any psychosocial services.

This situation is counter to recommendations from a number of sources, including the US Institute of Medicine report, Cancer Care for the Whole Patient: Meeting Psychosocial Health Needs. The gap between the proportion of patients experiencing distress and the proportion getting services has led to widespread calls for implementing routine screening for distress. The assumption is that the breakdown between being distressed and getting services lies in patients not being identified and referred to appropriate services.

There have been intense national and international campaigns by professional organizations first to recommend implementation of screening and then to mandate it as a condition of accreditation of cancer care settings. Increasingly, campaigns are organized around the slogan “distress is the sixth vital sign.” Promoters have sufficient political clout to get the slogan into the titles of articles in scientific journals, often with leaders of the field of psycho-oncology lending support as co-authors.

Holland, J. C., & Bultz, B. D. (2007). The NCCN guideline for distress management: a case for making distress the sixth vital sign. Journal of the National Comprehensive Cancer Network, 5(1), 3-7.

Bultz, B. D., & Johansen, C. (2011). Screening for distress, the 6th vital sign: where are we, and where are we going? Psycho-Oncology, 20(6), 569-571.

Watson, M., & Bultz, B. D. (2010). Distress, the 6th vital sign in cancer care. Psycho-oncologie, 4(3), 159-163.

The push to make distress accepted as the sixth vital sign for cancer patients is modeled after efforts to get pain designated as the fifth vital sign for general medical patients. In the late 1980s, it was recognized that pain management was grossly inadequate, with many patients’ pain going undetected and untreated. Pain was designated the fifth vital sign in general medical settings, with guidelines set for assessment and improved referral and treatment. These guidelines were first simply recommendations, but they grew to be mandated, with performance goals and quality of care indicators established to monitor their implementation in particular settings.

What is distress? Maybe not what you think

Distress was originally defined narrowly in terms of stress, anxiety, and depressive symptoms. Early screening recommendations were for quick assessment with a distress thermometer modeled after the simple pain rating scale adopted in the campaign for pain as the fifth vital sign. Efforts were made to validate distress thermometers in terms of their equivalence to longer self-report measures of anxiety and depression, and, in turn, to interview-based measures of psychiatric disorder. Ratings on a distress thermometer were not seen as a substitute for longer assessments, but rather as a screen for whether longer assessment was needed. Patients who scored above a cutpoint on a distress thermometer would be identified for further assessment.

“Distress” was the preferred term, rather than stress or anxiety or depressive symptoms, because it was viewed as more neutral, not necessarily indicating that someone who wanted services had a psychiatric disorder. It was believed that many cancer patients did not seek services out of fear of being stigmatized, and this vaguer term might avoid raising that barrier to treatment.

However, the definition has now been extended to include a much broader range of needs and concerns. The NCCN's widely cited definition of distress is

An unpleasant emotional experience of a psychological, social and/or spiritual nature which extends on a continuum from normal feelings of vulnerability, sadness and fears to disabling problems  such as depression, anxiety, panic, social isolation and spiritual crisis.

The emphasis is still on quick screening, but the distress thermometer is typically supplemented with a checklist of common problems. Particular items vary across recommendations for screening, but the checklist is meant to be a brief supplement to the distress thermometer. Canadians who have taken the lead in promoting distress as the sixth vital sign recommend a Modified Problem Checklist (PCL).

This list contains the 7 most common practical problems in their settings (accommodation, transportation, parking, drug coverage, work/school, income/finances, and groceries) and 13 psychosocial problems (burden to others, worry about family/friends, talking with family, talking with medical team, family conflict, changes in appearance, alcohol/drugs, smoking, coping, sexuality, spirituality, treatment decisions, and sleep). Participants indicate the presence or absence of each problem in the preceding week.

Despite the range of problems being assessed, the assumption is still that the predominant form of services that patients will seek as a result of being screened is some form of psychological counseling or psychotherapy.

Interventions usually assume one of four common forms: psychoeducation, cognitive-behavioural training (group or individual), group supportive therapy, and individual supportive therapy.

Support by professional organizations for screening of distress is quite widespread. Well-organized proponents exercise their political clout in making expression of enthusiasm for screening an obligatory condition for publishing in psycho-oncology journals. Dissenting critics and embarrassing data can be summarily sandbagged. Even when embarrassing data makes it through the review process, authors may be surprised to discover that their articles are accompanied by disparaging commentaries, to which they have not been offered an opportunity to respond. I have been publishing for over thirty years, and I have never before encountered the level of bullying that I have seen with papers concerning screening.

Yet promotion of routine screening for distress is based only on the consensus of like-minded professionals, not the systematic evaluation of evidence that medical screening generally requires. Screening tests are common in medicine. The standard criterion for adopting a screening recommendation is that screening can be shown to improve patient outcomes beyond simply making available to patients and their providers the resources that screening accesses.

It is not sufficient that screening address a serious problem; it must also lead to improved outcomes. For instance, it is now recognized that although prostate cancer is a common and threatening disease, routine screening of men without symptoms is likely to lead to overtreatment and side effects, such as incontinence and loss of sexual functioning, without any improvement in overall survival. Campaigns to promote prostate-specific antigen (PSA) testing have largely been discontinued. More generally, Choosing Wisely committees are systematically reevaluating the evidence for common forms of screening and withdrawing recommendations. Some well-established screening procedures are losing their recommendation.

We undertook a systematic review of the literature concerning screening for distress and failed to find sufficient evidence to recommend it.

Because of the lack of evidence of beneficial effects of screening cancer patients for distress, it is premature to recommend or mandate implementation of routine screening.

Other reviews have come to more favorable evaluations of screening, arrived at by including nonrandomized trials or trials that used the number of referrals made as a surrogate outcome in place of distress actually being reduced. However, for a referred patient to experience reduced distress, a complex pathway has to be traversed, starting with acceptance and completion of the referral and including receipt of appropriate treatment at an appropriate level of intensity and frequency. Referrals, particularly when they are to providers outside the cancer care setting, are notoriously black holes. There are few systematic data concerning the fate of referrals, but the general consensus is that many, perhaps most, are not completed.

My colleagues and I also noted that the professional recommendations for screening have not been developed according to the standard procedures for making such recommendations. Development of professional guidelines is supposed to follow an orderly process of

  • Assembling a diverse and representative group of relevant stakeholders.
  • Systematically reviewing the literature with transparency in search criteria and method of synthesis and interpretation.
  • Publishing preliminary recommendations for public comment.
  • Revising the preliminary recommendations to take feedback into account.

In contrast, recommendations for screening have simply been released by professional groups assembled on the basis of previously known loyalty to the conclusion and with inattention to the lack of evidence showing that screening would improve outcomes. Notably missing from the “consensus” groups recommending screening are patients, frontline clinicians who will have to implement screening, and professionals from the community who would have to receive and evaluate referrals, notably primary-care physicians in most countries.

Nonetheless, the push for routine screening of cancer patients for psychological distress continues to gain momentum, with goals being set in many settings for the minimal proportion of cancer patients who must be screened.

The slogan "distress is the sixth vital sign" is not a testable empirical statement of the order of "depression is a significant risk factor for cardiac mortality." Rather, it is best evaluated in terms of its use of terms, particularly "vital sign," and whether adopting the slogan improves patient outcomes. Let us look at the notion of a vital sign and then at the outcome of efforts to adopt pain as the fifth vital sign.

What is a vital sign?

According to Wikipedia

Vital signs are measures of various physiological statistics, often taken by health professionals, in order to assess the most basic body functions. Vital signs are an essential part of a case presentation. The act of taking vital signs normally entails recording body temperature, pulse rate (or heart rate), blood pressure, and respiratory rate, but may also include other measurements. Vital signs often vary by age.

In an excellent discussion of vital signs, Lucy Hornstein states

If someone other than the patient can’t see, hear, palpate, percuss, or measure it, it’s a symptom. Anything that can be perceived by someone else is a sign.

And

Vital signs are measured…and yield numeric results. Normal ranges are defined; values that fall outside those normal ranges are described with specific words (eg, bradycardia, tachypnea, hypothermia, hypertension).

A cough is a sign, but a headache is a symptom. Not that headaches are less "real"; they are just different.

With the standard definitions, pain cannot be a vital sign, only a symptom. The labeling of pain as a vital sign is metaphoric and involves twisting the definition of a sign.

Something I will note now, but come back to later in this post: patients cannot really argue against the results of assessment of a vital sign. If repeated blood pressure readings indicate diastolic pressure above the established cutoff, it is not left to the patient to argue, "No, I do not have hypertension." The notion is that a vital sign is objective and not subject to validation by patient self-report, although what to do about the vital sign may be interpreted in terms of other data about the patient, some of it from self-report. The point is that calling a measurement a "vital sign" claims particular authority for the results.

There have been numerous proposals for designating still other measures as vital signs in need of routine assessment and recording in medical records; pain and, now, distress are the most prominent examples.

Pain as the fifth vital sign

Okay, so it is only metaphorically that pain can be considered a fifth vital sign. But what did designating it so nonetheless accomplish in terms of improving its management? What can be learned that can be applied to designating distress as the sixth vital sign?

The pain-as-fifth-vital-sign (P5VS) campaign started simply, with publicity about high levels of untreated pain in medical populations, which in turn led to consensus statements and mandated screening, similar to what is now occurring with declaring distress the sixth vital sign.

1992. The US agency now known as the Agency for Healthcare Research and Quality (AHRQ) issued guidelines documenting that half of surgical patients did not receive adequate post-surgical pain medication. The guidelines declared that patients had a "right" to adequate pain measurement.

1995. The American Pain Society (APS) issued a landmark consensus statement with guidelines for a quality improvement approach to the treatment of acute and cancer pain [10], expanding upon its 1989 guidelines [11] for the treatment of pain. In his Presidential Address, Dr. James Campbell presented the idea of evaluating pain as a vital sign.

1998. The Veterans Health Administration (VHA) launched a national P5VS initiative to improve pain management for its patients, starting with documentation in the electronic medical record of patients' self-reported pain. It required use of a Numeric Rating Scale (NRS) in all clinical encounters.

P5VS campaigns were based on professional consensus, not appraisal of best evidence. When studies were later conducted, results did not demonstrate that the fifth vital sign campaign improved patient outcomes.

The title of a study of 300 consecutive patients before the start of the VA initiative and 300 afterwards says it all:

Measuring Pain as the 5th Vital Sign Does Not Improve Quality of Pain Management

The study examined 7 process indicators of the quality of pain management and failed to identify any improvement in:

  • Subjective provider assessment (49.3% before versus 48.7% after).
  • Occurrence of a pain exam (26.3% versus 26.0%).
  • Orders to assess pain (11.7% versus 8.3%).
  • New pain medication prescribed (8.7% versus 11.0%).
  • Change in existing pain medication prescription (6.7% versus 4.3%).
  • Other pain treatment (11.7% versus 13.7%).
  • Recording of follow-up plans (10.0% versus 8.7%).

The initiative required that "patients indicating a score of four or above on a pain scale should receive comprehensive pain assessment and prompt intervention…"

In a subsample of patients who reported substantial pain:

  • 22% had no attention to pain documented in the medical record of the visit at which they reported it.
  • 52% received no new therapy for pain at that visit.
  • 27% had no further assessment documented.

Our investigation of the P5VS…initiative at a single VA site has shown the routine documentation of pain levels, even with system-wide support and broad-based provider education, was ineffective in improving the quality-of-care.

Carly Simon – Haven’t Got Time for the Pain

What was going wrong? It appears that frontline clinicians making the pain assessments lacked the time or skills to take effective action. Pain assessments were typically conducted in encounters occasioned by other reasons for the visit. Furthermore, the assessments were collected outside of a clinical interaction in which context or history could be discussed, which likely led to invalid ratings. When patients were asked only to rate current pain, it was unknown whether they took into account how much the pain bothered them, whether it was acute or chronic, or whether it reflected any change from past levels, all meaningful considerations. Other clinicians in the system either did not receive the ratings in a timely fashion or lacked the context to interpret them.

Other studies [1, 2] similarly demonstrated that the P5VS campaign was by itself ineffective:

One potential reason for the insensitivity of routine pain screening in these studies is that all were conducted in outpatient primary and specialty care settings where chronic persistent or intermittent pain is much more common than acute pain. Routine pain screening that focuses on pain intensity “now” may not be sufficiently sensitive to detect important chronic pain that occurs episodically or varies with activity. In the VA primary care, the vast majority of pain problems are longstanding in nature, so sensitivity for chronic pain is important for any pain screening strategy in this setting.

The mandate that unrelieved pain must be addressed led, in short order, to ineffective, inappropriate treatment that was not based on proper assessment. There was an increase in diagnostic tests that only confirmed the existence of pain that was difficult to treat, notably chronic back pain. In the decade after the campaign to reduce pain was launched, the costs of treating chronic back pain escalated without any demonstrated improvement in patient outcomes.

The guidelines had been promulgated with claims that addiction to opiates prescribed for acute pain was rare. But the evidence for that was only a brief 1980 letter in the New England Journal of Medicine indicating only four instances of addiction in the treatment of some 12,000 patients.

The campaign to improve pain control using routine ratings had an effect unanticipated by its proponents.

Dispensing opioids has almost doubled according to National Health and Nutrition Examination Survey data indicating that from 1988 to 1994 a total of 3.2% of Americans reported using opioids for pain, whereas from 2005 to 2008 a total of 5.7% reported use.

This significant increase has been associated with serious consequences, including an estimated 40 deaths per day due to prescription opioids.

Put simply

Improving pain care may require attention to all aspects of pain management, not just screening.

The pain-as-fifth-vital-sign campaign involved mandated assessment of pain with a simple numerical scale at every clinical encounter, regardless of the reason for the visit. The rating was typically obtained without talking with patients or examining them to determine the likely multiple sources of their pain, its history, and their goals. Indeed, collecting these simple ratings may have become a substitute for having such discussions. The number on the rating scale came to characterize the patient for purposes of clinical decision-making, and may have led to overtreatment, including escalating prescription of pain medication.

Is it a good idea to consider distress the sixth vital sign?

Like pain, distress is not a vital sign by the conventional definition of the term. Yet to label it as such suggests that there is some sort of objective procedure involved in collecting ratings on a distress thermometer from patients.

Generally ignored in the promotion of screening is that most patients who indicate distress above established thresholds do not wish to receive a psychosocial service that they are not already receiving. Current guidelines for screening do not require asking patients whether they have any need for services. Instead, their response to the distress thermometer is used to tell them that they need intervention, with an emphasis on counseling.

When asked directly, most distressed patients reject the need for psychosocial services that they are not already getting, often outside the cancer center. A rather typical study found that 14% definitely, and an additional 29% maybe, wanted to talk with a professional about their problems. Patients variously report:

  • Already receiving services.
  • Believing they can solve the problems themselves.
  • Feeling that concentrating on treating their physical illness takes precedence over receiving psychosocial and supportive services.
  • Finding the services offered to them not needed, not timely, or not what they prefer.

A heightened score on a distress thermometer is a poor indication of whether patients are interested in receiving the services listed on a screening sheet. Most do not want to receive a service, but most of those receiving services are not distressed. Think about it while looking at the problems listed on the screening form above: many of the problems would be endorsed by patients without a heightened distress score. This poses a dilemma for any interpretation of a score on a distress thermometer as if it were a vital sign.

Overall, thinking about distress as the sixth vital sign creates the illusion that a score on a distress thermometer is an authoritative, objective, standalone indicator, much like a blood pressure reading. Actually, scores on a distress thermometer need to be discussed and interpreted. If distress is taken too seriously as the sixth vital sign, there is a risk that patients who do not meet the cutoff for clinically significant distress will be denied an opportunity to discuss the problems for which they might otherwise seek help.

My colleagues and I undertook a study where we used results of screening for distress to attempt to recruit a sample of patients for an intervention trial evaluating problem-solving therapy as a way of reducing distress. It proved to be a discouragingly inefficient process.

  • We screened 970 cancer patients, of whom 423 were distressed, and, of these, 215 indicated a need for services. However, only 36 (4%) consented to participate in the intervention study.
  • 51% of the distressed patients reported having no need for psychosocial services and an additional 25% were already receiving services for their needs.
  • Overall, we had to screen 27 patients in order to recruit a single patient, with 17 hours of time required for each patient recruited.

Consider what could have been accomplished if those 17 hours per recruited patient had instead been spent talking to the patients who indicated they wanted to talk to someone about their problems.
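The arithmetic of that screening funnel is worth making explicit; a quick sketch using the figures above:

```python
# The recruitment funnel from our screening study, using the figures above.
screened, distressed, wanted_help, consented = 970, 423, 215, 36
hours_per_recruit = 17

print(f"distressed: {distressed / screened:.0%}")                     # 44%
print(f"consented: {consented / screened:.1%}")                       # ~4%
print(f"patients screened per recruit: {screened / consented:.1f}")   # ~26.9
print(f"total staff hours: {consented * hours_per_recruit}")          # 612
```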

Designating distress as the sixth vital sign lends a false objectivity and validity to a procedure that has not been shown to improve patient outcomes. It is an advertising slogan that is likely to prove ineffective and to misdirect resources, just as the P5VS campaign did.

Category: cancer, distress, evidence-supported, mental health care, palliative care, screening