Reverse engineer my criticisms of this article and you will discover a strategy to turn your own null findings into a publishable paper.
Here’s a modest little study with null findings, at least before it got all gussied up for publication. It has no clear-cut clinical or public health implications. Yet, it is valuable as a teaching example showing how such studies get published. That’s why I found it interesting enough to blog about it at length.
van de Ven, M. O., Witteman, C. L., & Tiggelman, D. (2013). Effect of Type D personality on medication adherence in early adolescents with asthma. Journal of Psychosomatic Research, 75(6), 572-576. [Abstract available here and fulltext here]
As I often do, I am going to get quite critical in this blog post, maybe even making some readers wince. But if you hang in there, you will see some strategies for publishing negative results as if they were positive that are widely used throughout personality, health, and positive psychology. Your critical skills will be sharpened, but you will also be able to reverse engineer my criticisms to get papers with null findings published.
Read on and you’ll see things that the reviewers at Journal of Psychosomatic Research apparently did not see, nor the editors, but they should have. I have emailed the editors inviting them to join in this discussion and I am expecting them to respond. I have had lots of dealings with them and actually find them to be quite reasonable fellows. But peer review is imperfect, and one of the good things about blogging is that I can get the space to call out when it fails us.
The study examined whether some measures of negative emotion predicted adherence in early adolescents with asthma. A measure of negative affectivity (sample item: “I often make a fuss about unimportant things”) and what was termed social inhibition (sample item “I would rather keep other people at a distance”) were examined separately and when combined in a categorical measure of Type D personality (the D in Type D stands for distress).
Type D personality studies were once flourishing, even getting coverage in Time and Newsweek and discussion by Dr. Oz. The claim was that a Type D personality predicted death among congestive heart failure patients so well that clinicians should begin screening for it. Type D was supposed to be a stable personality trait, so it was not clear what clinicians could do with the information from screening. But I will be discussing in a later blog post why the whole area of research can itself be declared dead because of fundamental, inescapable problems in the conception and measurement of Type D. When I do that, I will draw on an article co-authored with Niels de Voogd, “Are we witnessing the decline effect in the Type D personality literature?”
John Ioannidis provided an approving commentary on my paper with Niels, with the provocative title of “Scientific inbreeding and same-team replication: Type D personality as an example.” Among the ideas attributable to Ioannidis are that most positive findings are false, as well as that most “discoveries” are subsequently proven to be false or at least exaggerated. He calls for a greater value being given to replication, rather than discovery.
Yet in his commentary on our paper, he uses the Type D personality literature as a case example of how the replication process can go awry. A false credibility for a hypothesis is created by false replications. He documented significant inbreeding of investigators of Type D personality: a quite small number of connected investigators are associated with studies with statistically improbable positive findings. And then he introduced some concepts that can be used to understand processes by which the small group could have undue influence on replication attempts by others:
… Obedient replication, where investigators feel that the prevailing school of thought is so dominant that finding consistent results is perceived as a sign of being a good scientist and there is no room for dissenting results and objections; or obliged replication, where the proponents of the original theory are so strong in shaping the literature and controlling the publication venues that they can largely select and mold the results, wording, and interpretation of studies eventually published.
Ioannidis’ commentary also predicted that regardless of any merits, our arguments would be studiously ignored and even suppressed by proponents of Type D personality. Vested interests use the review process to do that with articles that are inconvenient and embarrassing. Reviewing manuscripts has its advantages in terms of controlling the content of what is ultimately published.
Don’t get me wrong. Niels and I really did not expect everyone to immediately stop doing Type D research just because we published this article. After all, a lot of data have already been collected. In Europe, where most Type D personality data get collected, PhD students are waiting to publish their Type D articles in order to complete their dissertations.
We were very open to having Type D personality researchers pointing out why we were wrong, very wrong, and even stupidly wrong. But that is not what we are seeing. Instead, it is like our article never appeared, with little trace of it in terms of citations even in, ah, Journal of Psychosomatic Research, where our article and Ioannidis’ commentary appeared. According to ISI Web of Science, our article has been cited an overall whopping 6 times as of April 2014. And there have been lots more Type D studies published since our article first appeared.
Anyway, the authors of the study under discussion adopted what has become known as the “standardized method” (that means that they don’t have to justify it) for identifying “categorical” Type D personality. They took their two continuous measures of negative affectivity and social inhibition and split (dichotomized) them. They then crossed them, creating a four cell, 2 x 2 matrix.
Next, they selected out the high/high quadrant for comparison to the three other groups combined as one.
So, the authors made the “standardized” assumption that only the difference between a high/high group and everyone else was interesting. That means that persons who are low/low will be treated just the same as persons who are high in negative affectivity and low in social inhibition. Those who were low in negative affectivity but high in social inhibition are simply treated the same as those who are low on both variables. The authors apparently did not even bother to check– no one usually does– whether some of the people who were high in negative affectivity and low in social inhibition actually had higher scores on negative affectivity than those assigned to the high/high group.
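To make the consequences of this split-and-cross procedure concrete, here is a minimal sketch in Python. The cutoff of 10 and the four respondents' scores are made up for illustration and are not taken from the paper; the function names are hypothetical too.

```python
# Hypothetical cutoff and scores, for illustration only (not from the paper).
CUTOFF = 10  # assumed threshold applied to both scales


def quadrant(na, si, cutoff=CUTOFF):
    """Dichotomize both scores and return the 2 x 2 cell as 'NA level/SI level'."""
    na_level = "high" if na >= cutoff else "low"
    si_level = "high" if si >= cutoff else "low"
    return f"{na_level}/{si_level}"


def is_type_d(na, si, cutoff=CUTOFF):
    """High/high cell versus the three other cells lumped together."""
    return quadrant(na, si, cutoff) == "high/high"


# (negative affectivity, social inhibition) for four made-up respondents
respondents = {"A": (10, 10), "B": (24, 9), "C": (2, 3), "D": (3, 20)}

for name, (na, si) in respondents.items():
    label = "Type D" if is_type_d(na, si) else "not Type D"
    print(name, quadrant(na, si), label)
```

In this toy example, respondent B scores far higher on negative affectivity than respondent A, yet only A is labeled Type D, and B ends up in the same comparison group as the low/low respondent C.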
I have been doing my own studies and reviews of personality and abnormal behavior for decades. I am not aware of any other example where personality types are created in which the high/high group is compared to everybody else lumped together. As we will see in a later blog, there are lots of reasons not to do this, but for Type D personality, it is the “standardized” method.
Adherence was measured twice in this study. At one point we readers are told that negative emotion variables were also assessed twice, but the second assessment never comes up again.
The abstract concludes that
categorical Type D personality predicts medication adherence of adolescents with asthma over time, [but] dimensional analyses suggest this is due to negative affectivity only, and not to the combination of negative affectivity and social inhibition.
Let’s see how Type D personality was made to look like a predictor and what was done wrong to achieve this.
Some interesting things about Table 2 that reviewers apparently missed:
- At time T1, adherence was not related to negative affectivity, social inhibition, or Type D personality. There is not much prediction going on here.
- At time T2, adherence was related to the earlier measured negative affectivity, but not to social inhibition or Type D personality.
Okay, if the authors were searching for significant associations, we have one, only one, here. But why should we ignore the failure of personality variables to predict adherence measured at the same time and concentrate on the prediction of later adherence? Basically, the authors have examined 2×3=6 associations, and seem to be getting ready to make a fuss about the one that proved significant, but was not predicted to stand alone.
Most likely this statistical significance is due to chance– it certainly was not replicated in same-time assessments of negative affectivity and adherence at T1. But this association seems to be the only basis for claiming that one of these negative emotion variables is actually a predictor.
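A rough back-of-the-envelope calculation shows why one significant result out of six is not impressive. The sketch below assumes the six tests are independent and each uses the conventional alpha of .05; neither assumption comes from the paper, and the tests here are surely correlated, so treat the number as a ballpark only.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of one or more 'significant' results when every null is true.

    Assumes the tests are independent, which is a simplification.
    """
    return 1 - (1 - alpha) ** n_tests


n_tests = 2 * 3  # two adherence assessments x three negative emotion variables
p_any = prob_at_least_one_false_positive(n_tests)
bonferroni_alpha = 0.05 / n_tests  # a stricter per-test threshold

print(f"{n_tests} tests: P(at least one false positive) = {p_any:.3f}")
print(f"Bonferroni-adjusted per-test alpha = {bonferroni_alpha:.4f}")
```

Under these assumptions, there is roughly a one in four chance of getting at least one significant association out of six even if nothing is going on.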
- Adherence at time T2 is strongly predicted by adherence at time T1.
The authors apparently don’t consider this particularly interesting, but it is the strongest association in the data set. They want instead to predict change in adherence from T1 to T2 from trait negative emotion. But why should change in the relatively stable adherence be predicted by negative emotion when negative emotion does not predict adherence measured at the same time?
We need to keep in mind that these adolescents have been diagnosed with asthma for a while. They are being assessed for adherence at two arbitrary time points. There is no indication that something has happened in between those points that might strongly affect their adherence. So, we are trying to predict fluctuations in a relatively stable adherence from a trait, not any upward or downward spiral.
Next, some things we are not told that might further change our opinions about what the authors say is going on in their study.
Like pulling a rabbit out of a hat, the authors suddenly tell us that they measured self-reported depressive symptoms. The introduction explains this article is about negative affectivity, social inhibition or Type D personality, but only mentions depression in passing. So, depression was never given the explanatory status that the authors give to these other three variables. Why not?
Readers should have been shown the correlation of depression with the other three negative emotion variables. We could expect from a large literature that the correlation is quite high, probably as high as their respective reliabilities allow—as good, or as bad as it gets.
There is no particular reason why this study could not have focused on depressive symptoms as predictors of later adherence, but maybe that story would not have been so interesting, in terms of results.
Actually, most of the explanations offered in the introduction as to why measures of negative emotion should be related to adherence would seem to apply to depression. Just go back to the explanations and substitute depression for whatever variable is being discussed. See, doesn’t depression work as well?
One of the problems in using measures of negative emotion to predict other things is that these measures are related so much to each other that we can’t count on them to measure only the variable we are trying to emphasize and not something else.
Proponents of Type D personality like these authors want to assert that their favored variable does something that depression does not do in terms of predictions. But in actual data sets, it may prove tough to draw such distinctions because depressive symptoms are so highly correlated with components of Type D.
Some previous investigators of negative emotion have thrown up their hands in despair, complaining about the “crud factor” or “big mess” of intercorrelated measures of negative emotion ruining their ability to test their seemingly elegant ideas about supposedly distinctly different negative emotion variables. When one of the first Type D papers was published, an insightful commentary complained that the concept was entering an already crowded field of negative emotion variables and asked whether we really needed another one.
In this study, the authors measured depressive symptoms with the self-report Hospital Anxiety and Depression Scale (HADS). The name of the scale suggests that it separately measures anxiety and depression. Who can argue with the authority of a scale’s name? But using a variety of simple and complicated statistical techniques like different variants of factor analysis, investigators have not been able to show consistently that the separate subscales for anxiety and depression actually measure something different from each other– or that the two scales should not be combined into a general measure of negative emotion/distress.
So talk about measuring “depressive symptoms” with the HADS is wrong, or at least inaccurate. But there are a lot of HADS data sets out there, and so it would be inconvenient to acknowledge what we said in the title of another Journal of Psychosomatic Research article,
Back to this article, if readers had gotten to see the basic correlations of depression with the other variables in Table 2, we might have seen how high the correlation of depression was with negative affectivity. This would have sent us off in a very different direction than the authors took.
To put my concerns in simple form, data that are available to the authors but hidden from the readers’ view probably do not allow making the clean kind of distinctions that the authors would need to make if they are going to pursue their intended storyline.
But, uh, measures of depressive symptoms show up all the time in studies of Type D personality. Think of such studies as if they are like the rigged American wrestling matches. Depressive symptoms are the heel (or rudo in lucha libre) that always shows up looking like a mean and threatening contender, but almost always loses to the face, Type D personality. Read on and find out how supposedly head-to-head comparisons are rigged so this dependably happens.
The authors eventually tell us that they assessed (1) asthma duration, (2) control, and (3) severity. But we were not allowed to examine whether any of these variables were related to the other variables in Table 2. So, we cannot see whether it is appropriate to consider them as “control variables” or more accurately, confounds.
There is good reason to doubt that these asthma variables are suitable “control variables” or candidates for a confounding variable in predicting adherence.
First, for asthma control to serve as a “control variable” we must assume that it is not an effect of adherence. If it is, it makes no sense to try to eliminate asthma control’s influence on adherence with statistics. It sure seems logical that if these teenagers adhere well to what they are supposed to do to deal with their asthma, asthma control will be better.
Simply put, if we can reasonably suspect that asthma control is a daughter of adherence, we cannot keep treating it as if it is the mother that needs to be controlled in order to figure out what is going on. So there is first a theoretical or simple logical objection to treating asthma control as a “control” variable.
Second, authors are not free to simply designate whatever variables they would like as control variables and throw them into multiple regression equations to control a confound. This is done all the time in the published literature, but it is WRONG!
Rather, authors are supposed to check first and determine if two conditions are met. The variables should be significantly related to the predictor variables. In the case of this study, asthma control should be shown to be associated with one or all of the negative emotion variables. But then the authors would also have to show that it was also related to subsequent adherence. If both conditions are not met, the variable should not be included as a control variable.
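The two-condition check described above can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the correlations and the sample size of 150 are made up, the function names are hypothetical, and the p-values use the Fisher z approximation rather than an exact test.

```python
from math import atanh, sqrt
from statistics import NormalDist


def approx_p_value(r, n):
    """Two-sided p-value for a Pearson r via the Fisher z approximation."""
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def meets_both_conditions(r_with_predictor, r_with_outcome, n, alpha=0.05):
    """Candidate covariate must relate to the predictor AND to the outcome."""
    return (approx_p_value(r_with_predictor, n) < alpha
            and approx_p_value(r_with_outcome, n) < alpha)


n = 150  # made-up sample size

# Made-up case 1: related to negative affectivity but not to later adherence
print(meets_both_conditions(0.30, 0.05, n))

# Made-up case 2: related to both
print(meets_both_conditions(0.30, 0.25, n))
```

Only the second hypothetical covariate passes. This is exactly the kind of check that readers cannot perform when the relevant correlations are left out of Table 2.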
Reviewers should have insisted on seeing these associations among asthma duration, control, severity, and adherence. While the reviewers were at it, they should have required that the correlations be available to other readers, if the article is to be published.
We need to move on. I am already taxing readers’ patience with what is becoming a longread. But if I have really got you hooked into thinking about the appropriateness of controlling for particular confounds, you can digress to a wonderful slide show telling more.
So far, we have examined a table of basic correlations, not finding some things that we really need to decide what is going on here, but we seem to be getting into trouble. But multivariate analyses will be brought in to save this effort.
The magic of misapplied multivariate regression.
The authors deftly save their storyline and get a publishable paper with “significant” findings in two brief paragraphs:
The decrease in adherence between T1 and T2 was predicted by categorical Type D personality (Table 3), and this remained significant after controlling for demographic and clinical information and for depressive symptoms. Adolescents with a Type D personality showed a larger decrease in adherence rates from T1 to T2 than adolescents without a Type D personality.
The results of testing the dimensions NA and SI separately as well as their interaction showed that there was a main effect of NA on changes in adherence over time (Table 4), and this remained significant after controlling for demographic and clinical information and for depressive symptoms. Higher scores on NA at T1 predicted a stronger decrease in adherence over time. Neither SI nor the interaction between NA and SI predicted changes in adherence.
Wow! But before we congratulate the authors and join in the celebration we should note a few things. From now on in the article, they are going to be discussing their multivariate regressions, not the basically null findings obtained with the simple bivariate correlations. But these regression equations do not undo the basic findings with the bivariate correlations. Type D personality did not predict adherence; it only appears to do so in the context of some arbitrary and ill-chosen covariates. But now they can claim that Type D won the match fair and square, without cheating.
But don’t get down on these authors. They probably even believe in their results. They are merely following the strong precedent of what almost everybody else seems to do in the published literature. They did not get caught by the reviewers or editors of Journal of Psychosomatic Research.
Whatever happened to depressive symptoms as a contender for predicting adherence? They were not let into the ring until after Type D personality and its components had secured the match. These other variables got to do all the predicting they could do, and only then were depressive symptoms let into the ring. That is what happens when you have highly correlated variables and manipulate the match by picking one to go first.
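How much the order of entry matters can be shown with the standard formula for R-squared with two standardized predictors. The correlations below are made up for illustration (each predictor correlates .30 with the outcome and .70 with the other predictor); they are not estimates from the paper, and the function names are hypothetical.

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """R-squared for an outcome regressed on two predictors, from correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


def increment_for_second(r_y_first, r_y_second, r_12):
    """Variance credited to the predictor entered second, after the first."""
    return r2_two_predictors(r_y_first, r_y_second, r_12) - r_y_first**2


# Made-up correlations: two equally good, highly intercorrelated predictors
r_y_typed, r_y_dep, r_typed_dep = 0.30, 0.30, 0.70

total = r2_two_predictors(r_y_typed, r_y_dep, r_typed_dep)
credit_first = r_y_typed**2
credit_second = increment_for_second(r_y_typed, r_y_dep, r_typed_dep)

print(f"Total R-squared:             {total:.3f}")
print(f"Credited to variable in 1st: {credit_first:.3f}")
print(f"Credited to variable in 2nd: {credit_second:.3f}")
```

In this toy setup the two predictors are interchangeable, yet whichever one enters first is credited with most of the explained variance and the latecomer is left with scraps. Swap the order and the "winner" swaps too.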
And there is a second trick guaranteeing that Type D will win over depressive symptoms. Recall that to be called Type D personality, research subjects had to be high on negative affectivity and high on social inhibition. Scoring high on two (imperfectly reliable) measures of negative emotion usually bests scoring high on only one (imperfectly reliable) measure. But if the authors had used two measures of depressive symptoms, they could have had a more even match.
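A loose analogy for why two measures beat one comes from the Spearman-Brown formula, which gives the reliability of a composite of k parallel measures. The sketch below assumes two parallel measures each with a made-up reliability of .70. Type D is built from dichotomized scores rather than a simple sum, so this illustrates the general principle only, not the authors' actual procedure.

```python
def spearman_brown(reliability, k):
    """Reliability of a composite of k parallel measures (Spearman-Brown)."""
    return k * reliability / (1 + (k - 1) * reliability)


single = 0.70  # made-up reliability of one measure of negative emotion
combined = spearman_brown(single, k=2)

print(f"One measure:  {single:.2f}")
print(f"Two combined: {combined:.2f}")
```

With these assumed numbers, combining two measures raises reliability from .70 to about .82, which by itself gives the two-measure contender an edge in predicting anything.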
The big question: so what?
Type D personality is not so much a theory as a tried-and-true method for getting flawed analyses published. Look at what the authors of this paper said about it in the introduction and in their discussion. They really did not present a theory, but rather cited precedent and made some unsubstantiated speculations about why past results may have been obtained.
Any theory about Type D personality and adherence really does not make predictions with substantial clinical and public health implications. Think about it: if this study had worked out as the authors intended, what difference would it have made? Type D personality is supposedly a stable trait, and so the authors could not have proposed psychological interventions to change it. That has been done and does not work in other contexts.
What, then, could authors have proposed, other than that more research is needed? Should the mothers of these teenagers be warned that their adolescents had Type D personality and so might have trouble with their adherence? Why not just focus on the adherence problems, if they are actually there, and not get caught up in blaming the teens’ personality?
But Type D has been thung.
Because the authors have been saying in lots of articles that they have been studying Type D, it is tough to get heard saying “No, pal, you have been studying statistical mischief. Type D does not exist except for statistical mischief.” Type D has been thung, and who can undo that?
Thing (v). to thing, thinging. 1. To create an object by defining a boundary around some portion of reality separating it from everything else and then labeling that portion of reality with a name.
One of the greatest human skills is the ability to thing. We are thinging beings. We thing all the time.
Yes, yes, you might think, but we are not really “thinging.” After all trees, branches and leaves already existed before we named them. We are not creating things we are just labeling things that already exist. Ahhh…but that is the question. Did the things that we named exist before they were named? Or more precisely, in what sense did they exist before they were named, and how did their existence change after they were named?
…And confused part-whole relationships become science and publishable.
Once we have convincingly thung Type D personality, we can fool ourselves and convince others about there being a sharp distinction with the similarly thung “depressive symptoms.”
Boundaries between concepts are real because we make them so, just like between Canada and the United States, even if particular items are arbitrarily assigned to one or the other questionnaire. Without our thinging, we would not as easily forget that the various items come from the same “crud factor” or “big mess” and could have been lumped or split in other ways.
Part-whole relationships become entities interacting with entities in the most sciencey and publishable ways. See for instance
Romppel, et al. (2012). Type D personality and persistence of depressive symptoms in a German cohort of cardiac patients. Journal of Affective Disorders, 136(3), 1183-1187.
Which compares the effectiveness of Type D as a screening tool compared to established measures of depressive symptoms measured with the (ugh) HADS for predicting subsequent HADS depression.
Lo and behold, Type D personality works and we have a screening measure on our hands! Aside from the other advantages that I noted for Type D as a predictor, negative affectivity items going into the Type D categorization are phrased as if they refer to enduring characteristics whereas items on the HADS are phrased to refer to the last week.
Let us get out of the mesmerizing realm of psychological assessment. Suppose we ask a question about whether someone ate meatballs last week or whether they generally eat meatballs. Which question would you guess better predicts meatball consumption over the next year?
And then there is
Michal, et al. (2011). Type D personality is independently associated with major psychosocial stressors and increased health care utilization in the general population. Journal of Affective Disorders, 134(1), 396-403.
Which finds in a sample of 2495 subjects that
Individuals with Type D had an increased risk for clinically significant depression, panic disorder, somatization and alcohol abuse. After adjustment for these mental disorders Type D was still robustly associated with all major psychosocial stressors. The strongest associations emerged for feelings of social isolation and for traumatic events. After comprehensive adjustment Type D still remained associated with increased help seeking behavior and utilization of health care, especially of mental health care.
The main limitation is the reliance on self-report measures and the lack of information about the medical history and clinical diagnosis of the participants.
Yup, relied on self-report questionnaires in multivariate analyses, not interview-based diagnosis and the measure of “depression” or “depressive symptoms” asked about the last 2 weeks.
How did the study of negative emotion and adherence get published with basically null findings? With chutzpah and by the authors following the formulaic Type D personality strategy for getting published. This study did not really obtain significant findings, but the precedent of many studies of Type D personality was available to support claims that the authors had achieved a conceptual replication, even if not an empirical one. And these claims were very likely evaluated by members of the Type D community making similar claims. In his commentary, Ioannidis pointed to how null Type D findings are gussied up with “approached significance” or, better, was “independently related to blah, blah, when x, y, and z are controlled.”
Strong precedents are often confused with validity, and the availability of past claims relaxes the standards for making subsequent claims.
The authors were only doing what authors try to do, their damnedest to get their article published. Maybe the reviewers were from the Type D community and, able to cite the authority of hundreds of studies, were only doing what the community tries to do– keep the cheering going for the power of Type D personality and add another study to the hundreds. But where were the editors of Journal of Psychosomatic Research?
Just because the journal published our paper, for which we remain grateful, I do not assume that they will require authors who submit new papers to agree with us. But you would think, if the editors are committed to the advancement of science, they would request that authors of manuscripts at least relate their findings to the existing conversation, particularly in the Journal of Psychosomatic Research. Authors should dispute our paper before going about their business. If it does not happen in this journal, how can we expect it to happen elsewhere?
This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution 3.0 Unported License.