I like badges – I have a lot of them! I’m also an open science advocate. So when a group of advocates declared their badges for authors were dramatically increasing open practice, I could have been a lock for the cheer squad. But, no, I’m not. In fact, I think the implications of the badge bandwagon are concerning.
These headlines and a tweeted claim will give you an idea of the level of hype around badges for sharing data and materials in science articles:
Simple badge incentive could help eliminate bad science. (Ars Technica)
Digital badges motivate scientists to share data. (Nature)
Even psychologists respond to meaningless rewards: All they needed to be more open with their data was a badge showing they did it. (FiveThirtyEight)
Data shows the effect of badges on transparency is unrivaled by any other known intervention. (Tweet)
That last one is a stunner, considering other known interventions include mandating data transparency as a condition for funding or publication! That tweet could take us straight to the issue of whether the badge policy pathway is ultimately good or bad for open science. And the fact that a scientist said it points to the peril of being a “true believer”. We’ll come back to those. But let’s start with the data people are talking about, and what these badges mean for us as readers.
The source of this was a paper by Mallory Kidwell and colleagues in PLOS Biology, called “Badges to acknowledge open practices: a simple, low-cost, effective method for increasing transparency”. The developer and prominent social marketer of the badges is the senior author of that paper and head of the organization tied to them, Brian Nosek.
The title has most of the authors’ strong claims embedded in it. They are:
- The intervention studied is simply offering a badge if authors meet certain criteria.
- The intervention is “low cost”, “low risk”, “relatively resource-lite”.
- There are no other repercussions – “if badges are not valued by authors, they are ignored and business continues as usual”.
- Badges are “dramatically” effective – dramatic(ally) appears 9 times in the paper.
- Badges increase authors’ use of the kind of repository the badges exist in part to promote – like that of Nosek & co’s Open Science Framework (OSF).
The paper reports what seems to be the only study on this. It involves looking back at what happened before and after a prominent journal, Psychological Science, introduced badges in January 2014. To gauge whether any changes observed might be happening in the discipline anyway, the authors chose 4 journals that did not have badges for comparison.
They then collected data on open-ness of data and study materials from all 5 journals for research-based articles from 2012 to May 2015: 2 years of “before” data and 17 months of “after” data. Let’s look at the authors’ 5 claims and see whether the data support them. The open data for the project were a big help, so the authors get lots of brownie points for walking the open data talk.
Claim 1: Was the intervention simple badges and criteria for them?
No. And that’s pivotal, because any effect could be because of co-intervention(s), not of badges alone.
The badges were among a set of changes at the journal in January 2014. And there is no way to pick apart the contribution of the badge, if anything else could affect the proportion of authors able or willing to share data and materials.
The badges paper does not mention the co-interventions, so you cannot rely on it to understand what they studied. They have no citation at all where they state the journal introduced badges. Here’s the journal editorial that explains what happened from January 2014.
Badges were 1 of 5 initiatives introduced simultaneously. The initiatives were all directed at improved research reproducibility. They all had implications journal-side, and they all could have influenced authors’ decisions to go ahead and publish in Psychological Science, in ways that could affect the journal’s portfolio of research. Remember, to consider confounding variables, we’re looking for anything that could deter or incentivize publishing at the journal. Here are the 4 initiatives other than badges:
- Methods and results reporting were removed from the word limits, because of raised expectations for rigor and reporting quality.
- The evaluative criteria for editors and peer reviewers were changed, upping the importance of the description of methods and evaluating them stringently.
- 4 new items to detail and confirm were added to the article submission form, including confirming that you have reported all exclusions, all independent variables and manipulations you analyzed, and your sample size calculation and data collection stopping rules.
- A move away from statistical significance, with expectation that analyses conform to what they call “the new-statistics movement” (with a tutorial for that). (Here’s my quick explainer on that issue.)
This represents a much more profound change on the editorial side than just adding a badge opportunity. And if being able and willing to do any of this is associated with having data and materials in a share-able state (including any required permissions), then the co-interventions and any interplay between them are potential contributors to any outcomes observed. Add to these journal-side interventions the social marketing of the badges launch by influential open science advocates (ostensibly) independent of the journal.
To isolate the impact of badges, you would need groups that had the same conditions except the badges. One comparison group would then have compliance publicly shown only with words; another would have all the same criteria, but no public reward.
Claim 2: Adding badges is low cost.
There were no data on this. The resource implications for journal staff and peer reviewers were not discussed. Some of the co-interventions appear resource-intensive for journals, both to establish and to implement.
Then there’s the really big question: the cost for authors, on whom the bulk of the effort here lands. The badges-get-you-huge-bang-for-not-much-buck pitch has become a major part of the social marketing strategy for open science, hasn’t it? Yet high quality open data practice takes a lot more effort than the alternative.
The entire package, or even just the requirements to get a badge, would be resource-intensive for any authors who aren’t already open science practitioners and into what they call “new statistics”. And the impact of that probably lies at the heart of what we will look at under the next claim.
Claim 3: There were no other repercussions.
To be clear, the authors definitely did not state there was an absence of other repercussions. They just seem to have assumed that there wouldn’t be, and didn’t report a plan for identifying unintended effects.
One issue jumped out with neon lights flashing and bells ringing as soon as I started looking at the data, though. It’s not in the paper, as absolute numbers are sparsely reported there. The paper focuses on percentages.
A part of why the percentage of articles reporting open data increased at Psychological Science (PS) was the denominator dropping: after January 2014, they published considerably fewer articles in total.
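The denominator effect is easy to underestimate. Here’s a minimal sketch of the arithmetic, using hypothetical article counts (not figures from the badges study), showing how the same number of open-data articles yields a much higher percentage once a journal’s total output falls:

```python
# Hypothetical counts, for illustration only - not data from the badges study.
def share_rate(open_articles, total_articles):
    """Percentage of a journal's articles with open data."""
    return 100 * open_articles / total_articles

before = share_rate(8, 360)  # 8 open-data articles out of 360 published
after = share_rate(8, 150)   # the SAME 8 open-data articles, but far fewer published

print(f"before: {before:.1f}%, after: {after:.1f}%")
```

With these assumed counts, the “open data rate” more than doubles without a single extra dataset being shared, purely because fewer articles were published.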
Productivity: Psychological Science (red) & comparator journals publishing 2002 to 2016 (PubMed IDs)
The data for this chart are here. They don’t come from the badges study data, for 2 reasons. First, the time period was too short – I couldn’t think about what the data might mean without seeing them in the context of longer trends, because the number of events per journal is so small. (There are only 3 comparator journals, because one didn’t start publishing till 2013. Note too that this is PubMed records, so it includes non-research items in journals.)
Journals in psychology were rocked by a crisis in 2011. That’s what spurred the intense interest in improved methods and reporting in this field. At Psychological Science, it turns out, after having retracted only 1 article ever before 2011, between 2011 and 2012 they retracted 9.
Secondly, the badges study counts the time of publication by inclusion in a “print issue” list of contents – not when the article was actually published (the e-published date). It was March 2014 before any post-2013 articles appeared, and May 2014 before the first badge was awarded. The basic trend is the same, either way. And the upshot is that in 2016, according to PubMed, PS was publishing far less than half the items it published the year before it introduced the new policies (from 363 to 145).
Why? Without more data on practice and policy at PS, there’s not much to go on. This would be consistent, though, with the new policies deterring authors who didn’t already practice open science. There wasn’t a major discipline-wide drop in article production: one of the competitor journals has overtaken PS, and new journals have started up.
Of course, with so many co-interventions, you can’t say it’s the badge-related criteria that reduced article production so much. But if you’re going to claim the benefits of more articles with shared data and materials, then you have to own the drop in journal productivity (and its potential implications) too.
Claim 4: Adding badges alone is dramatically effective.
There is a lot I could say here, but I will limit it to a few key points. We’ve already seen some of those above: there are too many co-interventions to know what contribution badges made. And the increase in sharing without considering productivity exaggerates the impression of effectiveness.
It all suggests to me that the predominant effect we’re seeing here is a form of “natural selection”, with the journal ratcheting up its standards and possibly rejecting more, and its new systems potentially repelling less “open” authors. It would be really helpful to know what happened to submission and rejection rates at PS from 2010 onwards.
It seems from a superficial look at the data that the authors of 28% of the articles that qualified for badges rejected the offer of the badge. If that is the case, it undercuts the badge as an incentive, too. The absolute number of badges was so small, though, and the timeframe so short, that all the data have a very high risk of bias. (To help put it in perspective, the mean number of badges in the last 6 months of the study was 4.4 a month; in the 6 months prior, it was 3.7.)
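For what it’s worth, the uptake arithmetic behind a figure like that 28% is simple. Here’s a sketch with hypothetical counts (the paper mostly reports percentages, so these absolute numbers are assumed purely for illustration):

```python
# Hypothetical counts, for illustration only - not reported in the badges paper.
qualified = 50   # articles meeting the badge criteria (assumed)
badged = 36      # articles whose authors took up the badge offer (assumed)

declined_pct = 100 * (qualified - badged) / qualified
print(f"badge offer declined: {declined_pct:.0f}%")  # 28% with these assumed counts
```

With event counts this small, a handful of articles either way shifts the percentage substantially, which is part of why short-timeframe data like these are so fragile.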
The methods of the study carry a high risk of other forms of bias as well. For example, although the authors went to a lot of effort to make sure the different coders were in sync, there was only a single coder for each study. In my neck of the woods we would consider that a high risk of bias, especially coupled with coders not being blinded to which journal the articles came from.
The reporting also has a high degree of researcher spin, which always sets off alarm bells – like an abstract with no absolute numbers, and a full text that reports proportions without the absolute numbers behind them.
The longer I looked at the data in this study and its context, the harder the impact of badges became to discern. And the relevance of these data to fields outside psychology is questionable too. Studies that need no data availability statement, because all the data are in the paper and supplementary files, are common in some study types, for example.
It doesn’t mean of course that badges have no effects, especially when accompanied by intense marketing by opinion leaders. But interventions hyped from uncontrolled research generally speaking don’t turn out to be as “dramatic” as their proponents believe. And with funders increasing their requirements for open data, the waters here have been getting murkier.
Claim 5: Adding badges increases use of independent repositories.
This is part of the intent of the intervention, designed, as it was, by an independent repository group. I didn’t look at the data on this question. Let’s just assume it achieves this aim as they say it does. And move on to what all this means – for us as readers, and for open science.
It was encountering these badges as a reader recently at Psychological Science that propelled me into writing this post. The marketing hype from advocates had disturbed me ever since this clearly weak study came out. But it coalesced into concern seeing it in action, and engaging with the badge group on Twitter about it.
Here’s what the article in question looked like at home:
Open data badge, open materials badge – on a closed access article. (The open data wasn’t included as supplementary material, and nor was there a link to it outside the paywall. To their credit, they are going to try to organize accessible links in response to my complaint.)
Here’s what the article looked like at work, where we have a subscription to the journal:
For anyone with a subscription viewing this, there will be the impression that these authors are practicing fully open science. They are not.
The Twitter conversation about why the open data/materials badges are so limited ended up with the argument that they are designed so that high impact closed journals can say they have, or are moving to, more open practices – and that access to articles isn’t a problem for open science.
I don’t think the argument that an article is not part of a study’s data flies. It’s as though we’ve moved from saying the paper is the study, to the paper isn’t even part of the study! And open, for data and materials, is defined here in such a way that you can leave essential data and methodological information behind the paywall – and still gain open practice badges.
Part of the reason for openness of studies’ artifacts is to enable critical appraisal – the kind I just did. And not only from people with subscriptions.
Sometimes, you understand more from a study’s data than the paper – if you have the time and skills. But usually, you need as many artifacts as you can get. What’s more, if the authors stick to the specific criteria needed to get an open badge, then they can leave out data they didn’t analyze for the paper but which is essential to check on their analytical choices and interpretations.
Funders’ open data requirements aren’t so limited, and that matters. A large part of what people who use data from others’ previous studies are studying has nothing to do with the authors’ original analyses. In this analysis, over 70% of the requests for clinical data from a repository that gave a reason were for asking new questions.
One of the most compelling talks I’ve ever heard about open-ness was by Audrey Watters, at Open Con 2014. (Check out the blog post at that link even if not the full talk.) She eloquently spelled out the dangers of openwashing. A bit like people prominently badging high sugar foods as “low fat”.
The version of data and materials sharing attached to the open badges isn’t the only path here. There are funders’ mandates and Dryad’s Joint Data Archiving Policy (JDAP) that encourage more open routes, for example. These practices and fully open journals have been growing without badges (see here, here, here, here), sometimes at vastly higher rates than in PS.
It’s an open question, really, whether the spread of this badges brand will be a net gain or loss for open-ness in science. What if it spreads at the expense of more effective and open pathways? People who want to be open practitioners, but need to publish some things in high impact closed journals could do better for others by trying those that accept their work after it’s posted as a preprint with open data, rather than seeking out badges.
Here’s another from my personal badge collection. It’s pinned next to my computer at work. It’s an important one.
“Am I fired yet?” is one of several reminders to myself to put principles and public interest ahead of my personal interest. I learned, painfully, decades ago, that there is a strongly magnetic pull towards your personal interests. I learned it in advocacy movements. It’s not only commercial interest that’s a risk. You truly need to minimize bias in your approach to evidence, or you are lost.
As I’ve spent time with the badges “magic bullet” – simple! cheap! no side effects! dramatic benefits! – supported by a single uncontrolled study by an influential opinion leader, with a biased design in a narrow, unrepresentative context, a very small number of events, and a short timeframe… I’ve come to think its biggest lesson may be that even many open science advocates have yet to fully absorb the implications of science’s reliability problems.
[Continued on 31 August: FAQ with more data….What’s Open, What’s Data? What’s Proof, What’s Spin?]
[Update, 30 August 2017] Comment posted at PubMed Commons.
Disclosures: My day job is with a public agency that maintains major literature and data repositories (National Center for Biotechnology Information at the National Library of Medicine/National Institutes of Health). I have had a long relationship with PLOS, including several years as an academic editor of PLOS Medicine, and serving in the human ethics advisory group of PLOS One, as well as blogging here at its Blog Network. I am a user of the Open Science Framework.
* The thoughts Hilda Bastian expresses here at Absolutely Maybe are personal, and do not necessarily reflect the views of the National Institutes of Health or the U.S. Department of Health and Human Services.