It started in June, with cardiovascular disease. Then August ended with another bout, but in cancer. It’s been a rough few months for non-inferiority clinical trials!
We’ve known for as long as there have been non-inferiority trials that the evidence they produce is bound to be, well, inferior. A “normal” clinical trial tries to establish whether a treatment (or whatever) is better than a placebo or another intervention: that’s a superiority trial. A non-inferiority trial lowers the bar a long way, and that makes it easier to do. It is just shooting for showing the treatment is “no worse” – more or less.
There’s a detailed explanation by Steven Snapinn of this, and of non-inferiority’s close cousin, equivalence, here. He wrote:
…the results of such trials are not as credible as those from a superiority trial…(A) noninferiority trial that successfully finds the effects of the treatments to be similar has demonstrated no such thing.
Snapinn rolled out a slew of inherent problems already well-known then, back in 2000. Even blinding might not work as well as in superiority trials. If you are assessing subjective outcomes blind to the person’s treatment in a non-inferiority trial, and you are biased to believing the treatments are equal, then it’s easy to reckon everyone’s outcomes are not too different, Snapinn argued.
There has been some good news established by non-inferiority trials – Venkatesh Murthy & co wrote about some of those, with cautious optimism (2012). I used those authors’ search strategy to get a rough idea of whether the number of them was still rising. It is. They are still a tiny proportion of all trials, but they seem to be increasing more quickly. (Details and link to this data are below this post.)
There may be relatively few of them, but they have an outsized influence. In 2010, a Government Accountability Office (GAO) report found that the FDA approved 18 out of 43 new drug applications between 2002 and 2009 based on non-inferiority trials.
Which brings us to the first of the 2 new studies on non-inferiority trials.
Behnood Bikdeli and colleagues dug out and assessed non-inferiority trials in cardiovascular disease from 3 of the highest profile general medical journals, from 1990 to 2016 (June 2019). They found 111 of them, nearly half of them published after 2010. Most of the trials were funded wholly or partly by industry. More than half were trials of drugs (69.5%).
The trials mostly concluded that the treatments they tested were non-inferior (77.5%). The scary part?
Only 7 (6.3%) trials were considered low-risk for all the major and minor biasing factors.
There were the usual sets of problems common in any bunch of trials. On top of those came the weaknesses peculiar to non-inferiority trials. The non-inferiority margins – how high (or low) the bar was set for counting a draw or better – “varied widely”, the authors wrote. And often no justification was given for how the margin was chosen: that was the case in a whopping 38.7% of the trials. For the trials that had a methods paper or protocol (60% of them), “1 in 8 had discrepancies related to the non-inferiority margin, compared with the published manuscript”. Seeing goal posts shifted without a very justifiable reason would be alarming, for sure.
Next up was Bishal Gyawali and colleagues, with a systematic review (30 August 2019). Their study doesn’t include as many trials, but we don’t have to consider the possibility of influence from a few journals’ biased choices here (like potentially preferring studies with “positive” results).
Gyawali & co looked for non-inferiority trials of cancer drugs that had overall survival as an outcome. They found 128 non-inferiority trials of cancer treatment, of which 74 were drug trials, and 23 of those met their criteria. That means that 69% of these cancer drug trials used less reliable measures like progression-free survival, leaving patients and doctors not really sure about what they might be trading off.
The story for the 23 cancer drug trials they studied was similar to the 111 cardiovascular trials: again, 78% of the trials concluded non-inferiority. Most were wholly or partly industry-funded. The authors didn’t discuss justification given for the non-inferiority margin, but they did have criteria for justifying the use of non-inferiority design: the experimental drug was less toxic, cheaper for patients, was easier to use, or could potentially improve quality of life. And 39% of the trials didn’t clear that bar.
What the authors report about the non-inferiority margins was eye-opening. The variation was wide, even here, where we are talking about the exact same outcome: 1.08 to 1.33 for the upper bound of the hazard ratio’s confidence interval (CI),
which means from an 8% to a 33% increase in the hazard of death was considered acceptable (noninferior) in these trials. Furthermore, in multiple cases, this upper limit was defined not for a 95% CI but for a 90% or even an 80% CI.
For most of the trials (70%), the accepted level was 22%. (Lowering the confidence level narrows the interval, which makes it easier to clear the statistical bar.)
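To see why the confidence level matters, here is a minimal sketch in Python. The point estimate and standard error are made-up numbers, not taken from any trial discussed here, and the function names are my own. The only figure borrowed from the review is the 1.22 margin. It assumes the usual normal approximation on the log hazard ratio and a two-sided interval.

```python
from math import exp, log
from statistics import NormalDist

def hr_upper_bound(hr, se_log_hr, confidence):
    """Upper limit of a two-sided confidence interval for a hazard ratio,
    using the normal approximation on the log scale."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return exp(log(hr) + z * se_log_hr)

# Hypothetical trial result (illustrative only, not from any study above).
HR_ESTIMATE = 1.05   # assumed point estimate of the hazard ratio
SE_LOG_HR = 0.10     # assumed standard error of log(hazard ratio)
MARGIN = 1.22        # margin: up to a 22% increase in hazard is tolerated

def check_levels(levels=(0.95, 0.90, 0.80)):
    """Return {confidence level: (upper bound, clears the margin?)}."""
    results = {}
    for level in levels:
        upper = hr_upper_bound(HR_ESTIMATE, SE_LOG_HR, level)
        results[level] = (upper, upper < MARGIN)
    return results

for level, (upper, ok) in check_levels().items():
    verdict = "non-inferior" if ok else "not shown non-inferior"
    print(f"{level:.0%} CI upper bound {upper:.3f}: {verdict}")
```

With these invented inputs, the very same result misses the 1.22 margin at the 95% and 90% levels, but clears it at 80%. Nothing about the drug changed: only the width of the interval did.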
When Gyawali’s team pooled the data from all of the trials in a meta-analysis at the 95% CI level, the experimental drugs didn’t worsen (or improve) survival. (I explain meta-analysis here.)
Noninferiority trials may be attractive because of the high probability of success … However, our data show that institutional review boards and drug regulators should take an active role in adjudicating whether the noninferiority design is acceptable for the given question. When noninferiority design trials are considered important, the criteria to define noninferiority should be clearly defined based on a widely accepted rationale and should incorporate patient input.
Incorporating patient preferences into designing non-inferiority trials was the subject of another paper in June, by Sergio Acuna and colleagues. That’s good to see. There are lots of untested assumptions about what matters to people in this area, including attitudes to placebos.
A common argument for using non-inferiority designs is that a placebo control group isn’t acceptable. But that’s not enough reason on its own. There can still be reasonable active treatments to compare against in effectiveness trials. Simone Lanini and colleagues used simulated data to argue earlier in August that another option, the adaptive randomized trial, has advantages of a non-inferiority trial while still being a superiority trial. (Adaptive trials are planned to change course based on analyses as the trial progresses.)
These recent ones aren’t the first studies on these questions, and they haven’t all had worrying results – a study of 170 trials in 2010, for example (challenged here). With the pace and influence of these trials heating up, I hope there’s a good review coming down the line to get this into perspective. If the number and influence of these trials keep growing without them being rigorous enough, this is a critical issue.
In their cautiously optimistic 2012 paper, Murthy & co’s caution came from a fear that we could get to the point where new drugs were being approved based on non-inferiority with other drugs that had themselves been approved on non-inferiority trials. It would be reassuring if we’re still not there yet. But people could be relying on deeply inferior evidence like this for other decisions, too. Proving something is only loosely possibly non-inferior to another possibly only non-inferior something might well be a quick and easy road: but it’s several degrees removed from being shown to be superior to doing nothing.
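A rough back-of-the-envelope sketch of that worry, in Python. It is my own illustration, not a calculation from Murthy & co. It assumes the worst case, where each new drug sits right at the edge of its margin, and that hazard ratios multiply along the chain (proportional hazards). The function name is hypothetical.

```python
def worst_case_hazard_ratio(margin, generations):
    """Worst-case hazard ratio versus the original comparator after a chain
    of non-inferiority approvals, each allowed to be worse by `margin`.
    Assumes every drug lands exactly on its margin and hazard ratios
    multiply along the chain (both are simplifying assumptions)."""
    return margin ** generations

# 1.22 is the margin most of the cancer drug trials accepted.
for n in range(1, 4):
    hr = worst_case_hazard_ratio(1.22, n)
    print(f"{n} step(s) removed: hazard up to {hr - 1:.0%} higher than the original")
```

Real drugs won’t usually sit at the very edge of the margin, so this is an upper limit on the slippage, not a prediction. It does show how quickly “no worse, more or less” can stack up.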
Also in the last couple of months….
2 trials, same result, different interpretations: @paulpharoah has written a wonderful post to start a discussion about non-inferiority margins and #Clinicaltrial interpretation at https://t.co/9GvSXg8c0I
— Frank Harrell (@f2harrell) July 17, 2019
The data behind the chart for the rise of non-inferiority trials is here. I chose the cut-off date of 2015 to allow for some time lag in PubMed tagging of clinical trials. Not every one of the 4,418 search results is necessarily a non-inferiority trial, and not every non-inferiority trial will be included: it’s an indicator of a trend only. I used the search strategy for non-inferiority trials from the paper by Venkatesh Murthy and colleagues (2012), and also ran it without the non-inferiority terms to count all trials. Both searches were done on 31 August 2019:
(“non-inferior” OR “non-inferiority” OR “noninferior” OR “noninferiority”) AND (clinical trial[pt] OR randomized controlled trial[pt])