Physicians have a standing joke they like to tell whenever a new wonder drug comes on the market.
“Use it quickly, because it will be useless after the first year.”
They are not talking about the natural decay of pharmaceuticals over time. (Everyone knows you should never take meds that have been in your medicine cabinet for over a year. They deteriorate.)
No, the objects of the joke are the newly discovered drugs that promise to alleviate depression, calm agitation or lower blood pressure.
“They seem to work well for the first six months they are around,” says Bob, a family physician. “But a year later, no one is talking about them any more. It’s funny that these drugs seem to lose their effectiveness as time goes on. I don’t know why. Perhaps it’s my imagination.”
It’s not imaginary. The decline effect, as it is popularly known, has been noted not only in medicine, but also in the basic sciences including biology, chemistry and physics. Papers trumpeting the discovery of marked phenomena are followed by research that reports much more modest effects. But the decline effect is most pronounced in medicine, particularly among mood-altering drugs such as tranquilizers and anti-depressants.
Steven Novella, a clinical neurologist, described the effect in a 13 December 2010 blog post. A January article in Nature News by Jonathan Schooler dealt with the same phenomenon.
(An interesting aside: a passing comment by Schooler that the very fact of scientists observing a phenomenon could “change some scientific effects” brought a rebuke from Novella, who described the comment as “dangerously close to quantum woo.” Novella’s discussion appears in a June 2011 blog post.)
Despite that disagreement, Novella and Schooler concur on the more prosaic causes of the decline effect; but neither organizes these causes in a coherent way, so I’ll do it for them. We can slot them into three non-disjoint categories: problems with preliminary investigative research (cherry picking, part one), study design glitches (polishing the cherries) and publication bias (cherry picking, part two).
Cherry picking, part one
I was once involved in a major observational study designed to tease out the major risk factors and preventive factors for Alzheimer’s disease. We interviewed more than 10,000 subjects, checking them for signs of the disease and also looking at factors as diverse as diet, exercise, occupation and exposure to chemicals. The data analysis included an assessment of whether each of these factors — and there were more than 70 of them — was associated with Alzheimer’s, other dementias or cardiovascular diseases. We found the usual culprits: older people had higher rates of dementia; overweight people were more at risk of stroke or heart disease. But one factor stood out as preventive. People who drank a lot of tea had remarkably low rates of Alzheimer’s disease.
Now this was a cherry prime for picking — an unexpected result, a simple prevention for a nasty disease. Publication could bring substantial fame. But, to its credit, our research team was circumspect about announcing the tea result. Instead of rushing to print, the neurologists, neurophysiologists and neuropsychologists hit the literature to find a physiological rationale for this surprising result. Nothing.
By then, most of us suspected that we were looking at a statistical artifact — we had just happened by chance to select a sample of subjects that included a lot of tea drinkers who also happened to be free of Alzheimer’s disease. The protective effects of tea drinking never reached the general medical community. And sure enough, the effect declined in future studies.
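The tea result is textbook multiple comparisons: test enough unrelated factors and chance alone will flag one of them. A minimal simulation, with illustrative numbers rather than anything from the actual study, shows how often a screen of 70 truly unrelated factors turns up at least one spurious “finding”:

```python
import random

random.seed(1)

def simulate_false_positives(n_factors=70, alpha=0.05, n_runs=1000):
    """Simulate a study that tests many unrelated factors.

    Each factor is truly unassociated with the disease, so each test
    has probability `alpha` of coming up 'significant' by chance alone.
    Returns the fraction of simulated studies that flag at least one
    spurious 'protective' or 'risk' factor.
    """
    hits = 0
    for _ in range(n_runs):
        # A null factor is falsely flagged with probability alpha.
        if any(random.random() < alpha for _ in range(n_factors)):
            hits += 1
    return hits / n_runs

# Theory says the chance is 1 - 0.95**70, roughly 97 per cent.
print(simulate_false_positives())
```

With 70 independent tests at the conventional five per cent level, a study of purely irrelevant factors is almost certain to hand you at least one cherry.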
But cherries like this do get published, and that is what brings the decline effect to light. One group of researchers I knew thought they had discovered a genetic marker for Alzheimer’s disease. Surveying a long list of blood proteins, they found one that was particularly prevalent in Alzheimer’s patients. This shotgun approach usually generates spurious results: look in enough corners and chance will come through with a significant result. An added trouble was that the researchers had to delete some data to make the statistics back their contention. They published. There was a bit of excitement. And in confirmatory research, the effect disappeared. What had appeared to be a nice ripe cherry had turned out to be a lemon.
It does not take much imagination to suspect that pharmaceutical companies also indulge in cherry-picking. By chance an early trial shows an outstanding effect for compound X. All research focusses on compound X; negative results are downplayed and positive ones followed up. Compound X goes on to the clinical trial stage.
Polishing the cherries
Even if a particular treatment or preventive measure reaches the clinical trial stage, the design of the trial can artificially enhance a result by using a select group of patients and a select set of outcome measures. Steve Novella described the situation in a recent Skeptics’ Guide to the Universe podcast (13 June 2011).
“There [are]. . . subtleties in how the research is designed. . . . There are lots of choices … as to how to design a study. It’s not always obvious or straight forward. For example, your inclusion and exclusion criteria — what people are you going to study the drug on? . . . . We don’t want people to have too many . . . coexisting conditions or to be on certain kinds of other drugs . . . . But also the outcome measures. We choose the outcome measures that have the best chance of looking positive. [The researchers] may do some preliminary testing where they look at four or five different outcome measures. Then they pick the one that looks really good and they use that in their big trial. So there’s lots of subtle ways to tweak a trial so it looks totally good on paper but the process was all geared towards exaggerating the positive effect of the study. And then when it gets used in the real world on patients with every kind of disorder and other drugs and . . . more real world outcome measures are being used, you can’t expect that in the pristine . . . context of the clinical trial that the effect size is going to be the same.”
As Novella notes, there is nothing necessarily deceitful about this. Clinical trials have to be pristine in order to determine the fundamental efficacy of an intervention; the pragmatic studies that follow determine how effective the intervention is in the real world.
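A toy simulation, with entirely hypothetical numbers, makes Novella’s point about outcome measures concrete: when several measures all share the same modest true effect and the pilot study picks whichever looked best, the chosen measure’s pilot estimate runs high, and a fresh trial of that single measure regresses toward the truth:

```python
import random

random.seed(2)

def pilot_then_trial(true_effect=0.2, noise=1.0, n_outcomes=5, n=50):
    """Illustrate outcome-measure picking (all numbers hypothetical).

    Five outcome measures share the same modest true effect. The pilot
    keeps whichever measure looked best; the follow-up trial re-measures
    only that one. Selection inflates the pilot estimate; the follow-up
    falls back toward the true effect.
    """
    def observed():
        # Mean of n noisy observations centred on the true effect.
        return true_effect + random.gauss(0, noise / n ** 0.5)

    pilot = [observed() for _ in range(n_outcomes)]
    best = max(range(n_outcomes), key=lambda i: pilot[i])
    return pilot[best], observed()  # (pilot estimate, follow-up estimate)

pilot_best, follow_up = pilot_then_trial()
```

Averaged over many runs, the picked measure’s pilot estimate overstates the true effect (by roughly the standard error times the expected maximum of five standard normal draws), while the follow-up does not — a built-in decline, with no misconduct anywhere.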
Cherry picking, part two
Here’s a fact that health sciences students learn, or should learn, in their statistics classes. On average, one out of every twenty ineffective treatments tested in clinical trials will appear significantly effective. I won’t go into details here; just know that if the treatment under scrutiny is in fact ineffective, there is a five per cent chance that a carefully conducted clinical trial of the treatment will render a false positive.
Now consider the fact that there are more than 100,000 clinical trials underway around the world at any time. Let’s be conservative and say that just 10 per cent of these trials (10,000 of them) are testing treatments that are in fact ineffective. Then five per cent of those 10,000, or 500 trials, are going to show statistically significant effects.
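The arithmetic can be written out directly; the inputs are the article’s own rough figures, not measured data:

```python
# Back-of-the-envelope figures from the text (assumptions, not data):
trials_underway = 100_000   # clinical trials running worldwide
frac_ineffective = 0.10     # conservative guess: truly ineffective
alpha = 0.05                # conventional false-positive rate

ineffective = trials_underway * frac_ineffective    # 10,000 trials
false_positives = ineffective * alpha               # 500 spurious 'successes'
print(int(ineffective), int(false_positives))
```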
And those 500 are the ones that get published; the results of the other 9,500 never see print. The trouble is, the results of those 9,500 are more valuable than the phoney 500. But journal editors don’t like nonsignificant results.
This phenomenon is called publication bias, and it leads to the spread of drugs that appear at the outset to be effective but whose effectiveness vanishes within a year of licensing.
In his Nature article, Jonathan Schooler suggests a solution: register all trials at their outset. Then the nonsignificant results will show up. Schooler says:
“I suggest an open-access repository for all research findings, which would let scientists log their hypotheses and methodologies before an experiment, and their results afterwards, regardless of outcome. Such a database would reveal how published studies fit into the larger set of conducted studies, and would help to answer many questions about the decline effect.”
Notice that Schooler does not restrict his recommendation to medical research.
In fact, medical researchers are already on this: open registries for clinical trials are on-line. Putting “clinical trial registry” into your favourite search engine should bring up several links.
The causes of the decline effect remind me of the statistical phenomenon called regression to the mean, which says, essentially, that an exceptional event will likely be followed by a more mundane one. So it is with the newest wonder drug — initially exceptional, finally mediocre.
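Regression to the mean shows up in a few lines of simulation, again with illustrative numbers: take two independent noisy measurements of the same modest effect, condition on the first being exceptional, and the second falls back toward the mundane truth.

```python
import random

random.seed(3)

# Each 'drug' has the same modest true effect; each trial adds noise.
true_effect, noise = 0.2, 0.3
first = [true_effect + random.gauss(0, noise) for _ in range(10_000)]
second = [true_effect + random.gauss(0, noise) for _ in range(10_000)]

# Condition on an exceptional first result (well above the true effect)...
exceptional = [i for i, x in enumerate(first) if x > 0.7]

# ...and the retest regresses toward the mundane true effect of 0.2.
retest = sum(second[i] for i in exceptional) / len(exceptional)
print(round(retest, 2))
```

Nothing about the drugs changed between measurements; the exceptional first results were exceptional partly because of luck, and the luck does not repeat.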