There’s been a kerfuffle in the media recently (March 2012) about a blood test that will with “90 per cent accuracy” predict whether people over 70 will develop Alzheimer’s Disease within three years. You can Google “test Alzheimer” to get the details. Or go to Voice of America, which quotes neuropsychologist Mark Mapstone:
“The biomarker[s] were able to detect at 90 percent accuracy those who would go on to develop the disease. . . .”
On the other hand, Steve Connor in The Independent writes:
“The test raises ethical concerns, however, as it is only 90 per cent accurate in its current form – meaning that up to one in ten people could be wrongly diagnosed with a disease for which there is no effective treatment.”
Both reports acknowledge the limitations of a test for a disease for which we have no cure, but both radically misinterpret what the “90 percent accuracy” means. Ironically, Connor, in attempting to raise the legitimate concern about false positives, gets it dead wrong. The “90 per cent accuracy” refers only to those who will contract the disease; it says nothing about how the test performs on people who will not get the disease, and neither news report mentions that. Yet when we account for the people who will not get the disease, we find that the accuracy of the test is far from 90 per cent.
How to evaluate a medical test: be specific as well as sensitive
All this is old news to epidemiologists, who are experts in evaluating medical tests. One thing they are clear about is this. No test is perfect, but there are two — not one — measures of just how good a test is. The sensitivity of a test is the proportion or percentage of diseased individuals who come up positive on the test. The 90 per cent that the reports are talking about is the sensitivity of the blood test. But what about the non-diseased people? Do 90 per cent of them register as positive on the test? If they did, this would be a bloody awful blood test. If the test is any good at all, you would like as many as possible of the non-diseased subjects to come up negative on the test. The percentage of non-diseased subjects who register as negative on the test (a good thing!) is called the specificity of the test. The news reports ignore this important statistic.
Why is specificity important? Well, let’s see.
Suppose we have a test with 90 per cent sensitivity and, say, 95 per cent specificity. (There is usually no relation between sensitivity and specificity: they are rarely the same.) Now let’s unleash our test on 1000 people. How many will test positive and how many negative?
There’s a problem here: we don’t know who has the disease and who doesn’t. After all, if we knew that, we would not need the test in the first place. But we often know the prevalence of the disease — what percentage of people in the population have the disease. For example, we know that two per cent of the over-70 population will develop or have developed Alzheimer’s. So about two per cent (or 20) of our 1000 subjects have the disease. This two per cent is also often called the prior probability of disease because the prevalence of disease is often a measure of this probability.
Now, although we don’t know exactly who has the disease, we do know that about 90 per cent of the diseased people will test positive. (That’s what 90 per cent sensitivity means, remember?) So 0.9 times 20 (that’s 18) of the diseased people will give positive tests. The other two will give (erroneous) negative tests. So far so good.
But 98 per cent of the 1000 (or 980) do not have the disease and of these 95 per cent (the specificity of the test is 95 per cent) will give negative test results (a good thing) and five per cent will give false positive results. Now 95 per cent of 980 is 931. And five per cent of 980 is 0.05 times 980 = 49.
(The astute reader will already be objecting to my cavalier announcement that exactly 18 (and not 16, 17, 19 or 20) diseased subjects will provide positive tests. The astute reader would be correct if we were talking about a single experiment. But I am talking about the big picture: if we ran this study thousands of times, then the average would be 18. The same goes for the contents of the other cells.)
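For the astute reader, here is a minimal simulation sketch of that "big picture" claim. It assumes each study has exactly 20 diseased subjects who test positive independently with probability 0.9; the function name, the 5000 repetitions and the seed are my own illustrative choices, not part of the original argument.

```python
import random

def simulate_true_positives(n_diseased=20, sensitivity=0.9, n_studies=5000, seed=1):
    """Average number of diseased subjects who test positive across repeated studies."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    for _ in range(n_studies):
        # Each diseased subject independently tests positive with probability = sensitivity.
        total += sum(rng.random() < sensitivity for _ in range(n_diseased))
    return total / n_studies

average = simulate_true_positives()
print(f"average true positives per study: {average:.2f}")
```

Any single study may give 16 or 20, but the average over many studies should settle close to 18.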
So, to summarize, we have
Diseased subjects: 18 test positive, 2 test negative
Non-diseased subjects: 49 test positive, 931 test negative
So what do we see? We see 18 + 49 = 67 positive tests and 2 + 931 = 933 negative tests.
What we don’t see is which of those 67 test-positive people actually have the disease. If we knew that, we would not need the test in the first place. So, do we tell those 67 people that they have a 90 per cent chance of having the disease? After all, the test is 90 per cent accurate, isn’t it?
In fact only 18 of the 67 test-positive people have the disease. So the chance that any one of the 67 has the disease is 18/67 = 0.269 or about 27 per cent. That’s a long way from 90 per cent.
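The arithmetic so far can be sketched in a few lines of Python. The function and variable names are hypothetical, and the inputs are the example's assumed values (1000 subjects, 90 per cent sensitivity, 95 per cent specificity, two per cent prevalence).

```python
def expected_counts(n, sensitivity, specificity, prevalence):
    """Expected counts for the four test/disease combinations."""
    diseased = n * prevalence
    healthy = n - diseased
    true_pos = diseased * sensitivity    # diseased and test positive
    false_neg = diseased - true_pos      # diseased but test negative
    true_neg = healthy * specificity     # healthy and test negative
    false_pos = healthy - true_neg       # healthy but test positive
    return true_pos, false_neg, false_pos, true_neg

tp, fn, fp, tn = expected_counts(1000, 0.90, 0.95, 0.02)
chance = tp / (tp + fp)  # share of test-positive people who are diseased
print(f"positives: {tp + fp:.0f}, of whom diseased: {tp:.0f}, chance: {chance:.3f}")
# prints: positives: 67, of whom diseased: 18, chance: 0.269
```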
We can summarize all this in a simple table that classifies the various counts I listed above.
Test    D+     D-     Total
T+      18     49     67
T-      2      931    933
Total   20     980    1000
The columns of the table count the number of subjects who have the disease (D+) or who are free of the disease (D-). The table rows contain the number of test-positive (T+) or test-negative (T-). For example, at the intersection of the T+ row and the D+ column, we see that 18 subjects were both D+ and T+.
It will help your understanding if you use a spreadsheet to reproduce this table. If you are brave enough, you can organize things so that you simply need to enter the sensitivity, specificity, prior risk of disease and the sample size and let the spreadsheet fill in the table. I list some hints at the end of this post.
Now you can see directly from the table that the predictive value of the test is 18/67. This (and I know I am repeating myself) is the probability that a person in the test-positive (T+) group actually has the disease. Officially, this probability is called the positive predictive value of the test. There is another number you can get from the table and that is the negative predictive value. What do you think that is? Think a bit before you read on.
Reading on . . . . The negative predictive value of the test is the probability that a person with a negative test (T-) actually does not have the disease. For this test it is 931/933 = 0.998. Pretty good. If you come up negative on the test, you can be almost certain that you don’t have the disease.
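If you prefer code to a table, both predictive values can be read off the four cell counts. This is only a sketch; the function name and argument names are mine.

```python
def predictive_values(true_pos, false_pos, false_neg, true_neg):
    """Return (PPV, NPV) from the four cells of the 2x2 table."""
    ppv = true_pos / (true_pos + false_pos)   # diseased share of the T+ row
    npv = true_neg / (true_neg + false_neg)   # disease-free share of the T- row
    return ppv, npv

ppv, npv = predictive_values(true_pos=18, false_pos=49, false_neg=2, true_neg=931)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
# prints: PPV = 0.269, NPV = 0.998
```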
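Here is a modification of the first table that includes calculation of the predictive values.
Test    D+     D-     Total   Predictive value
T+      18     49     67      PPV = 18/67 = 0.269
T-      2      931    933     NPV = 931/933 = 0.998
Total   20     980    1000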
Doing it all in one fell swoop. Run, math cowards, run!
If you made up your spreadsheet in the way I suggest below, you can fiddle about with the sensitivity, specificity and prior probability of disease to see what makes for the best predictive values. In doing so, you may have noticed that setting the specificity and the prior probability high results in an excellent PPV and setting the sensitivity high and the prior probability low makes NPV high (at the expense of PPV). So what exactly is the relationship between all these parameters? To answer that question, we turn to our old friend, high school algebra.
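For those without a spreadsheet to hand, here is a rough Python stand-in for the fiddling. The grid of settings is hypothetical, chosen only to move one parameter at a time away from the worked example; `ppv_npv` is my own helper name.

```python
def ppv_npv(sensitivity, specificity, prior, n=1000):
    """PPV and NPV from the expected 2x2 table for n subjects."""
    diseased = n * prior
    healthy = n - diseased
    true_pos = diseased * sensitivity
    true_neg = healthy * specificity
    false_pos = healthy - true_neg
    false_neg = diseased - true_pos
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)

# Hypothetical settings to try; the first row is the worked example.
settings = [
    (0.90, 0.95, 0.02),
    (0.90, 0.99, 0.02),  # higher specificity
    (0.99, 0.95, 0.02),  # higher sensitivity
    (0.90, 0.95, 0.20),  # higher prior probability
]
for sens, spec, prior in settings:
    ppv, npv = ppv_npv(sens, spec, prior)
    print(f"sens={sens:.2f} spec={spec:.2f} prior={prior:.2f} -> PPV={ppv:.3f} NPV={npv:.3f}")
```

Raising the specificity or the prior probability should lift the PPV noticeably, while raising the sensitivity mostly helps the NPV.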
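First, notice what happens when I fix the sample size on my table as 1, rather than 1000.
Test    D+      D-      Total   Predictive value
T+      0.018   0.049   0.067   PPV = 0.018/0.067 = 0.269
T-      0.002   0.931   0.933   NPV = 0.931/0.933 = 0.998
Total   0.020   0.980   1.000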
Although the numbers in the table drop to one thousandth of their former value, the predictive values are unchanged. In fact the numbers in the table now represent probabilities rather than counts. For example, 0.018 represents the probability that a randomly-chosen subject is both test-positive and disease-positive.
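A quick check of that claim, as a sketch with the example's parameters as defaults (the names are mine): build the four cells for a sample size of 1000 and for a sample size of 1, and compare the predictive values.

```python
def table_cells(n, sensitivity=0.90, specificity=0.95, prior=0.02):
    """The four inner cells of the table for a sample of size n."""
    diseased = n * prior
    healthy = n - diseased
    return {
        "T+D+": diseased * sensitivity,
        "T+D-": healthy * (1 - specificity),
        "T-D+": diseased * (1 - sensitivity),
        "T-D-": healthy * specificity,
    }

results = {}
for n in (1000, 1):
    cells = table_cells(n)
    ppv = cells["T+D+"] / (cells["T+D+"] + cells["T+D-"])
    npv = cells["T-D-"] / (cells["T-D-"] + cells["T-D+"])
    results[n] = (ppv, npv)
    print(n, {k: round(v, 3) for k, v in cells.items()}, round(ppv, 3), round(npv, 3))
```

The sample size cancels out of both ratios, which is why the two rows of output should agree on the predictive values.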
Now let’s get down and dirty with the algebra. We first assign letters to the three parameters that determine our predictive values. I usually use Greek letters for this, but rather than set the math cowards into a dither, I’ll use Latin letters. Hell, no. Let them dither. Denote the three parameters as follows.
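Sensitivity: $\sigma$. Specificity: $\phi$. Prior probability: $\pi$.
With these general terms, the table becomes
Test    D+                D-                   Total
T+      $\sigma\pi$       $(1-\phi)(1-\pi)$    $\sigma\pi + (1-\phi)(1-\pi)$
T-      $(1-\sigma)\pi$   $\phi(1-\pi)$        $(1-\sigma)\pi + \phi(1-\pi)$
Total   $\pi$             $1-\pi$              $1$
In short the formulas for the two predictive values are:
$$\mathrm{PPV} = \frac{\sigma\pi}{\sigma\pi + (1-\phi)(1-\pi)}, \qquad \mathrm{NPV} = \frac{\phi(1-\pi)}{(1-\sigma)\pi + \phi(1-\pi)}$$
Each is simply a cell of the table divided by its row total; I leave the algebra as an exercise for the reader, as the maths texts say. The upshot of these two formulas is: (1) If you want a high PPV, make sure the specificity (not the sensitivity) is high and (2) If you want a high NPV, make sure the sensitivity is high. Counter-intuitive, no?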
And, yes, I did notice that I proved Bayes’ Theorem just now.
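For the record, here is the theorem in ordinary conditional-probability notation. The notation $P(\cdot \mid \cdot)$ is my addition for this aside, with D+, D-, T+ and T- meaning what they mean in the tables: $P(T+ \mid D+)$ is the sensitivity, $P(T- \mid D-)$ is the specificity and $P(D+)$ is the prior probability.

```latex
% Positive predictive value: probability of disease given a positive test
P(D+ \mid T+) = \frac{P(T+ \mid D+)\,P(D+)}
                     {P(T+ \mid D+)\,P(D+) + P(T+ \mid D-)\,P(D-)}

% Negative predictive value: probability of no disease given a negative test
P(D- \mid T-) = \frac{P(T- \mid D-)\,P(D-)}
                     {P(T- \mid D-)\,P(D-) + P(T- \mid D+)\,P(D+)}

% where P(T+ | D-) = 1 - specificity, P(T- | D+) = 1 - sensitivity,
% and P(D-) = 1 - P(D+).
```

Plugging in 0.9, 0.95 and 0.02 gives 0.018/0.067 and 0.931/0.933, the same predictive values as before.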
Hints for making the spreadsheet table
Start by setting up spaces for the sensitivity, specificity, prior probability and sample size. Set up the blank table. Go to the bottom-most right-most cell of the table (where the 1000 appears in the example) and copy the sample size into it. Now go to the total of the D+ column (where 20 appears) and put in the formula for the total times the prior probability. Simple subtraction gives you the total D- group. Now applying the sensitivity and specificity to these two (D+ and D-) totals will give you the T+D+ and the T-D- cells. You are on your own.
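If you would rather not be entirely on your own, the same steps can be sketched in Python instead of a spreadsheet. This is one possible arrangement, not the only one; `build_table` and its layout are my own assumptions, and the remaining cells are filled by subtraction from the column totals.

```python
def build_table(sensitivity, specificity, prior, sample_size):
    """Fill the 2x2 table in the order the hints describe."""
    total = sample_size           # bottom-most, right-most cell
    d_pos = total * prior         # total of the D+ column
    d_neg = total - d_pos         # total of the D- column, by subtraction
    tp = d_pos * sensitivity      # T+D+ cell
    tn = d_neg * specificity      # T-D- cell
    fn = d_pos - tp               # T-D+ cell, by subtraction
    fp = d_neg - tn               # T+D- cell, by subtraction
    return [
        ["T+", tp, fp, tp + fp],
        ["T-", fn, tn, fn + tn],
        ["Total", d_pos, d_neg, total],
    ]

table = build_table(0.90, 0.95, 0.02, 1000)
print(f"{'Test':<6}{'D+':>8}{'D-':>8}{'Total':>8}")
for label, *values in table:
    print(f"{label:<6}" + "".join(f"{v:>8.0f}" for v in values))
```

Changing any of the four inputs to `build_table` refills the whole table, which is the behaviour the spreadsheet version is after.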