Your doctor looks at you in silence. The results of your medical test: positive. Due to a clerical error at the lab, you were tested for the wrong disease, a relatively rare, terminal disease. A disease that's rare enough that only 1 in 100,000 people have it.
You question your doctor again. He assures you that yes, he has verified the results. After reviewing all the data, he tells you that the test is 99.99% accurate.
What is the likelihood that you actually have this disease?
Most people have an intuition that it's not exactly 99.99% - and that's correct. Maybe your guess is somewhere between 90% and 99%. After all, a 99.99% accurate test sounds like it doesn't leave much room for error.
Actually, that guess is too high. Much too high. It's not anywhere close to 90%.
In fact, it's not even anywhere close to 50%. It's much lower than that.
How is this possible?
To address this question, we first need to be more specific about what the accuracy of a test means. There are four possible outcomes of a test:
- Test is positive, and the patient is sick: This is a correct diagnosis.
- Test is negative, and the patient is sick: False negative.
- Test is positive, and the patient is healthy: False positive.
- Test is negative, and the patient is healthy: This is a correct diagnosis.
Let's visualize this as a 2x2 table:
|               | Patient is healthy | Patient is sick |
|---------------|--------------------|-----------------|
| Test positive | False positive     | True positive   |
| Test negative | True negative      | False negative  |
Now comes the part that is confusing for most people and the root cause of this fallacy.
Let's assume that the false positive and the false negative rates are the same (0.01% each, matching the 99.99% accuracy). This isn't true of most real tests, but it simplifies the discussion a bit and doesn't change the main point of the explanation.
With this setup, the test accuracy answers: given the population of sick people, how often does the test show positive? Or, similarly: given the population of healthy people, how often does the test show negative? These are the columns in the table above. We'll call this the "chance of a positive test when ill."
But that's not actually what we're interested in. You don't know whether you're sick or healthy - that's what you're trying to determine from the test. What you actually want to know is: given the population of people who test positive, how many of them are sick? Or, similarly: given the population of people who test negative, how many of them are healthy? These are the rows in the table above, NOT the columns. We'll call this the "chance of illness when the test is positive." It is different from the test accuracy!
To make this more concrete, let's fill in the table with the numbers from the example:
- Assume a population size of 100,000.
- The base rate is 1 in 100,000 (so there's 1 sick person in the population).
- Both the true positive and the true negative rates are 99.99%.
Then we get (counts rounded to whole people):

|               | Healthy (99,999) | Sick (1) |
|---------------|------------------|----------|
| Test positive | ~10              | ~1       |
| Test negative | ~99,989          | ~0       |
From this table, it's easy to see that if we tested this entire population, we would find about 11 positive results, and only 1 of those people would actually be sick - a rate of roughly 9%.
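The arithmetic above can be sketched in a few lines of Python (the numbers come from the example; the variable names are just illustrative):

```python
# Counts for the disease-screening example.
# "accuracy" here means both the true positive and true negative rate.
population = 100_000
base_rate = 1 / 100_000
accuracy = 0.9999

sick = population * base_rate                # 1 person
healthy = population - sick                  # 99,999 people

true_positives = sick * accuracy             # ~1 sick person correctly flagged
false_positives = healthy * (1 - accuracy)   # ~10 healthy people flagged anyway

# Of everyone who tests positive, what fraction is actually sick?
p_sick_given_positive = true_positives / (true_positives + false_positives)
print(f"{p_sick_given_positive:.1%}")        # -> 9.1%
```

Note that almost all of the positive results come from the huge healthy group, even though each individual healthy person is very unlikely to test positive.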
Why does this happen? Part of the reason is that the test accuracy doesn't measure the value you're actually interested in, as described above. The other part is that the base rate is so low that even a highly accurate test catches mostly false positives. The final confidence level ends up nowhere near the accuracy of the test.
The mathematics of this can be formalized in Bayes theorem. Here we just want to get the intuitive idea: A test result doesn't tell you the odds of something happening; instead, it shifts the odds from your previous belief. So if something was "very rare" to begin with, even a very accurate test might only move the odds to "rare" instead of all the way to "likely."
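In symbols, writing $S$ for "sick" and $T^+$ for a positive test, Bayes' theorem with the example's numbers reads:

```latex
P(S \mid T^+)
  = \frac{P(T^+ \mid S)\,P(S)}{P(T^+ \mid S)\,P(S) + P(T^+ \mid \neg S)\,P(\neg S)}
  = \frac{0.9999 \times 0.00001}{0.9999 \times 0.00001 + 0.0001 \times 0.99999}
  \approx 0.09
```

The prior $P(S)$ is the base rate; the tiny numerator is what keeps the posterior small despite the accurate test.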
This leads to the question: how do you find the base rate?
In some cases, you may be able to estimate the base rate from other measurements. For diseases, for example, you might observe the actual occurrences over an extended period of time and use that to compute the base rate. In other cases, you end up having to use expert judgment. This is typically embodied in how confident you are in the original diagnosis, or how many other plausible alternatives you can think of.
Even if your base rate estimate is off, with enough independent tests you'll eventually converge on the right answer. This can get expensive, though, so you want your base rate to be as accurate as possible up front. In medicine this takes the form of family history, recent travel, and so on, to assess how likely you are to have a disease before any test is run.
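That convergence can be sketched with repeated Bayesian updates, assuming each retest is independent and has the same error rates (the function name is illustrative):

```python
def update(prior, fp_rate, fn_rate):
    """One application of Bayes' rule after a positive test result."""
    tp_rate = 1 - fn_rate
    p_positive = tp_rate * prior + fp_rate * (1 - prior)
    return tp_rate * prior / p_positive

# Start at the rare-disease base rate and keep getting positive results.
belief = 1 / 100_000
for n in range(1, 4):
    belief = update(belief, fp_rate=0.0001, fn_rate=0.0001)
    print(f"after positive test {n}: {belief:.4f}")
# after positive test 1: 0.0909
# after positive test 2: 0.9990
# after positive test 3: 1.0000
```

Each test's output becomes the next test's prior, so a handful of independent positives overwhelms even very long starting odds.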
This fallacy, or variants of this fallacy, turns up in all sorts of places. For example, here is a version from criminal justice, called the "prosecutor's fallacy."
In crime investigations, fingerprints are often used to identify potential suspects. Let's say you're at a crime scene, and you've found a fingerprint. You're able to narrow down the perpetrator to a limited geographic range around the scene of the crime, yielding a total of 5,000 possible suspects. Through the magic of a hypothetical thought experiment, you compel every candidate to come in to give their fingerprint.
You get lucky and you get a match on the print. How likely is it that they are the perpetrator of the crime?
First we need to know how accurate the fingerprint test is. We'll use the numbers from the Ulery et al. fingerprint accuracy study:
- False positive rate of 0.1%
- False negative rate of 7.5%
The prosecutor's fallacy would say that since the false positive rate is 0.1%, the positive test means that the suspect was 99.9% likely to have actually committed the crime (or at least, something close to this amount). Therefore this suspect must be guilty.
But this is another example of the base rate fallacy. We can use an analysis similar to the one above to find the actual probability of guilt. First we need to estimate the base rate. Since no other evidence has been presented to make one suspect more likely than another, we'll assume all people in this geographic region are equally likely. With 5,000 suspects, that's a base rate of 1 in 5,000, or 0.02%.
This gives us (counts rounded to whole people):

|               | Innocent (4,999) | Guilty (1) |
|---------------|------------------|------------|
| Print matches | ~5               | ~1         |
| No match      | ~4,994           | ~0         |
So this match gives you only about a 20% chance of guilt, not a 99.9% chance. (Incidentally, it is also incorrect to use this evidence to argue that the test doesn't implicate the suspect. The suspect may still not be "likely" to have committed the crime - but prior to the test they were only 0.02% likely, and now they're around 20% likely. This erroneous argument is called, appropriately enough, the defense attorney's fallacy.)
You can get a sense of the impact of different base rates by varying the numbers yourself. As the false positive and false negative rates climb further above the base rate, the predictive value of the test gets progressively worse.
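The calculation behind those three inputs is a direct application of Bayes' rule; here is a small function you can experiment with (the parameter names are illustrative):

```python
def chance_of_sickness(base_rate, fp_rate, fn_rate):
    """P(sick | positive test), via Bayes' rule."""
    tp_rate = 1 - fn_rate
    p_positive = tp_rate * base_rate + fp_rate * (1 - base_rate)
    return tp_rate * base_rate / p_positive

# The two examples from this article:
print(f"{chance_of_sickness(1/100_000, 0.0001, 0.0001):.1%}")  # disease screen: 9.1%
print(f"{chance_of_sickness(1/5_000, 0.001, 0.075):.1%}")      # fingerprint match: 15.6%
```

Once the 7.5% false negative rate is folded in, the exact fingerprint figure lands a bit under the rough 1-in-5 estimate above, in the same ballpark.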
-  Ulery BT, Hicklin RA, Buscaglia J, Roberts MA. Accuracy and reliability of forensic latent fingerprint decisions. Proceedings of the National Academy of Sciences of the United States of America. 2011;108(19):7733-7738. doi:10.1073/pnas.1018707108.
- The p-value and the base rate fallacy - A graphical example of the fallacy, along with some discussion about its impact on hypothesis testing
- Bayes' Rule: Guide - Explanation of Bayes rule with varying technical depth for different backgrounds