[I]f a rare event (10/1000) is reported by a very reliable witness (80/100), the chances that the rare event happened is closer to its base rate (10/1000) than the accuracy of the reliable witness (80/100)
Psychology of Compliance & Due Diligence Law: "What does a 18th Century Philosopher have to Offer the 21st?"
I didn't do well with statistics in university. Didn't do it at all, really. Which is my loss, because now probability fascinates me: it's the next frontier for reason for most of us.
The column was sparked by another in the Science and Law Blog, "Helping Legal Actors with Bayes’ Theorem," that posed the exact same problem in two different ways, the probabilistic (first) and the frequentist (second):
A disease occurs in 1% of the population, and a test has been developed which has an 80% accuracy rate (i.e., if you have the disease, there is a 80% chance the test will pick it up), and a 9.6% false positive rate (i.e., if you don’t have the disease, there is a 9.6% chance of getting a positive result anyway). Sam tests positive for the disease. What is the probability that Sam has the disease?
A disease afflicts 10 out of 1000 people in the population. For people with the disease, 8 out of 10 will have positive test results. For people without the disease, the test will still (erroneously) yield a positive result 95 out of 990 times. Sam tests positive for the disease. What is the probability that Sam has the disease?
Take a moment now to think and come up with an answer.
Most people get it wrong whichever way the problem is posed. But the probabilistic way of putting the question produces more wrong answers.
The general inclination is perhaps to say 90.4%, because the false positive rate is 9.6%. This conclusion, however, is wrong because it does not account for the rarity of the disease in the general population. (As doctors are often trained to think, if you hear hoofbeats, think horses, not zebras.) Using Bayes’ Theorem—and here I will spare the reader the mathematical details—one can show that the probability that Sam has the disease is 7.8%. Intuitively, this is because given the rarity of the disease, it is more likely that Sam is actually one of the false positives than one of the people with the disease.
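For anyone who does want the mathematical details, here is a minimal sketch of the calculation in Python. The function and parameter names are my own, not anything from the Science and Law Blog; the three inputs are the figures from the probabilistic version of the problem.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Probability of disease given a positive test, by Bayes' theorem."""
    # Share of the whole population that is sick AND tests positive.
    true_pos = prior * sensitivity
    # Share of the whole population that is healthy BUT tests positive.
    false_pos = (1 - prior) * false_positive_rate
    # Of everyone who tests positive, the fraction that is actually sick.
    return true_pos / (true_pos + false_pos)


# Figures from the problem: 1% base rate, 80% detection, 9.6% false positives.
p = posterior(prior=0.01, sensitivity=0.80, false_positive_rate=0.096)
print(f"{p:.1%}")  # prints 7.8%
```

Raising the base rate in the call to `posterior` is a quick way to see how much of the answer it controls: the test's own figures stay the same, but the result climbs quickly as the disease becomes more common.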
There's a really good extended comment to the Science and Law Blog piece that critiques the analysis — but the point remains that for most of us, this ain't an easy ride, whichever train we take.
To help a bit with the frequentist approach, the Psychology of Compliance etc. blog entry gives us a chart and an explanation that might just tip us further towards understanding:
There are 1000 people, which explains the lower right corner. Since only 1% of the population gets the disease, then the last column must be 10 and 990. Finally, since the test is only 80% accurate, then the left to right diagonal must be 80% of last column. So if the test is positive, column 1, we have 103 individuals of which only 8 have the disease, 7.8%, which is more than 1%, but considerably lower than 80%.
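The counts in that explanation are enough to rebuild the chart. What follows is a rough sketch in Python, assuming a layout with the test results as columns and a totals row and column, which is what the explanation seems to describe. The variable names and row labels are mine; the four starting numbers come from the frequentist version of the problem.

```python
# Counts taken from the frequentist version of the problem.
population = 1000
sick = 10
true_pos = 8      # sick people who test positive
false_pos = 95    # healthy people who test positive anyway

# Everything else follows by subtraction and addition.
healthy = population - sick        # 990
false_neg = sick - true_pos        # 2
true_neg = healthy - false_pos     # 895
positives = true_pos + false_pos   # 103
negatives = false_neg + true_neg   # 897

rows = [
    ("", "Test +", "Test -", "Total"),
    ("Disease", true_pos, false_neg, sick),
    ("No disease", false_pos, true_neg, healthy),
    ("Total", positives, negatives, population),
]
for label, pos, neg, total in rows:
    print(f"{label:<12}{pos:>8}{neg:>8}{total:>8}")

# Of the 103 people who test positive, how many are actually sick?
share = true_pos / positives
print(f"Sick among positives: {share:.1%}")  # 7.8%
```

Reading down the first column is the whole trick: 8 true positives sit on top of 95 false ones, so a positive result mostly comes from the healthy group simply because that group is so much larger.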