An Excursus Into Bayes’ Theorem… More or Less

[I]f a rare event (10/1000) is reported by a very reliable witness (80/100), the chances that the rare event happened is closer to its base rate (10/1000) than the accuracy of the reliable witness (80/100)
Psychology of Compliance & Due Diligence Law: “What does a 18th Century Philosopher have to Offer the 21st?”

I didn’t do well with statistics in university. Didn’t do it at all, really. Which is my loss, because now probability fascinates me: it’s the next frontier for reason for most of us.

As has been discussed a fair bit lately, thanks to potential catastrophes such as terrorist acts, incoming asteroids, or bird-borne viruses, we human beings tend to do the Chicken Little a tad too quickly, not being good at thinking things through when it comes to acting now in relation to a possible negative future. See an interesting and amusing comment in the Times Online, “Terrorism: let’s do the maths”. For something more extended, look at Risk, “A False Sense of Insecurity” [pdf] by John Mueller. So I was intrigued by the recent post quoted above.

The column was sparked by another in the Science and Law Blog, “Helping Legal Actors with Bayes’ Theorem,” that posed the exact same problem in two different ways, the probabilistic (first) and the frequentist (second):

A disease occurs in 1% of the population, and a test has been developed which has an 80% accuracy rate (i.e., if you have the disease, there is a 80% chance the test will pick it up), and a 9.6% false positive rate (i.e., if you don’t have the disease, there is a 9.6% chance of getting a positive result anyway). Sam tests positive for the disease. What is the probability that Sam has the disease?

A disease afflicts 10 out of 1000 people in the population. For people with the disease, 8 out of 10 will have positive test results. For people without the disease, the test will still (erroneously) yield a positive result 95 out of 990 times. Sam tests positive for the disease. What is the probability that Sam has the disease?

Take a moment now to think and come up with an answer.

Most people get it wrong whichever way the problem is posed. But there are more wrong answers with the probabilistic way of putting the question.

The general inclination is perhaps to say 90.4%, because the false positive rate is 9.6%. This conclusion, however, is wrong because it does not account for the rarity of the disease in the general population. (As doctors are often trained to think, if you hear hoofbeats, think horses, not zebras.) Using Bayes’ Theorem—and here I will spare the reader the mathematical details—one can show that the probability that Sam has the disease is 7.8%. Intuitively, this is because given the rarity of the disease, it is more likely that Sam is actually one of the false positives than one of the people with the disease.
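For anyone who does want the mathematical details, here is a minimal sketch in Python of the calculation, using the three numbers from the probabilistic version of the problem. The function and parameter names are my own and purely illustrative; the arithmetic is just Bayes’ Theorem: the share of all positive results that are true positives.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Probability of having the disease given a positive test (Bayes' theorem)."""
    # Fraction of the whole population that is sick AND tests positive.
    true_positives = prior * sensitivity
    # Fraction of the whole population that is healthy AND tests positive.
    false_positives = (1 - prior) * false_positive_rate
    # Of everyone who tests positive, what share is actually sick?
    return true_positives / (true_positives + false_positives)


# Numbers from the problem: 1% base rate, 80% detection rate, 9.6% false positives.
p = posterior(prior=0.01, sensitivity=0.80, false_positive_rate=0.096)
print(f"{p:.1%}")  # 7.8%
```

Changing the prior shows how much the base rate matters: with the same test and a disease that half the population has, the same function returns roughly 89%.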

There’s a really good extended comment to the Science and Law Blog piece that critiques the analysis — but the point remains that for most of us, this ain’t an easy ride, whichever train we take.

To help a bit with the frequentist approach, the Psychology of Compliance etc. blog entry gives us a chart and an explanation that might just tip us further towards understanding:

There are 1000 people, which explains the lower right corner. Since only 1% of the population gets the disease, then the last column must be 10 and 990. Finally, since the test is only 80% accurate, then the left to right diagonal must be 80% of last column. So if the test is positive, column 1, we have 103 individuals, of whom only 8 have the disease: 7.8%, which is more than 1% but considerably lower than 80%.
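The chart itself is not reproduced here, but its counts can be rebuilt from the frequentist wording of the problem: 8 of the 10 people with the disease test positive, and 95 of the 990 people without it test positive as well. What follows is a rough sketch in Python. The layout (rows for disease status, columns for test result, totals last) is my assumption based on the explanation above, and the function name is made up for illustration.

```python
def two_by_two(sick_pos, sick_total, healthy_pos, healthy_total):
    """Build a 2 x 2 table of counts with row and column totals.

    Each row is (tested positive, tested negative, row total).
    """
    rows = {
        "disease": (sick_pos, sick_total - sick_pos, sick_total),
        "no disease": (healthy_pos, healthy_total - healthy_pos, healthy_total),
    }
    # The bottom row adds the two rows above it, column by column.
    rows["total"] = tuple(a + b for a, b in zip(rows["disease"], rows["no disease"]))
    return rows


# Counts taken from the frequentist statement of the problem.
table = two_by_two(sick_pos=8, sick_total=10, healthy_pos=95, healthy_total=990)

print(f"{'':<12}{'positive':>10}{'negative':>10}{'total':>8}")
for label, (pos, neg, total) in table.items():
    print(f"{label:<12}{pos:>10}{neg:>10}{total:>8}")

# Column 1: of everyone who tests positive, how many actually have the disease?
share = table["disease"][0] / table["total"][0]
print(f"{share:.1%}")  # 7.8%
```

Reading down the first column gives the 103 positives, of which 8 are real, and the lower right corner comes out at 1000.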

Fun, no?


  1. I am glad that you found the explanation helpful.

    The two by two table that you printed should become a common tool for the general public when trying to understand probabilistic claims of covariation, correlation, and causation.

    To know whether there is a real correlation between two items, it is important to know the values of all the entries. It is a common error to focus only on the top left corner’s large value. The correct question is: given one state of nature (a column), is an event (a row element) more or less likely than under some other state of nature (another column)? These calculations can be done easily after completing the 2 X 2 table.

    We cannot rely upon our pre-theoretic intuitions about covariations because they are so badly formed.