Thursday Thinkpiece: Findlay and Chalifour on Probability

Each Thursday we present a significant excerpt, usually from a recently published book or journal article. In every case the proper permissions have been obtained. If you are a publisher who would like to participate in this feature, please let us know via the site’s contact form.

Science Manual for Canadian Judges
National Judicial Institute / Institut national de la magistrature / Chapter 2: Scott Findlay and Nathalie Chalifour. “Science and the Scientific Method”
Ottawa: National Judicial Institute, 2013

© National Judicial Institute 2013. All rights reserved. Permission to reproduce is granted for the purposes of research or private study. For further reproduction privileges, please contact the NJI.

(Excerpt: Chapter 2, at pages 64-67)


As noted in The Logical Relevance of Expert Scientific Opinion (at p. 46), the Mohan and Abbey criterion of logical relevance requires judges to evaluate scientific relevance with respect to a particular scientific hypothesis. The scientific relevance of expert (notionally scientific) evidence is determined by the extent to which, in the opinion of the trier of fact, the proffered evidence changes the probability that the scientific hypothesis under consideration is true (or, conversely, false). The greater the change in this probability, the greater the relevance and probative value that can be assigned to the adduced evidence.

Using the example of R. v. Thief (see p. 61), one could imagine two different studies that might be introduced as evidence by the defence. In a first study, subjects who are assigned only to one recall time treatment are asked to recall general spatial patterns (not specifically facial features), and no attempt is made to verify that subjects allotted shorter recall times are actually more stressed than those given longer times. By contrast, a second study provides each subject with a wide range of facial images designed specifically to evaluate different dimensions of facial recognition (skin colour, hair colour, ethnicity, facial shape, etc.), each subject is tested over the full range of recall time periods, and a blood sample is taken from which serum concentrations of several well-described stress hormones are measured.

Before each study is conducted (that is, a priori), the hypothesis is as likely to be true as it is to be false. Once the results are in, we can ask: How likely is it now that the hypothesis is true? In the first study, the low a priori inferential strength of the study design means that this probability will not be much different from the a priori value of 0.5 because any result will be rather equivocal owing to limitations in the experimental design. By contrast, in the second study, any result will be less equivocal, i.e., the estimated a posteriori probability will be closer to 0 (so, we are quite sure the hypothesis is false) or 1 (so we are reasonably convinced it is true). Thus, the (absolute) difference between the a priori and a posteriori probabilities will be greater in the latter case than the former. Or in other words, the second study has considerably higher scientific relevance, and hence, probative value.

The above discussion highlights the need for judges to have a reasonably firm grasp of the concept of probability, at least as it is employed in the testing of scientific hypotheses. Probability is a superficially simple concept for which two principal interpretations are employed in science: frequentist and Bayesian. In elementary statistics, one invariably is taught the frequentist interpretation of probability. Under this interpretation, the probability of some “event” or “outcome” or “result” is the long-run frequency of that event relative to other possible outcomes. “Long run” means the repetition of the same experiment under the same conditions a large number of times.

For example, in an experiment in which we roll a (fair) die, there are six possible outcomes, and the long-run probability of rolling, say, a six, is one in six, or 1/6. Note that for a small number of experiments, the frequency of rolling a six, even if the die is indeed fair, need not be 1/6. For example, if we roll a fair die six times, it is entirely possible that none of the rolls will produce a six, in which case the observed frequency is 0/6 = 0, not 1/6. This is why under the frequentist view, probability is considered the “long run” frequency of an outcome or result based on a large number of independent experiments.
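The long-run idea can be illustrated with a short simulation. The following is a minimal sketch, not part of the original excerpt: the function name `six_frequency` and the fixed seed are illustrative choices, and the simulated die is assumed to be perfectly fair.

```python
import random


def six_frequency(n_rolls, seed=0):
    """Observed frequency of sixes in n_rolls simulated rolls of a fair die.

    The seed is fixed only so that repeated runs give the same numbers.
    """
    rng = random.Random(seed)
    sixes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)
    return sixes / n_rolls


# Exact probability that six rolls of a fair die produce no six at all.
p_no_six_in_six_rolls = (5 / 6) ** 6

print(f"P(no six in 6 rolls) = {p_no_six_in_six_rolls:.3f}")

# The observed frequency wanders for small samples and settles near 1/6
# only as the number of rolls grows.
for n in (6, 60, 600, 6000, 60000):
    print(f"{n:>6} rolls: observed frequency of a six = {six_frequency(n):.4f}")
```

The exact calculation shows that a run of six rolls with no six is far from unusual (it happens roughly one time in three), which is why a short run says little about the long-run frequency.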

Under the Bayesian interpretation, the probability of an outcome is a measure of our belief that, in the experiment in question, a particular outcome will result. Thus, Bayesian probability is interpreted as a measure of the current state of knowledge. So, in the context of a dice-rolling experiment, the Bayesian probability associated with the event of rolling a six is, in essence, my belief that on this roll, the outcome will be a six.

Judges are ultimately concerned with the probability that a scientific hypothesis is true (or false) in the situation before the court. Expert evidence will be introduced in an attempt to convince the court that it is true (to some standard of proof) or, alternatively, false. To support one or the other contention, probability estimates of various sorts will be presented. What are the implications of different approaches (frequentist versus Bayesian) to the interpretation of these estimates?

For frequentists, an hypothesis is either true or false — there is no probability about it. Rather, the probability that is estimated under a frequentist interpretation is the probability of obtaining the observed data, given the specified hypothesis. We might hypothesize that the die is fair, and ask: What is the probability of getting four sixes in 10 rolls if the die is fair? This probability (0.054) is in fact rather small, which would lead us to question the validity of the hypothesis.
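The figure of 0.054 can be reproduced from the binomial formula. The sketch below is an illustration rather than the authors' own calculation; the helper name `binomial_pmf` is hypothetical, and the "four or more" tail probability is an addition that the excerpt does not report.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly four sixes in 10 rolls of a fair die.
p_exactly_four = binomial_pmf(4, 10, 1 / 6)

# Probability of four or more sixes, the quantity a conventional
# significance test would normally examine (not given in the excerpt).
p_four_or_more = sum(binomial_pmf(k, 10, 1 / 6) for k in range(4, 11))

print(f"P(exactly 4 sixes in 10 rolls) = {p_exactly_four:.3f}")
print(f"P(4 or more sixes in 10 rolls) = {p_four_or_more:.3f}")
```

The first value rounds to 0.054, matching the text. In both cases the probability describes the data given the hypothesis, not the hypothesis given the data.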

By contrast, under a Bayesian interpretation we ask: What is our belief (as quantified by probability) that the die is fair, given the observed results? To calculate this probability, we must first specify our belief that the die is fair before we conducted the experiment. That is, the Bayesian probability that the hypothesis (viz., that the die is fair) is true, given the experimental results, is estimated with reference to the prior probability, i.e., the probability of the hypothesis being true before the study was undertaken (so, before any results are known). Under a Bayesian approach, as more experimental tests of an hypothesis are conducted, the prior changes (i.e., is updated) to reflect our changing belief.

To return to the dice-rolling experiment, under a Bayesian approach, for the first experiment to test whether the die is fair (the hypothesis), it is reasonable to set the prior at 0.5, i.e., in the absence of any information whatsoever, there is an equal chance that my hypothesis is true or false. Suppose that in the first 10 rolls, I roll four sixes. For the 11th roll, the prior is now substantially less than 0.5 because the chance of rolling four sixes in 10 tries, given the die is fair, is rather small. Thus, the estimated probability of the hypothesis being true, given the results of the 11th experiment, is very different because of constant updating of the prior based on the results of previous experiments. In other words, while I might initially have believed that the die was fair, after 10 rolls of which four turned up a six, I am now rather skeptical.
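A Bayesian update of this kind can be sketched numerically, but only after specifying what "not fair" means, which the excerpt leaves open. The example below assumes, purely for illustration, that an unfair die could have any chance of a six between 0 and 1, all equally plausible; under that assumption the probability of any given number of sixes in n rolls works out to 1/(n + 1). The function names are hypothetical, and a different alternative would give a different answer.

```python
from math import comb


def likelihood_fair(k, n):
    """Probability of k sixes in n rolls if the die is fair (p = 1/6)."""
    p = 1 / 6
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def likelihood_unfair_uniform(k, n):
    """Probability of k sixes in n rolls under the ASSUMED alternative.

    Assumption (not from the excerpt): the chance of a six is unknown and
    uniformly distributed on [0, 1]. Averaging the binomial probability
    over that range gives 1 / (n + 1) for every k.
    """
    return 1 / (n + 1)


def posterior_fair(prior_fair, k, n):
    """Bayes' rule: updated belief that the die is fair after k sixes in n rolls."""
    weight_fair = prior_fair * likelihood_fair(k, n)
    weight_unfair = (1 - prior_fair) * likelihood_unfair_uniform(k, n)
    return weight_fair / (weight_fair + weight_unfair)


# Start from the 0.5 prior used in the text, then observe 4 sixes in 10 rolls.
updated = posterior_fair(0.5, 4, 10)
print(f"Belief that the die is fair after 4 sixes in 10 rolls: {updated:.2f}")

# The same data with a more confident starting belief gives a different answer.
print(f"Same data, prior of 0.9: {posterior_fair(0.9, 4, 10):.2f}")
```

Under these assumptions the belief that the die is fair falls below its starting value of 0.5, and the second line shows how strongly the result depends on the prior chosen.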

Given the same results (viz., rolling four sixes in 10 throws), we have two probability estimates: one (Bayesian) based on an initial prior, which gives the probability that the die is fair, given the results, and another (frequentist) that gives the probability of the results, given the die is fair.

These estimates are not the same, for two reasons. First, in the Bayesian case, the estimated probability depends on the prior initially chosen; change the prior, and the estimated probability changes. Second, while both probabilities are conditional (probability of the die being fair, given the results (Bayesian); probability of results, given the die is fair (frequentist)), they are nonetheless generally different. This fact alone means that they may differ dramatically.

To see this, consider swans, most species (and most individuals) of which are white. Therefore, if I know a bird is a swan, there is a very high probability that it is white, i.e., the probability that the bird is white, given it is a swan, is high. But the probability that it is a swan, given it is white, is in fact quite low, as there are many species of birds that are white but are not swans.
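The asymmetry between the two conditional probabilities is easy to see with numbers. The counts below are entirely hypothetical, chosen only to make the arithmetic visible; they are not ornithological data and do not come from the excerpt.

```python
# HYPOTHETICAL counts for an imaginary population of birds.
white_swans = 95
non_white_swans = 5
white_non_swans = 1900  # gulls, egrets, doves and so on

# Probability that a bird is white, given that it is a swan.
p_white_given_swan = white_swans / (white_swans + non_white_swans)

# Probability that a bird is a swan, given that it is white.
p_swan_given_white = white_swans / (white_swans + white_non_swans)

print(f"P(white | swan) = {p_white_given_swan:.2f}")
print(f"P(swan | white) = {p_swan_given_white:.3f}")
```

With these made-up counts the first probability is 0.95 and the second is below 0.05, even though both are computed from the same 95 white swans. Only the reference group in the denominator changes.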

The issue of which interpretation (Bayesian or frequentist) is more appropriate in a given situation is not primarily a scientific question. Both interpretations are reasonable and have their underlying logic. But there are at least two points of which judges should be aware.

  1. The estimated probability of an event or hypothesis under a frequentist interpretation of probability may differ dramatically from that estimated under a Bayesian approach. For example, in forensic DNA cases, frequentist and Bayesian estimates of the so-called “random match probability” may differ a million-fold or more (see the courtroom example People v. Puckett at p. 68).
  2. Scientific witnesses are often asked to give probability estimates for certain events, outcomes, or hypotheses, such as the probability of a DNA match or the probability that a convicted offender will reoffend if granted parole. Given the potentially large differences in estimated probabilities based on frequentist versus Bayesian interpretations, witnesses should be explicit not only about the set of results from which the estimate is derived, but also about which interpretation is being used to generate the probability estimate. When Bayesian estimates are given, the witness should provide a clear statement of the prior probability employed in the estimate, and its justification.

Read more… (Please refer to Appendix 2 at p. 129)

Contrasting Frequentist and Bayesian Probabilities
Frequentist and Bayesian Probabilities in Forensic DNA Profiling


12 For a lucid and concise description of a number of statistical misconceptions, several of which involve the concept of probability, see Jonathan J. Koehler, “Misconceptions about Statistics and Statistical Evidence” in Richard L. Wiener and Brian H. Bornstein (eds), Handbook of Trial Consulting: A Psychological Perspective (New York: Springer, 2010) at 121-136.


  1. David Collier-Brown

    That’s really quite brilliant: not just for the Judges among your friends, but for anyone who has to make decisions based on someone else having done the science