Using Artificial Intelligence for Demeanour Evidence
Demeanour evidence holds a controversial role in evidence law. Centuries of common law have allowed trial judges to assess the behaviour, conduct, and mannerisms of witnesses to make findings of credibility. Often these findings can be useful to judges, especially when the only evidence available on crucial determinations of fact is viva voce testimony from each side.
In “Relying on Demeanour Evidence to Assess Credibility during Trial – A Critical Examination,” Amna Qureshi provides some background on the use of demeanour evidence,
The fact that trial judges can and do assess credibility based on demeanour during trial has also been held as an important reason for why appellate courts show considerable deference to the findings of a trial judge.
In R v Belnavis the Supreme Court stated that “the reasons for this principle of deference are apparent and compelling. Trial judges hear witnesses directly. They observe their demeanour on the witness stand and hear the tone of their responses. They therefore acquire a great deal of information which is not necessarily evident from a written transcript, no matter how complete.”
[citations omitted]
The inability to assess demeanour evidence was centrally important to determining the impact on trial fairness, including the ability of counsel to scrutinize and cross-examine the witness. In evaluating the allegations of sexual harassment by a lawyer, the Ontario Court of Appeal in Law Society of Upper Canada v. Neinstein reviewed a Hearing Panel’s use of demeanour evidence to conclude that the complainant gave evidence in a forthright and honest manner and withstood cross-examination well. The court ordered a new hearing, stating,
[66] …There is no insight provided as to why the Hearing Panel found [the Complainant] to be “forthright” and no indication of why it concluded she “withstood” cross-examination. Bald generalized assertions defy appellate review. Furthermore, while demeanour is a relevant factor in a credibility assessment, demeanour alone is a notoriously unreliable predictor of the accuracy of evidence given by a witness: see R. v. G. (G.), 1997 CanLII 1976 (ON CA), [1997] O.J. No. 1501, 115 C.C.C. (3d) 1 (C.A.), at pp. 6-8 C.C.C.; R. v. P-P. (S.H.), 2003 NSCA 53 (CanLII), [2003] N.S.J. No. 171, 176 C.C.C. (3d) 281 (C.A.), at paras. 28-30.
Demeanour evidence was most thoroughly evaluated in recent years in the Supreme Court of Canada case of R. v. N.S., where a sexual assault complainant wanted to testify while wearing her religious face covering (a “niqab”). The underlying decision in N.S. by the Ontario Court of Appeal highlighted some of the problems with demeanour evidence and its impact on trial fairness,
[55] Mr. Butt, counsel for N.S., makes the valid point that credibility assessments based on demeanour can be unreliable and flat-out wrong. Assessments of credibility based on demeanour can reflect cultural assumptions and biases. Judgments based on demeanour are no substitute for those based on a critical analysis of the substance of the entire evidence. Appellate courts have repeatedly cautioned against relying exclusively or even predominantly on demeanour to determine credibility. Mr. Butt also makes the valid point that the trier of fact does not lose all aspects of demeanour evidence if the witness wears a niqab. The trier of fact will still be able to consider the witness’s body language, her eyes, her tone of voice and the manner in which she responds to questions. All are important aspects of demeanour.
[56] It is, however, undeniable that the criminal justice system as it presently operates, and as it has operated for centuries, places considerable value on the ability of lawyers and the trier of fact to see the full face of the witness as the witness testifies. Appellate deference is justified to a significant extent on the accepted wisdom that trial judges and juries have an advantage over appeal judges in assessing factual questions because they, unlike appeal judges, have seen and heard the witnesses: R. v. M. (R.E.), 2008 SCC 51 (CanLII), [2008] 3 S.C.R. 3, [2008] S.C.J. No. 52, at p. 22 S.C.R. Similarly, the principled approach to the admission of hearsay evidence recognizes the value, insofar as the assessment of reliability is concerned, in the trier of fact’s ability to observe the witness’s demeanour as the witness made a statement which is proffered as evidence of its truth: see R. v. B. (K.G.), 1993 CanLII 116 (SCC), [1993] 1 S.C.R. 740, [1993] S.C.J. No. 22, at pp. 763-64 S.C.R.
[57] A witness’s appearance while testifying, in addition to assisting in assessing the witness’s credibility and providing non-verbal cues to assist the cross-examiner, may also further cross-examination in other ways in certain specific cases. There may be cases where the identity of the witness is an issue. In those cases, the opportunity to look at the witness may be essential to identifying the witness, which, in turn, may be crucial to effective cross-examination of that witness or to other aspects of the defence. For example, a witness may claim to have spoken to an accused at a certain time and place. Seeing the face of the witness may allow the accused to identify the witness in a way that will assist the accused in the cross-examination and may perhaps also provide some explanation as to why the witness might fabricate or be mistaken about the content of the conversation.
The Supreme Court in this case affirmed the ability of trial judges to use demeanour evidence, but its conclusion was highly qualified based on the record before it,
[17] We have no expert evidence in this case on the importance of seeing a witness’s face to effective cross-examination and accurate assessment of a witness’s credibility. All we have are arguments and several legal and social science articles submitted by the parties as authorities.
[18] M—d S. and the Crown argue that the link is clear. Communication involves not only words, but facial cues. A facial gesture may reveal uncertainty or deception. The cross-examiner may pick up on non-verbal cues and use them to uncover the truth. Credibility assessment is equally dependent not only on what a witness says, but on how she says it. Effective cross-examination and accurate credibility assessment are central to a fair trial. It follows, they argue, that permitting a witness to wear a niqab while testifying may deny an accused’s fair trial rights.
…
[26] Changes in a witness’s demeanour can be highly instructive; in Police v. Razamjoo, [2005] D.C.R. 408, a New Zealand judge asked to decide whether witnesses could testify wearing burkas commented:
. . . there are types of situations . . . in which the demeanour of a witness undergoes a quite dramatic change in the course of his evidence. The look which says “I hoped not to be asked that question”, sometimes even a look of downright hatred at counsel by a witness who obviously senses he is getting trapped, can be expressive. So too can abrupt changes in mode of speaking, facial expression or body language. The witness who moves from expressing himself calmly to an excited gabble; the witness who from speaking clearly with good eye contact becomes hesitant and starts looking at his feet; the witness who at a particular point becomes flustered and sweaty, all provide examples of circumstances which, despite cultural and language barriers, convey, at least in part by his facial expression, a message touching credibility. [para. 78]
[27] On the record before us, I conclude that there is a strong connection between the ability to see the face of a witness and a fair trial. Being able to see the face of a witness is not the only — or indeed perhaps the most important — factor in cross-examination or accurate credibility assessment. But its importance is too deeply rooted in our criminal justice system to be set aside absent compelling evidence.
Since then, the Ontario Court of Appeal has expressed some caution about the use of demeanour evidence, stating in R. v. Rhayel,
[85] Cases in which demeanour evidence has been relied upon reflect a growing understanding of the fallibility of evaluating credibility based on the demeanour of witnesses. It is now acknowledged that demeanour is of limited value because it can be affected by many factors including the culture of the witness, stereotypical attitudes, and the artificiality of and pressures associated with a courtroom. One of the dangers is that sincerity can be and often is misinterpreted as indicating truthfulness.
[citations omitted]
Here, the trial judge’s review of the videotaped statement for the truth of its contents, and the undue weight attached to the demeanour in the video, were reviewable errors that led to a successful appeal.
Similarly, in R. v. Hemsworth, the trial judge’s general assertion about how a criminally accused individual should present, made without any baseline of how the accused would normally speak, was considered troubling and a reviewable error by the Court of Appeal, especially given the undue emphasis placed on demeanour in assessing credibility.
Although the Supreme Court lacked expert evidence in N.S., other developments in science, and in particular through the use of artificial intelligence, demonstrate even greater deficiencies with the manner in which demeanour evidence has been used by our courts. Dr. Kang Lee of the Dr. Eric Jackman Institute of Child Study, Applied Psychology and Human Development, OISE/University of Toronto, highlighted these developments at the keynote for Part II of IP Osgoode’s Bracing for Impact: The AI Challenge conference series.
Lee noted that up to 93% of all emotions when lying are not expressed physically, through body language, facial expressions, or other gestures. What this means is that what courts have been relying on as outward physical demeanour evidence for centuries is only a fraction of what is actually going on inside of a witness’ mind.
What Lee’s research has found is that emotions when lying can be far better detected using what he describes as “Transdermal Optical Imaging,” which looks at the blood flow patterns in a subject’s face. The technology detects hemoglobin and melanin concentrations, discards the latter using machine learning, and is able to observe how blood flow changes during an emotional response.
Most of Lee’s research has emerged from and focuses on children, but he is quickly expanding its applications to other populations. In a TEDTalk in 2016, he points out that social workers, child protection lawyers, and even judges perform at roughly chance (50/50) when detecting whether a child is lying.
In the same talk, Lee explains further how the technology works,
We know that underneath our facial skin, there’s a rich network of blood vessels. When we experience different emotions, our facial blood flow changes subtly. So by looking at facial blood flow changes, we can reveal people’s hidden emotions. We have developed a new imaging technology we call transdermal optical imaging. To do so, we use a regular video camera to record people when people experience various hidden emotions. And then, using our image processing technology, we can extract transdermal images of facial blood flow changes. And using this technology, we can now reveal the hidden emotion associated with lying and therefore detect people’s lies with an accuracy at about 85 percent.
What this means is that science has already disproven the ability of humans to accurately assess demeanour, even when operating in specialized fields like law. It also means that the technology is already better at detecting lying than humans are.
Does this mean that assessments of credibility should be replaced by technology? There are several reasons to suggest why at present it should not, and why at least in the short term, it won’t.
As with most machine learning and artificial intelligence applications, there is still a long way to go to completely remove all biases. The melanin information removed through Transdermal Optical Imaging is likely highly variable across different ethnicities, and an absolute level of confidence that this does not impact findings would be necessary.
Most affective and behaviour assessment technology works best with individualized baselines. It’s the deviation from these baselines, much like with a polygraph, that gives rise to reactions worthy of greater scrutiny. However, how people emotionally react to different stressors is also highly variable across cultures.
In an article in Integrative Medicine Research, Nangyeon Lim looks at differences between individualistic and collectivist cultures, and the activation of the autonomic nervous system in different societies. High levels of arousal of emotions amplify the nervous system, and are based on individual perceptions during the affective experience. Lim concludes,
Emotional arousal is a fundamental and important dimension of affective experience, along with valence. Findings consistently support cultural differences in the levels of emotional arousal between the West and the East. Westerners value, promote, and experience high arousal emotions more than low arousal emotions, whereas the vice versa is true for Easterners.
Finally, the technology only provides accuracy of about 85%, which is still far superior to humans, but will inevitably produce false positives for lying. Whereas the false positives already produced by human triers of fact are accepted in practice, the introduction of a known and discernible error rate is likely unpalatable for the justice system.
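A rough back-of-the-envelope sketch shows why a known error rate could matter in practice. It is illustrative only: it assumes that the 85% figure applies equally to lies and to truthful statements, and that 20% of the statements screened are lies. Neither assumption comes from Lee's research, which is reported here only as an overall accuracy figure, and the function name is hypothetical.

```python
def screening_outcomes(n_statements, share_lies, sensitivity, specificity):
    """Expected results when a lie-detection tool screens a set of statements.

    sensitivity: assumed probability that a lie is flagged as a lie
    specificity: assumed probability that a truthful statement is not flagged
    """
    lies = n_statements * share_lies
    truths = n_statements - lies

    true_positives = lies * sensitivity            # lies correctly flagged
    false_positives = truths * (1 - specificity)   # truthful statements wrongly flagged
    flagged = true_positives + false_positives

    # Share of flagged statements that really are lies
    precision = true_positives / flagged if flagged else 0.0
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "precision": precision,
    }


# Assumed inputs: 1,000 statements, 20% of them lies, 85% accuracy both ways
result = screening_outcomes(1000, 0.20, 0.85, 0.85)
print(round(result["false_positives"]))   # about 120 truthful statements flagged
print(round(result["precision"], 2))      # about 0.59 of flags are actual lies
```

Under these assumed inputs, roughly four in ten flagged statements would come from truthful witnesses, which illustrates why a discernible error rate may be harder for a court to accept than the unmeasured errors of a human trier of fact. Different assumptions about how often witnesses lie would change the result considerably.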
Qureshi provides an overview of the continued overconfidence in the abilities of judges, juries and lawyers to determine credibility, which results in a general reluctance to accept social science research,
We have seen this in many instances such as false confessions, eyewitness identification, child witnesses and battered women syndrome where it took considerable time for the courts to recognize the problems with current approaches despite the increasing breadth of scientific research on the subjects. Some have argued that the reasons for the general reluctance by the legal system to learn from the world of social science is an attempt to defend the justice system from being “usurped by outside experts” or “dehumanized.”
In 1987 for instance, when the Supreme Court ultimately rejected the polygraph as an admissible form of evidence, they focused not on the unreliability of the tool in determining truth and lies but instead on the potential for it to usurp the functions of the triers of fact. In this case, Justice McIntyre for the majority stated that “it is a basic tenet of our legal system that judges and juries are capable of assessing credibility and reliability of evidence” and stated further that no expert evidence was required on this point. More recently in R v Marquard, Justice McLachlin (as she then was) made similar comments:
“a judge or jury who simply accepts an expert’s opinion on the credibility of a witness would be abandoning its duty to itself determine the credibility of the witness. Credibility is a matter within the competence of lay people. Ordinary people draw conclusions about whether someone is lying or telling the truth on a daily basis.”
[citations omitted]
Retaining “human” control of findings of credibility within the legal system is likely a good thing, but it is only useful if those making those findings are fully aware of the fallibilities in doing so. For the judiciary, this can more readily come about through further training, but for lay persons of the jury, this proves incredibly challenging, especially as the science increasingly demonstrates the lack of competency in this regard.
While the use of Transdermal Optical Imaging is unlikely to find its way into courtrooms any time soon, there are a whole host of other applications for which it might be used, including client intakes and witness preparation. Because this technology is now being extended to pre-recorded video footage that does not use specialized equipment, there might be other applications for it as well, including public statements by parties to a litigation.
Although videotaping of examinations for discovery is still rare in Canada, Rules 34.19 and 36.01 of Ontario’s Rules of Civil Procedure allow for this on consent or by order of the court. The machine learning findings from these tapes need not be admissible to be useful to counsel, who may simply use them to identify statements worthy of further examination or scrutiny.
The best part of Lee’s Transdermal Optical Imaging is the manner in which the blood flow manifests itself in people who are not telling the truth. Blood flow in these people decreases in the cheeks and increases in the nose.
He calls it the “Pinocchio Effect.”
Computers are not intelligent. They are merely very fast processors of data and thus limited in their utility.
Bradley,
The computers we are talking about today are not what we were talking about a generation ago.
They do not just process data more quickly than we do. They are capable of finding patterns and drawing conclusions with far greater precision and accuracy than the human mind.
I’ll leave aside what “intelligence” is and how it is defined.
What I will do is point to Thomas Friedman recently in The Times, who notes that artificial intelligence is still best used in conjunction with human intelligence. In fact if you’re programming or interacting with the artificial intelligence, your own skills develop as well.
This is what I allude to in the piece above: using tools such as this to illustrate where the assumptions we have made about demeanour evidence are categorically false (the studies have shown that our limited human minds are ineffective at this task, and therefore of limited utility), and to better refine our own complex assessments of credibility.