One of the substantial concerns about the implementation of artificial intelligence (AI) in the legal space is bias, and evidence has shown that this concern is warranted. Given the urgency of this topic as these systems are being sold and deployed, I was happy to speak about it at the Canadian Association of Law Libraries Conference in May and the American Association of Law Libraries Conference in July. Here are some of my thoughts on AI that may not have made it into the presentations.
First, some discussion of AI itself. While it’s fun to talk about AI broadly, it is helpful to break down what kinds of technologies people generally mean when they discuss AI. Essentially, there are two types:
- The first runs complex statistical analyses and makes inferences and predictions based on input data. It is grounded in past activity, and its assumptions can be adjusted to explore different ways of predicting future outcomes.
- The second kind uses computer programs that run over data and draw their own conclusions. The input data can come in different formats, including numerical and textual sources. This type is called “self learning” and requires less data than the first kind.
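To make the contrast concrete, here is a toy sketch of the two types. All data and labels are invented for illustration: the first model is an explicit statistical fit whose assumption (a straight line) the analyst chooses and can vary; the second derives its own rule directly from labelled examples, here as a minimal nearest-neighbour classifier.

```python
# Type 1: an explicit statistical model. The analyst supplies the
# assumption (a straight line fit by least squares) and can change it
# to explore different ways of predicting outcomes.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

xs, ys = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
slope, intercept = fit_line(xs, ys)
print("predicted value at x=5:", slope * 5 + intercept)

# Type 2: a "self learning" rule. The program derives its own decision
# boundary from labelled examples (a 1-nearest-neighbour classifier).
def nearest_label(examples, point):
    return min(examples, key=lambda e: abs(e[0] - point))[1]

examples = [(1.0, "low risk"), (2.0, "low risk"), (8.0, "high risk")]
print(nearest_label(examples, 7.0))
```

The point of the sketch is the division of labour: in the first case the human chooses the shape of the model, while in the second the program infers the rule from the examples it is given.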
The economic impact of AI is expected to be primarily felt in the way we perceive and value decision making, because AI is often used to match patterns in human decision making to make decisions in similar situations. Like the introduction of spreadsheets, which made bookkeeping cheap and efficient, AI is expected to reduce the effort and cost of decision making. Decision making in rare or uncommon situations is another matter: “AI cannot predict what a human would do if that human has never faced a similar situation.” Machines are and will continue to be bad at predicting rare events. While automating decision making won’t eliminate all jobs, its economic impact is likely to change them: a school bus driver might not drive an autonomous bus, but someone will still be needed to supervise and protect the children in it.
Much of this is still speculation, because the technology and its implementations have not caught up to people’s ideas of what might happen. In the legal space the primary data source used for AI is free text in the form of written prose, drawn from sources like court judgements, legislation, and other legal writing such as commentary or court filings. AI systems are not yet capable of understanding complex meaning and extracting facts from text. They are, however, starting to be effective with text at the length of a paragraph, and do well with sentences and phrases, but they cannot analyze the full text of a long document and draw conclusions. Machine translation is a good illustration: it is getting better, but it still works segment by segment or sentence by sentence and can’t understand language beyond that level. You can’t run AI over text and teach it to speak, but there have been advances in having it compose formulaic documents like weather reports, and it can begin to identify what a document is about.
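“Identifying what a document is about” can be done with surprisingly shallow methods. The following is a minimal bag-of-words sketch, not any vendor’s actual approach, and the topic keyword lists are invented: it guesses a topic by counting how many of each topic’s keywords appear in the text.

```python
# Minimal bag-of-words sketch: guess a document's topic by counting
# keyword hits per topic. The keyword lists are invented for illustration.
TOPIC_KEYWORDS = {
    "contract": {"agreement", "breach", "consideration", "party"},
    "criminal": {"bail", "sentence", "offence", "accused"},
}

def guess_topic(text):
    words = set(text.lower().split())
    # Pick the topic whose keyword set overlaps the document's words most.
    return max(TOPIC_KEYWORDS, key=lambda t: len(TOPIC_KEYWORDS[t] & words))

doc = "The accused was denied bail pending sentence for the offence"
print(guess_topic(doc))
```

Note that nothing here involves understanding: the program matches surface features of the text, which is exactly why such systems can label a document’s subject while remaining unable to extract its facts or reasoning.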
AI systems quickly run into the issue that the way things were decided in the past is not a good guide for making decisions in the future. If this technology had been adopted in the 1960s, women still wouldn’t be going to university much or getting mortgages, and we still wouldn’t have diverse artists or lawyers: “Big data processes codify the past. They do not invent the future.” These limitations are compounded in the legal space by the high stakes attached to how these systems are used and by the nature of the data used to train them.
There are two primary sources of bias in AI systems. The first arises when the dataset being used doesn’t represent the full underlying population, so the insights derived from it may be mathematically incorrect. In law, consider the choice to use court judgements to describe likely outcomes for particular legal issues. Court judgements leave out considerable amounts of information: any matter where the parties settled, that was resolved in mediation, or where the decision was delivered orally won’t be included in the set. There are mathematical ways to try to correct for this, but the risk remains that the insights derived aren’t correct, because only a subset of outcomes is included.
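This selection effect can be simulated in a few lines. The sketch below uses entirely invented rates: strong claims usually settle before judgement, so the win rate visible in judgements alone understates the true win rate across all disputes.

```python
import random
random.seed(0)

# Toy simulation (all rates invented): strong claims tend to settle,
# so observed judgements over-represent weaker claims.
disputes = []
for _ in range(10_000):
    strong = random.random() < 0.5              # half of claims are strong
    settled = strong and random.random() < 0.8  # strong claims usually settle
    won = random.random() < (0.9 if strong else 0.2)
    disputes.append((settled, won))

all_rate = sum(w for _, w in disputes) / len(disputes)
judged = [(s, w) for s, w in disputes if not s]
judged_rate = sum(w for _, w in judged) / len(judged)
print(f"true win rate: {all_rate:.2f}, rate seen in judgements: {judged_rate:.2f}")
```

Under these assumed numbers the rate estimated from judgements alone sits well below the true rate across all disputes, even though the arithmetic on the observed subset is flawless: the error comes entirely from which cases reach the dataset.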
The other main source of bias arises when the data may represent the full population, but the dataset as a whole is biased. This means AI may reinforce existing discrimination and unfairness; consider how important fair results are in situations like hiring, bail decisions, and immigration hearings. This bias is not a mathematical error, but there are ways to reduce it, such as truncating the data by removing dimensions like gender or race. An alternative is to add more data into the system if outcomes do not reflect the backgrounds of different groups: for example, if people of African ancestry are more likely to have bail denied, or women are less likely to be hired for technical roles. Adding these parameters can show that different groups have different experiences and markers for outcomes. It could show that early interactions with police are less predictive of reoffending for African Americans than for white offenders, or that women have different backgrounds that point to success when working with technology than men do.
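“Truncating the data” is mechanically simple; the sketch below, with invented records and field names, drops the sensitive dimensions before a model would see them.

```python
# Sketch of truncating a dataset: drop sensitive dimensions before
# training. Records and field names are invented for illustration.
SENSITIVE = {"gender", "race"}

def truncate(record):
    return {k: v for k, v in record.items() if k not in SENSITIVE}

record = {"age": 34, "prior_offences": 1, "gender": "F", "race": "Black"}
print(truncate(record))
```

A design caveat worth noting: fields that correlate with the removed dimensions (a postal code, say) can still act as proxies for them, which is part of why removing columns alone rarely settles the fairness question.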
Artificial intelligence may amplify patterns in data: a finding that humans decide a certain way in 70% of cases may lead to an automated system making that recommendation 100% of the time. The source of the bias is in the data, but we don’t want bias rooted in human behaviour carried into AI systems. In the COMPAS recidivism system, police are given the option to collect data in different ways: they can ask defendants to fill in a questionnaire themselves, take down answers to prewritten questions verbatim, or have a guided conversation. The results can be biased by the attitudes of the interviewers, who can affect the outcomes disproportionately through their interview style.
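The amplification effect is easy to see in a toy policy. The 70/30 split mirrors the figure above; everything else in this sketch is invented. A naive system that always recommends the most common past decision turns a 70% human tendency into a 100% machine rule.

```python
from collections import Counter

# Past human decisions: 70% "approve", 30% "deny" (split taken from the
# text's example; the decision labels are invented).
past_decisions = ["approve"] * 70 + ["deny"] * 30

# Naive automated policy: always recommend the most common past decision.
majority = Counter(past_decisions).most_common(1)[0][0]

recommendations = [majority for _ in range(100)]  # 100 new, similar cases
print(majority, recommendations.count("approve"))
```

Real systems are more sophisticated than a majority vote, but the underlying dynamic is the same: a statistical tendency in the training data hardens into a deterministic recommendation.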
Really considering what constitutes fairness is important for the successful deployment of these systems. The probability of outcomes for individuals at a particular moment is not the same as the probability of outcomes for individuals in society. In a recidivism system, differences in experience leading up to that moment, including differences in experience with policing and the justice system, mean that identical treatment of people being entered into the application may still not be fair, because the set of people being considered in the process is already biased.
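One common, and deliberately narrow, way to probe fairness is to compare the rate of an adverse outcome across groups. The records below are invented; this sketch checks only one notion of fairness (equal outcome rates), which, as the paragraph above argues, can still miss unfairness baked into who enters the system in the first place.

```python
# Minimal sketch of one fairness check: compare the rate of an adverse
# outcome (e.g. bail denied) across groups. All records are invented.
cases = [
    {"group": "A", "denied": True},  {"group": "A", "denied": True},
    {"group": "A", "denied": False}, {"group": "B", "denied": True},
    {"group": "B", "denied": False}, {"group": "B", "denied": False},
]

def denial_rate(cases, group):
    sub = [c for c in cases if c["group"] == group]
    return sum(c["denied"] for c in sub) / len(sub)

for g in ("A", "B"):
    print(g, round(denial_rate(cases, g), 2))
```

Even if the two rates were equal, the check would not establish fairness on its own: if the populations flowing into the system already differ because of upstream bias, identical treatment at the decision point can still produce unjust results.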
Tianhan Zhang, one of the experts in AI I spoke to in preparation for my presentations, had not considered the use of AI systems in the legal space before. His response to the use of AI for applications like recidivism prediction was that it “sounds scary”, and his response to the use of AI in case prediction was that it “sounds like a waste of money”. That said, I think there are promising applications for AI in the legal space, and I hope people will continue to explore them to make systems work better, but we need to be careful about how we use them, and to query the technology and the data going into it in a sophisticated way.
I would like to thank Kim Nayyer for asking me to talk on this subject and moderating the sessions; Tianhan Zhang, Marc-André Morissette, and Chiah (Ziya) Guo for speaking with me about this topic before the presentations; Randy Goebel, Pablo Arredondo, and Khalid Al-Kofahi for presenting with me; and Marcelo Rodríguez for suggesting the topic. It has been an interesting experience working with you all on this.
Ajay Agrawal, Joshua Gans, and Avi Goldfarb, Prediction Machines: The Simple Economics of Artificial Intelligence, Harvard Business Review Press, 2018, p. 99.
Ibid., p. 149.
Cathy O’Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, reprint edition, Broadway Books, 2017, p. 204.
Sara Wachter-Boettcher, Technically Wrong: Why Digital Products Are Designed to Fail You, WW Norton, 2017, p. 126.