The term artificial intelligence (AI) has been justly criticized for its lack of specificity. Essentially, it means anything that we are still impressed that a computer can do, which is, of course, a moving target. The most talked-about AI technology is currently machine learning, and it drives the majority of the black box systems that are raising concerns in the legal sector.
In this context, black boxes are systems that accept inputs and present outputs of various kinds without making explicit how the output was reached. Black boxes can occur for many reasons, some technical and some not. They are of primary concern in decision-assistive applications, as other black box systems like e-discovery or legal research software tend to have different stakes for individuals. Some of the technical reasons for black boxes involve the structure of legal data, and some involve the ways machine learning systems work.
The main technical reason for black boxes in law is that most data is structured in a way that limits the ability of machine learning systems to point to the origin of their outputs. The data sources that are most particular to law tend to have little structure: instead, they are in the form of free text documents like court decisions, legislation, court dockets, or law firm files. These documents usually have some defined elements, and elements like parties’ or judges’ names in court decisions tend to be formatted with some consistency, but these elements are rarely marked up in a structured way within the document. Instead, they are often set off visually in part of the document, such as a header. This is not easily parsed by automated systems, especially as the formatting is often inconsistent (if you’d like to see what structured case law data can look like, the Supreme Court of Canada website has a feed of case data in JSON format).
This is even more of an issue when looking at documents of different types from different sources. Because formatting choices differ, documents end up being analyzed term by term without clear information about which parts correspond to what. There are some sources of data that are more structured, such as the Supreme Court Database from Washington University Law, but these are the exceptions.
Data with these limitations is generally best suited to techniques like “bag of words”, which is what it sounds like: words are analyzed based on their occurrence in a document without considering context. Researchers have worked on including more context in their analyses by using word combinations called n-grams instead of single words, but these are limited in what they can say. Generally, tags are required for the criteria behind recommendations to be clear.
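To make the difference concrete, here is a minimal sketch in Python of a bag of words and of n-grams. The two sentences and the function names are invented for illustration and are not drawn from any real system. It shows that two sentences with opposite meanings can produce identical word counts, and that n-grams recover only a little of the lost context.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text):
    """Count how often each word occurs, ignoring order and context."""
    return Counter(tokenize(text))


def ngrams(text, n=2):
    """Count runs of n consecutive words (bigrams when n is 2)."""
    tokens = tokenize(text)
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Two invented sentences that use the same words to say opposite things.
first = "The court allowed the appeal and dismissed the motion."
second = "The court dismissed the appeal and allowed the motion."

# The bags of words are identical, so this view cannot tell the outcomes apart.
same_bag = bag_of_words(first) == bag_of_words(second)

# Bigrams differ, and trigrams finally capture "allowed the appeal".
bigrams_differ = ngrams(first, 2) != ngrams(second, 2)
first_allows_appeal = ngrams(first, 3)[("allowed", "the", "appeal")]
second_allows_appeal = ngrams(second, 3)[("allowed", "the", "appeal")]

print(same_bag, bigrams_differ, first_allows_appeal, second_allows_appeal)
```

Even here, the trigram only records that three words appeared together. It says nothing about which part of a decision they came from, which is the kind of information that tags or structured data would supply.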
Going forward, it may be possible to create more tractable data by publishing law in structured formats, which is related to efforts to design machine-readable law. This is possible, but it would require significant research and process changes on the part of lawmakers to build structure in from the point of drafting, and a great many people would have to change how they work. Research will be needed to develop systems that meet both current and future needs.
Other reasons for black box law have to do with business decisions. For example, many computing systems are treated as trade secrets and are not shared. Even where research is published, the level of detail required to replicate systems is generally not included. This means systems cannot be replicated and studied, and so cannot be externally validated.
In addition to these issues, creators may not want to disclose the details of their systems because those systems may not conform to what would be used in an equivalent analogue process. For applications like recommendations in bail and refugee hearings this is concerning, especially if there aren’t clear and accessible options for appeal.
Even where these systems do give data outputs, they are generally presented in probabilistic terms. This can mean that the system indicates that a particular outcome is likely, e.g. a particular defendant has a 63% likelihood of reoffending. When users examine the basis for a recommendation, they may get an answer like “this output is 21% driven by variable x and 17% by variable y”, or “19% of the outcomes of this system are based on a particular element”. This makes the details of a particular output difficult to examine closely.
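As a rough illustration of what this kind of explanation looks like, here is a minimal Python sketch of a simple logistic model that reports a probability along with each variable's share of the score. The weights, variable names, and input values are all invented, and real systems may compute attributions in quite different ways. This is one simple approach, not a description of any particular product.

```python
import math

# Hypothetical weights and bias, invented purely for illustration.
WEIGHTS = {"variable_x": 0.8, "variable_y": -0.5, "variable_z": 0.3}
BIAS = -0.2


def predict(features):
    """Return a probability between 0 and 1 from a simple logistic model."""
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1 / (1 + math.exp(-score))


def contribution_shares(features):
    """Return each variable's share of the total absolute contribution to the score."""
    contributions = {
        name: abs(WEIGHTS[name] * value) for name, value in features.items()
    }
    total = sum(contributions.values())
    if total == 0:
        # No variable moved the score, so there is nothing to attribute.
        return {name: 0.0 for name in contributions}
    return {name: amount / total for name, amount in contributions.items()}


# An invented individual described by three numeric inputs.
person = {"variable_x": 1.0, "variable_y": 2.0, "variable_z": 1.0}

probability = predict(person)
shares = contribution_shares(person)

print(f"Predicted likelihood: {probability:.0%}")
for name, share in shares.items():
    print(f"{name}: {share:.0%} of the score")
```

The output says how much each variable mattered to the number, but nothing about why that variable should matter or whether using it is appropriate.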
The elements these recommendations are based on may not be used in ways that make sense to us. It’s interesting to consider how comfortable we are with accepting outputs based on inputs that have been found to be correlated with outcomes but which are not known to be causative. Correct and defined process is a required component of our justice system, and these systems may not adhere to this process in a way that we would accept from human actors. This raises interesting questions: aspects of people’s lives are correlated with things that happen to them. Do we accept that these things should be admitted in an adjudicative process that stands in the place of a court hearing? Especially if they are also correlated with unjust outcomes and discrimination?
Outside of these personal attributes, machine learning systems have found that aspects that seem like they should be irrelevant to proceedings are actually predictive. What does it mean if things like the number of days between a hearing and the decision being rendered are predictive of litigation outcomes? Do we want to reinforce their position as proxies for things we cannot readily measure?
These are not easy questions to ask or answer. In many cases we are having to take it on faith that the developers of these systems are behaving appropriately and that the systems are fit for purpose. Over time, I anticipate that it will no longer be acceptable to make decisions in this way, and there will be regulations in place to ensure that these systems are used appropriately. We will have to balance the positives and negatives and come to compromises that we as a society are willing to accept. Unfortunately, these restrictions may mean that promising avenues for efficiency gains are missed, though I would be remiss not to mention that there are heuristics that would make analogue decision-making processes easier that we choose not to use. This is in many ways no different from that.
This column is based on some of the thoughts I had as I prepared to present a session called “Black Box AI: Trust, but Don’t Verify?” with Kim Nayyer and Parminder Basran at the ALM Legal Week on July 14. Thank you to Kim and Parminder for discussing these topics with me and presenting. I’d also like to thank the staff at ALM for organizing the conference and inviting me to speak.