
New Panic Over Old Mistakes: Judicial Sanctions and Hallucinated Citations

In the midst of ongoing concerns about hallucinations, particularly hallucinated citations in documents filed with courts, I wonder whether the particular focus on AI-generated errors, and the penalties that have been imposed in response, are at least partly due to perceptions of these tools as cheating, or to aesthetic ideas about how “real” legal writing should happen. I also question the rationales for recent instances of judges issuing sanctions against people who have inadvertently included such citations. It seems that mistakes in AI-generated documents are treated differently from mistakes that can and do appear in any piece of writing. The particular concern about these errors appearing in court filings, and the possibility that they will be duplicated in judgments, is especially interesting. One can find errors, whether minor or major, in many (all?) collections of published documents and precedents, whether traditionally published research materials, firm precedent banks, published academic articles, or court decisions.

We should assess whether and how AI-hallucinated errors differ from other types of errors. Tools like commercially published precedent collections and firm document banks have existed for decades, and possibly centuries, so it has never been the norm that every document is written from scratch each time. It was always possible to take one of these documents and fail to check or update it properly. There have also been many instances where a particular case comes to be accepted as authority for a particular point of law, but when the original case is checked, it says something different. In the case of incorrect citations in particular, legal publishers all have policies for handling these errors, because they are so long-standing and so common. The real difference in an environment with generative AI is the speed at which these errors are proliferating, not their type.

The major change is not that these errors fall into a different category, but that the volume of content created with generative AI is making judges’ and lawyers’ jobs more difficult, and they are becoming understandably frustrated. Systems that make applications almost instant remove the friction that once inhibited many people from initiating actions in the courts. A story in The Guardian last year by Aisha Down and Robert Booth outlines how this is affecting construction planning in the United Kingdom: “AI-powered nimbyism could grind UK planning system to a halt, experts warn.”

This is paired with a general lack of savviness on the part of many people in the legal system who use these tools. This group includes members of the legal profession and of the public, as well as students, professors, paralegals, and, dare I say, judges. The result is that a mental shift is required: these tools force both the creators and the recipients of documents to move from primarily creating to primarily verifying, often without really understanding what the tools they are using are doing.

One significant change is that generative AI tools allow people to create materials for use in court that look significantly more professional, with less effort and background knowledge than was previously possible. This means that a common lack of understanding of how generative AI tools work does not make the outputs look less proficient on a cursory review, whereas a document created by someone who uses tools like word processors or thesauri without knowledge or care is immediately apparent. This change in signalling, along with the accompanying increase in length, means that judges are forced to approach all materials with added suspicion.

The work of a judge cannot be easy, and I hope that judges feel a personal responsibility to be correct in their reasoning. I discussed this topic with Matthew Waddington, and he attributed part of the frustration with the rise in computer-generated randomness to the fact that we are accustomed to computer outputs being deterministic, and we are having to shift our perceptions of how these systems behave. Human outputs, in contrast, are not deterministic: our senses and minds did not evolve to tell us the truth. They evolved to keep us alive. In any extended exchange or conversation among people, it is almost certain that some of the things said will not be accurate.

Large language models behave more like us in this way, which gave rise to the famous criticism that they are “stochastic parrots.” This criticism comes from Emily M. Bender et al. in their article “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜.” They asserted that because large language models are trained on large bodies of language data to produce plausible-sounding language that is not based on actual meaning or understanding, they encourage people to ascribe meaning where there is none. This argument was disputed by Sam Altman, and there was extensive discussion about the issue, but the underlying reality that these systems create more random results continues to be true.

We are all accustomed to interacting with the people around us and evaluating how much trust to put in their outputs, and extensive structures within the legal system, surrounding evidence, professional practice, and cultural norms, are designed to address these concerns in different ways. We are less equipped to deal with the automated systems we are now confronted with.

It seems unfair to penalize self-represented litigants for errors in material cited in their submissions when they use new tools that allow them to navigate the justice system more smoothly than they could have in the past. The primary concern appears to be that judges are frustrated by the systemic issues that generative AI is exacerbating, and that is not these people’s fault. However, unsophisticated use of generative AI by lawyers fits more easily into the category of sanctionable behaviour, as it should be the responsibility of professionals to understand how to operate the tools they use appropriately. Nevertheless, these situations do fit within larger groupings of existing errors, and it isn’t necessarily clear why they are categorically different.

This column continues my theme of AI-driven hallucinations and legal citation from my last column, which you can read here. I’d like to thank Matthew Waddington, Samuel Dahan, and Lachlan Deyong, who spoke with me as I wrote this column.
