Resisting the Echo Chamber: AI-Assisted Judgment Writing and the Risk of Homogenization
Artificial intelligence is making its way into courtrooms around the world, and not always for the better. Judges have been caught embedding AI-generated fictitious case references in judicial decisions, in Canada and internationally; and there are no doubt other, more subtle, machine delusions slipping into case law undetected. Judicial misuse of AI tools has profound consequences for the administration of justice and for public confidence in the courts. But a less obvious threat also deserves our attention: a growing body of research indicates that large language models (LLMs) have a homogenizing effect on writing and analysis, meaning that judges’ increasing reliance on AI may stifle the common law’s development.
Judges in Canada are prohibited from delegating their decision-making authority to AI tools. Short of that, some judges are using AI tools to assist with judgment writing. This could involve, for example, asking a tool to generate text summarizing the facts of a case or synthesizing the relevant law. Some or all of this text may then make its way into a judgment. This marks an important shift: the judge moves, by degrees, from author of a judgment to editor of machine-generated text. The concern is that if judges increasingly start from machine-generated text, the arguments, ideas, and language of their decisions may start to become overly influenced by AI and converge on certain tendencies found in these tools. And with such convergence, the future horizons of the common law, as a whole, may start to narrow as well.
A growing body of research suggests this risk is real. A recent study comparing human-written essays and AI-generated essays found that “essays generated by different LLMs…converge to a smaller set of main arguments, sub-arguments, and paragraph-level structures.” In other words, the study found a “tendency of different LLMs, built by different frontier industry labs, to return to the same small set of plausible arguments rather than span the broader range of arguments humans make.” The researchers coined the term “argument collapse” to describe this effect.
Another recent set of studies, again looking at AI-generated essays in comparison to human-written essays, found that “despite their potential to enhance individual creativity, the widespread use of LLMs could diminish the collective diversity of creative ideas.” Over three studies, human-written essays were found to have, collectively, two to eight times more creative content than the AI-generated ones. The researchers noted that “this homogenizing effect persisted even after a range of enhancements to the diversity of the GPT-4 writings, including prompt and parameter modifications”, leading them to conclude that “even as AI tools continue to improve and produce ever better and more creative output, they may still contribute to an overall homogenization of ideas.”
Relying on AI tools not only risks narrowing the arguments and ideas that users engage with; it can also have a homogenizing effect on the language used to describe them. The opening paragraph of a recent New York Times article paints a vivid picture:
In the quiet hum of our digital era, a new literary voice is sounding. You can find this signature style everywhere — from the pages of best-selling novels to the columns of local newspapers, and even the copy on takeout menus. And yet the author is not a human being, but a ghost — a whisper woven from the algorithm, a construct of code. A.I.-generated writing, once the distant echo of science-fiction daydreams, is now all around us — neatly packaged, fleetingly appreciated and endlessly recycled. It’s not just a flood — it’s a groundswell. Yet there’s something unsettling about this voice. Every sentence sings, yes, but honestly? It sings a little flat. It doesn’t open up the tapestry of human experience — it reads like it was written by a shut-in with Wi-Fi and a thesaurus. Not sensory, not real, just … there.
As the article summarizes, “[o]nce, there were many writers, and many different styles. Now, increasingly, one uncredited author [AI] turns out essentially everything.”
Inspired by these research studies and observations, my suggestion here is that judicial use of AI tools for judgment writing risks narrowing and flattening how the common law is described and developed, and that this is a risk we should strive to avoid.
Moves to a more singular, algorithmically driven judicial voice run contrary to the very nature of our common law system and our commitments to a diverse, independent judiciary. It is widely acknowledged and accepted that judges bring their own “conceptions, opinions [and] sensibilities” to their work and that this is a strength of our justice system. Multiple and diverse perspectives facilitate the fullest development of the common law. The common law grows precisely because different judges reason by different routes and in different terms, and the law is richer for having different paths through complex legal problems.
To be sure, judicial decisions are not a place where we want unbridled creativity, as we might in the realms of fiction or poetry or art. Some conformity in judgment writing is a good thing. Judges are, of course, necessarily constrained by precedent as well as formal and informal norms about legal reasoning and judicial writing. Judicial writing has been criticized when it veers outside of perceived conventions, for example, by inserting pop culture references or literary flourishes in legal decisions. In his recently published PhD dissertation, Canadian lawyer Jon Khan makes a compelling case that Canadian common law could benefit from even more standardization in written judicial reasons.
Even so, where judges converge on shared analytical and linguistic practices, there are good reasons to want those alignments to be products of human judgment exercised by thoughtful judges, not a byproduct of machine defaults. The priorities embedded in LLMs are not chosen with justice system values or the public interest in mind. The argument structures and turns of phrase these tools favour are artifacts of training data and developer choices, not a considered view about how a judicial decision should be built to serve the parties, the profession, and the public it speaks to. In other words, to the extent that approaches to judicial decision-writing converge, they should converge around the values and priorities of the legal community, not around the preferences an AI model happens to favour.
The tendencies of AI tools may not only differ from the priorities of the legal community; they can sometimes be actively harmful. Take, for example, a study published earlier this year wherein ChatGPT was used to generate over 140,000 legal memos and the model was found to exhibit “a prosecutorial default bias…systematically recommend[ing] prosecution–even when prompted from a defense perspective, confronted with minimal evidence, or presented with clear constitutional violations.” AI technologies, like all technologies, are not neutral.
No doubt, the severity of the risks raised here scales with the depth of reliance. The risk of undue homogenization is far greater for the judge who lets an AI tool draft a decision, or significant parts of one, and then simply edits the result than for the judge who turns to AI only to check grammar or work through a stubborn sentence. The risks also depend on which tool is used. Some tools are now being built specifically for judges, designed—at least in principle—to better reflect the demands of judicial work. Whether they deliver on that promise remains to be seen.
The ultimate plea of this column echoes one I’ve made before: judges must be careful about using AI in the judgment-writing process, even when they are using it in ways that might be thought of as merely “assistive” and therefore low-risk. The subtlety of the risks raised in this column, as compared to those arising from fake cases, requires, in my view, a heightened vigilance. The types of changes described will not necessarily be immediately obvious and will not happen overnight. As a senior English judge recently observed: “[t]he danger is not a dramatic coup by machine. It is of a gradual drift: standardised prompts, standardised summaries, standardised risk scores and eventually standardised dispositions. A court system may appear formally independent while its informational architecture has been captured by technology we cannot inspect, challenge or control.”
It is easy to be seduced by the ease with which generative AI tools produce seemingly high-quality text; but in the judgment-writing context, there’s a lot at stake when writing is outsourced to machines. A body of common law written by judges who all lean on the same handful of models is a common law that stops sounding, and reasoning, like many minds and starts sounding like one. And that one voice is not oriented toward the values of our justice system. Worse, it gravitates toward a synthetic, flattened and sometimes biased default. This is a future we can, and should, resist.

