Racing to Nirvana – Legal Research Edition

Will Amazon start delivering packages before you order them? They’re getting close.

Will your autonomous vehicle know your destination before you tell it? Probably, if you are sticking to a routine.

Will legal research databases give you what you need before even you know what that is? Don’t bet against it.

In Tim Knight’s recent Slaw post on the black box of artificial intelligence, he talked about the importance of understanding the “how” of the underlying algorithms as we become more reliant on both their results and their predictive capabilities. Unsaid but implied in Tim’s post was that, yes, these capabilities will become more and more advanced, and we will become more reliant on them even as we risk becoming less aware of how they work.

Legal research has been, and for the most part always will be, undertaken for the purposes of exploring and applying the law. Our tools, however spectacular and magical they may be, are only half of the research equation. Whether as part of a historical review, in the context of a specific set of facts, or for forward-looking policy purposes, the human strengths of rhetoric and strategic thinking are needed to give the results shape and bring them to life. So why, then, should we be thinking any differently about the implications of the current wave of technological tools and black box algorithms?

Speed. And assured abandonment of past practice.

Change is happening far faster than the LSAT to Bar Call cycle can ever hope to match. And for the great majority of legal researchers, evolution to newer tools has routinely threatened, if not severed completely for many, our connections to older tools. In changing how we work, the tools inevitably change how we think. They change how we practice law.

Consider this.

The common law is built on precedent. For decades, centuries even, we’ve relied on the editorial judgment of case law reporters to help us identify important points of law and indications of its evolution. Many venerable publishers like the Incorporated Council of Law Reporting (“…established in 1865 as the authorised publisher of the official series of The Law Reports for the Superior and Appellate Courts of England and Wales”) carry on the tradition today. They remain relevant for only so long as courts deem them relevant, and then, only among those who deem them relevant. Just like in the story of Peter Pan where the Darling children are told they can fly if they believe, but in losing faith or growing up, they lose the ability and lose the path back to Neverland.

When the transition from print to digital started to take off in the 1990s, Neverland began to lose its allure as a destination for successive generations of lawyers. Yes, many courts in most countries still express a preference for print citations, or at least for citations only to those cases they or case law publishers have deemed of precedential value. But as Gary Rodrigues observed in a Slaw column over four years ago, even the much-vaunted curation role associated with publishing case law reporters had mostly separated from reality as the books were increasingly compilations of lesser cases selected mostly for their ability to fill pages in a targeted print run.

Print-based research is at the margins, and with it any sense of selectivity. The digital era facilitated research across a much broader body of law, and with it, a different style of advocacy. Lower court decisions, once deemed not even worthy of reporting, now appear with significant frequency in citation networks. Indeed, even lower courts from one jurisdiction are now cited by superior and appellate court decisions in another – not just occasionally, but routinely.

In the past 20 years, we’ve changed the way we research. That changed the way we argue. That is changing the law.

In the past five years (that LSAT-to-Bar Call window I mentioned), the major research platforms have tried to move the profession away from precise Boolean string searching and towards natural language searching. There has been some push back from top tier researchers who can appreciate precisely what risks being lost in this trade-off, but just as we’ve moved printed case books to the margins, we are surely moving Boolean searching to the margin.

Changes to the way we think and argue as a consequence of this new research paradigm are starting to peek through. But more importantly, because we’ve once again reshuffled the cases upon which we build our arguments, our arguments are changing too.

The next wave is upon us. The rise of data analytics gives us new tools to look at molecular and 40,000 foot level aspects of our case, of the corpus of relevant law, and indeed of the players in the debate. As we accept that there is more to know, demand for the tools that will give us that knowledge will grow.

Enter natural language processing, machine learning, and augmented intelligence.

Just as we can’t look at the engine of a Tesla and expect to apply what we learned fixing our 1973 Dodge Dart Swinger, we will eventually have to abandon any pretense of applying research methodologies built around print as the basis upon which we evaluate the effectiveness and quality of the next generation of legal research tools.

Tim’s post concluded with the thought that even if we were to gain insight into the artificial intelligence black box, it would still be pretty black inside. This is all but certain to be the case in the not-so-distant future where research tools applied to law are powerful systems of general application that, among many other domains, can also dissect legal information. In the near term, we can seek to intelligently train the machines that will drive our research and our law.

Maybe I’m wrong. Maybe the way we research hasn’t changed the way we argue. Or if it has, maybe it won’t this time. Then again, maybe it will.


  1. Colin,

    Whether you favour natural language or Boolean methodologies, or some search methodology yet to come, YOU tell me this based on whatever search methodologies you’d use to search whatever online data bases you want.

    You’re providing an opinion on what you think the SCC will do in a personal injury case where, shall we say, the major issue is factual cause. Your conclusion is that the SCC will decide the appellate judge made errors sufficient to allow the SCC to allow the appeal. You now have to decide if you think the SCC will (1) find for the plaintiff(s) – the clients of the lawyer who consulted you; (2) find for the defendant(s); or (3) send the case back for a new trial. You’re doing this over the next few months before the SCC says anything more on point, so the last 3 cases that might help you decide what the current panel will do, bearing in mind we have new judges, are Clements, Ediger and Benhaim.

    What else are you going to look at, and how are you going to program the black box to weight factors that aren’t anywhere in the record(s) that you have access to? Because, for example, you could make some nasty assumptions – but not about anything illegal – that would explain how to fit the results of the three cases together. By result, I mean only whether the cases were sent back for a new trial or judgment was given for one or the other of the parties.

    Then maybe create an algorithm that purports to select for cases decided on a “rule of law” basis, somehow weighted to identify various factors that you think are more important or less important.

    Maybe you’ll decide you need to know what the judges had for breakfast that morning, the number of reserves, whether the judges who nominally granted leave are on the panel if it’s not the full court.

    Maybe you’ll decide you need to input the leave to appeal and appeal facta, too, and somehow take into account what was in there in assessing the significance of Clements, Ediger and Benhaim to the decision you need to make to render your opinion, bearing in mind you don’t actually know what grounds of appeal the SCC thought mattered and, remarkably enough, sometimes the SCC delivers comments on issues that (arguably) weren’t dreamt of by appellate counsel.

    If you’ve got a black box that can handle all of that, yet – or if you think one is just round the corner – let me know. I’d like to shake ze’s hand, so to speak. (It’ll pass a Turing Test.) I’ll ask it if it knows H.A.R.L.I.E. (v. 1 or 2) or even Marvin.

    Absent that, though? I’ve just described a good doppelgänger for Eugene Meehan & his colleagues. I’d still take them over your black box, at least for the next decade, especially where lots of money is at stake.

    Where it’s not? Heck, it’s only people’s lives sometimes. Lives don’t really matter as much as money, do they? (Look down south).



    If you have all of that, you might be able to second guess, say

    the SCC the current panel decides

  2. Hi David,

    I’m going to take a shot at responding. Two shots, in fact. My first response will be based on what I thought my article was about. The second, on what I’m sensing you thought was my article’s thesis.

    Version 1:

    My third sentence notwithstanding, in this article I described how the tools we use change the way we search, change the results we examine more deeply, and change the way we argue matters. I emphasized that “however spectacular and magical [the tools] may be, [they] are only half of the research equation” as “the human strengths of rhetoric and strategic thinking are needed to give the results shape and bring them to life.” What I did not do at any point was claim the box would know “the answer”. The evolution of research methods has changed and will continue to change how we research, what results are surfaced, what we do with those results and, inevitably, how the law progresses from there. I’ve made no claims as to machines achieving Eugene Meehan-level insights or suitability of any system results to purpose. While others may claim that given enough data, algorithms can be fine-tuned to the level of predicting outcomes, I have not. Consequently, my response to your scenario is that the best outcome and strategy will follow where the black box is put at the service of Eugene and team.

    Version 2:

    Let’s imagine the world where the straw man of your scenario looks to the black box alone for guidance. How might we get close to something resembling reliable guidance? Well, there are two parts to the equation. First, for every input to the black box algorithm, there would need to be a relevance quotient and weighting derived after examining the entire body of potential relevant information. And second, the system would need to be able to not only read and synthesize the material, but be able to engage in “discussions” with the user in a manner that allowed for nuance, gaps and more in order to both correctly identify and classify inputs and provide outputs in a form the user could understand and act upon.

    If we wanted the judge’s breakfast among “n” inputs, we might assign that element a label of i’ and give it a provisional weighting of w’, with the actual weighting subject to revision based on some set of rules that assess how a given input has historically influenced outcomes. Once all the pieces came together, we’d wind up with some formula incorporating [(i’, w’); (i”, w”); (i”’, w”’); … (i~n, w~n)].

    No matter how close this got us to an objectively “correct” answer, it’s a mug’s game guessing what the SCC is going to say on anything. No computer can ever guess where the Court will choose to go. And as all admin law scholars will attest, when it comes to things like standard of review, no human can ever guess either. The better question is whether the machine could distill an answer and strategy that has a high probability of success at a lower court. There the answer today is probably, and the answer tomorrow is quite likely.

    Humans being humans, however, the objectively “right” answer is rarely the driving factor, as the determination of “right” often only follows the judge’s declaration.

    I will close with my all-time favourite AI-in-law tweet:

    ” ‘the judge will have to accept it. It’s not an opinion, it’s mathematic.’

    Judges ain’t gotta do shit. Welcome to the law, son.”

  3. “The better question is whether the machine could distill an answer and strategy that has a high probability of success at a lower court. There the answer today is probably, and the answer tomorrow is quite likely.”

    Only if the problem is essentially axiomatic. Very few are but, so long as one is prepared to assume they are, maybe the wrong decisions will be balanced by the right decisions in some utilitarian sense. (Of course, death or its equivalent isn’t yet reversible, but you can’t make an omelette without … etc, right?)

    And the tweet? Only if the problem is only axiomatic: governed by a priori true axioms. But most of law isn’t that way.

    I’m curious, too. How will you program for a judge (or jury) intentionally ignoring the law? A judge can distinguish it of course. The distinction may not have an air of reality, but the process is how the case would be distinguished, if the distinction was valid. Is one of your weighting factors going to be the tendency of judges to do that sort of thing?

    I realize that, in Canada, we don’t (notionally) have the problem of juries believing they have the right to ignore the law. But remember Morgentaler?

    You’ve set a very low target, and bluntly, you’re assuming (1) you know the judge you’ll get, because otherwise all you can do is provide answers for all the judges in the jurisdiction, with a caveat that if there’s a new judge, you’ll give the money back; (2) the judge knows what the law is supposed to be – or are you simply going to run a series of simulations based on various degrees of lack of knowledge – and (3) the SCC, or a higher court in the hierarchy, doesn’t do what Clements did to Athey and “retcon” out of existence 10 years’ worth of your database.

    Most of legal decision-making is still inductive, not deductive. There’s always the possibility of a black swan. How do you program a deductive algorithm to allow not just for something you don’t know, but something it can’t conceive of?

    We already have very simple versions of your black box. It was the Ontario rules used by insurers to decide which driver is at fault for car accidents. So long as they apply, it’s black and white.

    I’m not arguing against what the AI folk are trying to do. I’m only saying that there’s a lot of snake oil in what I’m hearing – at large – I’m not referring to you – and that snake oil is there because there’s so much money to be made.

    Best of luck,


  4. David

    Thanks for clarifying that your comments are directed at an at-large discussion of AI in the law. I couldn’t quite figure out why it seemed you were commenting as if I had promised a future of mathematical certainty.



  5. Colin, no, I don’t think search has changed the way we argue; rather, both
    are under pressure to do more with less.
    One of the things I respect about the legal profession is somewhat more
    consciousness about employment of people (lawyers and staff) as compared
    to the computer industry in which human resources are simply the fuel
    of corporate profit.
    For that reason, I think software innovation will have to work closely with
    the people who most understand what the law is, and it’s an important
    fact that the law is somewhat opaque to even the brightest human minds.

    Seemingly by coincidence, this article appeared in a recent newsletter,
    which suggests the IEEE recognizes the issue:

  6. Colin,

    You’re welcome.


  7. Robert

    Sorry, the law isn’t opaque to the brightest minds in any useful meaning of opaque.

    Before I go on, though, I’ll tell you what the ONLY meaning “opaque” could have to make your statement true. It has to be something that amounts to this:

    “law is completely arbitrary, irrational, nonsensical, whimsical, so that there’s no basis for making any reasonable guess as to anything that will happen based on anything that happened before, not even based on what the judge had for breakfast, not even based on the assumption you’ve bribed the judge with enough money that the fix will last.”

    If that’s what you believe, that’s what you believe and you should stop reading this response here.

    But if you don’t believe that aberration, then:

    If you want to believe what you seem to mean by your opaqueness statement – that there are merely some areas of law that, for some unknown reason, even the brightest human minds can’t understand, for whatever reason, go ahead.

    One problem with that assumption is that there’s always the possibility of a brighter mind. (Look up the black swan fallacy). Another is that, if you’re not one of the brightest minds, how can you know what they can or can’t understand. There are other errors in your assertion that I can spot, and I’m going to stipulate that I’m NOT “one of the brightest minds”.

    Because, if I was one of the brightest I’d tell you that there’s nothing opaque about any area of the law. Then you’d have to believe me, wouldn’t you, unless you’re prepared to call me a liar. But if you know I am a liar, that’s because you know as much about the law as I do. If you don’t, what’s your basis for calling me a liar?

    So, just understand you’re very wrong in your belief about the opaqueness of whatever it is about law that is unique to law. (The non-law stuff? IP, science, advanced math? If that’s opaque, it’s not because it’s law.)

    Even if you’re not prepared to believe that, do what you should do to test if you’re wrong. Make the opposite assumption and follow it through to the end.

    Here is part of the reason why people make the claim you seem to have made. (I’m going to avoid using a short description because some readers will get offended.) Instead, I’ll describe the problem. You can apply the label yourself.

    I’m saying absolutely nothing about whether this part applies to you.

    Assume a lawyer is lecturing to a group of younger insurance adjusters and accountants about some area of law both need to know. At least some of them will be doing those jobs because, for whatever reason, they didn’t get into law school or they got in and had to leave without completing law school. If the lawyer is honest and says

    – most of this area isn’t complex or difficult compared to, say, chemistry or physics; there’s just a lot of detail to remember but, guess what, you usually don’t need to remember it immediately, just know where it is

    – and, in any event, most of us never deal with the really complicated stuff because we have people in our office who are good at that

    – we deal with the meatball stuff that’s your bread and butter and mine, because it’s profitable for the company and you

    Guess who is offended?

    Usually every person who didn’t get into law school or who dropped or flunked out. Because they take the lecturer’s comment to mean they are dumb. But that’s not what the lecturer said, was it?

    So, if you want to keep your client’s younger insurance adjusters and accountants happy, you tell them law is very difficult. (Unless your friends are there, in the background. Then you say, especially for people like John & Phil over there, but then they used to be rugby props, or goalies.)

    But, still, if you want to keep buttering up the legal profession, you’ll probably find few who will tell you to stop.



  8. A “3 Geeks and a Law Blog” post earlier this week discussed the matter of algorithmic accountability in legal research tools and how human biases skew the machines.

    I submitted my piece to Slaw a couple weeks ago, so I’m pleased by the contemporaneous publication of this complementary piece.

    Summarizing a recent study from Susan Nevelow Mart, the 3 Geeks article makes some pointed observations:

    “a little bias and assumption on the part of the people developing the computer algorithms can cause dramatic changes in the results produced with similar content and search terms.”

    “What is a researcher to do in this day and age of very little Algorithmic Accountability? First, researchers need to call upon these database providers to give us more detailed information about how their algorithms are set up, and the technical biases that result from these rules. ”

    “Until we better understand the processes that go on in the background, researchers today should expand their searches, and use multiple databases in order to reduce the effects of technological bias. ”

    “Until legal research providers begin to open up their black boxes and adopt more Algorithmic Accountability, researchers will need to expand our own legal information literacy with a better understanding of how each database compiles, categorizes, indexes, searches, and prioritizes the results.”

    For those interested in the topic, I encourage you to read the 3 Geeks piece as well as the research on which it is based.

    SSRN version of “The Algorithm as a Human Artifact: Implications for Legal {Re}Search” available at

  9. Colin,

    I think it’s very ironic that, as I see it, the core belief – call it the religion -of many of the “we can create, we have created, useful and becoming better, AI search algorithm for law” adherents are the literal opposite of Mr. Semenoff’s apparent belief.

    As you’ve read, he claims that law “is somewhat opaque to even the brightest human minds.” “Opaqueness” is like pregnancy, or Judaism, or in Western religion (subject to one exception that isn’t universally accepted any more) death. You either are or you aren’t.

    (I digress to add: I suppose I could have added, the “X” and “not X” of the situation where the SCC tells us that what Canadian lawyers are to now understand is the ‘correct’ meaning of words in an older SCC decision where the SCC now says that the older meaning, long acted on, long accepted – even by the SCC – isn’t actually what those words were ever intended to mean; isn’t actually how those words should ever have been understood; and this isn’t because the dog ate somebody’s homework or other equally worthwhile excuse.)

    Mr. Semenoff’s belief, if correct, means that, unless the brightest human minds are capable of creating a mind that is brighter than those, or a yet-to-be-born child is brighter yet, or somebody gets very lucky, law as it is known, now, will remain permanently opaque to human minds.

    On the other hand, the conceit of the AI programmers / researchers, is that law isn’t opaque at all; rather, that eventually we will have adequate AI algorithms, that will eventually conduct legal analysis at a level equal to the best human analyst, and capable of doing it faster.

    Somebody has to be wrong.

    Maybe even me. After all, I was wrong about what the SCC was going to do in Clements. In my defence, though, I point out that my co-author is now on the SCC bench.

    Yes, there are non-sequiturs in what I’ve written above. That’s intentional. That’s because good chunks of seemingly standard logic used in law is, in fact, not standard logic at all. It starts at non-sequiturs and goes down for there yet the answer may, nonetheless, be a valid answer in law. The process by which the answer was produced may also be valid, in legal terms.

    Do you say that the AI world is designing an algorithm of legal research that thinks in acceptable-to-law non-sequiturs? (Some already exist as I’m sure you know. They’re called inadequately prepared judges, lawyers, law students, etc. I was going to start and stop at students, but decided that would be unfair: to the students. Many of them don’t know better because they haven’t yet put in their “10,000 hours”.)

    I’ll continue this response in a first level posting, that’ll go up within a few days, as I have other things to do first and so that I can use the better composition tools available to first level posts.



  10. OOPS.

    >> I think it’s very ironic that, as I see it, the core belief – call it the religion -of many of the “we can create, we have created, useful and becoming better, AI search algorithm for law” adherents IS </b) the literal opposite of Mr. Semenoff’s apparent belief.<<

  11. Without the extraneous “</b)

  12. >>Yes, there are non-sequiturs in what I’ve written above. That’s intentional. That’s because good chunks of seemingly standard logic used in law is, in fact, not standard logic at all. It starts at non-sequiturs and goes down from there yet the answer may, nonetheless, be a valid answer in law. The process by which the answer was produced may also be valid, in legal terms.<<

  13. David, It’s not just a conceit but a real possibility that machine intelligence could
    be teaching us things about the human experience that no actual human could even conceive
    of. Given a rich enough database of human experience. Maybe that would be a good thing.

  14. It depends on the area of law and what the client needs. Do you really think it works for compliance in innovative building design and (engineering) building code interpretation? It works for initial submission of drawings, for checking on dimensions, symbology, etc. The organization I’m with has developed a software checker for initial drawings submissions. But it still requires a human SME to interpret and review other infrastructure details for compliance, structural history if it’s a retrofit, etc.

    Probably already happening, but the client is screened by answering a series of questions and auto-triaged. Right now, there is a linearity that’s time-consuming and familiar to us via phone help lines.

    But probably already some basic forms could be/ are autofilled…but would truly need in-person authentication by the client/applicant, etc.

    We are also forgetting that law …is culturally based in its expression of law, interpretation by the legal services provider and comprehension (emphasis) by client for next steps or the legal English.

    Like self-driving car technology if such cars have their own dedicated lane, robo legal advice from start to finish for client with a done solution, works under several assumptions about the client, simplicity of the legal problem (if it is a legal problem vs. something remedied by better communication involving the right parties that is less adversarial/costly), etc.

    Can we trust AI to probate an estate without human intervention, given all the complexities of financial information in different sources, the stakeholders, and whether the will was explicitly worded? There’s a lot at stake in using only AI.

    How can the client trust the most recent version of law has been integrated into the design of AI? Who do we hold accountable if not?