The Big Data Problem for AI in Law

Artificial intelligence is a big deal. It will change our society and the way we do things. Just maybe not immediately, and in law it might take even longer.

Artificial intelligence is directly connected to the concept of big data. Its advantage over current processes rests in part on a superior ability to compute over large amounts of information: data sets so large and so complex that traditional means of processing them simply aren’t adequate when compared to techniques like predictive analytics.

For this reason, much of the research in artificial intelligence is directly connected to big data. For example, the Intel Science and Technology Center for Big Data at MIT is hosted in the Computer Science and Artificial Intelligence Laboratory. One of the greatest challenges still being explored is how to maintain data integrity and security while performing searches across trillions of records in an expedited manner.

The bigger problem for the legal industry in Canada is not how to conduct these searches, how to preserve the integrity of the data, or even how useful the results are to the target industry. What we have is a scarcity of data, especially when dealing with predictive tools based on reported decisions.

To be sure, we have plenty of legal information generally. There are countless texts and treatises on the law in Canada, enough to allow for bigger and better research tools.

What we don’t have is a wealth of written judgments by judges. We certainly have some written judgments, and even databases of written judgments, but not data sets large enough to provide adequate precision and predictive value as promised by artificial intelligence.
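To put rough numbers on why size matters, here is a minimal sketch in Python. It is my own illustration, not any vendor's method: it uses the standard normal approximation for a proportion and assumes the decisions are independent and comparable, which real cases rarely are. The function name and the sample sizes are hypothetical.

```python
import math

def margin_of_error(n_cases, win_rate=0.5, z=1.96):
    """Approximate 95% margin of error for a win rate estimated from n_cases decisions.

    Assumption: normal approximation to the binomial, with independent,
    comparable decisions. A simplification for illustration only.
    """
    return z * math.sqrt(win_rate * (1 - win_rate) / n_cases)

# Hypothetical data set sizes, from a narrow slice to a very large pool.
for n in (50, 500, 5000, 500000):
    print(f"{n:>7} decisions: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

The uncertainty shrinks only with the square root of the number of decisions. Once a pool of a few thousand cases is sliced by claim type, counsel and judge, each slice may hold only a few dozen decisions, and an estimated win rate for that slice could be off by ten percentage points or more.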

There are many reasons for the nature of our data sets, but we can start with our relatively small population as compared to American jurisdictions. What also factors into this is the litigious nature of our society, or lack thereof.

To illustrate, the 2015 Litigation Trends Annual Survey by Norton Rose revealed that the number of American respondents reporting over 50 legal proceedings commenced against them was nearly double that of the Canadians. The American respondents, who were largely legal representatives from large companies, were also more likely to sue others than the Canadians were.

More lawsuits means more legal decisions. Presumably.

Unless things settle, which they often do. Although the pressures to settle are also present in American jurisdictions, our lower damages quantum, a distinct costs regime (i.e. the English Rule), and legal reforms pressuring settlements mean that Canadian civil lawsuits are far less likely to proceed to trial and produce a written judgment than American civil lawsuits are.

Confidentiality clauses in such settlements typically preclude their terms from ever being publicly disclosed, let alone combined into one massive data set for analytical purposes.

The problems this creates for artificial intelligence systems operating in law might be highlighted by a new entrant into the Canadian market, Premonition, which was recently discussed in Canadian Lawyer:

“We keep track of wins and losses and all kinds of interesting things that have never been done before,” said Toby Unwin, who, along with Guy Kurlandski, runs the U.S.-based company.

It’s premised on the notion that the only thing that matters to a client in litigation is winning. “Winning is the best indicator. I truly believe that if I am the client, I want to win,” said Unwin, who holds an LL.B. from a U.K. law school and developed the system.

Aside from the fact that civil litigation cases with some merit rarely have a “winner,” only varying degrees of success, keeping track in the Canadian field is hampered by all the hurdles identified above. To date, Premonition only has a data set of several thousand cases.

Some Canadian insurance companies (I won’t name them) have been developing their own data sets and rudimentary systems for predictive analytics over the past several years. They use these figures to assess the past success of similar claims, and can also enter variables like the counsel involved and, increasingly, the presiding judge.

These companies can then use this information to instruct defence counsel on how to proceed in the litigation or, more likely, on their settlement positions. By targeting slightly less than the best outcome these systems predict, they can slowly reduce, over time, the settlement expectations of plaintiff lawyers.

Plaintiff-side lawyers, especially in personal injury, have been anecdotally reporting a downward trend in settlements, but they lack objective data, or the ability to aggregate settlement data across different practices, from which to draw any objective conclusions. The competitive nature of the plaintiff-side bar, and the need to protect client information under the Rules of Professional Conduct, prevent the type of collaboration that would allow them to push back against larger entities on the other side.

Unlike other artificial intelligence systems being introduced in law, which can analyze contracts or assist in research, the use of reported decisions for predictive decision making in litigation operates in a very different way. Premonition told me that the more Canadian cases they add to their system, the better it will be, and I agree. But I have a strong feeling, a premonition even, that the system will not reach the same level of predictive power, with any measure of statistical accuracy, any time soon.

There is no fault here of any vendor or software attempting to replicate services which are being rolled out in the U.S. It’s simply a function of the differences in data in our jurisdictions. And that’s a big problem for the development of artificial intelligence in Canadian law.



  1. The system is designed for opinion to be presented after thoughtful consideration of fact, free from bias, ignorance or prejudice.

    Technologists, jurists, advocates, and the public should be very excited by the new tempo.

    Justice delayed is justice denied.
    -A distinguished jurist from yesterday.