Vagueness and the Scope of Caselaw Databases

Caselaw databases are frequently described as “comprehensive” collections of cases, with the meaning of “comprehensive” left undefined. The exceptions, of course, are databases based on print series of law reports, which are by definition “selective”.

Some, but not all, database providers state that they hold so many hundreds or thousands of judgments covering specific years or time periods. Some say nothing at all. A few provide further details of the number of decisions by court level but, in general, vagueness is the order of the day.

“Vagueness” is not an acceptable standard

Legal researchers, on the other hand, need to know for certain what data they have in fact reviewed or considered as part of the research process. “Vagueness” is not an acceptable standard. A major commercial advantage should accrue to the legal publisher that can claim that it has audited its databases and can clearly state their scope in terms that users can easily understand.

Auditing databases is not a difficult task. It is merely time-consuming. LexisNexis engaged in one such exercise when Canada Law Book announced that it would remove its criminal and labour arbitration databases from Quicklaw. By the time the proprietary databases were removed, LexisNexis had identified the missing cases in its own databases and replaced the content.

Cases reported in print and available online

It should be possible for publishers who claim to have “comprehensive” collections of cases to be more specific about their claims.

Do they in fact have all cases reported in print? If not all cases, what cases? While it is not possible to say that a collection includes a particular series of law reports published by a competitor, it is possible to say that a database includes all cases reported in print from a specific date. It is also possible to say that a collection includes all cases decided prior to a specific date that are subsequently considered or referred to in cases decided after that date.

Similarly, it should now be possible to establish a mechanism to ensure that all cases issued by a court were in fact received by the publisher and mounted online. It is a very basic element of even the most limited quality assurance program to check that all of the cases that should be in a database are really there.
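
As a rough illustration only, such a check amounts to comparing two lists of case identifiers. The sketch below assumes that the court (or its website) can supply a list of the decisions it issued; the function names, the identifier format and the sample data are all hypothetical and do not describe any publisher's actual process.

```python
def normalize(case_id):
    # Collapse whitespace and case so that trivial formatting
    # differences are not reported as missing cases.
    return " ".join(case_id.split()).upper()


def audit_holdings(issued_by_court, held_in_database):
    """Compare the court's list of issued decisions with the database holdings.

    Both arguments are iterables of case identifiers (hypothetical format).
    Returns two sorted lists: identifiers missing from the database, and
    identifiers in the database that are not on the court's list.
    """
    issued = {normalize(c) for c in issued_by_court}
    held = {normalize(c) for c in held_in_database}
    missing = sorted(issued - held)
    unexpected = sorted(held - issued)
    return missing, unexpected


# Invented sample data, for illustration only.
court_list = ["2008 ONCA 1", "2008 ONCA 2", "2008 ONCA 3", "2008 ONCA 4"]
database = ["2008 ONCA 1", "2008  onca 2", "2008 ONCA 4", "2008 ONCA 9"]

missing, unexpected = audit_holdings(court_list, database)
print("Missing from database:", missing)
print("Not on the court's list:", unexpected)
```

The second list matters as much as the first: an identifier held by the publisher but absent from the court's list suggests either a data-entry error or an incomplete list from the court.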

The era of claiming that a publisher has “lots of cases” or “a critical mass of cases” or “more cases” than the next guy should be coming to an end. Meeting customer needs includes letting them know exactly what they are getting for their money.


  1. Gary brought up an interesting issue in the middle of the special holidays of at least three great religions, plus the high point of our consumer lives. This guy really knows how to manage time. Anyway, let’s try imitating him for a couple of minutes to discuss the “comprehensiveness” of case law.

    I see two bases for defining “comprehensiveness”: one based on what courts and tribunals produce, and the other based on what a publisher can do.

    First, let’s look at the court and tribunal side of “comprehensiveness”. Based on court activity, one can say that being comprehensive means having everything that a chosen group of courts has made accessible for distribution over a specific period of time. Is it that simple? Not really, and for many reasons.
    – Courts do not distribute every judgment for publication. The boundary between what is for distribution and what is not is blurry, changing from court to court and, at worst, even from judge to judge within a court;
    – The main sources of differentiated distribution treatments are:
    o Written v. oral judgments;
    o Final disposition v. interlocutory matters;
    o Family matters (some courts are reluctant to distribute those for publication);
    o New and important legal issues v. mundane ones (some judges of the old school, even very well respected ones, do not want each and every one of their decisions distributed for publication).
    – In the past, although this practice is disappearing, a publisher could obtain through its contacts at a court some judgments that its competitors were not getting. However, rival publishers had their own contacts, through whom they could obtain other judgments that the first publisher was not getting.
    – Courts do not publish a list of their decisions, so it is not easy for anybody to know precisely whether they have received everything (court websites help a lot in this respect).

    However, improvements seem to be coming. The Canadian Judicial Council’s JTAC Committee is currently discussing recommended practices with regard to the distribution of judgments. An important goal of these new recommended practices would be that each court has known policies and processes for distributing judgments. If this committee’s work were to be fruitful, the end result could improve the situation and lead to a much clearer picture of what has been distributed by a court in a defined period of time. In the end, this will help in assessing comprehensiveness.

    Second, each publisher must develop some sort of workable definition of “comprehensiveness” for case law. I can tell you how we define comprehensiveness at CanLII.

    For a case law database, “comprehensive” means that within our continuous coverage (which is clearly stated) we have all the judgments which can be identified by one of these four approaches:
    – The court website (when applicable);
    – The neutral citation (some courts use a strict sequence of numbers);
    – The citations found in the Reflex set of 33 report series;
    – The citations found in CanLII documents.
    That is it. Furthermore, even though I’m not writing this to promote CanLII, I guess that our users appreciate CanLII’s transparency with regard to its holdings. Everything is out there to be assessed: from what date, how many judgments. One can even browse the whole database.
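
    To illustrate the neutral citation approach only: where a court numbers its decisions in a strict sequence, gaps in the holdings can be spotted mechanically. The following is a minimal sketch and not a description of CanLII’s actual tooling. The citation pattern (year, court code, sequence number), the function name and the sample holdings are assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumed pattern: year, court code, sequence number, e.g. "2008 ONCA 12".
NEUTRAL_CITATION = re.compile(r"^(\d{4})\s+([A-Z]+)\s+(\d+)$")


def find_sequence_gaps(citations):
    """Return {(year, court): [missing sequence numbers]} for the holdings given.

    Only gaps below the highest number held can be detected; decisions
    numbered above it are invisible to this check.
    """
    held = defaultdict(set)
    for citation in citations:
        match = NEUTRAL_CITATION.match(citation.strip())
        if match is None:
            continue  # not a neutral citation in the assumed format
        year, court, number = match.groups()
        held[(int(year), court)].add(int(number))

    gaps = {}
    for key, numbers in held.items():
        expected = set(range(1, max(numbers) + 1))
        missing = sorted(expected - numbers)
        if missing:
            gaps[key] = missing
    return gaps


# Invented sample holdings, for illustration only.
holdings = ["2008 ONCA 1", "2008 ONCA 2", "2008 ONCA 5", "2008 SCC 1", "2008 SCC 2"]
print(find_sequence_gaps(holdings))
```

    A gap found this way is only a lead to follow up: a number may have been skipped by the court, or assigned to a decision that was never released for publication.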

    Coming back to what a publisher can do, I want to add that other publishers have a supplementary way to check the comprehensiveness of their databases: they can check CanLII. And they do.

    In the old days, “comprehensiveness” was treated by some as a strategic advantage. This is a bit bizarre. I guess that a publisher’s job is to enrich what is received from the sources, to sort out the most important judgments or, as we do at CanLII, to make legal information easily accessible with good tools.

    Having three or four case law publishers trying to build comprehensive databases without the possibility of checking with each other is somewhat childish. This sort of situation motivated me when I set up the Canadian Citation Committee (CCC) 12 years ago. I wanted to create a forum where all publishers and important stakeholders could collaborate to improve legal publishing practices. All major publishers, except LexisNexis, which quit a couple of years ago, are participating in the committee. The CCC did the original work that is currently being discussed in the Canadian Judicial Council’s JTAC. Let’s hope that this sort of collaboration between all interested parties will lead to a better distribution of judgments.

    Happy holidays to everybody. I’m going back to my more festive, leisurely activities. I hope Gary will spare some time to do the same.


  2. Ruth Epstein, Vice President, Canada Law Book

    I want to address Gary Rodrigues’ comments dated December 29, 2008, where he said “Auditing databases is not a difficult task. It is merely time consuming. LexisNexis engaged in one such exercise when Canada Law Book announced that it would remove its criminal and labour arbitration databases from Quicklaw. By the time the proprietary databases were removed, LexisNexis had identified the missing cases in its own databases and replaced the content.” Gary’s claim contradicts LexisNexis’ own statements on its website, where it says that reported decisions back to 1970, and those decided prior to 1970 that have been cited after 1970, are included. It should also be noted that our research shows that LexisNexis has not replaced all of the cases which had been provided in Quicklaw by Canada Law Book.

    Happy New Year to all.


  3. Gary P. Rodrigues

    Ruth is correct in her comments on my statement regarding the scope of the LexisNexis audit and the enhancement of the content of its databases that took place after Canada Law Book decided to withdraw its content from LexisNexis.

    The audit and the subsequent data build were restricted to cases decided since 1970 and cases decided prior to 1970 that were considered in cases decided after 1970. Both the announcements and the statements of the scope of the databases indicate that point clearly. I should have been less “vague” and stated that LexisNexis replaced the cases that fell within the scope established for its caselaw databases.
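
    Stated as a rule, that scope can be written down as a simple test. This is only a sketch of the rule as described in this thread, not LexisNexis’ own definition. The `Case` record and its field names are hypothetical, and treating 1970 itself as falling inside the window for both parts of the rule is an assumption.

```python
from dataclasses import dataclass, field

CUTOFF_YEAR = 1970  # the cutoff named in the scope statement


@dataclass
class Case:
    name: str                      # hypothetical label for the example
    year_decided: int
    # Years of later cases that considered this one (hypothetical field).
    cited_in_years: list = field(default_factory=list)


def in_scope(case, cutoff=CUTOFF_YEAR):
    """True if the case falls within the stated scope of the database.

    A case is in scope if it was decided in or after the cutoff year, or
    if it was decided earlier but considered in a case decided in or
    after the cutoff year (boundary treatment is an assumption).
    """
    if case.year_decided >= cutoff:
        return True
    return any(year >= cutoff for year in case.cited_in_years)


# Invented sample cases, for illustration only.
sample = [
    Case("Recent case", 1985),
    Case("Old case, cited later", 1955, [1992]),
    Case("Old case, cited only before cutoff", 1955, [1960]),
    Case("Old case, never cited", 1955),
]
for case in sample:
    print(case.name, "->", in_scope(case))
```

    A rule expressed this precisely is what lets a user know whether a given older decision should be expected in the database at all.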

    Only Canada Law Book offers the complete collections of Canadian Criminal Cases and Labour Arbitration Cases.

    Thanks Ruth.

  4. I have read this thread with great interest, and would be very grateful if any of you could help me with a related issue. I am an LLM student at Osgoode Hall, and my research involves a statistical analysis of cases in which the Office of the Children’s Lawyer made a recommendation about a custody or access issue. My dissertation is about the extent to which courts follow these recommendations when they are made.

    I have identified all of the cases in the Quicklaw database since 2003 which meet these criteria. The challenge is figuring out whether or not these cases from Quicklaw are a representative sample of the population of cases in which one of these recommendations was considered in an Ontario court.

    The Quicklaw people were only able to tell me that they publish everything which they get from the “central distribution division of the Ontario Court.”

    Is anyone familiar with this “central distribution division?”

    Does anyone know of any specific bias in the Quicklaw database, e.g. do they report more or fewer cases from certain parts of the province?

    There’s a broader issue, which is of interest to anyone using caselaw databases for statistical/quantified analysis (as opposed to precedential value). How can one know whether a given database is a representative or a skewed sample, with regard to a given characteristic?

    Any help would be very welcome indeed!