Following CanLII’s multiple announcements in the last weeks and months, we wouldn’t blame anybody for failing to see the big picture from these individual pieces. I thought I would use this column to recapitulate and give some perspective.
Individually, the steps we took in the last few years can be seen as merely incremental, but the overall result is that CanLII became a radically different beast, for the better of course. This post strings together these individual announcements with the objective of presenting a clearer picture of what CanLII has become, and to show its increased potential.
Let’s start with the publication of all decisions in the Dominion Law Reports (DLR) that had been cited in the CanLII collection prior to the start of this project in the fall of 2016. Choosing content on the basis of citation frequency is a new strategy for the development of the historical collection at CanLII. While we would like to be able to scan and publish all the decisions ever reported in Canada (or to aim for a specific date range), there are always individual “per document” costs which make such a project prohibitively expensive. Aiming only for those decisions that have been referred to in the last 15+ years in Canada allowed us to maximize the return on investment for our community of users. We would argue that date ranges are now nothing more than artificial standards that are less and less correlated to the actual usefulness of a collection anyway. The DLRs were the second biggest collection (in terms of the number of times it’s cited in Canada) after the SCRs (which we already had), so to be able to say that we have “all DLRs that matter” is an important development.
The first batch of DLRs (those decisions since 1980) was published in our standard HTML model. The second batch (older than 1980) coincided with the deployment of our new way of displaying historical decisions (in PDF). If you have never stumbled on a decision in this format, you can see an example here. This is perhaps the most important news associated with this project. This publication model is considerably cheaper for historical documents than our default model that goes all the way to converting the documents to “standard” HTML. Yet, the loss of functionality is minimal in comparison to our (still preferred for current case law) HTML model. That is to say that the quality of the OCR is very high and allows highlights and text selection, the loading speed of the documents is almost the same as for HTML documents thanks to a software trick we call “Spiffy PDFs”, and on top of that we still manage to add links to cited primary law (with Lexum’s Reflex software). CanLII has always aimed at having a collection of historical materials that favourably compares with any other offering, but scanning projects were prohibitively expensive. This development, combined with the project management and editorial expertise developed over the years, gives us the means to finally accomplish our long-time objectives with respect to case law coverage.
We’ll continue to leverage this technology to make successive incremental additions to the collection through scanning projects for historical case law, but we should note that this new PDF rendering technology is already at work putting other types of content online on CanLII and elsewhere. For instance, Lexum worked with the Supreme Court to scan and publish all of its reports, which are now available cover to cover in that format. It’s also the type of publication model we implemented for the publication of law reviews and other content in our commentary section (by integrating the “Spiffy PDFs” into Lexum’s Qweri software) since it allows us to take documents whose shape or form may vary greatly from one source to the other while staying true to their original formatting.
In the spring of 2017, we deployed Lexbox on CanLII. This is an important shift for CanLII from a “consult and leave / search and forget” website (if that’s a thing) to a true research environment that continues to provide users with relevant information when they leave the site (through alerts), allows them to save searches and documents, and allows them to access a trail of their recent activity on CanLII so that they can pause their research at any time and pick it up later right where they left off.
In February of this year, Lexum deployed Solex, the new version of the CanLII search engine that is more scalable, faster and more flexible. Solex will, for instance, allow CanLII to leverage cloud computing to continue to provide the fastest search experience in Canada. It also gives us the means to use AI, a field in which Lexum is increasingly active, to further improve our tools.
A bit before that, Lexum’s Qweri software was added to CanLII in order to present texts in our commentary section in a more elegant, dynamic and feature-rich interface. Some of Qweri’s magic is not obvious to our users but it’s as important as what users can see: it helps convert Word documents (and we all know how slow and difficult to work with long Word documents can get) into a web-ready HTML format with minimal manual intervention. We aim to soon allow authors to contribute to works that have never been published elsewhere, and assuming these authors will submit texts in Word format, Qweri is what will make the process of converting the Word files sent to us into an elegant web interface quick and easy.
Last March, we took an important step towards fulfilling a foundational goal of CanLII: to provide a robust collection of commentary on the law. In addition to CanLII Connects and the small collection of books we started publishing in 2012 (most importantly the eText on Wrongful Dismissal and Employment Law generously provided by Lancaster House), we now have law reviews, reports and newsletters. By the time this post is live, we’ll have more than a thousand documents in the commentary section of the site. Perhaps most importantly, thanks to the above technologies and to the work of Lexum and CanLII staff at developing the right editorial processes, databases and metadata fields (etc.) for this new collection, we now have the ability to take on much more content without incurring a correlated spike in the costs of operating the service.
Step by step, we have created a version of CanLII that has the means to achieve its ambitions. We published a lot of new content in the last few years, but most importantly we made sure to simultaneously develop the tools, and to fine-tune already sophisticated editorial processes, to allow us to sustainably grow our collections while keeping manual interventions to a minimum (and therefore keeping the costs in check). The only variable that will now dictate how far we’ll be able to go is the legal community’s willingness to embrace the publication of commentary on an open national platform.
As with primary law, it’s our view that the default model for the majority of commentary should be open, or at least that there are no longer any reasons not to have a complete and authoritative open equivalent to any piece of commentary that’s relied upon by practitioners today.
Finally, speaking of sustainability, I would note that the number of times I refer to Lexum or its technology or know-how here (“Spiffy PDFs”, Qweri, Lexbox, Solex, sophisticated editorial processes, etc.) speaks to the importance of strengthening the relationship with the team that makes all of this happen. Many of you already know that we did that by acquiring Lexum in February of this year, which is probably the biggest and most exciting news of all. I’m spending a considerable amount of time with the Lexum team and continue to be amazed by the breadth and depth of talent in this team and the new ideas we’re already generating by being even more directly in contact with them than before. I tried my best to avoid finishing this post with a cliché, but truly the best is yet to come.