Nomus: A New Canadian Caselaw Search Engine

Here’s a turn-up for the books: there’s a new entry in the Canadian legal search engine market. CanLII notwithstanding, Kent Mewhort, a McGill law student and experienced software engineer, has launched Nomus, a free search engine for Canadian legal decisions.

This is no Google-based amateur effort, but rather a serious tool running with at least one interesting algorithm and one valuable additional feature. I’ve had a small exchange of emails with Mr. Mewhort, and some of the material in this post comes from that.

First the scope: the database is drawn from publicly available, i.e. governmental, sites publishing court decisions and at the moment comprises only decisions from the following courts forward from the dates set out:

SCC (1876); FCA (1988); FC (1990); TCC (1996); ABQB (2003); ABCA (1998); ABPC (1998); BCCA (1990); BCSC (1990); NBCA (2008); NSCA (1995); NSSC (1999); NSPC (2000); ONCA (1999); PESC (2000); PECA (2009); QCCS (2001); QCCQ (2001); QCCA (1987); YKCA (2001); YKSC (2001); YKSM (2002)

I imagine that it will continue to expand the number of courts in time as they put their decisions online directly.

The principal notion behind Nomus is the ability to search in an unstructured way using terms or phrases, although search operators are indeed supported, including the helpful “term boost” that weights results in favour of a particular configuration of your search terms. Results, which can be filtered in various ways, are delivered according to the following relevancy considerations, as described in the FAQ:

Term frequency, a measurement of how often a search term is found in a particular document;
Term uniqueness, a measure of how specific or unique a search term is;
Term proximity, a measure of how close search terms are to one another in a particular document; and
Term precedential importance, a measure created from an analysis of the citation networks between cases. This measurement includes a consideration of how frequently a particular judgment has been cited in association with a search term, the authority of the citing courts, and how recently the citing judgments were delivered.
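To make the first three factors concrete, here is a minimal sketch of how frequency, uniqueness and proximity might be combined into a single score. It is not Nomus's implementation, which is not public in the material I have seen: the function names, the smoothing in the uniqueness measure and the `proximity_weight` parameter are all assumptions made for illustration.

```python
import itertools
import math

PUNCT = ".,;:()[]\"'"


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]


def term_frequency(term, tokens):
    """Share of the document's words that are the search term."""
    return tokens.count(term) / len(tokens) if tokens else 0.0


def term_uniqueness(term, corpus_tokens):
    """Smoothed inverse document frequency: rarer terms score higher."""
    containing = sum(1 for tokens in corpus_tokens if term in tokens)
    return math.log((1 + len(corpus_tokens)) / (1 + containing)) + 1.0


def term_proximity(terms, tokens):
    """Inverse of the average smallest gap between each pair of terms."""
    positions = {t: [i for i, w in enumerate(tokens) if w == t] for t in terms}
    pairs = list(itertools.combinations(terms, 2))
    if not pairs or any(not positions[t] for t in terms):
        return 0.0
    gaps = [
        min(abs(i - j) for i in positions[a] for j in positions[b])
        for a, b in pairs
    ]
    return 1.0 / (sum(gaps) / len(gaps))


def score(query, doc_tokens, corpus_tokens, proximity_weight=1.0):
    """Hypothetical combined score; the weighting is an assumption."""
    terms = sorted(set(tokenize(query)))
    lexical = sum(
        term_frequency(t, doc_tokens) * term_uniqueness(t, corpus_tokens)
        for t in terms
    )
    return lexical + proximity_weight * term_proximity(terms, doc_tokens)


docs = [
    "the admissibility of evidence under the charter",
    "evidence was heard and much later admissibility was argued at length",
    "a contract dispute about a lease",
]
corpus = [tokenize(d) for d in docs]
scores = [score("admissibility evidence", tokens, corpus) for tokens in corpus]
```

In this toy corpus the first document outranks the second because the two search terms make up more of its text and sit closer together; the third, which contains neither term, scores zero.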

The last of these factors, precedential importance, is the interesting one. As explained by Mr. Mewhort:

Let us consider the relevancy of R. v. Grant, 2009 SCC 32. In R. v. Wong, 2010 BCCA 160, the B.C. Court of Appeal mentions that “[a]s a result of the decisions in Grant and Harrison, trial courts have been directed to take a view of all relevant circumstances in making a decision about admissibility of evidence under s. 24(2) of the Charter.” This will boost Grant (and Harrison) on any search for ‘admissibility of evidence’ or for ‘Charter s. 24(2)’ because these terms are mentioned in a context that refers to Grant.
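The idea in that example can be sketched roughly as follows. This is an illustration under stated assumptions, not the Nomus algorithm: the `Citation` record, the `COURT_WEIGHT` table, the half-life decay and the substring matching of query terms are all hypothetical choices standing in for the three considerations the FAQ names (association with a search term, authority of the citing court, and recency of the citing judgment).

```python
from dataclasses import dataclass

# Hypothetical authority weights for citing courts (not from Nomus).
COURT_WEIGHT = {"SCC": 3.0, "BCCA": 2.0, "BCSC": 1.0}


@dataclass
class Citation:
    cited_case: str    # the judgment being cited
    citing_court: str  # court that delivered the citing judgment
    citing_year: int   # year of the citing judgment
    context: str       # text surrounding the citation


def precedential_boost(case, query_terms, citations, current_year, half_life=10.0):
    """Sum, over citations of `case`, of authority x recency x term overlap."""
    boost = 0.0
    for c in citations:
        if c.cited_case != case:
            continue
        context = c.context.lower()
        matched = sum(1 for t in query_terms if t.lower() in context)
        if matched == 0:
            continue  # cited, but not in association with the search terms
        authority = COURT_WEIGHT.get(c.citing_court, 1.0)
        age = max(0, current_year - c.citing_year)
        recency = 0.5 ** (age / half_life)  # assumed exponential decay
        boost += authority * recency * matched / len(query_terms)
    return boost


wong = Citation(
    cited_case="R. v. Grant, 2009 SCC 32",
    citing_court="BCCA",
    citing_year=2010,
    context=(
        "As a result of the decisions in Grant and Harrison, trial courts "
        "have been directed to take a view of all relevant circumstances in "
        "making a decision about admissibility of evidence under s. 24(2) "
        "of the Charter."
    ),
)

on_topic = precedential_boost(
    "R. v. Grant, 2009 SCC 32", ["admissibility", "evidence"], [wong], 2010
)
off_topic = precedential_boost(
    "R. v. Grant, 2009 SCC 32", ["negligence"], [wong], 2010
)
```

Here the passage from Wong boosts Grant for a search on "admissibility" and "evidence" but contributes nothing for a search on "negligence", since that term does not appear in the citing context.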

The interesting feature I mentioned at the outset is the “Case Box,” which allows you to drag representations of search results onto a graphic, where the information is stored until you decide to download, email or print the results.

I asked Mr. Mewhort why he felt that this was a worthwhile enterprise, given the existence of CanLII. Essentially, his view is that the Nomus algorithm gives better results for the way in which most people search, which, he believes, is with unstructured keywords; and he points to the citation-in-association feature just mentioned:

With the primary goal of result relevancy, the Nomus search engine uses several sophisticated factors in search relevancy calculations that, to my knowledge, are not currently used by other legal search engines. Foremost, the search engine analyzes how often a particular judgment is cited in association with a particular topic being searched for.

For me this is still a bit of a head scratcher: the engineering is highly sophisticated but the coverage is so spotty as to make it of small use in many cases; certainly someone in Ontario, for example, is unlikely to get much joy from it, given the absence of trial court judgments. I see this as akin to a testbed, where algorithms can be developed and refined that might prove useful to CanLII, or, I suppose, the commercial databases. Which raises another matter, that of money. As the last line of the FAQ says, “For the time being, Nomus is entirely free.” Mewhort is seeking donations to support this service, a difficult quest at any time and no joke in the present climate.

I’ve not run comparison searches with Nomus and CanLII — but I’m sure that Slaw readers will do that and report to us.

Comments

  1. I agree that the spottiness of the database will detract from the coolness of the tool. I believe the index now contains about 394,000 cases (I got this figure by clicking “similar cases”, which seems to return ALL cases sorted by degree of similarity). Contrast that figure with CanLII’s 784,731. Obviously, Lexis and Westlaw are larger still.

    Hopefully this database will grow, because it has the potential to be really, really useful.

  2. CanLII’s search function is terrible; it would indeed be great if they could adapt some of this guy’s work in the area.

  3. “third year”, could you give some clarification on how/why CanLII’s “search function is terrible”? I’m sure CanLII is concerned about users’ opinions.

  4. I’m not sure what “third year” is talking about – I love the CanLII interface. It is clean, simple, and easy to use. Sure, everything could use improvement as time goes on, but I think CanLII really has found a great way to present information to people.

  5. Patrick Fawcett

    Sometimes the phrase “blank is terrible” should really be read as “I don’t like blank”. As an old systems librarian who has designed search engines in the deep mists of time (known also to the Elves as the early 1980s), I’ve been very impressed by CanLII’s functionality. It’s not perfect, but it combines an impressive array of features in a very simplistic frontend. If someone thinks that CanLII is terrible, I’d be interested in their reasons.

  6. I like CanLII but third year is right… the search function is not that great… I use it constantly and I have to input data several times, in various ways before I can find the results I am looking for unless I know the exact reference to the case.

    I am in the process of trying Nomus now and so far so good… It is not amazing but when doing research, as many free tools as possible… CanLII, Nomus… it’s great!

  7. I agree a bit about the search function. It’s frustrating that when you search for Grant on CanLii it doesn’t even show up in the first ten results. I’ve yet to figure how it sorts its search results, but it’s neither by date (which would be good) nor by importance (which would also be good). Just for kicks I tried it on Nomus, got the SCC decision first, and a quick link to 46 cases citing it. Then again CanLii had 120 cases citing it.

    Bottom line: Potentially better search function, but less data. Also not so on board with the layout, a bit cluttered and not clean enough.

  8. I ran my standard search for new e-discovery cases on Nomus and came back with different results than the ones from CANLII. When sorted by date, the “newest” case on Nomus really had nothing that related to the terms I had searched on. With CANLII the terms are highlighted in the text as well, so they’re easy to find.

    “Meh” about the Nomus interface – I like the extracted text, but not enough to make it my primary source. I’ll probably use it as a backup.

  9. Benjamin, if you search simply for [grant] in CanLII in the case name area you get a lot of grant v grant, because, I suppose, twice is more significant than once. What would help here is the ability to filter by court hierarchy. A substitute for that in many cases would be the filter for “most cited”; clicking that brings up your SCC case to the top.

    If you’ve “yet to figure out how it sorts its search results” I have to say you can’t have figured out very hard: right above your search results and to the right is an underlined string: Sort: Relevance | Decision Date | Most Cited

  10. Nomus is interesting. I find it very Google-ish and perhaps that is a benefit to some users. If you want a search engine to make a best guess on what you want without necessarily giving it a lot of data to work with, you may prefer the way Nomus performs.

    Personally, I like the Advanced Search options that CanLII has perfected.

    I worry that technology “improvements” encourage searchers to be less thoughtful about what they want. No one would have ever used Classic Quicklaw in a firm and just put in “Grant” as a general search in CJ. The cost would have been outrageous and there would have been plenty of reminders (like the max number of hits) to be a better searcher.

  11. And the mainstream media woke up to this story this morning – see the Montreal Gazette, which suggests the site’s getting 200 hits per day. Slightly less than CANLII I would suspect.

  12. Its search engine algorithm seems to have some flaws, at least for boolean “and” searches. Flaws like in being broken and not working, so apparently producing the same results as the OR search.

    For example, a boolean search on resurfice & causation pulls up cases that don’t have any version of “resurfice” at all. It shouldn’t be because the algorithm is finding alternative spellings of “resurfice” – i.e., “resurface” – because that term isn’t in the cases either, at least in the obviously wrong hits.

    resurfice causation as well as resurfice AND causation also produce the same results and number of hits 3783.

    It also produces 3764 hits for cheifetz & causation. Not bloody likely, or correct, including the SCC’s Resurfice v. Hanke as the first one. Now, I know as a fact I’m not expressly mentioned in the reasons. For what it’s worth, the results are the same for the cheifetz causation search.

  13. David, thanks for pointing out this admittedly major error with the & operator! The problem was that the FAQ incorrectly stated that the ‘&’ could be used, whereas in fact only ” AND “, as well as ‘+’, were supported.

    Given that the ampersand is a familiar operator for many, I’ve now updated the engine to fully support it as an AND operator.

  14. It’s still flawed, but not as badly. Note results below.

    Search results for ‘cheifetz AND causation’: Showing 1 to 10 of 19 found
    1. Resurfice Corp. v. Hanke, 2007 SCC 7
    2. Resurfice Corp. c. Hanke, 2007 CSC 7
    3. Miller v. Budzinski et al, 2004 BCSC 1730
    4. Walker Estate v. York Finch G…, 2001 SCC 23
    5. Walker, Succession c. York Fi…, 2001 CSC 23
    6. Bohun v. Segal, 2008 BCCA 23
    7. Farrant v. Laktin, 2008 BCSC 234
    8. Hutchings v. Dow, 2007 BCCA 148
    9. B.P.B. v. M.M.B., 2009 BCCA 365
    10. Jackson v. Kelowna General Ho…, 2007 BCCA 129
    11. Condominium Corporation No. 9…, 2009 ABQB 493
    12. Cempel v. Harrison Hot Spring…, 1997 Nomus 726 (BCCA)
    13. Aberdeen v. Township of Langl…, 2007 BCSC 993
    14. Misko v. John Doe, 2007 ONCA 660
    15. Wilson v. Bobbie, 2006 ABQB 22
    16. Cragg v. Tone et al., 2006 BCSC 1020
    17. Ingles v. Tutkaluk Constructi…, 2000 SCC 12
    18. Gravel v. City of St-Léonard, 1977 Nomus 94 (SCC)
    19. Gravel c. Cité de St-Léonard, 1977 Nomus 94 (CSC)

    Search results for ‘cheifetz AND resurfice’: Showing 1 to 10 of 10 found
    1. Resurfice Corp. v. Hanke, 2007 SCC 7
    2. Resurfice Corp. c. Hanke, 2007 CSC 7
    3. Walker Estate v. York Finch G…, 2001 SCC 23
    4. Walker, Succession c. York Fi…, 2001 CSC 23
    5. Bohun v. Segal, 2008 BCCA 23
    6. Condominium Corporation No. 9…, 2009 ABQB 493
    7. Hutchings v. Dow, 2007 BCCA 148
    8. Farrant v. Laktin, 2008 BCSC 234
    9. Jackson v. Kelowna General Ho…, 2007 BCCA 129
    10. Misko v. John Doe, 2007 ONCA 660

    Again, you won’t find my surname in most of those cases. You won’t find Resurfice in others. You won’t find either term in at least two cases: Cragg v Tone and Ingles v Tutkaluk.

  15. Okay, thanks for the detailed info, David. The site is still in beta and there are a few glitches still cropping up here and there that remain to be worked out; such feedback is much appreciated so I can address these issues.

    I can report that this problem is now fixed, though I will be continuing to perform deeper testing on all search operators. Searching for either ‘cheifetz AND causation’ or ‘cheifetz & causation’ now returns eight results. I checked that all contain both your name and ‘causation’.

  16. Unfortunately, it appears that NOMUS is no more.