The 2013 AALS Annual Meeting featured a panel discussion on Understanding Search Engine Algorithms: Can We Effectively Teach Research Without Them? From what I gathered from the tweets, the panel, whose members included Ian Koenig (Chief Architect for LexisNexis) and Ed Walters (CEO of Fastcase), gave a peek into their “black box” search functionality, something that has been a subject of vigorous debate among researchers since WestlawNext + WestSearch was introduced three years ago. I think Sarah Glassmeyer (Director of Content Development for CALI.org) summarized many researchers’ feelings best when she tweeted:
Google (and google type search interfaces) infantilize people and take away user power to control search results.
This notion of controlling search results is interesting because it presupposes the answer to any query will either be yes or no. But as lawyers we know that answers to a law-related query are rarely binary, which is why we continue to run controlled search after controlled search until our gut tells us, “I think we got it” (or we know the client won’t pay for any more research time). In his forthcoming paper on quantitative legal prediction, Dan Katz states the problem nicely:
[W]hen individuals engage in legal reasoning they engage in a high level, high dimensional search of the space of possible reference cases. In that search, similarity and dissimilarity are the drivers. Heuristics are used to define the stopping conditions. The science of legal search (legal information retrieval) is driven in substantial part by a notion of similarity. Humans do not (cannot) exhaust the space and this is just one reason why humans + machines > humans or machines. Legal search intermediary companies such as Google, Lexis, Westlaw, etc. aid lawyers by allowing them to make better sense of the sea of potentially relevant legal information. The problem with today’s legal search is that the body of results is typically substantial and thus the human (lawyer) must still engage in substantial filtering of the results. Much of the weight is put to the human reasoner to determine which cases are potentially useful or harmful to their particular position.
Katz, Quantitative Legal Prediction – or – How I Learned to Stop Worrying and Start Preparing for the Data Driven Future of the Legal Services Industry, 63 Emory L. J. ___ (2013 Forthcoming) (emphasis added).
As lawyers, our arguments depend on similarity, usually to a seminal or starter case. But as Katz states, obtaining these similar (and hopefully “on all fours”) cases is “actually fairly difficult because most cases share some level of similarity with other cases.”
It’s a similarity problem that WestSearch was attempting to solve, as suggested by the numerous posts written about it and most recently by U.S. Patent No. 8,321,425 (Nov. 27, 2012), titled Information-Retrieval Systems, Methods, and Software with Concept-Based Searching and Ranking. The patent describes the problem this way:
One problem identified by the present inventors concerns operation of typical search engines, which require queries and documents to contain matching words. This is problematic for at least three reasons. First, search results may include documents that contain the query term but are irrelevant because the user intended a different sense (or meaning) of that query term that term matching fails to distinguish. This ultimately leaves the user to manually filter through irrelevant results in search for the most relevant documents.
Second, reliance on matching query terms to document terms can also result in search results that omit conceptually relevant documents because they do not contain the exact query terms entered by the user. Retrieving these relevant documents using a traditional search engine requires the user to appreciate the variability of word choices for a given concept and construct better queries. Alternatively, users may simply do without these valuable documents.
And third, traditional keyword search engines score and rank the relevance of documents based on the presence of query terms in those documents. This means that some documents with matching query terms and with non-matching but conceptually relevant terms may be ranked lower than desirable given their actual conceptual relevance to a given query. These erroneous lower rankings may force the user to wade through lesser relevant documents on the way to the more relevant documents or to overlook some of these documents completely.
Accordingly, the inventors have identified a need to further improve how information-retrieval systems process user queries.
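To make the patent's first two complaints concrete, here is a minimal Python sketch contrasting literal term matching with matching on concepts. It is a toy under stated assumptions, not the method the patent claims and not how WestSearch works: the concept map, the three documents, and every function name are invented for illustration.

```python
# Toy illustration only. CONCEPTS and DOCS are invented, and this is
# not the method claimed in the patent.
CONCEPTS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "crash": "collision",
    "collision": "collision",
    "accident": "collision",
}

DOCS = {
    "doc_a": "the automobile accident occurred at night",
    "doc_b": "the car dealer breached the sales contract",
    "doc_c": "a stock market crash harmed investors",
}


def terms(text):
    """Return the set of lowercase words in the text."""
    return set(text.lower().split())


def concepts(text):
    """Map each word to its concept label; unmapped words stand for themselves."""
    return {CONCEPTS.get(word, word) for word in terms(text)}


def keyword_score(query, doc):
    """Count the literal words shared by query and document."""
    return len(terms(query) & terms(doc))


def concept_score(query, doc):
    """Count the concepts shared by query and document."""
    return len(concepts(query) & concepts(doc))


def rank(query, score):
    """Return names of documents with a nonzero score, best first."""
    scored = [(score(query, text), name) for name, text in DOCS.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for value, name in ordered if value > 0]


print(rank("car crash", keyword_score))  # literal matching
print(rank("car crash", concept_score))  # concept matching
```

For the query "car crash," literal matching returns the car dealer and the stock market crash and misses the automobile accident entirely, while concept matching puts the accident first. The stock market case still appears in the concept ranking, because a flat word-to-concept map does nothing about word sense, which is the first problem the patent identifies.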
As Thomas Smith observed in Web of Law, the U.S. citation network is highly skewed, with much of the legal authority concentrated in very few cases and the majority of it “dead.” Smith recognized that to truly explore the case web, we needed to move beyond terms towards concepts, which is why he launched PreCYdent. Since then, concept-based searching (and now ranking) has only gotten more robust, and from where I sit, concept-based search and ranking, as expressed in its various equations, appears to be the best model for facilitating discovery. And as the feedback economy tells us, the more we use it, the better it will get.
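For readers who want to see what "skewed" means in practice, the following Python sketch measures how concentrated citations are in a tiny, made-up citation list. The data is hypothetical, not drawn from Smith's study, and treating a never-cited case as "dead" is my simplifying assumption rather than his definition.

```python
from collections import Counter

# Hypothetical (citing_case, cited_case) pairs; not real citation data.
CITATIONS = [
    ("c1", "landmark"), ("c2", "landmark"), ("c3", "landmark"),
    ("c4", "landmark"), ("c5", "landmark"), ("c6", "landmark"),
    ("c3", "c1"), ("c5", "c2"),
]
CASES = {"landmark", "c1", "c2", "c3", "c4", "c5", "c6"}


def in_degrees(citations, cases):
    """Count how many times each case is cited (zero if never)."""
    counts = Counter(cited for _, cited in citations)
    return {case: counts.get(case, 0) for case in cases}


def top_share(degrees, k):
    """Fraction of all citations received by the k most-cited cases."""
    total = sum(degrees.values())
    top = sorted(degrees.values(), reverse=True)[:k]
    return sum(top) / total if total else 0.0


def uncited(degrees):
    """Cases that no other case cites (a rough proxy for 'dead')."""
    return sorted(case for case, n in degrees.items() if n == 0)


degrees = in_degrees(CITATIONS, CASES)
print(top_share(degrees, 1))  # share of citations going to the single top case
print(uncited(degrees))       # cases nobody cites
```

In this toy network one case collects three quarters of all citations and four of the seven cases are never cited at all, which is the shape of distribution Smith describes, in miniature.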
Now, the argument that search algorithms are attempting to supplant thinking (hence the cry of infantilization) is not entirely lost on me. It is a hard charge to refute when we start talking about how we will achieve the mechanical replication of a lawyer’s analogical reasoning. On the other hand, the case web is like a rising sea, and if we don’t stop hand-wringing over algorithms and longing for the good old days, we will drown. And to avoid drowning, we need technology to make good on its promise to deliver guided-search tools. More importantly, we need a transformative product, one that moves beyond the box and expresses itself in ways that Katz suggests:
People who cite Case X also cite Case Y.
Lawyers who argue this principle also typically argue this principle.
Given the mixture of argument and content in your brief, have you considered this argument and content which is largely analogous to your argument and content?
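The first and third of these suggestions can be approximated with simple co-citation counting. The sketch below is a minimal Python illustration, assuming we already have the set of cases each brief cites. The briefs, the case names, and the functions `also_cited` and `suggest` are all hypothetical; neither Katz nor any vendor describes this as their implementation.

```python
from collections import Counter

# Hypothetical briefs and the cases each one cites.
BRIEFS = {
    "brief_1": {"Case X", "Case Y", "Case Z"},
    "brief_2": {"Case X", "Case Y"},
    "brief_3": {"Case X", "Case W"},
    "brief_4": {"Case Y", "Case Z"},
}


def also_cited(case, briefs):
    """'People who cite Case X also cite...': co-cited cases, most frequent first."""
    counts = Counter()
    for cited in briefs.values():
        if case in cited:
            counts.update(cited - {case})
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


def suggest(draft_cites, briefs):
    """Suggest cases often co-cited with a draft's citations but missing from it."""
    counts = Counter()
    for case in draft_cites:
        for other, n in also_cited(case, briefs):
            if other not in draft_cites:
                counts[other] += n
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


print(also_cited("Case X", BRIEFS))
print(suggest({"Case X", "Case Z"}, BRIEFS))
```

Raw counts like these favor cases that are cited everywhere, so a real tool would presumably normalize for overall popularity and weigh the surrounding argument as well as the citation itself. The point is only that the "also cite" behavior Katz describes starts from data that citators already hold.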
We need a product that says, “Why yes, Dave, I can do that.”