Search Engine Results as Evidence

by John Gregory

Comments

Mike

January 16th, 2015 at 12:40 pm

From paragraph 54 of the judgment:

“In a second report, Mr. Trumper reviewed Mr. Joffe’s report. He describes the report as meaningless, inaccurate and misleading, based on false assumptions and incorrect data. He says that Mr. Joffe has misused statistical models and has failed to apply logical reasoning. Among other criticisms, Mr. Trumper points out that the number of initial “hits” identified on a Google search is not reflective of the number of times the results actually appear on web pages and still less reflective of the underlying content of the particular pages. Moreover, the fact that there are a number of “hits” in response to the query “Canon Digital Camera Error” does not tell one anything about the underlying truth of the assertions made on the web pages.”

What should be done: bring better evidence.

Ken Chasse

January 17th, 2015 at 12:33 pm

Keyword searching by search engines is unreliable. See this article: Victoria L. Lemieux and Jason R. Baron, “Overcoming the Digital Tsunami in e-Discovery: is Visual Analysis the Answer?” (2012), 9 Canadian Journal of Law and Technology 33 at 35:

“Indeed, we know that current e-discovery search methods are not sufficient to overcome the digital tsunami: the most common methods currently used in e-discovery – keyword searching and linear review – are increasingly ineffective for the massive volumes of data that must be sifted through for each case. There have been a number of studies highlighting the limitations of existing search and retrieval techniques. In one study lawyers overestimated the effectiveness of their keyword-based search strategies by as much as 55%. Dabney (1986), Bing (1987) and Schweighofer (1999) all provide in-depth reviews of the limitations of full text searching for legal documentation. More recently, a multi-year study evaluating the efficacy of various search methods known as the “TREC Legal Track” demonstrated that traditional Boolean search methods failed to find up to 78% of relevant documents that other automated search methods accounted for (Tomlinson et al, 2008). … .

All of these prior reports and studies are in line with results of an online survey of legal and technical professionals in the UK and two roundtable discussions on e-discovery conducted by PwC [PricewaterhouseCoopers] indicating that keyword searching is increasingly untenable. Panelists noted the difficulties of choosing key words, reporting that ‘[e]ven if you have a brilliant, absolutely focussed search, you are still going to end up with too many documents to review and within those there will still be a very large proportion of irrelevant material.’ Data volumes are quickly becoming such that even with the best keyword search terms and an army of reviewers, it could still take months or years to sift through all the data and there would still be no guarantee of satisfactory results. New approaches are therefore very much needed.” [footnotes omitted]

Therefore, is the efficacy of “predictive coding” and other “technology assisted review” devices, used to reduce the cost of the “review” stage of electronic discovery, undermined by their reliance on keyword searching strategies?

“Predictive coding” is a document review technology that allows computers to predict particular document classifications (such as “responsive” or “privileged”) based upon coding decisions made by those knowledgeable as to the subject matter. In the context of electronic discovery, this technology can find key documents faster and with fewer human reviewers, thereby saving much of the time otherwise spent reviewing documents for relevance and potential privilege.
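To make the idea concrete, here is a minimal sketch of the general approach: a reviewer codes a small seed set, and a classifier learns from those decisions to predict codes for unreviewed documents. It uses a simple Naive Bayes word-count model, which is an illustrative stand-in chosen for brevity and not a description of how any commercial predictive coding product works. The function names and the four sample documents are invented for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Split text into lowercase words (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(coded_docs):
    """Learn word statistics from (text, label) pairs coded by a human reviewer."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of coded documents
    vocab = set()
    for text, label in coded_docs:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest Naive Bayes log-score for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in the seed set
            # Add-one smoothing so an unseen word/label pair is not fatal.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical seed set: invented documents with invented reviewer codes.
seed_set = [
    ("camera lens error reported by customer", "responsive"),
    ("customer complaint about camera error message", "responsive"),
    ("lunch menu for the office party", "not responsive"),
    ("parking schedule for the office", "not responsive"),
]

model = train(seed_set)
print(predict(model, "error message on the camera"))
print(predict(model, "office lunch schedule"))
```

Note that the reviewer supplies coded examples, not search terms: the model weighs every word it saw in the seed set, so its predictions depend on how good and how representative the human coding decisions are.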

A detailed description of the use of predictive coding devices is found in Dynamo Holdings Ltd. Partnership v. Commissioner of Internal Revenue (U.S. Tax Court, Nos. 2685-11, 8393-12, Sept. 17, 2014); online: (click “available here” at the bottom of the page). It is also mentioned in L’Abbé v. Allen-Vanguard Corp. 2011 ONSC 7575, [2011] O.J. No. 5982, at para. 23: “Various electronic discovery solutions are available including software solutions such as predictive coding and auditing procedures such as sampling.” But whether predictive coding can make common, ordinary-sized litigation affordable to a majority of the population is yet to be proved. And given the substantial criticism of keyword searching, is predictive coding’s efficacy undermined by relying upon keyword search strategies?
