Predictive Coding, Discovery, and the Conservation of Quantum Information

In classical physics, energy can neither be created nor destroyed. In quantum physics, information can neither be created nor destroyed: if information is missing from one system, it must be in some other system. Therefore, one could reconstruct a headache if one could get sufficient information as to exactly how the aspirin worked on the nerves and other parts of the brain and body. Such a procedure would enable problems having many variables to be solved much faster.[i]

Electronic “predictive coding” devices that automate the “reading” of thousands of records for production in electronic discovery present such problems. The words and phrases in records require thousands, even millions, of choices to be made as to issues of law and fact concerning relevance and privilege. “Predictive coding” is a document review technology that allows computers to predict particular document classifications (such as “responsive” or “privileged”) based upon coding decisions made by those knowledgeable as to the subject matter. In the context of electronic discovery, this technology can find key documents faster and with fewer human reviewers, thereby saving much of the time needed to conduct document review for relevant and potentially privileged documents. A detailed description of the use of predictive coding devices is found in Dynamo Holdings Ltd. Partnership v. Commissioner of Internal Revenue (U.S. Tax Court, Sept. 17, 2014), a case that rejects the objection that predictive coding is “an unproven technology.”[ii] It quotes the following paragraphs from a recent article:[iii]

Unlike manual review, where the review is done by the most junior staff, computer-assisted coding involves a senior partner (or team) who review and code a “seed set” of documents. The computer identifies properties of those documents that it uses to code other documents. As the senior reviewer continues to code more sample documents, the computer predicts the reviewer’s coding. (Or, the computer codes some documents and asks the senior reviewer for feedback.)

When the system’s predictions and the reviewer’s coding sufficiently coincide, the system has learned enough to make confident predictions for the remaining documents. Typically, the senior lawyer (or team) needs to review only a few thousand documents to train the computer.

Some systems produce a simple yes/no as to relevance, while others give a relevance score (say, on a 0 to 100 basis) that counsel can use to prioritize review. For example, a score above 50 may produce 97% of the relevant documents, but constitutes only 20% of the entire document set.

Counsel may decide, after sampling and quality control tests, that documents with a score of below 15 are so highly likely to be irrelevant that no further human review is necessary. Counsel can also decide the cost-benefit of manual review of the documents with scores of 15-50.

But note this phrase in the second paragraph above: “review only a few thousand documents to train the computer.” With its present capability, predictive coding will, at best, make big records-dependent litigation somewhat less expensive; it will not reduce the cost of litigation of lesser size, which is most cases. And there has been some criticism of the accuracy of keyword searching strategies such as are used in predictive coding.[iv] The cost of having an experienced lawyer feed in documents and review the results so as to train the device to a sufficient level of accuracy will be affordable for none but the bigger cases. Therefore, the underlying strategy should be changed, instead of waiting for predictive coding to be made sufficiently economical by improved applications of the law of the conservation of quantum information.[v]
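The score-based triage described in the quoted article can be illustrated with a short sketch. This is a hypothetical illustration only (the thresholds 15 and 50 and the field names are taken from the quoted example, not from any particular predictive coding product): documents scored 0-100 for relevance are routed into three queues, and counsel decides how much human review each queue warrants.

```python
# A sketch (hypothetical thresholds and field names) of score-based triage:
# documents scored 0-100 by a predictive coding system are routed into
# three review queues, as in the quoted example.

def triage(documents, low=15, high=50):
    """Split scored documents into review queues by relevance score."""
    queues = {"no_review": [], "cost_benefit": [], "priority_review": []}
    for doc in documents:
        score = doc["score"]
        if score < low:
            queues["no_review"].append(doc)        # highly likely irrelevant
        elif score <= high:
            queues["cost_benefit"].append(doc)     # manual review optional
        else:
            queues["priority_review"].append(doc)  # likely relevant
    return queues

docs = [{"id": 1, "score": 8}, {"id": 2, "score": 33}, {"id": 3, "score": 72}]
queues = triage(docs)
print({name: [d["id"] for d in q] for name, q in queues.items()})
```

The sampling and quality-control tests mentioned in the quoted article are what justify choosing the two cut-off scores in the first place.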

Clients should index all of their records as they create them or receive them from other sources, just as they sort their financial information into their financial records. For all records and information, “front-end indexing” is a far better way of sorting than “back-end searching and reading.” It enables the client’s lawyer to use the client’s index to both search and review records in the client’s records system as a combined, single job.[vi] And if there are very large volumes of records to be indexed by the client, predictive coding can automate such work. But like the keeping of financial records, such “sorting” is best done as an ongoing, everyday process. Clients know their technology, and therefore the technical terms to be used for indexing their business records. Such indexing and sorting into types of records and information facilitates accessing information continuously for daily business purposes, as well as for litigation, auditing, and other investigations.
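The “front-end indexing” idea can be sketched in a few lines. This is a hypothetical design, not a real records management product: each record is entered into an inverted index under the client’s own technical terms as it is created or received, so that later search is a direct lookup rather than a re-reading of every record.

```python
from collections import defaultdict

# A minimal sketch (hypothetical design) of front-end indexing: records are
# indexed under the client's own terms at creation time, so later search
# is a lookup ("front end") rather than a re-reading ("back end").

class RecordsIndex:
    def __init__(self):
        self._index = defaultdict(set)   # term -> set of record ids

    def add_record(self, record_id, terms):
        """Index a record as it is created or received."""
        for term in terms:
            self._index[term.lower()].add(record_id)

    def search(self, term):
        """Retrieval is a direct lookup, not a reading of the records."""
        return sorted(self._index.get(term.lower(), set()))

idx = RecordsIndex()
idx.add_record("INV-2014-001", ["invoice", "acme", "widgets"])
idx.add_record("PO-2014-007", ["purchase order", "acme"])
print(idx.search("acme"))   # -> ['INV-2014-001', 'PO-2014-007']
```

The point of the sketch is the division of labour: the indexing cost is paid incrementally, by the people most familiar with the records, instead of all at once by reviewers who are not.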

The National Standard of Canada, Electronic Records as Documentary Evidence imposes the following indexing requirements:[vii]

6.5.1 General

Indexing is a vital part of storing and retrieving information on an RMS [records management system] program. Indexing, which can be automated or manual, shall include the following functional requirements:

a) the specification of the indexing methodology and scheme used;
b) type and structure of indexing used, including the primary index element as well as all additional levels of indexing;
c) methods for performing quality control of indexing;
d) procedures in place to amend inaccurate index data;
e) where an index entry references deleted or expunged information, the index shall reflect the deleted or expunged status; and
f) procedures for performing quality assurance of the indexing.

6.5.2 Index retention, rebuilding and recovery

Index data shall be kept for the retention period of the SRI [set of recorded information] to which it relates. The procedures for rebuilding an index, changing an index structure, and recovering a damaged or faulty index shall be authorized and documented, as well as all results of such events.
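A minimal sketch (with hypothetical field names, not drawn from the standard itself) shows the shape an index entry takes under requirements (d) and (e) of s. 6.5.1 above: inaccurate index data can be amended with an audit trail, and an entry that references deleted or expunged information reflects that status rather than simply disappearing, since under s. 6.5.2 the index data must be kept for the retention period of the records it relates to.

```python
from dataclasses import dataclass, field

# A sketch (hypothetical field names) of an index entry shaped by
# s. 6.5.1 (d) and (e) above: amendments are logged, and deleted or
# expunged status is recorded rather than the entry being removed.

@dataclass
class IndexEntry:
    record_id: str
    terms: list
    status: str = "active"            # "active", "deleted", or "expunged"
    amendments: list = field(default_factory=list)

    def amend(self, old_term, new_term, authorized_by):
        """Requirement (d): amend inaccurate index data, keeping a log."""
        self.terms = [new_term if t == old_term else t for t in self.terms]
        self.amendments.append((old_term, new_term, authorized_by))

    def mark_deleted(self):
        """Requirement (e): the entry reflects deleted status but is kept
        for the retention period of the records it indexes (s. 6.5.2)."""
        self.status = "deleted"

entry = IndexEntry("INV-2014-001", ["invoce", "acme"])
entry.amend("invoce", "invoice", authorized_by="records manager")
entry.mark_deleted()
print(entry.terms, entry.status)   # -> ['invoice', 'acme'] deleted
```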

By complying with such indexing requirements instead of relying on manual or machine reading of texts, the same three features that facilitate legal research can be brought to clients’ records management: (1) highly indexed and summarized materials; (2) expert searching and reviewing; and (3) the speed of electronic searching. As a result, the “proportionality” concept of electronic discovery should not be needed to limit the burden that one party inflicts upon an opposing party by way of raising many issues and bringing many applications before and during trial; the need for it should be much reduced if not eliminated.[viii] Similarly, a client doesn’t give its accountant thousands of records containing financial information and say, “here, you make up the necessary financial records, and then do the audit.” Instead, the client sorts financial information into its financial records on a continuing, daily basis for purposes of accessing information throughout each business day, as well as for auditing. Accessing, sorting, and reviewing records is done far more cost-efficiently by way of a “front-end” indexing of records when one is familiar with them than by a “back-end” reading of records when one is no longer familiar with them.

Those using electronic records as evidence should know the national standard. It is also necessary for applying the electronic records provisions of the Evidence Acts (e.g., ss. 31.1-31.8 of the Canada Evidence Act; s. 34.1 of the Ontario Evidence Act), which were enacted to enable all digitally stored records to be accepted as original records. Their key phrases, “records integrity,” “electronic documents system,” and “integrity of the electronic records system,” were written to depend upon authoritative standards, such as the national standard, for definitions and for authoritatively established principles for the use of electronic records management systems.[ix]

[i] See these articles: (1) “New analysis eliminates a potential speed bump in quantum computing”; (2) “The Road to Quantum Computing”; and (3) “Researchers prove quantum algorithm works by solving linear equations on a quantum computer”.

[ii] Dynamo Holdings Ltd. Partnership v. Commissioner of Internal Revenue (U.S. Tax Court, Nos. 2685-11, 8393-12, Sept. 17, 2014). Predictive coding is also mentioned in L’Abbé v. Allen-Vanguard Corp., 2011 ONSC 7575, [2011] O.J. No. 5982, at para. 23: “Various electronic discovery solutions are available including software solutions such as predictive coding and auditing procedures such as sampling.”

[iii] Ibid at 13-14, quoting Andrew Peck, “Search, Forward: Will Manual Document Review and Keyword Searches be Replaced by Computer-Assisted Coding?” L. Tech. News (Oct. 2011), at 29. The Court then states: “The substance of the article was eventually adopted in an opinion that states: ‘This judicial opinion now recognizes that computer-assisted review is an acceptable way to search for relevant ESI in appropriate cases.’ Moore v. Publicis Groupe, 287 F.R.D. 182, 183 (S.D.N.Y. 2012), adopted sub nom. Moore v. Publicis Groupe SA, No. 11 Civ. 1279(ALC)(AJP), 2012 WL 1446534 (S.D.N.Y. Apr. 26, 2012).”

[iv] See: Victoria L. Lemieux and Jason R. Baron, “Overcoming the Digital Tsunami in e-Discovery: is Visual Analysis the Answer?” (2012), 9 Canadian Journal of Law and Technology 33 at 35: “…the most common methods currently used in e-discovery — keyword searching and linear review — are increasingly ineffective for the massive volumes of data that must be sifted through for each case. There have been a number of studies highlighting the limitations of existing search and retrieval techniques.” The conclusion states: “… effective information retrieval in today’s complex litigation requires a variety of tools and approaches, including a combination of automated searches, sampling of large databases, and a team-based review of these results.”

[v] The articles cited in note i supra state, inter alia, that quantum computers could work much faster than conventional computers in handling complex processes such as image and video processing, genetic analyses, weather prediction, and internet traffic control: “We would be able to search through a large amount of data, regardless of their nature.”

[vi] See: Ken Chasse, (1) “Solving the High Cost of the ‘Review’ Stage of Electronic Discovery” (April 2014), on the SSRN (available as a free PDF download), and also on Slaw, April 17, 2014; and (2) “The Dependence of Electronic Discovery and Admissibility Upon Electronic Records Management” (January 2014), on the SSRN.

[vii] The National Standards of Canada for electronic records management are: (1) Electronic Records as Documentary Evidence CAN/CGSB-72.34-2005 (“72.34”), published in December 2005; and, (2) Microfilm and Electronic Images as Documentary Evidence CAN/CGSB-72.11-93 (“72.11,” updated to 2000; first published in 1979 as Microfilm as Documentary Evidence). 72.34 incorporates all that 72.11 deals with, but 72.11 has remained the industry standard for “imaging” procedures, i.e., the large industry devoted to converting original paper records to digital storage, of which many organizations still have large volumes. These standards were developed by the CGSB (Canadian General Standards Board), a standards-writing agency within the federal department of Public Works and Government Services Canada. They are currently being updated by a CGSB-sponsored expert drafting committee. The CGSB is accredited by the Standards Council of Canada as a standards-development agency.

[viii] “Disproportionality” objections based upon bad records management should not be tolerated. See the records management defects listed in these articles by Ken Chasse: (1) “The Sedona Canada Principles are Very Inadequate on Records Management and for Electronic Discovery” (Nov. 25, 2014), on the SSRN; and (2) “Electronic Records as Evidence” (May 2014), on the SSRN.

[ix] See: Ken Chasse, “A Legal Opinion is Necessary for Electronic Records Management Systems” (Oct. 2014, on the SSRN).
