Using Data to Leverage Access to Justice

During the closing session of the Canadian Bar Association’s Envisioning Equal Justice Summit: Building Justice for Everyone, held in Vancouver in April, each table was asked to come up with one idea to improve access to justice and move justice reform forward. Through the lens of legal information, and based on the sessions I attended Saturday, the idea I would like to discuss further is developing expertise in two areas: analyzing and explicating existing datasets, and creating the structures to collect new data about the legal system. Both would assist in evaluating programs and in demonstrating which aspects of existing programs are most effective.

In “Measuring Effectiveness of Access to Justice Initiatives,” a panel held Saturday morning, the panelists discussed the lack of standard definitions, which limits organizations’ ability to evaluate and compare initiatives’ effectiveness. For example, what is a “case” for the purpose of analyzing the effectiveness of a program? Legal professionals tend to think of a “case” as an individual matter and would generally measure it that way, but the speakers observed that individuals interacting with the justice system tend to think of all interactions with the court system as one experience, so criminal, family law, landlord and tenant, and employment proceedings may all affect perceptions of each other. How can cause and effect be measured when there is no standard linking system among matters that are linked in the minds of the parties? Because participants’ perceptions of the assistance provided by organizations are a component of success, this is a relevant consideration in the development of analysis techniques for access to justice initiatives.
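One way to picture the linking problem is to treat the participant, rather than the matter, as the unit of analysis. The sketch below is only an illustration under assumptions: the participant identifiers and matter records are invented, and no court system is known to expose data in this shape.

```python
from collections import defaultdict

# Invented records for illustration: (participant_id, matter_type).
# A real dataset would need a privacy-protecting way to link matters.
matters = [
    ("P1", "family"),
    ("P1", "landlord and tenant"),
    ("P2", "criminal"),
    ("P1", "employment"),
]


def group_by_participant(records):
    """Collect every matter type under the participant it belongs to."""
    grouped = defaultdict(list)
    for participant_id, matter_type in records:
        grouped[participant_id].append(matter_type)
    return dict(grouped)


experiences = group_by_participant(matters)
print(experiences)
```

Counted by matter, this sample holds four “cases”; counted by participant, it holds two experiences, one of which spans three proceedings.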

Some data surrounding court processes is starting to be made available online. For example, British Columbia’s DataBC lists datasets from Court Services. This is meaningful progress, but the available datasets don’t provide the detailed information needed to analyze the social aspects of the justice system. A simple analysis of the numbers of matters opened and closed, which is one possible measure of productivity, would be possible. Court Services Online, the closed system providing access to internal court documentation and information, provides much more extensive information on a cost-per-use basis. Privacy constraints will likely continue to make it undesirable to give the public full access to information in a format that is easily linked to individuals, but that doesn’t mean organizations and researchers looking into access to justice issues shouldn’t collect the information for internal use and potentially share it with other organizations and researchers.
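The simple count of matters opened and closed could look something like the following sketch. The records are invented for illustration and do not reflect the actual structure or contents of any DataBC dataset.

```python
from collections import Counter
from datetime import date

# Invented records for illustration: (date_opened, date_closed).
# date_closed is None when the matter is still open.
matters = [
    (date(2012, 3, 1), date(2012, 9, 15)),
    (date(2012, 6, 10), date(2013, 1, 20)),
    (date(2013, 2, 5), None),
]


def yearly_counts(records):
    """Return {year: (matters_opened, matters_closed)} sorted by year."""
    opened = Counter(o.year for o, _ in records)
    closed = Counter(c.year for _, c in records if c is not None)
    years = sorted(set(opened) | set(closed))
    return {year: (opened[year], closed[year]) for year in years}


print(yearly_counts(matters))
```

A tally like this says how many matters moved through the system, but nothing about who was involved or how they experienced it.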

This lack of shared definitions of elements in the legal environment also extends to matters’ outcomes. Contrary to the expectations created by Hollywood movies, where so many lawyers “have never lost a case,” legal outcomes are complex and can be considered wins or losses depending on context. It is difficult to evaluate the effectiveness of interventions when situations and outcomes are so varied.

Legal information also tends to be distributed in semi-structured formats, which makes it difficult to extract outcomes without manual examination of the text. Some information can be extracted using automated algorithms or techniques like regular expressions, but these require an available set of standard documents to work from. This documentation is not always readily available for the kinds of issues most commonly dealt with in access to justice initiatives, as written judgements are often not published for routine matters.
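As a rough illustration of the regular expression approach, the sketch below pulls sentence lengths out of free text. The pattern and the sample wording are assumptions made for the example; real judgments phrase outcomes in far more varied ways, which is exactly why a set of standard documents is needed to build and test such patterns.

```python
import re

# Hypothetical pattern: matches phrases such as "sentenced to 18 months".
SENTENCE_RE = re.compile(
    r"sentenced\s+to\s+(?P<length>\d+)\s+(?P<unit>day|month|year)s?",
    re.IGNORECASE,
)


def extract_sentences(text):
    """Return a (length, unit) pair for each sentencing phrase found."""
    return [
        (int(match.group("length")), match.group("unit").lower())
        for match in SENTENCE_RE.finditer(text)
    ]


# Invented sample text, not taken from any real judgment.
sample = (
    "The accused was convicted of assault and sentenced to 18 months "
    "in custody. A co-accused was Sentenced to 2 years of probation."
)
print(extract_sentences(sample))
```

Text that states the outcome in any other form, such as “a term of imprisonment of two years,” would be missed, so manual review remains necessary.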

This is the reason commercial products like “quantum” services are attractive, and expensive. If the information on criminal sentencing and damages awards were more standardized and readily available, it would be much easier to run keyword searches for particular terms, and there would be less need for human indexing. This may change in the coming years with improvements in computer-assisted analysis of free text, when it may become more feasible to run dynamic queries against large bodies of information for concepts like “assault” and related outcomes on verdicts and sentences, with structured outputs.

This is unlikely to happen any time soon as many relevant court documents are not published, and part of the value in publishers’ offerings is their willingness to obtain unpublished, but public, documents from courts and provide information relating to those matters in a way that allows for easy comparison. From my understanding, the problem of developing a computer program to analyze the documentation is less likely to be a limiting factor than the lack of accessible documentation from the courts in various jurisdictions. With existing levels of access to public but obscure documentation from the courts, it is not surprising that highly confidential documentation relating to legal assistance programs is even more difficult to obtain and analyze. It was observed in the panel discussion that collection of data by organizations, without releasing or controlling it, limits its use for wider research and analysis, which in turn limits organizations’ ability to evaluate innovation.

This is not intended to say that ideas for innovation in access to justice should be censored before they can be fleshed out or evaluated because their impact cannot easily be measured. Qualitative outcomes may be as important as, or more important than, quantitative ones. It was observed in the closing plenary that the consistent demand for additional data before making a decision can be a way to avoid change. It may also be that computer science has progressed to the point where it doesn’t make sense to try to develop standards for evaluation, because improvements in techniques like text mining and semantic analysis will facilitate analysis of legal information in its native format in a way that is accessible to small organizations.

There have been great improvements made in medical care by implementing evidence-based medicine, which encourages the regular consultation of the medical literature for best practices and the synthesis of large bodies of statistical information to provide the best information available. Creating shared definitions for baselines, interventions, and outcomes, then measuring them regularly, may help initiate similar improvements in access to justice by creating ways to compare interventions and providing a rationale for implementing them in other organizations. This would help share successful techniques and build on them over time.


  1. David Collier-Brown

    In a previous life, I had to extract concrete information from semi-structured data, rather like case reports (they were standardized computer documents).

    I was pleasantly surprised to find, among the multitudinous stylistic markup, some tags that specifically identified the data I wanted.

    I learned later that they had been applied in Word by a summer student, who colored specific phrases and words either red, green or black, and a macro (or perhaps an XSLT script?) then turned them into hidden markup for the benefit of a team who wanted to use the data.

    I suspect the same thing could be done after the fact by underpaid students, and in real time by the case-report editors, if someone with the appropriate skill could do a bit of Word/WordPerfect macro-writing.


  2. Great post! Readers might be interested in — an impressive example of archiving and structuring legal text for this kind of analysis.