The Case for Linked Data as Legal Information Infrastructure

The promise of semantic web technologies is coming closer to realization. These technologies could provide better navigation of legal information and serve as infrastructure that encourages innovation in both software development and content generation. They would do this by making it possible to separate three activities: the development of applications, the production of secondary content, and the development and maintenance of databases of primary legal information. This matters for legal researchers because it has the potential to remove the barriers among publishers’ platforms and to make it easier to use content from multiple sources together.

The developing standards that form the semantic web generally, and linked data in particular, open up possibilities for both software development and legal information. Linked data is a method of publishing data online in a machine-readable way, which allows different sources to be aggregated and referenced without knowing every potential source of information in advance. This means that once the data infrastructure is created, software developers will be able to create products that draw on the content exposed through the linked data standard, and content creators will be able to publish their paid or free content in a way that makes it discoverable by any system pointing at the linked data instance. Both activities could be carried out with any degree of cooperation with, or separation from, each other.
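To make the aggregation idea concrete, here is a minimal sketch in Python. It assumes each source publishes simple subject, predicate, object statements, and it uses plain tuples rather than a real linked data library. Every URI, property name, and title in it is a hypothetical placeholder, not an identifier from any actual project.

```python
from collections import defaultdict

# Two independent sources, each publishing (subject, predicate, object) statements.
# All URIs and property names are hypothetical placeholders.
PRIMARY_LAW = [
    ("http://example.org/law/act-1", "title", "Example Act"),
    ("http://example.org/law/act-1", "jurisdiction", "Canada"),
]
COMMENTARY = [
    ("http://example.org/pub/article-9", "title", "Commentary on the Example Act"),
    ("http://example.org/pub/article-9", "discusses", "http://example.org/law/act-1"),
]


def merge(*sources):
    """Combine statements from any number of sources into one de-duplicated set."""
    graph = set()
    for source in sources:
        graph.update(source)
    return graph


def describe(graph, uri):
    """Collect every statement made about `uri`, whichever source made it."""
    description = defaultdict(list)
    for subject, predicate, obj in graph:
        if subject == uri:
            description[predicate].append(obj)
    return dict(description)


def referring_to(graph, uri):
    """Find subjects in any source that point at `uri`."""
    return sorted(subject for subject, _, obj in graph if obj == uri)


graph = merge(PRIMARY_LAW, COMMENTARY)
act = "http://example.org/law/act-1"
print(describe(graph, act))
print(referring_to(graph, act))
```

The point of the sketch is that the commentary publisher only needs to know the identifier of the act, not anything about the database that holds it, and an application can combine the two without either source being aware of the other.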

There are several proposed and existing projects to make linked datasets available for legal information. The Legal Information Institute is particularly active in making American materials available, in projects such as one covering Federal regulations. Most existing legal linked data projects focus on publishing linked primary law, though some institutions are making subject classification schemes available. For example, the Library of Congress has made the Library of Congress Subject Headings and the Library of Congress Classification available in linked data format, and both include legal information.

It is important that these classification projects proceed, because they have the potential to enable navigation of legal information based on subject divisions, by aggregating information from different sources that have been identified as having the same topic. Libraries could also aggregate this information to enrich local collections, leveraging their investment in existing catalogues to provide controlled, relevant results without the added in-house work of manually identifying and describing materials. The problem with using the data published in the projects mentioned above, however, is that in Canadian institutions it will be difficult to separate Canadian from international content, and the subject classifications may not be suited to Canadian materials. Canadian legal research has confronted these issues before, and previously developed solutions to similar problems include the KF Modified Classification Scheme and the Index to Canadian Legal Literature. To realize the full potential of these technologies in the Canadian legal context, more work will need to be done to create a workable infrastructure for Canadian legal information on the semantic web.

Through our involvement with the CALL-ACBD KF Modified Committee, Tim Knight, of Osgoode Law School Library, and I have identified the conversion and publication of the KF Modified Classification schedules in linked data format as a logical next step toward a meaningful linked data environment for legal research in Canada. This will allow resources to be linked by subject as laid out in the classification, and it will allow the development of software that facilitates exploration of legal materials using a conceptual base designed to make sense for Canadian research. One important element of this plan is that semantic web standards provide for the description of physical as well as electronic materials. They can describe diverse sources of information, including print books and people, down to the level of individual items, so materials can be included in the developing system without barriers to discovery, regardless of format.
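As a rough illustration of what publishing a classification this way makes possible, the following Python sketch models a small hierarchy of classes and a few items classified with them. The class identifiers, item identifiers, and hierarchy are invented for the example and are not taken from the KF Modified schedules. In a real publication the parent and child relationship would more likely be expressed with a vocabulary such as SKOS (for example its `skos:broader` property), but that is an assumption about how the conversion might be done.

```python
# Hypothetical fragment of a classification: each class points to its broader class.
BROADER = {
    "class:contracts-formation": "class:contracts",
    "class:contracts-remedies": "class:contracts",
    "class:contracts": "class:private-law",
}

# Hypothetical items in different formats, each classified with one class.
ITEMS = [
    {"id": "item:1", "format": "print book", "subject": "class:contracts-formation"},
    {"id": "item:2", "format": "online article", "subject": "class:contracts-remedies"},
    {"id": "item:3", "format": "print book", "subject": "class:private-law"},
]


def is_within(cls, ancestor):
    """Return True if `cls` is `ancestor` or sits below it in the hierarchy."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = BROADER.get(cls)  # step up one level; None at the top
    return False


def items_under(ancestor):
    """List items classified at `ancestor` or at any narrower class, in any format."""
    return [item["id"] for item in ITEMS if is_within(item["subject"], ancestor)]


print(items_under("class:contracts"))
```

Here a search at the broader class returns both the print book and the online article classified at narrower classes, while the item classified only at the more general class is left out.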

This project is only the start of the data infrastructure needed to develop a legal information ecosystem that can make the most of these emerging standards. Projects like this are the most important components of a working linked data infrastructure, because they are the linking sets: the datasets through which other datasets can be linked. Without the publication of the originating classification standards, linked data systems have less control over what is linked together, which restricts the ability to navigate among datasets and maintain controlled results. Going forward, it is to be hoped that more classification schemes, such as an open caselaw taxonomy, will be published as linked data to add further richness to what can be discovered in this way.

The semantic web and open data standards have the potential to create the infrastructure needed to integrate information from multiple sources into multiple platforms. Each additional information standard that is made available enriches the others, because it is possible to navigate among the standards by selecting elements that are attached to more than one schema. The linked data cloud shows the existing web of data. Enriching the linked data infrastructure by making Canadian legal information available in this way will ensure that the benefits of more discoverable resources, and of research that fits better into workflows, are available to the Canadian legal industry.
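Navigation between schemes through shared elements can be sketched in the same spirit. The Python example below assumes two hypothetical schemes whose concepts have been declared equivalent by a mapping link (comparable in role to `skos:exactMatch` in the SKOS vocabulary), and shows how a search that starts in one scheme can pick up resources described with the other. All scheme names, concept identifiers, and documents are made up for illustration.

```python
# Hypothetical mapping links declaring concepts in two schemes to be equivalent.
MATCHES = [
    ("schemeA:contracts", "schemeB:contract-law"),
]

# Hypothetical resources, each described using a concept from one scheme only.
TAGGED = {
    "schemeA:contracts": ["doc:canadian-treatise"],
    "schemeB:contract-law": ["doc:foreign-casebook"],
    "schemeB:torts": ["doc:unrelated"],
}


def equivalents(concept):
    """Return `concept` plus every concept reachable by following match links."""
    seen = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for a, b in MATCHES:
            if current in (a, b):
                other = b if current == a else a  # links work in both directions
                if other not in seen:
                    seen.add(other)
                    frontier.append(other)
    return seen


def resources_for(concept):
    """Gather resources tagged with `concept` or any concept mapped to it."""
    found = []
    for equivalent in sorted(equivalents(concept)):
        found.extend(TAGGED.get(equivalent, []))
    return found


print(resources_for("schemeA:contracts"))
```

Because each resource keeps the concept from its own scheme, an application could still separate the results by originating scheme after following the mapping.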

