In October, the Harvard Law Library announced that it is digitizing its entire collection of United States case law. Coincidentally, I am spending a year with the Harvard Law Library Innovation Lab (the part of the law library responsible for the digitization project) as a Research Fellow, so I’ve had a front row seat to their digitization efforts. (Literally. The shelves where they store books for scanning are right by my cubicle.) I’m not directly involved with the digitization efforts – thus far I’ve been spending my time researching how state governments publish their law online. An excruciatingly detailed report is forthcoming, but in the meantime, given this recent news and discussion here on Slaw about it, I thought it would be a good time to share some of what I’ve learned and unpack the term “closed” with regard to published law.

Access to law does not exist on a binary of “closed” or “open.” Rather, there’s a gradient of openness, mainly determined by the use case of the published law. When thinking of use cases, I found Colin Lachance’s categorization of retail vs. wholesale needs to be very helpful. Retail legal publishing is aimed at either the public or legal practitioners who are using the publishing platform for research purposes. Wholesale legal publishing is meant for the creators of a legal information secondary market and usually takes the form of bulk data publishing. Some publishing practices are perfectly suitable for retail needs, yet have the effect of being “closed law” for wholesale needs.

Through the course of my research, I found several ways in which state published law is “closed.” As you will see, some of the factors causing “closed-ness” are overlapping and related. In most cases, the “closed-ness” does not result in an absolute barrier to accessing the information, but rather a hurdle – whether that particular hurdle is the one that makes the legal information completely inaccessible depends on the situation. Also, they almost all start with the letter C. I’m not entirely sure how that happened. They are (in no particular order):

  • Cost – Law in the United States is in the public domain, so one would assume that a state wouldn’t even consider charging money to access it. Wrong! The charges range from a few dollars for a “certified” or “official” PDF copy of case law (as in Georgia) up to $119 to access a database of regulations (as in Massachusetts).
  • Citation – There are two aspects to this. One, most jurisdictions in the United States require usage of The Bluebook, a proprietary citation system, to properly format citations in court documents. Two, the citation rules often require cites to “official” versions of law, which I found more often than not to be a costly version of a resource.
  • Corporate Outsourcing – It’s very common for states to rely on private publishing companies to publish their state law. This isn’t inherently terrible, as professional publishing operations are much more likely to publish law in a timely manner. However, being a for-profit business does drive up the cost of law. Additionally, I have found instances where corporations have wrapped the public domain law with copyrightable material, making it hard to extract the “free law” from a publication.
  • Copyright – As mentioned above, law in the United States is presumed to be in the public domain. However, that does not stop states from claiming copyright on their law. See, for example, the initial warning notice on the Arkansas Code. There is also an interesting case based on Carl Malamud’s distribution of the Georgia Code, which contains annotations that may or may not be an “edict of government” and thus non-copyrightable.
  • Content – If you are reading this, you are probably aware that thorough legal research requires searching a certain amount of content. This is a fancy way of saying that only having case law back to the mid-1990s is probably not sufficient for most research needs and, unfortunately, that is the extent of the current case law offerings of most states. There is also a dearth of archives maintained in the regulatory and statutory arenas.
  • Container – The most often used container for publishing law online is the PDF. While PDF is an open standard, it poses some challenges for both the wholesale and retail user of law. For the wholesale user, it is very difficult to extract text from a PDF, and in a discipline like law where the very placement of a comma is critical, this is unacceptable. For the retail user, PDFs are difficult to search as well as unwieldy to use. I found instances of 1000+ page PDFs posted online which took several minutes to download over a hardwired connection. Considering that many people use mobile devices and data to access the internet, this makes accessing the law contained within these PDFs impossible.
  • Context – Law does not exist in a vacuum. It is very likely that a legal situation will be covered by a mélange of case law, regulatory law and statutory law. However, no government legal information distribution site allows for cross searching of types of law. One must visit at least three different websites to access the law, sometimes more for updating purposes.
  • Citator – Related to the above, the later interpretations of law are important for understanding its relevancy to a particular use. In the United States, it is an ethical lapse to not “Shepardize” or “note up” a piece of law to check its accuracy. As it stands, no state provides a citator for its law, opening retail users up to a possible mistake in use.
  • Currency – Law is a constantly changing body of data. What may be good law today could be overturned or repealed tomorrow. It is vital that a researcher use as current a version of the law as possible. However, some states are lax in either updating their legal offerings or making it clear when the law was published.
  • Correctness – It was a shock to realize that, especially in the case law arena, courts were not guaranteeing that the cases that they were publishing online were the actual correct law. In fact, most were publishing just slip opinions and not indicating when or if they were swapped out for later changes by the court. For an example of this, see this article describing the (fortunately) now no longer needed Scotus_Servo.
  • Control – Even if a state doesn’t try to copyright their law, I found several instances of them applying a Terms of Use to their legal information publications that attempted to put limits on how a person could use the information. Some examples include “for personal use only”, “for non-commercial use” (leading one to wonder if a practicing attorney could use it in her practice), and prohibitions against selling the information.
  • Cataloging – Legal information is a tough environment to navigate – the information is dense, voluminous and uses terms of art that wouldn’t necessarily be considered by a non-practitioner or a practitioner new to the subject area. I personally have my doubts whether or not full text searching is sufficient or useful in accessing legal information. As it stands, no state provides an index to its case law.
  • Search – Finally, in spite of what I just said above, having search mechanisms that allow for advanced searching and are dedicated to just the legal information on the website (instead of the entire website) would greatly improve access to the law. However, very few of the legal information websites that I visited allowed for this.

I entered into the project with a very idealistic notion that all law published by governments should be fully open and available. After all, access to information is access to justice, and if there’s one thing the legal world needs, it’s increased access to justice! In retrospect, I now realize that most of my definitions of “open” were mainly applicable to the wholesale distribution of law. While it could be argued that wholesale distribution of law will create more retail (free or low cost) legal information distribution points and thus increase access to justice – indeed, it would solve many of the closed problems of legal information that I listed above – I’m now not entirely convinced that this is the only way to go for state government publication efforts. Is a functional retail site better than wholesale distribution which may or may not be transformed into something useful? One of my main takeaways from this research is that we need to decide how we want our government to publish law – retail, wholesale or both? And related, how open is enough?


  1. Wonderful analytical summary of key points. Wish there was a like button.
    A post that should be shown to any researcher. Even newbies.

    Perhaps websites have to post a disclaimer or user notice that downloading memory-intensive, very long PDFs of full-text law, etc., can be slow or problematic on mobile devices. Unless there is a brilliant way to compress a document AND it can be reliably opened by any version of viewing software in perpetuity. It’s a constant problem, not just in the legal sector.

    As for the lack of a human-vetted product, a legal citator for noting up law (“Shepardize”): while this is greatly desired and used (primarily by trained legal professionals) in the legal sector, I wonder how other disciplines deal with equivalent noting up on critical info. Has the legal publishing industry checked the latest online “note-up” methods for medical publishing (other than auto-semantic matches with synonyms for drugs, etc.)? I choose medicine because the information is life-risk oriented.

  2. David Collier-Brown

    PDF is a brilliant printer control language. Alas, that’s not what we need.

    A very plain text with a few tags from html 5 is what we should use for text documents, roughly what most email programs support.

    Add a font tag and send it to a printer, take out a few tags and call it an e-book. Simplicity is the lawyer’s friend. And the publisher’s friend. And the nerd’s friend.

  3. David Collier-Brown

    Speaking to Jean’s point, a former employer had to get their text in a very simple format to be able to reliably recognize citations, and note up the work. The same is true of Science Citation Indices, the academic’s “Shepardizing”.

    We all have much the same needs, and if our work is to be possible, whether retail or wholesale, we need the same thing: wholesale access, and not just private profit.

