The Sweet Morbidity of Link Rot
A couple years ago, the New Yorker ran a great, comprehensive piece on “link rot”—that scourge of dead-end links and vexing “404” errors that annoys us all and ensures the Web’s enduring reputation as an “ethereal, ephemeral, unstable, and unreliable” ravel of non-sequiturs.
The article charts the curious history of the Wayback Machine, that most indispensable weapon in the fight against link rot, and mentions the “disastrous” effects for lawyers and judges who seek to erect houses of reason on the quicksand of internet sources.
It is all quite topical given the Supreme Court of Canada’s recent move to tackle the threat of link rot in its own cases. See Michel-Adrien Sheppard’s most recent post.
The New Yorker article offers galling facts that underscore the immediacy of the link rot threat:
According to a 2014 study conducted at Harvard Law School, “more than 70% of the URLs within the Harvard Law Review and other journals, and 50% of the URLs within United States Supreme Court opinions, do not link to the originally cited information.”
I know what you’re probably thinking:
- 50 percent of US Supreme Court links don’t work?
- Surely judicial citations should offer better odds than the house edge in a casino?
- Is a half-sober blackjack player really more bankable than the links we’re putting in our jurisprudence?
Understandably, more and more courts are taking action. And that’s good, since the alternative is appalling. One source indicates that the odds of a link rotting out are 20% in just one year, and that by year five the odds of link rot rise to 50%.
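As a rough aside, the two figures above can be compared with a little arithmetic. The sketch below assumes a constant yearly chance of rot, which is my simplification and not something the source claims. Under that assumption, 50% by year five would imply roughly 13% per year, while a steady 20% per year would compound to about 67% by year five. The function names are purely illustrative.

```python
def annual_rate_from_cumulative(cumulative: float, years: int) -> float:
    """Constant yearly rot rate that would produce `cumulative` rot after `years`."""
    return 1 - (1 - cumulative) ** (1 / years)


def cumulative_from_annual(annual: float, years: int) -> float:
    """Share of links rotted after `years` at a constant yearly rate `annual`."""
    return 1 - (1 - annual) ** years


if __name__ == "__main__":
    # Assumption: rot is modelled as a fixed probability per year.
    print(f"{annual_rate_from_cumulative(0.50, 5):.1%}")  # about 12.9%
    print(f"{cumulative_from_annual(0.20, 5):.1%}")       # about 67.2%
```

Read this way, the one-year figure and the five-year figure suggest that rot is front-loaded rather than steady, though that is only one possible reading.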
This all made me a bit curious about the SCC’s links. Since Michel-Adrien Sheppard already shared the link to the SCC’s list of internet sources and terms of use, it was easy for me to take the next step in my own curiosity. The SCC list includes the original URL citations plus corresponding archival versions for every link ever referenced in an SCC decision. If that sounds to you like it might be a very long list, you will be surprised to learn, as I was, that it is in fact quite a short one.
The SCC releases around 60 decisions per year on average (some years fewer, others more), yet from 1998 to the end of 2016 the total number of cases with links to internet references tops out at 119 cases. They account for a total reference count of only 206 URLs over 18 years.
With a relatively manageable data set, it was easy to test the original links and see how healthy they were. You can see that even for the most recent link citations (total of 29 URLs over 16 SCC cases in 2016), while 72% are healthy (21 of 29 report OK), almost a quarter are deteriorating (7 of 29 are redirects) and one is already broken.
Note the table has no data for 1999-2000 since no links are listed by the SCC in those years.
Because links can have different degrees of health, and because some bad links fail outright while others are propped up by redirect crutches, I used a link checker that also recorded server response codes. Green indicates healthy links (server code 200), yellow indicates links that redirect temporarily (code 302) or have moved permanently (code 301), and red indicates outright 404 failures. In my assessment, a redirected or permanently moved webpage is moribund, if not exactly dead, as it’s usually just a matter of time before the redirect also becomes a casualty.
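For anyone who wants to try something similar, here is a minimal sketch of that colour scheme in Python. It is not the link checker I used; the function names are made up for illustration, and the fetch helper assumes the server answers HEAD requests, which not every server does. Redirects are deliberately not followed, so a 301 or 302 is reported as such rather than as the status of the page it points to.

```python
from urllib import error, request


def classify_status(code: int) -> str:
    """Map an HTTP status code to the green/yellow/red scheme described above."""
    if code == 200:
        return "green"   # healthy
    if code in (301, 302):
        return "yellow"  # moved permanently or redirected
    if code == 404:
        return "red"     # not found
    return "other"       # anything the scheme above does not cover


class _NoRedirect(request.HTTPRedirectHandler):
    """Stop urllib from silently following redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # declining the redirect lets urllib raise HTTPError instead


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the raw status code for `url` without following redirects.

    Network-level failures (DNS errors, timeouts) are not handled here
    and will raise an exception.
    """
    opener = request.build_opener(_NoRedirect)
    req = request.Request(url, method="HEAD")
    try:
        with opener.open(req, timeout=timeout) as resp:
            return resp.status
    except error.HTTPError as exc:
        return exc.code  # 3xx, 4xx and 5xx responses arrive as HTTPError
```

Calling `classify_status(fetch_status(url))` for each cited URL would give one colour per link.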
At 2011, the dreaded five-year mark, the original SCC internet source links are right near the morbidity sweet spot. It’s not quite 50% like the US statistic, but out of 17 URLs only three report as OK; 10 indicate redirects, and four fail outright.
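The percentages for the two years discussed above can be reproduced from the raw counts. This is just a small sketch of the arithmetic; the counts are the ones quoted in this post, and the labels and function name are mine.

```python
from typing import Dict


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw counts per category into percentages, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Counts as reported in the post for links cited in 2016 and 2011.
links_2016 = {"ok": 21, "redirect": 7, "broken": 1}
links_2011 = {"ok": 3, "redirect": 10, "broken": 4}

if __name__ == "__main__":
    print(shares(links_2016))  # {'ok': 72.4, 'redirect': 24.1, 'broken': 3.4}
    print(shares(links_2011))  # {'ok': 17.6, 'redirect': 58.8, 'broken': 23.5}
```

So for 2011, fewer than one in five of the original links still report OK, and nearly a quarter fail outright.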
Incidentally, the SCC appears to use a bespoke tool that generates PDFs of cited internet sources and stores them on the Court’s own servers. I checked the properties of a handful of the PDF archive records and noticed that most were processed in January this year.
There is, however, another tool called Perma.cc that’s widely open to courts, law faculties, journals and academic libraries (see here for a list of partners). Courthouse Libraries BC is one of the 800-odd archiving organizations that use the service, and from my own experience it is a slick, simple-to-use, peace-of-mind-giving tool that is already saving us from link rot in one of our legal publishing projects. I’d be interested to hear what other Canadian courts are doing.
It appears that the SCC decided against using Perma.cc, and I assume the reason is mostly precaution around controlling the archive itself. Perma.cc is headquartered in the US, and is administered by the Harvard Library Innovation Lab.
People looked at Perma.cc but their servers are not in Canada.
— Michel-A. Sheppard (@613mash) January 30, 2017
I can certainly see the sense in the SCC relying only on their own resources, although other archiving organizations (and there are plenty of courts among them) must rather appreciate that Perma.cc is a free tool (for qualified users) that hosts and enables archives using not just PDF snapshots, but code-based captures (which preserve more of the dynamic content of a website) in addition to graphical screenshots (PNG images).
Does anyone have anything else to add about the array of archiving tools and solutions for courts in Canada (or elsewhere)?
One thing I noticed is that the majority of the SCC links were to government or other robust institutions (Hansard, Statistics Canada, law societies, CJC, etc.) and far fewer pointed to private or company websites than is typical for ordinary internet users. This, obviously, is not surprising. Most of us are not penning SCC judgments and vetting every source as rigorously as top judges must. Given this, however, I thought there would be a higher survival rate among links. If there is one, it is not higher by a wide margin.
— Nate Russell is a liaison lawyer with Courthouse Libraries BC. Find him on Twitter @nrusse.
I would also be interested in hearing what other Canadian courts are doing about this. At present, no link rot protection initiatives are underway at the Manitoba Court of Appeal. Are there any Canadian courts using Perma.cc? Is Courthouse Libraries BC using it to prevent link rot in BC case law?
Nate — You might be interested in the “Amber” service announced by Harvard’s Berkman Center for Internet and Society, the same folks who helped develop the Perma.cc service. Amber is a free software tool for websites and blogs that preserves content and prevents broken links. Here’s a link to the announcement, which provides a good description: http://today.law.harvard.edu/berkman-center-releases-tool-to-combat-link-rot/
Thanks Louis,
Great tool. I’m assuming Amber and Perma.cc have basically the same genes (same parents), but the latter is for courts/academic libraries/law journals, whereas Amber is for folks (bloggers, news orgs, wikis, researchers, independents, activists, etc.) with their own WordPress or Drupal sites. Perma.cc is also the repository for its users, whereas Amber uses the host’s own site or potentially other third-party services.
Other differences: Perma.cc is a manual archive/linking tool (you have to click in the dashboard or Chrome extension to create each link as you go) whereas Amber runs on autopilot (archive/linking all the links you type).
Interesting news for courts: it looks like if you have both a Perma.cc account AND run a WP or Drupal CMS, you can configure Amber to save to your Perma.cc account. This seems like a good combo approach; however, I’d note that the Amber tool obeys robots.txt disallow orders. This makes it less reliable than manual preservation using Perma.cc, since I think Perma.cc only hides target captures upon request/complaint, and even then the archive is still visible to the organization that created the archive.
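To show what obeying a robots.txt disallow order means in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. This is not Amber's code; the user agent name `ExampleArchiver`, the sample rules and the example.org URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def may_archive(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: every crawler is asked to stay out of /private/.
sample_rules = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for page in ("https://example.org/public/report.html",
                 "https://example.org/private/report.html"):
        print(page, may_archive(sample_rules, "ExampleArchiver", page))
```

A tool that respects these rules would skip the second page, so a cited source sitting behind a disallow rule would go unarchived unless someone preserves it by hand.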
Maybe Steve would be interested in turning Amber on for Slaw?