Fifth Annual Link Rot Report of the Chesapeake Digital Preservation Group
The Chesapeake Digital Preservation Group has just published its 5th annual study of link rot among the original URLs for online law- and policy-related materials it has been archiving since 2007.
“Every year, the Chesapeake Group investigates whether or not the documents in the archive can still be found at the original web addresses from which they were captured. The group analyzes two samples of web addresses, or URLs, pulled from the archive’s records.”
“The first sample includes 579 original URLs for content captured from 2007-2008. This sample is revisited every year to document link rot and explore how it changes over time (…)”
“In 2012, 218 out of 579 URLs in the sample no longer provide access to the content that was originally selected, captured, and archived by the Chesapeake Group. In other words, link rot has increased to 37.7 percent within five years.”
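As a quick check on the arithmetic, the quoted percentage follows directly from the two counts given in the report. A minimal sketch; the variable names are only illustrative and do not come from the report:

```python
# Counts quoted from the 2012 Chesapeake Group report.
rotted_urls = 218   # URLs that no longer serve the archived content
sample_size = 579   # original URLs in the 2007-2008 sample

# Share of the sample affected by link rot, as a percentage.
link_rot_rate = rotted_urls / sample_size * 100
print(f"Link rot: {link_rot_rate:.1f}%")  # prints "Link rot: 37.7%"
```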
Link rot describes “a URL that no longer provides direct access to files matching the content originally harvested from the URL and currently preserved in the Chesapeake Group’s digital archive. In some instances, a 404 or ‘not found’ message indicates link rot at a URL. In other cases, the URL may direct to a site hosted by the original publishing organization or entity, but the specific resource has been removed or relocated from the original or previous URL” (from the 2011 link rot report).
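The definition above covers two cases: the URL fails outright, or it still loads but no longer serves the archived file. The report does not describe how the Group actually performs its checks, so the following is only a rough sketch of how such a test might be expressed. The function name `classify_url`, its parameters, and the exact-hash comparison are all assumptions made for illustration; a byte-for-byte match is probably stricter than a human reviewer would be.

```python
import hashlib

def classify_url(status_code, body, archived_sha256):
    """Return 'rotted' or 'live' for one original URL (hypothetical helper)."""
    # A 404 or any other non-200 response: no direct access to the file.
    if status_code != 200 or body is None:
        return "rotted"
    # The page loads, but the content no longer matches the archived copy,
    # e.g. the publisher's site is up but the resource was moved or removed.
    # Assumption: an exact SHA-256 match stands in for "matching content".
    if hashlib.sha256(body).hexdigest() != archived_sha256:
        return "rotted"
    return "live"

# Invented sample data, not taken from the archive.
archived = hashlib.sha256(b"original report text").hexdigest()
print(classify_url(200, b"original report text", archived))
print(classify_url(404, b"Not found", archived))
print(classify_url(200, b"Welcome to our new home page", archived))
```

The status code and body would come from fetching the original URL; they are passed in as plain values here so the sketch runs without a network connection.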
More than 90% of the sample URLs were from state government (state.[state code].us), organization (.org), and U.S. government (.gov) domains.
The Project has built a digital archive collection comprising more than 8,600 digital items. Most of the material archived is American. The Project is an initiative of the Georgetown Law School and Harvard Law School Libraries, and of the State Law Libraries of Maryland and Virginia.
This issue is also of major concern to Canadian legal researchers, as illustrated by the following posts here on Slaw:
- Link Rot and Legal Research, Ted Tjaden, August 3, 2005
- Link Rot is Alive and Well, Ted Tjaden, March 19, 2008
- Link Rot in Court Decisions, Shaunna Mireau, May 7, 2009
The problem may be more acute with shortened URLs. The editors of my article in [2009] Annual Review of Civil Litigation decided to shorten many of my links, and there were a lot of them, since the article was about electronic media and the law. They tended to use tiny.cc. The volume was published in October 2009 in print only. The print format was itself a good reason to shorten the URL – the longer versions would be practically unusable if they had to be typed by a reader wanting to check the note.
In the summer of 2011 I reviewed the links with a view to posting the article on my own web site, after the two-year exclusivity period on the licence ran out. Almost all of the tiny.cc links were dead. I went back to my Word version, where almost all of the links were still live. I cleaned up the others and put the text online with long-form but clickable links. Someone reading the printed text would have to find the source material in some other way. (I often do a web search of the full title of the text I am looking for – if it’s a newspaper article or the like, it often shows up somewhere.)
I recall that the federal government’s online statutes for a while had a version of a URL with ‘stable’ in the name, presumably because those versions were intended to remain available in the long term. Those ‘stable’ links have not worked for years now. Sic transit … lex?