Internet Archive and Copyright

Too late for our Theme Week on copyright but still interesting:

Michael Shamos, a computer science professor at Carnegie Mellon University, said archiving like that done by the Internet Archive is “the biggest copyright infringement in the world,” but said it is done in a way “that almost nobody cares about.”
CNEWS (via AP): Internet Archive faces copyright suit

A couple of weeks ago there was an item in the news (the NY Times article is good, as is the piece in Law.com) about a lawsuit by a company, Healthcare Advocates, against the Internet Archive for failing to do enough to protect copyrighted material: Healthcare Advocates’ opponents in another lawsuit had used earlier versions of the HA site, presumably to advance their cause. The Internet Archive is more than just “a snapper-up of unconsidered trifles”: there are over 5 billion “pages” in the archive.

To a researcher, this is one of those instances where you want to say, “Yes, sure, but…” The “but” is about free access, of course; however, in this case it would seem that the “yes, sure” must be correct. Alas. If it was copyrighted to begin with, it’s surely still copyrighted when someone else copies it, even though my current version has changed. I think the Internet Archive follows something like the infamous Rogers Cable negative option: there are ways, evidently, to prevent their bots and spiders from taking your site, and ways to get them to remove your old material — but you do have to take steps, and I’m not sure that’s right. In the instant case, though, it would seem that Healthcare Advocates did take steps to protect their data but it got out via the Archive nonetheless.

Or is this another case of the Google cache brouhaha? In Parker v. Google [pdf] the U.S. District Court decided in favour of the search giant. Perhaps the copyright doesn’t kick in, as it were, until someone further downstream from the Internet Archive makes an impermissible use of the data.

On a more general level, archiving material on the internet is a serious issue. A great deal of data now exists in digital format only, and that on the net, rather than in print form, with the consequence that it’s evanescent. It would make sense for individuals and organizations to take steps to archive their own contributions to the internet; and in that regard people might be interested in a monograph by Niels Brügger, “Archiving Websites: General Considerations and Strategies” [pdf], published by the Centre for Internet Research in Århus, Denmark.
