Archiving the Web
A not-new UK law was given regulatory effect this week, enabling the British Library to archive the .uk web, just as it already receives legal deposit of UK print materials. The import of the new regulations, in effect April 6, is, I gather, that the archive can be built by automated crawl, rather than by permission for page-by-page grabs.
As the British Library explains, legal deposit of UK publications to identified libraries is, of course, a practice of long standing. The new regulations extend and entrench the program for UK digital materials:
Legal deposit has existed in English law since 1662. It helps to ensure that the nation’s published output (and thereby its intellectual record and future published heritage) is collected systematically, to preserve the material for the use of future generations and to make it available for readers within the designated legal deposit libraries.
By law, a copy of every UK print publication must be given to the British Library by its publishers, and to five other major libraries that request it. This system is called legal deposit and has been a part of English law since 1662.
From 6 April 2013, legal deposit also covers material published digitally and online, so that the Legal Deposit Libraries can provide a national archive of the UK’s non-print published material, such as websites, blogs, e-journals and CD-ROMs.
The extension of legal deposit to the digital realm dates back a decade, to the Legal Deposit Libraries Act of 2003. What is new, a press release from the British Library explains, is that
—the present regulations implement it in practical terms, encompassing electronic publications such as e-journals and e-books, offline (or hand-held) formats like CD-Rom and an initial 4.8 million websites from the UK web domain.
The press release details that deposit at the British Library and the selective deposit libraries now encompasses the digital:
From this point forward, the British Library, the National Library of Scotland, the National Library of Wales, the Bodleian Libraries, Cambridge University Library and Trinity College Library Dublin will have the right to receive a copy of every UK electronic publication, on the same basis as they have received print publications such as books, magazines and newspapers for several centuries.
The regulations, known as legal deposit, will ensure that ephemeral materials like websites can be collected, preserved forever and made available to future generations of researchers, providing the fullest possible record of life and society in the UK in the 21st century for people 50, 100, even 200 or more years in the future.
…
Access to non-print materials, including archived websites, will be offered via on-site reading room facilities at each of the legal deposit libraries. While the initial offering to researchers will be limited in scope, the libraries will gradually increase their capability for managing large-scale deposit, preservation and access over the coming months and years. By the end of this year, the results of the first live archiving crawl of the UK web domain will be available to researchers, along with tens of thousands of e-journal articles, e-books and other materials.
The motivation—likely all too apparent to those who think about such things—is nicely expressed by the Library’s head:
“Ten years ago, there was a very real danger of a black hole opening up and swallowing our digital heritage, with millions of web pages, e-publications and other non-print items falling through the cracks of a system that was devised primarily to capture ink and paper,” said Roly Keating, Chief Executive of the British Library.