Après Le Deluge de Data…quoi?

An article in the recent Communications of the ACM (Association for Computing Machinery), “Got data?: a guide to data preservation in the information age,” makes the case for urgent investment in data cyberinfrastructure — whatever is required to store, manage, catalog and access data.

(Note: that link won’t give you much joy unless you happen to subscribe to the ACM portal. Fortunately, the author, Francine Berman, who is Director of the San Diego Supercomputer Center, has put up on her website a version of the piece in PDF, “Surviving the Data Deluge.” Such is the advance of the free access movement that I’m always surprised and affronted when I come across a link to a scholarly piece behind a paywall. In this case, you’d think that the computing machinery industry would be flush enough to support the free publication of its communications.)

Berman begins by telling us how much digital data we’re producing. It’s worth repeating some of those figures here. In 2007 the amount of data was estimated to be 281 exabytes (2.25 × 10²¹ bits), which, for those of us more earthbound, is a million times more data than is hosted by the Library of Congress and is equivalent to 281 trillion digitized novels. (A trillion, as everyone knows full well by now, is just about the size of a bailout.) So profligate are we with our data production that as of last year, we produce more of the stuff than can be stored on present-day machinery: increasingly it is slopping over into the gutters and running away.
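For the curious, the headline figures do hang together arithmetically. The sketch below assumes decimal (SI) units, so 1 exabyte = 10¹⁸ bytes. The roughly one-megabyte-per-novel size is not stated in the piece. It is simply what dividing 281 exabytes by 281 trillion novels implies.

```python
# Assumption: decimal (SI) units, i.e. 1 exabyte = 10**18 bytes.
BYTES_PER_EXABYTE = 10**18
BITS_PER_BYTE = 8


def exabytes_to_bits(exabytes: float) -> float:
    """Convert a quantity in (decimal) exabytes to bits."""
    return exabytes * BYTES_PER_EXABYTE * BITS_PER_BYTE


def implied_bytes_per_novel(exabytes: float, novels: float) -> float:
    """Back out the per-novel size implied by a total volume and a novel count."""
    return exabytes * BYTES_PER_EXABYTE / novels


total_bits = exabytes_to_bits(281)
per_novel = implied_bytes_per_novel(281, 281e12)

print(f"{total_bits:.2e} bits")               # about 2.25 x 10^21 bits
print(f"{per_novel:,.0f} bytes per novel")    # about one megabyte each
```

Running it prints `2.25e+21 bits` and `1,000,000 bytes per novel`, which matches the 2.25 × 10²¹ figure quoted above.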

The huge increase in digital data is one of four trends that Berman identifies, the other three being:

  • More and more policies and regulations require the access, stewardship, and/or preservation of digital data.
  • Storage costs for digital data are decreasing but are rising as a proportion of data center budgets.
  • Digital data storage and services are increasingly commercialized.

She offers ten guidelines for “data stewardship”:

  1. Make a plan.
  2. Be aware of data costs and include them in your overall IT budget.
  3. Associate metadata with your data.
  4. Make multiple copies of valuable data.
  5. Plan for the transition of digital data to new storage media ahead of time.
  6. Plan for transitions in data stewardship.
  7. Determine the level of “trust” required when choosing how to archive data. (i.e., will Google do?)
  8. Tailor plans for preservation and access to the expected use.
  9. Pay attention to security.
  10. Know the relevant regulations.
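Guidelines 3 and 4 are the most concrete of the ten. One common way to act on them together is to record a checksum alongside descriptive metadata, so you can later confirm that each copy of a file is still identical to the original. The following is a minimal sketch under stated assumptions: it requires Python 3.9+, and the JSON "sidecar" layout and the function names (`write_metadata`, `copies_match`) are hypothetical, invented for illustration. They are not a standard and do not come from Berman's article.

```python
import hashlib
import json
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_metadata(path: Path, description: str) -> Path:
    """Write a hypothetical JSON sidecar recording metadata and a checksum."""
    record = {
        "file": path.name,
        "description": description,
        "size_bytes": path.stat().st_size,
        "sha256": sha256_of(path),
        "recorded_utc": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = path.with_name(path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar


def copies_match(sidecar: Path, copies: list[Path]) -> dict[str, bool]:
    """Compare each copy's checksum with the one recorded in the sidecar."""
    expected = json.loads(sidecar.read_text())["sha256"]
    return {str(copy): sha256_of(copy) == expected for copy in copies}


# Demonstration using throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "engagement_letter.txt"
    original.write_text("Signed engagement letter.\n")
    sidecar = write_metadata(original, "Signed engagement letter")

    backup = Path(tmp) / "engagement_letter.backup.txt"
    shutil.copy2(original, backup)
    result_before = copies_match(sidecar, [backup])

    backup.write_text("Altered after copying.\n")  # simulate silent corruption
    result_after = copies_match(sidecar, [backup])

print(result_before)  # the backup matches the recorded checksum
print(result_after)   # the altered backup no longer matches
```

SHA-256 is used here only because it is widely available in Python's standard library. Any strong hash would work. The point is that the metadata travels with the file, and a mismatch flags a copy that can no longer be trusted.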

The concerns addressed by Berman should be familiar to all law firms, if not to every lawyer working within a firm. The amount of digital data associated with the practice of law will only grow, and its stewardship is a crucial aspect of the responsible and ethical practice of law. Does your firm have a plan? Do you?
