Column

The Data Rescue Project: Preserving Government Data Is a Tech & Community Issue

AALL Spectrum
Author: Andrea Guldalian, Duane Morris LLP

This submission is part of a column swap with the American Association of Law Libraries (AALL) bimonthly member magazine, AALL Spectrum. Published six times a year, AALL Spectrum is designed to further professional development and education within the legal information industry. Slaw and the AALL Spectrum board have agreed to hand-select several columns each year as part of this exchange. 

The Data Rescue Project is an archetypal librarian story. A community of data librarians, researchers, concerned individuals, and organizations sprang into action to preserve U.S. federal government data after it began disappearing from websites at a rapid pace in early 2025. The effort combined technical expertise—from data librarians, data hoarders, and archivists skilled in capturing and storing data—with the familiar librarian work of organizing and cataloging information, coordinating preservation efforts, disseminating usable data, and, of course, storytelling. Sharing how government data impacts people’s lives underscores why preservation efforts are vital to communities and must remain ongoing.

Precursors to the Data Rescue Project, such as the End of Term Web Archive, which captures federal government websites around presidential administration transitions, the 2017 Data Refuge project, and the Environmental Data & Governance Initiative (EDGI), laid the groundwork for the 2025 preservation efforts, but reinforcing and revitalizing those efforts was critical.

Data Librarians and the Creation of the Data Rescue Project

The End of Term Web Archive and similar automated efforts capture snapshots of federal websites, but the tools often miss the actual datasets—particularly those hidden behind search forms, JavaScript rendering, or request-based access. Recognizing these gaps and seeing government information begin to disappear from websites, a network of data librarians and preservationists launched the Data Rescue Project in February 2025. Their mission was to identify vulnerable datasets, preserve them in trusted repositories, and make them publicly available as quickly as possible, while also saving the context, structure, and metadata essential to making the data understandable and usable for decades to come.

The Data Rescue Project grew rapidly from a simple Google Doc, where concerned professionals began crowdsourcing reports of missing or at-risk datasets, into a coordinated network of librarians, academics, and other interested parties. Lynda Kellam, Snyder-Granader director of research data and digital scholarship at the University of Pennsylvania and creator of the Google Doc, described the document’s rapid spread as going “librarian viral.” The effort soon became more formalized, as a coalition of data organizations, including the International Association for Social Science Information Service & Technology (IASSIST), Research Data Access & Preservation (RDAP), and the Data Curation Network, joined with individuals and like-minded groups to focus on identifying, collecting, curating, and providing sustained public access to data. Partnering with organizations such as Harvard’s Library Innovation Lab, EDGI, Preservation of Electronic Government Information (PEGI), and Saving Ukrainian Cultural Heritage Online (SUCHO), the coalition targeted specific areas for its data rescue efforts.

The project’s technical workflows are designed to be accessible to non-specialists. Curated spreadsheets show volunteers what data needs saving, and the Mattermost platform provides orientation materials for both participants with “no technical or data experience” and those with more advanced skills. Scheduled “office hours” offer hands-on support. Data is deposited in repositories such as ICPSR’s DataLumos, a public, crowdsourced government data archive created in 2017. (ICPSR—the Inter-university Consortium for Political and Social Research—was founded in 1962 by political scientist Warren E. Miller to share scientific data, and now extends to 25 social and behavioral science areas.)

The results of the Data Rescue Project’s efforts have been impressive. As of mid-August 2025, “1,230 datasets across 85 federal government offices compiled by over 500 volunteers” had been rescued, according to a New America interview with Lynda Kellam. A March 2025 New Yorker article notes how guerrilla archivists were trained to use the ArchiveTeam Warrior app to back up data. Archiving utilities like Webrecorder or Wget can also be used to capture complex content. Harvard’s Library Innovation Lab released a backup of “more than three hundred thousand data sets hosted by data.gov,” and the Lab has also worked to make preserved data usable via open-source tools. Usability is critical, since the goal is to save data that serves communities—whether for medical or scientific researchers, meteorologists, economists, or even people using real estate listings to find school district data.

Data Storytelling

The Data Rescue Project’s work does not stop at preservation—it is also about visibility and advocacy. A public-facing website serves as a hub for resources, toolkits, and updates. Community communication channels share urgent alerts, celebrate restored datasets, and track preservation efforts using a centralized Data Rescue Tracker. This tracker logs metadata from multiple preservation initiatives; redundancy acts as a safeguard, providing multiple backups of key datasets.

Events encourage engagement, and in-person or virtual data rescue “sprints” help get preservation work done. Storytelling plays a vital role by connecting datasets to real-world impacts—such as how postsecondary education statistics inform student decision-making. The project’s promotional efforts underscore why preservation matters in everyday life. Jack Cushman of Harvard’s Library Innovation Lab encourages librarians and others to visit data.gov and look at the “Most-Viewed Datasets” to see the wide array of information represented.

The Road Ahead

For law librarians, the stakes are high when it comes to preserving federal and state data. The loss of a dataset can stymie research, undermine the integrity of research results, diminish transparency, and limit public accountability. In a legal landscape increasingly shaped by AI, datasets fuel research tools and predictive models, so ensuring reliable, open access to the underlying government data is essential. From a technological standpoint, Harvard’s Library Innovation Lab and the Data Rescue Project model best practices for resilience against data loss:

  • Open, self-documenting formats like BagIt ensure preserved data remains accessible without proprietary tools. A digital collection is stored in a directory (the bag), and accompanying tag files include a machine-readable manifest of its contents with checksums for each file.
  • Cryptographic signing and timestamping add authenticity and provenance.
  • Geographically distributed copies prevent any one institution from becoming a single point of failure.
  • Client-side browsing tools allow the preserved datasets to be explored without depending on the original host’s infrastructure.
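The BagIt idea in the first bullet can be illustrated in a few lines of Python. This is a toy sketch, not the full BagIt specification (RFC 8493)—real deposits would use a maintained library such as the Library of Congress’s bagit-python—but it shows the core pattern: payload files go in a `data/` directory, a manifest records a SHA-256 checksum for each one, and a short declaration file identifies the format, so the bag can later be verified with nothing but standard tools.

```python
import hashlib
from pathlib import Path

def make_bag(bag_dir, payload):
    """Write a minimal BagIt-style bag: files under data/, a checksum
    manifest, and a bag declaration. `payload` maps filenames to bytes."""
    bag = Path(bag_dir)
    data_dir = bag / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
    (bag / "manifest-sha256.txt").write_text("\n".join(manifest_lines) + "\n")
    # The declaration identifies the spec version and text encoding.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n")

def verify_bag(bag_dir):
    """Recompute each checksum in the manifest; any mismatch means
    the payload was corrupted or altered after the bag was made."""
    bag = Path(bag_dir)
    for line in (bag / "manifest-sha256.txt").read_text().splitlines():
        digest, rel_path = line.split("  ", 1)
        if hashlib.sha256((bag / rel_path).read_bytes()).hexdigest() != digest:
            return False
    return True
```

Because everything is plain text and open hash algorithms, a bag built this way stays verifiable decades later without any proprietary software—exactly the resilience property the list above describes.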

Sustainability of the current preservation efforts is key. The Harvard Library Innovation Lab and the Data Rescue Project emphasize building partnerships, securing ongoing funding, and embedding preservation responsibilities into institutional missions. Librarians’ skills—in organizing information, archiving, and ensuring access—are essential for protecting digital data. In addition to creating guides that point patrons to preserved datasets, law librarians and information professionals can assist in other ways:

  • Identify and Archive At-Risk Data: Identify potentially vulnerable government information—justice statistics datasets, defunct government commission websites, or legal guidance pages that might be removed when administrations change. Use web archiving tools (such as the Internet Archive’s Wayback Machine or browser-based tools like ArchiveWeb.page) to capture copies of webpages and files proactively. Nominate important legal websites for inclusion in preservation projects.
  • Contribute to Data Repositories (e.g., DataLumos): If you have government datasets that are not easily findable anymore, consider depositing them in an open repository. DataLumos, hosted by ICPSR, welcomes contributions of federal data.
  • Leverage Library Expertise in Authentication and Discoverability: Information professionals can authenticate and contextualize preserved data, adding trust for end-users. Create catalog records or finding aids for rescued government documents. Adding descriptive metadata and linking related materials makes the preserved data more accessible and meaningful to legal researchers and the public.

The Data Rescue Project demonstrates that preserving government data is more than a technical task—it is a shared responsibility that safeguards transparency, accountability, and access to knowledge. By combining technical expertise with the organizational strengths of librarians and the commitment of engaged communities, these efforts ensure that vital public information remains available for research, decision-making, and the public good.

This article was adapted from a presentation during the June 2025 Private Law Librarians & Information Professional (PLLIP) Summit. Presenters included Jack Cushman, Andrea Guldalian, and Lynda Kellam.

______________

Andrea Guldalian
Director of Library and Research Services 

Duane Morris LLP
Philadelphia, PA
