Digital Preservation Through Archival Collaboration: The Data Preservation Alliance for the Social Sciences

Micah Altman, Harvard U.
Margaret Adams, NARA
Jonathan Crabtree, UNC
Darrell Donakowski, U. Michigan
Mark Maynard, U. Connecticut
Amy Pienta, U. Michigan
Copeland Young

[Draft. Final version to appear in The American Archivist.]

Abstract

The Data Preservation Alliance for the Social Sciences (Data-PASS) is a partnership of five major U.S. institutions with a strong focus on archiving social science research. The Library of Congress supports the partnership through its National Digital Information Infrastructure and Preservation Program (NDIIPP). The goal of Data-PASS is to acquire and preserve data at risk of being lost to the research community, from opinion polls, voting records, large-scale surveys, and other social science studies. In this paper we discuss the agreements, processes, and infrastructure that provide a foundation for the collaboration.

About the Partnership
An international movement to archive, preserve, and share data emerged over forty years ago when digital data began to appear in volume.1 This movement is undergoing a resurgence, as the social sciences shift anew toward a reliance on vast amounts of digital data. Still, we cannot say that even a majority of the digital social science research content created since the revolution in sample surveys and production of digital data has been preserved, nor that newly created data will be preserved.

Why is this so? Many corporate and academic researchers assume that the data they generate are their property and that they have limited obligations to share their data with others or to ensure their preservation. Some individual researchers are reluctant to deposit their data in archives because they fear competition. Some lack the time or expertise to prepare the metadata required for effective sharing. And some simply do not recognize the long-term value of their data. Institutional data producers may be under legal obligation to protect proprietary information. And some data just fall through the cracks.

A huge quantity of digital social science research content lives on, for the moment, solely as files in the computers of individual researchers or of research institutions, or quite possibly as video tapes, floppy disks, punch cards, and the like in bookcases, libraries, and warehouses. If research sponsors, producers, and data curators do not take steps to preserve it, it will be lost forever.2 It needs to be identified, located, assessed, acquired, processed, preserved, and shared.

Five major American social science data archives have created the Data Preservation Alliance for the Social Sciences (Data-PASS) to ensure the long-term preservation of our holdings and of materials as yet un-archived.3 The partners are the Inter-university Consortium for Political and Social Research, The Roper Center for Public Opinion Research, The Howard W. Odum Institute for Research in Social Science, the electronic records custodial division of the National Archives and Records Administration (NARA), and The Henry A. Murray Research Archive, with strong technology support from the Harvard-MIT Data Center.4 We seek to acquire and preserve data at risk of being lost to the research community, from opinion polls, voting records, large-scale surveys, and other social science studies. While our organizations have a history of collaboration, this official partnership provides important benefits and has taught us a great deal about the advantages of formalized collaborative relationships.

Data-PASS is funded in part by an award from the U.S. Library of Congress' National Digital Information Infrastructure and Preservation Program (NDIIPP). The NDIIPP mission is to develop a national strategy to collect, archive, and preserve digital content, especially materials created in digital format. Our partnership works to ensure the long-term preservation of the vital heritage of digital material that allows our nation to understand itself, its social organization, and its policies and politics through social science research.

1 For a history of the early development of this community, see Margaret O. Adams, "The Origins and Early Years of IASSIST", IASSIST Quarterly 30 no. 3 (2006), 5-15.

2 The members of this partnership represent the U.S. social science data archives tradition. There are other emerging approaches to preservation, including "self"-archiving, institutional archiving, and, more recently, virtual archiving. See Peters, T.A. "Digital Repositories: Individual, Discipline-based, Institutional, Consortial, or National?", Journal of Academic Librarianship 28 no. 6: 414-417 (2001). For a discussion of virtual archiving by one of the partners, see Micah Altman, "Transformative Effects of NDIIPP, the case of the Henry A. Murray Archive", Library Trends (forthcoming). These are important trends, and our collection policy recognizes any collection to which a longstanding institution has made a long-term preservation commitment as not "at-risk". However, it is important to note that, as one recent study concludes, "faculty output is not finding its way into institutional repositories in the U.S. in large numbers, except at some of the largest, most research-intensive universities." See McDowell, C.S. "Evaluating Institutional Repository Deployment in American Academe Since Early 2005: Repositories by the Numbers, Part 2", D-Lib Magazine 13 no. 9/10 (2007). Furthermore, McDowell shows that currently most institutional repositories focus on print-related materials and do not have significant holdings of more complex (and less readily human-interpretable) digital objects such as numeric data. Nor, in our own experience, do most of these repositories make full commitments to preserve quantitative data resources.

3 The Data-PASS project website is: http://www.icpsr.org/DATAPASS/ . All of the good practices documentation developed in this project, including the identification, appraisal, and metadata practices, is available from: http://www.icpsr.org/DATAPASS/presentations.html. The shared catalog is available from http://dvn.iq.harvard.edu/dvn/dv/datapass. [All URLs accessed 08/01/2008]

4 Both the Harvard-MIT Data Center and the Henry A. Murray Research Archive are now part of the Institute for Quantitative Social Science, in the Faculty of Arts and Sciences at Harvard University.