Chapter 8 of Partners for Preservation is ‘Preparing and Releasing Official Statistical Data’ by Professor Natalie Shlomo. This is the first chapter of Part III: Data and Programming. I knew early in the planning for the book that I wanted a chapter that talked about privacy and data.
During my graduate program, in March of 2007, Google announced changes to their log retention policies. I was fascinated by the implications for privacy. At the end of my reflections on Google’s proposed changes, I concluded with:
“The intersection of concerns about privacy, government investigations, document retention and tremendous volumes of private sector business data seem destined to cause more major choices such as the one Google has just announced. I just wonder what the researchers of the future will think of what we leave in our wake.”
While developing my chapter list for the book – I followed my curiosity about how the field of statistics preserves privacy and how these approaches might be applied to historical data preserved by archives. Fields of research that rely on the use of statistics and surveys have developed many techniques for balancing the desire for useful data with the expectations of confidentiality by those who participate in surveys and censuses. This chapter taught me that “statistical disclosure limitation”, or SDL, aims to prevent the disclosure of sensitive information about individuals.
This short excerpt gives a great overview of the chapter:
“With technological advancements and the increasing push by governments for open data, new forms of data dissemination are currently being explored by statistical agencies. This has changed the landscape of how disclosure risks are defined and typically involves more use of peturbative methods of SDL. In addition, the statistical community has begun to assess whether aspects of differential privacy which focus on the peturbation of outputs may provide solutions for SDL. This has led to collaborations with computer scientists”
Almost eighty years ago, the woman in the photo above used a keypunch to tabulate the US Census. The amount of hands-on detail labor required to gather that data boggles the mind in comparison to born-digital data collection techniques now possible. The 1940 census was released in 2012 and is available online for free through a National Archives website. As archives face the onslaught of born-digital data tied to individuals, the techniques used by statisticians will need to become a familiar tool for archivists seeking to both increase access to data while respecting the privacy of those who might be identified through unfettered access to the data. This chapter serves as a solid introduction to SDL, as well as a look forward to new ideas in the field. It also ties back to topics in Chapter 2: Curbing The Online Assimilation Of Personal Information and Chapter 5: The Internet Of Things.
Natalie Shlomo (BSc, Mathematics and Statistics, Hebrew University; MA, Statistics, Hebrew University; PhD, Statistics, Hebrew University) is Professor of Social Statistics at the School of Social Sciences, University of Manchester. Her areas of interest are in survey methods, survey design and estimation, record linkage, statistical disclosure control, statistical data editing and imputation, non-response analysis and adjustments, adaptive survey designs and small area estimation. She is the UK principle investigator for several collaborative grants from the 7th Framework Programme and H2020 of the European Union all involving research in improving survey methods and dissemination. She is also principle investigator for the Leverhulme Trust International Network Grant on Bayesian Adaptive Survey Designs. She is an elected member of the International Statistical Institute and a fellow of the Royal Statistical Society. She is an elected council member and Vice-President of the International Statistical Institute. She is associate editor of several journals, including International Statistical Review and Journal of the Royal Statistical Society, Series A. She serves as a member of several national and international advisory boards.
Image source: A woman using a keypunch to tabulate the United States Census, circa 1940. National Archives Identifier (NAID) 513295 https://commons.wikimedia.org/wiki/File:Card_puncher_-_NARA_-_513295.jpg