Category: privacy

Chapter 8: Preparing and Releasing Official Statistical Data by Professor Natalie Shlomo

[Image: Black and white photo of a woman using a keypunch to tabulate the United States Census, circa 1940.]

Chapter 8 of Partners for Preservation is ‘Preparing and Releasing Official Statistical Data’ by Professor Natalie Shlomo. This is the first chapter of Part III: Data and Programming. I knew early in the planning for the book that I wanted a chapter that talked about privacy and data.

During my graduate program, in March of 2007, Google announced changes to their log retention policies. I was fascinated by the implications for privacy. At the end of my reflections on Google’s proposed changes, I concluded with:

“The intersection of concerns about privacy, government investigations, document retention and tremendous volumes of private sector business data seem destined to cause more major choices such as the one Google has just announced. I just wonder what the researchers of the future will think of what we leave in our wake.”

While developing my chapter list for the book, I followed my curiosity about how the field of statistics preserves privacy and how these approaches might be applied to historical data preserved by archives. Fields of research that rely on the use of statistics and surveys have developed many techniques for balancing the desire for useful data with the expectations of confidentiality by those who participate in surveys and censuses. This chapter taught me that “statistical disclosure limitation”, or SDL, aims to prevent the disclosure of sensitive information about individuals.

This short excerpt gives a great overview of the chapter:

“With technological advancements and the increasing push by governments for open data, new forms of data dissemination are currently being explored by statistical agencies. This has changed the landscape of how disclosure risks are defined and typically involves more use of perturbative methods of SDL. In addition, the statistical community has begun to assess whether aspects of differential privacy which focus on the perturbation of outputs may provide solutions for SDL. This has led to collaborations with computer scientists”
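To make "perturbation of outputs" a little more concrete, here is a minimal sketch of one well-known approach from the differential privacy literature: adding Laplace noise to the counts in a published table. This is my own illustration, not a method taken from the chapter. The table, the function names, and the choice of `epsilon` are all hypothetical, and the sketch assumes each person appears in exactly one cell of the table.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws, each with
    mean `scale`, follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_counts(counts, epsilon, rng=None):
    """Return a noisy copy of a frequency table (output perturbation).

    Assumption: each person contributes to exactly one cell, so adding
    or removing one person changes a single count by 1 (sensitivity 1).
    Smaller epsilon means more noise and stronger protection.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = 1.0 / epsilon
    return {cell: n + laplace_noise(scale, rng) for cell, n in counts.items()}


# Hypothetical table: number of households by size in a small area.
table = {"1 person": 12, "2 people": 30, "3+ people": 3}
noisy = perturb_counts(table, epsilon=0.5, rng=random.Random(0))
for cell, value in noisy.items():
    print(f"{cell}: true={table[cell]}, released={value:.1f}")
```

The released values are no longer exact, which is the point: a small cell such as "3+ people" can no longer be read as a precise statement about three specific households. A real release would also need to decide how to handle negative or fractional counts, which this sketch leaves alone.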

Almost eighty years ago, the woman in the photo above used a keypunch to tabulate the US Census. The amount of detailed, hands-on labor required to gather that data boggles the mind in comparison to the born-digital data collection techniques now possible. The 1940 census was released in 2012 and is available online for free through a National Archives website. As archives face the onslaught of born-digital data tied to individuals, the techniques used by statisticians will need to become familiar tools for archivists seeking to increase access to data while respecting the privacy of those who might be identified through unfettered access to it. This chapter serves as a solid introduction to SDL, as well as a look forward to new ideas in the field. It also ties back to topics in Chapter 2: Curbing The Online Assimilation Of Personal Information and Chapter 5: The Internet Of Things.

Bio:

Natalie Shlomo (BSc, Mathematics and Statistics, Hebrew University; MA, Statistics, Hebrew University; PhD, Statistics, Hebrew University) is Professor of Social Statistics at the School of Social Sciences, University of Manchester. Her areas of interest are in survey methods, survey design and estimation, record linkage, statistical disclosure control, statistical data editing and imputation, non-response analysis and adjustments, adaptive survey designs and small area estimation. She is the UK principal investigator for several collaborative grants from the 7th Framework Programme and H2020 of the European Union all involving research in improving survey methods and dissemination. She is also principal investigator for the Leverhulme Trust International Network Grant on Bayesian Adaptive Survey Designs. She is an elected member of the International Statistical Institute and a fellow of the Royal Statistical Society. She is an elected council member and Vice-President of the International Statistical Institute. She is associate editor of several journals, including International Statistical Review and Journal of the Royal Statistical Society, Series A. She serves as a member of several national and international advisory boards.

Image source: A woman using a keypunch to tabulate the United States Census, circa 1940. National Archives Identifier (NAID) 513295 https://commons.wikimedia.org/wiki/File:Card_puncher_-_NARA_-_513295.jpg

Chapter 5: The Internet of Things: the risks and impacts of ubiquitous computing by Éireann Leverett

Chapter 5 of Partners for Preservation is ‘The Internet of Things: the risks and impacts of ubiquitous computing’ by Éireann Leverett. This is one of the chapters that evolved a bit from my original idea – shifting from being primarily about proprietary hardware to focusing on the Internet of Things (IoT) and the cascade of social and technical fallout that needs to be considered.

Leverett gives this most basic definition of IoT in his chapter:

At its core, the Internet of Things is ‘ubiquitous computing’, tiny computers everywhere – outdoors, at work in the countryside, at use in the city, floating on the sea, or in the sky – for all kinds of real world purposes.

In 2013, I attended a session at The Memory of the World in the Digital Age: Digitization and Preservation conference on the preservation of scientific data. I was particularly taken with The Global Sea Level Observing System (GLOSS) — almost 300 tide gauge stations around the world making up a web of sea level observation sensors. The UNESCO Intergovernmental Oceanographic Commission (IOC) established this network, but cannot add to or maintain it themselves. The success of GLOSS “depends on the voluntary participation of countries and national bodies”. It is a great example of what a network of sensors deployed en masse by multiple parties can do – especially when trying to achieve more than a single individual or organization can on its own.

Much of IoT is not implemented for the greater good, but rather to further commercial aims. This chapter gives a good overview of the basics of IoT and considers a broad array of issues related to it including privacy, proprietary technology, and big data. It is also the perfect chapter to begin Part II: The physical world: objects, art, and architecture – shifting to a topic in which the physical world outside of the computer demands consideration.

Bio:

Éireann Leverett

Éireann Leverett once found 10,000 vulnerable industrial systems on the internet.

He then worked with Computer Emergency Response Teams around the world for cyber risk reduction.

He likes teaching the basics and learning the obscure.

He continually studies computer science, cryptography, networks, information theory, economics, and magic history.

He is a regular speaker at computer security conferences such as FIRST, BlackHat, Defcon, Brucon, Hack.lu, RSA, and CCC; and also at insurance and risk conferences such as Society of Information Risk Analysts, Onshore Energy Conference, International Association of Engineering Insurers, International Risk Governance Council, and the Reinsurance Association of America. He has been featured by the BBC, The Washington Post, The Chicago Tribune, The Register, The Christian Science Monitor, Popular Mechanics, and Wired magazine.

He is a former penetration tester from IOActive, and was part of a multidisciplinary team that built the first cyber risk models for insurance with Cambridge University Centre for Risk Studies and RMS.

Image credit: Zan Zig performing with rabbit and roses, including hat trick and levitation, Strobridge Litho. Co., c1899.

NOTE: I chose the magician in the image above for two reasons:

  1. because IoT can seem like magic
  2. because the author of this chapter is a fan of magic and magic history

Chapter 2: Curbing the Online Assimilation of Personal Information by Paulan Korenhof

The second chapter in Partners for Preservation is ‘Curbing the Online Assimilation of Personal Information’ by Paulan Korenhof. Given the amount of attention being focused on the right to be forgotten and the EU General Data Protection Regulation (GDPR), I felt it was essential to include a chapter that addressed these topics. Walking the fine line between providing access to archival records and respecting the privacy of those whose personal information is included in the records has long been an archival challenge.

In this chapter, Korenhof documents the history of the right to be forgotten and the benefits and challenges of GDPR as it is currently being implemented. She also explores the impact of the broad and virtually instantaneous access to content online that the Internet has facilitated.

This quote from the chapter highlights a major issue with making so much content available online, especially content that is being digitized or surfaced from previously offline data sources:

“With global accessibility and the convergence of different contextual knowledge realms, the separating power of space is nullified and the contextual demarcations that we are used to expecting in our informational interactions are missing.”

As the second chapter in Part 1: Memory, Privacy, and Transparency, it continues to pull these ideas together. In addition to providing a solid grounding in the right to be forgotten and GDPR, it should guide the reader to explore the unintended consequences of the mad rush to put everything online and the dramatic impact that search engines (and their human-coded algorithms) have on what is seen.

I hope this chapter triggers more contemplation of these issues by archivists within the big picture of the Internet. Often we are so focused on improving access to content online that these questions about the broader impact are not considered.

Bio

Paulan Korenhof

Paulan Korenhof is in the final stages of her PhD research at the Tilburg Institute for Law, Technology, and Society (TILT). Her research is focused on the manner in which the Web affects the relation between users and personal information, and the question of to what degree the Right to Be Forgotten is a fit solution to address these issues. With a background in philosophy, law, and art, she investigates this relation from an applied phenomenological and critical theory perspective. Occasionally she co-operates in projects with Hacklabs and gives privacy awareness workshops to diverse audiences. Recently she started working at the Amsterdam University of Applied Sciences (HVA) as a researcher on Legal Technology.

 

Image credit: Flickr Commons: British Library: Image taken from page 5 of ‘Forget-Me-Nots. [In verse.]’: https://www.flickr.com/photos/britishlibrary/11301997276/

Chapter 1: Inheritance of Digital Media by Dr. Edina Harbinja

[Image: You’re Dead, Your Data Isn’t: What Happens Now?]

The first chapter in Partners for Preservation is ‘Inheritance of Digital Media’, written by Dr. Edina Harbinja. This topic was one of the first I was sure I wanted to include in the book. Back in 2011, I attended an SXSW session titled Digital Death. The discussion was wide-ranging and attracted people of many backgrounds including lawyers, librarians, archivists, and social media professionals. I still love the illustration above, created live during the session.

The topic of personal digital archiving has since gained traction, inspiring events and the creation of resources. There are now multiple books addressing the subject. The Library of Congress created a kit to help people host personal digital archiving events. In April 2018 a Personal Digital Archiving Conference (PDA) was held in Houston, TX. You can watch the presentations from PDA2017, hosted by Stanford University Libraries. PDA2016 was held at the University of Michigan Library and PDA2015 was hosted by NYU. In fact, the Internet Archive has an entire collection of videos and presentation materials from various PDAs dating back to 2010.

I wanted the chapter on digital inheritance to address topics at the forefront of current thinking. Dr. Edina Harbinja delivered exactly what I was looking for and more. As the first chapter in Part 1: Memory, Privacy, and Transparency, it sets the stage for many of the common threads I saw in this section of the book.

Here is one of my favorite sentences from the chapter:

“Many digital assets include a large amount of personal data (e.g. e-mails, social media content) and their legal treatment cannot be looked at holistically if one does not consider privacy laws and their lack of application post-mortem.”

This quote gets at the heart of the chapter and provides a great example of the intertwining elements of memory and privacy. What do you think will happen to all of your “digital stuff”? Do you have an expectation that your privacy will be respected? Do you assume that your loved ones will have access to your digital records? To what degree are laws and policies keeping up (or not keeping up) with these questions? As an archivist, how might all this impact your ability to access, extract, and preserve digital records?

Look to chapter one of Partners for Preservation to explore these ideas.

Bio

Dr. Edina Harbinja

Dr. Edina Harbinja is a senior lecturer in media/privacy law at Aston University, Birmingham, UK. Her principal areas of research and teaching are related to the legal issues surrounding the Internet and emerging technologies. In her research, Edina explores the application of property, contract law, intellectual property, and privacy online. Edina is a pioneer and a recognized expert in post-mortem privacy, i.e. privacy of the deceased individuals. Her research has a policy and multidisciplinary focus and aims to explore different options of regulation of online behaviors and phenomena. She has been a visiting scholar and an invited speaker to universities and conferences in the USA, Latin America, and Europe, and has undertaken consultancy for the Fundamental Rights Agency. Her research has been cited by legislators, courts, and policymakers in the US, Australia, and Europe as well. Find her on Twitter at @EdinaRl.

SXSWi: You’re Dead, Your Data Isn’t: What Happens Now?

This five-person panel at SXSW Interactive 2011 tackled a broad range of issues related to what happens to our online presence, assets, creations and identity after our death.

There was a lot to take in here. You can listen to the full audio of the session or watch a recording of the session’s live stream (the first few minutes of the stream lack audio).

A quick and easy place to start is the lovely little video created as part of the promotion of Your Digital Afterlife – it gives a nice quick overview of the topic.

Also take a look at the Visual Map that was drawn by Ryan Robinson during the session – it is amazing! Rather than attempt to recap the entire session, I am going to just highlight the bits that most caught my attention:

Laws, Policies and Planning
Currently, individuals are left reading the fine print and hunting for service-specific policies regarding access to digital content after the death of the original account holder. Oklahoma recently passed a law that permits estate executors to access the online accounts of the recently deceased – the first and only state in the US to have such a law. It was pointed out during the session that in all other states, leaving your passwords to your loved ones amounts to asking them to impersonate you after your death.

Facebook has an online form to report a deceased person’s account – but little indication of what this action will do to the account. Google’s policy for accessing a deceased person’s email requires six steps, including mailing paper documents to Mountain View, CA.

There is a working group forming to create model terms of service – you can add your name to the list of those interested in joining at the bottom of this page.

What Does Ownership Mean?
What is the status of an individual email or digital photo? Is it private property? I don’t recall who mentioned it – but I love the notion of a tribe or family unit owning digital content. It makes sense to me that the digital model parallel the real world. When my family buys a new music CD, our family owns it – not the individual who happened to go to the store that day. It makes sense that an MP3 purchased by any member of my family would belong to our family. I want to be able to buy a Kindle for my family and know that my son can inherit my collection of e-books the same way he can inherit the books on my bookcase.

Remembering Those Who Have Passed
How does the web change the way we mourn and memorialize people? Many have now had the experience of learning of the passing of a loved one online – the process of sorting through loss in the virtual town square of Facebook. How does our identity transform after we are gone? Who is entitled to tag us in a photo?

My family suffered a tragic loss in 2009 and my reaction was to create a website dedicated to preserving memories of my cousin. At the Casey Feldman Memories site, her friends and family can contribute memories about her. As the site evolved, we also added a section to preserve her writing (she was a journalism student) – I kept imagining the day when we realized that we could no longer access her published articles online. I built the site using Omeka and I know that we have control over all the stories and photos and articles stored within the database.

It will be interesting to watch as services such as Chronicle of Life spring up claiming to help you “Save your memories FOREVER!”. They carefully explain why they are a trustworthy digital repository and back up their claims with a money-back guarantee.

For as little as $10, you can preserve your life story or daily journal forever: It allows you to store 1,000 pages of text, enough for your complete autobiography. For the same amount, you could also preserve less text, but up to 10 of your most important photos. – Chronicle of Life Pricing

Privacy
There are also some interesting questions about privacy and the rights of those who have passed to keep their secrets. Facebook currently deletes some parts of a profile when it converts it to a ‘memorial’ profile. They state that this is for the privacy of the original account holder. If users are ultimately given more power over the disposition of their social web presence – should these same choices be respected by archivists? Or would these choices need to be respected the way any other private information is guarded until some distant time after which it would then be made available?

Conclusion
Thanks again to all the presenters – this really was one of the best sessions for me at SXSWi! I loved that it got a whole different community of people thinking about digital preservation from a personal point of view. You may also want to read about Digital Death Day – one coming up in May 2011 in the San Francisco Bay Area and another in September 2011 in the Netherlands.

Image credit: Excerpt from Ryan Robinson’s Visual Map created live during the SXSW session.

Encouraging Participation in the Census

[Image: 1940 census poster]

While smart folks over at NARA are thinking about the preservation strategy for digitized 2010 census forms, I got inspired to take a look at what we have preserved from past censuses. Specifically, I wanted to look at posters, photos and videos that give us a glimpse into how we encouraged and documented participation in the past.

There is a dedicated Census History area on the Census website, as well as a section of the 2010 website called The Big Count Archive. While I like the wide range of 2010 Census Posters – the 1940 census poster shown here (thank you Library of Congress) is just so striking.

I also loved the videos I found, especially when I realized that they were all available on YouTube – uploaded by a user named JasonGCensus. I am not clear on the relationship between JasonGCensus and the official U.S. Census Bureau’s Channel (which seems focused on 2010 Census content), but there are some real gems posted there.

For example, in the 1970 Census PSA shown below we learn about the privacy of our census data: “Our separate identities will be lost in the process which is concerned only with what we say, not who said it”. We are shown technology details – complete with old school beeping and blooping computer sounds. (NOTE: this video is also available on Census.gov, but I saw no way to embed that video here – hence my cheer at finding the same video on YouTube)

For the 1960 census, a PSA explains the new FOSDIC technology which removed the need for punch-cards. With the tagline ‘Operation Rollcall, USA’, the ad presents our part in “this enterprise” as cooperation with the enumerators. In the 1980 PSA the tagline is ‘Answer the Census: We’re counting on you!’ and the ad stresses that the information is kept confidential and is used to provide services to communities. By the time you get to the 1990 and 2000 PSAs we see more stress on the benefits to communities that fill out the census and less stress on how the census is actually recorded.

I also found some lovely census images in the Library of Congress Prints and Photographs catalog, including the image shown here.

Exploring the area of Census.gov dedicated to the 2010 census made me wonder what was available online for the 2000 census.

Wayback Machine to the rescue! They have what appears to be a fairly deep crawl of the 2000 Census.gov site dating from March of 2000. For example – the posters section seems to include all the images and PDFs of the originals. I even found functional QuickTime videos in the Video Zone, like this one: How America Knows What America Needs.
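If you want to go hunting for archived pages like these yourself, the Wayback Machine uses timestamped addresses, and a timestamped address leads to the capture closest to that moment when one exists. Here is a small sketch that builds such an address. The function name and the exact Census URL are my own assumptions for illustration.

```python
from datetime import date

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url, when):
    """Build a Wayback Machine address for `original_url` near date `when`.

    The timestamp is written as YYYYMMDDhhmmss; midnight is used here
    because only the date matters for this lookup.
    """
    timestamp = when.strftime("%Y%m%d") + "000000"
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


# Assumed address for the Census site; adjust to the page you are after.
print(wayback_url("http://www.census.gov/", date(2000, 3, 1)))
```

Swapping in a different date is an easy way to compare how the site looked around other census years.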

The ten year interval makes for a nice way to get a sense of the country from the PR perspective. What did the Census Bureau think was the right way to appeal to the American public? Were we more intrigued by the latest technology or worried about our privacy? Did they need to communicate what the census is used for? Or was it okay to simply express it as an American’s duty? I appreciate the ease with which I can find and share the resources above. Great fun.

And for those of you in the United States, please consider this my personal encouragement to fill out your census forms!

Update: The Washington Post has an interesting article about the ‘Snapshot of America’ series of promotional videos for the 2010 census. Definitely an interesting contrast to the videos I reviewed for this post.

SAA2008: Yale, Family Papers & High School Students (Session 508)

The session’s official title was Family and Community Archives Project: Introducing High School Students to the Archives Profession. It focused on a pilot outreach program carried out by 21 archivists from Yale University at the Cooperative Arts and Humanities magnet high school in New Haven, CT. 117 high school juniors participated as part of their US History course. The pilot aimed to introduce them to what archivists do, work with them to find, understand and describe their family papers and also to present archives as a possible profession to students who might assume that it was only welcoming to Caucasians.

A number of their original plans were adjusted after they met with the high school administrators:

  • They would need to work with juniors rather than seniors because it is the juniors who take US History
  • The principal wanted them to work with all 5 classes of US History students, rather than a single class.
  • The program would run from March to May instead of January to June
  • When they realized that a number of students are in foster care, they needed to find other ways to include students who did not want to (or could not) do family research. They chose to add the option of researching the history of community organizations.

Logistics

A total of twenty-one archivists from various departments at Yale University volunteered. They were divided up into five teams, one for each class with which they would be working during the course of the pilot. Starting in October they held weekly meetings to create the schedule and plans. A total of eight lesson plans were created. These took much more time than the archivists had expected. They also designed and printed a brochure to introduce the students to archives, archivists and basic archival terms. A wiki (Family Community Archives Project Wiki) was created to facilitate communication among the archivists and teachers. The wiki included bios of the archivists.

All classwork would be graded by the teachers without input from the archivists. This classwork included a journal component. It was decided that the journal (a 3-ring binder that the archivists provided) would remain in the classroom. This choice was made based on teacher input – there was concern that if the journals were removed from the classroom, they would quickly be misplaced or forgotten.

Parents and guardians of participating students were alerted via a letter explaining the class project and encouraging them to help students as they worked on their family or community research.

A blog (Family and Community Archives Project Blog) was created that students, archivists and teachers could all use to communicate with each other. They met with the classes for 8 weeks. Every student got a certificate of participation and an ‘archivally themed goody box’ (think Oscars.. but less opulent). They asked students to complete an evaluation form – to ‘be honest… we are thick skinned’. They mounted an exhibit in the main Yale library featuring the students’ work. As is often the case with 16-year-olds, the students pulled it together at the last minute and did a great job. They had an opening reception that included students, parents and the community.

Lessons Learned

They held discussions with both the teachers and the archivists to analyze what worked and what didn’t. What worked?

  • Students learned what archivists do – some said they might consider a career as an archivist and that they learned a lot.
  • The teachers enjoyed it – noticed some students were more engaged than they sometimes were (while some were not that interested).
  • Brought Yale into community and the community into Yale.
  • Collaboration across libraries and departments – archivists met each other and worked together.
  • The group creation of lesson plans.
  • The choice to assign several archivists per class. It permitted small groups and one-on-one work. Lesson plans were sometimes customized to suit the classroom/teacher/student special cases.
  • The blog: this communication worked for some.. but not all. Hard to know why some students were more comfortable with the blog than others. It was a good way to provide students with information about the archivists and the project.
  • The wiki: provided schedules, lesson plans, resources.. etc. It was very successful & useful.

The most successful aspects?

  • The archives tour
  • Discussion of who uses archives and why which included audio/visual examples and archival material.
  • The exhibit was a high point of the project. They photographed the items they wanted to display and that worked well. Students were very proud of the exhibit, though 25% did not contribute.

What did not work?

  • Teacher support varied – success completely depended on the enthusiasm and commitment of the teacher.
  • 8 weeks is too long for this sort of project
  • Class meeting times too long – 40 and 80 minute sessions
  • Needed more feedback earlier in the process from teachers on lesson plans – didn’t learn the reading level of the students until lesson plans were done… needed clearer definition of expectations for the exhibit.
  • Efficacy and support for homework – some people thought there should be no homework (other than project tasks) .. some thought it should be more structured.
  • Technology support for A/V lesson – school didn’t have equipment to support the A/V projection needs
  • Student privacy – they needed parent/guardian permissions to allow video & photos of students to be taken. There was a very late question about whether they could use the students’ first and last names in the exhibition. No media release forms were sent out in time to make a video about the session.
  • School activities schedule changed all the time – interfered
  • Early class time led to poor attendance (7 am!)
  • The archivists talked too much – they needed more hands on lessons. Students should have been able to bring in materials earlier in the process and have more time to work with them. More opportunity to connect to the student – the example being the LAST class session when the students brought materials in for scanning by the archivists. This gave a way to connect to the archivists and understand why their materials were important.

Teacher’s suggestions for improving the project

  • Run the project for 2 weeks in March – just after national testing is completed
  • Meet with each class 5 times in a row in one week.. with one class being the tour

This project fit in really well with Yale’s goals of reaching out to the local New Haven community.

Potential lessons for other archivists

  • planning phase:
    • define measures of success
    • define what you want students to learn & how – realistic objectives for a 16 year old.. do not be too ambitious. Include perspectives of archivist parents. For some classes lecturing worked well.. for some classes small groups worked really well
    • define resources needed (they had 21 archivists who did work on Yale’s time) – Money = $3,000 spent on photo reproductions, handouts, mounting, gift boxes, lunch for teachers & archivists and final reception.
    • explore what is available on the Internet – look for lesson plans – good stuff out there that is often too ambitious, but good for adaptation
    • partner with the teacher – engage the teachers early on.. define what the students need to do by the end of the project. think about archivists who have never taught before.. figure out what you can do to help them
    • include a tour of a repository
    • provide teaching lessons for archivists who haven’t taught
    • plan for unengaged students and teachers – adapted their lessons.. hard situation..
    • avoid early morning classes
    • resolve privacy/confidentiality issue early
  • implementation:
    • be flexible – be prepared for changing activities schedules and other in class challenges
    • do an exhibit – create copies.. understand that these are precious materials
    • be visual in your teaching – video!
    • delving into family history can raise sensitive information – help 16 year olds figure out how to choose what to display in a public exhibit
    • introduce them to other jobs beyond archivist – at first only talked about archivists work… but next year will also talk about all the people who work in archives. Tie in their interests (this was an arts school.. include that perspective)
    • wrap up meetings with teachers and archivists essential

Diversity

One of the underlying goals of the pilot was to explore ways to increase diversity.

Cultural exchange: What did archivists learn from the students and teachers when working with the school? They learned about the students’ families and their community organizations. It bridged a generation gap – the archivists learned about what it meant to be a high school kid these days. Not all of it was positive – it left a lot of the archivists with concern for the state of education – issues with their writing skills.

Difficult to measure: How do we know it worked? No longitudinal study is being done to find out if they end up working in archives. We need to take a long view – but be impatient.

The impact on archives, defined broadly – no matter if they did not make any new archivists, they supported the archival endeavor – 110 students, teachers and their families now have a better understanding of archives and records.

Questions & Answers

Question: Who crafted the evaluation for the students?

Answer: One of the archivists created it and it was approved by the rest of the team.

Question: In the future would you find it more desirable to work with the teachers on evaluating the student projects for grading purposes? or is that not our business?

Answer: No, they would not want to be involved with grading. The teacher knows the students. That said – they do wish that the teachers had planned the final project earlier on. Next time the archivists would encourage/push for final project guidelines.

Question: How did you measure that your learning objectives were met other than the survey?

Answer: They didn’t do that formally – but anecdotally when the students were in other classes – they heard other teachers report that students continued to talk about the archives work outside of the history class. There was a ‘buzz’ among the students.

Question: How did you find the time to do this?

Answer: The leadership had to agree (at least informally) that the archivists could do this. Molly: They were very surprised by how much time it all took. It was a volunteer effort.. they met as a group once a week during their lunch hour.

Question: Why didn’t you consider doing an electronic journal?

Answer: There was a concern that not all students are tech savvy. For example – only a handful of kids engaged with the blog. They felt they couldn’t require it unless everyone had access and a sufficient comfort level with the tools.

Question: Were any archivists of color involved in the project? If one of the goals of projects like this is to encourage individuals of color to consider a career as an archivist, it might be easier if they see people who look like them.. people out there documenting diverse communities.

Answer: Yes.. a few. There were suggestions that they could contact the roundtables of color/ethnicity – bring in visiting speakers to talk about how they came to work in archives. The materials are important too – materials they can relate to. It was emphasized again that this was a pilot and they had to spend a great deal of time creating their lesson plans from scratch. Now that they have the building blocks – they can improve other aspects.

Question: What about talking about preserving things like MySpace pages – maybe use MySpace for the blogging?

Answer: They didn’t want to do anything that might exclude people.

Question: Was the non-involved teacher aware of what archives do?

Answer: He didn’t come to the archives tour. He was totally tuned out. He felt he was very behind in the teaching schedule – both students and the teacher felt it was taking away from class time.

Question: Could they offer the 11 out of 117 who said they might want to be archivists internships?

Answer: Maybe – but since the rules of the school required that any student who left the campus was accompanied by an adult, it would be very challenging.

My Thoughts

I found this session very inspiring. I loved that it took the archives to the community and brought the community into the archives. This is the sort of outreach project I hope has a chance of spreading to other schools. Interested in considering a project like this at your archives? Take a look at all the resources available on the wiki’s handouts and homework page and be on the lookout for a writeup of the pilot in the Nov/Dec issue of Archival Outlook.

Controversial Photos, Archivists’ Choices and Journalism

The New York Times Magazine published The Great Ivy League Nude Posture Photo Scandal in January of 1995. Still available online, it is a fascinating tale that took reporter Ron Rosenbaum on a wild hunt through multiple archives in a quest for long lost photographs. I spotted a link to the article in a post on Boing Boing – and once I started reading it I couldn’t stop.

The story includes thorough coverage of the research (and the footwork and the paperwork) it took to find the final resting place of some very controversial photographs. Taken as part of the orientation process for new students at Ivy League and Seven Sisters campuses, predominantly during the 1940’s, 50’s and 60’s, these photos were ostensibly meant to screen for students who needed remedial posture classes. William Herbert Sheldon was a driving force behind many of the photos. Best known for sorting people into three categories of body types in the 1940s, Sheldon based his categories of endomorphic, mesomorphic, and ectomorphic on measurements done using the student photographs. Rosenbaum’s quest was to find the real story behind the photos and to discover if any of the photos survived the purging fires that occurred at many of the schools involved.

His first stop was Harvard’s archives:

Harley P. Holden, curator of Harvard’s archives, said that from the 1880’s to the 1940’s the university had its own posture-photo program in which some 3,500 pictures of its students were taken. Most were destroyed 15 or 20 years ago “for privacy scruples,” Holden said. Nonetheless, quite a few Harvard nudes can be found illustrating Sheldon’s book on body types, the Atlas of Men. Radcliffe took posture photos from 1931 to 1961; the curator there said that most of them had been destroyed (although some might be missing) and that none were taken by Sheldon.

A major turning point in Sheldon’s project came in 1950. He went to the University of Washington to further his plans to make an Atlas of Women. The families of a few of the female students photographed at the university questioned the real purpose of the photographs. The resulting upheaval culminated in the destruction of many photographs. A Time article dated September 25, 1950, Revolt at Washington, documents the events in Washington and notes that over 800 photos were burned.

Rosenbaum’s article goes on to mention that thousands of photos were subsequently burned at Harvard, Vassar and Yale in the 60’s and 70’s – but he continued to hunt for the ones that some believed had escaped into Sheldon’s private archives. A chain of contacts led Rosenbaum to Sheldon’s former associate Roland D. Elderkin. An elderly gentleman of 84 at the time of the story’s publication, Elderkin spent years assisting Sheldon. He took many of the photographs. And after being turned down by many archives, he found Sheldon’s records, photos and negatives a home in the National Anthropological Archives.

In 1987, the curators of the National Anthropological Archives acquired the remains of Sheldon’s life work, which were gathering dust in “dead storage” in a Goodwill warehouse in Boston. While there were solid archival reasons for making the acquisition, the curators are clearly aware that they harbor some potentially explosive material in their storage rooms. And they did not make it easy for me to gain access.

On my first visit, I was informed by a good-natured but wary supervisor that the restrictive grant of Sheldon’s materials by his estate would permit me to review only the written materials in the Sheldon archives. The actual photographs, he said, were off-limits. To see them, I would have to petition the chief of archivists. Determined to pursue the matter to the bitter end, I began the process of applying for permission.

In their online guide to collections I found the entry for SHELDON, WILLIAM HERBERT (1898-1977), Papers. It notes that the collection is 150 linear feet. It also includes a line that reads “RESTRICTION: The photographic material is not available for research.”

While Rosenbaum’s hunt was for the photographs, some of his most interesting discoveries came from the papers themselves. During his three month wait for permission to view the photos, he reviewed boxes of letters and notes. See Rosenbaum’s article for details – but it was Sheldon’s own words in those papers that revealed he held racist views and that he seemed more concerned with his research than with the psychological impact of his research on the girls whose photos he arranged to take.

When Rosenbaum was finally given the opportunity to review some 20,000 negatives of the photos (no photos and no names), we read:

A curator trundled in a library cart from the storage facility. Teetering on top of the cart were stacks of big, gray cardboard boxes. The curator handed me a pair of the white cotton gloves that researchers must use to handle archival material.

I love it – gray cardboard boxes and white cotton gloves. He even mentions the finding aids and gives examples of how the groups of photos are described. I also appreciate the earlier acknowledgment of the “solid archival reasons for making the acquisition”.

Rosenbaum looked through a lot of the negatives, mostly to verify that what the finding aids claimed was present was in fact in those gray boxes. He was struck by the contrast between the expressions on the men’s and women’s faces.

For the most part, the men looked diffident, oblivious. That’s not surprising considering that men of that era were accustomed to undressing for draft physicals and athletic-squad weigh-ins. But the faces of the women were another story. I was surprised at how many looked deeply unhappy, as if pained at being subjected to this procedure. On the faces of quite a few I saw what looked like grimaces, reflecting pronounced discomfort, perhaps even anger. I was not much more comfortable myself sitting there in the midst of stacks of boxes of such images. There I was at the end of my quest. I’d tracked down the fabled photographs, but the lessons of the posture-photo ritual were elusive.

He found the missing photos – but no easy answers. This is a great combination of a compelling story and a realistic representation of archives and archivists. The records don’t always hold the answers to the question you thought you were asking – but sometimes they hold secrets you hadn’t expected.

So many elements tie back to the choices made by individual archivists – sometimes made in the heat of the moment or under great community pressure. I think this story is a particularly poignant example of the downstream effects of these sorts of hard choices. It isn’t often that we can see cause and effect this clearly.

What would you have done? Would you have burned the photos or stored them away? Would you have stepped forward to take Sheldon’s records? If something like this happened today – what do you think the future of these photos might be?

Redacting Data – A T-Shirt and Other Thoughts

ThinkGeek.com has created a funny t-shirt with the word redacted on it.

In case you missed it, there was a whole lot of furor early this month when someone posted an Advanced Access Content System (AACS) decryption key online. The key consists of 16 two-digit hexadecimal numbers (16 bytes) that can be used to decrypt and copy any Blu-Ray or HD-DVD movie. Of course, it turns out to not be so simple – and I will direct you to a series of very detailed posts over at Freedom to Tinker if you want to understand the finer points of what the no longer secret key can and cannot do. The CyberSpeak column over at USA Today has a nice summary of the big picture and more details about what happened after the key was posted.

What amused me about this t-shirt (and prompted me to post about it here) is that it points out an interesting challenge of redacting data. How do you ensure that the data you leave behind doesn’t support deduction of the missing data? This is something I have thought about a great deal when designing web based software and worrying about security. It is not something I had spent much time thinking about related to archives and the protection of privacy. The joke from the shirt of course is that removing just the secret info but leaving everything else doesn’t do the job. This is a simplified case – let me give you an example that might make this more relevant.

Let’s say that you have records from a business in your archives and one series included is of personnel records. If you redacted those records to remove people’s names, SSNs and other private data, but left the records in their original order so that researchers could examine them for other information – would that be enough to protect the privacy of the business’s employees?

What if somewhere else in the collection you had the employee directory that listed names and phone extensions. No problem there – right? Ah.. but what if you assumed that the personnel records were in alphabetical order and then used the phone directory as a partial key to figuring out which personnel records were for which people?

This is definitely a hypothetical scenario, but it gets the idea across about how archivists need to take in the big picture to ensure the right level of privacy protection.
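For the programmers in the audience, here is a minimal sketch in Python of how that deduction might work. Everything in it is made up for illustration: the records, the names, and the helper functions `guess_identities` and `shuffle_records` are hypothetical, and it assumes the simplest possible case, in which the directory lists exactly the same employees as the personnel series.

```python
import random

# Hypothetical redacted personnel records, left in their original filing order.
redacted_records = [
    {"hire_year": 1978, "salary_band": "C"},
    {"hire_year": 1985, "salary_band": "A"},
    {"hire_year": 1981, "salary_band": "B"},
]

# Hypothetical names copied from the employee directory elsewhere in the collection.
directory_names = ["Clark, Lee", "Adams, Joe", "Baker, Ann"]


def guess_identities(records, names):
    """Pair each record with a name, assuming the files were kept in alphabetical order."""
    ordered_names = sorted(names)
    if len(ordered_names) != len(records):
        raise ValueError("this simple sketch needs one name per record")
    return [dict(record, guessed_name=name) for record, name in zip(records, ordered_names)]


def shuffle_records(records, seed=None):
    """Return a copy of the records in random order, breaking the positional link."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# The first file in the drawer gets matched to the first name in the alphabet.
linked = guess_identities(redacted_records, directory_names)
```

Shuffling the redacted copies is one possible countermeasure, but it throws away original order, which archivists usually want to keep. A real collection would call for weighing that trade-off rather than applying either function blindly.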

Besides, what archivist (or archivist in training) could resist a t-shirt with the word redacted on it?

Google, Privacy, Records Management and Archives

BoingBoing.net posted on March 14 and March 15 about Google’s announcement of a plan to change their log retention policy. Their new plan is to strip parts of IP data from records in order to protect privacy. Read more in the AP article covering the announcement.

For those who are not familiar with them – IP addresses (in the common IPv4 form) are made up of four numbers, each from 0 to 255, and look something like 192.39.228.3. To see how well your location can be worked out from your IP address right now – go to IP Address or IP Address Guide (click on ‘Find City’).

Google currently keeps IP addresses and their corresponding search requests in their log files (more on this in the personal info section of their Privacy Policy). Their new plan is that after 18-24 months they will permanently erase part of the IP address, so that the address can no longer point to a single computer – rather it would point to a set of 256 computers (according to the AP article linked above).
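To make the ‘set of 256 computers’ idea concrete, here is a rough sketch in Python using the standard `ipaddress` module. It assumes that ‘erasing part of the IP address’ means zeroing the last of the four numbers; that is my reading of the 256 figure, not a description of Google’s actual process, and the function names and sample addresses are made up for illustration.

```python
import ipaddress


def truncate_ip(address: str) -> str:
    """Zero the final number of an IPv4 address (an assumed way to 'erase part' of it)."""
    # strict=False lets us pass a host address and get back the /24 block that contains it.
    block = ipaddress.ip_network(f"{address}/24", strict=False)
    return str(block.network_address)


def addresses_sharing(truncated: str) -> int:
    """Count how many full addresses collapse to the same truncated value."""
    return ipaddress.ip_network(f"{truncated}/24").num_addresses


# Two different computers in the same block become indistinguishable in the log.
first = truncate_ip("192.0.2.77")
second = truncate_ip("192.0.2.200")
```

In this sketch `first` and `second` come out identical, which is the whole point: once the last number is gone, nothing in the record says which of the 256 possible addresses made the request.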

Their choice to permanently redact these records after a set amount of time is interesting. They don’t want to get rid of the records – just remove the IP addresses to reduce the chance that those records could be traced back to specific individuals. This policy will be retroactive – so all log records more than 18-24 months old will be modified.

I am not going to talk about how good an idea this is.. or if it doesn’t go far enough (plenty of others are doing that, see articles at EFF and Wired: 27B Stroke 6). I want to explore the impact of choices like these on the records we will have the opportunity to preserve in archives in the future.

With my ‘archives’ hat on – the bigger question here is how much the information that Google captures in the process of doing their business could be worth to the historians of the future. I wonder if we will one day regret the fact that the only way to protect the privacy of those who have done Google searches is to erase part of the electronic trail. One of the archivist’s tenets is never to do anything to a record that you cannot undo. In order for Google to succeed at their goal (making the records useless to government investigators) – it will HAVE to be done such that it cannot be undone.

In my information visualization course yesterday, our professor spoke about how great maps are at tying information down. We understand maps and they make a fabulous stable framework upon which we can organize large volumes of information. It sounds like the new modified log records would still permit a general connection to the physical geographic world – so that is a good thing. I do wonder if the ‘edited’ versions of the log records will still permit the grouping of search requests such that they can be identified as having been performed by the same person (or at least from the same computer). Without the context of other searches by the same person/computer, would this data still be useful to a historian? Would being able to examine the searches of a ‘community’ of 256 computers be useful (if that is what the IP updates mean)?

What if Google could lock up the unmodified version of those stats in a box for 100 years (and we could still read the media it is recorded on and we had documentation telling us what the values meant and we had software that could read the records)? What could a researcher discover about the interests of those of us who used Google in 2007? Would we lose a lot if we didn’t know what each individual user searched for? Would it be enough to know what a gillion groups of 256 people/computers from around the world were searching for – or would losing that tie to an individual turn the data into noise?

Privacy has been such a major issue with the records of many businesses in the past. Health records and school records spring to mind. I also find myself thinking of Arthur Andersen, which would not have gotten into trouble for shredding its records if it had done so according to its own records disposition schedules and policies. Googling Electronic Document Retention Policy got me over a million hits. Lots of people (lawyers in particular) have posted articles all over the web talking about the importance of a well implemented Electronic Document Retention Policy. I was intrigued by the final line of a USA Today article from January 2006 about Google and their battle with the government over a pornography investigation:

Google has no stated guidelines on how long it keeps data, leading critics to warn that retention could be for years because of inexpensive data-storage costs.

That isn’t true any longer.

For me, this choice by Google has illuminated a previously hidden perfect storm. That the US government often requests this sort of log data is clear, though Google will not say how often. The intersection of concerns about privacy, government investigations, document retention and tremendous volumes of private sector business data seem destined to cause more major choices such as the one Google has just announced. I just wonder what the researchers of the future will think of what we leave in our wake.