
Redacting Data – A T-Shirt and Other Thoughts

ThinkGeek.com has created a funny t-shirt with the word redacted on it.

In case you missed it, there was a whole lot of furor early this month when someone posted an Advanced Access Content System (AACS) decryption key online. The key consists of 16 hexadecimal numbers that can be used to decrypt and copy any Blu-ray or HD DVD movie. Of course, it turns out not to be so simple – and I will direct you to a series of very detailed posts over at Freedom to Tinker if you want to understand the finer points of what the no-longer-secret key can and cannot do. The CyberSpeak column over at USA Today has a nice summary of the big picture and more details about what happened after the key was posted.

What amused me about this t-shirt (and prompted me to post about it here) is that it points out an interesting challenge of redacting data. How do you ensure that the data you leave behind doesn’t support deduction of the missing data? This is something I have thought about a great deal when designing web-based software and worrying about security. It is not something I had spent much time thinking about in relation to archives and the protection of privacy. The joke of the shirt, of course, is that removing just the secret info but leaving everything else doesn’t do the job. This is a simplified case – let me give you an example that might make this more relevant.

Let’s say that you have records from a business in your archives and one of the series consists of personnel records. If you redacted those records to remove people’s names, SSNs and other private data, but left the records in their original order so that researchers could examine them for other information – would that be enough to protect the privacy of the business’s employees?

What if somewhere else in the collection you had the employee directory that listed names and phone extensions? No problem there – right? Ah... but what if you assumed that the personnel records were in alphabetical order and then used the phone directory as a partial key to figuring out which personnel records were for which people?
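For the programmers reading along, here is a minimal Python sketch of that deduction. Everything in it is made up for illustration: the names, extensions, salaries and the `link_by_order` helper are hypothetical, and it assumes the simplest possible case, where the directory lists exactly the same people as the personnel series and the series really is filed alphabetically by name.

```python
# Hypothetical redacted personnel records, left in their original filing order.
# Names and SSNs are gone, but the order of the files is untouched.
redacted_records = [
    {"file": 1, "salary": 41000},
    {"file": 2, "salary": 58000},
    {"file": 3, "salary": 36500},
]

# Hypothetical employee directory found elsewhere in the same collection.
directory = {
    "Baker, Tom": "x204",
    "Adams, Jane": "x117",
    "Clark, Maria": "x310",
}


def link_by_order(records, names):
    """Guess which record belongs to whom.

    Assumes the records are filed alphabetically by name and that
    `names` covers exactly the same people as `records`.
    """
    ordered_names = sorted(names)
    if len(ordered_names) != len(records):
        raise ValueError("this simple sketch needs one name per record")
    # Pair the first name alphabetically with the first file, and so on.
    return list(zip(ordered_names, records))


for name, record in link_by_order(redacted_records, directory):
    print(name, "->", record)
```

With a real collection the directory would probably cover only some of the employees, so the matches would be partial and less certain, but the ordering alone still narrows things down considerably.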

This is definitely a hypothetical scenario, but it gets the idea across about how archivists need to take in the big picture to ensure the right level of privacy protection.

Besides, what archivist (or archivist in training) could resist a t-shirt with the word redacted on it?

Posted in privacy