I think I renamed this post at least 12 times. My original intention was was to consider the impact of born digital records on the skills needed for the historian/researchers of the future. In addition I found myself exploring the dividing lines among a number of possible roles in ensuring access to the information written in the 1s and 0s of our born digital records.
After my last post about the impact of anonymization of Google Logs, a friend directed me to the work of Dr. Latanya Sweeney. Reading through the information about her research I found Trail Re-identification: Learning Who You are From Where You Have Been. Given enough data to work with, algorithms can be written that often can re-identify the individuals who performed the original searches. Carnegie Mellon University‘s Data Privacy Lab includes the Trails Learning Project with the goal of answering the question “How can people be identified to the trail of seemingly innocent and anonymous data they leave behind at different locations?”. So it seems that there may be a lot of born digital records that start out anonymous but that may permit ‘re-identification’ – given the application of the right tools or techniques. That is fine – historians have often needed to become detectives. They have spent years developing techniques for the analysis of paper documents to support ‘re-identification’. Who wrote this letter? Is this document real or a forgery? Who is the ‘Mildred’ referenced in this record?
The field of diplomatics studies the authenticity and provenance of documents by looking at everything from the paper they were written on to the style of writing to the ink used. I like the idea of using the term ‘digital diplomatics’ for the ever increasing process of verifying and validating born digital records. Google found me the Digital Diplomatics conference that took place earlier this year in Munich. Unfortunately it was more geared toward investigating how the use of computers can enhance traditional diplomatic approaches rather than how to authenticate the provenance of born digital records.
In the March 2007 issue of Scientific American I found the article A Digital Life. It talks primarily about the Microsoft Research project MyLifeBits. A team at Microsoft Research has spent the last six years creating what they call a ‘digital personal archive’ of team member Gordon Bell. This archive hopes to “record all of Bell’s communications with other people and machines, as well as the images he sees, the sounds he hears and the Web sites he visits–storing everything in a personal digital archive that is both searchable and secure.”
They are not blind to the long term challenges of preserving the data itself in some accessible format:
Digital archivists will have to constantly convert their files to the latest formats, and in some cases they may need to run emulators of older machines to retrieve the data. A small industry will probably emerge just to keep people from losing information because of format evolution.
The article concludes:
Digital memories will yield benefits in a wide spectrum of areas, providing treasure troves of information about how people think and feel. By constantly monitoring the health of their patients, future doctors may develop better treatments for heart disease, cancer and other illnesses. Scientists will be able to get a glimpse into the thought processes of their predecessors, and future historians will be able to examine the past in unprecedented detail. The opportunities are restricted only by our ability to imagine them.
Historians will have at least these two types of digital artifacts to explore – those gathered purposefully (such as the digital personal archives described above) and those generated as a byproduct of other activity (such as the Google search logs). Might these be the future parallels to the ‘manuscript’ and ‘corporate’ archives of today?
So we have both the ideas of the Digital Archivist and the Digital Historian. What about a Digital Archaeologist? I am not the first to ponder the possible future job of Digital Archaeologist. A bit of googling of the term led me to Dark Star Gazette and Dear Digital Archaeologist. Back in February of 2007 they pondered:
Will there be digital archaeologists, people who sift through our society’s discarded files and broken web links, carefully brushing away revisions and piecing together antiquated file formats? Will a team of grad students working on their PhDs a thousand, or two thousand, years from now be digging through old blog entries, still archived online in some remote descendant of the Wayback Machine or a copy of Google’s backup tapes?
I can only imagine a world in which this is in fact the case. Given that premise, at what point does the historian get too far from the primary source? If the historian does not understand exactly what a computer program does to extract the information they want from logs or ‘digital memory repositories’ – are they no longer working with the primary source?
Imagine any field in which historians do research. Music? Accounting? Science? In order examine and interpret primary source records a historian becomes something of an expert in that field. Consider the historian documenting the life of a famous scientist based partly on their lab notebooks. That historian would be best served by being taught how to interpret the notebooks themselves. The historian must be fluent in the language of the record in order to gain the most direct access to the information.
Ah – but if there really are Digital Archaeologists in the far future, perhaps they would be the connection between the primary source born digital records and the historians who wish to study them. Or perhaps the Digital Archivist, in a new take on ‘arranging records’, would transform digital chaos into meaningful records for use by researchers? The field of expertise on the historians part would need only be in the content of the records – not exactly how they were rescued from the digital abyss.
Would a Digital Historian be someone who only considers the history of the digital landscape or a historian especially well versed in the interpretation of digital records? In Daniel Cohen and Roy Rosenzweig‘s book Digital History: A Guide to Gathering, Preserving, And Presenting the Past on the Web they seem to use the term in the present tense to refer to historians who uses computers and technology to support and expand the reach of their research. Yet, in his essay Scarcity or Abundance? Preserving the Past in a Digital Era, Roy Rosenzweig proposes:
Future graduate programs will probably have to teach such social-scientific and quantitative methods as well as such other skills as “digital archaeology”(the ability to “read” arcane computer formats), “digital diplomatics” (the modern version of the old science of authenticating documents), and data mining (the ability to find the historical needle in the digital hay). In the coming years, “contemporary historians” may need more specialized research and “language” skills than medievalists do.
What is my imagined skill set for the historian of our digital world? A willingness to dig into the rich and chaotic world of born digital records. The ability to use tools and find partners to assist in the interpretation of those records. Equal comfort working at tables covered in dusty boxes and in the virtual domain of glowing computer terminals. And of course – the same curiosity and sense of adventure that has always drawn people to the path of being a historian.
We cannot predict the future – we can only do our best to adapt to what we see before us. I suspect the prefixing of every job title with the word ‘digital’ will disappear over time – much as the prefixing of everything with the letter ‘e’ to let you know that something was electronic or online has ebbed out of popular culture. As the historians and archivists of today evolve into the historians and archivists of tomorrow they will have to deal with born digital records – no matter what job title we give them.