Menu Close

Category: historical research

Epidemiological Research and Archival Records: Source of Records Used for Research Fails to Make the News

Typist wearing mask, New York City, October 16, 1918 (NARA record 165-WW-269B-16)In early April, Reuters ran an article that was picked up by YahooNews titled Closing Schools reduced flu deaths in 1918. I was immediately convinced that archival records must have supported this research – even though no mention of that was included in the article. The article did tell me that it was Dr. Richard Hatchett of the National Institute of Allergy and Infectious Diseases (NIAID) who led the research.

I sent him an email asking about where the data for his research came from. Did the NIH have a set of data from long ago? Here is an excerpt from his kind reply:

Unfortunately, nobody kept track of data like this and you can see the great lengths we went to to track it down. Many of the people we thank in our acknowledgment at the end of the paper tracked down and provided information in local or municipal archives. For Baltimore, I came up and spent an entire day in the library going through old newspapers on microfilm. Some of the information had been gathered by previous historians in works on the epidemic in individual cities (Omaha — an unpublished Master’s thesis — and Newark are examples). Gathering the information was extremely arduous and probably one of the reasons no one had looked at this systematically before. Fortunately, several major newspapers (the NYTimes, Boston Globe, Washington Post, Atlanta Journal-Constitution, etc.) now have online archives going back at least until 1918 that facilitated our search.

Please let me know if you have any other questions. We were amateurs and pulling the information together took a lot longer than we would ever have imagined.

He also sent me a document titled “Supporting Information Methods”. This turned out to be 37 pages of detailed references found to support their research. They were hunting for three types of information: first reported flu cases, amplifying events (such as Liberty Loan Parades ) and interventions (such as quarantines, school closings and bands on public gatherings).

Many of the resources cited are newspapers (see The Baltimore Sun’s 1918 flu pandemic timeline for examples of what can be found in newspapers), but I was more intrigued by the wide range of non-newspaper records used to support this research. A few examples:

  • Chicago (First reported case): Robertson JD. Report and handbook of the Department of Health of the City of Chicago for the years 1911 to 1918 inclusive. Chicago, 1919.
  • Cleveland (School closings): The City Record of the Cleveland City Council, October 21, 1918, File No. 47932, citing promulgation of health regulations by Acting Commissioner of Health H.L. Rockwood.
  • New Orleans (Ban on public gatherings): Parish of Orleans and City of New Orleans. Report of the Board of Health, 1919, p. 131.
  • Seattle (Emergency Declaration): Ordinance No. 38799 of the Seattle City Council, signed by Mayor Hanson October 9, 1918.

The journal article referenced in the Reuter’s story, Public health interventions and epidemic intensity during the 1918 influenza pandemic, was published in the Proceedings of the National Academy of Sciences (PNAS) and is available online.

The good news here is that the acknowledgment that Dr. Hatchett mentions in his email includes this passage:

The analysis presented here would not have been possible without the contributions of a large number of public health and medical professionals, historians, librarians, journalists, and private citizens […followed by a long list of individuals].

The bad news is that the use of archival records is not mentioned in the news story.

We frequently hear about how little money there is at most archives. Cutbacks in funding are the norm. Every few weeks we hear of archives forced to cut their hours, staff or projects. Public understanding of the important ways that archival records are used can only help to reverse this trend.

Maybe we need a bumper sticker to hand out to new researchers. Something catchy and a little pushy – something that says “Tell the world how valuable our records are!” – only shorter.

  • If You Use Archival Records – Go On The Record
  • Put Primary Sources in the Spotlight
  • Archivists for Footnotes: Keep the paper trail alive
  • Archives Remember: Don’t Forget Them

I don’t love any of these – anyone else feeling wittier and willing to share?

(For more images of the 1918 Influenza Epidemic, visit the National Museum of Health and Medicine’s Otis Historical Archives’ Images from the 1918 Influenza Epidemic.)

Considering Historians, Archivists and Born Digital Records

I think I renamed this post at least 12 times. My original intention was was to consider the impact of born digital records on the skills needed for the historian/researchers of the future. In addition I found myself exploring the dividing lines among a number of possible roles in ensuring access to the information written in the 1s and 0s of our born digital records.

After my last post about the impact of anonymization of Google Logs, a friend directed me to the work of Dr. Latanya Sweeney. Reading through the information about her research I found Trail Re-identification: Learning Who You are From Where You Have Been. Given enough data to work with, algorithms can be written that often can re-identify the individuals who performed the original searches. Carnegie Mellon University‘s Data Privacy Lab includes the Trails Learning Project with the goal of answering the question “How can people be identified to the trail of seemingly innocent and anonymous data they leave behind at different locations?”. So it seems that there may be a lot of born digital records that start out anonymous but that may permit ‘re-identification’ – given the application of the right tools or techniques. That is fine – historians have often needed to become detectives. They have spent years developing techniques for the analysis of paper documents to support ‘re-identification’. Who wrote this letter? Is this document real or a forgery? Who is the ‘Mildred’ referenced in this record?

The field of diplomatics studies the authenticity and provenance of documents by looking at everything from the paper they were written on to the style of writing to the ink used. I like the idea of using the term ‘digital diplomatics’ for the ever increasing process of verifying and validating born digital records. Google found me the Digital Diplomatics conference that took place earlier this year in Munich. Unfortunately it was more geared toward investigating how the use of computers can enhance traditional diplomatic approaches rather than how to authenticate the provenance of born digital records.

In the March 2007 issue of Scientific American I found the article A Digital Life. It talks primarily about the Microsoft Research project MyLifeBits. A team at Microsoft Research has spent the last six years creating what they call a ‘digital personal archive’ of team member Gordon Bell. This archive hopes to “record all of Bell’s communications with other people and machines, as well as the images he sees, the sounds he hears and the Web sites he visits–storing everything in a personal digital archive that is both searchable and secure.”

They are not blind to the long term challenges of preserving the data itself in some accessible format:

Digital archivists will have to constantly convert their files to the latest formats, and in some cases they may need to run emulators of older machines to retrieve the data. A small industry will probably emerge just to keep people from losing information because of format evolution.

The article concludes:

Digital memories will yield benefits in a wide spectrum of areas, providing treasure troves of information about how people think and feel. By constantly monitoring the health of their patients, future doctors may develop better treatments for heart disease, cancer and other illnesses. Scientists will be able to get a glimpse into the thought processes of their predecessors, and future historians will be able to examine the past in unprecedented detail. The opportunities are restricted only by our ability to imagine them.

Historians will have at least these two types of digital artifacts to explore – those gathered purposefully (such as the digital personal archives described above) and those generated as a byproduct of other activity (such as the Google search logs). Might these be the future parallels to the ‘manuscript’ and ‘corporate’ archives of today?

So we have both the ideas of the Digital Archivist and the Digital Historian. What about a Digital Archaeologist? I am not the first to ponder the possible future job of Digital Archaeologist. A bit of googling of the term led me to Dark Star Gazette and Dear Digital Archaeologist. Back in February of 2007 they pondered:

Will there be digital archaeologists, people who sift through our society’s discarded files and broken web links, carefully brushing away revisions and piecing together antiquated file formats? Will a team of grad students working on their PhDs a thousand, or two thousand, years from now be digging through old blog entries, still archived online in some remote descendant of the Wayback Machine or a copy of Google’s backup tapes?

I can only imagine a world in which this is in fact the case. Given that premise, at what point does the historian get too far from the primary source? If the historian does not understand exactly what a computer program does to extract the information they want from logs or ‘digital memory repositories’ – are they no longer working with the primary source?

Imagine any field in which historians do research. Music? Accounting? Science? In order examine and interpret primary source records a historian becomes something of an expert in that field. Consider the historian documenting the life of a famous scientist based partly on their lab notebooks. That historian would be best served by being taught how to interpret the notebooks themselves. The historian must be fluent in the language of the record in order to gain the most direct access to the information.

Ah – but if there really are Digital Archaeologists in the far future, perhaps they would be the connection between the primary source born digital records and the historians who wish to study them. Or perhaps the Digital Archivist, in a new take on ‘arranging records’, would transform digital chaos into meaningful records for use by researchers? The field of expertise on the historians part would need only be in the content of the records – not exactly how they were rescued from the digital abyss.

Would a Digital Historian be someone who only considers the history of the digital landscape or a historian especially well versed in the interpretation of digital records? In Daniel Cohen and Roy Rosenzweig‘s book Digital History: A Guide to Gathering, Preserving, And Presenting the Past on the Web they seem to use the term in the present tense to refer to historians who uses computers and technology to support and expand the reach of their research. Yet, in his essay Scarcity or Abundance? Preserving the Past in a Digital Era, Roy Rosenzweig proposes:

Future graduate programs will probably have to teach such social-scientific and quantitative methods as well as such other skills as “digital archaeology”(the ability to “read” arcane computer formats), “digital diplomatics” (the modern version of the old science of authenticating documents), and data mining (the ability to find the historical needle in the digital hay). In the coming years, “contemporary historians” may need more specialized research and “language” skills than medievalists do.

What is my imagined skill set for the historian of our digital world? A willingness to dig into the rich and chaotic world of born digital records. The ability to use tools and find partners to assist in the interpretation of those records. Equal comfort working at tables covered in dusty boxes and in the virtual domain of glowing computer terminals. And of course – the same curiosity and sense of adventure that has always drawn people to the path of being a historian.

We cannot predict the future – we can only do our best to adapt to what we see before us. I suspect the prefixing of every job title with the word ‘digital’ will disappear over time – much as the prefixing of everything with the letter ‘e’ to let you know that something was electronic or online has ebbed out of popular culture. As the historians and archivists of today evolve into the historians and archivists of tomorrow they will have to deal with born digital records – no matter what job title we give them.

Google, Privacy, Records Managment and Archives posted on March 14 and March 15 about Google’s announcement of a plan to change their log retention policy . Their new plan is to strip parts of IP data from records in order to protect privacy. Read more in the AP article covering the announcement.

For those who are not familiar with them – IP addresses are made up of sets of numbers and look something like To see how good a job they can do figuring out the location you are in right now – go to IP Address or IP Address Guide (click on ‘Find City’).

Google currently keeps IP addresses and their corresponding search requests in their log files (more on this in the personal info section of their Privacy Policy). Their new plan is that after 18-24 months they will permanently erase part of the IP address, so that the address no longer can point to a single computer – rather it would point to a set of 256 computers (according to the AP article linked above).

Their choice to permanently redact these records after a set amount of time is interesting. They don’t want to get rid of the records – just remove the IP addresses to reduce the chance that those records could be traced back to specific individuals. This policy will be retroactive – so all log records more than 18-24 months old will be modified.

I am not going to talk about how good an idea this is.. or if it doesn’t go far enough (plenty of others are doing that, see articles at EFF and Wired: 27B Stroke 6 ). I want to explore the impact of choices like these on the records we will have the opportunity to preserve in archives in the future.

With my ‘archives’ hat on – the bigger question here is how much the information that Google captures in the process of doing their business could be worth to the historians of the future. I wonder if we will one day regret the fact that the only way to protect the privacy of those who have done Google searches is to erase part of the electronic trail. One of the archivist tenants is to never do anything to the record you cannot undo. In order for Google to succeed at their goal (making the records useless to government investigators) – it will HAVE to be done such that it cannot be undone.

In my information visualization course yesterday, our professor spoke about how great maps are at tying information down. We understand maps and they make a fabulous stable framework upon which we can organize large volumes of information. It sounds like the new modified log records would still permit a general connection to the physical geographic world – so that is a good thing. I do wonder if the ‘edited’ versions of the log records will still permit the grouping of search requests such that they can be identified as having been performed by the same person (or at least from the same computer)? Without the context of other searches by the same person/computer, would this data still be useful to a historian? Would being able to examine the searches of a ‘community’ of 256 computers be useful (if that is what the IP updates mean).

What if Google could lock up the unmodified version of those stats in a box for 100 years (and we could still read the media it is recorded on and we had documentation telling us what the values meant and we had software that could read the records)? What could a researcher discover about the interests of those of us who used Google in 2007? Would we loose a lot by if we didn’t know what each individual user searched for? Would it be enough to know what a gillion groups of 256 people/computers from around the world were searching for – or would loosing that tie to an individual turn the data into noise?

Privacy has been such a major issue with the records of many businesses in the past. Health records and school records spring to mind. I also find myself thinking of Arthur Anderson who would not have gotten into trouble for shredding their records if they had done so according to their own records disposition schedules and policies. Googling Electronic Document Retention Policy got me over a million hits. Lots of people (lawyers in particular) have posted articles all over the web talking about the importance of a well implemented Electronic Document Retention Policy. I was intrigued by the final line of a USAToday article from January 2006 about Google and their battle with the government over a pornography investigation:

Google has no stated guidelines on how long it keeps data, leading critics to warn that retention could be for years because of inexpensive data-storage costs.

That isn’t true any longer.

For me, this choice by Google has illuminated a previously hidden perfect storm. That the US government often request of this sort of log data is clear, though Google will not say how often. The intersection of concerns about privacy, government investigations, document retention and tremendous volumes of private sector business data seem destined to cause more major choices such as the one Google has just announced. I just wonder what the researchers of the future will think of what we leave in our wake.

The Archives and Archivists Listserv: hoping for a stay of execution

There has been a lot of discussion (both on the Archives & Archivists (A&A) Listserv and in blog posts) about the SAA‘s recent decision to not preserve the A&A listserv posts from 1996 through 2006 when they are removed from the listserv’s old hosting location at Miami University of Ohio.

Most of the outcry against this decision has fallen into two camps:

  • Those who don’t understand how the SAA task force assigned to appraise the listserv archives could decide it does not have informational value – lots of discussion about how the listserv reflects the move of archivists into the digital age as well as it’s usefulness for students
  • Those who just wish it wouldn’t go away because they still use it to find old posts. Some mentioned that there are scholarly papers that reference posts in the listserv archives as their primary sources.

I added this suggestion on the listserv:

I would have thought that the Archives Listserv would be the ideal test case for developing a set of best practices for archiving an organization’s web based listserv or bboard.

Perhaps a graduate student looking for something to work on as an independent project could take this on? Even if they only got permission for working with posts from 2001 onward [post 2001 those who posted had to agree to ‘terms of participation’ that reduce issues with copyright and ownership] – I suspect it would still be worthwhile.

I have always found that you can’t understand all the issues related to a technical project (like the preservation of a listserv) until you have a real life case to work on. Even if SAA doesn’t think we need to keep the data forever – here is the perfect set of data for archivists to experiment with. Any final set of best practices would be meant for archivists to use in the future – and would be all the easier to comprehend if they dealt with a listserv that many of them are already familiar with.

Another question: couldn’t the listserv posts still be considered ‘active records’? Many current listserv posters claim they still access the old list’s archives on a regular basis. I would be curious what the traffic for the site is. That is one nice side effect of this being on a website – it makes the usage of records quantifiable.

There are similar issues in the analog world when records people still want to use loose their physical home and are disposed of but, as others have also pointed out, digital media is getting cheaper and smaller by the day. We are not talking about paying rent on a huge wharehouse or a space that needs serious temperature and humidity control.

I was glad to see Rick Prelinger’s response on the current listerv that simply reads:

The Internet Archive is looking into this issue.

I had already checked when I posted my response to the listerv yesterday – having found my way to the A&A old listserv page in the Wayback Machine. For now all that is there is the list of links to each week’s worth of postings – nothing beyond that has been pulled in.

I have my fingers crossed that enough of the right people have become aware of the situation to pull the listserv back from the brink of the digital abyss.

Understanding Born Digital Records: Journalists and Archivists with Parallel Challenges

My most recent Archival Access class had a great guest speaker from the Journalism department. Professor Ira Chinoy is currently teaching a course on Computer-Assisted Reporting. In the first half of the session, he spoke about ways that archival records can fuel and support reporting. He encouraged the class to brainstorm about what might make archival records newsworthy. How do old records that have been stashed away for so long become news? It took a bit of time, but we got into the swing of it and came up with a decent list. He then went through his own list and gave examples of published news stories that fit each of the scenarios.

In the second half of class he moved on to address issues related to the freedom of information and struggling to gain access to born digital public records. Journalists are usually early in the food chain of those vying for access to and understanding of federal, state and local databases. They have many hurdles. They must learn what databases are being kept and figure out which ones are worth pursuing. Professor Chinoy relayed a number of stories about the energy and perseverance required to convince government officials to give access to the data they have collected. The rules vary from state to state (see the Maryland Public Information Act as an example) and journalists often must quote chapter and verse to prove that officials are breaking the law if they do not hand over the information. There are officials who deny that the software they use will even permit extractions of the data – or that there is no way to edit the records to remove confidential information. Some journalists find themselves hunting down the vendors of proprietary software to find out how to perform the extract they need. They then go back to the officials with that information in the hopes of proving that it can be done. I love this article linked to in Prof. Chinoy’s syllabus: The Top 38 Excuses Government Agencies Give for Not Being Able to Fulfill Your Data Request (And Suggestions on What You Should Say or Do).

After all that work – just getting your hands on the magic file of data is not enough. The data is of no use without the decoder ring of documentation and context.

I spent most of the 1990s designing and building custom databases, many for federal government agencies. There are an almost inconceivable number of person hours that go into the creation of most of these systems. Stakeholders from all over the organization destined to use the system participate in meetings and design reviews. Huge design documents are created and frequently updated … and adjustments to the logic are often made even after the system goes live (to fix bugs or add enhancements). The systems I am describing are built using complex relational databases with hundreds of tables. It is uncommon for any one person to really understand everything in it – even if they are on the IT team for the full development life cycle.

Sometimes you get lucky and the project includes people with amazing technical writing skills, but usually those talented people are aimed at writing documentation for users of the system. Those documents may or may not explain the business processes and context related to the data. They will rarely expose the relationship between a user’s actions on a screen and the data as it is stored in the underlying tables. Some decisions are only documented in the application code itself and that is not likely to be preserved along with the data.

Teams charged with the support of these systems and their users often create their own documents and databases to explain certain confusing aspects of the system and to track bugs and their fixes. A good analogy here would be to the internal files that archivists often maintain about a collection – the notes that are not shared with the researchers but instead help the archivists who work with the collection remember such things as where frequently requested documents are or what restrictions must be applied to certain documents.

So where does that leave those who are playing detective to understand the records in these systems? Trying to figure out what the data in the tables mean based on the understanding of end-users can be a fool’s errand – and that is if you even have access to actual users of the system in the first place. I don’t think there is any easy answer given the realities of how many unique systems of managing data are being used throughout the public sector.

Archivists often find themselves struggling with the same problems. They have to fight to acquire and then understand the records being stored in databases. I suspect they have even less chance of interacting with actual users of the original system that created the records – though I recall discussions in my appraisal class last term about all the benefits of working with the producers of records long before they are earmarked to head to the archives. Unfortunately, it appeared that this was often the exception rather than the rule – even if it is the preferred scenario.

The overly ambitious and optimistic part had the idea that what ‘we’ really need is a database that lists common commercial off-the-shelf (COTS) packages used by public agencies – along with information on how to extract and redact data from these packages. For those agencies using custom systems, we could include any information on what company or contractors did the work – that sort of thing can only help later. Or how about just a list of which agencies use what software? Does something like this exist? The records of what technology is purchased are public record – right? Definitely an interesting idea (for when I have all that spare time I dream about). I wonder if I set up a wiki for people to populate with this information if people would share what they already know.

I would like to imagine a future world in which all this stuff is online and you can login and download any public record you like at any time. You can get a taste of where we are on the path to achieving this dream on the archives side of things by exploring a single series of electronic records published on the US National Archives site. For example, look at the search screen for World War II Army Enlistment Records. It includes links to sample data, record group info and an FAQ. Once you make it to viewing a record – every field includes a link to explain the value. But even this extensive detail would not be enough for someone to just pick up these records and understand them – you still need to understand about World War II and Army enlistment. You still need the context of the events and this is where the FAQ comes in. Look at the information they provide – and then take a moment to imagine what it would take for a journalist to recreate a similar level of detailed information for new database records being created in a public agency today (especially when those records are guarded by officials who are leery about permitting access to the records in the first place).

This isn’t a new problem that has appeared with born digital records. Archivists and journalists have always sought the context of the information with which they are working. The new challenge is in the added obstacles that a cryptic database system can add on top of the already existing challenges of decrypting the meaning of the records.

Archivists and Journalists care about a lot of the same issues related to born digital records. How do we acquire the records people will care about? How do we understand what they mean in the context of why and how they were created? How do we enable access to the information? Where do we get the resources, time and information to support important work like this?

It is interesting for me find a new angle from which to examine rapid software development. I have spent so much of my time creating software based on the needs of a specific user community. Usually those who are paying for the software get to call the shots on the features that will be included. Certain industries do have detailed regulations designed to promote access by external observers (I am thinking of applications related to medical/pharmaceutical research and perhaps HAZMAT data) but they are definitely exceptions.

Many people are worrying about how we will make sure that the medium upon which we record our born digital records remains viable. I know that others are pondering how to make sure we have software that can actually read the data such that it isn’t just mysterious 1s and 0s. What I am addressing here is another aspect of preservation – the preservation of context. I know this too is being worried about by others, but while I suspect we can eventually come up with best practices for the IT folks to follow to ensure we can still access the data itself – it will ultimately be up to the many individuals carrying on their daily business in offices around the world to ensure that we can understand the information in the records. I suppose that isn’t new either – just another reason for journalists and archivists to make their voices heard while the people who can explain the relationships between the born digital records and the business processes that created them are still around to answer questions.

Spring 2007:Access and Information Visualization

I don’t often post explicitly about my experiences as a graduate student – but I want to let everyone know about the focus of my studies for the next four months. I am taking two courses that I hope will complement one another. One course is on Archival Access (description, MARC, DACS, EAD and theory). The other is on Information Visualization over in the Computer Science department.

My original hope was that in my big Information Visualization final project I might get the opportunity to work with some aspect of archives and/or digital records. I want to understand how to improve access and understanding of the rich resources in the structured digital records repositories in archives around the world. What has already happened just one week into the term is that I find myself cycling through multiple points of view as I do my readings.

How can we support interaction with archival records by taking advantage of the latest information visualization techniques and tools? We can make it easier to understand what records are in a repository – both analog and digital records. I have been imagining interactive visual representations of archives collections, time periods, areas of interest and so forth. When you visit an archives’ website – it can often be so hard to get your head around the materials they offer. I suspect that this is often the case even when you are standing in the same building as the collections. In my course on appraisal last term we talked a lot about examining the collections that were already present on the path to creating a collecting policy. I am optimistic about ways that visualizing this information could improve everyone’s understanding of what an archives contains, for archivists and researchers alike.

Once I get myself to stop those daydreams… I move on to the next set of daydreams. What about the products of these visual analytics tools? How do we captured interactive visualizations in archives? This seems like a greater challenge than the average static digital record (as if there really is such an animal as an ‘average’ digital record). I can see a future in which major government and business decisions are made based on the interpretation of such interactive data models, graphs and charts. Instead of needing just the ‘records’ – don’t we need a way to recreate the experience that the original user had when interacting with the records?

This (unsurprisingly) takes me back to the struggle of how to define exactly what a record is in the digital world. Is the record a still image of a final visualization? Can this actually capture the full impact of an interactive and possible 3D visualization? With information visualization being such a rich and dynamic field I feel that there is a good chance that the race to create new methods and tools will zoom far ahead of plans to preserve its products.

I think some of my class readings will take extra effort (and extra time) as my mind cycles through these ideas. I think that a lot of this will come out in my posts over the next four months. And I still have strong hopes for rallying a team in my InfoViz class to work on an archives related project.

Book Review: Past Time, Past Place: GIS for History

Past Time, Past Place: GIS for History consists mainly of 11 case studies of geographic information systems being applied to the study of history. It includes a nice sprinkling of full color maps and images and a 20 page glossary of GIS terms. Each case study includes a list of articles and other resources for further reading.

The book begins with an introduction by the editor, Anne Kelly Knowles. This chapter explains the basics of using GIS to study history, as well as giving an overview of how the book is organized.

The meat of the book are the case studies covering the following topics:

I suspect that different audiences will take very different ideas away from this book. I was for looking for information about GIS and historical records (this is another book found during my mad hunt for information on the appraisal and preservation of GIS records) and found a bit of related information to add to my research. I think this book will be of interest to those who fall in any of the following categories:

  • Archivists curious about how GIS might enhance access to and understanding of the records under their care
  • Historians interested in understanding how GIS can be used to approach historical research in new ways
  • History buffs who love reading a good story (complete with pictures)
  • Map aficionados curious about new and different kinds of information that can be portrayed with GIS

I especially loved the maps and other images. I am a bit particular when it comes to the quality of graphics – but this book comes through with bright colors and clear images. The unusual square book format (measuring 9″x9″) gave those who arranged the layout lots of room to work – and they took full advantage of the space.

No matter if you plan to read the case studies for the history being brought to life or are looking for “how-tos” as you tackle your own GIS-History project – this book deserves some attention. and US National Archives records

Thanks to Digitization 101‘s recent post “Footnote launches and announces partnership with National Archives” I was made aware of the big news about the digitization of the US National Archives’ records. has gone live with the first of apparently many planned installments of digitized NARA records. My first instinct was one of suspicion. In the shadow of recent historian alarm about the Smithsonian/Showtime deal, I think its valid to be concerned about new agreements between government agencies and private companies.

That said, I am feeling much more positive based on the passage below from the the January 10th National Archives Press Release about the agreement with Footnote (emphasis mine):

This non-exclusive agreement, beginning with the sizeable collection of materials currently on microfilm,will enable researchers and the general public to access millions of newly-digitized images of the National Archives historic records on a subscription basis from the Footnote web site. By February 6, the digitized materials will also be available at no charge in National Archives research rooms in Washington D.C. and regional facilities across the country. After an interval of five years, all images digitized through this agreement will be available at no charge through the National Archives web site .

This sounds like a win-win situation. NARA gets millions of records digitized (4.5 million and counting according to the press release). These records will be highlighed on the Footnote web site. They will have the advantages of Footnote’s search and browse interfaces (which I plan to do an in depth review of in the next week).

When signing up for my free account – I actually read through the entire Footnote Terms of Service including this passage (within the section labeled ‘Our Intellectual Property Rights’ – again, emphasis mine):

Content on the Website is provided to you AS IS for your information and personal use only as permitted through the functionality of the Website and may not be used, copied, reproduced, distributed, transmitted, broadcast, displayed, sold, licensed, or otherwise exploited for any other purposes whatsoever without the prior written consent of the respective owners . reserves all rights not expressly granted in and to the Website and the Content. You agree not to engage in the use, copying, or distribution of any of the Content other than expressly permitted herein, including any use, copying, or distribution of User Submissions of third parties obtained through the Website for any commercial purposes. If you download or print a copy of the Content for personal use, you must retain all copyright and other proprietary notices contained therein.

These terms certainly are no different from that under which most archives operate – but it did give me a moment of wondering how many extra hoops one would need to jump through if you wanted to use any of the NARA records found in Footnote for a major project like a book. A quick experiment with the Pennsylvania Archives (which are available for free with registration) did not show me any copyright information or notices related to rights. I downloaded an image to see what ‘copyright and other proprietary notices’ I might find and found none.

In his post “The Flawed Agreement between the National Archives and Footnote, Inc.“, Dan Cohen expresses his views of the agreement. I had been curious about what percentage of the records being digitized were out of copyright – Dan says they all are. If all of the records are out of copyright – exactly what rights are reserving (in the passage from the terms of service shown above)? I also agree with him in his frustration about the age restriction in place for using (you have to be over 18).

My final opinion about the agreement itself will depend on answers to a few more questions:

1) Were any of the records recently made available on already digitized and available via the website?

2) What percentage of the records that were digitized by Footnote would have been digitized by NARA without this agreement?

3) What roadblocks will truly be set in place for those interested in using records found on

4) What interface will be available to those accessing the records for free in “National Archives research rooms in Washington D.C. and regional facilities across the country” (from the press release above)? Will it be the website interface or via NARA’s own Archival Research Catalog (ARC) or Access to Archival Databases (AAD)?

If the records that Footnote has digitized and made available on would not otherwise have been digitized over the course of the next five years (a big if) then I think this is an interesting solution. Even the full $100 fee for a year subscription is much more reasonable than many other research databases out there (and certainly cheaper than even a single night hotel room within striking distance of National Archives II).

As I mentioned above, I plan to post a review of the search and browse interfaces in the next week. The support folks have given me permission to include screen shots – so if this topic is of interest to you, keep an eye out for it.

OBR: Optical Braille Recognition

In the interest of talking about new topics – I opened my little moleskine notebook and found a note to myself wondering if it is possible to scan Braille with the equivalent of OCR.

Enter Optical Braille Recognition or OBR. Created by a company called Neovision, this software will permit anyone with a scanner and a Windows platform computer to ‘read’ Braille documents.

Why was this in my notebook? I was thinking about unusual records that must be out in the world and wondering about how to improve access to the information within them. So if there are Braille records out there – how does the sighted person who can’t read Braille get at that information? Here is an answer. Not only does the OBR permit reading of Braille documents – but it would permit recreation of these same documents in Braille from any computer that has the right technology.

Reading through the Wikipedia Braille entry, I learned a few things that would throw a monkey wrench into some of this. For example – “because the six-dot Braille cell only offers 64 possible combinations, many Braille characters have different meanings based on their context”. The page on Braille code lists links to an assortment of different Braille codes which translate the different combinations of dots into different characters depending on the language of the text. On top of the different Braille codes used to translate Braille into specific letters or characters – there is another layer to Braille transcription. Grade 2 Braille uses a specific set of contractions and shorthand – and is used for official publications and things like menus, while Grade 3 Braille is used in the creation of personal letters.

It all goes back to context (of course!). If you have a set of Braille documents with no information on them giving you details of what sort of documents they are – you have a document that is effectively written in code. Is it music written in Braille Music notation? Is it a document in Hiranga using the Japanese Code? Is this a personal letter using Grade 3 Braille shorthand? You get the idea.

I suspect that one might even want to include a copy of both the Braille Code and the Braille transcription rules that go with a set of documents as a key to their translation in the future. If there are frequently used records – they could perhaps include the transcription (both literal transcription and a ‘translation’ of all the used Braille contractions) to improve access of analog records.

In a quick search for collections including braille manuscripts it should come as no surprise that the Helen Keller Archives does have “braille correspondence”. I also came across the finding aids for the Harvard Law School Examinations in Braille (1950-1985) and The Donald G. Morgan Papers (the papers of a blind professor at Mount Holyoke College).

I wonder how many other collections have Braille records or manuscripts. Has anyone reading this ever seen or processed a collection including Braille records?

129th anniversary of Thomas Edison’s Invention of the Phonograph

Phonograph Patent Drawing
Phonograph Patent Drawing by T.A. Edison. May 18, 1880. RG 241.Patent #227,679

In honor of today’s 129th anniversary of Thomas Edison’s announcement of his invention of the phonograph, I thought I would share an idea that came to me this past summer. I had the pleasure of taking a course on Visual and Sound Materials taught by Tom Connors, the curator of the National Public Broadcasting Archives. This course explored the history of audio recording, photography, film and broadcasting technology.

When explaining the details of the first phonographs, Prof. Connors mentioned that certain sounds recorded better. Recordings of horns and the pitch of tenor singers were reproduced most accurately – or at least played back with the best sound. We also talked about the change in access to music brought about eventually by the availability of records at the corner store. The most popular recordings were (not surprisingly) of music with lots of horns or the recordings of individual singers like Enrico Caruso. So my question is how might music have evolved differently if different music had sounded better when reproduced by the phonograph? Would Caruso have been replaced at the top of the heap by someone else with a different vocal range? Would Jazz music evolved differently? Would there have been other types of music altogether if string instruments or wind instruments reproduced as well as the bright sounding horns?

In our class we also discussed the impact of the introduction of long playing records. Suddenly you could have 30 minutes of music at a time – with no need to have anyone playing the piano or hovering over the phonograph to change the disk. This led to the movement of music into the background of daily life – in contrast with the earlier focus on playing live music for entertainment in people’s homes. It also paved the way for people to experience music alone – you no longer needed to be in the same room as the musicians. No longer was music exclusively something shared and witnessed in a group. In my opinion this was the start of the long path that led to the possibility of having your own personal ‘sound track’ via first the walkman and now the digital audio player such as the iPod.

These ideas are still about archives and research. From my point of view it is just another example of how a different kind of context can impact our understanding of history. There are so many ways in which little events can impact the big picture. Edison wasn’t pursuing a dream of access to music (though that was included on his list of possible uses for the phonograph) – he was more interested in dictation, audio books for the blind and recording the last words of the soon to be dearly departed.

I love having the ability to examine the original ideas and intentions of an inventor and it came as no surprise to me that some of the most interesting resources out there for learning more about Edison and his invention of the phonograph traced back to both the Library of Congress and the U.S. National Archives and Records Administration. The LOC’s American Memory project page for The Motion Pictures and Sound Recordings of the Edison Companies gives a wide range of access to both background information and the option to listen to early Edison recordings. NARA’s page for the digital image above (originally found in Wikipedia) can be found online via NARA’s Archival Research Catalog (ARC) by searching for ‘Edison Phonograph’.

Hurrah for the invention of the phonograph and for all the archives that keep information for us to use in exploring ideas! Listen for horns and tenor voices in the next song you hear – and noticed if you are listening alone or with a group.

A final question: how can providing easy access to more big picture historical context help users to understand how the records they examine fit into the complicated real world of long ago?