
Spring 2007: Access and Information Visualization

I don’t often post explicitly about my experiences as a graduate student – but I want to let everyone know about the focus of my studies for the next four months. I am taking two courses that I hope will complement one another. One course is on Archival Access (description, MARC, DACS, EAD and theory). The other is on Information Visualization over in the Computer Science department.

My original hope was that in my big Information Visualization final project I might get the opportunity to work with some aspect of archives and/or digital records. I want to understand how to improve access and understanding of the rich resources in the structured digital records repositories in archives around the world. What has already happened just one week into the term is that I find myself cycling through multiple points of view as I do my readings.

How can we support interaction with archival records by taking advantage of the latest information visualization techniques and tools? We can make it easier to understand what records are in a repository – both analog and digital records. I have been imagining interactive visual representations of archives collections, time periods, areas of interest and so forth. When you visit an archives’ website – it can often be so hard to get your head around the materials they offer. I suspect that this is often the case even when you are standing in the same building as the collections. In my course on appraisal last term we talked a lot about examining the collections that were already present on the path to creating a collecting policy. I am optimistic about ways that visualizing this information could improve everyone’s understanding of what an archives contains, for archivists and researchers alike.

Once I get myself to stop those daydreams… I move on to the next set of daydreams. What about the products of these visual analytics tools? How do we capture interactive visualizations in archives? This seems like a greater challenge than the average static digital record (as if there really is such an animal as an ‘average’ digital record). I can see a future in which major government and business decisions are made based on the interpretation of such interactive data models, graphs and charts. Instead of needing just the ‘records’ – don’t we need a way to recreate the experience that the original user had when interacting with the records?

This (unsurprisingly) takes me back to the struggle of how to define exactly what a record is in the digital world. Is the record a still image of a final visualization? Can this actually capture the full impact of an interactive and possibly 3D visualization? With information visualization being such a rich and dynamic field, I feel that there is a good chance that the race to create new methods and tools will zoom far ahead of plans to preserve its products.

I think some of my class readings will take extra effort (and extra time) as my mind cycles through these ideas. I think that a lot of this will come out in my posts over the next four months. And I still have strong hopes for rallying a team in my InfoViz class to work on an archives related project.

Book Review: Past Time, Past Place: GIS for History

Past Time, Past Place: GIS for History consists mainly of 11 case studies of geographic information systems being applied to the study of history. It includes a nice sprinkling of full color maps and images and a 20-page glossary of GIS terms. Each case study includes a list of articles and other resources for further reading.

The book begins with an introduction by the editor, Anne Kelly Knowles. This chapter explains the basics of using GIS to study history, as well as giving an overview of how the book is organized.

The meat of the book is the 11 case studies themselves.

I suspect that different audiences will take very different ideas away from this book. I was looking for information about GIS and historical records (this is another book found during my mad hunt for information on the appraisal and preservation of GIS records) and found a bit of related information to add to my research. I think this book will be of interest to those who fall into any of the following categories:

  • Archivists curious about how GIS might enhance access to and understanding of the records under their care
  • Historians interested in understanding how GIS can be used to approach historical research in new ways
  • History buffs who love reading a good story (complete with pictures)
  • Map aficionados curious about new and different kinds of information that can be portrayed with GIS

I especially loved the maps and other images. I am a bit particular when it comes to the quality of graphics – but this book comes through with bright colors and clear images. The unusual square book format (measuring 9″x9″) gave those who arranged the layout lots of room to work – and they took full advantage of the space.

No matter if you plan to read the case studies for the history being brought to life or are looking for “how-tos” as you tackle your own GIS-History project – this book deserves some attention.

Footnote.com and US National Archives records

Thanks to Digitization 101’s recent post “Footnote launches and announces partnership with National Archives” I was made aware of the big news about the digitization of the US National Archives’ records. Footnote.com has gone live with the first of apparently many planned installments of digitized NARA records. My first instinct was one of suspicion. In the shadow of recent historian alarm about the Smithsonian/Showtime deal, I think it’s valid to be concerned about new agreements between government agencies and private companies.

That said, I am feeling much more positive based on the passage below from the January 10th National Archives Press Release about the agreement with Footnote (emphasis mine):

This non-exclusive agreement, beginning with the sizeable collection of materials currently on microfilm, will enable researchers and the general public to access millions of newly-digitized images of the National Archives historic records on a subscription basis from the Footnote web site. By February 6, the digitized materials will also be available at no charge in National Archives research rooms in Washington D.C. and regional facilities across the country. After an interval of five years, all images digitized through this agreement will be available at no charge through the National Archives web site.

This sounds like a win-win situation. NARA gets millions of records digitized (4.5 million and counting according to the press release). These records will be highlighted on the Footnote web site. They will have the advantages of Footnote’s search and browse interfaces (of which I plan to do an in-depth review in the next week).

When signing up for my free account – I actually read through the entire Footnote Terms of Service including this passage (within the section labeled ‘Our Intellectual Property Rights’ – again, emphasis mine):

Content on the Website is provided to you AS IS for your information and personal use only as permitted through the functionality of the Website and may not be used, copied, reproduced, distributed, transmitted, broadcast, displayed, sold, licensed, or otherwise exploited for any other purposes whatsoever without the prior written consent of the respective owners. Footnote.com reserves all rights not expressly granted in and to the Website and the Content. You agree not to engage in the use, copying, or distribution of any of the Content other than expressly permitted herein, including any use, copying, or distribution of User Submissions of third parties obtained through the Website for any commercial purposes. If you download or print a copy of the Content for personal use, you must retain all copyright and other proprietary notices contained therein.

These terms certainly are no different from those under which most archives operate – but it did give me a moment of wondering how many extra hoops one would need to jump through if you wanted to use any of the NARA records found in Footnote for a major project like a book. A quick experiment with the Pennsylvania Archives (which are available for free with registration) did not show me any copyright information or notices related to rights. I downloaded an image to see what ‘copyright and other proprietary notices’ I might find and found none.

In his post “The Flawed Agreement between the National Archives and Footnote, Inc.”, Dan Cohen expresses his views of the agreement. I had been curious about what percentage of the records being digitized were out of copyright – Dan says they all are. If all of the records are out of copyright – exactly what rights are Footnote.com reserving (in the passage from the terms of service shown above)? I also agree with him in his frustration about the age restriction in place for using Footnote.com (you have to be over 18).

My final opinion about the agreement itself will depend on answers to a few more questions:

1) Were any of the records recently made available on Footnote.com already digitized and available via the archives.gov website?

2) What percentage of the records that were digitized by Footnote would have been digitized by NARA without this agreement?

3) What roadblocks will truly be set in place for those interested in using records found on Footnote.com?

4) What interface will be available to those accessing the records for free in “National Archives research rooms in Washington D.C. and regional facilities across the country” (from the press release above)? Will it be the Footnote.com website interface or via NARA’s own Archival Research Catalog (ARC) or Access to Archival Databases (AAD)?

If the records that Footnote has digitized and made available on Footnote.com would not otherwise have been digitized over the course of the next five years (a big if) then I think this is an interesting solution. Even the full $100 fee for a year subscription is much more reasonable than many other research databases out there (and certainly cheaper than even a single night hotel room within striking distance of National Archives II).

As I mentioned above, I plan to post a review of the Footnote.com search and browse interfaces in the next week. The Footnote.com support folks have given me permission to include screen shots – so if this topic is of interest to you, keep an eye out for it.

OBR: Optical Braille Recognition

In the interest of talking about new topics – I opened my little moleskine notebook and found a note to myself wondering if it is possible to scan Braille with the equivalent of OCR.

Enter Optical Braille Recognition or OBR. Created by a company called Neovision, this software will permit anyone with a scanner and a Windows platform computer to ‘read’ Braille documents.

Why was this in my notebook? I was thinking about unusual records that must be out in the world and wondering about how to improve access to the information within them. So if there are Braille records out there – how does the sighted person who can’t read Braille get at that information? Here is an answer. Not only does the OBR permit reading of Braille documents – but it would permit recreation of these same documents in Braille from any computer that has the right technology.

Reading through the Wikipedia Braille entry, I learned a few things that would throw a monkey wrench into some of this. For example – “because the six-dot Braille cell only offers 64 possible combinations, many Braille characters have different meanings based on their context”. The page on Braille code lists links to an assortment of different Braille codes which translate the different combinations of dots into different characters depending on the language of the text. On top of the different Braille codes used to translate Braille into specific letters or characters – there is another layer to Braille transcription. Grade 2 Braille uses a specific set of contractions and shorthand – and is used for official publications and things like menus, while Grade 3 Braille is used in the creation of personal letters.

It all goes back to context (of course!). If you have a set of Braille documents with no information on them giving you details of what sort of documents they are – you have a document that is effectively written in code. Is it music written in Braille Music notation? Is it a document in Hiragana using the Japanese Code? Is this a personal letter using Grade 3 Braille shorthand? You get the idea.
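As a concrete illustration of that 64-pattern limit, here is a minimal Python sketch (not how OBR actually works) of the same cell decoding differently depending on mode. The letter/digit doubling follows standard English Braille, where the number sign makes the cells for a–j read as the digits 1–0; everything else here is simplified for illustration:

```python
# A 6-dot cell is just a set of raised dots -- 2**6 = 64 patterns --
# so the same pattern must mean different things in different contexts.
# Tiny subset of the English Braille letter mappings:
LETTERS = {
    frozenset({1}): "a",
    frozenset({1, 2}): "b",
    frozenset({1, 4}): "c",
}
# The number indicator is dots 3,4,5,6; after it, a-j read as digits 1-0.
NUMBER_SIGN = frozenset({3, 4, 5, 6})
DIGITS = {
    frozenset({1}): "1",
    frozenset({1, 2}): "2",
    frozenset({1, 4}): "3",
}

def decode(cells):
    """Decode a sequence of cells; the number sign switches modes.
    (Simplified: real numeric mode ends at a space, among other rules.)"""
    out, numeric = [], False
    for cell in cells:
        if cell == NUMBER_SIGN:
            numeric = True  # following cells read as digits
            continue
        out.append(DIGITS[cell] if numeric else LETTERS[cell])
    return "".join(out)

# The identical cell {dot 1} reads as 'a' or '1' depending on context:
print(decode([frozenset({1})]))               # -> a
print(decode([NUMBER_SIGN, frozenset({1})]))  # -> 1
```

Even this toy version shows why a scanned page of dots is undecipherable without knowing which code and which mode applies.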

I suspect that one might even want to include a copy of both the Braille code and the Braille transcription rules that go with a set of documents as a key to their translation in the future. For frequently used records, one could perhaps include a transcription (both a literal transcription and a ‘translation’ that expands all the Braille contractions used) to improve access to the analog records.

A quick search for collections including Braille manuscripts showed – to no one’s surprise – that the Helen Keller Archives do have “braille correspondence”. I also came across the finding aids for the Harvard Law School Examinations in Braille (1950-1985) and The Donald G. Morgan Papers (the papers of a blind professor at Mount Holyoke College).

I wonder how many other collections have Braille records or manuscripts. Has anyone reading this ever seen or processed a collection including Braille records?

GIS and Geospatial Data Preservation: Research Resources

I found these websites while doing research for a paper on the selection and appraisal of geospatial data and geographic information systems (GIS). I hope these links might be useful for others doing similar research.

CIESIN – Center for International Earth Science Information Network at Columbia University, especially Guide to Managing Geospatial Electronic Records (USA)

CUGIR – Cornell University Geospatial Information Repository, especially Collection Development Policy (USA)

Digital Curation Centre – supporting UK institutions who store, manage and preserve these data to help ensure their enhancement and their continuing long-term use, especially Curating Geospatial Data (UK)

Digital Preservation Coalition – “established in 2001 to foster joint action to address the urgent challenges of securing the preservation of digital resources in the UK and to work with others internationally to secure our global digital memory and knowledge base.” Especially their Decision Tree. (UK)

GeoConnections – a Canadian national partnership program to evolve and expand the Canadian Geospatial Data Infrastructure (CGDI). (Canada)

InterPARES 2 Case Studies – especially CyberCartographic Atlas of Antarctica and Preservation of the City of Vancouver GIS Database (VanMap)

Library and Archives of Canada – especially Managing Cartographic, Architectural and Engineering Records in the Government of Canada (Canada)

Library of Congress Digital Preservation – subtitled “The National Digital Information Infrastructure and Preservation Program” (NDIIPP) (USA)

Maine GeoArchives (USA)

Maryland State Geographic Information Committee Standards for Records Preservation

NGDA – the National Geospatial Digital Archive, especially Collection Development Policy For The National Geospatial Digital Archive and UCSB Maps & Imagery Collection Development Policy (USA)

New York State Archives – especially GIS Development Guides: GIS Use and Maintenance (USA)

North Carolina Center for Geographic Information and Analysis (USA)

North Carolina Geospatial Data Archiving Project – especially their NDIIPP proposal for Collection and Preservation of At Risk Digital Geospatial Data (USA)

OMB Circular No. A-16 – which requires the development of the National Spatial Data Infrastructure (NSDI) by the Federal Geographic Data Committee (FGDC) (USA)

Any great sites I am missing? Please let me know and I will add to the list.

The Edges of the GIS Electronic Record

I spent a good chunk of the end of my fall semester writing a paper ultimately titled “Digital Geospatial Records: Challenges of Selection and Appraisal”. I learned a lot – especially with the help of archivists out there on the cutting edge who are trying to find answers to these problems. I plan on a number of posts with various ideas from my paper.

To start off, I want to consider the topic of defining the electronic record in the context of GIS. One of the things I found most interesting in my research was the fact that defining exactly what a single electronic record consists of is perhaps one of the most challenging steps.

If we start with the SAA’s glossary definition of the term ‘record’ we find the statement that “A record has fixed content, structure, and context.” The notes go on to explain:

Fixity is the quality of content being stable and resisting change. To preserve memory effectively, record content must be consistent over time. Records made on mutable media, such as electronic records, must be managed so that it is possible to demonstrate that the content has not degraded or been altered. A record may be fixed without being static. A computer program may allow a user to analyze and view data many different ways. A database itself may be considered a record if the underlying data is fixed and the same analysis and resulting view remain the same over time.

This idea presents some major challenges when you consider data that does not seem ‘fixed’. In the fast moving and collaborative world of the internet, Geographic Information Systems are changing over time – but the changes themselves are important. We no longer live in a world in which the way you access a GIS is via a CD which has a specific static version of the map data you are considering.
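The glossary’s demand that we “demonstrate that the content has not degraded or been altered” has a standard technical answer: store a cryptographic checksum with a record when it is accessioned, and recompute it later. A minimal Python sketch (the record content here is invented for illustration):

```python
# Fixity check sketch: record a checksum at accession time, then
# recompute later -- any change to the bytes changes the checksum.
import hashlib

def checksum(data: bytes) -> str:
    """SHA-256 digest of the record's bytes, as a hex string."""
    return hashlib.sha256(data).hexdigest()

record = b"layer: streets, version: 2006-06-01"  # stand-in record content
stored = checksum(record)  # written into the preservation metadata

# Later: recompute and compare to demonstrate fixity.
assert checksum(record) == stored           # content unchanged
assert checksum(record + b"x") != stored    # any alteration is detectable
```

This demonstrates that the bits are stable; the harder question in the GIS context, as described below, is deciding which bits constitute the record in the first place.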

One of the InterPARES 2 case studies I researched for my paper was the Preservation of the City of Vancouver GIS database (aka VanMap). Via a series of emails exchanged with the very helpful Evelyn McLellan (who is working on the case study) I learned that the InterPARES 2 researchers concluded that the entire VanMap system is a single record. This decision was based on the requirement of ‘archival bond’ to be present in order for a record to exist. I have included my two favorite definitions of archival bond from the InterPARES 2 dictionary below:

archival bond
n., The network of relationships that each record has with the records belonging in the same aggregation (file, series, fonds). [Archives]

n., The originary, necessary and determined web of relationships that each record has at the moment at which it is made or received with the records that belong in the same aggregation. It is an incremental relationship which begins when a record is first connected to another in the course of action (e.g., a letter requesting information is linked by an archival bond to the draft or copy of the record replying to it, and filed with it. The one gives meaning to the other). [Archives]

I especially appreciate the second definition above because its example gives me a better sense of what is meant by ‘archival bond’ – though I need to do more reading on this to get a better grasp of its importance.

Given the usage of VanMap by public officials and others, you can imagine that the state of the data at any specific time is crucial to determining the information used for making key decisions. Since a map may be created on the fly using multiple GIS layers but never saved or printed – it is only the knowledge that someone looked at the information at a particular time that would permit those down the road to look through the eyes of the decision makers of the past. Members of the VanMap team are now working with the Sustainable Archives & Library Technologies (SALT) lab at the San Diego Supercomputer Center (SDSC) to use data grid technology to permit capturing the changes to VanMap data over time. My understanding is that a proof of concept has been completed that shows how data from a specific date can be reconstructed.
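A hedged sketch of the general idea (not the actual SDSC data grid implementation): if every saved version of every layer is kept along with its date, then reconstructing what the map showed on a given day reduces to finding, for each layer, the latest version at or before that date. All names and data here are invented for illustration, and versions are assumed to be saved in chronological order:

```python
# Point-in-time reconstruction sketch: keep dated versions of each
# layer, answer "what did this layer look like on date X?" by picking
# the most recent version at or before X.
from bisect import bisect_right
from datetime import date

class VersionedLayer:
    def __init__(self, name):
        self.name = name
        self._dates = []     # save dates, kept in chronological order
        self._versions = []  # layer data corresponding to each date

    def save(self, as_of, data):
        """Record a new version of the layer (assumed chronological)."""
        self._dates.append(as_of)
        self._versions.append(data)

    def as_of(self, query_date):
        """Latest version at or before query_date, or None if none exists."""
        i = bisect_right(self._dates, query_date)
        return self._versions[i - 1] if i else None

streets = VersionedLayer("streets")
streets.save(date(2005, 1, 1), "streets-v1")
streets.save(date(2006, 6, 1), "streets-v2")

# What a decision-maker consulting the map in March 2006 would have seen:
print(streets.as_of(date(2006, 3, 15)))  # -> streets-v1
```

Of course the hard part in practice is capturing the versions at all – and knowing that someone actually looked at a given combination of layers on a given day.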

In contrast with this approach we can consider what is being done to preserve GIS data by the Archivist of Maine in the Maine GeoArchives. In his presentation titled “Managing GIS in the Digital Archives”, delivered at the 2006 Joint Annual Meeting of NAGARA, COSA, and SAA on August 3, 2006, Jim Henderson explained their approach of appraising individual layers to determine if they should be accessioned into the archives. If it is determined that a layer should be preserved, then issues of frequency of data capture are addressed. They have chosen a pragmatic approach and are currently putting these practices to the test in the real world in an ambitious attempt to prevent data loss as quickly as is feasible.

My background is as a database designer and developer in the software industry. In my database life, a record is usually a row in a database table – but when designing a database using Entity-Relationship Modeling (and I will admit I am of the “Crow’s Feet” notation school and still get a smile on my face when I see the cover of the CASE*Method: Entity Relationship Modelling book) I have spent a lot of time translating what would have been a single ‘paper record’ into the combination of rows from many tables.

The current system I am working on includes information concerning legal contracts. Each of these exists as a single paper document outside the computers – but in our system we distribute the information that is needed to ‘rebuild’ the contract into many different tables. One for contact information – one for standard clauses added to all the contracts of this type – another set of tables for defining financial formulas associated with the contract. If I then put on my archivist hat and didn’t just choose to keep the paper agreement, I would of course draw my line around all the different records needed to rebuild the full contract. I see that there is a similar definition listed as the second definition on the InterPARES 2 Terminology Dictionary for the term ‘Record’:

n., In data processing, a grouping of interrelated data elements forming the basic unit of a file. A Glossary of Archival and Records Terminology (The Society of American Archivists)
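A small sketch of the contract example above using SQLite (every table and column name here is invented for illustration): the logical record is reassembled by drawing a line around all the rows, in every table, that share the contract’s key:

```python
# One logical "contract record" scattered across several tables;
# rebuilding it means gathering every row keyed to the same contract.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contracts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE contacts  (contract_id INTEGER, name TEXT);
CREATE TABLE clauses   (contract_id INTEGER, text TEXT);

INSERT INTO contracts VALUES (1, 'Service Agreement');
INSERT INTO contacts  VALUES (1, 'A. Archivist');
INSERT INTO clauses   VALUES (1, 'Standard confidentiality clause');
""")

def rebuild_contract(contract_id):
    """Gather all rows needed to reconstruct one logical contract."""
    cur = conn.cursor()
    title = cur.execute(
        "SELECT title FROM contracts WHERE id = ?", (contract_id,)
    ).fetchone()[0]
    contacts = [r[0] for r in cur.execute(
        "SELECT name FROM contacts WHERE contract_id = ?", (contract_id,))]
    clauses = [r[0] for r in cur.execute(
        "SELECT text FROM clauses WHERE contract_id = ?", (contract_id,))]
    return {"title": title, "contacts": contacts, "clauses": clauses}

print(rebuild_contract(1))
```

The function is, in effect, the ‘line drawn around’ the record – which is exactly the judgment call an archivist has to make for a GIS.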

Just in this brief survey we can see three very different possible views on where to draw a line around what constitutes a single Geographic Information System electronic record. Is it the entire database, a single GIS layer or some set of data elements which create a logical record? Is it worthwhile trying to contrast the definition of a GIS record with the definition of a record when considering analog paper maps? I think the answer to all of these questions is ‘sometimes’.

What is especially interesting about coming up with standard approaches to archiving GIS data is that I don’t believe there is one answer. Saying ‘GIS data’ is about as precise as saying ‘database record’ or ‘entity’ – it could mean anything. There might be a best answer for collaborative online atlases… and another best answer for a state-government-managed geographic information library… and yet another best answer for corporations dependent on GIS data for doing their business.

I suspect that it is thorough analysis of the information stored in a GIS – how it is/was created, how often it changes and how it was used – that will determine the right approach for archiving these born-digital records. There are many archivists (and IT folks and map librarians and records managers) around the world who have a strong sense of panic over the imminent loss of geospatial data. As a result, people from many fields are trying different approaches to stem the loss. It will be interesting to consider these varying approaches (and their varying levels of success) over the next few years. We can only hope that a few best practices will rise to the top quickly enough that we can ensure access to vital geospatial records in the future.

DMCA Exemption Added That Supports Archivists

The Digital Millennium Copyright Act, aka DMCA (which made it illegal to create or distribute technology that can get around copyright protection technology), had six new classes of exemptions added today.

From the very long named Rulemaking on Exemptions from Prohibition on Circumvention of Technological Measures that Control Access to Copyrighted Works out of the U.S. Copyright Office (part of the Library of Congress) comes the addition of the following class of work that will not be “subject to the prohibition against circumventing access controls”:

Computer programs and video games distributed in formats that have become obsolete and that require the original media or hardware as a condition of access, when circumvention is accomplished for the purpose of preservation or archival reproduction of published digital works by a library or archive. A format shall be considered obsolete if the machine or system necessary to render perceptible a work stored in that format is no longer manufactured or is no longer reasonably available in the commercial marketplace.

This exemption remains valid from November 27, 2006 through October 27, 2009. Hmm… three years? So what happens if this expires and doesn’t get extended (though one would imagine by then either we will have a better answer to this sort of problem OR the problem will be even worse than it is now)? When you look at the fact that places like NARA have fabulous mission statements for their Electronic Records Archives with phrases like “for the life of the republic” in them – three years sounds pretty paltry.

That said, how interesting to have archivists highlighted as beneficiaries of new legal rules. So now it will be legal (or at least not punishable under the DMCA) to create and share programs to access records created by obsolete software. I don’t know enough about the world of copyright and obsolete software to be clear on how much this REALLY changes what places like NARA’s ERA and other archives pondering the electronic records problem are doing, but clearly this exemption can only validate a lot of work that needs to be done.

129th anniversary of Thomas Edison’s Invention of the Phonograph

Phonograph Patent Drawing
Phonograph Patent Drawing by T.A. Edison. May 18, 1880. RG 241. Patent #227,679

In honor of today’s 129th anniversary of Thomas Edison’s announcement of his invention of the phonograph, I thought I would share an idea that came to me this past summer. I had the pleasure of taking a course on Visual and Sound Materials taught by Tom Connors, the curator of the National Public Broadcasting Archives. This course explored the history of audio recording, photography, film and broadcasting technology.

When explaining the details of the first phonographs, Prof. Connors mentioned that certain sounds recorded better. Recordings of horns and the pitch of tenor singers were reproduced most accurately – or at least played back with the best sound. We also talked about the change in access to music brought about eventually by the availability of records at the corner store. The most popular recordings were (not surprisingly) of music with lots of horns or the recordings of individual singers like Enrico Caruso. So my question is: how might music have evolved differently if different music had sounded better when reproduced by the phonograph? Would Caruso have been replaced at the top of the heap by someone else with a different vocal range? Would jazz music have evolved differently? Would there have been other types of music altogether if string instruments or wind instruments reproduced as well as the bright-sounding horns?

In our class we also discussed the impact of the introduction of long playing records. Suddenly you could have 30 minutes of music at a time – with no need to have anyone playing the piano or hovering over the phonograph to change the disk. This led to the movement of music into the background of daily life – in contrast with the earlier focus on playing live music for entertainment in people’s homes. It also paved the way for people to experience music alone – you no longer needed to be in the same room as the musicians. No longer was music exclusively something shared and witnessed in a group. In my opinion this was the start of the long path that led to the possibility of having your own personal ‘sound track’ via first the walkman and now the digital audio player such as the iPod.

These ideas are still about archives and research. From my point of view it is just another example of how a different kind of context can impact our understanding of history. There are so many ways in which little events can impact the big picture. Edison wasn’t pursuing a dream of access to music (though that was included on his list of possible uses for the phonograph) – he was more interested in dictation, audio books for the blind and recording the last words of the soon to be dearly departed.

I love having the ability to examine the original ideas and intentions of an inventor and it came as no surprise to me that some of the most interesting resources out there for learning more about Edison and his invention of the phonograph traced back to both the Library of Congress and the U.S. National Archives and Records Administration. The LOC’s American Memory project page for The Motion Pictures and Sound Recordings of the Edison Companies gives a wide range of access to both background information and the option to listen to early Edison recordings. NARA’s page for the digital image above (originally found in Wikipedia) can be found online via NARA’s Archival Research Catalog (ARC) by searching for ‘Edison Phonograph’.

Hurrah for the invention of the phonograph and for all the archives that keep information for us to use in exploring ideas! Listen for horns and tenor voices in the next song you hear – and notice if you are listening alone or with a group.

A final question: how can providing easy access to more big picture historical context help users to understand how the records they examine fit into the complicated real world of long ago?