The Edges of the GIS Electronic Record

I spent a good chunk of the end of my fall semester writing a paper ultimately titled “Digital Geospatial Records: Challenges of Selection and Appraisal”. I learned a lot – especially with the help of archivists out there on the cutting edge who are trying to find answers to these problems. I plan on a number of posts with various ideas from my paper.

To start off, I want to consider the topic of defining the electronic record in the context of GIS. One of the things I found most interesting in my research was the fact that defining exactly what a single electronic record consists of is perhaps one of the most challenging steps.

If we start with the SAA’s glossary definition of the term ‘record’ we find the statement that “A record has fixed content, structure, and context.” The notes go on to explain:

Fixity is the quality of content being stable and resisting change. To preserve memory effectively, record content must be consistent over time. Records made on mutable media, such as electronic records, must be managed so that it is possible to demonstrate that the content has not degraded or been altered. A record may be fixed without being static. A computer program may allow a user to analyze and view data many different ways. A database itself may be considered a record if the underlying data is fixed and the same analysis and resulting view remain the same over time.

This idea presents some major challenges when you consider data that does not seem ‘fixed’. In the fast moving and collaborative world of the internet, Geographic Information Systems are changing over time – but the changes themselves are important. We no longer live in a world in which the way you access a GIS is via a CD which has a specific static version of the map data you are considering.

One of the InterPARES 2 case studies I researched for my paper was the Preservation of the City of Vancouver GIS database (aka VanMap). Via a series of emails exchanged with the very helpful Evelyn McLellan (who is working on the case study) I learned that the InterPARES 2 researchers concluded that the entire VanMap system is a single record. This decision was based on the requirement of ‘archival bond’ to be present in order for a record to exist. I have included my two favorite definitions of archival bond from the InterPARES 2 dictionary below:

archival bond
n., The network of relationships that each record has with the records belonging in the same aggregation (file, series, fonds). [Archives]

n., The originary, necessary and determined web of relationships that each record has at the moment at which it is made or received with the records that belong in the same aggregation. It is an incremental relationship which begins when a record is first connected to another in the course of action (e.g., a letter requesting information is linked by an archival bond to the draft or copy of the record replying to it, and filed with it. The one gives meaning to the other). [Archives]

I especially appreciate the second definition above because it’s example gives me a better sense of what is meant by ‘archival bond’ – though I need to do more reading on this to get a better grasp of it’s importance.

Given the usage of VanMap by public officials and others, you can imagine that the state of the data at any specific time is crucial to determining the information used for making key decisions. Since a map may be created on the fly using multiple GIS layers but never saved or printed – it is only the knowledge that someone looked at the information at a particular time that would permit those down the road to look through the eyes of the decision makers of the past. Members of the VanMap team are now working with the Sustainable Archives & Library Technologies (SALT) lab at the San Diego Supercomputer Center (SDSC) to use data grid technology to permit capturing the changes to VanMap data over time. My understanding is that a proof of concept has been completed that shows how data from a specific date can be reconstructed.

In contrast with this approach we can consider what is being done to preserve GIS data by the Archivist of Maine in the Maine GeoArchives. In his presentation titled “Managing GIS in the Digital Archives” delivered at the 2006: Joint Annual Meeting of NAGARA, COSA, and SAA on August 3, 2006, Jim Henderson explained their approach of appraising individual layers to determine if they should be accessioned in the archive. If it is determined that the layer should be preserved, then issues of frequency of data capture are addressed. They have chosen a pragmatic approach and are currently putting these practices to the test in the real world in an ambitious attempt to prevent data loss as quickly as is feasible.

My background is as a database designer and developer in the software industry. In my database life, a record is usually a row in a database table – but when designing a database using Entity-Relationship Modeling (and I will admit I am of the “Crow’s Feet” notation school and still get a smile on my face when I see the cover of the CASE*Method: Entity Relationship Modelling book) I have spent a lot of time translating what would have been a single ‘paper record’ into the combination of rows from many tables.

The current system I am working on includes information concerning legal contracts. Each of these exists as a single paper document outside the computers – but in our system we distribute information that is needed to ‘rebuild’ the contract into many different tables. One for contact information – one for standard clauses added to all the contracts of this type – another set of tables for defining financial formulas associated with the contract. If I then put on my archivist hat and I didn’t just choose to keep the paper agreement, I would of course draw my line around all these different records needed to rebuild the full contract. I see that there is a similar definition listed as the second definition on the InterPARES 2 Terminology Dictionary for the term ‘Record‘:

n., In data processing, a grouping of interrelated data elements forming the basic unit of a file. A Glossary of Archival and Records Terminology (The Society of American Archivists)

Just in this brief survey we can see three very different possible views on where to draw a line around what constitutes a single Geographic Information System electronic record. Is it the entire database, a single GIS layer or some set of data elements which create a logical record? Is it worthwhile trying to contrast the definition of a GIS record with the definition of a record when considering analog paper maps? I think the answer to all of these questions is ‘sometimes’.

What is especially interesting about coming up with standard approaches to archiving GIS data is that I don’t believe there is one answer. Saying ‘GIS data’ is about as precise as saying ‘database record’ or ‘entity’ – it could mean anything. There might be a best answer for collaborative online atlases.. and another best answer for state government managed geographic information library.. and yet another best answer for corporations dependent on GIS data for doing their business.

I suspect that it will be via thorough analysis of the information stored in a GIS system, how it is/was created, how often it changes and how it was used that will determine the right approach for archiving these born digital records. There are many archivists (and IT folks and map librarians and records managers) around the world who have a strong sense of panic over the imminent loss of geospatial data. As a result, people from many fields are trying different approaches to stem the loss. It will be interesting to consider these varying approaches (and their varying levels of success) over the next few years. We can only hope that a few best practices will rise to the top quickly enough that we can ensure access to vital geospatial records in the future.

The Edges of the GIS Electronic Record

3 Comments