
Category: SAA2006

SAA 2006 Session 103: “X” Marks the Spot: Archiving GIS Databases – Part III

Opening with the famous Hitchhiker’s Guide to the Galaxy quote “Don’t Panic!”, James Henderson of the Maine State Archives gave an overview of how they have approached archiving GIS data in his presentation “Managing GIS in the Digital Archives” (the third presentation of the ‘X Marks the Spot’ panel). His basic point was that there is no time to wait for the perfect alignment of resources and research: GIS data is being lost every day, so they had to do what they could, as soon as possible, to stop the loss.

Goals: preserve the permanently valuable State of Maine official records that are in digital form (both born-digital records and those digitized for access) and provide continuing digital access to these records.

A billion dollars has been spent creating these records over 15 years, but nothing was being done to preserve them. Agencies overwrite or delete GIS data as the information in their live systems is updated, for example with new road names.

At Camp Pitt in 1999 they created a digital records management plan, but it took a long time to get to the point where they were given the money, time and opportunity to put it into action.

Overall Strategy for archiving digital records:

  • Born Digital: GIS & Email
  • Digitized Analog: Media (paper, film, analog tape)
  • For access: researchers, agencies, Archives staff

A lawsuit against the state caused enough panic at the state level to make the people ‘in charge’ see that email needed to be preserved, organized and accessible.

Some points:

  • What is everyone doing across the state?
  • Keep both the native format (whatever folks have already done) and an archival format in XML
  • Digitize from microfilm (sent out to be done)
  • Create another ‘access format’
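The ‘keep the native format plus an archival XML version’ idea could be sketched in Python as below. This is only an illustration under assumptions: the function and element names are hypothetical, the SHA-256 fixity value is an added assumption, and it is not the Maine State Archives’ actual tooling. The native file is copied byte-for-byte, and a companion XML record describes it and stores a checksum so that later migrations can be verified. Converting the layer data itself into XML (for example GML) is format-specific and left out.

```python
import hashlib
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_record(native: Path, dest: Path, title: str, creator: str) -> Path:
    """Copy the native file unchanged and write a companion XML archival record.

    Hypothetical layout: dest/<name> is the untouched native file and
    dest/<name>.xml holds descriptive metadata plus a fixity value.
    """
    dest.mkdir(parents=True, exist_ok=True)
    native_copy = dest / native.name
    shutil.copy2(native, native_copy)  # keep the native format as-is

    record = ET.Element("archivalRecord")  # element names are illustrative only
    ET.SubElement(record, "title").text = title
    ET.SubElement(record, "creator").text = creator
    ET.SubElement(record, "nativeFile").text = native.name
    fixity = ET.SubElement(record, "fixity", algorithm="SHA-256")
    fixity.text = sha256_of(native_copy)

    xml_path = dest / (native.name + ".xml")
    ET.ElementTree(record).write(xml_path, encoding="utf-8", xml_declaration=True)
    return xml_path
```

Recomputing the checksum after any future copy or migration and comparing it with the stored value is what makes the XML record useful beyond description.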

GeoArchives (a special case of the general approaches diagrammed above)

  • stop the loss (road name changes, etc.)
  • create a prototype for others to use
  • a model for others to critique, improve and apply

Scope: fairly limited

  • preservation: data (layers, images) in GeoLibrary (forced in by legislation – agencies MUST offer data to GeoLibrary)
  • access: use the existing GeoLibrary
  • compare layer status (boundaries, roads) at any historical time
  • overlay different layers (boundaries 2005, roads 2010)

The GeoArchives diagram is based on the NARA ERA (Electronic Records Archives) diagram, and it fit into the ERA diagram very well.

Project team: a true collaboration. They pulled in people from the GeoLibrary who were enthusiastic about, and supportive of, central IT GIS changes.

Used a survey to find out what data people wanted.

Created crosswalks with Dublin Core, MARC 21 and FGDC
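A crosswalk is essentially a field-to-field mapping between metadata schemes. The Python sketch below maps a handful of FGDC CSDGM elements to Dublin Core; the mapping is an illustrative subset chosen for this example, not the crosswalk the Maine team actually built, and MARC 21 is left out to keep it short.

```python
import xml.etree.ElementTree as ET

# Illustrative subset: FGDC CSDGM element path -> Dublin Core element
FGDC_TO_DC = {
    "idinfo/citation/citeinfo/title": "title",
    "idinfo/citation/citeinfo/origin": "creator",
    "idinfo/citation/citeinfo/pubdate": "date",
    "idinfo/descript/abstract": "description",
    "idinfo/keywords/theme/themekey": "subject",
}
BOUNDS = ("westbc", "eastbc", "northbc", "southbc")


def fgdc_to_dc(fgdc_xml: str) -> dict[str, list[str]]:
    """Map an FGDC record (root element <metadata>) to repeatable DC values."""
    root = ET.fromstring(fgdc_xml)
    dc: dict[str, list[str]] = {}
    for path, element in FGDC_TO_DC.items():
        for node in root.findall(path):
            if node.text and node.text.strip():
                dc.setdefault(element, []).append(node.text.strip())
    # Flatten the bounding box into a single dc:coverage value
    bbox = root.find("idinfo/spdom/bounding")
    if bbox is not None:
        parts = [f"{tag}={bbox.findtext(tag, '').strip()}" for tag in BOUNDS]
        dc.setdefault("coverage", []).append("; ".join(parts))
    return dc
```

Keeping the mapping in a plain dictionary means a second crosswalk (to MARC 21, say) is just another table applied to the same record.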

Functional requirements: there is a lot of related information (who created this data? where did it come from?), and it needs to be linked to the related layers.

Appraisal happens at the data layer level, rather than digging into a layer to keep some of its data and not the rest.

There are about 100 layers, so appraisal by hand is doable (though automation would be nice, and might be required after the next ‘gift’).

The current plan is to embed archival records in the systems holding critical operational records, so that the archival records will be migrated along with the other layers. For now, they export to XML.


  • communications with IT to keep the process going
  • documentation of applications
  • documentation of servers
  • security?
  • Metadata for layers must be complete and consistent with the GeoArchives manual
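The last item, complete and consistent layer metadata, is easy to check automatically. A minimal Python sketch, assuming the metadata is FGDC CSDGM XML; the list of required elements is hypothetical, since the GeoArchives manual’s actual requirements are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical required elements; the real manual defines its own list.
REQUIRED = [
    "idinfo/citation/citeinfo/title",
    "idinfo/citation/citeinfo/origin",
    "idinfo/citation/citeinfo/pubdate",
    "idinfo/descript/abstract",
    "idinfo/spdom/bounding/westbc",
    "metainfo/metd",  # metadata date
]


def missing_elements(fgdc_xml: str, required: list[str] = REQUIRED) -> list[str]:
    """Return the required element paths that are absent or empty."""
    root = ET.fromstring(fgdc_xml)
    missing = []
    for path in required:
        node = root.find(path)
        if node is None or not (node.text or "").strip():
            missing.append(path)
    return missing
```

Run over all layers before accession, an empty result for a layer means it passes this particular check.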

For more information – see

UPDATE: This link originally did not work, but I finally got around to finding the right fix for it!

SAA 2006 Session 103: “X” Marks the Spot: Archiving GIS Databases – Part II

Richard Marciano of the SALT interdisciplinary lab (Sustainable Archives & Library Technologies) at the San Diego Supercomputer Center delivered a presentation titled “Research Issues Related to Preservation of Geospatial Electronic Records” – the 2nd topic in the ‘X’ Marks the Spot session.

He focuses on research issues related to the preservation of geospatial electronic records. While not an archivist, he is a member of SAA. As a person coming to archival studies with a strong background in software development, I took great comfort in his discussion of there being a great future for IT and archivists working together on topics such as this.

Richard gave us a great overview of the most recent work being done in this field, along with a snapshot of the up-and-coming projects on the horizon. If I had to pick one main point to emphasize, it would be that IT can provide the infrastructure to automate much of what is now being done by hand, but there is a long way to go to achieve this dream, and it will require extensive collaboration between archivists (with the experience of how things should be done) and the IT community (with the technical expertise to build the systems needed). His presentation was definitely more organized than my laundry list below; please do not take my notes as an indication of the flow of his talk.

NHPRC Electronic Records/GIS projects:

  • CIESIN at Columbia University
  • Maine GeoArchives, Maine State Archives (see Part III of the Session 103 posts for details on the Maine GeoArchives)
  • eLegacy (State of California & SDSC): appraisal, accessioning and preservation of California’s geospatial records. Starting in 2006
  • InterPARES VanMAP (2005): case study of the City of Vancouver GIS database

More IT related projects:

  • Archivists’ Workbench (2000): methodologies for the long-term preservation of, and access to, software-dependent electronic records. Includes tools for GIS
  • ICAP (2003): change management
  • PAT (2004): persistent archives testbed; the Michigan precinct voting records, spatial data ingestion

SDSC has a goal of infrastructure independence: they want to keep data and move it easily over time. Their current preferred approach uses data grids (see The American Archivist, Volume 69, Number 1: “Building Preservation Environments with Data Grid Technology” by Reagan W. Moore), which depend on the dual goals of data virtualization and trust virtualization. He recommended the SAA Electronic Records Section meeting on Friday from 12 to 2 for good related presentations.

CIESIN at Columbia University
Common types of data loss:

  • loss of non-archived data
  • historical versions of data

North Carolina Geospatial Data Archiving Project (Steve Morris): instead of solving problems, it actually creates further complications. Complex databases can be difficult to manage over time due to complex data models and the challenges of proprietary database models, and they can have MANY levels of individual datasets or data layers.

e-Legacy – working from the California State Archives
July 2006 – July 2008
The staff is a mix of California State Archives staff and members of SDSC. They are using data grid technology to build a distributed community grid. Distributed storage permits addition of storage arbitrarily and in multiple locations.
Infrastructure is being deployed across multiple offices and the SDSC.
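The distributed-storage idea rests on data virtualization: records are known by stable logical names, while the physical copies can live at any number of sites and move over time. The toy Python catalog below illustrates only that concept; it is not the interface of SDSC’s data grid software, and every class and method name in it is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    site: str  # storage location, e.g. an agency office or SDSC
    path: str  # physical path at that site


@dataclass
class LogicalCatalog:
    """Toy catalog mapping stable logical names to physical replicas."""
    entries: dict[str, list[Replica]] = field(default_factory=dict)
    sites: set[str] = field(default_factory=set)

    def add_site(self, site: str) -> None:
        # Storage can be added at any time without renaming any record
        self.sites.add(site)

    def register(self, logical_name: str, site: str, path: str) -> None:
        if site not in self.sites:
            raise KeyError(f"unknown storage site: {site}")
        self.entries.setdefault(logical_name, []).append(Replica(site, path))

    def replicas(self, logical_name: str) -> list[Replica]:
        return list(self.entries.get(logical_name, []))

    def retire_site(self, site: str) -> None:
        """Drop a site; a logical name survives as long as another copy exists."""
        self.sites.discard(site)
        for name in list(self.entries):
            self.entries[name] = [r for r in self.entries[name] if r.site != site]
```

Because lookups always go through the logical name, adding a new site or retiring an old one never changes how a record is found.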

InterPARES VanMAP (University of British Columbia)
A big city centralized enterprise GIS system
Questions for the case study: What are the records? Where are the records? What do they look like from the point of view of the city users?
What infrastructure would you need to do a historical query, to see what the city looked like on a specific date in the past? Current enterprise systems are meant to be a snapshot of the present, with nothing in place to support storage of past records.

How did they approach this? They gathered representative data sets, put all the historical data layers into a ‘dark archive’ repository and built a proof of concept: you enter a date, the correct layers are retrieved from the archive system, and they are rendered on the fly to show the closest possible version of the historical map.
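The date request boils down to choosing, for each layer, the most recent archived version on or before the requested date. A minimal Python sketch, assuming each layer’s versions are stored with an effective date; the data layout and function name are assumptions, not VanMAP’s design, and rendering is left out.

```python
import bisect
from datetime import date


def layers_as_of(archive: dict[str, list[tuple[date, str]]], when: date) -> dict[str, str]:
    """For each layer, return the archived version in effect on `when`.

    `archive` maps a layer name to (effective_date, version_id) pairs.
    The version in effect is the latest one dated on or before `when`;
    layers with no version yet on that date are left out.
    """
    result = {}
    for layer, versions in archive.items():
        versions = sorted(versions)
        dates = [d for d, _ in versions]
        i = bisect.bisect_right(dates, when)  # count of versions dated <= when
        if i:
            result[layer] = versions[i - 1][1]
    return result
```

Choosing "on or before" rather than the nearest date in either direction is a design choice of this sketch: it never shows a layer version that did not exist yet on the requested date.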

There is a list of 30 or so questions that are part of evaluating the system.

ICAP: preserving and using temporal and multi-versions of records
Keep track of versions of records. Being aware of a timeline of records and being able to ask significant historical questions of those records.

They took multiple time slices and automatically created an XML database from the records in those slices, combining XML database and spatial querying.
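Tracking record versions across time slices can be sketched as follows: each record gets one version element per distinct value, stamped with the first and last slice in which that value was seen. This is a hypothetical Python illustration (element and attribute names are invented here), not ICAP’s actual schema, and ElementTree’s limited XPath stands in for a real XML database.

```python
import xml.etree.ElementTree as ET


def build_version_xml(slices: dict[str, dict[str, str]]) -> ET.Element:
    """Merge time slices into one XML tree with a version history per record.

    `slices` maps a slice label (ISO date string, so it sorts chronologically)
    to {record_id: value}. A new <version> starts only when a value changes.
    In this sketch, a record missing from a slice is not treated as deleted.
    """
    root = ET.Element("records")
    latest: dict[str, ET.Element] = {}   # record id -> its current <version>
    records: dict[str, ET.Element] = {}  # record id -> its <record> element
    for when in sorted(slices):
        for rid, value in slices[when].items():
            current = latest.get(rid)
            if current is not None and current.text == value:
                current.set("lastSeen", when)  # unchanged: extend this version
                continue
            if rid not in records:
                records[rid] = ET.SubElement(root, "record", id=rid)
            version = ET.SubElement(records[rid], "version", firstSeen=when, lastSeen=when)
            version.text = value
            latest[rid] = version
    return root
```

A record’s history can then be read back with a query such as `root.findall("record[@id='seg-18']/version")`.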

PAT Testbed
Creating a joint consortium model for managing records across state boundaries. Distributed framework with local ‘Grid Block’ at each location. Local Storage Resources manage and populate their local resources.
Goal: how do we automate archival processes

Michigan Department of Community: preserving and accessing Michigan historical voting records. They created a MySQL database with the records and did automatic scrubbing and validation of the records based on rules. Because it uses GIS, the system can display maps with the data shown, such as red/blue voting statistics by county, and the viewer permits looking at maps by election year.
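Rule-based scrubbing and validation might look something like the sketch below. It uses Python’s built-in sqlite3 in place of MySQL so it runs anywhere; the column names, the year range and the rules themselves are hypothetical, not the project’s actual rules. The last function shows the kind of per-county query that could drive a red/blue map.

```python
import sqlite3

FIRST_YEAR, LAST_YEAR = 1900, 2006  # hypothetical range covered by the collection

# Each rule returns an error message, or None if the row passes.
RULES = [
    lambda r: None if r["county"] else "missing county",
    lambda r: None if FIRST_YEAR <= r["year"] <= LAST_YEAR else "year out of range",
    lambda r: None if r["dem"] >= 0 and r["rep"] >= 0 else "negative vote count",
]


def scrub(raw: dict) -> dict:
    """Normalize one raw row; assumes the numeric fields are present and parseable."""
    return {
        "county": (raw.get("county") or "").strip().title(),
        "year": int(str(raw["year"]).strip()),
        "dem": int(str(raw["dem"]).replace(",", "")),
        "rep": int(str(raw["rep"]).replace(",", "")),
    }


def load(rows: list[dict], conn: sqlite3.Connection) -> list[tuple[int, str]]:
    """Scrub and validate rows, insert the valid ones, return (row index, error) pairs."""
    conn.execute("CREATE TABLE IF NOT EXISTS returns "
                 "(county TEXT, year INTEGER, dem INTEGER, rep INTEGER)")
    errors = []
    for i, raw in enumerate(rows):
        row = scrub(raw)
        problems = [msg for rule in RULES if (msg := rule(row)) is not None]
        if problems:
            errors.extend((i, msg) for msg in problems)
        else:
            conn.execute("INSERT INTO returns VALUES (:county, :year, :dem, :rep)", row)
    conn.commit()
    return errors


def county_colors(conn: sqlite3.Connection, year: int) -> dict[str, str]:
    """Label each county red, blue or tie for one election year."""
    cur = conn.execute(
        "SELECT county, SUM(dem), SUM(rep) FROM returns WHERE year = ? GROUP BY county",
        (year,),
    )
    return {c: "blue" if d > r else "red" if r > d else "tie" for c, d, r in cur}
```

Keeping the rules as a list of small functions makes it easy for archivists to review, add or retire individual checks without touching the loading code.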

In response to a question, he talked about an aspect of the PAT project that looked at 401 Certification permits (related to water). They digitized all the historical records within a watershed and delivered them back to the state agency, integrating the government processes so that staff could ask good questions about the permits and the related locations (upstream or downstream).

SAA 2006 Session 103: “X” Marks the Spot: Archiving GIS Databases – Part I

‘X’ Marks the Spot was a fantastic first session for me at the SAA conference. I have had a fascination with GIS (Geographic Information Systems) for a long time. I love the layers of information. I love the fact that you can represent information in a way that often makes you realize new things just from seeing it on a map.

Since my write-up of each panelist is fairly long, I will put each in a separate post.

Helen Wong Smith, from the Kamehameha Schools, started off the panel discussing her work on the Land Legacy Database in her presentation titled “Wahi Kupuna: Digitized Cultural Resources Database with GIS Access”.

Kamehameha Schools (KS) was founded by the will of Princess Bernice Pauahi Bishop. With approximately 360,000 acres, KS is the largest private landowner in the state of Hawaii. With over $7 billion in assets, KS subsidizes a significant portion of the cost of educating every student in its K-12 schools (parents pay only 10% of the cost).

KS generates income from residential, commercial and resort leases. In addition to generating income, a lot of the land has a strong cultural connection. Helen was charged with empowering the land management staff to apply 5 values every time there is any type of land transaction: Economic, Educational, Cultural, Environmental and Community. They realized that they had to know about the lands they own. For example, if they take a parcel back from a long lease and are going to re-lease it, they need to know about the land. Does it have archaeological sites? Is it a special place to the Hawaiian people?

Requirements for the GIS enabled system:

  • Find the information
  • Keep it all in one place
  • Ability to export and import from other standard-based databases (MARC, Dublin Core, Open Archives Initiative)
  • Some information is private – not to be shared with public
  • GIS info
  • Digitize all text and images
  • Identify by Tax map keys (TMK)
  • Identify by ‘traditional place name’
  • Identify by ‘common names’: surfer-invented names (her favorite examples are ‘suicides’ and ‘leftovers’)
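Finding a parcel by any of its identifiers (TMK, traditional place name or surfer-invented common name) is essentially a many-to-one index. Here is a hypothetical Python sketch of that idea; it is not how Greenstone or the Land Legacy Database implements it, and the normalization choices (Unicode NFC, case-folding, treating typed apostrophes as the ʻokina rather than dropping it) are assumptions.

```python
import unicodedata
from collections import defaultdict

OKINA = "\u02bb"  # ʻokina


def normalize(name: str) -> str:
    """Normalize a name for lookup: NFC, typed apostrophes -> ʻokina, case-folded."""
    name = unicodedata.normalize("NFC", name)
    for mark in ("'", "\u2018", "\u2019"):
        name = name.replace(mark, OKINA)
    return " ".join(name.casefold().split())


class PlaceIndex:
    """Many-to-one index: any known name or TMK -> set of TMKs."""

    def __init__(self) -> None:
        self._by_name: dict[str, set[str]] = defaultdict(set)

    def add(self, tmk: str, traditional: list[str], common: list[str]) -> None:
        # Register the TMK itself plus every traditional and common name
        for name in (tmk, *traditional, *common):
            self._by_name[normalize(name)].add(tmk)

    def lookup(self, name: str) -> set[str]:
        # .get() avoids creating empty entries for unknown names
        return set(self._by_name.get(normalize(name), ()))
```

Returning a set allows for a name that covers more than one parcel, which is likely for traditional place names that describe larger areas.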

The final system would enforce the following security:

  • Lowest – material from public repositories, e.g. the Hawaii State Archives
  • Medium – material for which we’ve acquired the usage rights for limited use
  • Highest – leases and archaeological reports
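The three tiers behave like ordered clearance levels: a user cleared for a level can see everything at or below it. A minimal Python sketch under that assumption (the class and function names are hypothetical; the talk did not say how the Land Legacy Database enforces its levels):

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    LOWEST = 1   # material from public repositories
    MEDIUM = 2   # material with usage rights for limited use
    HIGHEST = 3  # leases and archaeological reports


@dataclass(frozen=True)
class Item:
    title: str
    level: Level


def visible(items: list[Item], clearance: Level) -> list[Item]:
    """Return the items a user with the given clearance may see."""
    return [item for item in items if item.level <= clearance]
```

Opening the lowest level to the public would then amount to serving `visible(items, Level.LOWEST)` outside the firewall.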

Currently the Land Legacy Database is only available inside the firewall, but eventually material at the lowest security level will be made public.
They already had a web GIS portal and needed the new system to hook into it. The system also needed to collect and disseminate data, images, audio/visual clips and references in all formats. In addition, the land managers needed an easy way to access information from the field, such as lease agreements or archaeological reports (native burials? their location and who they were).

Helen selected Greenstone – open source software (from New Zealand) for the following reasons:

  • open source
  • multilingual (deals with glottals and other issues with spelling in the Hawaiian language)
  • GNU General Public License
  • Software for building and distributing digital library collections
  • New way of organizing information
  • Publish it on the internet and CD-ROM
  • many ways of access including by Search, Titles and Genres
  • support for audio and video clips (Example – Felix E. Grant Collection).

The project started with 60,000 TIFF records (viewable as JPEGs), pre-scanned and indexed by another person. Each of these ‘Claim documents’ includes a testimony and a register. It is crucial to reproduce the original primary resources to prevent confusion, such as can occur between place names and people’s names.

Helen showed an example from another Greenstone database of newspaper articles published in a new Hawaiian journal. It was displayed in 3 columns, one each for:

  • original Hawaiian-language newspaper as published
  • the text including the diacriticals
  • English translation

OCR would be a major challenge with these documents – so it isn’t being used.

Helen worked with programmers in New Zealand to do the customizations needed (such as GIS integration) after losing the services of the IT department. She has been told that she made more progress working with the folks from New Zealand than she would have with IT!

The screen shots were fun: they showed how the Land Legacy Database uses GIS to display layers on maps of Hawaii, including outlines of TMKs or areas with ‘traditional names’. One can access the Land Legacy Database by clicking on a location on the map and selecting Land Legacy Database to get to the records.

The Land Legacy Database was envisioned as a tool to compile diverse resources regarding the Schools’ lands to support decision making, such as decisions about the location and destruction of cultural sites. Its evolution includes:

  • inclusion of internal and external records including reports conducted for and by the Schools in the past 121 years
  • a platform providing access to staff, faculty and students across the islands
  • sharing server space with the Education Division

Helen is only supposed to spend 20% of her time on this project! Her progress is amazing.

SAA2006: Joint Annual Meeting of NAGARA, COSA, and SAA

I will have my laptop with me at the SAA meeting in downtown DC later this week. My plan is to write my thoughts on my laptop as I go through the sessions over the course of the day and then post in the evenings after I get back home to the land of internet access.

I also will be sitting next to my Poster on Friday morning from 9-10am. If you want to stop by and say hello, that will be the easiest time and place to find me. My poster’s title is “Communicating Context in Online Collections” and I plan to upload a version of it to a page of this blog after the conference is over (along with links to all my sources).