
Year: 2007

OBR: Optical Braille Recognition

In the interest of talking about new topics – I opened my little moleskine notebook and found a note to myself wondering if it is possible to scan Braille with the equivalent of OCR.

Enter Optical Braille Recognition or OBR. Created by a company called Neovision, this software will permit anyone with a scanner and a Windows platform computer to ‘read’ Braille documents.

Why was this in my notebook? I was thinking about unusual records that must be out in the world and wondering about how to improve access to the information within them. So if there are Braille records out there – how does the sighted person who can’t read Braille get at that information? Here is an answer. Not only does the OBR permit reading of Braille documents – but it would permit recreation of these same documents in Braille from any computer that has the right technology.

Reading through the Wikipedia Braille entry, I learned a few things that would throw a monkey wrench into some of this. For example – “because the six-dot Braille cell only offers 64 possible combinations, many Braille characters have different meanings based on their context”. The page on Braille code lists links to an assortment of different Braille codes which translate the different combinations of dots into different characters depending on the language of the text. On top of the different Braille codes used to translate Braille into specific letters or characters – there is another layer to Braille transcription. Grade 2 Braille uses a specific set of contractions and shorthand – and is used for official publications and things like menus, while Grade 3 Braille is used in the creation of personal letters.
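To make the 64-combination point concrete, here is a minimal Python sketch. It relies on the Unicode Braille Patterns block, where dots 1 through 6 map to the six low bits above U+2800. The tiny letter and digit tables, the helper names and the `decode` function are illustrative assumptions covering only three cells; this is not a full Braille code and has nothing to do with how the OBR software works. It shows how the same cell reads as a letter or as a digit depending on whether a number sign came before it.

```python
# Unicode braille patterns start at U+2800; dots 1-6 map to bits 0-5.
BRAILLE_BASE = 0x2800


def cell_from_dots(dots):
    """Return the Unicode braille character for a set of raised dots (1-6)."""
    mask = 0
    for dot in dots:
        if not 1 <= dot <= 6:
            raise ValueError(f"six-dot cells only have dots 1-6, got {dot}")
        mask |= 1 << (dot - 1)
    return chr(BRAILLE_BASE + mask)


def all_six_dot_cells():
    """Enumerate every six-dot cell, including the blank cell."""
    return [chr(BRAILLE_BASE + mask) for mask in range(2 ** 6)]


# Illustrative subset only: three cells, two readings each.
LETTERS = {
    cell_from_dots({1}): "a",
    cell_from_dots({1, 2}): "b",
    cell_from_dots({1, 4}): "c",
}
DIGITS = {
    cell_from_dots({1}): "1",
    cell_from_dots({1, 2}): "2",
    cell_from_dots({1, 4}): "3",
}
NUMBER_SIGN = cell_from_dots({3, 4, 5, 6})


def decode(cells):
    """Decode cells with a simplified rule: cells after a number sign are digits."""
    out = []
    numeric = False
    for cell in cells:
        if cell == NUMBER_SIGN:
            numeric = True
            continue
        if numeric and cell in DIGITS:
            out.append(DIGITS[cell])
        else:
            numeric = False
            out.append(LETTERS.get(cell, "?"))
    return "".join(out)
```

Calling `decode` on the three cells alone gives letters, while the same three cells preceded by the number sign give digits. Real Braille codes have many more such context rules, which is exactly why a scanned page needs to travel with the code it was written in.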

It all goes back to context (of course!). If you have a set of Braille documents with no information on them giving you details of what sort of documents they are – you have documents that are effectively written in code. Is it music written in Braille Music notation? Is it a document in Hiragana using the Japanese Code? Is this a personal letter using Grade 3 Braille shorthand? You get the idea.

I suspect that one might even want to include a copy of both the Braille Code and the Braille transcription rules that go with a set of documents as a key to their translation in the future. If there are frequently used records – they could perhaps include the transcription (both literal transcription and a ‘translation’ of all the used Braille contractions) to improve access to analog records.

In a quick search for collections including braille manuscripts it should come as no surprise that the Helen Keller Archives does have “braille correspondence”. I also came across the finding aids for the Harvard Law School Examinations in Braille (1950-1985) and The Donald G. Morgan Papers (the papers of a blind professor at Mount Holyoke College).

I wonder how many other collections have Braille records or manuscripts. Has anyone reading this ever seen or processed a collection including Braille records?

GIS and Geospatial Data Preservation: Research Resources

I found these websites while doing research for a paper on the selection and appraisal of geospatial data and geographic information systems (GIS). I hope these links might be useful for others doing similar research.

CIESIN – Center for International Earth Science Information Network at Columbia University, especially Guide to Managing Geospatial Electronic Records (USA)

CUGIR – Cornell University Geospatial Information Repository, especially Collection Development Policy (USA)

Digital Curation Center – supporting UK institutions who store, manage and preserve these data to help ensure their enhancement and their continuing long-term use, especially Curating Geospatial Data (UK)

Digital Preservation Coalition – “established in 2001 to foster joint action to address the urgent challenges of securing the preservation of digital resources in the UK and to work with others internationally to secure our global digital memory and knowledge base.” Especially their Decision Tree. (UK)

GeoConnections – a Canadian national partnership program to evolve and expand the Canadian Geospatial Data Infrastructure (CGDI). (Canada)

InterPARES 2 Case Studies – especially CyberCartographic Atlas of Antarctica and Preservation of the City of Vancouver GIS Database (VanMap)

Library and Archives of Canada – especially Managing Cartographic, Architectural and Engineering Records in the Government of Canada (Canada)

Library of Congress Digital Preservation – subtitled “The National Digital Information Infrastructure and Preservation Program” (NDIIPP) (USA)

Maine GeoArchives (USA)

Maryland State Geographic Information Committee Standards for Records Preservation

NGDA – the National Geospatial Digital Archive, especially Collection Development Policy For The National Geospatial Digital Archive and UCSB Maps & Imagery Collection Development Policy (USA)

New York State Archives – especially GIS Development Guides: GIS Use and Maintenance (USA)

North Carolina Center for Geographic Information and Analysis (USA)

North Carolina Geospatial Data Archiving Project – especially their NDIIPP proposal for Collection and Preservation of At Risk Digital Geospatial Data (USA)

OMB Circular No. A-16 – which requires the development of the National Spatial Data Infrastructure (NSDI) by the Federal Geographic Data Committee (FGDC) (USA)

Any great sites I am missing? Please let me know and I will add to the list.

The Edges of the GIS Electronic Record

I spent a good chunk of the end of my fall semester writing a paper ultimately titled “Digital Geospatial Records: Challenges of Selection and Appraisal”. I learned a lot – especially with the help of archivists out there on the cutting edge who are trying to find answers to these problems. I plan on a number of posts with various ideas from my paper.

To start off, I want to consider the topic of defining the electronic record in the context of GIS. One of the things I found most interesting in my research was the fact that defining exactly what a single electronic record consists of is perhaps one of the most challenging steps.

If we start with the SAA’s glossary definition of the term ‘record’ we find the statement that “A record has fixed content, structure, and context.” The notes go on to explain:

Fixity is the quality of content being stable and resisting change. To preserve memory effectively, record content must be consistent over time. Records made on mutable media, such as electronic records, must be managed so that it is possible to demonstrate that the content has not degraded or been altered. A record may be fixed without being static. A computer program may allow a user to analyze and view data many different ways. A database itself may be considered a record if the underlying data is fixed and the same analysis and resulting view remain the same over time.

This idea presents some major challenges when you consider data that does not seem ‘fixed’. In the fast moving and collaborative world of the internet, Geographic Information Systems are changing over time – but the changes themselves are important. We no longer live in a world in which the way you access a GIS is via a CD which has a specific static version of the map data you are considering.

One of the InterPARES 2 case studies I researched for my paper was the Preservation of the City of Vancouver GIS database (aka VanMap). Via a series of emails exchanged with the very helpful Evelyn McLellan (who is working on the case study) I learned that the InterPARES 2 researchers concluded that the entire VanMap system is a single record. This decision was based on the requirement of ‘archival bond’ to be present in order for a record to exist. I have included my two favorite definitions of archival bond from the InterPARES 2 dictionary below:

archival bond
n., The network of relationships that each record has with the records belonging in the same aggregation (file, series, fonds). [Archives]

n., The originary, necessary and determined web of relationships that each record has at the moment at which it is made or received with the records that belong in the same aggregation. It is an incremental relationship which begins when a record is first connected to another in the course of action (e.g., a letter requesting information is linked by an archival bond to the draft or copy of the record replying to it, and filed with it. The one gives meaning to the other). [Archives]

I especially appreciate the second definition above because its example gives me a better sense of what is meant by ‘archival bond’ – though I need to do more reading on this to get a better grasp of its importance.

Given the usage of VanMap by public officials and others, you can imagine that the state of the data at any specific time is crucial to determining the information used for making key decisions. Since a map may be created on the fly using multiple GIS layers but never saved or printed – it is only the knowledge that someone looked at the information at a particular time that would permit those down the road to look through the eyes of the decision makers of the past. Members of the VanMap team are now working with the Sustainable Archives & Library Technologies (SALT) lab at the San Diego Supercomputer Center (SDSC) to use data grid technology to permit capturing the changes to VanMap data over time. My understanding is that a proof of concept has been completed that shows how data from a specific date can be reconstructed.
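The idea of reconstructing data "as of" a given date can be illustrated with a small Python sketch. To be clear, this is not how the SALT lab's data grid work is implemented; it is a hypothetical, simplified model in which every change to a layer is appended to a log with its effective date. The `Change` class, the `state_as_of` function and the sample layer names and values are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One logged change to a feature in a GIS layer (hypothetical model)."""
    effective: date
    layer: str
    feature_id: str
    value: Optional[str]  # None marks a deletion


def state_as_of(changes, when):
    """Replay the log in date order and return the data as it stood on `when`."""
    state = {}
    # sorted() is stable, so same-day changes keep their order in the log.
    for change in sorted(changes, key=lambda c: c.effective):
        if change.effective > when:
            break
        key = (change.layer, change.feature_id)
        if change.value is None:
            state.pop(key, None)
        else:
            state[key] = change.value
    return state


# Invented sample data, for illustration only.
CHANGE_LOG = [
    Change(date(2006, 1, 10), "zoning", "parcel-1", "residential"),
    Change(date(2006, 3, 5), "zoning", "parcel-1", "commercial"),
    Change(date(2006, 4, 1), "streets", "segment-9", "open"),
    Change(date(2006, 6, 20), "streets", "segment-9", None),
]
```

With a log like this, `state_as_of(CHANGE_LOG, date(2006, 2, 1))` returns only the residential zoning for the parcel, which is what a decision maker consulting the system on that day would have seen. The design choice worth noticing is that nothing is ever overwritten: keeping the changes, rather than only the latest state, is what makes the past views recoverable.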

In contrast with this approach we can consider what is being done to preserve GIS data by the Archivist of Maine in the Maine GeoArchives. In his presentation titled “Managing GIS in the Digital Archives”, delivered at the 2006 Joint Annual Meeting of NAGARA, COSA, and SAA on August 3, 2006, Jim Henderson explained their approach of appraising individual layers to determine if they should be accessioned in the archive. If it is determined that the layer should be preserved, then issues of frequency of data capture are addressed. They have chosen a pragmatic approach and are currently putting these practices to the test in the real world in an ambitious attempt to prevent data loss as quickly as is feasible.

My background is as a database designer and developer in the software industry. In my database life, a record is usually a row in a database table – but when designing a database using Entity-Relationship Modeling (and I will admit I am of the “Crow’s Feet” notation school and still get a smile on my face when I see the cover of the CASE*Method: Entity Relationship Modelling book) I have spent a lot of time translating what would have been a single ‘paper record’ into the combination of rows from many tables.

The current system I am working on includes information concerning legal contracts. Each of these exists as a single paper document outside the computers – but in our system we distribute information that is needed to ‘rebuild’ the contract into many different tables. One for contact information – one for standard clauses added to all the contracts of this type – another set of tables for defining financial formulas associated with the contract. If I then put on my archivist hat and I didn’t just choose to keep the paper agreement, I would of course draw my line around all these different records needed to rebuild the full contract. I see that there is a similar definition listed as the second definition on the InterPARES 2 Terminology Dictionary for the term ‘Record’:

n., In data processing, a grouping of interrelated data elements forming the basic unit of a file. A Glossary of Archival and Records Terminology (The Society of American Archivists)
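A small Python sketch using the standard library's `sqlite3` module shows what drawing a line around "all these different records" might look like in practice. The schema here (`contract`, `contact` and `clause` tables), the sample rows and the `rebuild_contract` function are hypothetical and far simpler than any real contracts system; the financial formula tables are left out entirely.

```python
import sqlite3

# Hypothetical schema: one paper contract is spread across three tables.
SCHEMA = """
CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE contract (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    contact_id INTEGER NOT NULL REFERENCES contact(id)
);
CREATE TABLE clause (
    id INTEGER PRIMARY KEY,
    contract_id INTEGER NOT NULL REFERENCES contract(id),
    position INTEGER NOT NULL,
    body TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database holding one invented contract."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO contact VALUES (1, 'Example Counterparty')")
    conn.execute("INSERT INTO contract VALUES (10, 'Sample Agreement', 1)")
    conn.executemany(
        "INSERT INTO clause VALUES (?, ?, ?, ?)",
        [(100, 10, 2, "Second clause"), (101, 10, 1, "First clause")],
    )
    return conn


def rebuild_contract(conn, contract_id):
    """Gather the rows from every table that together form one logical record."""
    row = conn.execute(
        "SELECT c.title, p.name FROM contract AS c "
        "JOIN contact AS p ON p.id = c.contact_id WHERE c.id = ?",
        (contract_id,),
    ).fetchone()
    if row is None:
        raise KeyError(contract_id)
    clauses = [
        body
        for (body,) in conn.execute(
            "SELECT body FROM clause WHERE contract_id = ? ORDER BY position",
            (contract_id,),
        )
    ]
    return {"title": row[0], "contact": row[1], "clauses": clauses}
```

Calling `rebuild_contract(build_demo_db(), 10)` pulls one row from `contract`, one from `contact` and two from `clause` into a single structure. No individual row is the record in the archival sense; the record is the grouping of interrelated data elements that the query assembles.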

Just in this brief survey we can see three very different possible views on where to draw a line around what constitutes a single Geographic Information System electronic record. Is it the entire database, a single GIS layer or some set of data elements which create a logical record? Is it worthwhile trying to contrast the definition of a GIS record with the definition of a record when considering analog paper maps? I think the answer to all of these questions is ‘sometimes’.

What is especially interesting about coming up with standard approaches to archiving GIS data is that I don’t believe there is one answer. Saying ‘GIS data’ is about as precise as saying ‘database record’ or ‘entity’ – it could mean anything. There might be a best answer for collaborative online atlases, another best answer for a state-government-managed geographic information library, and yet another best answer for corporations dependent on GIS data for doing their business.

I suspect that thorough analysis of the information stored in a GIS, how it is/was created, how often it changes and how it was used will determine the right approach for archiving these born-digital records. There are many archivists (and IT folks and map librarians and records managers) around the world who have a strong sense of panic over the imminent loss of geospatial data. As a result, people from many fields are trying different approaches to stem the loss. It will be interesting to consider these varying approaches (and their varying levels of success) over the next few years. We can only hope that a few best practices will rise to the top quickly enough that we can ensure access to vital geospatial records in the future.