Menu Close

Category: search

GIS, Access, Archives and Daydreams

Today in my Information Structure class, our topic was Entity Relationship Modeling. While this is a technique that I have used frequently over the many years I have been designing Oracle databases, it was interesting to see a slightly different spin on the ideas. The second half of class was an exercise to take a stab (as a class) at coming up with a preliminary data model for a mythical genealogical database system.

While deciding if we should model PLACE as an entity, a woman in our class who is a genealogy specialist told us that only one database she has ever worked with tries to do any validation of location – but that it is virtually impossible due to the scale of the problem. Since the borders and names of places on earth have changed so rapidly over time, and often with little remaining documentation, it is hard to correlate place names from archival records with fixed locations on the planet. Anyone who has waded through the fabulous ship records on the Ellis Island website hunting for information about their grandparents or great-grandparents has struggled with trying to understand how the place names on those records relate to the physical world we live in.

So – now to my daydream. Imagine if we could somehow work towards a consolidated GIS database that included place names and boundary information throughout history. Each GIS layer would relate to specific years or eras in time. Imagine if you could connect any set of archival records that contained location data to this GIS database and not only visualize the records via a map – but visualize the records with the ability to change the layers so you could see how the boundaries and place names changed. And view the relationship between records that have different place names on them from different eras – but are actually from the same location.

I poked around to see what people are already doing – and found all of this:

I know it is a daydream – but I believe in my heart of hearts that it will exist someday as computing power increases, the price of storing data decreases and more data sources converge. I do forsee another issue related to the challenges presented by different versions of borders and place names from the same time period – but there are ways to address that too. It could happen – believe with me!

Google Newspaper Archives

I was intrigued by the news that Google had launched a News Archive search interface. For my first search, I searched on “Banjo Dancing” (a one man show that spent most of the 1980s in Arena Stage‘s Old Vat Room). It was tantalizing to see articles from “way back when” appear. The ‘timeline’ format was very useful way to quickly move through the articles and help focus your search.

Many newspapers that provide online access to their archives charge a per article fee for viewing the full article. You are not charged when you click on the link – but you do get a chance to view some sort of short abstract before paying. The advanced search permits you to limit your results based on their cost – so you can search only for those articles which are free or cost below a specific amount. By modifying my original search to only include free articles I found three, one from 1979, one from 2002 and one which did not yield anything.

So what does this mean for archives? In their FAQ, Google states “If you have a historical archive that you think would be a good fit in News archive search, we would love to hear from you.”. Take a moment and think about that – archives with digitized news content could raise their hand and ask to be included. Google has suddenly put the tools for increasing access in the hands of everyone. The university that has digitized it’s newspapers can suddenly be put on the same level with the New York Times and the Washington Post. There currently does not seem to be a fixed list showing “these are the news sources included in the Google news archive” – but I hope they add one.

In their usual fashion, Google has increased the chance of the serendipitous discovery of information – but because everything in the news archive will come from a vetted source, the quality and reliability of the information found should be far and above your standard web search.

Session 305: Extended Archival Description Part I – Archives of American Art

Session 305 included perspectives from three digital collections which are trying to use EAD and meta data to solve real world problems of navigation and access. This post addresses the presentation by the first speaker, Barbara Aikens from the Archives of American Art at the Smithsonian.

The Archives of American Art (AAA) has over 4,500 collections focusing on the history of American art. They received a 3.6 million dollar grant from the Terra Foundation to fund their 5 year project. They had already been using EAD for their standard in online finding aids since 2004. They also had already looked into digitizing their microfilmed holdings and they believe that the history of microfilming at AAA made the transition to scanning entire collections at the item level easier than it might otherwise have been. So far they have digitized 11 full collections (45 linear feet).

Their organization of the digitized files was based on collection code, box and folder. Basing their template on the EAD Cookbook, AAA used Note Tab Pro to create their XML EAD finding aid. I wonder how they might be able to take advantage of the open source software tools being developed such as Archon and the Archivists’ Toolkit (if you are interested in these packages, keep your eye open for my future post looking at them each in detail). There was some mention of re-purposing DCDs, but I was not clear about what they were describing.

The resulting online finding aid lets you read all the information you would expect to find in a finding aid (see an example), as well as permitting you to drill down into each series or container to view a list of folders. Finally the folder view provides thumbnails on the left and a big image on the right. Note that this item level folder view includes very basic folder meta data and a link back to that folder’s corresponding series page. There is no meta data for any of the images of individual items. This approach for organizing and viewing digitized collections is workable for large collections. The context is well communicated and the user’s experience is very like that of going through a collection while physically visiting an archive. First you use the finding aid to location collections of interest. Next you examine the Series and or Container descriptions to location the types of information for which you are looking. Finally, you can drill down to folders with enticing names to see if you can find what you need.

As an experiment, I tested the ‘Search within Collections/Finding Aids’ option by searching for “Downtown Gallery” and for gallery artist files to see if I was given a link to the new Downtown Gallery Records finding aid. My search for “Downtown Gallery” instead directed me to what appears to be a MARC record in the Smithsonian Archives, Manuscripts and Photographs catalog. Two versions of the finding aid are linked to from this record – with no indication as to how they are different (it turned out one was an old version – the other the new one which includes links to the digitized content). A bit more experimentation showed me that the new online collection finding aids are not integrated into the search. I will have to remember to try this sort of searching in a few months to see what the search experience is like.

What I was hoping for (in a perfect world) would be highlighting of the search terms and deep linking from the search results directly to the series and folder description pages. I wonder what side effects there will be for the accuracy of search results given that the series/folder detail description page does not include all the other text from the main finding aid. (ie New Finding Aid vs New Finding Aid Series Level Page). Oddly enough – the old version of the finding aid for this same collection includes the folder level descriptions on the SAME page (with HTML anchors permitting linking from the side bar Table of Contents to the correct location on the page). So a search for terms that appear in the historical background along with the name of an artist only listed at the folder level WOULD return results (in standard text searching) for the old finding aid but not for the new one. Once the new finding aids are integrated into the search results – it would be very helpful to have an option to only return finding aids that include digitized collections.

While exploring the folder level view, I assumed that the order of the images in the folders is the original order in the analog folder. If so, then that is a fabulous and elegant way of communicating the original order of the records to the user of the digital interface. If NOT – then it is quite misleading because a user could easily assume, as I did, that the order in which they are displayed in the folder view is the original order.

Overall, this is exciting work – and shows how well the EAD can function as a framework for the item level digitization of documents. It also points to some interesting questions about how to handle search within this type of framework.

UPDATE: See the comment below for the clarification that the new finding aids based on the work described in this presentation are NOT online yet – but should be at the end of the month (posted: 08/09/2006).