
My New Daydream: A Hosting Service for Digitized Collections

In her post Predictions, Merrilee asked “Where do you predict that universities, libraries, archives, and museums will be irresistibly drawn to pooling their efforts?” after reading this article.

And I say: what if there were an organization that created a free (or inexpensive, fee-based) framework for hosting collections of digitized materials? What I am imagining is a large group of institutions conspiring to no longer be in charge of designing, building, installing, upgrading and supporting the websites that are the vehicle for sharing digital historical or scholarly materials. I am coming at this from the archivist’s perspective (having also just pondered the need for something like this in my recent post, Promise to Put It All Online) – so I am imagining a central repository that would support the upload of digitized records, customizable metadata and a way to manage privacy and security.

The hurdles I imagine this dream solution removing are those that are roughly the same for all archival digitization projects. Lack of time, expertise and ongoing funding are huge challenges to getting a good website up and keeping it running – and that is even before you consider the effort required to digitize and map metadata to records or collections of records. It seems to me that if a central organization of some sort could build a service that everyone could use to publish their content – then the archivists and librarians and other amazing folks of all different titles could focus on the actual work of handling, digitizing and describing the records.

Being the optimist I am, I of course imagine this service providing easy-to-use software with the flexibility for building custom DTDs for metadata, and security to protect those records that cannot (yet or ever) be made available to the public. My background as a software developer drives me to imagine a dream team of talented analysts, designers and programmers building an elegant web-based solution that supports everything needed by the archival community. The architecture of deployment and support would be managed by highly skilled technology professionals who would guarantee uptime and redundant storage.
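To make the daydream a little more concrete, here is a minimal sketch of what the core of such a service might store: records with institution-defined metadata plus an access level. Everything here is invented for illustration – the class names, fields and access tiers are assumptions, not a real service’s design.

```python
from dataclasses import dataclass, field
from enum import Enum

class Access(Enum):
    PUBLIC = "public"          # viewable by anyone
    RESTRICTED = "restricted"  # members of the owning institution only
    CLOSED = "closed"          # not yet (or never) available to the public

@dataclass
class DigitizedRecord:
    identifier: str
    institution: str
    access: Access
    # Each institution defines its own metadata schema, so metadata is an
    # open key/value mapping rather than a fixed set of columns.
    metadata: dict = field(default_factory=dict)

def visible_to_public(records):
    """Return only the records the service may display openly."""
    return [r for r in records if r.access is Access.PUBLIC]

records = [
    DigitizedRecord("r1", "Example Archive", Access.PUBLIC,
                    {"title": "Ledger, 1890", "creator": "Unknown"}),
    DigitizedRecord("r2", "Example Archive", Access.CLOSED,
                    {"title": "Personnel file"}),
]
print([r.identifier for r in visible_to_public(records)])  # -> ['r1']
```

The point of the sketch is that the hard parts – hosting, uptime, storage – live in the service, while each institution keeps control of its own metadata shapes and access rules.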

I think the biggest difference between this idea and the wikipedias of the world is that there would be some step required for an institution to ‘join’ such that they could use this service. The service wouldn’t control the content (in fact would need to be super careful about security and the like considering all the issues related to privacy and copyright) – rather it would provide the tools to support the work of others. While I know that some institutions would not be willing to let ‘control’ of their content out of their own IT department and their own hard drives, I think others would heave a huge sigh of relief.

There would still be a place for the Archons and the Archivists’ Toolkits of the world (and any and all other fabulous open-source tools people might be building to support archivists’ interactions with computers), but the manifestation of my dream would be the answer for those who want to digitize their archival collection and provide access easily without being forced to invent a new wheel along the way.

If you read my GIS daydreams post, then you won’t be surprised to know that I would want GIS incorporated from the start so that records could be tied into a single map of the world. The relationships among records related to the same geographic location could be found quickly and easily.

Somehow I feel a connection between these ideas and the work that the Internet Archive is doing with Archive-It. In that case, producers of websites want them archived. They don’t want to figure out how to make that happen. They don’t want to figure out how to make sure that they have enough copies in enough far-flung locations with enough bandwidth to support access – they just want it to work. They would rather focus on creating the content they want Archive-It to keep safe and accessible. The first line on Archive-It’s website says it beautifully: “Internet Archive’s new subscription service, Archive-It, allows institutions to build, manage and search their own web archive through a user friendly web application, without requiring any technical expertise.”

So, the tag line for my new dream service would be “DigiCollection’s new subscription service, Digitize-It, allows institutions to upload, manage and search their own digitized collections through a user friendly web application, without requiring any technical expertise.”

GIS, Access, Archives and Daydreams

Today in my Information Structure class, our topic was Entity Relationship Modeling. While this is a technique that I have used frequently over the many years I have been designing Oracle databases, it was interesting to see a slightly different spin on the ideas. The second half of class was an exercise to take a stab (as a class) at coming up with a preliminary data model for a mythical genealogical database system.
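For readers who haven’t done entity-relationship modeling, here is one plausible reading of that exercise sketched in code – an illustrative guess at the entities, not the model our class actually produced. PERSON and PLACE are entities, and an EVENT (birth, marriage, death) resolves the many-to-many relationship between them.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Place:
    name: str                          # place name as recorded in the source
    as_of_year: Optional[int] = None   # borders and names change over time

@dataclass
class Person:
    person_id: int
    name: str

@dataclass
class Event:
    # An EVENT row links one PERSON to one PLACE, so many events together
    # express the many-to-many relationship between people and places.
    kind: str                          # "birth", "marriage", "death", ...
    person: Person
    place: Place
    year: Optional[int] = None

anna = Person(1, "Anna Kowalski")      # hypothetical example data
birth = Event("birth", anna, Place("Galicia", 1900), 1900)
print(birth.place.name)                # -> Galicia
```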

While deciding if we should model PLACE as an entity, a woman in our class who is a genealogy specialist told us that only one database she has ever worked with tries to do any validation of location – but that it is virtually impossible due to the scale of the problem. Since the borders and names of places on earth have changed so rapidly over time, and often with little remaining documentation, it is hard to correlate place names from archival records with fixed locations on the planet. Anyone who has waded through the fabulous ship records on the Ellis Island website hunting for information about their grandparents or great-grandparents has struggled with trying to understand how the place names on those records relate to the physical world we live in.

So – now to my daydream. Imagine if we could somehow work towards a consolidated GIS database that included place names and boundary information throughout history. Each GIS layer would relate to specific years or eras in time. Imagine if you could connect any set of archival records that contained location data to this GIS database and not only visualize the records via a map – but visualize the records with the ability to change the layers so you could see how the boundaries and place names changed. And view the relationship between records that have different place names on them from different eras – but are actually from the same location.
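The linking step in that daydream can be sketched very simply: a gazetteer keyed by place name plus a year range, resolving each historical name to a stable location identifier. The data below is invented and drastically simplified (real boundary history is far messier), but it shows how two records bearing different place names from different eras could land on the same point on the map.

```python
# Hypothetical gazetteer: (recorded name, first year, last year, location id).
# Entries are illustrative, not authoritative boundary history.
GAZETTEER = [
    ("Christiania", 1624, 1924, "oslo"),
    ("Oslo",        1925, 9999, "oslo"),
]

def resolve(place_name, year):
    """Map a historical place name plus a year to a stable location id."""
    for name, start, end, loc_id in GAZETTEER:
        if name.lower() == place_name.lower() and start <= year <= end:
            return loc_id
    return None  # name unknown, or not in use in that year

# Records from different eras resolve to the same location,
# so they can be related on a single map:
print(resolve("Christiania", 1890))  # -> oslo
print(resolve("Oslo", 1950))         # -> oslo
```

A real system would need fuzzy name matching, competing boundary claims and per-era polygon layers – but the core idea is just this lookup, done against a shared, community-maintained dataset.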

I poked around to see what people are already doing – and found all of this:

I know it is a daydream – but I believe in my heart of hearts that it will exist someday as computing power increases, the price of storing data decreases and more data sources converge. I do foresee another issue in the challenge presented by competing versions of borders and place names from the same time period – but there are ways to address that too. It could happen – believe with me!

Google Newspaper Archives

I was intrigued by the news that Google had launched a News Archive search interface. For my first search, I searched on “Banjo Dancing” (a one-man show that spent most of the 1980s in Arena Stage‘s Old Vat Room). It was tantalizing to see articles from “way back when” appear. The ‘timeline’ format was a very useful way to move quickly through the articles and focus a search.

Many newspapers that provide online access to their archives charge a per-article fee for viewing the full article. You are not charged when you click on the link – and you do get a chance to view a short abstract before paying. The advanced search permits you to limit your results based on their cost – so you can search only for those articles which are free or cost below a specific amount. By modifying my original search to include only free articles, I found three: one from 1979, one from 2002 and one whose link did not yield anything.

So what does this mean for archives? In their FAQ, Google states “If you have a historical archive that you think would be a good fit in News archive search, we would love to hear from you.” Take a moment and think about that – archives with digitized news content could raise their hand and ask to be included. Google has suddenly put the tools for increasing access in the hands of everyone. The university that has digitized its newspapers can suddenly be put on the same level as the New York Times and the Washington Post. There currently does not seem to be a fixed list showing “these are the news sources included in the Google news archive” – but I hope they add one.

In their usual fashion, Google has increased the chance of the serendipitous discovery of information – and because everything in the news archive will come from a vetted source, the quality and reliability of the information found should be far above that of a standard web search.

Question from the Archives of American Art and EAD talk (session 305)

At the end of the Encoded Archival Description (EAD) panel, someone in the audience asked if ColdFusion and ASP were used for the Archives of American Art project. The response was interesting. The answer was yes to ColdFusion and no to ASP. That wasn’t the interesting part. The part I was intrigued by was the reasons WHY they had used ColdFusion.

The developer on the project was there and stood to add his 2 cents. He said these were the reasons for the choice of ColdFusion:

  • The Smithsonian is not enthusiastic about open source software
  • The Smithsonian is not unfriendly towards ColdFusion
  • He knew ColdFusion very well

This immediately made me think of a recent post at Creating Passionate Users: When the “best tool for the job”… isn’t. In her post, Kathy Sierra talks about factors to weigh when choosing a software tool to solve a problem OTHER than which is the best tool for the job based on the features of all the options. She proposes (in what she admits is a sweeping generalization) that enthusiasm for a tool be weighed more heavily than its pure appropriateness for the task when selecting which tool to use.

I am not saying that ColdFusion was necessarily the AAA developer’s first choice – but it is interesting to remember that there are LOTS of different elements that go into choosing software to address the challenges at the intersection of archives and the internet. One of those elements is simply the skill set of the people available to work on a project – and their enthusiasm for the tools at hand.