Month: January 2010

Leveraging Google Reader’s Page Change Tracking for Web Page Preservation

The Official Google Reader Blog recently announced a new feature that lets users watch any page for updates. You add individual URLs to your Google Reader account and, just as with regular RSS feeds, a new entry is added to that subscription whenever an update is detected.

My thinking is that this could be a really useful tool for archivists charged with preserving websites that change gradually over time, especially fairly static sites that change infrequently and with little or no notice. If a web page were archived and then added to a dedicated Google Reader account, the archivist could scan their list of watched pages daily or weekly. Any detected change could then trigger the creation of a fresh snapshot of the site.
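For anyone who would rather script this kind of check than scan a reader account by hand, here is a minimal sketch in Python. It is not a description of how Google Reader detects updates. It simply assumes that comparing a hash of each page's bytes with the hash from the previous visit is good enough. The function names and the `seen` dictionary are hypothetical choices of mine, and the sketch leaves out where the fingerprints would be stored between runs.

```python
import hashlib
import urllib.request


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest of the raw page bytes."""
    return hashlib.sha256(content).hexdigest()


def has_changed(url: str, content: bytes, seen: dict) -> bool:
    """Compare a page against its last stored fingerprint, then update it.

    A first sighting is recorded as the baseline and returns False.
    Later calls return True only when the content differs from last time.
    """
    new = fingerprint(content)
    old = seen.get(url)
    seen[url] = new
    return old is not None and old != new


def fetch(url: str) -> bytes:
    """Download a page with the standard library (no retries, no politeness delay)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def changed_pages(urls, seen, fetch_page=fetch):
    """Return the watched URLs whose content differs from the previous check."""
    return [url for url in urls if has_changed(url, fetch_page(url), seen)]
```

Calling `changed_pages(watch_list, seen)` once a day or once a week would give the list of pages that deserve a fresh snapshot. One caveat: hashing raw bytes flags every change, including rotating ads or embedded timestamps, so a real version would probably need to hash only the part of the page that matters.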

I will admit that there have been services out there for a while that do something similar to what Google has just rolled out. I personally have used Dapper.net to take a standard web page and generate an RSS feed based on updates to the page (sound familiar?). One Dapper.net feed that I created and follow is for the news archive page for the International Red Cross and can be found here. What is funny is that now they actually have an official RSS feed for their news that includes exactly what my Dapper.net feed harvested off their news archive page – but when I built that Dapper feed there was no other way for me to watch for those news updates.

There are lots of different tools out there that aim to archive websites. Archive-It is a subscription-based service run by the Internet Archive that targets institutions and will archive sites on demand or on a regular schedule. The Internet Archive also has an open-source crawler called Heritrix for those who are comfortable dealing with the code. Other institutions are building their own software to tackle this too. Harvard University has its own Web Archive Collection Service (WAX). The LiWA (Living Web Archives) Project is based in Germany and aims to “extend the current state of the art and develop the next generation of Web content capture, preservation, analysis, and enrichment services to improve fidelity, coherence, and interpretability of web archives.” One could even use something as simple as PDFmyURL.com – an online service that turns any URL into a PDF (be sure to play with the advanced options to make sure you get a wide enough snapshot). I know there are many more possibilities – these just scratch the surface.

What I like about my idea is that it isn’t meant to replace these services but rather to work in tandem with them. The Internet Archive does an amazing job crawling and archiving many web pages – but it can’t archive everything, and its crawl frequency may not match up with real-world updates to a website. This approach certainly wouldn’t scale well for huge websites where you would need to watch for changes on many pages. I am picturing this technique as being useful for small organizations or individuals who just need to make sure that a county government website makeover or a community organization’s website update doesn’t get lost in the shuffle. I like the idea of finding clever ways to leverage free services and tools to support those who want to protect a particular niche of websites from being lost.

Image Credit: The RSS themed image above is by Matt Forsythe.

Concertina History Online Features Virtual Collaboration and Digitization

In the early 1960s, my father bought a Wheatstone concertina in London. He tells how he visited the factory where it was made to pick one out and recalls the ledger book in which details about the concertinas were recorded. After a recent retelling of this family classic, I was inspired to see what might be online related to concertinas. I was amazed!

First I found the Concertina Library, which presents itself as a ‘Digital Reference Collection for Concertinas’. With fourteen contributing authors, the site includes in-depth articles on concertina history, technology, music, research and a wide range of concertina systems.

I particularly appreciate the reasons that Robert Gaskins, site creator, lists for the creation of the site on the about page:

(1) Almost all of the historical material about concertinas has been held in research libraries where access is limited, or in private collections where access may be non-existent. The reason for this is not that the material is so valuable, but that in the past there was no way to make material of limited interest available to everyone, so it stayed safely in archives. The web has provided a way to make this material widely available—partly by the libraries themselves, and partly in collections such as this.

(2) There seems to be a growing number of people working again on the history of concertinas, perhaps in part because research materials are becoming available on the web. These people are widely scattered, so they don’t get to meet and discuss their work in person. But again the web has provided an answer, allowing people to work collaboratively and exchange information across miles and timezones, and for the resulting articles the web offers worldwide publication at almost no cost.

What an eloquent testimonial for the power of the internet to both provide access to once-inaccessible materials and support virtual collaboration within a geographically dispersed community.

Next, I found the Wheatstone Concertina Ledgers. This site features the business records (in the form of ledgers) of C. Wheatstone & Co., stretching from 1830 through 1974 (with some gaps). The originals are held at the Library of the Horniman Museum in London. It is a great reference website with a nice interface for paging through the ledgers. Armed with the serial number from my father’s concertina (36461), I found my way to page 88 of a Wheatstone Production Journal from the Dickinson Archives. If I am reading that line properly, his concertina is a 3E model and was made (or maybe sold?) on April 25, 1960. I wish that there were documentation online to explain how to read the ledgers. For example, I would love to know what ‘Bulletin 3052’ means.

I liked the way that they retained the sense of turning pages in a ledger. Every page of each ledger is included, including front and back end pages and blank pages. I have total confidence that I am seeing the pages in the same order as I would in person.

You can read the overview and introduction to the project, but what intrigued me more was the very detailed narrative of how this digitization effort was accomplished. In How The Wheatstone Concertina Ledgers Were Digitized, we find Robert Gaskins of the Concertina Library explaining how, with an older-model IBM ThinkPad, a consumer-grade scanner, and his existing software (Microsoft Office and Macromedia Fireworks), he created a website with 4,500 images and clean, simple navigation. From where I sit, this is a great success story – a single person’s dedication can yield fantastic results. You don’t need the latest and greatest technology to run a successful digitization project. One individual can go a long way through sheer determination and the clever leveraging of what they have on hand.

Back on the Concertina Library’s about page we find: “There is still a lot of material relevant to the study of concertinas and their history which should be digitized and placed on the web, but has not been so far. Ideas for additional contributors, items, and collections are very welcome.” If I am following the dates correctly, the Concertina Library has articles dating back to February of 2001, shortly before Mr. Gaskins started planning the ledger digitization project. At the same time as he was collaborating with other concertina enthusiasts to build the Concertina Library, he was scanning ledgers and creating the Wheatstone Concertina Ledgers website. Three cheers to Mr. Gaskins for his obvious personal enthusiasm and dedication to virtual collaboration, digitization and well-built websites! Another three cheers for all those who joined the cause and collaborated to create great online resources to support ongoing concertina research from anywhere in the world.

All this started because my father owns a beautiful old concertina. I love it when an innocent web search leads me to find a wealth of online archival materials. Do you have a favorite online archival resource that you stumbled across while doing similar research for family or friends? Please share them in the comments below!

Image Credit: http://www.flickr.com/photos/rocketlass/ / CC BY-NC-SA 2.0