My New Daydream: A Hosting Service for Digitized Collections

September 20, 2006

In her post Predictions over on hangingtogether.org, Merrilee asked “Where do you predict that universities, libraries, archives, and museums will be irresistibly drawn to pooling their efforts?” after reading this article.

And I say: what if there were an organization that created a free (or inexpensive fee-based) framework for hosting collections of digitized materials? What I am imagining is a large group of institutions conspiring to no longer be in charge of designing, building, installing, upgrading and supporting the websites that are the vehicle for sharing digital historical or scholarly materials. I am coming at this from the archivist’s perspective (also having just pondered the need for something like this in my recent post: Promise to Put It All Online) – so I am imagining a central repository that would support the upload of digitized records, customizable metadata and a way to manage privacy and security.

The hurdles I imagine this dream solution removing are those that are roughly the same for all archival digitization projects. Lack of time, expertise and ongoing funding are huge challenges to getting a good website up and keeping it running – and that is even before you consider the effort required to digitize and map metadata to records or collections of records. It seems to me that if a central organization of some sort could build a service that everyone could use to publish their content – then the archivists and librarians and other amazing folks of all different titles could focus on the actual work of handling, digitizing and describing the records.

Being the optimist I am, I of course imagine this service as providing easy-to-use software with the flexibility for building custom DTDs for metadata, along with security to protect those records that cannot (yet or ever) be available to the public. My background as a software developer drives me to imagine a dream team of talented analysts, designers and programmers building an elegant web-based solution that supports everything needed by the archival community. The architecture of deployment and support would be managed by highly skilled technology professionals who would guarantee uptime and redundant storage.

I think the biggest difference between this idea and the wikipedias of the world is that there would be some step required for an institution to ‘join’ such that they could use this service. The service wouldn’t control the content (in fact would need to be super careful about security and the like considering all the issues related to privacy and copyright) – rather it would provide the tools to support the work of others. While I know that some institutions would not be willing to let ‘control’ of their content out of their own IT department and their own hard drives, I think others would heave a huge sigh of relief.

There would still be a place for the Archons and the Archivists’ Toolkits of the world (and any and all other fabulous open-source tools people might be building to support archivists’ interactions with computers), but the manifestation of my dream would be the answer for those who want to digitize their archival collection and provide access easily without being forced to invent a new wheel along the way.

If you read my GIS daydreams post, then you won’t be surprised to know that I would want GIS incorporated from the start so that records could be tied into a single map of the world. The relationships among records related to the same geographic location could be found quickly and easily.
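
To make the “same geographic location” idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption on my part: the `Record` class, its fields and the `records_near` function are hypothetical names, and it presumes each record carries a single latitude and longitude. A real service would almost certainly need richer geometry (regions, historical place names) and a proper spatial index.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for a sketch


@dataclass(frozen=True)
class Record:
    """Hypothetical digitized record tagged with a single point location."""
    identifier: str
    title: str
    lat: float
    lon: float


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def records_near(records, lat, lon, radius_km):
    """Return the records within radius_km of (lat, lon), nearest first."""
    hits = [(distance_km(lat, lon, r.lat, r.lon), r) for r in records]
    hits.sort(key=lambda pair: pair[0])
    return [r for d, r in hits if d <= radius_km]
```

With something like this in place, a researcher looking at one record could ask for everything else within, say, 100 km, regardless of which institution uploaded it.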

Somehow I feel a connection in these ideas to the work that the Internet Archive is doing with Archive-IT.org. In that case, producers of websites want them archived. They don’t want to figure out how to make that happen. They don’t want to figure out how to make sure that they have enough copies in enough far-flung locations with enough bandwidth to support access – they just want it to work. They would rather focus on creating the content they want Archive-It to keep safe and accessible. The first line on Archive-It’s website says it beautifully: “Internet Archive’s new subscription service, Archive-It, allows institutions to build, manage and search their own web archive through a user friendly web application, without requiring any technical expertise.”

So, the tag line for my new dream service would be “DigiCollection’s new subscription service, Digitize-It, allows institutions to upload, manage and search their own digitized collections through a user friendly web application, without requiring any technical expertise.”

Posted in access, digitization, future-proofing, historical research, interface design, internet archiving, open source, software, what if

3 Comments

Pingback: Atakans Zeitenläufte
Rob Jenson
October 18, 2006 at 11:03 pm

Hi Jeanne!

I believe that what you envision is consistent with what the Internet Archive[s] wants to do and be. Their Archive-It service is designed to support archivists at institutions that want to create collections of preserved websites. So, if one of my institutions wanted to start archiving web sites that are relevant to the history of our county, we could find a computer guy to put together a “mini-me” of the IA with hardware and software, lease a CoLo cage at an ISP with decent bandwidth, build an interface that a non-techy can use to maintain and manage the sites and content, etc.; OR outsource it to someone who already knows how to do it. I suspect that Choice B will be more cost effective for many institutions (including the Library of Congress and NARA). Based on their FAQ Page, for a mere $10,000 per year, you can create your own web archive. That sounds like a lot, but when you figure how much it would cost to roll your own mini-IA, that is really inexpensive … and you get full-text searching of your archives.

I don’t know if there is anyone out there who has created a DSpace or similar generic digital archiving system with a similar planned business model, namely to enable organizations that don’t have the resources to build and manage their own digital archiving system to lease the facilities and support. Perhaps when Lockheed Martin has finished building ERA for NARA they will create a service bureau for this kind of thing, in addition to selling the ERA solution to state governments and other large-scale customers.

“Dark Digital Archives” pose intriguing problems, starting with the need to be trustworthy in storing access passwords to access-protected sites, and developing an access control model (for the archivists and patrons who will have access while the archives are “dark” — and a way of partitioning access to different pieces). It would be really cool if the access control credentials could be seamlessly migrated. Imagine if a permanent archive of livejournal could be developed in such a way that the private bits can be accessed by the same authorized users as the live site — and the journal owners could set parameters of when their journal would “go public” (death + 50 years?) and the archiving system would support that.
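
A minimal sketch of that “go public” rule, in Python, might look like the following. The names here (`ArchivedJournal`, `can_read`, `death_plus_years`) are hypothetical and invented for illustration; this is not how livejournal or any existing archive actually models access. It assumes each archived journal keeps its owner, the set of users who were authorized on the live site, and an optional date after which it becomes public.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class ArchivedJournal:
    """Hypothetical archived journal with access rules carried over from the live site."""
    owner: str
    authorized_readers: Set[str] = field(default_factory=set)
    public_after: Optional[date] = None  # None means the journal never goes public


def death_plus_years(date_of_death: date, years: int) -> date:
    """Embargo end date: the anniversary of death, `years` later."""
    try:
        return date_of_death.replace(year=date_of_death.year + years)
    except ValueError:
        # February 29 has no anniversary in a non-leap year; fall back to February 28.
        return date_of_death.replace(year=date_of_death.year + years, day=28)


def can_read(journal: ArchivedJournal, user: Optional[str], today: date) -> bool:
    """Owner and authorized readers always have access; everyone else waits for the embargo."""
    if user is not None and (user == journal.owner or user in journal.authorized_readers):
        return True
    return journal.public_after is not None and today >= journal.public_after
```

The sketch leaves out the hard parts raised above: migrating credentials from the live site, storing them in a trustworthy way, and partitioning access to different pieces of the same journal.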

I don’t know what GIS data and systems look like. Are there standardized metadata and systems out there so that anyone with the “right system” can make it work if they have the data, or would archiving a GIS site involve archiving the software that makes the maps display and dance the right way? There might be a need to develop second-order metadata that describes how a system used the data, where the archived software lives, and how to download and de-encapsulate the software (or emulation package) necessary to display the archived data the same way that it was used in the original system. Big problem to solve, but not impossible.
Pingback: Squirl.info - an interesting option for putting collections online - SpellboundBlog.com - ponderings of an archives student
