Category: preservation

Video News Archives: Digitization as Good Business

My work now includes more SEO (Search Engine Optimization) work, so I have added SEO-focused blogs to my RSS feed reader. Today I spotted Search Engine Land’s post Business Opportunities For Video News Archives. Stephen Baker calculates that 35 years’ worth of archive footage equals 51,100 hours of content per station. With approximately 20 stations per broadcast group, he estimates a cost of $30 million to digitize each group’s archive of news footage. See the original article for more details on his calculations.
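As a rough sanity check on those figures, here is a minimal Python sketch that works backwards from the numbers quoted above. The function names are my own, and the 365-day year is an assumption; the per-day and per-hour values are inferred from the quoted totals, not taken from Baker's article.

```python
# Figures quoted in the post from Baker's estimate.
HOURS_PER_STATION = 51_100
YEARS_OF_FOOTAGE = 35
STATIONS_PER_GROUP = 20
COST_PER_GROUP_USD = 30_000_000


def hours_per_day(hours: float, years: float) -> float:
    """Average hours of footage per calendar day (assumes 365-day years)."""
    return hours / (years * 365)


def cost_per_hour(total_cost: float, hours_per_station: float, stations: int) -> float:
    """Digitization cost per hour of footage implied by the group total."""
    return total_cost / (hours_per_station * stations)


if __name__ == "__main__":
    group_hours = HOURS_PER_STATION * STATIONS_PER_GROUP
    per_day = hours_per_day(HOURS_PER_STATION, YEARS_OF_FOOTAGE)
    per_hour = cost_per_hour(COST_PER_GROUP_USD, HOURS_PER_STATION, STATIONS_PER_GROUP)
    print(f"Hours of footage per group: {group_hours:,}")
    print(f"Implied footage per station per day: {per_day:.1f} hours")
    print(f"Implied cost per hour digitized: ${per_hour:.2f}")
```

Read this way, the estimate corresponds to about four hours of footage per station per day and a little under $30 per hour of footage digitized.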

He then proposes 3 approaches to monetizing these efforts and leveraging the resulting digitized video:

  1. Media-Centric Wikipedia – complete with an expectation that social media contributions would provide “scalable way for creating editorial metadata, such as descriptions and story summaries that would be costly to otherwise create”. This makes me think of Flickr Commons for video.
  2. Education Site – akin to NBCU’s iCue site I mentioned in my post about NBC News Archive footage on Hulu. “Efforts like this provide educational/subscription opportunities as well as sponsorship/advertising opportunities—what advertiser doesn’t want to get in front of 13 – 18 year olds?”
  3. News Site Extension – described as “bolting the news archive onto the existing site”. The major benefit of this is that “more content provides more SEO opportunity and, hence, larger audience reach.”

Baker concludes:

In a market where traditional media is struggling to create unique and compelling online experiences and business models, the archive represent a differentiator that can jump-start audience building and monetization initiatives. Not only is it an important representation of world history that must be saved for “preservation-sake”, the archive represents a large, untapped online opportunity.  Who will be first to realize its potential?

The ultimate goal of all three of these scenarios is to offset the extreme expense of digitizing thousands of hours of news footage. I think it is refreshing to see a perspective from outside the cultural heritage corner of the world that still sees video archives as rich resources worth preserving. I also like seeing ideas that are pitched in a manner that should catch the attention of those making budgets and struggling to find funding for large digitization efforts.

Image Credit: Flickr photo OSU Spring Game 2006 Media Lineup by Chris Metcalf

Blog Action Day 2008: Poverty in the Archival Record and Beyond

In honor of this year’s Blog Action Day theme of Poverty, I want to point people to examples of ways in which poverty is documented in archives, manuscript collections and elsewhere.

The most obvious types of records that document poverty are:

There are also organizations dedicated to research on poverty – such as the Chronic Poverty Research Centre, University of Kentucky Center for Poverty Research and National Poverty Center. The archival records from groups such as these could show ways that organizations have addressed poverty over time, as well as the history of poverty itself.

Archives do their best job with records produced in the process of carrying out tasks related to business or personal life, and many of those who are living in the greatest poverty aren’t generating (or saving) their own records. Is being documented by photographers, news articles and the Census Bureau the same thing as telling your own story through an oral history or having your photographs, personal papers or other life documents archived? One of the most fascinating things about primary source materials in general, and archival records in particular, is the firsthand view that they can lend the researcher: that sense of stepping into someone’s shoes, of having a chance to retrace their steps.

There are certainly institutions whose records cast light on the lives of those in poverty such as homeless shelters, social service agencies and health clinics – but I would put forth that we are rarely capturing the first person voices of those living in poverty. I am realistic. I know that those dealing with the basic issues of food, shelter and personal safety are likely not thinking about where to record their oral history or how to get their personal papers into an archive or manuscript collection. That doesn’t mean that I don’t wish there were a better way. These are people who deserve to be represented with their own voice to the people of the future.

I am enamored of the idea of recording people’s own stories as is being done in each of the following examples:

I want to end my post with an inspirational project. Photographer Camilo José Vergara has been photographing the built environment in poor, minority communities across the United States since 1977. He has re-photographed the same locations many times over the years. This permits him to create time lapse series of images that show how a space has changed over time. He has published a number of books (the most recent of which is American Ruins) and has created an interactive website.

The Invincible Cities website documents Harlem, NY, Camden, NJ and Richmond, CA. After selecting one of these three locations you are greeted by a map, timeline and photographs. You can walk through time at individual locations and watch storefronts change, buildings get demolished and fashions shift. The interface lets you select images by location, theme and year. My description can’t do it any justice – just go explore for yourself: Invincible Cities. The site explains that his next goal is to create a ‘Visual Encyclopedia of the American Ghetto’ (VE for short) that covers all of the United States.

In the March 2008 PopPhoto.com article Camilo Jose Vergara: 30 Years Documenting the American Ghetto, we find the following interesting quotes from the photographer:

“Once photography at its best and most prestigious became art and the rewards went to photographer artists, the field became uninterested and unable to significantly contribute to the creation of a historical record, that is to the making of an inventory of our world and to illustrate how it changes,” asserts Vergara, adding that the Internet is an ideal way to bypass traditional museums. “You can realize a larger world that can support a different kind of photography.”

The Internet is especially well-suited to housing a multi-layered history of the ghettos’ evolution. Advances in technology allow the designers to arrange images in complex ways: links take the viewer to a page that gives census data; click on a color-coded street map on the left side of the screen to pinpoint exact addresses of panoramic views, artifacts, architectural details, building interiors or street-level views. “These kinds of things were unimaginable when I started the project,” he says.

Can we expect projects like this to give individuals of the future a real taste of what life was like for the poor in US cities or around the world? Should part of our efforts at diversity of representation in the historical record specifically address preservation of the records and manuscripts of those living in poverty? Lots to think about! I hope this post has introduced you to new resources and projects. Please share any I missed in the comments below.

SAA2008: Preservation and Experimentation with Analog/Digital Hybrid Literary Collections (Session 203)

The official title of Session 203 was Getting Our Hands Dirty (and Liking It): Case Studies in Archiving Digital Manuscripts. The session chair, Catherine Stollar Peters from the New York State Archives and Records Administration, opened the session with a high level discussion of the “Theoretical Foundations of Archiving Digital Manuscripts”. The focus of this panel was preserving hybrid collections of born digital and paper based literary records. The goal was to review new ways to apply archival techniques to digital records. The presenters were all archivists without IT backgrounds who are building on others’ work … and experimenting. She also mentioned that this work impacts researchers, historians, and journalists.

For each of the presenters, I have listed below the top challenges and recommendations. If you attended the session, you can skip forward to my thoughts.

Norman Mailer’s Electronic Records

Challenges & Questions:

  • 3 laptops and nearly 400 disks of correspondence
  • While the letters might have been dictated or drafted by Mailer, all the typing, organization and revisions done on the computer were done by his assistant Judith McNally. This brings into question issues of who should be identified as the record creator. How do they represent the interaction between Mailer & McNally? Who is the creator? Co-Creators?
  • All the laptops and disks were held by Judith McNally. When she died all of her possessions were seized by county officials. All the disks from her apartment were eventually recovered over a year later – but it causes issues of provenance. There is no way to know who might have viewed/changed the records.

Revelations and Recommendations:

What is accessioning and processing when dealing with electronic records? What needs to be done?

  • gain custody
  • gather information about creator’s (or creators’) use of the electronic records. In March 2007 they interviewed Mailer to understand the process of how they worked together. They learned that the computers were entirely McNally’s domain.
  • number disks, computers (given letters), other digital media
  • create disk catalog – to reflect physical information of the disk. Include color of ink, underlining, etc. At this point the disk has never been put into a computer. This captures visual & spatial information
  • gather this info from each disk: file types, directory structure & file names
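The last step in that list (file types, directory structure and file names) is the kind of thing a small script can gather once a disk's contents are readable. Here is a minimal sketch in Python, not the process the presenters described: it assumes you are pointing it at a read-only working copy of the disk's contents, and the function names and CSV columns are my own invention.

```python
import csv
import os
from pathlib import Path


def inventory(root):
    """List every file under root with its directory, name, extension and size."""
    root_path = Path(root)
    rows = []
    for dirpath, dirnames, filenames in os.walk(root_path):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            full = Path(dirpath) / name
            rows.append({
                # directory relative to the disk root preserves the original structure
                "directory": Path(dirpath).relative_to(root_path).as_posix(),
                "file_name": name,
                # the extension is only a hint at the file type, not proof of it
                "extension": full.suffix.lower(),
                "size_bytes": full.stat().st_size,
            })
    return rows


def write_catalog(rows, out_csv):
    """Save the inventory as a CSV file that can sit alongside the disk catalog."""
    fields = ["directory", "file_name", "extension", "size_bytes"]
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_catalog(inventory("disk_017_copy"), "disk_017.csv")` (both names are hypothetical) would produce one row per file. The script only reads metadata and never opens or modifies the files themselves.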

The ideal for future collections of this type is archivist involvement earlier – the earlier the better.

Papers of Peter Ganick

  • Speaker: Melissa Watterworth
  • Featured Collection: Papers of Writer and Small Press Publisher Peter Ganick, Thomas J Dodd Research Center, University of Connecticut

Challenges & Questions:

  • What are the primary sources of our modern world?
  • How do we acquire and preserve born digital records as trusted custodians?
  • How do we preserve participatory media – maybe we can learn from those who work on performance art?
  • How do we incrementally build our collections of electronic records? Should we be preserving the tools?
  • Timing of acquisition: How actively should we be pursuing personal archives? How can we build trust with creators and get them to understand the challenges?
  • Personal papers are very contextual – order matters. Does this hold true for born digital personal archives? What does the networking aspect of electronic records mean – how does it impact the idea of order?
  • First attempt to accession one of Peter Ganick’s laptops and the archivist found nothing she could identify as files.. she found fragments of text – hypertext work and lots of files that had questionable provenance (downloaded from a mailing list? his creations?). She had to sit down next to him and learn about how he worked.
  • He didn’t understand at first what her challenges were. He could get his head around the idea of metadata and issues of authenticity. He had trouble understanding what she was trying to collect.
  • How do we arrange and keep context in an online environment?
  • Biggest tech challenge: are we holding on for too long to ideas of original order and context?
  • Is there a greater challenge in collecting earlier in the cycle? What if the creator puts restrictions on groupings or chooses to withdraw them?
  • Do we want to create contracts with donors? Is that practical?

Revelations and Recommendations:

  • Collect materials that had high value as born digital works but were at a high risk of loss.
  • Build infrastructure to support preservation of born digital records.
  • Go back to the record creator to learn more about his creative process. They used to acquire records from Ganick every few years.. that wasn’t frequent enough. He was changing the tools he used and how he worked very quickly. She made sure to communicate that the past 30 years of policy wasn’t going to work anymore. It was going to have to evolve.
  • Created a ‘submission agreement’ about what kinds of records should be sent to the archive. He submitted them in groupings that made sense to him. She reviewed the records to make sure she understood what she was getting.
  • Considering using PDF/A to capture snapshots of virtual texts.
  • Looked to model of ‘self archiving’ – common in the world of professors to do ongoing accruals.
  • What about ’embedded archivists’? There is a history of this in the performing arts and NGOs and it might be happening more and more.

George Whitmore Papers

Challenges & Questions:

  • How do you establish identity in a way that is complete and uncorrupted? How do you know it is authentic? How do you make an authentic copy? Are these requirements unreasonable and unachievable?

Revelations and Recommendations:

  • Refresh and replicate files on a regular schedule.
  • They have had good success using Quick View Plus to enable access to many common file formats. On the downside, it doesn’t support everything and since it is proprietary software there are no long term guarantees.
  • In some cases they had to send CP/M files to a 3rd party to have them converted into WordStar and have the ASCII normalized.
  • Documentation varied: acquisition notes, accession records, and a loan form with the 3rd party who did the conversion that summarized the request. They did NOT provide information about what software was used to convert from CP/M to DOS. This would be good information to capture in the future.
  • Proposed an expansion of the standards to include how electronic records were migrated in the <processinfo> processing notes.

Questions & Answers

Question: As part of a writers community, what do we tell people who want to know what they can DO about their records. They want technical information.. they want to know what to keep. Current writers are aware they are creating their legacy.

Answer: Michael: The single best resource is the InterPARES 2 Creator Guidelines. The Beinecke has adapted them to distribute to authors. Melissa: Go back to your collection development policies and make sure to include functions you are trying to document (like process.. distribution networks). Also, communities of practice (Acid-Free Bits) are talking about formats and guidelines like that. Gabriela: People often want to address ‘value’. Right now we don’t know how to evaluate the value of electronic drafts – it is up to authors.

Question: Cal Lee: Not a question so much as an idea: the world of digital forensics and security and the ‘order of volatility’ dictate that everyone should always be making a full disk copy bit by bit before doing anything else.
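To make the idea concrete, here is a minimal Python sketch of a byte-for-byte copy that computes a checksum as it goes. It is an illustration only: real forensic imaging normally relies on a hardware write blocker and dedicated imaging tools, neither of which this replaces. The function names are my own, and SHA-256 is my choice of checksum, not something mentioned in the session.

```python
import hashlib


def image_copy(source, dest, chunk_size=1024 * 1024):
    """Copy source to dest byte for byte and return the SHA-256 of the bytes read.

    dest is opened in exclusive mode ('xb'), so an existing image is never overwritten.
    """
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(dest, "xb") as dst:
        while chunk := src.read(chunk_size):
            digest.update(chunk)
            dst.write(chunk)
    return digest.hexdigest()


def sha256_of(path, chunk_size=1024 * 1024):
    """Re-read a file and return its SHA-256, for checking a copy later."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

If the value returned by `image_copy` matches `sha256_of` run against the new image, the copy on disk is identical to what was read from the source. Recording that value at the time of copying gives later checks something to compare against.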

Comment: On digital forensic tools – there is lots of history, including editing history, of documents in the software… also deleted files are still there.

Question: Have you seen examples of materials that are coming into the archive where the digital materials are working drafts for a final paper version? This is in contrast to others that are electronic experiments.

Answer: Yes, they do think about this. It can affect arrangement and how the records are described. The formats also impact how things are preserved.

Question: Access issues? Are you letting people link to them from the finding aids? How is the documents’ authenticity protected?

Answer: DSpace gives you a new version anytime you want it (the original bitstream) .. lots of cross linking supports people finding things from more than one path. In some cases documents (even electronic) can only be accessed from within the on site reading room.

Question: What is your relationship like with your IT folks?

Answer: Gabriela: Our staff has been very helpful. We use ‘legacy’ machines to access our content. They build us computers. They are also not archivists, so there is a little divide about priorities and the kind of information that I am interested in.. but it has been a very productive conversation.

Question: (For Melissa) Why didn’t you accept Peter’s email (Melissa had said they refused a submission of email from Peter because it didn’t have research value)?

Answer: The emails that included personal medical emails were rejected. The agreement with Peter didn’t include an option to selectively accept (or weed) what was given.

Question: In terms of gathering information from the creators.. do you recommend a formal/recorded interview? Or a more informal arrangement in which you can contact them anytime on an ongoing basis?

Answer: Melissa: We do have more formal methods – ‘documentation study’ style approaches. We might do literature reviews.. Ultimately the submission agreement is the most formal document we have. Gabriela: It depends on what the author is open to.. formal documentation is best.. but if they aren’t willing to be recorded, then you take what you can get!

My Thoughts

I am very curious to see how best practices evolve in this arena. I wonder how stories written using something like Google Documents, which auto-saves and preserves all versions for future examination, will impact how scholars choose to evaluate the evolution of documents. There have already been interesting examinations of the evolution of collaborative documents. Consider this visual overview of the updates to the Wikipedia entry for Sarah Palin created by Dan Cohen and discussed in his blog post Sarah Palin, Crowdsourced. Another great example of this type of visual experience of a document being modified was linked to in the comments of that post: Heavy Metal Umlaut: The Movie. If you haven’t seen this before – take a few minutes to click through and watch the screencast which actually lets you watch as a Wikipedia page is modified over time.

While I can imagine that there will be many things to sort out if we try to start keeping these incredibly frequent snapshot save logs (disk space? quantity of versions? authenticity? author preferences to protect the unpolished versions of their work?) – I still think that being able to watch the creative process this way will be valuable in some situations. I also believe that over time new tools will be created to automate the generation of document evolution visualizations and movies (like the two I link to above) that make it easy for researchers to harness this sort of information.

Perhaps there will be ways for archivists to keep only certain parts of the auto-save versioning. I can imagine an author who does not want anyone to see early drafts of their writing (as is apparently also the case with architects and early drafts of their designs) – but who might be willing for the frequency of updates to be stored. This would let researchers at least understand the rhythm of the writing – if not the low level details of what was being changed.
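Here is a minimal Python sketch of what keeping only the rhythm might look like. It assumes, purely for illustration, that each auto-saved version is a record with a `saved_at` timestamp and the draft `text`. That record shape, the function names and the 30-minute pause threshold are all my own assumptions, not how any particular tool stores its versions.

```python
from datetime import datetime, timedelta


def save_times(versions):
    """Keep only when each version was saved, discarding the draft text."""
    return sorted(version["saved_at"] for version in versions)


def writing_sessions(times, max_gap=timedelta(minutes=30)):
    """Group sorted save times into sessions separated by pauses longer than max_gap."""
    sessions = []
    for moment in times:
        if sessions and moment - sessions[-1][-1] <= max_gap:
            sessions[-1].append(moment)  # still within the current sitting
        else:
            sessions.append([moment])  # a long pause starts a new sitting
    return sessions


if __name__ == "__main__":
    versions = [
        {"saved_at": datetime(2008, 9, 1, 9, 0), "text": "first line"},
        {"saved_at": datetime(2008, 9, 1, 9, 5), "text": "first line, revised"},
        {"saved_at": datetime(2008, 9, 1, 14, 0), "text": "afternoon rewrite"},
    ]
    for session in writing_sessions(save_times(versions)):
        print(session[0], "to", session[-1], "-", len(session), "saves")
```

Nothing about the wording of a draft survives in what `save_times` returns, but a researcher could still see when the author sat down to write and for how long.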

I love the photo I found for the top of this post. I admit to still having stacks of 3 1/2 inch floppy disks. I have email from the early days of BITNET. I have poems, unfinished stories, old resumes and SQL scripts. For the moment my disks live in a box on the shelf labeled ‘Old Media’. Lucky me – I at least still have a computer with a floppy drive that can read them!

Image Credit: oh messy disks by Blude via flickr.

As is the case with all my session summaries from SAA2008, please accept my apologies in advance for any cases in which I misquote, overly simplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via my contact form.

MayDay 2008: Do you have a disaster plan?

I couldn’t let MayDay 2008 pass without pointing everyone to the amazing annotated list of MayDay resources that the Society of American Archivists (SAA) has made available.

Does your institution have a disaster plan?
If not, the list of resources includes a detailed set of Free Disaster Plan Templates. Today is the perfect day to download one and start planning.

A full disaster plan too overwhelming? SAA also provides a tidy list of easy MayDay activity ideas including:

Create or Update Your Contact Lists
One of the most important elements of disaster response is knowing how to contact critical people – emergency responders, staff, and vendors. Make sure your staff members have an up-to-date list that includes as much contact information as possible: work and home phone numbers (including direct lines at work), mobile phone numbers, work and home email addresses, and any other relevant addresses. Staff at many institutions hit by hurricanes in 2005 discovered that they couldn’t use work email or phone numbers because work systems were completely out of commission; those who had an alternative phone number or email address often could connect.

Make Sure Boxes Are Off the Floor
Any number of causes – a broken pipe, a clogged toilet, fire sprinklers – may result in water in your storage areas. If shelf space is limited, use pallets for clearance. Make sure nothing is on the floor where it can be soaked.

Don’t have precious cultural heritage materials under your care? Okay then, how about you? Do you have a Family Disaster Plan and a Disaster Supplies Kit ready?

Image Credit: Society of American Archivists MayDay 2008 Logo.

Caring for Special Collections: Exploring the Connecting to Collections Bookshelf

I subscribe to the RSS feed from the Institute of Museum and Library Services (IMLS), and so saw a press release encouraging institutions to apply for the free IMLS Connecting to Collections Bookshelf.

The IMLS Connecting to Collections Bookshelf is intended to provide small and medium-sized libraries and museums with essential resources needed to improve the condition of their collections. The Bookshelf includes books, DVDs, and other collections resources, as well as a Guide to Online Resources and a User’s Guide to all of the materials. It addresses such topics as the philosophy and ethics of collecting, collections management and planning, emergency preparedness, and culturally specific conservation issues.

Heritage Preservation has created both a 48 page Bookshelf User’s Guide, with a page dedicated to each resource selected for the bookshelf, and a Guide to Online Resources to be used as a companion to the bookshelf. The Bookshelf User’s Guide has a brilliant section at the end giving you pointers to specific sections of the various Bookshelf resources to answer special questions – such as ‘Where can we find information on raising funds for collections care?’ and ‘How can I prioritize the needs of our collections?’.

What is interesting is that it took me a while to realize that each of the institutions that is awarded The Bookshelf will actually receive the books. My past experience with O’Reilly’s Safari Books Online made me assume that the books would be only accessed online. The Safari Books Online site requires a paid membership, but then provides access to an ever growing electronic reference library. The total number of resources is listed as currently over 5,000. One level of membership, Safari Library, provides unlimited access to all the resources (currently listed as $42.99 a month or $472.89 per year) while the less expensive membership level, Safari Bookshelf (currently listed as $22.99 a month or $252.99 a year), provides access to up to ten titles at a time.

Seeing those prices got me wondering: what will the recipients of this bookshelf be getting, and what would its total cost be? I found my way to a list of the books and resources that will be included. Between the Internet and the 48 page guide to the Bookshelf I found the following information about each element of the Bookshelf. IMLS has broken the bookshelf down into three subsections as shown below:

Bookshelf: The Core Collection

Bookshelf: Nonliving Collections

Bookshelf: Living Collections

Grand Total

The maximum cost (with no membership discounts) to purchase all the components of The Bookshelf would be $951.87. Add in the cost of shipping and printing your own copies from the free downloads and we can probably talk about the monetary value of the Bookshelf being approximately $1000!

Online Access

While researching all of this I came across a new option on Amazon.com – something they are calling Amazon Upgrade. For an additional fee above and beyond the price you pay for the physical book – you can have immediate and permanent online access to the content of that book. Take a look at the offering explained on the Amazon page for The National Trust Manual of Housekeeping: The Care of Collections in Historic Houses Open to the Public. I assume that they plan to increase the number of titles for which this is an option. If so, I can envision building an online reference shelf of one’s own – one title at a time. Rather than deciding that something like O’Reilly’s Safari Books Online has enough books to make it worthwhile for you – you will create your own custom online reference shelf.

The other half of the online access story is of course the number of resources that are posted online for free download (or as living HTML documents being updated over time). These are all the resources from the list above that can be downloaded for free:

What if all the resources that those who care for collections need were available via an online bookshelf? Now that would be an amazing resource for which many would be happy to pay an annual fee. Perhaps it could be provided as part of the membership fee for one or more of the appropriate professional organizations. An additional benefit to an online collection is the opportunity to receive automatic updates and new editions. I will also keep an eye on the Amazon Upgrade option to see how easy it is for someone to build their own online reference shelf – but I think a purposeful online collection designed for cultural heritage institutions would be even more compelling.

Getting the Bookshelf

A lot of organizations have already received the Bookshelf, but the press release that got me looking at all this mentioned that the next (final?) application period will be from March 1 through April 30, 2008. Recipients will be announced in July of 2008.

If you are considering applying you can find more details about the application process and review the questions you must answer online. But even for those that don’t qualify (federally operated and for-profit institutions are not eligible) – the Bookshelf User’s Guide, the Guide to Online Resources and those resources that may be downloaded for free provide a powerful combination of materials to support institutions and individuals as they care for collections of all shapes and sizes.

Note: All prices quoted in this post were valid as of January 27th, 2008. Image shown above from IMLS Connecting to Collections Bookshelf page.

Digital Preservation via Emulation – Dioscuri and the Prevention of Digital Black Holes

Available Online posted about the open source emulator project Dioscuri back in late September. In the course of researching Thoughts on Digital Preservation, Validation and Community I learned a bit about the Microsoft Virtual PC software. Virtual PC permits users to run multiple operating systems on the same physical computer and can therefore facilitate access to old software that won’t run on your current operating system. That emulator approach pales in comparison with what the folks over at Dioscuri are planning and building.

On the Digital Preservation page of the Dioscuri website I found this paragraph on their goals:

To prevent a digital black hole, the Koninklijke Bibliotheek (KB), National Library of the Netherlands, and the Nationaal Archief of the Netherlands started a joint project to research and develop a solution. Both institutions have a large amount of traditional documents and are very familiar with preservation over the long term. However, the amount of digital material (publications, archival records, etc.) is increasing with a rapid pace. To manage them is already a challenge. But as cultural heritage organisations, more has to be done to keep those documents safe for hundreds of years at least.

They are nothing if not ambitious… they go on to state:

Although many people recognise the importance of having a digital preservation strategy based on emulation, it has never been taken into practice. Of course, many emulators already exist and showed the usefulness and advantages it offer. But none of them have been designed to be digital preservation proof. For this reason the National Library and Nationaal Archief of the Netherlands started a joint project on emulation.

The aim of the emulation project is to develop a new preservation strategy based on emulation.

Dioscuri is part of Planets (Preservation and Long-term Access via NETworked Services) – run by the Planets consortium and coordinated by the British Library. The Dioscuri team has created an open source emulator that can be ported to any hardware that can run a Java Virtual Machine (JVM). Individual hardware components are implemented via separate modules. These modules should make it possible to mimic many different hardware configurations without creating separate programs for every possible combination.
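To illustrate the modular idea in general terms, here is a toy sketch in Python (Dioscuri itself is written in Java). None of these class names, opcodes or methods come from Dioscuri; they are invented here only to show how separate hardware modules can be combined into different configurations without writing a separate program for each one.

```python
from abc import ABC, abstractmethod


class Module(ABC):
    """A hypothetical hardware component that can be plugged into an emulator."""
    name = "module"

    @abstractmethod
    def reset(self):
        """Return the component to its power-on state."""


class Memory(Module):
    name = "memory"

    def __init__(self, size):
        self.size = size
        self.cells = bytearray(size)

    def reset(self):
        self.cells = bytearray(self.size)

    def read(self, address):
        return self.cells[address]

    def write(self, address, value):
        self.cells[address] = value & 0xFF


class Cpu(Module):
    """Toy processor: one accumulator, opcodes 0x00 (HALT) and 0x01 (ADD immediate)."""
    name = "cpu"

    def __init__(self):
        self.acc = 0
        self.pc = 0

    def reset(self):
        self.acc = 0
        self.pc = 0

    def step(self, memory):
        """Execute one instruction; return False once HALT is reached."""
        opcode = memory.read(self.pc)
        if opcode == 0x00:
            return False
        if opcode == 0x01:
            self.acc = (self.acc + memory.read(self.pc + 1)) & 0xFF
            self.pc += 2
            return True
        raise ValueError(f"unknown opcode {opcode:#x}")


class Emulator:
    """Holds whichever modules it is given; swapping one changes the configuration."""

    def __init__(self, modules):
        self.modules = {module.name: module for module in modules}

    def module(self, name):
        return self.modules[name]

    def run(self, max_steps=1000):
        cpu, memory = self.modules["cpu"], self.modules["memory"]
        for _ in range(max_steps):
            if not cpu.step(memory):
                break
```

A machine with more memory is just `Emulator([Cpu(), Memory(64)])` instead of `Emulator([Cpu(), Memory(16)])`. The emulator class itself does not change, which is the appeal of building each component as its own module.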

You can get a taste of the big thinking that is going into this work by reviewing the program overview and slide presentations from the first Emulation Expert Meeting (EEM) on digital preservation that took place on October 20th, 2006.

In the presentation given by Geoffrey Brown from Indiana University titled Virtualizing the CIC Floppy Disk Project: An Experiment in Preservation Using Emulation I found the following simple answer to the question ‘Why not just migrate?’:

  • Loss of information — e.g. word edits

  • Loss of fidelity — e.g. WordPerfect to Word isn’t very good

  • Loss of authenticity — users of migrated document need access to original to verify authenticity

  • Not always possible — closed proprietary formats

  • Not always feasible — costs may be too high

  • Emulation may be necessary to enable migration

After reading through Emulation at the German National Library, presented by Tobias Steinke, I found my way to the kopal website. With their great tagline ‘Data into the future’, they state their goal is “…to develop a technological and organizational solution to ensure the long-term availability of electronic publications.” The real gem for me on that site is what they call the kopal demonstrator. This is a well thought out Flash application that explains the kopal project’s ‘procedures for archiving and accessing materials’ within the OAIS Reference Model framework. But it is more than that – if you are looking for a great way to get your (or someone else’s) head around digital archiving, software and related processes – definitely take a look. They even include a full Glossary.

I liked what I saw in Defining a preservation policy for a multimedia and software heritage collection, a pragmatic attempt from the Bibliothèque nationale de France, a presentation by Grégory Miura, but felt like I was missing some of the guts by just looking at the slides. I was pleased to discover what appears to be a related paper on the same topic presented at IFLA 2006 in Seoul titled: Pushing the boundaries of traditional heritage policy: Maintaining long-term access to multimedia content by introducing emulation and contextualization instead of accepting inevitable loss. Hurrah for NOT ‘accepting inevitable loss’.

Vincent Joguin’s presentation, Emulating emulators for long-term digital objects preservation: the need for a universal machine, discussed a virtual machine project named Olonys. If I understood the slides correctly, the idea behind Olonys is to create a “portable and efficient virtual processor”. This would provide an environment in which to run programs such as emulators, but isolate the programs running within it from the disparities between the original hardware and the actual current hardware. Another benefit to this approach is that only the virtual processor need be ported to new platforms rather than each individual program or emulator.
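To get my own head around that last point, here is a toy sketch in Python. It is not Olonys and borrows nothing from it: the four-instruction set (PUSH, ADD, MUL, PRINT) and the `run` function are hypothetical, invented purely for illustration. The idea it tries to show is that a program written for the virtual processor never touches the real hardware, so only the small interpreter would need rewriting for a new platform.

```python
from typing import List, Tuple

# One instruction is an (opcode, argument) pair. The opcodes are hypothetical
# and exist only for this illustration.
Instruction = Tuple[str, int]


def run(program: List[Instruction]) -> List[int]:
    """Interpret a program for a toy stack-based virtual processor.

    Returns the values the program printed, in order.
    """
    stack: List[int] = []
    output: List[int] = []
    for opcode, argument in program:
        if opcode == "PUSH":
            stack.append(argument)
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == "PRINT":
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return output


# A "preserved" program: computes (2 + 3) * 4 and prints the result.
# It depends only on the instruction set above, not on the host machine.
PROGRAM: List[Instruction] = [
    ("PUSH", 2),
    ("PUSH", 3),
    ("ADD", 0),
    ("PUSH", 4),
    ("MUL", 0),
    ("PRINT", 0),
]

if __name__ == "__main__":
    print(run(PROGRAM))
```

Moving to a new platform would mean porting `run` once; `PROGRAM`, and every other program written against the same instruction set, would stay untouched.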

Hilde van Wijngaarden presented an Introduction to Planets at EEM. I also found another introductory level presentation that was given by Jeffrey van der Hoeven at wePreserve in September of 2007 titled Dioscuri: emulation for digital preservation.

The wePreserve site is a gold mine for presentations on these topics. They bill themselves as “the window on the synergistic activities of DigitalPreservationEurope (DPE), Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval (CASPAR), and Preservation and Long-term Access through NETworked Services (PLANETS).” If you have time and curiosity on the subject of digital preservation, take a glance down their home page and click through to view some of the presentations.

On the site of The International Journal of Digital Curation there is a nice ten page paper that explains the most recent results of the Dioscuri project. Emulation for Digital Preservation in Practice: The Results was published in December 2007. I like being able to see slides from presentations (as linked to above), but without the notes or audio to go with them I am often left staring at really nice diagrams wondering what the author’s main point was. The paper is thorough and provides lots of great links to other reading, background and related projects.

There is a lot to dig into here. It is enough to make me wish I had a month (maybe a year?) to spend just following up on this topic alone. I found my struggle to interpret the many PowerPoint slide decks that have no notes or audio very ironic. Here I was hunting for information about the preservation of born-digital records and I kept finding that the records of the research provided didn’t give me the full picture. With no context beyond the text and images on the slides themselves, I was left to my own interpretation of their intended message. While I know that these presentations are not meant to be the official records of this research, I think that the effort obviously put into collecting and posting them makes it clear that others are as anxious as I am to see this information.

The best digital preservation model in the world will only preserve what we choose to save. I know the famous claim on the web is that ‘content is king’ – but I would venture to suggest that in the cultural heritage community ‘context is king’.

What does this have to do with Dioscuri and emulators? Just that as we solve the technical problems related to preservation and access, I believe that we will circle back around to realize that digital records need the same careful attention to appraisal, selection and preservation of context as ‘traditional’ records. I would like to believe that the huge hurdles we now face on the technical and process side of things will fade over time due to the immense efforts of dedicated and brilliant individuals. The next big hurdle is the same old hurdle – making sure the records we fight to preserve have enough context that they will mean anything to those in the future. We could end up with just as severe a ‘digital black hole’ due to poorly selected or poorly documented records as we could due to records that are trapped in a format we can no longer access. We need both sides of the coin to succeed in digital preservation.

Did I mention the part about ‘Hurray for open source emulator projects with ambitious goals for digital preservation’? Right. I just wanted to be clear about that.

Image Credit: The image included at the top of this post was taken from a screen shot of Dioscuri itself, the original version of which may be seen here.

Will Crashed Hard Drives Ever Equal Unlabeled Cardboard Boxes?

Photo of Crashed Hard Drive - wonderferret on Flickr

How many of us have an old hard drive hanging around? I am talking about the one you were told was unfixable. The one that has 3 bad sectors. The one they replaced and handed to you in one of those distinctive anti-static bags. You know the ones I mean – the steely grey translucent plastic ones that look like they should contain space food.

I have more than one ‘dead’ hard drive. I can’t quite bring myself to throw them out – but I have no immediate plans to try and reclaim their files.

I know that there are services and techniques for pulling data off otherwise inaccessible hard drives. You hear about it in court cases and see it on TV shows. A quick Google search on hard drive rescue turns up businesses like Disk Data Recovery.

Do archivists already make it a policy to hunt not just for computers, but for discarded and broken hard drives lurking in filing cabinets and desk drawers? Compare this to a carton of documents that needs special treatment to permit access to the records it contains and yet is appraised as valuable. If the treatment required were within budgetary and time constraints – it would be performed. Mold, bugs, rusty staples, photos that are stuck together… archivists generally know where to get the answers they need to tackle these sorts of problems. I suspect that a hard drive advertised or discovered to be broken would be treated more like an empty box than a moldy box.

For now I would put this challenge near the bottom of the list, below archiving digital records that we can still reach easily but that depend on old hardware or software. Still, I can imagine a time when standard hard drive rescue techniques will need to be a tool for the average archivist.

Blog Action Day: A Look At Earth Day as Archived Online

In honor of this year’s Blog Action Day theme of discussing the environment, I decided to see what records the Internet had available about the history of Earth Day.

I started by simply Googling Earth Day. In a new browser window I opened the Internet Archive’s Wayback Machine. These were to be my two main avenues for unearthing the way that Earth Day was represented on the internet over the years.

Wikipedia’s first version of an Earth Day page was created on December 16th, 2002. This is the current Earth Day page as of the creation of this post – last updated about a week ago.

The current home page for the Earthday Network appears identical to the most recent version stored in the Wayback Machine, dated June 29, 2007 – until you notice that the featured headline on the link to http://www.earthdaynetwork.tv is different.

The site that claims to be ‘The Official Site of International Earth Day’ is EarthSite.org. The oldest version from the Wayback Machine is from December of 1996. This version shows a web visitor counter perpetually set to 1,671. Earth Day ten years ago was scheduled for March 20th, 1997. If you scroll down a bit on the What’s New page you can read the 1997 State of the World Message By John McConnell (attributed as the founder of Earth Day).

The U.S. Government portal for Earth Day was first archived in the Internet Archive on April 6, 2003. The site, EarthDay.gov, hasn’t changed much in the past 4 years. The EPA has an Earth Day page of its own that was first archived in early 1999. There is no clear way to know whether that actually means that the EPA’s Earth Day page is older or whether it was just found earlier by the Internet Archive’s ambitious web crawlers.

Envirolink.org, with the tagline “The Online Environmental Community”, was first archived back in 1996. As you can see on the Wayback Machine page for Envirolink.org, it has a fairly full ten years’ worth of web page archiving.
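For anyone who wants to repeat this kind of hunt without clicking through calendars, here is a minimal sketch in Python of how the lookups above could be expressed as URLs. It only builds the addresses and makes no network requests. It assumes the Wayback Machine's usual `web.archive.org/web/<timestamp>/<url>` address pattern and the Internet Archive's availability endpoint; the function names and the sample date (December 1, 1996, standing in for "December of 1996") are my own illustrative choices, and as far as I know the Wayback Machine redirects a requested date to the closest capture it actually holds.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed URL patterns; check them against the Internet Archive's own
# documentation before relying on them.
WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def wayback_timestamp(day: date) -> str:
    """Format a date as the YYYYMMDD timestamp used in Wayback URLs."""
    return day.strftime("%Y%m%d")


def snapshot_url(page_url: str, day: date) -> str:
    """Build the address of the capture of page_url nearest to the given day."""
    return f"{WAYBACK_PREFIX}/{wayback_timestamp(day)}/{page_url}"


def availability_query(page_url: str, day: date) -> str:
    """Build a query asking which capture is closest to the given day."""
    params = urlencode({"url": page_url, "timestamp": wayback_timestamp(day)})
    return f"{AVAILABILITY_ENDPOINT}?{params}"


if __name__ == "__main__":
    # Hypothetical example: look for EarthSite.org around December 1996.
    print(snapshot_url("earthsite.org", date(1996, 12, 1)))
    print(availability_query("earthsite.org", date(1996, 12, 1)))
```

Keeping the URL building separate from any fetching means the same functions work whether you paste the result into a browser or hand it to a script. It does not settle the question raised above, though: the earliest capture date says when the crawler first found a page, not when the page was created.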

Next I wanted to explore what the world of government records might produce on the subject. A quick stop over at Footnote.com to search for “Earth Day” didn’t yield a terribly promising list of results (no surprise there – most of their records date to before the time period we are looking for). Next I tried searching in the Archival Research Catalog (ARC) over on the U.S. National Archives website. I got 15 hits – all fairly interesting looking… but none of them linked to digitized content. A search in the Access to Archival Databases (AAD) system found 2 hits – one to some sort of contract between the EPA and a Fairfax, Virginia company named EARTH DAY XXV from 1995 and the other a State Department telegram including this passage:

THIS NATION IS COMMITTED TO STRIVING FOR AN ENVIRONMENT THAT NOT ONLY SUSTAINS LIFE, BUT ALSO ENRICHES THE LIVES OF PEOPLE EVERYWHERE – – HARMONIZING THE WORKS OF MAN AND NATURE. THIS COMMITMENT HAS RECENTLY BEEN REINFORCED BY MY PROCLAMATION, PURSUANT TO A JOINT RESOLUTION OF THE CONGRESS, DESIGNATING MARCH 21, 1975 AS EARTH DAY, AND ASKING THAT SPECIAL ATTENTION BE GIVEN TO EDUCATIONAL EFFORTS DIRECTED TOWARD PROTECTING AND ENHANCING OUR LIFE-GIVING ENVIRONMENT.

I also thought to check the Government Printing Office’s (GPO) website for the Public Papers of the Presidents of the United States. Currently it only permits searching back through 1991 online – but my search for “Earth Day” did bring back 50 speeches, proclamations and other writings by the various presidents.

Frustrated by the total scattering of documents without any big picture, I headed back to Google – this time to search the Google News Archive for articles including “Earth Day” published before 1990. The timeline display showed me articles mostly from TIME, the Washington Post and the New York Times – some of which I would apparently need to pay for in order to read.

Back again to do one more regular Google search – this time for earth day archive. This yielded an assortment of hits – and just above the fold I found my favorite snapshot of Earth Day history. The TIME Earth Day Archive Collection is a selection of the best covers, quotes and articles about Earth Day – from February 2, 1970 to the present. This is the gold mine for getting perspective on Earth Day as it has been perceived and celebrated in the United States. The covers are brilliant! If I had started this post early enough, I would have requested permission to include some here.

With the passionate title Fighting to Save the Earth from Man, the first article in the TIME Earth Day Collection begins by quoting then-President Nixon’s first State of the Union Address:

The great question of the seventies is, shall we surrender to our surroundings, or shall we make our peace with nature and begin to make reparations for the damage we have done to our air, to our land, and to our water?

Fast forward to the recent awarding of the Nobel Peace Prize for 2007 to the Intergovernmental Panel on Climate Change (IPCC) and Al Gore, and I have to imagine that the answer to that question, asked so long ago, of whether we were ready to make peace with nature was ‘Not Yet’.

Overall, this was an interesting experiment. The hunt for ‘old’ (such as it is in the fast moving world of the Internet) data about a topic online is a strange and frustrating experience. Even with the Wayback Machine, I often found myself with only part of the picture. Often the pages I tried to view were missing images or other key elements. Sometimes I found a link to something tantalizing, only to realize that the target page was not archived (or was so broken as to be of no use). The search through government records and old newspaper stories did produce some interesting results – but again seemed to fail to produce any sense of the big picture of Earth Day over the years.

The TIME Collection about Earth Day was assembled by humans and arranged nicely for examination by those interested in the subject. It is properly named a ‘collection’ (in the archival sense) because it is not the pure output of activities surrounding Earth Day, but rather a selected snapshot of related articles and images that share a common topic. That said, it is my fervent hope that websites such as this one appear more and more. I suspect that the lure of attracting more readers to their websites with existing content will only encourage more content creators with a long history to join in the fun. If others do it as well as TIME seems to have in this case, it will be a win/win situation for everyone.

Preserving Virtual Worlds – TinyMUD to SecondLife

A recent press release from the Library of Congress, Digital Preservation Program Makes Awards to Preserve American Creative Works, describes the newly funded project aimed at the preservation of ‘virtual worlds’:

The Preserving Virtual Worlds project will explore methods for preserving digital games and interactive fiction. Major activities will include developing basic standards for metadata and content representation and conducting a series of archiving case studies for early video games, electronic literature and Second Life, an interactive multiplayer game. Second Life content participants include Life to the Second Power, Democracy Island and the International Spaceflight Museum. Partners: University of Maryland, Stanford University, Rochester Institute of Technology and Linden Lab.

This has gotten a fair amount of coverage from the gaming and humanities sides of the world, but I learned about it via Professor Matthew Kirschenbaum‘s blog post Just Funded: Preserving Virtual Worlds.

The How They Got Game 2 post Library of Congress announces grants for preservation of digital games gives a more in-depth summary of the Preserving Virtual Worlds project goals:

The main goal of the project is to help develop generalizable mechanisms and methods for preserving digital games and interactive fiction, and to begin to test these mechanism through the archiving of selected test cases. Key deliverables include the development of metadata schema and wrapper recommendations, and the long-term curation of archived cases.

I take this all a bit more personally than most might. I was a frequent denizen of an online virtual world known as TinyMUD (now usually referred to as TinyMUD Classic). TinyMUD was a text-based, online, multi-player game that existed for seven months beginning in August of 1989. In practice it was sort of a cross between a chat room and a text-based adventure. The players could build new parts of the MUD as they went – in many ways it was an early example of crowdsourcing. There was a passionate core of players who were constantly building new areas for others to explore and experience – not unlike what is currently the case in SecondLife. These types of text-based games still exist – see MudMagic for listings.

Apparently August 20, 2007 will be TinyMUD’s 18th Annual Brigadoon Day. It will be celebrated by putting TinyMUD Classic online for access. The page includes careful notes about finding and using a MUD Client to access TinyMUD. The existence of an ongoing MUD community of users has kept software like this alive and available almost 20 years later.

With projects like Preserving Virtual Worlds getting grants and gaining momentum it seems more plausible with each passing day that 18 years from now, parts of 2007’s SecondLife will still be available for people to experience. I am thankful to know that a copy of the TinyMUD world I helped build is still out there. I am even more thankful to know that the technology still exists to permit users to access it even if it is only once a year.

Update: The 20th anniversary TinyMUD Brigadoon Day is set for Thursday, August 20, 2009.