
Category: original order

Concertina History Online Features Virtual Collaboration and Digitization

In the early 1960s, my father bought a Wheatstone concertina in London. He tells how he visited the factory where it was made to pick one out and recalls the ledger book in which details about the concertinas were recorded. After a recent retelling of this family classic, I was inspired to see what might be online related to concertinas. I was amazed!

First I found the Concertina Library, which presents itself as a ‘Digital Reference Collection for Concertinas’. With fourteen contributing authors, the site includes in-depth articles on concertina history, technology, music, research and a wide range of concertina systems.

I particularly appreciate the reasons that Robert Gaskins, site creator, lists for the creation of the site on the about page:

(1) Almost all of the historical material about concertinas has been held in research libraries where access is limited, or in private collections where access may be non-existent. The reason for this is not that the material is so valuable, but that in the past there was no way to make material of limited interest available to everyone, so it stayed safely in archives. The web has provided a way to make this material widely available—partly by the libraries themselves, and partly in collections such as this.

(2) There seems to be a growing number of people working again on the history of concertinas, perhaps in part because research materials are becoming available on the web. These people are widely scattered, so they don’t get to meet and discuss their work in person. But again the web has provided an answer, allowing people to work collaboratively and exchange information across miles and timezones, and for the resulting articles the web offers worldwide publication at almost no cost.

What an eloquent testimonial for the power of the internet to both provide access to once-inaccessible materials and support virtual collaboration within a geographically dispersed community.

Next, I found the Wheatstone Concertina Ledgers. This site features business records (in the form of ledgers) of C. Wheatstone & Co. stretching from 1830 through 1974 (with some gaps). The originals are held at the Library of the Horniman Museum in London. It is a great reference website with a nice interface for paging through the ledgers. Armed with the serial number from my father’s concertina (36461) I found my way to page 88 of a Wheatstone Production Journal from the Dickinson Archives. If I am reading that line properly, his concertina is a 3E model and was made (or maybe sold?) April 25, 1960. I wish that there were documentation online to explain how to read the ledgers. For example, I would love to know what ‘Bulletin 3052’ means.

I liked the way that they retained the sense of turning pages in a ledger. Every page of each ledger is included, including front and back end pages and blank pages. I have total confidence that I am seeing the pages in the same order as I would in person.

You can read the overview and introduction to the project, but what intrigued me more was the very detailed narrative of how this digitization effort was accomplished. In How The Wheatstone Concertina Ledgers Were Digitized, we find Robert Gaskins of the Concertina Library explaining how, with an older model IBM ThinkPad, a consumer-grade scanner, and his existing software (Microsoft Office and Macromedia Fireworks), he created a website with 4,500 images and clean, simple navigation. From where I sit, this is a great success story – a single person’s dedication can yield fantastic results. You don’t need the latest and greatest technology to run a successful digitization project. One individual can go a long way through sheer determination and the clever leveraging of what they have on hand.

Back on the Concertina Library’s about page we find “There is still a lot of material relevant to the study of concertinas and their history which should be digitized and placed on the web, but has not been so far. Ideas for additional contributors, items, and collections are very welcome.” If I am following the dates correctly, the Concertina Library has articles dating back to February of 2001, shortly before Mr. Gaskins started planning the ledger digitization project. At the same time as he was collaborating with other concertina enthusiasts to build the Concertina Library, he was scanning ledgers and creating the Wheatstone Concertina Ledgers website. Three cheers to Mr. Gaskins for his obvious personal enthusiasm and dedication to virtual collaboration, digitization and well-built websites! Another three cheers for all those who joined the cause and collaborated to create great online resources to support ongoing concertina research from anywhere in the world.

All this started because my father owns a beautiful old concertina. I love it when an innocent web search leads me to find a wealth of online archival materials. Do you have a favorite online archival resource that you stumbled across while doing similar research for family or friends? Please share them in the comments below!

Image Credit: http://www.flickr.com/photos/rocketlass/ / CC BY-NC-SA 2.0

SAA2008: Preservation and Experimentation with Analog/Digital Hybrid Literary Collections (Session 203)


The official title of Session 203 was Getting Our Hands Dirty (and Liking It): Case Studies in Archiving Digital Manuscripts. The session chair, Catherine Stollar Peters from the New York State Archives and Records Administration, opened the session with a high level discussion of the “Theoretical Foundations of Archiving Digital Manuscripts”. The focus of this panel was preserving hybrid collections of born digital and paper based literary records. The goal was to review new ways to apply archival techniques to digital records. The presenters were all archivists without IT backgrounds who are building on others’ work … and experimenting. She also mentioned that this work impacts researchers, historians, and journalists. For each of the presenters, I have listed below the top challenges and recommendations. If you attended the session, you can skip forward to my thoughts.

Norman Mailer’s Electronic Records

Challenges & Questions:

  • 3 laptops and nearly 400 disks of correspondence
  • While the letters might have been dictated or drafted by Mailer, all the typing, organization and revisions done on the computer were done by his assistant Judith McNally. This brings into question issues of who should be identified as the record creator. How do they represent the interaction between Mailer & McNally? Who is the creator? Co-Creators?
  • All the laptops and disks were held by Judith McNally. When she died all of her possessions were seized by county officials. All the disks from her apartment were eventually recovered over a year later – but it causes issues of provenance. There is no way to know who might have viewed/changed the records.

Revelations and Recommendations:

What is accessioning and processing when dealing with electronic records? What needs to be done?

  • gain custody
  • gather information about creator’s (or creators’) use of the electronic records. In March 2007 they interviewed Mailer to understand the process of how they worked together. They learned that the computers were entirely McNally’s domain.
  • number disks, computers (given letters), other digital media
  • create disk catalog – to reflect the physical information of each disk. Include color of ink, underlining, etc. At this point the disk has never been put into a computer. This captures visual & spatial information
  • gather this info from each disk: file types, directory structure & file names

The ideal for future collections of this type is archivist involvement earlier – the earlier the better.
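The last step in the list above (gathering file types, directory structure and file names from each disk) could be sketched roughly as follows. This is a minimal illustration, not the process the presenters described: the function names `inventory_disk` and `count_file_types` are hypothetical, the file extension is only a rough stand-in for file type, and the sketch assumes the disk (or better, a copy of it) is already mounted read-only at some path.

```python
import os
from collections import Counter
from pathlib import Path


def inventory_disk(root):
    """Walk a mounted disk (or a copy of it) and list every file.

    Returns a list of dicts holding the directory (relative to root),
    the file name, the lower-cased extension, and the size in bytes.
    """
    root = Path(root)
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = Path(dirpath) / name
            rows.append({
                "directory": Path(dirpath).relative_to(root).as_posix(),
                "name": name,
                "extension": path.suffix.lower(),
                "size_bytes": path.stat().st_size,
            })
    return rows


def count_file_types(rows):
    """Tally files by extension (an assumption: extension ~ file type)."""
    return Counter(row["extension"] for row in rows)
```

Calling `inventory_disk("/path/to/mounted/disk")` (a placeholder path) returns one row per file, which could then be written to a spreadsheet alongside the physical disk catalog.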

Papers of Peter Ganick

  • Speaker: Melissa Watterworth
  • Featured Collection: Papers of Writer and Small Press Publisher Peter Ganick, Thomas J Dodd Research Center, University of Connecticut

Challenges & Questions:

  • What are the primary sources of our modern world?
  • How do we acquire and preserve born digital records as trusted custodians?
  • How do we preserve participatory media – maybe we can learn from those who work on performance art?
  • How do we incrementally build our collections of electronic records? Should we be preserving the tools?
  • Timing of acquisition: How actively should we be pursuing personal archives? How can we build trust with creators and get them to understand the challenges?
  • Personal papers are very contextual – order matters. Does this hold true for born digital personal archives? What does the networking aspect of electronic records mean – how does it impact the idea of order?
  • On the first attempt to accession one of Peter Ganick’s laptops, the archivist found nothing she could identify as files. She found fragments of text, hypertext work and lots of files of questionable provenance (downloaded from a mailing list? his creations?). She had to sit down next to him and learn about how he worked.
  • He didn’t understand at first what her challenges were. He could get his head around the idea of metadata and issues of authenticity. He had trouble understanding what she was trying to collect.
  • How do we arrange and keep context in an online environment?
  • Biggest tech challenge: are we holding on for too long to ideas of original order and context?
  • Is there a greater challenge in collecting earlier in the cycle? What if the creator puts restrictions on groupings or chooses to withdraw them?
  • Do we want to create contracts with donors? Is that practical?

Revelations and Recommendations:

  • Collect materials that had high value as born digital works but were at a high risk of loss.
  • Build infrastructure to support preservation of born digital records.
  • Go back to the record creator to learn more about his creative process. They used to acquire records from Ganick every few years, but that wasn’t frequent enough. He was changing the tools he used and how he worked very quickly. She made sure to communicate that the past 30 years of policy wasn’t going to work anymore. It was going to have to evolve.
  • Created a ‘submission agreement’ about what kinds of records should be sent to the archive. He submitted them in groupings that made sense to him. She reviewed the records to make sure she understood what she was getting.
  • Considering using PDF/A to capture snapshots of virtual texts.
  • Looked to model of ‘self archiving’ – common in the world of professors to do ongoing accruals.
  • What about ’embedded archivists’? There is a history of this in the performing arts and NGOs and it might be happening more and more.

George Whitmore Papers

Challenges & Questions:

  • How do you establish identity in a way that is complete and uncorrupted? How do you know it is authentic? How do you make an authentic copy? Are these requirements unreasonable and unachievable?

Revelations and Recommendations:

  • Refresh and replicate files on a regular schedule.
  • They have had good success using Quick View Plus to enable access to many common file formats. On the downside, it doesn’t support everything and since it is proprietary software there are no long term guarantees.
  • In some cases they had to send CP/M files to a 3rd party to have them converted into WordStar and have the ASCII normalized.
  • The documentation varied: acquisition notes, accession records, and a loan form with the 3rd party who did the conversion that summarized the request. The 3rd party did NOT provide information about what software was used to convert from CP/M to DOS. This would be good information to capture in the future.
  • Proposed an expansion of the standards to include how electronic records were migrated in the <processinfo> processing notes.
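The recommendation to refresh and replicate files on a regular schedule is commonly paired with a checksum comparison, so that each new copy can be shown to match the one it came from. The sketch below is one way that might look, not the workflow used for the Whitmore papers: the function names are hypothetical and SHA-256 is simply an assumed choice of checksum.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_with_fixity(source, destination):
    """Copy source to destination and confirm the copy is bit-identical.

    Returns the checksum so it can be recorded for later checks.
    """
    source, destination = Path(source), Path(destination)
    before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"Checksum mismatch copying {source} to {destination}")
    return before


def verify_fixity(path, expected_checksum):
    """Check a stored file against the checksum recorded earlier."""
    return sha256_of(path) == expected_checksum
```

Recording the returned checksum at each refresh, and re-running `verify_fixity` on a schedule, gives a simple record of whether a file has changed since it was last copied.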

Questions & Answers

Question: As part of a writers’ community, what do we tell people who want to know what they can DO about their records? They want technical information. They want to know what to keep. Current writers are aware they are creating their legacy.

Answer: Michael: The single best resource is the InterPARES 2 Creator Guidelines. The Beinecke has adapted them to distribute to authors. Melissa: Go back to your collection development policies and make sure to include functions you are trying to document (like process and distribution networks). Also, communities of practice (acid free bits) are talking about formats and guidelines like that. Gabriela: People often want to address ‘value’. Right now we don’t know how to evaluate the value of electronic drafts – it is up to authors.

Question: Cal Lee: Not a question so much as an idea: the world of digital forensics and security and the ‘order of volatility’ dictate that everyone should always be making a full disk copy bit by bit before doing anything else.

Comment: Regarding digital forensic tools – there is lots of historical information and editing history of documents in the software… also, deleted files are still there.

Question: Have you seen examples of materials that are coming into the archive where the digital materials are working drafts for a final paper version? This is in contrast to others that are electronic experiments.

Answer: Yes, they do think about this. It can affect arrangement and how the records are described. The formats also impact how things are preserved.

Question: Access issues? Are you letting people link to them from the finding aids? How is the documents’ authenticity protected?

Answer: DSpace gives you a new version anytime you want it (the original bitstream) .. lots of cross linking supports people finding things from more than one path. In some cases documents (even electronic) can only be accessed from within the on site reading room.

Question: What is your relationship like with your IT folks?

Answer: Gabriela: Our staff has been very helpful. We use ‘legacy’ machines to access our content. They build us computers. They are also not archivists, so there is a little divide about priorities and the kind of information that I am interested in.. but it has been a very productive conversation.

Question: (For Melissa) Why didn’t you accept Peter’s email (Melissa had said they refused a submission of email from Peter because it didn’t have research value)?

Answer: The emails that included personal medical emails were rejected. The agreement with Peter didn’t include an option to selectively accept (or weed) what was given.

Question: In terms of gathering information from the creators.. do you recommend a formal/recorded interview? Or a more informal arrangement in which you can contact them anytime on an ongoing basis?

Answer: Melissa: We do have more formal methods – ‘documentation study’ style approaches. We might do literature reviews.. Ultimately the submission agreement is the most formal document we have. Gabriela: It depends on what the author is open to.. formal documentation is best.. but if they aren’t willing to be recorded, then you take what you can get!

My Thoughts

I am very curious to see how best practices evolve in this arena. I wonder how stories written using something like Google Documents, which auto-saves and preserves all versions for future examination, will impact how scholars choose to evaluate the evolution of documents. There have already been interesting examinations of the evolution of collaborative documents. Consider this visual overview of the updates to the Wikipedia entry for Sarah Palin created by Dan Cohen and discussed in his blog post Sarah Palin, Crowdsourced. Another great example of this type of visual experience of a document being modified was linked to in the comments of that post: Heavy Metal Umlaut: The Movie. If you haven’t seen this before – take a few minutes to click through and watch the screencast which actually lets you watch as a Wikipedia page is modified over time.

While I can imagine that there will be many things to sort out if we try to start keeping these incredibly frequent snapshot save logs (disk space? quantity of versions? authenticity? author preferences to protect the unpolished versions of their work?) – I still think that being able to watch the creative process this way will be valuable in some situations. I also believe that over time new tools will be created to automate the generation of document evolution visualizations and movies (like the two I link to above) that make it easy for researchers to harness this sort of information.

Perhaps there will be ways for archivists to keep only certain parts of the auto-save versioning. I can imagine an author who does not want anyone to see early drafts of their writing (as is apparently also the case with architects and early drafts of their designs) – but who might be willing for the frequency of updates to be stored. This would let researchers at least understand the rhythm of the writing – if not the low level details of what was being changed.

I love the photo I found for the top of this post. I admit to still having stacks of 3 1/2 inch floppy disks. I have email from the early days of BITNET. I have poems, unfinished stories, old resumes and SQL scripts. For the moment my disks live in a box on the shelf labeled ‘Old Media’. Lucky me – I at least still have a computer with a floppy drive that can read them!

Image Credit: oh messy disks by Blude via flickr.

As is the case with all my session summaries from SAA2008, please accept my apologies in advance for any cases in which I misquote, overly simplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via my contact form.

Of Pirates, Treasure Chests and Keys: Improving Access to Digitized Materials

Dan Cohen posted yesterday about what he calls The Pirate Problem. Basically the Pirate Problem can be summed up as “there are ways of acting and thinking that we can’t understand or anticipate.” Why is that a ‘Pirate Problem’? Because a pirate pub opened near his home and rather than folding shortly thereafter due to lack of interest from the ‘very serious professionals’ who populate DC suburbs – the pub was a rousing success due to the pirate aficionados who came out of the woodwork to sing sea shanties and drink grog. This surprising turn of events highlighted for him the fact that there are many ways of acting and thinking (some people even know all the words to sea shanties without needing sheet music).

Dan recently delivered the keynote speech at a workshop at the University of North Carolina at Chapel Hill. The workshop brought together dozens of historians to talk about how the 16 million archival documents of the Southern Historical Collection (SHC) should be put online. He devoted his keynote “to prodding the attendees into recognizing that the future of archives and research might not be like the past” and goes on in his post to explain:

The most memorable response from the audience was from an award-winning historian I know from my graduate school years, who said that during my talk she felt like “a crab being lowered into the warm water of the pot.” Behind the humor was the difficult fact that I was saying that her way of approaching an archive and understanding the past was about to be replaced by techniques that were new, unknown, and slightly scary.

This resistance to thinking in new ways about digital archives and research was reflected in the pre-workshop survey of historians. Extremely tellingly, the historians surveyed wanted the online version of the SHC to be simply a digital reproduction of the physical SHC.

Much of the stress of Dan’s article is on fear of new techniques of analysis. The choppy waters of text mining and pattern recognition threaten to wash away traditional methods of actually reading individual pages and “most historians just want to do their research they way they’ve always done it, by taking one letter out of the box at a time”.

I certainly like the idea of new technologically based ways of analyzing large sets of cultural heritage materials, but I also believe that reading individual letters will always be important. The trick is finding the right letter!

And of course – we still need the context. It isn’t as if when we digitize major collections like the SHC that we are going to scan and OCR each page without regard to which box it came out of. We can’t slice and dice archival records and manuscripts into their component parts to feed into text analysis with no way back to the originals.

I like to imagine the combination of all the new technology (be it digitization, cross collection searching, text mining or pattern recognition) as creating keys to different treasure chests. Humanities scholars are treasure hunters. Some will find their gems through careful reading of individual passages. Others will discover patterns spread across materials now co-existing virtually that before digitization would have been widely separated by space and time. Both methods will benefit from the digitization of materials and the creation of innovative search and text analysis tools. Both still require an understanding of a material’s origin. The importance of context isn’t going anywhere – we still need to know which box the letter came from (and in a perfect world, which page came before and which came after). I want scholars to still be able to read one page from the box – I just want them to be able to do it from home in the middle of the night if they are so inclined with their travel budget no worse for wear.

Dan ties his post together by pointing out that:

… in Chapel Hill I was the pirate with the strange garb and ways of behaving, and this is a good lesson for all boosters of digital methods within the humanities. We need to recognize that the digital humanities represent a scary, rule-breaking, swashbuckling movement for many historians and other scholars.

In my opinion, the core message should be that we just found more locked treasure chests – and for those who are interested, we have some new keys that just might open those locks. I enjoyed the Pirate metaphor (obviously) and I appreciate that there are real issues here relating to strong discomfort with the fast changing landscape of technology, but I have to believe that if we do something that prevents historians from being able to read one letter at a time we are abandoning the treasure chests that are already open for the new ones for which we haven’t yet found the right keys. I am greedy. I want all the treasure!

Image credit: key to anything by Stoker Studios via flickr

ISSUU: Interesting Platform for Online Publishing

Issuu, with the tag line ‘Read the world. Publish the world.’ and pronounced ‘issue’, gives anyone the ability to upload a PDF document and publish it as an online magazine. I am intrigued by the possibilities of using this service to publish digitized archival records – especially those that would lend themselves to a ‘book’ style presentation (thinking here of a ledger or equivalent).

I am not sure I totally understand the implications of the Issuu Terms of service… especially this part:

By distributing or disseminating Uploader Submissions through the Issuu Service, you hereby grant to Issuu a worldwide, non-exclusive, transferable, assignable, fully paid-up, royalty-free, license to host, transfer, display, perform, reproduce, distribute, and otherwise exploit your Uploader Submissions, in any media forms or formats, and through any media channels, now known or hereafter devised, including without limitation, RSS feeds, embeddable functionality, and syndication arrangements in order to distribute, promote or advertise your Uploader Submissions through the Issuu Service.

If I am following that properly, all the rights you are granting to the Issuu Service are only for the purposes of their distribution of your uploaded PDF.

Issuu has a special Copyright FAQ, which in combination with Peter Hirtle’s page on Copyright Term and the Public Domain in the United States, should support those trying to figure out if they can upload what they want to upload without getting into copyright related hot water.

So how is it different from a plain old PDF? Take a look at the embedded Issuu viewer below showing a 1908 copy of The Colonial Book of The Towle Manufacturing Company Silversmiths.

I don’t think this would ever be the way you would want to give online access to digitized records in general – but I do think that this could be a great way to highlight a particularly impressive set or volume of documents. If an archives featured one of these a month on their homepage – would people subscribe to their RSS feed just to see the new one? On the actual page on which I found the above document, Issuu makes it easy to subscribe to the RSS feed for the Issuu author ‘silverlibrary’.

I don’t know why Issuu has decided that I must create an account before I may view document author silverlibrary’s user profile. I would hope that there was an elegant way for visitors to see a group of Issuu documents created by the same author without having to create an account first (or ever).

Want to know what others think? Take a look at Finally, a Web-based PDF Viewer That Does Not Suck (Issuu) over on TechCrunch. One interesting tidbit I picked up from that review is that Issuu is based in Denmark. I wonder what impact that has on which copyright rules apply to the documents uploaded into Issuu.

Want to read more about their vision? Of course they have a press release in the form of an Issuu publication and I have embedded it below. I think my favorite line is that Issuu is intended to be ‘YouTube for Publications’.

I would love to see a highlighted section created for ‘cultural heritage materials’ (or something like that anyway). Take a look around Issuu and let me know what you think. Is this a viable tool for an archives or manuscript collection to use to highlight parts of their collection?

Thoughts on Archiving Web Sites

Shortly after my last post, a thread surfaced on the Archives Listserv asking the best way to crawl and record the top few layers of a website. This led to many posts suggesting all sorts of software geared toward this purpose. This post shares some of my thinking on the subject.

Adobe Acrobat can capture a website and convert it into a PDF. As pointed out in the thread above, that would lose the original source HTML – yet there are more issues than that alone. It would also lose any interaction other than links to other pages. It is not clear to me what would happen to a video or Flash interface on a site being ‘captured’ by Acrobat. Quoting a lesson for Acrobat 7 titled Working with the Web: “Acrobat can download HTML pages, JPEG, PNG, SWF, and GIF graphics (including the last frame of animated GIFs), text files, image maps and form fields. HTML pages can include tables, links, frames, background colors, text colors, and forms. Cascading Stylesheets are supported. HTML links are turned into Web links, and HTML forms are turned into PDF forms.”

I looked at a few website HTML capture programs such as Heritrix, Teleport Pro, HTTrack Web and the related ProxyTrack. I hope to take the time to compare each of these options and discover what each does when confronted with something more complicated than HTML, images or cascading style sheets. It also got me thinking about HTML and versions of browsers. I think it is safe to say that most people who browse the internet with any regularity have had the experience of viewing a page that just didn’t look right. Not looking right might be anything from strange alignment or odd fonts all the way to a page that is completely illegible. If you are a bit of a geek (like me) you might have gotten clever and tried another browser to see if it looked any better. Sometimes it does – sometimes it doesn’t. Some sites make you install something special (Flash or some other type of plugin or local program).
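To make the listserv question concrete, crawling and recording the ‘top few layers’ of a site amounts to a depth-limited, breadth-first walk of its links. The sketch below is only an illustration using the Python standard library, not how Heritrix, HTTrack or the other programs named above work internally. The names `LinkCollector`, `fetch_html` and `crawl` are hypothetical, and the sketch assumes pages are UTF-8 HTML reachable over plain HTTP(S). It records only the HTML source: no images, stylesheets, scripts or plugin content, and it does not consult robots.txt.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download one page as text (assumes UTF-8; bad bytes are replaced)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_depth=2, fetch=fetch_html):
    """Breadth-first crawl of one site, at most max_depth links from the start.

    Returns a dict mapping each visited URL to its raw HTML source.
    """
    site = urlparse(start_url).netloc
    pages = {}
    queue = deque([(start_url, 0)])
    seen = {start_url}
    while queue:
        url, depth = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that cannot be retrieved
        pages[url] = html
        if depth == max_depth:
            continue  # deepest layer reached: record it, follow nothing
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            target, _fragment = urldefrag(urljoin(url, href))
            # Stay on the starting site and never queue a page twice.
            if urlparse(target).netloc == site and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return pages
```

Everything this sketch leaves behind (images, video, Flash, scripts) is exactly the material whose capture I want to compare across the real tools.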

Where does this leave us when archiving websites? A website is much more than just its text. If the text were all we worried about I am sure you could crawl and record (or screen scrape) just the text and links and call it a day being fairly confident that text stored as a plain ASCII file (with some special notation for links) would continue to be readable even if browsers disappeared from the world. While keeping the words is useful, it also loses a lot of the intended meaning. Have you read full text journal articles online that don’t have the images? I have – and I hate it. I am a very visually oriented person. It doesn’t help me to know there WAS a diagram after the 3rd paragraph if I can’t actually see it. Keeping all the information on a webpage is clearly important. The full range of content (all the audio, video, images and text on a page) is important to viewing the information in its original context.
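For illustration, the text-and-links-only capture described above might look something like the sketch below. The `[link: …]` and `[image omitted: …]` markers are a notation invented for this example, and the class and function names are hypothetical. The output is plain text rather than strictly ASCII, since any non-ASCII characters on the page pass through unchanged.

```python
from html.parser import HTMLParser


class TextAndLinks(HTMLParser):
    """Reduce a page to plain text, marking links as 'text [link: url]'."""

    SKIPPED = {"script", "style"}  # content that is not readable text

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0
        self.open_links = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag == "a":
            self.open_links.append(dict(attrs).get("href"))
        elif tag == "img":
            # The picture itself is lost; only a note that it existed remains.
            alt = dict(attrs).get("alt") or "no description"
            self.parts.append(f"[image omitted: {alt}]")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "a" and self.open_links:
            href = self.open_links.pop()
            if href:
                self.parts.append(f"[link: {href}]")

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.parts.append(" ".join(data.split()))


def page_to_text(html):
    """Return the readable text of an HTML page with link markers."""
    parser = TextAndLinks()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

The image marker makes the loss visible: the output can say there WAS a diagram, but it cannot show it.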

Archivists who work with non-print media records that require equipment for access are already in the practice of saving old machines hoping to ensure access to their film, video and audio records. I know there are recommendations for retaining older computers and software to ensure access to data ‘trapped’ in ‘dead’ programs (I will define a dead program here as one which is no longer sold, supported or upgraded – often one that is only guaranteed to run on a dead operating system). My fear is for the websites that ran beautifully on specific old browsers. Are we keeping copies of old browsers? Will the old browsers even run on newer operating systems? The internet and its content are constantly changing – even just keeping the HTML may not be enough. What about those plugins? What about the streaming video or audio? Do the crawlers pull and store that data as well?

One of the most interesting things about reading old newspapers can be the ads. What was being advertised at the time? How much was the sale price for laundry detergent in 1948? With the internet customizing itself to individuals or simply generating random ads, how would that sort of snapshot of products and prices be captured? I wonder if there is a place for advertising statistics as archival records. What Google ads were most popular on a specific day? Google already has interesting graphs to show the correspondence between specific keyword searches and news stories that Google perceives as related to the event. The Internet Archive (IA) could be another interesting source for statistical analysis of advertising for those sites that permit crawling.

What about customization? Only I (or someone looking over my shoulder) can see my MyYahoo page. And it changes each time I view it. It is a conglomeration of the latest travel discounts, my favorite comics, what is on my favorite TV and cable channels tonight, the headlines of the newspapers/blogs I follow and a snapshot of my stock portfolio. Take even a corporate portal inside an intranet. Often a slightly less moving target – but still customizable to the individual. Is there a practical way to archive these customized pages – even if only for a specific user of interest? Would it be worthwhile to be archiving the personalized portal pages of an ‘important’ or ‘interesting’ person on a daily basis – such that their ‘view’ of the world via a customized portal could be examined by researchers later?

A wealth of information can be found on the website for the Joint Workshop on Future-proofing Institutional Websites from January 2006. The one thing most of these presentations agree upon is that ‘future-proofing’ is something that institutions should think about at the time of website design and creation. Standards for creating future-proof websites directs website creators to use and validate against open standards. Preservation Strategies for institutional website content shows insight into NARA‘s approach for archiving US government sites, the results of which can be viewed at http://www.webharvest.gov/. A summary of the issues they found can be read in the tidy 11 page web harvesting survey.

I definitely have more work ahead of me to read through all the information available from the International Internet Preservation Consortium and the National Library of Australia’s Preserving Access to Digital Information (PADI). More posts on this topic as I have time to read through their rich resources.

All around, a lot to think about. Interesting challenges for researchers in the future. The choices archivists face today will often depend on the type of site they are archiving. Best practices are evolving both for ‘future-proofing’ sites and for harvesting sites for archiving. Unfortunately, not everyone building a website that may be worth archiving is particularly concerned with validating their site against open standards. Institutions that KNOW that they want to archive their sites are definitely a step ahead. They can make choices in their design and development to ensure success in archiving at a later date. It is the wild west fringe of the internet that is likely to present the greatest challenge for archivists and researchers.

Paper Calendars, Palm Pilots and Google Calendar

In my intro archives class (LBSC 605 Archival Principles, Practices, and Programs), one of the first ideas that made a light bulb go on over my head related to the theory that archivists want to retain the original order of records. For example, if someone chooses to put a series of 10 letters together in a file – then they should be kept that way. A researcher may be able to glean more information from these letters when he/she sees them grouped that way – organized as the person who originally used them organized them.

Our professor went on to explain that seeing what the person who used the records saw was crucial to understanding the original purpose and usage of those records. That took my mind quickly to the world of calendars. Years ago, a CEO of some important organization would have a calendar or datebook of some sort – likely managed by an assistant. Ink or pencil was used to write on paper. Perhaps fresh daily schedules would be typed.

Fast forward to now and the universe of the Palm Pilot and other such handy-dandy handheld and totally customizable devices. If you have one (or have seen those of a friend) you know that how I choose to look at my schedule may be radically different from the way you choose to see your schedule. Mine might have my to-do list shown on the bottom half of the screen. Yours might have little colored icons to show you when you have a conference call. The archivist asked to preserve a born digital calendar will have a lot of hard choices to make.

These days I actually use Google Calendar more often than my Palm. While it has more of a fixed layout (for the moment) – I have the option of including many external calendars (see examples at iCalShare). Right now I have listings of when new movies come out as well as the concert schedule for summer 2006 for the Wolf Trap National Park for the Performing Arts. In the old style paper calendar, a researcher would be able to see related events that the user of the calendar cared about because they would be written down right there. If someone wanted to include my Google Calendar in an archive someday (or that of someone much more important!), I suspect they would be left with JUST the records I had added myself into my calendar. When I choose to display the Wolf Trap summer schedule, Google Calendar asks me to wait while it loads – presumably from an externally published iCalendar or other public Google Calendar source.

This has many implications for the archivist tasked with preserving the records in that Palm Pilot or Google calendar (or any of a laundry list of scheduling applications). This post can do nothing other than list interesting questions at this stage (both ‘this stage’ of my archival education as well as ‘this stage’ of consideration of born digital records in the archival field).

  • How important is it to preserve the appearance of the interface used by the digital calendar user?
  • Might printing or screen capturing a statistical sample (an entire month? an entire year?) help researchers in the future understand HOW the record creator in question interacted with their calendar – what sorts of information they were likely to use in making choices in their scheduling?
  • Could there be a place for preserving publicly shared calendars (like the ones you can choose to access on Google Calendar or Apple’s iCal) such that they would be available to researchers later? What organization would most likely be capable of taking this sort of task on?
  • Could emulators be used to permit easy access to centrally stored born digital calendars? At least one PalmOS Emulator already exists, created mainly for use by those developing software for hardware that runs the Palm operating system. It mimics how the tested software would run in the real world. Should archivists be keeping copies of this sort of software as they look to the future of retaining the best access possible to these sorts of records?
  • How can the standard iCalendar format be leveraged by archivists working to preserve born digital calendars?
  • To what degree are the schedules of people whose records will be of interest to archivists someday moving out of private offices (and even out of personally owned computers and handheld devices) and into the centralized storage of web applications such as Google Calendar?
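
As one small illustration of the iCalendar question above, here is a minimal sketch in Python of how the events in an exported iCalendar file might be listed for an inventory. It assumes only the basic text layout of the format (BEGIN:VEVENT / END:VEVENT blocks, with long lines folded onto continuation lines that start with a space or tab). The function names and the sample data are hypothetical, made up for this example, and the parsing is deliberately simplified: it ignores property parameters, skips nested components such as alarms, and does not interpret dates or time zones.

```python
def unfold(text):
    """Join folded lines: a line starting with a space or tab
    continues the previous line (the first character is dropped)."""
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def list_events(text):
    """Return one dict per VEVENT, mapping property name to raw value.

    Simplifications (assumptions of this sketch): property parameters
    are discarded, repeated properties overwrite each other, and
    components nested inside an event (e.g. VALARM) are skipped.
    """
    events = []
    current = None   # the event being collected, if any
    nested = 0       # depth of components nested inside the event
    for line in unfold(text):
        upper = line.upper()
        if current is None:
            if upper == "BEGIN:VEVENT":
                current = {}
        elif upper.startswith("BEGIN:"):
            nested += 1
        elif upper.startswith("END:"):
            if nested > 0:
                nested -= 1
            else:
                events.append(current)
                current = None
        elif nested == 0 and ":" in line:
            name_and_params, value = line.split(":", 1)
            name = name_and_params.split(";", 1)[0].upper()
            current[name] = value
    return events


# Hypothetical sample data, invented for this example.
SAMPLE = "\r\n".join([
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "PRODID:-//Example//Hypothetical Export//EN",
    "BEGIN:VEVENT",
    "UID:event-1@example.org",
    "DTSTART:20060714T200000Z",
    "SUMMARY:Summer concert at an outdoor",
    "  venue",
    "END:VEVENT",
    "END:VCALENDAR",
])

if __name__ == "__main__":
    for event in list_events(SAMPLE):
        print(event.get("DTSTART"), event.get("SUMMARY"))
```

A listing like this captures only the content of the calendar entries, not how the person who used the calendar actually saw them, so it speaks to just one of the questions above.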

I know that this is just a tiny bite of the kinds of issues being grappled with by archivists around the world as they begin to accept born digital records into archives. Each type of application (scheduling vs accounting vs business systems) will pose similar issues to those described above – along with special challenges unique to each type. Perhaps if each of the most common classes of applications (such as scheduling) is tackled one by one by a designated team we can save individual archivists the pain of reinventing the wheel. Is this already happening?