Paper Calendars, Palm Pilots and Google Calendar

In my intro archives class (LBSC 605 Archival Principles, Practices, and Programs [1]), one of the first ideas that made a light bulb go on over my head was the principle that archivists want to retain the original order of records. For example, if someone chose to put a series of 10 letters together in a file – then they should be kept that way. A researcher may be able to glean more information from these letters when he/she sees them grouped that way – organized as the person who originally used them organized them.

Our professor went on to explain that seeing what the person who used the records saw was crucial to understanding the original purpose and usage of those records. That took my mind quickly to the world of calendars. Years ago, a CEO of some important organization would have a calendar or datebook of some sort – likely managed by an assistant. Ink or pencil was used to write on paper. Perhaps fresh daily schedules would be typed.

Fast forward to now and the universe of the Palm Pilot [2] and other such handy-dandy handheld and totally customizable devices. If you have one (or have seen a friend's) you know that how I choose to look at my schedule may be radically different from the way you choose to see yours. Mine might have my to-do list shown on the bottom half of the screen. Yours might have little colored icons to show you when you have a conference call. The archivist asked to preserve a born digital calendar will have a lot of hard choices to make.

These days I actually use Google Calendar [3] more often than my Palm. While it has more of a fixed layout (for the moment) – I have the option of including many external calendars (see examples at iCalShare [4]). Right now I have listings of when new movies come out as well as the concert schedule for summer 2006 [5] for the Wolf Trap National Park for the Performing Arts [6]. In the old style paper calendar, a researcher would be able to see related events that the user of the calendar cared about because they would be written down right there. If someone wanted to include my Google Calendar in an archive someday (or that of someone much more important!), I suspect they would be left with JUST the records I had added to my calendar myself. When I choose to display the Wolf Trap summer schedule, Google Calendar asks me to wait while it loads – presumably from an externally published iCalendar feed or another public Google Calendar source.
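To make that gap concrete, here is a minimal sketch of what capturing a calendar together with its subscribed feeds might look like. It assumes the archivist already has the text of each iCalendar feed in hand at capture time; the function names (`unfold`, `parse_events`, `snapshot`) and the record layout are hypothetical illustrations, not part of Google Calendar or of any archival tool, and the parser handles only the simplest event properties.

```python
from datetime import datetime, timezone


def unfold(text):
    """Join folded iCalendar lines (a continuation starts with a space or tab)."""
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]  # drop the single folding character
        else:
            lines.append(raw)
    return lines


def parse_events(ics_text):
    """Return one dict of property name -> value per VEVENT block."""
    events, current = [], None
    for line in unfold(ics_text):
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT" and current is not None:
            events.append(current)
            current = None
        elif current is not None and ":" in line:
            name, value = line.split(":", 1)
            # Keep only the property name, dropping parameters such as ;TZID=...
            current[name.split(";", 1)[0]] = value
    return events


def snapshot(own_ics, subscribed, captured_at):
    """Bundle the owner's events with a copy of every subscribed feed.

    `subscribed` maps a feed's source URL to the feed text as it was
    at `captured_at` (hypothetical record layout).
    """
    return {
        "captured_at": captured_at.isoformat(),
        "own": parse_events(own_ics),
        "subscribed": {url: parse_events(text) for url, text in subscribed.items()},
    }
```

Calling `snapshot(own_text, {"http://example.org/concerts.ics": feed_text}, datetime.now(timezone.utc))` (the URL is a placeholder) would keep the external events beside the owner's entries, along with the moment they were captured, which is roughly what a paper calendar preserved for free.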

This has many implications for the archivist tasked with preserving the records in that Palm Pilot or Google calendar (or any of a laundry list of scheduling applications). This post can do nothing other than list interesting questions at this stage (both ‘this stage’ of my archival education as well as ‘this stage’ of consideration of born digital records in the archival field).

I know that this is just a tiny bite of the kinds of issues being grappled with by archivists around the world as they begin to accept born digital records into archives. Each type of application (scheduling vs accounting vs business systems) will pose similar issues to those described above – along with special challenges unique to each type. Perhaps if each of the most common classes of applications (such as scheduling) is tackled one by one by a designated team we can save individual archivists the pain of reinventing the wheel. Is this already happening?

#1 Comment By Benjamin Rosenbaum On July 20, 2006 @ 11:13 pm

This isn’t just calendars — it’s a general problem, e.g. for website links, too.

The solution that I hope will eventually be worked out is that everyone will put important content in static, REST-friendly, spiderable — and thus archivable — formats. Things are tending to head that way anyway, I think. And then you’ll be able to find the external content on [16]. And I think browsers need to start incorporating support for archive.org — if a link or XML query (like the ones google calendar is presumably using) comes up 404, the browser should try the archive.org page. In fact there should really be two anchor elements… a href=”http://bla” archiving=”current” and a href=”http://bla” archiving=”snapshot”. The first would try to get the last available page [17], the second would try to bring up the [17] page contemporaneous with the last update of the referring page.
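The fallback described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `archiving` values "current" and "snapshot" come from the comment itself, `archive_url` and `resolve` are hypothetical helper names, and the URL patterns assume the Wayback Machine's `web.archive.org/web/<timestamp>/<url>` addressing, where a 14-digit timestamp asks for the capture nearest that moment and omitting it asks for the most recent capture.

```python
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web"


def archive_url(url, archiving="current", referrer_updated=None):
    """Build an archive address for `url` (hypothetical helper).

    "current"  -> most recent archived copy of the target page
    "snapshot" -> copy nearest to the referring page's last update
    """
    if archiving == "current":
        return f"{WAYBACK_PREFIX}/{url}"
    if archiving == "snapshot":
        if referrer_updated is None:
            raise ValueError("snapshot mode needs the referring page's update time")
        stamp = referrer_updated.strftime("%Y%m%d%H%M%S")
        return f"{WAYBACK_PREFIX}/{stamp}/{url}"
    raise ValueError(f"unknown archiving mode: {archiving!r}")


def resolve(url, status_code, archiving="current", referrer_updated=None):
    """Keep the live link unless it came back 404, then fall back to the archive."""
    if status_code != 404:
        return url
    return archive_url(url, archiving, referrer_updated)
```

For example, `resolve("http://bla/", 404, "snapshot", datetime(2006, 7, 20, 23, 13))` yields an address pointing at the capture nearest to that date, while a 200 response leaves the original link untouched.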

#2 Comment By Walter On July 21, 2006 @ 8:29 am

Nice work. I’ve been using WordPress a lot at work lately; we have branched out to using it for things other than just blogs. It’s easy enough to use that the untrained monkeys (editors) can create content and posts for sites. One of my sites is using it to post podcasts (well, only one at this point) [18]. I could go on for ages about the archiving issue. I have to archive a lot of material, and most of the time there is only one correct way to do it; if you do it another way (with various changes) you lose something from the original content. Can’t get many of my co-workers to understand that.

#3 Comment By Linda On July 24, 2006 @ 9:48 am

Great start for your blog. You’re going for the toughest questions first. We have a journal of one of your college presidents from the 19th Century. It is fascinating and reveals a lot about the students and the college at that time. These days journals, including my own, are blogs or other born digital objects. How will researchers in the 22nd Century know what college life was like?

#4 Comment By Jeanne On July 24, 2006 @ 11:18 am

Ben: I love your idea about some default rollover to the pages on [19]. A new development in that world is the [20]. Probably worth its own post – archive-it.org is aiming to provide an archiving solution to ‘subscribers’. For a cost of $10,000 a year archive-it.org will provide on-demand archiving of up to 10 million web documents. By default these ‘curated web collections’ are public – but can be made private. Definitely more on this in another post.

#5 Comment By Jeanne On July 24, 2006 @ 11:25 am

Linda: Yes, I suppose I did start with a hard question first... but then again I think those are the sorts of questions that leave lots of room for brainstorming and pondering. That’s what makes them interesting to me!

I don’t know how folks will know about life in our times – I would love to think that blogs will be easier to archive in the long term than paper diaries were (easier to copy exact replicas of digital records if you bother to do so). I will think on this more (more fuel for a full post).

#6 Comment By Rob Jenson On July 25, 2006 @ 8:14 am

Yup … big questions to wrap one’s brain around. Things are getting complicated by the fact that ordinary people are starting to get used to the Internet being there, but don’t comprehend that the WWW facilitates access without guaranteeing permanence.

I did not know what [21] was charging. They are a service of the Internet Archive (IA) … and that should be a big caveat emptor. While IA is doing great things for preserving some parts of the Internet, their policies are very cautious, and they are, IMHO, more of a library than an archives. One big issue I have is with the fact that their WayBack Machine (and probably the underlying data set) is keyed to Domain Names in URLs instead of something more accurate over time (Domain Name + Registrar Entity … for example). Here is the problem … I own [22] … today. In whois, you find the domain registered to my corporate entity. I like IA, so I don’t block their spidering my site with robots.txt, and my web site is archived. Tomorrow, I die – the corporation is dissolved … my non-geek executors let the domain registration lapse, and the domain is bought up by a smart porno promoter who wants to take advantage of the millions of people who hit my site daily to come to his site instead. By default, he sets an IA block in robots.txt, as it only creates overhead on his site – no commercial benefit. The next time that IA spiders that domain, it is removed from their spidering list (good) and, I believe, the past archived pages become inaccessible or deleted from the IA (not-so-good). I suppose I should raise this with the IA folks … I realize that there are practicalities involved and they are trying to minimize their LCQPH (legal counsel queries per hour), but it seems to me that an unrelated entity buying a domain name should not cause archived records to be deleted.

More later (much more) … I’ve gotta go to work.

#7 Comment By Aes On July 26, 2006 @ 7:21 am

Lovely blog, Jeanne, there’s something very calming about it. I’ve never heard the term “born” digital record. Is it one that was created digitally and exists in no other form?

#8 Comment By Jeanne On July 26, 2006 @ 2:54 pm

Aes: Exactly! A born digital record is a record which was created in the computer. It was ‘born’ as 1s and 0s – there is no analog original (while in some cases an analog version is created after the fact – as is currently done for films created digitally but distributed on analog film).