Category: software

SEO Evaluation of an Archival Website: Looking at UMBC’s Digital Collections

Flickr Commons: Do-it-yourself-woman

Each week brings announcements of archives launching new websites. Today both my email and Twitter told me about the University of Maryland, Baltimore County’s new Digital Collections site. Who can resist peeking at new materials available online?

I have spent much of the past year learning the details of Search Engine Optimization. Usually shortened to SEO, this simply refers to the use of techniques which improve the traffic sent to a website via organic search. Want your webpage to show up at the top of the list for a specific search in Google? You want to work on your SEO.

So when I look at a new archives website, I can’t help but keep an eye open for how well the site is optimized for search engines.

I hope that UMBC will forgive me for nitpicking their new site. A lot of their choices are great for SEO, but they also have room for improvement.

Things Done Well for SEO

  • Home Page Title & Description: The site’s home page has a good meta description. This is the text displayed below the link on a search results page – as shown below: UMBC Digital Collection Google Result
  • Unique Page Titles At Collection Level: Each photography collection homepage has a unique page title and a nice block of explanatory text. Google can only read words – so the more unique text on a page, the better the job Google can do in figuring out what your page is about. Example: Ardsley Park Album
  • Good anchor text: (also known as link text) The words used in anchor text tell search engines about the destination page. For example, the blue text below is anchor text. UMBC Anchor Text Example
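To make the page title and meta description a little more concrete, here is a minimal Python sketch that pulls both out of a page's HTML using only the standard library. The sample markup and the names `HeadSummary` and `summarize_head` are my own inventions for illustration; none of it is copied from UMBC's site.

```python
from html.parser import HTMLParser


class HeadSummary(HTMLParser):
    """Collects the <title> text and the description meta tag of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that sits inside <title>...</title> is kept.
        if self._in_title:
            self.title += data


def summarize_head(html):
    """Return (page title, meta description) for one HTML document."""
    parser = HeadSummary()
    parser.feed(html)
    return parser.title.strip(), parser.description.strip()


# Hypothetical page markup, used only to exercise the parser.
SAMPLE = """<html><head>
<title>Ardsley Park Album | Example Digital Collections</title>
<meta name="description" content="Photographs of a planned neighborhood.">
</head><body></body></html>"""

title, description = summarize_head(SAMPLE)
print(title)
print(description)
```

Running something like this across a handful of pages is a quick way to spot titles or descriptions that repeat from page to page.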

Areas for SEO Improvement

  • Unique Page Titles At Item Level: Individual images and documents all use a generic page title such as ‘UMBC | Digital Archive | Document Viewer’. Document Example: Accidental Death of an Anarchist Image Example: 10 year old Bootblack
  • H1 Tags: In the HTML of each page, the dominant heading of the page should use the <h1> tag. This helps Google know the phrase you are targeting with this page. It is your 2nd best place to emphasize your content after the page title. In the case of the item pages, there often seems to be a headline-type title at the top of the page – but it currently is not demarcated with an <h1> tag.
  • Think About Search Results and Indexing: Pages displaying results of internal searches on your site are not likely to be useful as indexed pages in Google. The thinking here is that they can dilute the focus on the item and collection level pages on your site if Google also has many search results pages in the index. If UMBC wanted their search pages to be indexed, then those pages’ URLs should be simplified and the search results pages need a page title that somehow includes the search criteria. There are two ways that I know of to disable this indexing – blocking via the site’s robots.txt file or via a robots meta tag in the <head> of the search results page. The robots.txt file asks obliging search engines not to crawl certain parts of your site, while the robots meta tag asks them not to index a page they have fetched.
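As a rough illustration of the robots.txt approach, the Python sketch below feeds a small set of rules to the standard library's `urllib.robotparser` and asks whether a well-behaved crawler would fetch two URLs. The rules, the `/search` path and the example.org addresses are assumptions of mine, not UMBC's actual configuration. The meta tag alternative would be a line such as `<meta name="robots" content="noindex">` in the head of each search results page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep crawlers out of internal search results only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# An internal search results page versus a collection page.
print(is_crawlable(ROBOTS_TXT, "https://example.org/search?q=anarchist"))
print(is_crawlable(ROBOTS_TXT, "https://example.org/collections/ardsley-park"))
```

A check like this is handy before publishing a robots.txt change, since one overly broad Disallow line can hide the very item pages you want found.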

Final Thoughts

There are plenty of other things that UMBC could do to support this new website. They could create an XML sitemap of all their pages and submit it to Google (maybe they already have). They might re-title some of their pages after using a tool like Google Insights for Search to see which variations of a phrase are searched for most frequently. My goal here was to give you a taste of the sorts of things that catch my eye. Also, SEO is still more of an art than a science – so you will sometimes notice that what one SEO expert recommends is the opposite of what the next expert would tell you.
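For anyone curious what an XML sitemap involves, here is a minimal Python sketch that builds one from a list of page addresses. The function name `build_sitemap` and the example.org URLs are placeholders of mine; the `urlset`, `url` and `loc` elements and the namespace come from the public sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML as a string, with one <url><loc> entry per address."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for address in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & when it serializes the text.
        ET.SubElement(entry, "loc").text = address
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder addresses standing in for collection and item pages.
print(build_sitemap([
    "https://example.org/collections/ardsley-park",
    "https://example.org/items/view?id=1&page=2",
]))
```

The resulting file would normally be saved as sitemap.xml at the root of the site and then submitted through the search engine's webmaster tools.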

In many cases, changes such as the Unique Page Title at the Item Level mentioned above may not even be possible due to software or programmer resource limitations. The trick is to take advantage of every option that is available. There are also trade-offs to be made. UMBC’s site provides some very slick interfaces for viewing the details of a group of documents, such as theater programs and other materials related to a theatrical production. The implementation elegantly handles the situation of multiple scanned images which relate to a coherent set of documents. Sometimes you can’t have both your innovative UI and perfect SEO. Then it gets down to what your goals are for your website. Are you trying to make a specific community of existing users happy by providing them with tools they can use? Or does your mission focus more on reaching out to a broader audience?

There is no silver bullet to search engine optimization. It just takes knowledge of the available tools and techniques combined with a willingness to keep learning and experimenting. Like the ‘Do-It-Yourself-Woman’ pictured above in the Nationaal Archief’s photo I found out on the Flickr Commons, you too can learn the basics and do-it-yourself. A great starting point is Google’s free SEO Guide. Also, please remember that the best time to plan your SEO strategy is before you have built your site in the first place!

I would love to do research on how much progress archives websites can make in their organic search traffic after SEO improvements. My thinking is to take a snapshot of a month of analytics (the statistics that tell you how many people are visiting your website) and then apply some SEO-inspired changes. After a suitable delay (it takes some time for SEO to do its job), I would compare another month of analytics against the first to determine any change in organic traffic.
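The comparison itself is simple arithmetic. A small Python sketch, with visit counts that I made up purely for illustration, might look like this:

```python
def organic_change(before_visits, after_visits):
    """Percentage change in organic visits between two one-month snapshots."""
    if before_visits <= 0:
        raise ValueError("the baseline month needs at least one organic visit")
    return (after_visits - before_visits) / before_visits * 100.0


# Invented numbers: organic visits in the month before and the month after.
baseline = 1200
followup = 1500
print(f"{organic_change(baseline, followup):+.1f}%")
```

A single before and after pair cannot separate the effect of SEO changes from seasonal swings, so comparing against the same month of the previous year would be a sensible extra check.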

Do you want me to do a quick review of your archives website to see if there is room for SEO improvement? Please contact me or add a comment to this post. I feel like there is a conference presentation in all this if we can find a good set of websites to optimize.

Finally, thank you to unsuspecting UMBC – your new website really is beautiful.

Image credit: Doe-het-zelf vrouw /Do-it-yourself-woman from Nationaal Archief on Flickr Commons.

A History of Our Own, Representing Communities and Identities on the Web (SAA09: Session 202)

LOC Flickr Commons: Sylvia Sweets Tea Room

Andrew Flinn, University College London (UCL), was the second speaker during SAA09’s Session 202 with his presentation ‘A History of Our Own, Representing Communities and Identities on the Web’. Flinn began with the idea that archives are “a place for creating and re-working memory”. While independent community archives are constituted around many purposes, Flinn’s main interest is in communities focused on absences and mis-representation of a group or event in history. These are communities in which there is cultural, political, or artistic activism. Some of these communities may be considered ‘movements’.

How should/can archivists support local archiving activities?

Part of the challenge of online communities is the need to capture the interactions in order to not lose the full picture. The website of the National Listing of Community Archives in the UK states that they “seek to document the history of all manner of local, occupations, ethnic, faith and other diverse communities”.

UCL’s International Centre for Archives and Records Management Research and User Studies (ICARUS) “brings together researchers in user access and description, community archives and identity, concepts and contexts of records and archives, and information policy”. Flinn is the Principal Investigator on the ICARUS project Community archives and identities, which focuses on in-depth interviews with 4 institutions which are “documenting and sustaining community heritage”.

These are some example online community sites:

Main Findings

  • community archives proceed from a position that ‘knowing your own history’ is beneficial to their communities as well as to the public at large
  • the quality of the work comes from individual passion and sacrifice, and the work is voluntary
  • there is ambivalence toward the mainstream archives sector: keen to work with mainstream archives, but scarred by past bad experiences
  • good practices now could lead to partnerships in the future
  • these are living archives: not static, but still alive and growing
  • these ideas prompt re-evaluation of conventional archives thinking
  • lots of access to digital objects – perhaps movement to online existence

We need to understand that these communities evolve and are fluid. They have a broad variety of structures, sizes and methods of working. What are the patterns in participation & ownership?

The site urban 75 has hosted extended discussions about recent UK history. Efforts include identification of places and people in uploaded photos. The site connects people around issues of housing and local services – it is very practical but it has also evolved to include this historical documentation. One example post from the Brixton Forum shows a discussion about an Old shop front revealed on Atlantic Road.

A Short Aside

Next Flinn apologized for taking his talk slightly off script. Setting his papers aside, he spoke to the audience about the eXHulme website which he had discovered the evening before while finishing his presentation. Having lived in Hulme, Manchester himself, he felt a great impact from looking through the site. He spent 4 hours looking at it – including photos such as the travellers living in their buses parked – otteburn close 1996 seen at the bottom of this page. His discovery and exploration of this site gave him a greater personal understanding of the impact of these types of community documentation projects. I felt he would have been happy to keep talking about this site and the directions it had sent his thoughts — but he then got back to his papers and continued.

Building Community Online

Interactions online are the historic record of the community itself. Archives evolve and change as the community builds and edits their online content. These heritage and archive sites work to shift from the idea of visitors to engaging users in interaction — they need users of the website to feel part of the community.

Examples of sites building community online:

How do you successfully encourage participation (rather than a large number of passive observers), which is crucial to the success of these types of initiatives? Lurking without contributing is easy – even if joining requires action. The rate of uptake may correspond with the sense of ownership. Heritage projects might encourage and sustain such participation. See Elisa Giaccardi & Leysia Palen’s article – The Social Production of Heritage through Cross-media Interaction: Making Place for Place-making.

Suggestions

  • encourage conversation and treat all stories as having value – value every account
  • promote a sense of ownership once a story has been shared
  • allow for multiple ways to engage with and share content and memories
  • recognize and let users shift from observer to active member

Flinn’s Conclusions

  • What are the challenges and perils facing community archives? Lack of resources. People are doing these things in unsustainable ways.
  • Why should we sustain independent community archives? Benefit to individuals, communities and broader society.
  • What can professional archivists do? Support and partnership with groups seeking this sort of partnership.

My Thoughts

The image I included above is from the Library of Congress’s Flickr Commons project. If you read through the comments on this photo you can see a diverse group of individuals come together to document the history of Sylvia Sweets Tea Room. This is just another example of the process of documentation being as interesting as the original image itself.

There is still so much to learn in the arena of building productive online communities. Archivists working through how to archive what online communities create will need to understand how the process of creation is documented via various software tools. As the techniques for encouraging participation evolve – archivists will need to evolve right along with them. I think it is interesting to envision archivists working in this space and supporting these types of communities — becoming as much the champions of the community itself as preservers of a community’s collaborative creations.

Image Credit: Flickr Commons Library of Congress: Sylvia Sweets Tea Room, corner of School and Main streets, Brockton, Mass

As is the case with all my session summaries from SAA2009, please accept my apologies in advance for any cases in which I misquote, overly simplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via my contact form.

Archivists and New Technology: When Do The Records Matter?

Navigating the rapidly changing landscape of new technology is a major challenge for archivists. As quickly as new technologies come to market, people adopt them and use them to generate records. Businesses, non-profits and academic institutions constantly strive to find ways to be more efficient and to cut their budgets. New technology often offers the promise of cost reductions. In this age of constantly evolving software and technological innovation, how do archivists know when a new technology is important or established enough to take note of? When do the records generated by the latest and greatest technology matter enough to save?

Below I have included two diagrams that seek to illustrate the process of adopting new technology. I think they are both useful in aiding our thinking on this topic.

The first is the “Hype Cycle”, as proposed by analyst Jackie Fenn at Gartner Group. It breaks down the phases that new technologies move through as they progress from their initial concept through to broad acceptance in the marketplace. The generic version of the Hype Cycle diagram below is from the Wikipedia entry on hype cycle.

Gartner Hype Cycle (Wikipedia)

Each summer, Gartner comes out with a new update on Where Are We In The Hype Cycle?. Last summer, microblogging was just entering the ‘Peak of Inflated Expectations’, public virtual worlds were sliding down into the ‘Trough of Disillusionment’ and location aware applications were climbing back up the ‘Slope of Enlightenment’. There is even a book about it: Mastering the Hype Cycle: How to Choose the Right Innovation at the Right Time.

The other diagram is the Technology Adoption Lifecycle from Geoffrey Moore’s Crossing the Chasm. This view of the technology cycle comes from the perspective of bringing new technology to market. How do you cross the chasm between early adopters and the general population?

Technology Adoption Lifecycle (Wikipedia)

Archivists need to consider new technology from two different perspectives: when to use it to further their own goals as archivists, and when to address the need to preserve records being generated by new technology. A fair bit of attention has been focused on figuring out how to get archivists up to speed on new web technology. In August 2008, ArchivesNext posted about hunting for Web 2.0 related sessions at SAA2008 and Friends Told Me I Needed A Blog posted about SAA and the Hype Cycle shortly thereafter.

But how do we know when a technology is ‘important enough’ to start worrying about the records it generates? Do we focus our energy on technology that has crossed the chasm and been adopted by the ‘early majority’? Do we watch for signs of adoption by our target record creators?

I expect that the answer (such as there can be one answer!) will be community specific. As I learned in the 2007 SAA session about preserving digital records of the design community, waiting for a single clear technology or software leader to appear can lead to lost or inaccessible records. Archivists working with similar records already come together to support one another through round tables, mailing lists and conference sessions. I often find that the most interesting presentations are those that discuss the challenges a specific user community is facing in preserving their digital records. The 2008 SAA session about hybrid analog/digital literary collections discussed issues related to digital records from authors. Those who worry about records captured in geographic information systems (GIS) were trying to sort out how to define a single GIS electronic record when last I dipped my toes into their corner of the world in the Fall of 2006.

It is not feasible to imagine archivists staying ahead of every new type of technology and attempting to design a method for archiving every possible type of digital record being created. What we can do is make it a priority for a designated archivist within every ‘vertical’ community (government, literary, architecture… etc) to keep their ear to the ground about the use of technology within that community. This could be a community of practice of its own: a group that shares info about the latest trends they are seeing while sharing their best practices for handling the latest types of records being seen.

The good news is that archivists aren’t the only ones who want to be able to preserve access to born digital records. Consider Twitter, which only provides easy access to recent tweets. A whole raft of third-party tools built to archive data from Twitter is already out there, answering the demand for a way to back up people’s tweets.

I don’t think archivists always have the luxury of waiting for technology to be adopted by the majority of people and to reach the ‘Plateau of Productivity’. If you are an archivist who works with a community that uses cutting edge technology, you owe it to your community to stay in the loop with how they do their work now. Just because most people don’t use a specific technology doesn’t mean that an individual community won’t pick it up and use it to the exclusion of more common tools.

The design community mentioned above spoke of working with those creating the tools for their community to ensure easy archiving down the line. In our fast paced world of innovation, a subset of archivists need to stay involved with the current business practices of each vertical being archived. This group can work together to identify challenges, brainstorm solutions, build relationships with the technology communities and then disseminate best practices throughout the archives community. I did find a web page for the SAA’s Technology Best Practices Task Force and its document Managing Electronic Records and Assets: A Working Bibliography, but I think that I am imagining something more ongoing, more nimble and more tied into each of the major communities that archivists must support. Am I describing something that already exists?

SAA2009: Building, Managing and Participating in Online Communities

SAA 2009: Sustainable Archives AUSTIN 09

It is official – the panel I proposed for SAA 2009 (aka, Sustainable Archives: AUSTIN 2009) was accepted!

Title: Building, Managing and Participating in Online Communities: Avoiding Culture Shock Online

Abstract: As more archival materials move online, archivists must become adept at participating in and managing online communities. This session will discuss real world experiences of this involvement, including putting images into the Flickr Commons and links to archival materials in Wikipedia, as well as guidelines on cultural norms within online communities. We will also discuss choosing between building new communities from scratch vs joining a broader, existing community (such as the Flickr Commons).

I will be serving as session chair and moderator for our group of fabulous panelists (finances and travel plans permitting):

The intention is for this session to begin with very brief presentations showing off the current projects at our panelists’ institutions and follow that up with lots of time for discussion and answering of questions.

We see our target audience as archivists who want to hear about real world experiences of working within existing online communities (such as Wikipedia or Flickr) and building new communities dedicated to cultural heritage materials. The session will target individuals with less experience with Web 2.0 and social media implementations, but the lessons learned should also be of interest to those already in the implementation stages of their own projects.

I will put out a call for questions as we get closer to the conference so that our group can get an idea of what people are interested in learning about specifically, so start making notes now. Hope to see you in Austin!

Susa 2.0: Max Evans’ Finding Aid Prototype

Susa Young Gates

As part of his portion of our SAA 2008 panel in San Francisco, Max Evans demonstrated his prototype for a new way to view an EAD finding aid. You can download his presentation from the SAA’s site: Finding Aids for the 21st Century: The Next Evolution.

Max’s prototype of Susa 2.0 is now online! He asked that I make sure you know it works best (showing all the intended mouse over text for links) with Internet Explorer version 6.0. The prototype presents the finding aid of the Susa Young Gates Papers from the Utah State Historical Society. His design tackles the major issues that plague large finding aids normally displayed in traditional single page layouts. Anyone who has looked at a large finding aid online has had the experience of being scrolled down somewhere in the middle and realizing they have no idea what they are looking at. What folder is this item in? What box is this folder in? Am I reading through a list of letters from 1950 or are these the ones from 1970?

Context is hard to communicate when you are dealing with long lists of folders that stretch longer than the length of the screen. Max’s design uses a three column approach to provide context from left to right. His design also gives users a way to look at the full list of either items or folders, independent of their originating containers – each list then sortable in three different ways: ‘as arranged’, alphabetically or by date. I love this page which shows how a scanned document might be displayed within the proper context of the collection – in this case, page 2 of document 1 of the General Correspondence from 1886-1909. All of these ideas get at the heart of giving researchers more control over how to tackle the records in a collection while making sure that they don’t lose the tools that ordered documents in a folder would provide them in the research room.
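To show the idea rather than Max's actual implementation (his prototype is plain HTML pages), here is a small Python sketch of a flat item list that stays sortable three ways while each item still carries its box and folder. The `Item` fields, the sample entries and the helper names are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    position: int  # order in which the finding aid lists the item
    title: str
    year: int
    folder: str
    box: str


# Invented entries for illustration only.
ITEMS = [
    Item(1, "Letter to a publisher", 1902, "Folder 1", "Box 1"),
    Item(2, "Diary fragment", 1890, "Folder 1", "Box 1"),
    Item(3, "Article draft", 1897, "Folder 2", "Box 1"),
]

SORT_KEYS = {
    "as arranged": lambda item: item.position,
    "alphabetical": lambda item: item.title.lower(),
    "date": lambda item: item.year,
}


def sort_items(items, mode):
    """Return a new list ordered by one of the three sort modes."""
    return sorted(items, key=SORT_KEYS[mode])


def context(item):
    """Spell out the containers an item lives in, outermost first."""
    return f"{item.box} > {item.folder} > {item.title}"


for item in sort_items(ITEMS, "date"):
    print(item.year, context(item))
```

Because every item keeps its own container information, re-sorting the list never strips away the answer to "what box is this folder in?".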

His prototype goes a step beyond changing how the finding aid itself is presented: it also considers how the workflow of a researcher can be improved while simplifying the record request process. The prototype gives the patron the option to request the scanning of specific folders or items. They can also add records to their ‘research cart’ to either request the proper boxes be retrieved or to store the records in a personal research area within the archives website – both possibilities sound useful to me.

Max’s prototype is such a great example of rethinking how people are expected to work with archival records within the confines of the information we already have available in finding aids as they exist today. I highly recommend you give Susa 2.0 a look. It is a testament to Max’s incredible patience that he was able to create this prototype using over 200 separate HTML files – but it also sets the bar high for what we could be doing with our interface design!

NEH Digital Humanities Startup Grant News: Visualizing Archival Collections

archivesz ng

As of August 22nd, 2008 it was official. There is even a blog post over on the NEH Office of Digital Humanities updates page to prove it. The University of Maryland was granted a Level I NEH Digital Humanities Startup Grant to fund work on the ‘Visualizing Archival Collections’ project. The official one liner is that the project will support “The development of visualization tools for assessing information contained in electronic archival finding aids created with Encoded Archival Description (EAD)”. Why did I wait so long to announce this on the blog? I wanted to have something fun to announce at the end of my SAA presentation out in San Francisco!

The project director is Dr. Jennifer Golbeck. I also have the support of University of Maryland’s Jennie Levine, Dr. Bruce Ambacher, and Dr. Doug Oard. This amazing set of collaborators should help me stay on the right track and make sure I keep the sometimes competing issues relating to archives, information retrieval and interface design in balance.

I will be collecting EAD encoded finding aids over the next few months. My goal is to gather a broad sample of English language finding aids from a wide range of institutions and work on the script that extracts this data into a database. Once we have the data extracted I get to look at what we have, do some data cleanup and start thinking about what sorts of visualizations might work with our real world data. During the spring term we will design and build a 2nd generation prototype of ArchivesZ.
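For a sense of what such an extraction script might involve, here is a minimal Python sketch that reads one finding aid and stores its title, dates and subject terms in SQLite. This is not the project's script. The table layout, function names and sample record are my own assumptions, and the sample omits the XML namespace that many real EAD files declare. The element names (`archdesc`, `did`, `unittitle`, `unitdate`, `controlaccess`, `subject`) are standard EAD tags.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up finding aid, trimmed to the handful of elements used below.
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <did>
      <unittitle>Example Family Papers</unittitle>
      <unitdate>1900-1950</unitdate>
    </did>
    <controlaccess>
      <subject>Correspondence</subject>
      <subject>Women authors</subject>
    </controlaccess>
  </archdesc>
</ead>"""


def make_db():
    """Create an in-memory database with one table per kind of record."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE finding_aid (id INTEGER PRIMARY KEY, title TEXT, dates TEXT)")
    conn.execute("CREATE TABLE subject (finding_aid_id INTEGER, term TEXT)")
    return conn


def load_finding_aid(conn, xml_text):
    """Insert one finding aid and its subject terms; return the new row id."""
    root = ET.fromstring(xml_text)
    title = root.findtext("archdesc/did/unittitle", default="").strip()
    dates = root.findtext("archdesc/did/unitdate", default="").strip()
    cursor = conn.execute(
        "INSERT INTO finding_aid (title, dates) VALUES (?, ?)", (title, dates)
    )
    aid_id = cursor.lastrowid
    for subject in root.iterfind("archdesc/controlaccess/subject"):
        conn.execute(
            "INSERT INTO subject (finding_aid_id, term) VALUES (?, ?)",
            (aid_id, (subject.text or "").strip()),
        )
    return aid_id


conn = make_db()
aid_id = load_finding_aid(conn, SAMPLE_EAD)
print(conn.execute("SELECT title, dates FROM finding_aid").fetchall())
print(conn.execute("SELECT term FROM subject ORDER BY term").fetchall())
```

Real finding aids vary a great deal in how consistently they fill in these elements, which is exactly why the data cleanup step comes before any thinking about visualizations.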

Want your data to be part of this? If you would like to contribute EAD finding aids in XML format to the project, please send me the following information:

  1. Archives Name
  2. Archives Parent Institution (if applicable)
  3. Archives Location
  4. Contact at Archives for questions about the finding aids (name, email and phone number)
  5. Estimate of # of finding aids being offered
  6. Controlled Vocabulary or Thesaurus used for Subject values (as many as are used)
  7. Method of finding aid delivery (sending me a zip file? pointing me at a directory online? some other way?)
  8. Do I have your permission to post a discussion of the data issues I may find in your finding aids here on Spellbound Blog? (Please see the OSU Archives post as an example of the types of issues I discuss)

You can either put this into the form on my Contact Page or send email directly to jeanne AT spellboundblog dot com.

Thank you to everyone for their enthusiasm about the ArchivesZ project. It is very exciting to have the opportunity to take all these shiny ideas to the next level.

SAA2008: Preservation and Experimentation with Analog/Digital Hybrid Literary Collections (Session 203)

floppy disks

The official title of Session 203 was Getting Our Hands Dirty (and Liking It): Case Studies in Archiving Digital Manuscripts. The session chair, Catherine Stollar Peters from the New York State Archives and Records Administration, opened the session with a high level discussion of the “Theoretical Foundations of Archiving Digital Manuscripts”. The focus of this panel was preserving hybrid collections of born digital and paper based literary records. The goal was to review new ways to apply archival techniques to digital records. The presenters were all archivists without IT backgrounds who are building on others’ work … and experimenting. She also mentioned that this work impacts researchers, historians, and journalists.

For each of the presenters, I have listed below the top challenges and recommendations. If you attended the session, you can skip forward to my thoughts.

Norman Mailer’s Electronic Records

Challenges & Questions:

  • 3 laptops and nearly 400 disks of correspondence
  • While the letters might have been dictated or drafted by Mailer, all the typing, organization and revisions done on the computer were done by his assistant Judith McNally. This brings into question issues of who should be identified as the record creator. How do they represent the interaction between Mailer & McNally? Who is the creator? Co-Creators?
  • All the laptops and disks were held by Judith McNally. When she died all of her possessions were seized by county officials. All the disks from her apartment were eventually recovered over a year later – but it causes issues of provenance. There is no way to know who might have viewed/changed the records.

Revelations and Recommendations:

What is accessioning and processing when dealing with electronic records? What needs to be done?

  • gain custody
  • gather information about creator’s (or creators’) use of the electronic records. In March 2007 they interviewed Mailer to understand the process of how they worked together. They learned that the computers were entirely McNally’s domain.
  • number disks, computers (given letters), other digital media
  • create disk catalog – to reflect physical information of the disk. Include color of ink, underlining, etc. At this point the disk has never been put into a computer. This captures visual & spatial information
  • gather this info from each disk: file types, directory structure & file names
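The last step in the list above, gathering file types, directory structure and file names, is the kind of thing a short script can do once a disk (or better, a copy of it) is readable. The Python sketch below is only my illustration of that step, not the presenters' workflow. A temporary folder with made-up file names stands in for a mounted disk, and `inventory` is a hypothetical helper name.

```python
import os
import tempfile
from collections import Counter


def inventory(root):
    """Return (relative file paths, files counted per extension) under root."""
    paths = []
    file_types = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            paths.append(os.path.relpath(full_path, root))
            extension = os.path.splitext(name)[1].lower() or "(none)"
            file_types[extension] += 1
    return paths, file_types


# Stand-in for a mounted disk: a temporary folder with made-up file names.
with tempfile.TemporaryDirectory() as disk:
    os.mkdir(os.path.join(disk, "LETTERS"))
    for relative in ("README", "LETTERS/DRAFT1.TXT", "LETTERS/DRAFT2.TXT"):
        with open(os.path.join(disk, *relative.split("/")), "w") as handle:
            handle.write("placeholder")
    paths, file_types = inventory(disk)

print(paths)
print(dict(file_types))
```

The script only reads names and never opens the files on the disk itself, which fits the concern raised above about not knowing who might have viewed or changed the records.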

The ideal for future collections of this type is archivist involvement earlier – the earlier the better.

Papers of Peter Ganick

  • Speaker: Melissa Watterworth
  • Featured Collection: Papers of Writer and Small Press Publisher Peter Ganick, Thomas J Dodd Research Center, University of Connecticut

Challenges & Questions:

  • What are the primary sources of our modern world?
  • How do we acquire and preserve born digital records as trusted custodians?
  • How do we preserve participatory media – maybe we can learn from those who work on performance art?
  • How do we incrementally build our collections of electronic records? Should we be preserving the tools?
  • Timing of acquisition: How actively should we be pursuing personal archives? How can we build trust with creators and get them to understand the challenges?
  • Personal papers are very contextual – order matters. Does this hold true for born digital personal archives? What does the networking aspect of electronic records mean – how does it impact the idea of order?
  • The first time the archivist attempted to accession one of Peter Ganick’s laptops, she found nothing she could identify as files. She found fragments of text, hypertext work and lots of files that had questionable provenance (downloaded from a mailing list? his creations?). She had to sit down next to him and learn about how he worked.
  • He didn’t understand at first what her challenges were. He could get his head around the idea of metadata and issues of authenticity. He had trouble understanding what she was trying to collect.
  • How do we arrange and keep context in an online environment?
  • Biggest tech challenge: are we holding on for too long to ideas of original order and context?
  • Is there a greater challenge in collecting earlier in the cycle? What if the creator puts restrictions on groupings or chooses to withdraw them?
  • Do we want to create contracts with donors? Is that practical?

Revelations and Recommendations:

  • Collect materials that had high value as born digital works but were at a high risk of loss.
  • Build infrastructure to support preservation of born digital records.
  • Go back to the record creator to learn more about his creative process. They used to acquire records from Ganick every few years, but that wasn’t frequent enough. He was changing the tools he used and how he worked very quickly. She made sure to communicate that the past 30 years of policy wasn’t going to work anymore. It was going to have to evolve.
  • Created a ‘submission agreement’ about what kinds of records should be sent to the archive. He submitted them in groupings that made sense to him. She reviewed the records to make sure she understood what she was getting.
  • Considering using PDF/A to capture snapshots of virtual texts.
  • Looked to model of ‘self archiving’ – common in the world of professors to do ongoing accruals.
  • What about ‘embedded archivists’? There is a history of this in the performing arts and NGOs and it might be happening more and more.

George Whitmore Papers

Challenges & Questions:

  • How do you establish identity in a way that is complete and uncorrupted? How do you know it is authentic? How do you make an authentic copy? Are these requirements unreasonable and unachievable?

Revelations and Recommendations:

  • Refresh and replicate files on a regular schedule.
  • They have had good success using Quick View Plus to enable access to many common file formats. On the downside, it doesn’t support everything and since it is proprietary software there are no long-term guarantees.
  • In some cases they had to send CP/M files to a third party to have them converted into WordStar and have the ASCII normalized.
  • The documentation varied: acquisition notes, accession records, and a loan form with the third party who did the conversion that summarized the request. They did NOT provide information about what software was used to convert from CP/M to DOS. This would be good information to capture in the future.
  • Proposed an expansion of the standards to include how electronic records were migrated in the <processinfo> processing notes.
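The ‘refresh and replicate’ recommendation above only helps if you can tell that a replica still matches the master copy. Here is a minimal sketch of one way to check that, using checksums. This is my own illustration, not the tooling the presenters described: the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(master: dict[str, str], replica: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing from, extra in, or changed in the replica."""
    shared = master.keys() & replica.keys()
    return {
        "missing": sorted(set(master) - set(replica)),
        "extra": sorted(set(replica) - set(master)),
        "changed": sorted(name for name in shared if master[name] != replica[name]),
    }
```

Run on a schedule, something like `compare_copies(build_manifest(master_dir), build_manifest(replica_dir))` would flag a replica that has silently drifted. The manifest itself could also be saved alongside the records as evidence of what was checked and when.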

Questions & Answers

Question: As part of a writers’ community, what do we tell people who want to know what they can DO about their records? They want technical information. They want to know what to keep. Current writers are aware they are creating their legacy.

Answer: Michael: The single best resource is the InterPARES 2 Creator Guidelines. The Beinecke has adapted them to distribute to authors. Melissa: Go back to your collection development policies and make sure to include functions you are trying to document (like process and distribution networks). Also, communities of practice (Acid-Free Bits) are talking about formats and guidelines like that. Gabriela: People often want to address ‘value’. Right now we don’t know how to evaluate the value of electronic drafts, so it is up to authors.

Question: Cal Lee: Not a question so much as an idea: the world of digital forensics and security and the ‘order of volatility’ dictate that everyone should always be making a full disk copy bit by bit before doing anything else.

Comment: On digital forensic tools: the software retains a lot of the editing history of documents, and deleted files are still there too.
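To make the ‘full disk copy bit by bit’ idea concrete, here is a rough sketch of copying every byte of a source into an image file while computing a checksum of what was read. It is only an illustration under my own assumptions: `image_source` and `verify_image` are hypothetical names, and a real forensic workflow would use a hardware write blocker and dedicated imaging tools rather than a script like this.

```python
import hashlib
from pathlib import Path


def image_source(source: Path, image: Path, block_size: int = 1 << 20) -> str:
    """Copy every byte of source into image and return the SHA-256 of the bytes read.

    The image is opened in exclusive-create mode ("xb"), so an existing
    image is never overwritten by accident.
    """
    digest = hashlib.sha256()
    with source.open("rb") as src, image.open("xb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            digest.update(block)
            dst.write(block)
    return digest.hexdigest()


def verify_image(image: Path, expected_sha256: str, block_size: int = 1 << 20) -> bool:
    """Re-read the image from disk and confirm it still matches the recorded checksum."""
    digest = hashlib.sha256()
    with image.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest() == expected_sha256
```

The point of recording the checksum at capture time is that all later work can happen on the copy, and the copy can be shown to match what was originally read.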

Question: Have you seen examples of materials that are coming into the archive where the digital materials are working drafts for a final paper version? This is in contrast to others that are electronic experiments.

Answer: Yes, they do think about this. It can affect arrangement and how the records are described. The formats also impact how things are preserved.

Question: Access issues? Are you letting people link to them from the finding aids? How is the documents’ authenticity protected?

Answer: DSpace gives you a new version anytime you want it (the original bitstream). Lots of cross linking supports people finding things from more than one path. In some cases documents (even electronic) can only be accessed from within the on site reading room.

Question: What is your relationship like with your IT folks?

Answer: Gabriela: Our staff has been very helpful. We use ‘legacy’ machines to access our content. They build us computers. They are also not archivists, so there is a little divide about priorities and the kind of information that I am interested in, but it has been a very productive conversation.

Question: (For Melissa) Why didn’t you accept Peter’s email (Melissa had said they refused a submission of email from Peter because it didn’t have research value)?

Answer: The email was rejected because it included personal medical messages. The agreement with Peter didn’t include an option to selectively accept (or weed) what was given.

Question: In terms of gathering information from the creators, do you recommend a formal/recorded interview? Or a more informal arrangement in which you can contact them anytime on an ongoing basis?

Answer: Melissa: We do have more formal methods, such as ‘documentation study’ style approaches. We might do literature reviews. Ultimately the submission agreement is the most formal document we have. Gabriela: It depends on what the author is open to. Formal documentation is best, but if they aren’t willing to be recorded, then you take what you can get!

My Thoughts

I am very curious to see how best practices evolve in this arena. I wonder how stories written using something like Google Documents, which auto-saves and preserves all versions for future examination, will impact how scholars choose to evaluate the evolution of documents. There have already been interesting examinations of the evolution of collaborative documents. Consider this visual overview of the updates to the Wikipedia entry for Sarah Palin created by Dan Cohen and discussed in his blog post Sarah Palin, Crowdsourced. Another great example of this type of visual experience of a document being modified was linked to in the comments of that post: Heavy Metal Umlaut: The Movie. If you haven’t seen this before – take a few minutes to click through and watch the screencast which actually lets you watch as a Wikipedia page is modified over time.

While I can imagine that there will be many things to sort out if we try to start keeping these incredibly frequent snapshot save logs (disk space? quantity of versions? authenticity? author preferences to protect the unpolished versions of their work?), I still think that being able to watch the creative process this way will be valuable in some situations. I also believe that over time new tools will be created to automate the generation of document evolution visualizations and movies (like the two I link to above), making it easy for researchers to harness this sort of information.

Perhaps there will be ways for archivists to keep only certain parts of the auto-save versioning. I can imagine an author who does not want anyone to see early drafts of their writing (as is apparently also the case with architects and early drafts of their designs) – but who might be willing for the frequency of updates to be stored. This would let researchers at least understand the rhythm of the writing – if not the low level details of what was being changed.
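As a thought experiment, here is a small sketch of what keeping only the rhythm might look like: take a list of saved revisions, throw away the text, and keep just the gaps between saves. The `Revision` class and `revision_rhythm` function are hypothetical names I made up for illustration; no real versioning tool's export format is assumed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Revision:
    """One auto-saved version of a document (hypothetical structure)."""
    saved_at: datetime
    text: str


def revision_rhythm(revisions: list[Revision]) -> list[timedelta]:
    """Return the time between consecutive saves, discarding the text itself."""
    times = sorted(revision.saved_at for revision in revisions)
    return [later - earlier for earlier, later in zip(times, times[1:])]
```

A researcher given only this list could see bursts of activity and long pauses without ever reading an unpolished draft.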

I love the photo I found for the top of this post. I admit to still having stacks of 3 1/2 inch floppy disks. I have email from the early days of BITNET. I have poems, unfinished stories, old resumes and SQL scripts. For the moment my disks live in a box on the shelf labeled ‘Old Media’. Lucky me – I at least still have a computer with a floppy drive that can read them!

Image Credit: oh messy disks by Blude via flickr.

As is the case with all my session summaries from SAA2008, please accept my apologies in advance for any cases in which I misquote, oversimplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via my contact form.