Year: 2011

Digitization Program Site Visit: University of Maryland

I recently had the opportunity to visit with staff of the University of Maryland, College Park’s Digital Collections digitization program along with a group of my colleagues from the World Bank. This is a report on that site visit. It is my hope that these details can help others planning digitization projects – much as it is informing our own internal planning.

Date of Visit: October 13, 2011

Destination: University of Maryland, Digital Collections

University of Maryland Hosts:

Summary: This visit was two hours long and consisted of a one-hour presentation and Q&A session with Jennie Levine Knies, Manager of Digital Collections, followed by a one-hour tour and Q&A session with Alexandra Carter, Digital Imaging Librarian.

Background: The Digital Collections of the University of Maryland was launched in 2006 using Fedora Commons. It is distinct from the ‘Digital Repository at the University of Maryland’, aka DRUM, which is built on DSpace. DRUM contains faculty-deposited documents, a library-managed collection of UMD theses and dissertations, and collections of technical reports. The Digital Collections project focuses on digitization of photographs, postcards, manuscripts & correspondence – mostly based on patron demand. In addition, materials are selected for digitization based on the need for thematic collections to support events, such as their recent civil war exhibition.

After a period of full funding, support has fallen off, which has prevented any further changes to the Fedora system.

Another project at UMD involves digitization of Japanese children’s books (the Gordon W. Prange Collection) and currently uses “in house outsourcing”. In this scenario, contractors bring all their equipment and staff on site to perform the digitization process.

Standard Procedures:

  • Requests must be made using a combination of the ‘Digital Request Cover Sheet’ and ‘Digital Surrogate Request Sheet’. These sheets are then reviewed for completeness by the curator under whose jurisdiction the collection falls. Space on the request forms is provided so that the curator may add additional notes to aid in the digitization process. They decide if it is worth digitizing an entire folder when only specific item(s) are requested. Standard policy is to aim for a two-week turnaround for digitization based on patron request.
  • The digital request is given a code name for easy reference. They choose these names alphabetically.
  • Staff are assigned to digitize materials. This work is often done by student workers using one of three Epson 10000 XL flatbed scanners. There is also a Zeutschel OS 12000 overhead scanner available for materials which cannot be handled by the flatbed scanners.
  • Alexandra reviews all scans for quality.
  • Metadata is reviewed by another individual.
  • When both the metadata & image quality have been reviewed, materials are published online.

Improvements/Changes they wish for:

  • Easier way to create a web ‘home’ for collections; currently many do not have a main page, and creating one requires the involvement of the IT department.
  • Option for users to save images being viewed
  • Option to upload content to their website in PDF format
  • Way to associate transcriptions with individual pages
  • More granularity for workflow: currently the only status they have to indicate that a folder or item is ready for review is ‘Pending’. Since there are multiple quality control activities that must be performed by different staff, currently they must make manual lists to track what phases of QA are complete for which digitized content.
  • Reduce data entry.
  • Support for description at both the folder and item level at the same time. Currently description is only permitted either at the folder level OR at the item level.
  • Enable search and sorting by date added to system. This data is captured, but not exposed.

Lessons Learned:

  • Should have adopted an existing metadata standard rather than creating their own.
  • People do not use the ‘browse terms’ – do not spend a lot of time working on this.

Resources:

Image Credit: Women students in a green house during a Horticulture class at the University of Maryland, 1925. University Archives, Special Collections, University of Maryland Libraries

Day of Digital Archives

To be honest, today was a half day of digital archives, due to personal plans taking me away from computers this afternoon. In light of that, my post is more accurately my ‘week of digital archives’.

The highlight of my digital archives week was the discovery of the Digital Curation Exchange. I promptly joined and began to explore their ‘space for all things digital curation’. This led me to a fabulous list of resources, including a set of syllabi for courses related to digital curation. Each link brought me to an extensive reading list, some with full slide decks from weekly in-class presentations. My ‘to read’ list has gotten much longer – but in a good way!

On other days recently I have found myself involved in all of the following:

  • review of metadata standards for digital objects
  • creation of internal guidelines and requirements documents
  • networking with those at other institutions to help coordinate site visits of other digitization projects
  • records management planning and reviews
  • learning about the OCR software available to our organization
  • contemplation of the web archiving efforts of organizations and governments around the world
  • reviewing my organization’s social media policies
  • listening to the audio of online training available from PLANETS (Preservation and Long-term Access through NETworked Services)
  • contemplation of the new Journal of Digital Media Management and their recent call for articles

My new favorite quote related to digital preservation comes from What we reckon about keeping digital archives: High level principles guiding State Records’ approach from the State Records folks in New South Wales Australia, which reads:

We will keep the Robert De Niro principle in mind when adopting any software or hardware solutions: “You want to be makin moves on the street, have no attachments, allow nothing to be in your life that you cannot walk out on in 30 seconds flat if you spot the heat around the corner” (Heat, 1995)

In other words, our digital archives technology will be designed to be sustainable given our limited resources so it will be flexible and scalable to allow us to utilise the most appropriate tools at a given time to carry out actions such as creation of preservation or access copies or monitoring of repository contents, but replace these tools with new ones easily and with minimal cost and with minimal impact.

I like that this speaks to the fact that no plan can perfectly accommodate the changes in technology coming down the line. Being nimble and assuming that change will be the only constant are key to ensuring access to our digital assets in the future.

SXSW Panel Proposal – Archival Records Online: Context is King

I have a panel up for evaluation on the SXSW Interactive Panel Picker titled Archival Records Online: Context is King. The evaluation process for SXSW panels is based on a combination of staff choice, advisory board recommendations and public votes. As you can see from the pie chart shown here (thank you SXSW website for the great graphic), 30% of the selection criteria is based on public votes. That is where you come in. Voting is open through 11:59 pm Central Daylight Time on Friday, September 2. To vote in favor of my panel, all you need to do is create a free account over on SXSW Panel Picker and then find Archival Records Online: Context is King and give it a big thumbs up.

If my panel is selected, I intend this session to give me the chance to review all of the following:

  1. What are the special design requirements of archival records?
  2. What are the biggest challenges to publishing archival records online?
  3. How can archivists, designers and developers collaborate to build successful web sites?
  4. Why is metadata important?
  5. How can search engine optimization (SEO) inform the design process?

All of this ties into what I have been pondering, writing about and researching for the past few years related to getting archival records online. So many people are doing such amazing work in this space. I want to show off the best of the best and give attendees some takeaways to help them build websites that make it easy to see the context of anything they find in their search.

While archival records have a very particular dependence on the effective communication of context – I also think that this is a lesson that can improve interface design across the board. These are issues that UI and IA folks are always going to be worrying about. SXSW is such a great opportunity for cross pollination. Conferences outside the normal archives, records management and library conference circuit give us a chance to bring fresh eyes and attention to the work being done in our corner of the world.

If you like the idea of this session, please take a few minutes to go sign up at the SXSW Panel Picker and give Archival Records Online: Context is King a thumbs up. You don’t need to be planning to attend in order to cast your vote, though after you start reading through all the great panel ideas you might change your mind!

Rescuing 5.25″ Floppy Disks from Oblivion

This post is a careful log of how I rescued data trapped on 5 1/4″ floppy disks, some dating back to 1984 (including those pictured here). While I have tried to make this detailed enough to help anyone who needs to try this, you will likely have more success if you are comfortable installing and configuring hardware and software.

I will break this down into a number of phases:

  • Phase 1: Hardware
  • Phase 2: Pull the data off the disk
  • Phase 3: Extract the files from the disk image
  • Phase 4: Migrate or Emulate

Phase 1: Hardware

Before you do anything else, you actually need a 5.25″ floppy drive of some kind connected to your computer.  I was lucky – a friend had a floppy drive for us to work with. If you aren’t that lucky, you can generally find them on eBay for around $25 (sometimes less). A friend had been helping me by trying to connect the drive to my existing PC – but we could never get the communications working properly. Finally I found Device Side Data’s 5.25″ Floppy Drive Controller, which they sell online for $55. What you are purchasing will connect your 5.25″ floppy drive to a USB 2.0 or USB 1.1 port. It comes with drivers for Windows, Mac and Linux systems.

If you don’t want to mess around with installing the disk drive into your computer, you can also purchase an external drive enclosure and a tabletop power supply. Remember, you still need the USB controller too.

Update: I just found a fantastic step-by-step guide to the hardware installation of Device Side’s drive controller from the Maryland Institute for Technology in the Humanities (MITH), including tons of photographs, which should help you get the hardware install portion done right.

Phase 2: Pull the data off the disk

The next step, once you have everything installed, is to extract the bits (all those ones and zeroes) off those floppies. I found that creating a new folder for each disk I was extracting made things easier. In each folder I store the disk image, a copy of the extracted original files and a folder named ‘converted’ in which to store migrated versions of the files.

Device Side provides software they call ‘Disk Image and Browse’. You can see an assortment of screenshots of this software on their website, but this is what I see after putting a floppy in my drive and launching USB Floppy -> Disk Image and Browse:

You will need to select the ‘Disk Type’ and indicate the destination in which to create your disk image. Make sure you create the destination directory before you click on the ‘Capture Disk File Image’ button. This is what it may look like in progress:

Fair warning that this won’t always work. At least the developers of the software that comes with Device Side Data’s controller had a sense of humor. This is what I saw when one of my disk reads didn’t work 100%:

If you are pressed for time and have many disks to work your way through, you can stop here and repeat this step for all the disks you have on hand.

Phase 3: Extract the files from the disk image

Now that you have a disk image of your floppy, how do you interact with it? For this step I used a free tool called Virtual Floppy Drive. Once I had it installed properly, disk images were associated with the program. Double clicking on the Floppy Image icon opens the floppy in a view like the one shown below:

It looks like any other removable disk drive. Now you can copy any or all of the files to anywhere you like.

Phase 4: Migrate or Emulate

The last step is finding a way to open your files. Your choice for this phase will depend on the file formats of the files you have rescued. My files were almost all WordStar word processing documents. I found a list of tools for converting WordStar files to other formats.

The best one I found was HABit version 3.

It converts WordStar files into text or HTML and even preserves the spacing reasonably well if you choose that option. If you are interested in the content more than the layout, then not retaining spacing is the better choice, because preserving indentation inserts artificial spaces in the middle of sentences. In a perfect world I would capture both versions, with layout and without.
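HABit does the real work here, but it may help to see why this conversion is even possible. Old WordStar files set the high bit on certain characters and embed control codes for formatting, so much of the text can be recovered by clearing that bit and discarding most control characters. Below is a minimal Python sketch of that idea (the function name `wordstar_to_text` is my own, and this handles only the simple cases – it does not interpret WordStar dot commands or reconstruct layout):

```python
# Rough sketch of WordStar-to-plain-text recovery (not what HABit does
# internally). WordStar marks some characters by setting their high bit
# and scatters control codes through the file for formatting.

def wordstar_to_text(data: bytes) -> str:
    out = []
    for b in data:
        b &= 0x7F  # clear WordStar's "soft" high bit
        c = chr(b)
        # keep printable ASCII plus common whitespace; drop other
        # control codes (bold/underline toggles and similar)
        if c in "\r\n\t" or 32 <= b < 127:
            out.append(c)
    return "".join(out)
```

A proper converter like HABit does far more (dot commands, soft hyphens, layout), but even this crude pass turns an unreadable binary file into something you can search and skim.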

Summary

So my rhythm of working with the floppies after I had all the hardware and software installed was as follows:

  • create a new folder for each disk, with an empty ‘converted’ folder within it
  • insert floppy into the drive
  • run DeviceSide’s Disk Image and Browse software (found on my PC running Windows under Start -> Programs -> USB Floppy)
  • paste the full path of the destination folder
  • name the disk image
  • click ‘Capture Disk Image’
  • double click on the disk image and view the files via vfd (virtual floppy drive)
  • copy all files into the folder for that disk
  • convert files to a stable format (I was going from WordStar to ASCII text) and save the files in the ‘converted’ folder
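The folder setup at the start of that rhythm is easy to script if you have a long box of disks ahead of you. Here is a small Python sketch that creates the per-disk folder with its empty ‘converted’ subfolder (the function name and disk names are just illustrative):

```python
import os

def make_disk_folders(base: str, disk_names) -> list:
    """Create one folder per disk, each containing an empty
    'converted' subfolder for migrated copies of the files."""
    paths = []
    for name in disk_names:
        folder = os.path.join(base, name)
        # makedirs creates the disk folder and 'converted' in one call
        os.makedirs(os.path.join(folder, "converted"), exist_ok=True)
        paths.append(folder)
    return paths
```

Run it once with a list of disk labels and every destination folder is ready before you start feeding floppies into the drive.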

These are the detailed instructions I tried to find when I started my own data rescue project. I hope this helps you rescue files currently trapped on 5 1/4″ floppies. Please let me know if you have any questions about what I have posted here.

Update: Another great source of information is Archive Team’s wiki page on Rescuing Floppy Disks.

Career Update


I have some lovely news to share! In early July, I will join the Library and Archives of Development at the World Bank as an Electronic Records Archivist. This is a very exciting step for me. Since the completion of my MLS back in 2009, I have mostly focused on work related to metadata, taxonomies, search engine optimization (SEO) and web content management systems. With this new position, I will finally have the opportunity to put my focus on archival issues full time while still keeping my hands in technology and software.

I do have a request for all of you out there in the blogosphere: If you had to recommend a favorite book or journal article published in the past few years on the topic of electronic records, what would it be? Pointers to favorite reading lists are also very welcome.

SXSWi: You’re Dead, Your Data Isn’t: What Happens Now?

This five person panel at SXSW Interactive 2011 tackled a broad range of issues related to what happens to our online presence, assets, creations and identity after our death.

Presenters:

There was a lot to take in here. You can listen to the full audio of the session or watch a recording of the session’s live stream (the first few minutes of the stream lacks audio).

A quick and easy place to start is this lovely little video created as part of the promotion of Your Digital Afterlife – it gives a nice quick overview of the topic:

Also take a look at the Visual Map that was drawn by Ryan Robinson during the session – it is amazing! Rather than attempt to recap the entire session, I am going to just highlight the bits that most caught my attention:

Laws, Policies and Planning
Currently individuals are left reading the fine print and hunting for service-specific policies regarding access to digital content after the death of the original account holder. Oklahoma recently passed a law that permits estate executors to access the online accounts of the recently deceased – the first and only state in the US to have such a law. It was pointed out during the session that in all other states, leaving your passwords to your loved ones amounts to asking them to impersonate you after your death.

Facebook has an online form to report a deceased person’s account – but little indication of what this action will do to the account. Google’s policy for accessing a deceased person’s email requires six steps, including mailing paper documents to Mountain View, CA.

There is a working group forming to create model terms of service – you can add your name to the list of those interested in joining at the bottom of this page.

What Does Ownership Mean?
What is the status of an individual email or digital photo? Is it private property? I don’t recall who mentioned it – but I love the notion of a tribe or family unit owning digital content. It makes sense to me that the digital model parallel the real world. When my family buys a new music CD, our family owns it – not the individual who happened to go to the store that day. It makes sense that an MP3 purchased by any member of my family would belong to our family. I want to be able to buy a Kindle for my family and know that my son can inherit my collection of e-books the same way he can inherit the books on my bookcase.

Remembering Those Who Have Passed
How does the web change the way we mourn and memorialize people? Many have now had the experience of learning of the passing of a loved one online – the process of sorting through loss in the virtual town square of Facebook. How does our identity transform after we are gone? Who is entitled to tag us in a photo?

My family suffered a tragic loss in 2009 and my reaction was to create a website dedicated to preserving memories of my cousin. At the Casey Feldman Memories site, her friends and family can contribute memories about her. As the site evolved, we also added a section to preserve her writing (she was a journalism student) – I kept imagining the day when we realized that we could no longer access her published articles online. I built the site using Omeka and I know that we have control over all the stories and photos and articles stored within the database.

It will be interesting to watch as services such as Chronicle of Life spring up claiming to help you “Save your memories FOREVER!”. They carefully explain why they are a trustworthy digital repository and back up their claims with a money-back guarantee.

For as little as $10, you can preserve your life story or daily journal forever: It allows you to store 1,000 pages of text, enough for your complete autobiography. For the same amount, you could also preserve less text, but up to 10 of your most important photos. – Chronicle of Life Pricing

Privacy
There are also some interesting questions about privacy and the rights of those who have passed to keep their secrets. Facebook currently deletes some parts of a profile when it converts it to a ‘memorial’ profile. They state that this is for the privacy of the original account holder. If users are ultimately given more power over the disposition of their social web presence – should these same choices be respected by archivists? Or would these choices need to be respected the way any other private information is guarded until some distant time after which it would then be made available?

Conclusion
Thanks again to all the presenters – this really was one of the best sessions for me at SXSWi! I loved that it got a whole different community of people thinking about digital preservation from a personal point of view. You may also want to read about Digital Death Day – one coming up in May 2011 in the San Francisco Bay Area and another in September 2011 in the Netherlands.

Image credit: Excerpt from Ryan Robinson’s Visual Map created live during the SXSW session.

SXSW Interactive: Data and Revelations

I am typing on a laptop in the Samsung blogger lounge at SXSW. Given this easy opportunity to blog, I wanted to share the overarching theme of my experience so far (three days in) at SXSW Interactive. Data. It is all about data. APIs exposing data. People visualizing data. Using data to make business and policy decisions. Graphing data to keep track of web site and application performance. Privacy of data. Crowdsourcing data. Data about social media behavior. And on and on!

It has been a common thread I have traced from session to session, conversation to conversation. I expect someone with less of a database and metadata fixation might see something else as the overall meme, but I have a purse full of cards pointing me to new data sources and a notebook full of URLs to track down later to defend my view.

I keep catching myself giving mini-lessons on archives and preservation of electronic records like some sort of envoy from another universe. While I feel like a strong overall tech person at an archives conference, I feel like a data and visualization person here. This morning two of my sessions were in the same hotel that hosted SAA in Austin, and it was strange to be there with such a different group of people. I have managed to connect with an assortment of digital humanities folks. Someone even managed to find space for and plan an informal event for tomorrow night: Innovating and Developing with Libraries, Archives, and Museums.

My list of tech to learn (HTML5, NoSQL) and projects to contemplate and move forward (mostly ideas for visualizations using all the data everyone is sharing) is getting longer by the hour. It has been a process to figure out how to get the most I can out of SXSW. It is definitely more a space for inspiration than for deep diving into specifics. Letting go of the instinct that I am supposed to ‘learn new skills’ at a conference is fabulous!

Heading to Austin for SXSW Interactive

Anyone out there going to be at SXSWi? I would love to find like-minded DH (digital humanities) and GLAM (Galleries, Libraries, Archives & Museums) folks in Austin. If you can’t go, what do you wish I would attend and blog about after the fact?

No promises on thoroughness of my blogging of course. I never have mastered the ‘live blogging’ approach, but I do enjoy taking notes and if the past is any guide to the future I usually manage at least 2 really detailed posts on sessions from any one conference. The rest end up being notes to myself that I always mean to somehow go back to and post later. Maybe I need to spend a month just cleaning up and posting old session summaries (or at least those that still seem interesting and relevant!).

Drop me a comment below or contact me directly and let me know if you will be in Austin between March 10 and 15. Hope to see some of you there!

Creative Funding for Text-Mining and Visualization Project

The Hip-Hop Word Count project on Kickstarter.com caught my eye because it seems to be a really interesting new model for funding a digital humanities project. You can watch the video below – but the core of the project tackles assorted metadata from 40,000 rap songs from 1979 to the present, including stats about each song (word count, syllables, education level, etc.), individual words, artist location and date. This information aims to become a public online almanac fueled by visualizations.

I am a backer of this project, and you can be too. As of the original writing of this post, they are 47% funded with twenty-eight days until their deadline. For those of you not familiar with Kickstarter, people can post creative projects and provide rewards for their funders. The funding only goes through if they reach their goal within the time limit – otherwise nothing happens, a model they call ‘all-or-nothing funding’.

What will the money be spent on?

  • 45% for PHP programmers who have been coding the custom web interface
  • 35% for interface designers
  • 10% for data acquisition & data clean up
  • 10% for hosting bills

They aim for a five-month timeline to move from their existing functional prototype to something viable to release to the public.

I am also intrigued by ways that the work on this project might be leveraged in the future to support similar text-mining projects that tie in location and date. How about doing the same thing with civil war letters? How about mining the lyrics from Broadway musical songs?

If this all sounds interesting, take a look at the video below and read more on the Hip-Hop Word Count Kickstarter home page. If half the people who follow my RSS feed pitch in $10, this project would be funded. Take a look and consider pitching in. If this project doesn’t speak to you – take a look around Kickstarter for something else you might want to support.