
Year: 2007

The Archives and Archivists Listserv: hoping for a stay of execution

There has been a lot of discussion (both on the Archives & Archivists (A&A) Listserv and in blog posts) about the SAA’s recent decision to not preserve the A&A listserv posts from 1996 through 2006 when they are removed from the listserv’s old hosting location at Miami University of Ohio.

Most of the outcry against this decision has fallen into two camps:

  • Those who don’t understand how the SAA task force assigned to appraise the listserv archives could decide it does not have informational value – lots of discussion about how the listserv reflects the move of archivists into the digital age as well as its usefulness for students
  • Those who just wish it wouldn’t go away because they still use it to find old posts. Some mentioned that there are scholarly papers that reference posts in the listserv archives as their primary sources.

I added this suggestion on the listserv:

I would have thought that the Archives Listserv would be the ideal test case for developing a set of best practices for archiving an organization’s web-based listserv or bboard.

Perhaps a graduate student looking for an independent project could take this on? Even if they only got permission to work with posts from 2001 onward (after 2001, those who posted had to agree to ‘terms of participation’ that reduce issues with copyright and ownership), I suspect it would still be worthwhile.

I have always found that you can’t understand all the issues related to a technical project (like the preservation of a listserv) until you have a real life case to work on. Even if SAA doesn’t think we need to keep the data forever – here is the perfect set of data for archivists to experiment with. Any final set of best practices would be meant for archivists to use in the future – and would be all the easier to comprehend if they dealt with a listserv that many of them are already familiar with.

Another question: couldn’t the listserv posts still be considered ‘active records’? Many current listserv posters claim they still access the old list’s archives on a regular basis. I would be curious what the traffic for the site is. That is one nice side effect of this being on a website – it makes the usage of records quantifiable.
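As a rough illustration of how a website makes usage quantifiable, here is a minimal Python sketch that tallies successful requests per page from web server access-log lines in the widely used Common Log Format. The `/archives/` path prefix and the sample log lines are made up for illustration; the real host would have its own log format and URL layout, and the log would first have to be obtained from whoever runs the server.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "METHOD path PROTOCOL" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def count_hits(lines, prefix="/archives/"):
    """Count successful GET requests per path under a (hypothetical) archive prefix."""
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip malformed or non-CLF lines
        if (match["method"] == "GET" and match["status"] == "200"
                and match["path"].startswith(prefix)):
            hits[match["path"]] += 1
    return hits

# Made-up sample lines; the third is a failed request and is not counted.
sample = [
    '10.0.0.1 - - [05/Mar/2007:10:00:00 -0500] "GET /archives/week-2003-10.html HTTP/1.1" 200 5120',
    '10.0.0.2 - - [05/Mar/2007:10:05:00 -0500] "GET /archives/week-2003-10.html HTTP/1.1" 200 5120',
    '10.0.0.3 - - [05/Mar/2007:10:07:00 -0500] "GET /archives/week-1998-02.html HTTP/1.1" 404 210',
]
print(count_hits(sample).most_common())
```

Counting only successful GET requests is a deliberate simplification; real traffic analysis would also want to filter out search-engine crawlers before drawing conclusions about how often people consult the old posts.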

There are similar issues in the analog world when records people still want to use lose their physical home and are disposed of, but, as others have also pointed out, digital media is getting cheaper and smaller by the day. We are not talking about paying rent on a huge warehouse or a space that needs serious temperature and humidity control.

I was glad to see Rick Prelinger’s response on the current listserv that simply reads:

The Internet Archive is looking into this issue.

I had already checked when I posted my response to the listserv yesterday, having found my way to the old A&A listserv page in the Wayback Machine. For now all that is there is the list of links to each week’s worth of postings – nothing beyond that has been pulled in.

I have my fingers crossed that enough of the right people have become aware of the situation to pull the listserv back from the brink of the digital abyss.

NARA’s Electronic Records Archives in West Virginia

“WVU, NATIONAL ARCHIVES PARTNER” from http://wvutoday.wvu.edu/news/page/5419/

In a press release dated February 28, 2007, the National Archives and Records Administration of the United States (NARA) and West Virginia University (WVU) declared they had signed “a Memorandum of Understanding to establish a 10-year research and educational partnership in the study of electronic records and the promotion of civic awareness of the use of electronic records as educational resources.” It goes on to say that the two organizations “will engage in collaborative research and associated educational activities” including “research in the preservation and long-term access to complex electronic records and engineering design documentation.” WVU will receive “test collections” of electronic records from NARA to support their research and educational activities.

This sounded interesting. I stumbled across it on NARA’s website while looking for something else, and I found no blog chatter or discussion about what it means for electronic records research (thinking, of course, of the big Footnote.com announcement and all the back-and-forth discussion that inspired). So I went hunting for the actual Memorandum of Understanding. No sign of it. I did find WVU’s press release, which included the photo above. This next quote is in the press release as well:

The new partnership complements NARA’s establishment of the Electronic Records Archives Program operations at the U.S. Navy’s Allegany Ballistics Laboratory in Rocket Center near Keyser in Mineral County.

Googling Allegany Ballistics Laboratory got me information about how it is a Superfund site in the late or final stages of cleanup. It also led me to an article from Senator Byrd about how pleased he was, in October 2006, with a federal spending bill that included funds for projects at ABL – including a sentence mentioning how NARA “will use the Mineral County complex for its electronic records archive program.” There is no mention of this on the Electronic Records Archive (ERA) website or on its special press release page. I don’t see any information about NARA installations in West Virginia on their Locations webpage.

Then I found the WVU newspaper The Daily Athenaeum and an article titled “National Archives, WVU join forces” dated March 1, 2007. (If the link gives you trouble – just search on the Athenaeum site for NARA and it should come right up.) The following quote is from the article:

“This is a tremendous opportunity for WVU. The National Archives has no other agreements like this with anyone else,” said John Weete, WVU’s vice president for research and economic development.

The University will help the NARA develop the next generation of technologies for the Electronic Records Archives. WVU will also assist in the management of NARA’s tremendous amount of data, Weete said.

“This is a great opportunity for students. The Archives will look for students who are masters at handling records and who care about the documents (for future job opportunities),” said WVU President David Hardesty.

WVU students and faculty will hopefully soon have access to the Rocket Center archives, and faculty will be overseeing the maintenance of such records, Hardesty said.

Perhaps I am reading more into this than was intended, but I am confused. I was unable to find any information on the WVU website about an MLS or Archival Studies program there. I checked both the ALA’s LIS directory and the SAA’s Directory of Archival Education and confirmed that there are no MLS or archives degree programs in West Virginia. So where are the “students who are masters at handling records” going to come from? I work daily in the world of software development, and I can imagine computer scientists who are interested in electronic records and their preservation. But as I have discovered many times over during my archives coursework, there are a lot of important and unique ideas to learn in order to understand everything needed for the archival preservation of electronic records for “the life of the republic” (as NARA’s ERA project is so fond of saying).

I am pleased that WVU has made such a landmark agreement with NARA to further research into the preservation and educational use of electronic records. Unfortunately, I am also suspicious of this barely mentioned bit about the Rocket Center archives and ABL, and about how WVU is going to help NARA manage its data.

Has anyone else heard more about this?

Update (03/07/07):

Thanks to Donna in the comments for suggesting that WVU’s program is in ‘Public History’ (a term I had not thought to look under). This is definitely more reassuring.

WVU appears to offer both a Certificate in Cultural Resource Management and an M.A. in Public History – both described here on the Cultural Resource Management and Public History Requirements page.

The page listing History Department graduate courses included the two ‘public history’ courses listed below:

412 Introduction to Public History. 3 hr. Introduction to a wide range of career possibilities for historians in areas such as archives, historical societies, editing projects, museums, business, libraries, and historic preservation. Lectures, guest speakers, field trips, individual projects.

614 Internship in Public History. 6 hr. PR: HIST 212 and two intermediate public history courses. A professional internship at an agency involved in a relevant area of public history. Supervision will be exercised by both the Department of History and the host agency. Research report of finished professional project required.

Academy Awards: Archives Highlighted during the 60-second description of the Academy

Last night on the 79th Annual Academy Awards, Ellen DeGeneres claimed that she bet the Academy’s President Sid Ganis a dollar that he couldn’t explain everything that the Academy of Motion Picture Arts and Sciences does (beyond the Academy Awards) in under 60 seconds. Off Mr. Ganis went – talking at super speed and highlighting all the fabulous things the Academy does when it isn’t on TV giving out little statues. There in the middle was a beautiful cameo for the Margaret Herrick Library and the Academy Film Archives. It was all going so fast that it was hard to get more than a fleeting impression of shelves full of film canisters, movie posters and a beautiful research space.

It is nice to see archives and special collections such as these being featured realistically and enthusiastically in the middle of a show with such a wide reach to the general public.

Understanding Born Digital Records: Journalists and Archivists with Parallel Challenges

My most recent Archival Access class had a great guest speaker from the Journalism department. Professor Ira Chinoy is currently teaching a course on Computer-Assisted Reporting. In the first half of the session, he spoke about ways that archival records can fuel and support reporting. He encouraged the class to brainstorm about what might make archival records newsworthy. How do old records that have been stashed away for so long become news? It took a bit of time, but we got into the swing of it and came up with a decent list. He then went through his own list and gave examples of published news stories that fit each of the scenarios.

In the second half of class he moved on to address issues related to freedom of information and the struggle to gain access to born digital public records. Journalists are usually early in the food chain of those vying for access to and understanding of federal, state and local databases. They face many hurdles. They must learn what databases are being kept and figure out which ones are worth pursuing. Professor Chinoy relayed a number of stories about the energy and perseverance required to convince government officials to give access to the data they have collected. The rules vary from state to state (see the Maryland Public Information Act as an example), and journalists often must quote chapter and verse to prove that officials are breaking the law if they do not hand over the information. Some officials claim that the software they use will not even permit extraction of the data, or that there is no way to edit the records to remove confidential information. Some journalists find themselves hunting down the vendors of proprietary software to find out how to perform the extract they need. They then go back to the officials with that information in the hopes of proving that it can be done. I love this article linked to in Prof. Chinoy’s syllabus: The Top 38 Excuses Government Agencies Give for Not Being Able to Fulfill Your Data Request (And Suggestions on What You Should Say or Do).
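Whether a particular system can produce a redacted extract depends entirely on that system, but masking fields during an export is not inherently exotic. As a minimal sketch, assuming the data sits in a database the requester can query (here an in-memory SQLite database via Python's standard `sqlite3` and `csv` modules), the hypothetical `export_redacted` helper below writes a table to CSV with chosen columns replaced by a placeholder. The `permits` table, its columns and its rows are invented for illustration.

```python
import csv
import io
import sqlite3

def export_redacted(conn, table, redact_columns, mask="[REDACTED]"):
    """Export every row of `table` as CSV text, masking the named columns."""
    # Table names cannot be bound as SQL parameters, so only accept names
    # that actually exist in the schema before interpolating them.
    known = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in known:
        raise ValueError(f"unknown table: {table}")
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    headers = [col[0] for col in cursor.description]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(headers)
    for row in cursor:
        # Replace confidential values; keep everything else as stored.
        writer.writerow(mask if name in redact_columns else value
                        for name, value in zip(headers, row))
    return out.getvalue()

# Hypothetical agency table, built in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permits (id INTEGER PRIMARY KEY, applicant TEXT, ssn TEXT, issued TEXT)")
conn.executemany("INSERT INTO permits (applicant, ssn, issued) VALUES (?, ?, ?)", [
    ("A. Smith", "123-45-6789", "2006-11-02"),
    ("B. Jones", "987-65-4321", "2007-01-15"),
])
print(export_redacted(conn, "permits", {"applicant", "ssn"}))
```

Proprietary packages usually hide their storage behind their own screens, which is exactly why journalists end up asking vendors how to do the equivalent of this.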

After all that work – just getting your hands on the magic file of data is not enough. The data is of no use without the decoder ring of documentation and context.

I spent most of the 1990s designing and building custom databases, many for federal government agencies. There is an almost inconceivable number of person-hours that go into the creation of most of these systems. Stakeholders from all over the organization destined to use the system participate in meetings and design reviews. Huge design documents are created and frequently updated … and adjustments to the logic are often made even after the system goes live (to fix bugs or add enhancements). The systems I am describing are built using complex relational databases with hundreds of tables. It is uncommon for any one person to really understand everything in them – even if they are on the IT team for the full development life cycle.
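Some of that structure can at least be captured mechanically. As a minimal sketch, assuming a SQLite database (other database engines expose similar catalog information in their own ways), the code below uses SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list` to build a bare-bones data dictionary of columns and table relationships. The `agencies` and `cases` tables are hypothetical.

```python
import sqlite3

def describe_schema(conn):
    """Return a simple data dictionary: columns and foreign keys for each table."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    dictionary = {}
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(col[1], col[2])
                   for col in conn.execute(f'PRAGMA table_info("{table}")')]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        foreign_keys = [(fk[3], f"{fk[2]}.{fk[4]}")
                        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")')]
        dictionary[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return dictionary

# Hypothetical two-table schema for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agencies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cases (
    id INTEGER PRIMARY KEY,
    agency_id INTEGER REFERENCES agencies(id),
    status_code TEXT  -- a coded value whose meaning lives outside the database
);
""")
for table, info in describe_schema(conn).items():
    print(table, info["columns"], info["foreign_keys"])
```

A dump like this records which table points at which, but it says nothing about what a `status_code` of, say, "P" meant to the people entering it; that business context is the part no catalog query can recover.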

Sometimes you get lucky and the project includes people with amazing technical writing skills, but usually those talented people are aimed at writing documentation for users of the system. Those documents may or may not explain the business processes and context related to the data. They will rarely expose the relationship between a user’s actions on a screen and the data as it is stored in the underlying tables. Some decisions are only documented in the application code itself and that is not likely to be preserved along with the data.

Teams charged with the support of these systems and their users often create their own documents and databases to explain certain confusing aspects of the system and to track bugs and their fixes. A good analogy here would be to the internal files that archivists often maintain about a collection – the notes that are not shared with the researchers but instead help the archivists who work with the collection remember such things as where frequently requested documents are or what restrictions must be applied to certain documents.

So where does that leave those who are playing detective to understand the records in these systems? Trying to figure out what the data in the tables mean based on the understanding of end-users can be a fool’s errand – and that is if you even have access to actual users of the system in the first place. I don’t think there is any easy answer given the realities of how many unique systems of managing data are being used throughout the public sector.

Archivists often find themselves struggling with the same problems. They have to fight to acquire and then understand the records being stored in databases. I suspect they have even less chance of interacting with actual users of the original system that created the records – though I recall discussions in my appraisal class last term about all the benefits of working with the producers of records long before they are earmarked to head to the archives. Unfortunately, it appeared that this was often the exception rather than the rule – even if it is the preferred scenario.

The overly ambitious and optimistic part of me had the idea that what ‘we’ really need is a database that lists common commercial off-the-shelf (COTS) packages used by public agencies, along with information on how to extract and redact data from these packages. For agencies using custom systems, we could include information on which company or contractors did the work; that sort of thing can only help later. Or how about just a list of which agencies use what software? Does something like this exist? The records of what technology is purchased are public records, right? Definitely an interesting idea (for when I have all that spare time I dream about). I wonder whether people would share what they already know if I set up a wiki for them to populate with this information.
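The registry imagined above could start very small. Purely as a hypothetical design sketch (not an existing resource), here it is as three SQLite tables: agencies, software packages (COTS or custom, with notes on extraction and redaction), and a link table recording which agency uses which package, plus a query answering "which agencies use this software?". Every agency name, package name and vendor below is invented.

```python
import sqlite3

SCHEMA = """
CREATE TABLE agency (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE package (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    vendor        TEXT,          -- COTS vendor, or the contractor for a custom system
    is_custom     INTEGER NOT NULL DEFAULT 0,
    extract_notes TEXT,          -- how to export data from this package
    redact_notes  TEXT           -- how to strip confidential fields
);
CREATE TABLE agency_package (    -- which agency uses which package
    agency_id  INTEGER NOT NULL REFERENCES agency(id),
    package_id INTEGER NOT NULL REFERENCES package(id),
    PRIMARY KEY (agency_id, package_id)
);
"""

def agencies_using(conn, package_name):
    """List agencies recorded as using the named package."""
    rows = conn.execute(
        """SELECT a.name FROM agency a
           JOIN agency_package ap ON ap.agency_id = a.id
           JOIN package p ON p.id = ap.package_id
           WHERE p.name = ? ORDER BY a.name""",
        (package_name,),
    )
    return [name for (name,) in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Entirely made-up entries, for illustration only.
conn.executemany("INSERT INTO agency (id, name) VALUES (?, ?)",
                 [(1, "Example County Clerk"), (2, "Example State Licensing Board")])
conn.execute("INSERT INTO package (id, name, vendor, extract_notes) "
             "VALUES (1, 'RecordKeeper', 'Hypothetical Vendor Inc.', 'Use the CSV export screen')")
conn.executemany("INSERT INTO agency_package VALUES (?, ?)", [(1, 1), (2, 1)])
print(agencies_using(conn, "RecordKeeper"))
```

Keeping the agency-to-package link in its own table means one set of extraction notes can serve every agency that runs the same package, which is the main payoff of pooling what people already know.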

I would like to imagine a future world in which all this stuff is online and you can log in and download any public record you like at any time. You can get a taste of where we are on the path to achieving this dream on the archives side of things by exploring a single series of electronic records published on the US National Archives site. For example, look at the search screen for World War II Army Enlistment Records. It includes links to sample data, record group info and an FAQ. Once you make it to viewing a record, every field includes a link to explain the value. But even this extensive detail would not be enough for someone to just pick up these records and understand them – you still need to understand World War II and Army enlistment. You still need the context of the events, and this is where the FAQ comes in. Look at the information they provide – and then take a moment to imagine what it would take for a journalist to recreate a similar level of detailed information for new database records being created in a public agency today (especially when those records are guarded by officials who are leery about permitting access to the records in the first place).

This isn’t a new problem that has appeared with born digital records. Archivists and journalists have always sought the context of the information with which they are working. The new challenge is in the added obstacles that a cryptic database system can add on top of the already existing challenges of decrypting the meaning of the records.

Archivists and journalists care about a lot of the same issues related to born digital records. How do we acquire the records people will care about? How do we understand what they mean in the context of why and how they were created? How do we enable access to the information? Where do we get the resources, time and information to support important work like this?

It is interesting for me to find a new angle from which to examine rapid software development. I have spent so much of my time creating software based on the needs of a specific user community. Usually those who are paying for the software get to call the shots on the features that will be included. Certain industries do have detailed regulations designed to promote access by external observers (I am thinking of applications related to medical/pharmaceutical research and perhaps HAZMAT data), but they are definitely exceptions.

Many people are worrying about how we will make sure that the medium upon which we record our born digital records remains viable. I know that others are pondering how to make sure we have software that can actually read the data such that it isn’t just mysterious 1s and 0s. What I am addressing here is another aspect of preservation – the preservation of context. I know this too is being worried about by others, but while I suspect we can eventually come up with best practices for the IT folks to follow to ensure we can still access the data itself – it will ultimately be up to the many individuals carrying on their daily business in offices around the world to ensure that we can understand the information in the records. I suppose that isn’t new either – just another reason for journalists and archivists to make their voices heard while the people who can explain the relationships between the born digital records and the business processes that created them are still around to answer questions.

Should we be archiving fonts?

I am a fan of beautiful fonts. This is why I find myself on the mailing list of MyFonts.com. I recently received their Winter 2007 newsletter featuring the short article titled ‘A cast-iron investment’. It starts out with:

Of all the wonderful things about fonts, there’s one that is rarely mentioned by us font sellers. It’s this: fonts last for a very long time. Unlike almost all the other software you may have bought 10 or 15 years ago, any fonts you bought are likely still working well, waiting to be called back into action when you load up that old newsletter or greetings card you made!

Interesting. The article goes on to point out:

But, of course, foundries make updates to their fonts every now and then, with both bug fixes and major upgrades in features and language coverage.

All this leaves me wondering if there is a place in the world for a digital font archive. A single source of digital font files for use by archives around the world. Of course, there would be a number of hurdles:

  1. How do you make sure that the fonts are only available for use in documents that used the fonts legally?
  2. How do you make sure that the right version of the font is used in the document to show us how the document appeared originally?
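Part of hurdle 2 is simply knowing whether two copies of a font file are the same version. Archives commonly track fixity with cryptographic checksums, so as a minimal sketch of that idea applied to fonts, the Python code below builds a SHA-256 manifest for the `.ttf`/`.otf` files in a directory and checks a file against it. The directory, the file name `ExampleSans.otf` and its contents are hypothetical stand-ins.

```python
import hashlib
import tempfile
from pathlib import Path

FONT_SUFFIXES = {".ttf", ".otf"}

def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(font_dir):
    """Map each .ttf/.otf file name in a directory to its SHA-256 checksum."""
    return {p.name: sha256_of(p)
            for p in sorted(Path(font_dir).iterdir())
            if p.suffix.lower() in FONT_SUFFIXES}

def matches_manifest(path, manifest):
    """True only if the file is byte-for-byte the version recorded in the manifest."""
    expected = manifest.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected

# Demonstration with a throwaway directory and a hypothetical font file.
with tempfile.TemporaryDirectory() as tmp:
    font = Path(tmp) / "ExampleSans.otf"
    font.write_bytes(b"pretend font data, version 1.0")
    manifest = build_manifest(tmp)
    before = matches_manifest(font, manifest)
    font.write_bytes(b"pretend font data, version 1.1")  # simulate a foundry update
    after = matches_manifest(font, manifest)

print(before, after)
```

A matching checksum only proves that two files are identical; it cannot tell you which version a given document was originally rendered with. That link would still have to be recorded when the document itself is archived.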

You could say this is made moot by using something like Adobe’s PDF/A format. It is also likely that we won’t be running the original word processing program that used the fonts a hundred years from now.

Hurdles aside, somehow it feels like a clever thing to do. We can’t know how we might enable access to documents that use fonts in the future. What we can do is keep the font files so we have the option to do clever things with them in the future.

I would even argue that fonts are precious in their own right and deserve to be preserved. My mother spent many years as a graphic designer. From her I inherited a number of type specimen books – including one labeled “Adcraft Typographers, Inc”. Google led me to two archival collections that include font samples from Adcraft:

Another great reason for a digital font archive is the surge in individual foundries creating new fonts every day. What once was an elite craft now has such a low point of entry that anyone can download some software and hang out their shingle as a font foundry. Take a look around MyFonts.com. Read about selling your fonts on MyFonts.com.

While looking for a good page about type foundries, I discovered the site for Precision Type, which shows this on its only remaining page:

For the last 12 years, Precision Type has sought to provide our customers with convenient access to a large and diverse range of font software products. Our business grew as a result of the immense impact that digital technology had in the field of type design. At no other time in history had type ever been available from so many different sources. Precision Type was truly proud to play a part in this exciting evolution.

Unfortunately however, sales of font software for Precision Type and many others companies in the font business have been adversely affected in recent years by a growing supply of free font software via the Internet. As a result, we have decided to discontinue our Precision Type business so that we can focus on other business opportunities.

I have to go back to May 23, 2004 in the Internet Archive Wayback Machine to see what Precision Type’s site used to look like.

There are more fonts than ever before. Amateurs are driving professionals out of business. Definitely sounds like digital fonts and their history are a worthy target for archival preservation.

Spring 2007: Access and Information Visualization

I don’t often post explicitly about my experiences as a graduate student – but I want to let everyone know about the focus of my studies for the next four months. I am taking two courses that I hope will complement one another. One course is on Archival Access (description, MARC, DACS, EAD and theory). The other is on Information Visualization over in the Computer Science department.

My original hope was that in my big Information Visualization final project I might get the opportunity to work with some aspect of archives and/or digital records. I want to understand how to improve access and understanding of the rich resources in the structured digital records repositories in archives around the world. What has already happened just one week into the term is that I find myself cycling through multiple points of view as I do my readings.

How can we support interaction with archival records by taking advantage of the latest information visualization techniques and tools? We can make it easier to understand what records are in a repository – both analog and digital records. I have been imagining interactive visual representations of archives collections, time periods, areas of interest and so forth. When you visit an archives’ website – it can often be so hard to get your head around the materials they offer. I suspect that this is often the case even when you are standing in the same building as the collections. In my course on appraisal last term we talked a lot about examining the collections that were already present on the path to creating a collecting policy. I am optimistic about ways that visualizing this information could improve everyone’s understanding of what an archives contains, for archivists and researchers alike.

Once I get myself to stop those daydreams… I move on to the next set of daydreams. What about the products of these visual analytics tools? How do we capture interactive visualizations in archives? This seems like a greater challenge than the average static digital record (as if there really is such an animal as an ‘average’ digital record). I can see a future in which major government and business decisions are made based on the interpretation of such interactive data models, graphs and charts. Instead of needing just the ‘records’ – don’t we need a way to recreate the experience that the original user had when interacting with the records?

This (unsurprisingly) takes me back to the struggle of how to define exactly what a record is in the digital world. Is the record a still image of a final visualization? Can this actually capture the full impact of an interactive and possibly 3D visualization? With information visualization being such a rich and dynamic field, I feel that there is a good chance that the race to create new methods and tools will zoom far ahead of plans to preserve its products.
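One hedged way to think about recreating the experience is to keep not just a still image but a log of the interactions that produced it. The Python sketch below is purely hypothetical: the `ViewState` fields (zoom, center, filters) and the event types are invented for illustration and are not taken from any real visualization tool. It replays a JSON event log to rebuild the view a user ended up with.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class ViewState:
    """Hypothetical state of an interactive chart: zoom level, center, active filters."""
    zoom: float = 1.0
    center: tuple = (0.0, 0.0)
    filters: list = field(default_factory=list)

def apply_event(state, event):
    """Apply one recorded interaction event to a view state."""
    kind = event["type"]
    if kind == "zoom":
        state.zoom *= event["factor"]
    elif kind == "pan":
        x, y = state.center
        state.center = (x + event["dx"], y + event["dy"])
    elif kind == "filter":
        state.filters.append(event["expr"])
    else:
        raise ValueError(f"unknown event type: {kind}")
    return state

def replay(event_log_json):
    """Rebuild the final view from a JSON event log, starting from the default view."""
    state = ViewState()
    for event in json.loads(event_log_json):
        apply_event(state, event)
    return state

session = [
    {"type": "filter", "expr": "year >= 1990"},
    {"type": "zoom", "factor": 2.0},
    {"type": "pan", "dx": 10.0, "dy": -5.0},
]
log = json.dumps(session)  # what an archive might keep alongside a screenshot
final = replay(log)
print(asdict(final))
```

The catch is that a log like this is only meaningful while software that understands the events survives, so preserving the interaction history just moves part of the problem back to preserving the tool.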

I think some of my class readings will take extra effort (and extra time) as my mind cycles through these ideas. I think that a lot of this will come out in my posts over the next four months. And I still have strong hopes for rallying a team in my InfoViz class to work on an archives related project.

Book Review: Past Time, Past Place: GIS for History

Past Time, Past Place: GIS for History consists mainly of 11 case studies of geographic information systems being applied to the study of history. It includes a nice sprinkling of full color maps and images and a 20-page glossary of GIS terms. Each case study includes a list of articles and other resources for further reading.

The book begins with an introduction by the editor, Anne Kelly Knowles. This chapter explains the basics of using GIS to study history, as well as giving an overview of how the book is organized.

The meat of the book is the case studies, covering the following topics:

I suspect that different audiences will take very different ideas away from this book. I was looking for information about GIS and historical records (this is another book found during my mad hunt for information on the appraisal and preservation of GIS records) and found a bit of related information to add to my research. I think this book will be of interest to those who fall into any of the following categories:

  • Archivists curious about how GIS might enhance access to and understanding of the records under their care
  • Historians interested in understanding how GIS can be used to approach historical research in new ways
  • History buffs who love reading a good story (complete with pictures)
  • Map aficionados curious about new and different kinds of information that can be portrayed with GIS

I especially loved the maps and other images. I am a bit particular when it comes to the quality of graphics – but this book comes through with bright colors and clear images. The unusual square book format (measuring 9″x9″) gave those who arranged the layout lots of room to work – and they took full advantage of the space.

Whether you plan to read the case studies for the history being brought to life or are looking for “how-tos” as you tackle your own GIS-History project, this book deserves some attention.

Footnote.com and US National Archives records

Thanks to Digitization 101’s recent post “Footnote launches and announces partnership with National Archives”, I was made aware of the big news about the digitization of the US National Archives’ records. Footnote.com has gone live with the first of apparently many planned installments of digitized NARA records. My first instinct was one of suspicion. In the shadow of recent alarm among historians about the Smithsonian/Showtime deal, I think it’s valid to be concerned about new agreements between government agencies and private companies.

That said, I am feeling much more positive based on the passage below from the January 10th National Archives Press Release about the agreement with Footnote (emphasis mine):

This non-exclusive agreement, beginning with the sizeable collection of materials currently on microfilm, will enable researchers and the general public to access millions of newly-digitized images of the National Archives historic records on a subscription basis from the Footnote web site. By February 6, the digitized materials will also be available at no charge in National Archives research rooms in Washington D.C. and regional facilities across the country. After an interval of five years, all images digitized through this agreement will be available at no charge through the National Archives web site.

This sounds like a win-win situation. NARA gets millions of records digitized (4.5 million and counting according to the press release). These records will be highlighted on the Footnote web site. They will have the advantages of Footnote’s search and browse interfaces (which I plan to review in depth in the next week).

When signing up for my free account – I actually read through the entire Footnote Terms of Service including this passage (within the section labeled ‘Our Intellectual Property Rights’ – again, emphasis mine):

Content on the Website is provided to you AS IS for your information and personal use only as permitted through the functionality of the Website and may not be used, copied, reproduced, distributed, transmitted, broadcast, displayed, sold, licensed, or otherwise exploited for any other purposes whatsoever without the prior written consent of the respective owners. Footnote.com reserves all rights not expressly granted in and to the Website and the Content. You agree not to engage in the use, copying, or distribution of any of the Content other than expressly permitted herein, including any use, copying, or distribution of User Submissions of third parties obtained through the Website for any commercial purposes. If you download or print a copy of the Content for personal use, you must retain all copyright and other proprietary notices contained therein.

These terms are certainly no different from those under which most archives operate, but they did give me a moment of wondering how many extra hoops one would need to jump through to use any of the NARA records found in Footnote for a major project like a book. A quick experiment with the Pennsylvania Archives (which are available for free with registration) did not show me any copyright information or notices related to rights. I downloaded an image to see what ‘copyright and other proprietary notices’ I might find and found none.

In his post “The Flawed Agreement between the National Archives and Footnote, Inc.”, Dan Cohen expresses his views of the agreement. I had been curious about what percentage of the records being digitized were out of copyright – Dan says they all are. If all of the records are out of copyright, exactly what rights is Footnote.com reserving (in the passage from the terms of service shown above)? I also agree with him in his frustration about the age restriction in place for using Footnote.com (you have to be over 18).

My final opinion about the agreement itself will depend on answers to a few more questions:

1) Were any of the records recently made available on Footnote.com already digitized and available via the archives.gov website?

2) What percentage of the records that were digitized by Footnote would have been digitized by NARA without this agreement?

3) What roadblocks will truly be set in place for those interested in using records found on Footnote.com?

4) What interface will be available to those accessing the records for free in “National Archives research rooms in Washington D.C. and regional facilities across the country” (from the press release above)? Will it be the Footnote.com website interface or via NARA’s own Archival Research Catalog (ARC) or Access to Archival Databases (AAD)?

If the records that Footnote has digitized and made available on Footnote.com would not otherwise have been digitized over the course of the next five years (a big if) then I think this is an interesting solution. Even the full $100 fee for a year subscription is much more reasonable than many other research databases out there (and certainly cheaper than even a single night hotel room within striking distance of National Archives II).

As I mentioned above, I plan to post a review of the Footnote.com search and browse interfaces in the next week. The Footnote.com support folks have given me permission to include screen shots – so if this topic is of interest to you, keep an eye out for it.