Supporting Appraisal of Digital Records

In his recent post to the A+A Listserv, Richard Pearce-Moses explores some really interesting ideas related to the appraisal of a listserv. The notions that particularly caught my imagination were in these passages:

We could take advantage of the fact that the list is in electronic format and conceivably use some AI filters to do some weeding. But at what cost? Is this truly feasible? And what are the implications on the integrity of the collection if only a portion are saved?

I was particularly interested in the number of people who said they searched the lists’ archives. Although demonstrated use can be used to justify preservation, what is sufficient use and how do we measure it? Are there use patterns that suggest these messages are inactive, with use falling off over time in a pattern that suggests they not be kept permanently? (To my knowledge the server logs are not accessible.)

What sort of infrastructure could archivists work toward putting in place to support automated weeding of listserv postings? If the postings were not sent via email but rather posted via some other interface, I can imagine a choice being presented at the time the post was written: ‘Keep’ vs ‘Discard After 6 months’. There is something like this already in place for some government email systems – the sender indicates if the message is ‘permanent’ when the message is sent. Of course that presents a whole series of new problems. Someone in one of my classes mentioned that U.S. White House staffers had taken to marking EVERYTHING as permanent because emails that were marked not permanent were being scrutinized NOW. I wish I could find a source for this story online – but all I am finding today is the latest hubbub about White House staffers using non-government email accounts to communicate when they didn’t want to worry about their messages being preserved (or at least that seems to be the current allegation). Luckily for this discussion we aren’t worried about people hiding posts.

Of course some of this can be implemented via those who post to the list. If everyone (anyone?) used standard post title prefixes, the appraising archivist of the future could easily filter out entire subsets of posts. SAA has this posted on the Terms of Participation Page for the Archives and Archivists List:

In order to maintain a highly informative and focused professional forum, SAA strongly encourages list participants to use the following labels at the beginning of all subject lines. This will allow others to filter list messages via mail rules and automatically select those types of information according to their individual needs and preferences.

“Calls:” (Calls for papers, survey participation, etc.)

“Disc:” (Discussion on various topics)

“Event:” (Conference, seminar, workshop announcements, etc.)

“FF:” (“Friday funnies,” see below)

“FYI:” (General announcements and information)

“Job:” (Job announcements)

“Media:” (Links to archives and archivists in the news)

“Qs:” (Questions)

“Pubs:” (Announcements re: books, chapters, papers, dissertations, and reviews)

There is a link to this page at the bottom of EVERY post to the listserv, but I have rarely seen anyone use these prefixes in post titles. ‘Job’ is the one that gets the most use, and usually only for a short time after someone politely asks everyone creating job posts to make sure they have good titles.
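To make the filtering idea concrete, here is a minimal sketch of how an appraising archivist might sort subject lines by the SAA prefixes quoted above. The function names (`classify_subject`, `group_by_label`) are my own, and the handling of leading ‘Re:’ and ‘Fwd:’ markers is an assumption about how replies would show up in the archive, not something the Terms of Participation specify.

```python
import re
from collections import defaultdict

# Prefixes taken from the SAA Terms of Participation list quoted above.
PREFIXES = ("Calls", "Disc", "Event", "FF", "FYI", "Job", "Media", "Qs", "Pubs")

# Assumption: replies and forwards may carry "Re:" / "Fwd:" before the label.
_LABEL = re.compile(
    r"^\s*(?:(?:re|fwd?)\s*:\s*)*(" + "|".join(PREFIXES) + r")\s*:",
    re.IGNORECASE,
)


def classify_subject(subject):
    """Return the SAA label for a subject line, or 'unlabeled' if none is present."""
    match = _LABEL.match(subject)
    if match is None:
        return "unlabeled"
    found = match.group(1).lower()
    # Map back to the canonical spelling used in the SAA list.
    for prefix in PREFIXES:
        if prefix.lower() == found:
            return prefix
    return "unlabeled"


def group_by_label(subjects):
    """Group subject lines by label so whole subsets can be reviewed together."""
    groups = defaultdict(list)
    for subject in subjects:
        groups[classify_subject(subject)].append(subject)
    return dict(groups)
```

Given how rarely the prefixes are used, the ‘unlabeled’ group would likely be the largest by far, which is exactly the problem: the filter is only as good as the posting habits behind it.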

The idea of examining ‘usage’ patterns is also an interesting one. If we could easily capture and examine the view and search logs of posts we could build an understanding of what types of posts really are re-examined over time. But then what do we do with that information? Does past interest in a topic translate into permanent informational value? Just because no one has looked at a post again yet – does that mean we assume that no one will ever be interested in its content?
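If such logs were available (and the quoted post notes that they are not), one simple way to look for use ‘falling off over time’ would be to count accesses by the age of the post at the moment it was viewed. The sketch below assumes a hypothetical log that has already been reduced to a list of access dates for a single post; `accesses_by_age` is an illustrative name, not part of any real listserv software.

```python
from collections import Counter
from datetime import date


def accesses_by_age(posted_on, access_dates):
    """Count accesses by the number of whole years elapsed since the post date."""
    counts = Counter()
    for accessed in access_dates:
        if accessed < posted_on:
            continue  # ignore log entries that predate the post
        # Subtract one year if the anniversary has not yet been reached.
        before_anniversary = (accessed.month, accessed.day) < (posted_on.month, posted_on.day)
        years = accessed.year - posted_on.year - int(before_anniversary)
        counts[years] += 1
    return dict(sorted(counts.items()))
```

A result that shrinks from year to year would describe the pattern, but it would not answer the appraisal question of how much use is enough.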

My instinct (when wearing my techie hat) is to vote for the ‘keep it all – disk space is cheap’ approach. That said, I know that the expense of the space on that first hard drive you save your records on is just the tip of the iceberg in terms of digital preservation expenses.

Thinking about what you want to keep before you turn on any software system is always going to make things easier. I know that as the laws in the US continue to evolve to demand the retention of specific types of data, the software will also continue to evolve to make it easy to keep ONLY what needs to be kept. Private sector companies are usually quite intent on sticking to the letter of the law in that regard – they never want to keep more than they must (or so their lawyers like to tell them). It is also in the best interest of the software companies to ensure that all the required records are being kept.

Another driving force to generate systems that know how to filter and keep the ‘right’ records (whatever that means) could be individual users. In a universe of digital cameras where you can take 1000 photos as cheaply as 100 – I wonder if there is a place for software that intelligently archives your most frequently accessed (and tagged and shared) photos. The flip side of that could be auto-weeding (perhaps with a quick review option) every year. This would be the same approach some take to cleaning out their clothes closets – if I haven’t touched it in 2 years, then I should get rid of it.
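The closet rule is easy to sketch. The example below assumes the software already knows when each item was last accessed (here a plain dictionary of hypothetical file names and dates) and treats two years as 730 days. It only returns a list for review; it does not delete anything, in keeping with the ‘quick review option’.

```python
from datetime import date, timedelta


def weeding_candidates(last_accessed, today, max_idle_days=730):
    """Return, in sorted order, the names of items untouched for longer than max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    # Strictly older than the cutoff: an item last touched on the cutoff date is kept.
    return sorted(name for name, seen in last_accessed.items() if seen < cutoff)
```

Whether last access is the right signal is a separate question; a photo nobody has opened in two years may still be the one worth keeping.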

While doing my research last term into the appraisal of Digital GIS records, I was amazed by how much of what was currently being done could only be accomplished by brute force. Frequently the work is being done through the sheer will of a small group of very dedicated people using tools not particularly suited to the task. I need to do more research into the realm of electronic record management – I want to understand what standard tools are being supplied (or not supplied). Are there tools for those who manage large repositories of electronic records where there is an acknowledged goal of supporting records scheduling and permanent preservation?

In our increasingly digital world, I think there will always be cases of born-digital records that must be considered for appraisal without all the answers to all our questions. I am just fascinated by the notion of building the tools we need into the software systems from the start. At the end of a record’s active life cycle we would then be able to make and implement appraisal choices more easily. Imagine that – planning ahead for appraisal!