
Controversial Photos, Archivists’ Choices and Journalism

New York Times Magazine Cover: January 1995

The New York Times Magazine published The Great Ivy League Nude Posture Photo Scandal in January of 1995. Still available online, it is a fascinating tale that took reporter Ron Rosenbaum on a wild hunt through multiple archives in a quest for long-lost photographs. I spotted a link to the article in a post on Boing Boing – and once I started reading it I couldn’t stop.

The story includes thorough coverage of the research (and the footwork and the paperwork) it took to find the final resting place of some very controversial photographs. Taken as part of the orientation process for new students at Ivy League and Seven Sisters campuses, predominantly during the 1940s, ’50s and ’60s, these photos were ostensibly taken to screen for students who needed remedial posture classes. William Herbert Sheldon was a driving force behind many of the photos. Best known for sorting people into three categories of body type in the 1940s, Sheldon based his categories of endomorphic, mesomorphic and ectomorphic on measurements taken from the student photographs. Rosenbaum’s quest was to find the real story behind the photos and to discover whether any of them had survived the purging fires that occurred at many of the schools involved.

His first stop was Harvard’s archives:

Harley P. Holden, curator of Harvard’s archives, said that from the 1880’s to the 1940’s the university had its own posture-photo program in which some 3,500 pictures of its students were taken. Most were destroyed 15 or 20 years ago “for privacy scruples,” Holden said. Nonetheless, quite a few Harvard nudes can be found illustrating Sheldon’s book on body types, the Atlas of Men. Radcliffe took posture photos from 1931 to 1961; the curator there said that most of them had been destroyed (although some might be missing) and that none were taken by Sheldon.

A major turning point in Sheldon’s project came in 1950. He went to the University of Washington to further his plans to make an Atlas of Women. The families of a few photographed female students at the university questioned the real purpose of the photographs. The resulting upheaval culminated in the destruction of many photographs. A Time article dated September 25, 1950, Revolt at Washington, documents the events in Washington and notes that over 800 photos were burned.

Rosenbaum’s article goes on to mention that thousands of photos were subsequently burned at Harvard, Vassar and Yale in the 1960s and 1970s – but he continued to hunt for the ones that some believed had escaped into Sheldon’s private archives. A chain of contacts led Rosenbaum to Sheldon’s former associate Roland D. Elderkin. An elderly gentleman of 84 at the time of the story’s publication, Elderkin spent years assisting Sheldon and took many of the photographs. After being turned down by many archives, he found Sheldon’s records, photos and negatives a home in the National Anthropological Archives.

In 1987, the curators of the National Anthropological Archives acquired the remains of Sheldon’s life work, which were gathering dust in “dead storage” in a Goodwill warehouse in Boston. While there were solid archival reasons for making the acquisition, the curators are clearly aware that they harbor some potentially explosive material in their storage rooms. And they did not make it easy for me to gain access.

On my first visit, I was informed by a good-natured but wary supervisor that the restrictive grant of Sheldon’s materials by his estate would permit me to review only the written materials in the Sheldon archives. The actual photographs, he said, were off-limits. To see them, I would have to petition the chief of archivists. Determined to pursue the matter to the bitter end, I began the process of applying for permission.

In their online guide to collections I found the entry for SHELDON, WILLIAM HERBERT (1898-1977), Papers. It notes that the collection is 150 linear feet. It also includes a line that reads “RESTRICTION: The photographic material is not available for research.”

While Rosenbaum’s hunt was for the photographs, some of his most interesting discoveries came from the papers themselves. During his three-month wait for permission to view the photos, he reviewed boxes of letters and notes. See Rosenbaum’s article for details, but it was Sheldon’s own words in those papers that revealed he held racist views and that he seemed more concerned with his research than with its psychological impact on the girls whose photos he arranged to take.

When Rosenbaum was finally given the opportunity to review some 20,000 negatives (no prints and no names), we read:

A curator trundled in a library cart from the storage facility. Teetering on top of the cart were stacks of big, gray cardboard boxes. The curator handed me a pair of the white cotton gloves that researchers must use to handle archival material.

I love it – gray cardboard boxes and white cotton gloves. He even mentions the finding aids and gives examples of how the groups of photos are described. I also appreciate the earlier acknowledgment of the “solid archival reasons for making the acquisition”.

Rosenbaum looked through a lot of the negatives, mostly to verify that what the finding aids claimed was present was in fact in those gray boxes. He was struck by the contrast between the expressions on the men’s and women’s faces.

For the most part, the men looked diffident, oblivious. That’s not surprising considering that men of that era were accustomed to undressing for draft physicals and athletic-squad weigh-ins. But the faces of the women were another story. I was surprised at how many looked deeply unhappy, as if pained at being subjected to this procedure. On the faces of quite a few I saw what looked like grimaces, reflecting pronounced discomfort, perhaps even anger. I was not much more comfortable myself sitting there in the midst of stacks of boxes of such images. There I was at the end of my quest. I’d tracked down the fabled photographs, but the lessons of the posture-photo ritual were elusive.

He found the missing photos – but no easy answers. This is a great combination of a compelling story and a realistic representation of archives and archivists. The records don’t always hold the answers to the question you thought you were asking – but sometimes they hold secrets you hadn’t expected.

So many elements tie back to the choices made by individual archivists – sometimes made in the heat of the moment or under great community pressure. I think this story is a particularly poignant example of the downstream effects of these sorts of hard choices. It isn’t often that we can see cause and effect this clearly.

What would you have done? Would you have burned the photos or stored them away? Would you have stepped forward to take Sheldon’s records? If something like this happened today – what do you think the future of these photos might be?

Phoenix DVD destined for Mars

Hubble's Sharpest View Of Mars

When the Phoenix Mars Mission launches (possibly as early as this Friday, August 3rd, 2007), it will have something unusual on board. The Planetary Society has created what they call the Phoenix DVD.

In late May of 2007 they proudly announced that their special DVD was ready for launch:

… the silica glass mini-DVD with a quarter million names on it (including all Planetary Society members) has been installed on the Phoenix spacecraft and is ready to go to Mars!

In addition to the names, the disc also contains Visions of Mars, a collection of literature and art about the Red Planet. The names and Visions of Mars were written to the silica mini-DVD by the company Plasmon OMS using a special technique. The resulting archival disk should last at least hundreds of years on the Martian surface, ready to be picked up by future explorers.

After the disc was written, a special label was applied to the disc to identify it for future explorers.

The page about Visions of Mars describes it as follows:

Visions of Mars is a message from our world to future human inhabitants of Mars. It will launch on its way to the Red Planet in the summer of 2007 aboard the spacecraft Phoenix. Along with personal messages from leading space visionaries of our time, Visions of Mars includes a priceless collection of Mars literature and art, and a list of hundreds of thousands of names of space enthusiasts from around the world. The entire collection will be encoded on a mini-DVD provided by The Planetary Society, which will be affixed to the spacecraft.

All this has been inscribed on a silica mini-DVD – and has the phrase “Attention Astronauts: Take This With You” in bright red letters on the front. I hate to be cynical (and those of you who read this blog know that it is not my nature to be so) but where will those ‘future human inhabitants of Mars’ find a DVD player to watch this DVD? I know I am not the first to doubt their plan – but I couldn’t resist. Given my suspicion of the whole affair I thought I would at least look into the company that created this very special disk.

Plasmon has an extensive website with all sorts of interesting tidbits. They explain their trademarked Ultra Density Optical (UDO) technology. They feature two PDFs – one called Archiving Defined and another labeled Plasmon Archive Solution. It looks very interesting. My VERY oversimplified summary is that they have combined a RAID approach with a very durable and secure WORM (Write Once, Read Many) flavor of DVD and packaged it into a solution for companies who need to ensure their data remains safe.
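To make the WORM idea a little more concrete, here is a toy Python sketch of write-once, read-many semantics enforced in software. It is only an illustration under my own assumptions: `WriteOnceStore` is a hypothetical name, and this says nothing about how Plasmon’s UDO media actually enforce write-once behavior in hardware.

```python
class WriteOnceStore:
    """Toy write-once, read-many store: each key can be written exactly once."""

    def __init__(self) -> None:
        self._records: dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        # Refuse any attempt to overwrite: once written, a record is fixed.
        if key in self._records:
            raise PermissionError(f"{key!r} already written; WORM media cannot be overwritten")
        self._records[key] = bytes(data)

    def read(self, key: str) -> bytes:
        # Reads are unlimited and never change the stored record.
        return self._records[key]
```

The point of the design is that mistakes and tampering both become impossible after the first write; correcting a record means writing a new one alongside it.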

I have been meaning to learn more about the latest and greatest in hardware and material solutions aimed at digital preservation in the corporate world – and Boing Boing’s post Mars Library of books, DVDs, and database is now ready for launch just gave me a great excuse to start to scratch the surface.

I have also been following the blog StorageSwitched! for a while. It is written by the CEO of StorageSwitch LLC (“a technology provider for the fixed content data storage market with a multitude of gateway and utility products and services”). I have found it interesting to take a look at the business and technology side of preserving information. I plan more posts in this vein as I learn more about what is out there and how it is being used.

Photo Credit: David Crisp and the WFPC2 Science Team (Jet Propulsion Laboratory/California Institute of Technology)

Public.Resource.Org: Creative Financing and Public Domain Content

Sunrise on Malibu Lake by Charles O'Rear (National Archives photo no. NWDNS-412-DA-15109)

Public.Resource.Org is dedicated to using funds contributed by individuals to buy public domain content. This content is then released online in multiple locations such as the Internet Archive and Google Video for use by anyone. I love their tag line: Underwritten By The Feds! Overwritten By You!

I spotted this in Boing Boing’s post Liberated public domain government docs surfacing online and was immediately intrigued. This isn’t exactly an archiving issue, though you could argue that it takes something of a LOCKSS approach to preservation. I also wonder how this approach could be used to finance the digitization of other public domain materials.

The website explains on their About Us page that they have recently applied for non-profit status with the IRS, so soon the purchase price of these materials could become a tax deduction for those who file US Tax Returns. They feature materials from 54 different US Federal agencies – from the Fish and Wildlife Service to the IRS. There are materials on the Environment, Public Health, Flying and many more.

But that isn’t all they are tackling – back in May they issued a message to The Internet discussing their attitude toward (and frustration with) the Smithsonian Images website. It begins:

We write to you today on the subject of SmithsonianImages.SI.Edu, a government ecommerce site built on a repository of 6,288 images of national significance. The site is breathtaking in scope, with imagery ranging from the historic cyanotypes of Edward Muybridge to historic photos from aviation, natural history, and many other fields. If the Smithsonian Institution is our attic, these photos are our collective scrapbook.

However, the web site imposes draconian limits on the use of this imagery. The site includes a copyright notice that to the layman would certainly discourage any use of the imagery. While personal, non-commercial use is purportedly allowed, it requires a half-dozen clicks before the user is allowed to download a low-resolution, watermarked image. An image without the watermark and at sufficient resolution to be useful requires a hefty fee, manual approval by the Smithsonian staff, and the resulting invoice specifically prohibits any further use without permission.

The letter goes into great detail about why they disagree with how things are being done; take a look if you are curious. They didn’t just write this letter, either: they also created a free-to-download book titled Public Domain Prospectus, which includes all 6,288 images (in their low-resolution, watermarked versions) and which they present as a tool for those researching the public domain status of those images.

I went hunting on the Smithsonian Images site to see for myself, and I found a few things. While the prices for prints or digital files do seem expensive to my eyes, the Product and Pricing Information includes the following note:

Special Note on Pricing: Smithsonian Photographic Services, as an instrument of the Smithsonian Institution, is a non-profit entity. Fees associated with the delivery of images represent material fees only and go to support the broader mission to create, archive, and preserve images associated with the Institution and it’s holdings.

That page also includes some information about how the images may be used, but for the full story I headed over to the Copyright Policy. That is when I started to get confused. The copyright policy on that page talks about “Use of text, images and other content on this website…”. Does that mean these same rules apply to the images you purchase as well?

Let’s take a closer look at one of the pages about a specific image. Here is a nice one of Fireworks over National Monuments. I click on the tempting ‘Download Image’ button and now I see more about what the Public.Resource.Org folks are talking about. One more click and I finally find what appears to be the official Commercial Use of Smithsonian Images page which concludes with:

Commercial distribution, publication or exploitation of Smithsonian files is specifically prohibited. Anyone wishing to use any of these files or images for commercial use or publication must first request and receive prior permission by contacting [Smithsonian Institution Office of Imaging & Photographic Services]. Permission for such use is granted on a case-by-case basis. A usage fee may be involved depending on the type and nature of the proposed use.

There is a special policy for school, teacher and student use of the watermarked versions of the images for free (with the right citations of course).

If I understand Public.Resource.Org’s issues, they aren’t predominantly with the price of the high-resolution digital versions or even the print versions of these photos (though they DO touch on it in their letter, and I think I side with Smithsonian Images on that aspect: it does cost money and time to make all that available). Rather, the issue is how firmly Smithsonian Images insists that you must request permission to use any of the images you purchase for anything beyond personal or educational use. I like what NARA has on its website concerning the publication of its still photos, which begins with these two paragraphs:

Generally, photographic records copied and sold by the National Archives and Records Administration (NARA) may be published without special permission or additional fees. NARA does not grant exclusive or non-exclusive publication privileges. Copies of Federal records, as part of the public domain, are equally available to all.

A portion of the photographs among our holdings are or may be subject to copyright restrictions. The National Archives does not confirm the copyright status of photographs, but will provide any information filed with the photograph. It is important to note that all of the digital images that are available on our website are in the public domain.

I can see how it might seem safer (from a “don’t sue us” point of view) to force a search by hand for each and every image as users request to use them. At the same time I would like to think that the folks over at Smithsonian Images already know which images are in the public domain. Maybe I am oversimplifying this, but I want to believe that the details of copyright are part of the metadata that could be supplied along with the date, photographer’s name and description.

I prefer the National Archives’ approach of stating clearly that they do not confirm the copyright status of photographs. They put it in the hands of the entity who wants to use the materials – though that might be small comfort to the average citizen not well versed in copyright rules.

The Wikipedia page on Copyright status of work by the U.S. government includes sections about digital historical material as well as work produced by government contractors. Reading through this makes me realize how quickly the copyright status of images such as those provided by Smithsonian Images and NARA can get confusing.

I think what Public.Resource.Org is doing, propagating public domain materials to locations where the public can actually get at them easily, is interesting. I want to check back in a year and see how much they have set loose, and what materials they are asking for help to liberate. As I mentioned above, I think there could be some interesting models of individuals donating money to finance the digitization of public domain materials. Something like what Fundable does: take pledges toward a specific fund-raising goal, and only turn those pledges into funds if the goal is reached.
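For the curious, that all-or-nothing pledge model can be sketched in a few lines of Python. This is a hypothetical illustration, not how Fundable actually worked: `PledgeDrive` and its methods are invented names, and amounts are kept in whole cents to avoid floating-point rounding.

```python
from dataclasses import dataclass, field


@dataclass
class PledgeDrive:
    """Collect pledges toward a goal; money changes hands only if the goal is met."""

    goal_cents: int
    pledges: dict[str, int] = field(default_factory=dict)

    def pledge(self, donor: str, amount_cents: int) -> None:
        # A donor may pledge more than once; pledges accumulate per donor.
        if amount_cents <= 0:
            raise ValueError("pledge must be positive")
        self.pledges[donor] = self.pledges.get(donor, 0) + amount_cents

    def total(self) -> int:
        return sum(self.pledges.values())

    def close(self) -> dict[str, int]:
        """At the deadline, return the charges to collect, or {} if the goal was missed."""
        return dict(self.pledges) if self.total() >= self.goal_cents else {}
```

The appeal for digitization projects is that nobody pays for a half-funded effort: either the collection gets scanned, or every pledge quietly lapses.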

As for their great frustration with Smithsonian Images? Well, I see Public.Resource.Org’s side. In this age of Flickr, people are growing used to looking for Creative Commons licenses. With so much out there under liberal Creative Commons licenses and in the public domain, why struggle with images that are copyright protected unless you really need to?

I would like to think that rights management is one of the first things that would get sorted out before a large image collection is put online, especially if the goal is to produce a revenue stream. That said, I would love to know the real story here. I can imagine that the rights on many of those images are not clear cut. But if the Smithsonian Images people know that some of them are in the public domain, then why would they go through all that extra trouble to force a rights search for every image? Why not distinguish the ones which require research from those that don’t? Couldn’t it only help support the work of the Smithsonian to have their images used by as many projects as possible? Anyone reading this have an answer for us from the inside?

About the image above: given that I prefer images without watermarks (unlike those provided by Smithsonian Images) and that I know the images on NARA’s site are in the public domain, I went hunting for something pretty and found the image featured above. To find it yourself, do a search for [Sunrise on Malibu Lake] in the Archival Research Catalog (ARC). These are the details included with the image:

Sunrise on Malibu Lake in the Santa Monica mountains near Malibu, California, which is located on the northwestern edge of Los Angeles County. The mountains contain the last semi-wilderness in Los Angeles County. This area so far has escaped development pressure. Some 84 percent of the state’s residents live within 30 miles of the coast and this concentration has resulted in increasing land use pressure. Several commissions have been authorized by the legislature to restrict coastal development, 05/1975.

Item from Record Group 412: Records of the Environmental Protection Agency, 1944 – 2000. NARA NAIL Control Number: NWDNS-412-DA-15109. Photograph by Charles O’Rear.

Happy Birthday Spellbound Blog

One year ago, when I posted my Introduction post on July 19th of 2006, I had taken only 3 courses towards my MLS degree. I wasn’t quite sure what I was going to write about, or how often. I wasn’t sure anyone would be interested in my posts. I was about a month away from standing in front of my poster at SAA passing out home-made cards with the name of this blog on them (and my blog URL scribbled on scraps of paper when I ran out of the cards). I posted summaries of many of the sessions I attended, but we never really reached critical mass with bloggers at the SAA 2006 conference in DC.

One year later and I have written 45,028 words in 72 posts (special thanks to the TD Word Count plugin for easy access to those stats). I have completed 7 out of the 12 courses required for my MLS. I am on a panel at the SAA conference in Chicago. I have shiny new cards to hand out to anyone who might want one. There already exists a page in the unofficial conference wiki waiting for people to sign up to cover various sessions at SAA 2007 in Chicago.

I have 145 subscribers to my RSS feed (thank you Feedburner). Most of those subscribers use either Bloglines or the Google Reader. I am proud that this blog is included in the ArchivesBlogs aggregator. According to Technorati, this blog has an Authority of 33 (which means that 33 blogs have linked to it in the past 6 months).

According to Google Analytics, I have had just over 5,000 unique visitors to my Spellbound Blog website. Those individuals have viewed a total of 13,900 pages (each with up to 10 posts on them). I have had visitors from the Americas, Europe, Asia, Oceania and Africa (those are Google Analytics geographic breakdowns). 27% of the visitors to Spellbound Blog are returning visitors. While almost 25% of my visitors arrive because they just typed my URL into their browser, 37% have been referred from other sites and 38% from search engines. A full 35% of my site traffic is the result of organic Google searches, but those site visits average a 75% bounce rate, so it is possible that many of those visitors take a quick look around, realize they are in the wrong place and continue on their way.

In contrast with what Google Analytics tells me, AWStats reports that I have had over 9,000 unique visitors in 2007 alone, but that seems somehow to include requests for my RSS feed. It is interesting to note here that it is not easy to be sure what the various statistics really mean. This stats confusion made me think of this quote: “A man with one watch knows what time it is; a man with two watches is never quite sure.” (Lee Segall).

The most popular post due to organic searches is the post titled 129th anniversary of Thomas Edison’s Invention of the Phonograph. Google currently returns this post in the 2nd slot for searches of impact of thomas edison inventions and at the bottom of the first page for invention of the phonograph. I would like to imagine that the 330 or so middle or elementary school students who stumbled onto this post were intrigued by my ideas, but the average time on the page is only a bit over 2 minutes – so who knows how many of them are actually reading it.

It is hard to know who is really reading what I write. I always appreciate comments on my posts – it makes me more confident that folks are in fact reading. I also just like the feedback.

All I can be certain of is that I still enjoy the research and the writing. I haven’t run out of ideas. During this past semester (during which I was taking 2 courses and working full time) I actually found myself annoyed by all the duties that prevented me from posting more often. I had one of those moments in which I realized that writing for this blog had turned into a reward rather than any sort of ‘work’.

So three cheers for a great first blog year! I have lots of ideas for the year ahead. I hope I can meet some of you at SAA in Chicago. My talk, “Communicating Context: The Power of Digital Interfaces”, will be part of the panel titled Preserving Context and Original Order in a Digital World (Session 804: Saturday September 1 at 1pm).

Thank you to everyone who reads Spellbound Blog. Thank you for your comments. Thank you for keeping me in (and adding me to) your RSS feed readers. Without all of you I would just be talking to myself.

Thoughts on Digital Preservation, Validation and Community

The preservation of digital records is on the average person’s mind more with each passing day. Consider the video below from the recent BBC article Warning of data ticking time bomb.


Microsoft UK Managing Director Gordon Frazer running Windows 3.1 on a Vista PC

The video discusses Microsoft’s Virtual PC program, which lets you run multiple operating systems via a virtual console. This is an example of the emulation approach to ensuring access to old digital objects, and it seems to be done in a way that the average user can get their head around. Since a big part of digital preservation is ensuring you can do something beyond reading the 1s and 0s, it is a promising step. It also pleased me that they specifically mention the UK National Archives and how important it is to them that they can view documents as they originally appeared, not ‘converted’ in any way.

Dorothea Salo of Caveat Lector recently posted Hello? Is it me you’re looking for?. She has a lot to say about digital curation, IRs (institutional repositories) and librarianship. Coming, as I do, from the software development and database corners of the world, I was pleased to find someone else who sees a gap between the standard assumed roles of librarians and archivists and the reality of how well suited librarians’ and archivists’ skills are to “long-term preservation of information for use”, be it digital or analog.

I skimmed through the 65-page Joint Information Systems Committee (JISC) report Dorothea mentioned (Dealing with data: Roles, rights, responsibilities and relationships). A search on the term ‘archives’ took me to this passage on page 22:

There is a view that so-called “dark archives” (archives that are either completely inaccessible to users or have very limited user access), are not ideal because if data are corrupted over time, this is not realised until point of use. (emphasis added)

For those acquainted with software development, the term regression testing should be familiar. It involves the creation of automated suites of test programs that ensure that as new features are added to software, the features you believe are complete keep on working. This was the first idea that came to my mind when reading the passage above. How do you do regression testing on a dark archive? And thinking about regression testing, digital preservation and dark archives fueled a fresh curiosity about what existing projects are doing to automate the validation of digital preservation.
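One way to approximate regression testing for a dark archive is a scheduled fixity audit: record a checksum for every file at ingest, then periodically recompute and compare, so that corruption is noticed long before the point of use. Below is a minimal Python sketch, assuming the manifest is a plain dictionary mapping relative paths to SHA-256 digests; `build_manifest` and `audit_fixity` are hypothetical names, not part of any of the projects discussed in this post.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a digest for every file under root (done once, at ingest)."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit_fixity(root: Path, manifest: dict[str, str]) -> list[str]:
    """Re-hash every file named in the manifest; report anything missing or changed."""
    problems = []
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(path) != expected:
            problems.append(f"changed: {rel_path}")
    return problems
```

Run on a schedule, an empty list is a passing test; anything else is a failure to investigate, ideally by repairing from another copy.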

A bit of Googling found me the UK National Archives requirements document for The Seamless Flow Preservation and Maintenance Project. They list regression testing as a ‘desirable’ requirement in the Statement of Requirements for Preservation and Maintenance Project Digital Object Store (defined as “those that should be included, but possibly as part of a later phase of development”). Of course it is very hard to tell if this regression testing is for the software tools they are building or for access to the data itself. I would bet the former.

Next I found my way to the website for LOCKSS (Lots of Copies Keep Stuff Safe). While their goals relate to the preservation of ‘electronically published scholarly assets’ on the web, their approach to ensuring the validity of their data over time should interest anyone thinking about long-term digital preservation.

In the paper Preserving Peer Replicas By Rate-Limited Sampled Voting they share details of how they manage validation and repair of the data they store in their peer-to-peer architecture. I was bemused by the categories and subject descriptors assigned to the paper itself: H.3.7 [Information Storage and Retrieval]: Digital Libraries; D.4.5 [Operating Systems]: Reliability. Nothing about preservation or archives.

It is also interesting to note that you can view most of the original presentation at the 19th ACM Symposium on Operating Systems Principles (SOSP 2003) from a video archive of webcasts of the conference. The presentation of the LOCKSS paper begins about halfway through the 2nd video on the video archive page.

The start of the section on design principles explains:

Digital preservation systems have some unusual features. First, such systems must be very cheap to build and maintain, which precludes high-performance hardware such as RAID, or complicated administration. Second, they need not operate quickly. Their purpose is to prevent rather than expedite change to data. Third, they must function properly for decades, without central control and despite possible interference from attackers or catastrophic failures of storage media such as fire or theft.

Later they declare the core of their approach as “…replicate all persistent storage across peers, audit replicas regularly and repair any damage they find.” The paper itself has lots of details about HOW they do this, but for the purpose of this post I was more interested in their general philosophy on how to maintain the information in their care.
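The ‘audit and repair’ idea can be illustrated with a deliberately simplified Python sketch: each peer hashes its copy, the version held by a strict majority wins, and minority copies are overwritten with it. This is NOT the LOCKSS protocol, which uses rate-limited, sampled voting among peers and is designed to resist attackers; plain majority voting across every replica is my own simplification, and `audit_and_repair` and the peer names are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(replicas: dict[str, bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Audit every peer's copy of one file and repair copies that disagree.

    replicas maps a (hypothetical) peer name to that peer's copy.
    Returns (repaired replicas, sorted list of peers that held damaged copies).
    """
    digests = {peer: digest(content) for peer, content in replicas.items()}
    winner, count = Counter(digests.values()).most_common(1)[0]
    # Without a strict majority there is no trustworthy version to copy from.
    if count * 2 <= len(replicas):
        raise RuntimeError("no majority: flag for human review instead of repairing")
    good_copy = next(replicas[p] for p, d in digests.items() if d == winner)
    damaged = sorted(p for p, d in digests.items() if d != winner)
    repaired = {peer: good_copy for peer in replicas}
    return repaired, damaged
```

Even in this toy form the philosophy shows through: no copy is the master, and every copy earns its authority by agreeing with its peers.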

DAITSS (Dark Archive in the Sunshine State) was built by the Florida Center for Library Automation (FCLA) to support their own needs when creating the Florida Center for Library Automation Digital Archive (Florida Digital Archive or FDA). In mid May of 2007, FCLA announced the release of DAITSS as open source software under the GPL license.

In the document The Florida Digital Archive and DAITSS: A Working Preservation Repository Based on Format Migration I found:

… the [Florida Digital Archive] is configured to write three copies of each file in the [Archival Information Package] to tape. Two copies are written locally to a robotic tape unit, and one copy is written in real time over the Internet to a similar tape unit in Tallahassee, about 130 miles away. The software is written in such a way that all three writes must complete before processing can continue.

Similar to LOCKSS, DAITSS relies on what they term ‘multiple masters’: there is no concept of a single master copy. Since all three copies are written virtually simultaneously, they are all equal in authority. I think it is very interesting that they rely on writing to tape. There was a mention that tape is cheaper, though they might still switch to hard drives for a number of reasons.
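The rule that ‘all three writes must complete before processing can continue’ translates naturally into code. The Python sketch below assumes that three directories stand in for the two local robotic tape units and the remote unit in Tallahassee; `store_all_or_fail` and `write_copy` are hypothetical names, and real tape I/O looks nothing like this. The copies are written in parallel, and the function returns only once every copy has been written and read back with a matching checksum.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_copy(destination: Path, name: str, data: bytes) -> str:
    """Write one copy and return the digest of what was actually read back."""
    destination.mkdir(parents=True, exist_ok=True)
    target = destination / name
    target.write_bytes(data)
    return hashlib.sha256(target.read_bytes()).hexdigest()


def store_all_or_fail(destinations: list[Path], name: str, data: bytes) -> None:
    """Write every copy in parallel; return only when all succeed and verify."""
    expected = hashlib.sha256(data).hexdigest()
    with ThreadPoolExecutor(max_workers=len(destinations)) as pool:
        futures = [pool.submit(write_copy, d, name, data) for d in destinations]
        # .result() re-raises any write error, so one failed copy halts processing.
        results = [f.result() for f in futures]
    if any(r != expected for r in results):
        raise OSError(f"a copy of {name} did not verify")
```

Because the caller cannot move on until every copy verifies, no copy ever lags behind the others, which is what lets all three be treated as equal masters.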

With regard to formats and ensuring accessibility, the same document quoted above states on page 2:

Since most content was expected to be documentary (image, text, audio and video) as opposed to executable (software, games, learning modules), FCLA decided to implement preservation strategies based on reformatting rather than emulation….Full preservation treatment is available for twelve different file formats: AIFF, AVI, JPEG, JP2, JPX, PDF, plain text, QuickTime, TIFF, WAVE, XML and XML DTD.

The design of DAITSS was based on the Reference Model for an Open Archival Information System (OAIS). I love this paragraph from page 10 of the formal specifications for OAIS adopted as ISO 14721:2002.

The information being maintained has been deemed to need Long Term Preservation, even if the OAIS itself is not permanent. Long Term is long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community. Long Term may extend indefinitely. (emphasis added)

Another project implementing the OAIS reference model is CASPAR (Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval). This project appears much greater in scale than DAITSS. It started a little more than a year ago (April 1, 2006) with a projected duration of 42 months, 17 partners and a projected budget of 16 million Euros (roughly 22 million US Dollars at the time of writing). Their publications section looks like it could sidetrack me for weeks! On page 25 of the CASPAR Description of Work, in a section labeled Validation, a distinction is made between “here and now validation” and “the more fundamental validation techniques on behalf of the ‘not yet born’”. What eloquent turns of phrase!

On page 7 I found another great tidbit, in a list of expected digital preservation metrics:

2) Provide a practical demonstration by means of what may be regarded as “accelerated lifetime” tests. These should involve demonstrating the ability of the Framework and digital information to survive:
a. environment (including software, hardware) changes: Demonstration to the External Review Committee of usability of a variety of digitally encoded information despite changes in hardware and software of user systems, and such processes as format migration for, for example, digital science data, documents and music
b. changes in the Designated Communities and their Knowledge Bases: Demonstration to the External Review Committee of usability of a variety of digitally encoded information by users of different disciplines
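One toy way to picture an “accelerated lifetime” test for format migration is to push a record through many generations of migration and check that nothing was lost along the way. The formats (JSON, CSV, XML), the function names and the sample record are my own stand-ins, not anything from the CASPAR documents, and a real test would cover far more than a flat record of text fields.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Each "migration" converts a flat record from one encoding to the next.
# The cycle JSON -> CSV -> XML -> JSON stands in for successive generations
# of format migration; the formats are chosen only for illustration.


def json_to_csv(text: str) -> str:
    record = json.loads(text)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(record))
    writer.writeheader()
    writer.writerow(record)
    return buf.getvalue()


def csv_to_xml(text: str) -> str:
    row = next(csv.DictReader(io.StringIO(text)))
    root = ET.Element("record")
    for key, value in row.items():
        ET.SubElement(root, "field", name=key).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_json(text: str) -> str:
    root = ET.fromstring(text)
    record = {f.get("name"): f.text or "" for f in root.findall("field")}
    return json.dumps(record)


def accelerated_lifetime(record: dict[str, str], generations: int) -> dict[str, str]:
    """Push a record through many migration cycles and return what survives."""
    text = json.dumps(record)
    for _ in range(generations):
        text = xml_to_json(csv_to_xml(json_to_csv(text)))
    return json.loads(text)


if __name__ == "__main__":
    original = {"station": "Asheville", "date": "1900-01-01", "max_temp_f": "54"}
    print(accelerated_lifetime(original, 100) == original)
```

Change a value to a number (54 instead of "54") and the comparison fails, because CSV stores everything as text. That kind of quiet loss is exactly what such tests are meant to surface.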

Here the authors have thought not only about the technicalities of how users may access the objects in the future, but also about users who might not share the frame of reference or understanding of the community that originally created the object. I haven’t seen any explicit discussion of this notion before, at least not beyond the basic idea of needing good documentation and contextual background to support understanding of data sets in the future. I love the phrase ‘accelerated lifetime’, but I wonder how good a job we can do at creating tests for technology that does not yet exist (consider the Ladies’ Home Journal predictions for the year 2000, published in 1900).

What I love about LOCKSS, DAITSS and CASPAR (and no, it isn’t their fabulous acronyms) is the very diverse groups of enthusiastic people trying to do the right thing. I see many technical and research oriented organizations listed as members of the CASPAR Consortium – but I also see the Università degli studi di Urbino (noted as “created in 1998 to co-ordinate all the research and educational activities within the University of Urbino in the area of archival and library heritage, with specific reference to the creation, access, and preservation of the documentary heritage”) and the Humanities Advanced Technology and Information Institute, University of Glasgow (noted as having “developed a cutting edge research programme in humanities computing, digitisation, digital curation and preservation, and archives and records management”). LOCKSS and DAITSS have both evolved in library settings.

Questions relating to digital archives, preservation and validation are hard ones. New problems and new tools (like Microsoft’s Virtual PC shown in the video above) are appearing all the time. Developing best practices to support real world solutions will require the combined attention of those with the skills of librarians, archivists, technologists, subject matter specialists and others whose help we haven’t yet realized we need. The challenge will be to find those who have experience in multiple areas and pull them into the mix. Rather than assuming that one group or another is the best choice to solve digital preservation problems, we need to remember there are scores of problems – most of which we haven’t even confronted yet. I vote for cross pollination of knowledge and ideas rather than territorialism. I vote for doing your best to solve the problems you find in your corner of the world. There are more than enough hard questions to answer to keep everyone who has the slightest inclination to work on these issues busy for years. I would hate to think that any of those who want to contribute might have to spend energy to convince people that they have the ‘right’ skills. Worse still – many who have unique viewpoints might not be asked to share their perspectives because of general assumptions about the ‘kind’ of people needed to solve these problems. Projects like CASPAR give me hope that there are more examples of great teamwork than there are of people being left out of the action.

There is so much more to read, process and understand. Know of a digital preservation project with a unique approach to validation that I missed? Please contact me or post a comment below.

Unofficial SAA2007 Chicago Conference Wiki Now Online

It is alive! Take a look at the fabulous new SAA2007 Unofficial Conference Wiki. The wiki exists due to the vision and dedicated effort of Cal Lee, Lori Eakin, Kate Theimer and others. You can read more about who contributed energy and resources to bring the wiki to life on the Acknowledgments page.

Are you willing to write about presentations? Direct your attention please to the Session Coverage page. As you plan your schedule for the conference, consider letting others know which panels and round tables you plan to cover. The ultimate goal would be to make sure that at least one person has committed to coverage of every session. You don’t need to have a blog to cover a session; you can add your session recap as a page in the wiki. We will make sure it is easy to do when we get that far.

Are you presenting or running a roundtable? Then please consider adding to the basic information in the wiki about your session. You can add links, references, supporting documentation and background information — anything you think might be useful to those considering your session (or unable to attend because of conflicts).

Do you know Chicago? Help us add to the pages listed under the Logistics heading.

Need something to improve your conference experience? There are pages for ride sharing, looking for roommates, and special info for first time conference attendees.

Never contributed to a wiki before? There is a special page for you with tips and another waiting for you to post questions (and remember – the only stupid question is one you never ask).

So what are you waiting for? Cruise on over and take a tour, add what you can and spread the word.

International Environmental Data Rescue Organization: Rescuing At Risk Weather Records Around the World

In the middle of my crazy spring semester a few months back, I got a message about volunteer opportunities at the International Environmental Data Rescue Organization (IEDRO). I get emails from VolunteerMatch.org every so often because I am always curious about virtual volunteer projects (i.e., ways you can volunteer via your computer while in your pajamas). I filed the message away for when I actually had more time to take a closer look, and it has finally made it to the top of my list.

A non-profit organization, IEDRO states its vision as “…to find, rescue, and digitize all historical environmental data and to make those data available to the world community.” The organization goes on to explain on its website:

Old weather records are indeed worth the paper they are written on…actually tens of thousands times that value. These historic data are of critical importance to the countries within which they were taken, and to the world community as well. Yet, millions of these old records have already perished with the valuable information contained within, lost forever. These unique records, some dating back to the 1500s, now reside on paper at great risk from mold, mildew, fire, vermin, and old age (paper and ink deteriorate) or being tossed away because of lack of storage space. Once these data are lost, they are lost forever. There are no back up sources; nothing in reserve.

Why are these weather records valuable? IEDRO gives lots of great examples. Old weather records can:

  • inform the construction and engineering community about maximum winds recorded, temperature extremes, rainfall and floods
  • let farmers know the true frequency of drought, flood, extreme temperatures and in some areas, the amount of sunshine enabling them to better plan crop varieties and irrigation or drainage systems increasing their food production and helping to alleviate hunger.
  • assist in explaining historical events such as plague and famine, movement of cultures, insect movements (i.e. locusts in Africa), and are used in epidemiological studies.
  • provide our global climate computer models with baseline information enabling them to better predict seasonal extremes. This provides more accurate real-time forecasts and warnings and a better understanding of global change and validation of global warming.
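To see why “the true frequency of drought” depends on having old records, here is a minimal sketch that counts dry years in a rainfall series and turns that count into a rough recurrence interval. The 500 mm threshold and the ten rainfall totals are made up purely for illustration, and total years divided by dry years is only one crude way to estimate how often a drought comes around.

```python
def drought_frequency(annual_rainfall_mm: list[float], threshold_mm: float) -> tuple[float, float]:
    """Return (share of years below the threshold, average years between them).

    The second value is a simple empirical recurrence interval: total years
    divided by the number of dry years.
    """
    dry_years = sum(1 for total in annual_rainfall_mm if total < threshold_mm)
    if dry_years == 0:
        # No drought anywhere in the record; the interval cannot be estimated.
        return 0.0, float("inf")
    n = len(annual_rainfall_mm)
    return dry_years / n, n / dry_years


# Hypothetical ten-year record of annual rainfall totals (mm), for illustration only.
record = [820, 640, 910, 450, 780, 700, 430, 860, 990, 610]
print(drought_frequency(record, 500))       # (0.2, 5.0)
print(drought_frequency(record[-3:], 500))  # (0.0, inf)
```

The second call is the point: judged only on the last three years, drought looks like it never happens, while the longer record suggests roughly one dry year in five.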

The IEDRO site includes excellent scenarios in which accurate historical weather data can help save lives. You can read about the subsistence farmer who doesn’t understand the frequency of droughts well enough to make good choices about the kind of rice he plants, about the way weather affects disease vector models for illnesses such as malaria, and about the computer programs that need historical weather data to accurately predict floods. I also found this Global Hazards and Extremes page on the NCDC’s site, and I wonder what sorts of maps they could make of the weather one or two hundred years ago if all the historical climate data records were already available.

There was additional information available on IEDRO’s VolunteerMatch page. Another activity they list for their organization is: “Negotiating with foreign national meteorological services for IEDRO access to their original observations or microfilm/microfiche or magnetic copies of those observations and gaining their unrestricted permission to make copies of those data”.

IEDRO is making it their business to coordinate efforts in multiple countries to find and take digital photos of at risk weather records. They include information on their website about their data rescue process. I love their advice about being tenacious and creative when considering where these weather records might be found. Don’t only look at the national meteorological services! Consider airports, military sites, museums, private homes and church archives. The most unusual location logged so far was a monastery in Chile.

Once the records are located, each record is photographed with a digital camera. They have a special page showing examples of bad digital photos to help those taking the digital photos in the field, as well as a guidelines and procedures document available in PDF (and therefore easy to print and use as reference offline).

The digital images of the rescued records are then sent to NOAA’s National Climatic Data Center (NCDC) in Asheville, North Carolina. The NCDC is part of the National Environmental Satellite, Data and Information Service (NESDIS) which is in turn under the umbrella of the National Oceanic and Atmospheric Administration (NOAA). The NCDC’s website claims they have the “World’s Largest Archive of Climate Data”. The NCDC has people contracted to transcribe the data and ensure the preservation of the digital image copies. Finally, the data will be made available to the world.

IEDRO already lists these ten countries as locations where activities are underway: Kenya, Malawi, Mozambique, Niger, Senegal, Zambia, Chile, Uruguay, Dominican Republic and Nicaragua.

I am fascinated by this organization. On a personal level it brings together a lot of things I am interested in: archives, the environment, GIS data, temporal data and an interesting use of technology. This is such a great example of records that might seem unimportant but turn out to be crucial to improving lives in the here and now. It shows the need for international cooperation, good technical training and being proactive. I know that a lot of archivists would consider this more of a scientific research mission (the goal here is to get that data for the purposes of research), but no matter what else these are, they are still archival records.

reCAPTCHA: crowdsourcing transcription comes to life

With a tag-line like ‘Stop Spam, Read Books’, how can you not love reCAPTCHA? You might have already read about it on Boing Boing, NetworkWorld.com or digitizationblog, but I just couldn’t let it go by without talking about it.

Haven’t heard about reCAPTCHA yet? OK… have you ever filled out an online form that made you look at an image and type the letters or numbers that you see? These ‘verify you are a human’ challenges are used everywhere, from online concert ticket sites that don’t want scalpers to grab too many tickets to blogs that are trying to prevent spam. What reCAPTCHA has done is harness this user effort to help transcribe text from digitized books in the Internet Archive that is hard to OCR. Their website has a great explanation of what they are doing, and they include the graphic below to show why human intervention is needed.

Why we need reCAPTCHA

reCAPTCHA shows two words for each challenge: one whose transcription it already knows and a second that needs human verification. Slowly but surely, all the words OCR couldn’t recognize get transcribed and made available for indexing and search.
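The two-word trick can be modeled in a few lines. Below is a toy sketch under my own assumptions: a user passes by typing the known control word correctly, their reading of the unknown word counts as one vote, and a word is considered transcribed once enough users agree. The class name, method name and the votes_needed threshold are all hypothetical; reCAPTCHA's real scoring rules aren't described here.

```python
from collections import Counter, defaultdict


class TwoWordTranscriber:
    """Toy model of the two-word challenge: one control word with a known
    answer, one unknown word from a scanned page that OCR could not read."""

    def __init__(self, votes_needed: int = 3):
        self.votes_needed = votes_needed  # hypothetical agreement threshold
        self.guesses: dict[str, Counter] = defaultdict(Counter)
        self.transcribed: dict[str, str] = {}

    def submit(self, control_answer: str, control_typed: str,
               unknown_id: str, unknown_typed: str) -> bool:
        """Return True if the user passes (typed the control word correctly)."""
        if control_typed.strip().lower() != control_answer.lower():
            return False  # failed the known word, so the other guess is ignored too
        guess = unknown_typed.strip().lower()
        self.guesses[unknown_id][guess] += 1
        reading, count = self.guesses[unknown_id].most_common(1)[0]
        if count >= self.votes_needed:
            self.transcribed[unknown_id] = reading
        return True


if __name__ == "__main__":
    t = TwoWordTranscriber(votes_needed=3)
    for typed_control, typed_unknown in [("morning", "upon"), ("mourning", "spam"),
                                         ("Morning", "upon"), ("morning", "upon")]:
        t.submit("morning", typed_control, "page12-word42", typed_unknown)
    print(t.transcribed)
```

Checking the control word first is what lets the system trust the second answer: someone who can read the known word is probably a person reading carefully, so their guess at the unknown word is worth counting.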

I have posted before about ideas for transcription using the power of many hands and eyes (see Archival Transcriptions: for the public, by the public) – but my ideas were more along the lines of what the genealogists are doing on sites like USGenWeb. It is so exciting to me that a version of this is out there – and I LOVE their take on it. Rather than find people who want to do transcription, they have taken an action lots of folks are already used to performing and given it more purpose. The statistics behind this are powerful. Apparently 60 million of these challenges are entered every DAY.

Want to try it? Leave a comment on this post (or any post in my blog) and you will get to see and use reCAPTCHA. I can also testify that the installation of this on a WordPress blog is well documented, fast and easy.