Year: 2008

Learn About Wikis on Second Life (May 25th, 2008)

In case you always wondered how wikis can help archivists, this Sunday (May 25th, 2008) will see archivists gathering in Second Life to answer this question.

  • When: Sunday May 25th, 9pm-10:30pm GMT (5pm-6:30pm EDT)
  • Where: Open Air Auditorium at Cybrary City, Second Life

This sounds like a great way to kill two birds with one stone. If you have been looking for a reason to explore Second Life or you have been wondering about how wikis are being used to benefit archives and special collections (or both!) – this looks like a great combination.

Learn more about this event via the Second Life Library Project post How on Virtual Earth can Wikis Help Archivists?.

In the interest of full disclosure – I admit that I won’t be there. The first (and last) time I tried to explore Second Life I got motion sick after about 15 minutes. I understand that this is not very common – but since I am one of those people who get motion sick watching others play 3D video games I wasn’t too surprised. I have a theory about trying again one day with a Second Life expert at my side to help me tweak my settings to the least ‘hand held camera’ version of the Second Life experience – I just haven’t gotten there yet. Any tips from Second Life gurus welcome!

Clustering Data: Generating Organization from the Ground Up

[Image: Flickr: water tag clusters]

My trip to the 2008 Information Architecture Summit (IA Summit) down in Miami has me thinking a lot about helping people find information. In this post I am going to examine clustering data.

Flickr Tag Clusters
Tag clusters are not new on Flickr – they were announced way back in August of 2005. The best way to understand tag clusters is to look at a few. Some of my favorites are the water clusters (shown in the image above). From this page you can view the reflection/nature/green cluster, the sky/lake/river cluster, the blue/beach/sun cluster or the sea/sand/waves cluster.

So what is going on here? Basically Flickr is analyzing groupings of tags assigned to Flickr images and identifying common clusters of tags. In our water example above – they found four different sets of tags that occurred together and distinctly apart from other sets of tags. The proof is in the pudding – the groupings make sense. They get at very subtle differences even though the mass of data being analyzed is from many different individuals with many different perspectives.
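Flickr has not published the details of how it builds these clusters, so the following is only a minimal sketch of the general idea, assuming a simple co-occurrence approach: take every photo carrying a seed tag, count how often each pair of its other tags appear together, and treat groups of strongly linked tags as clusters. The function name `tag_clusters`, the `min_shared` threshold and the sample photos are all made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def tag_clusters(photos, seed, min_shared=2):
    """Group the tags that appear alongside `seed` into clusters.

    Two tags are linked when they appear together on at least
    `min_shared` photos that also carry the seed tag. Each connected
    group of linked tags becomes one cluster.
    """
    # Count how often each pair of companion tags shares a photo.
    pair_counts = defaultdict(int)
    for tags in photos:
        if seed not in tags:
            continue
        companions = sorted(set(tags) - {seed})
        for a, b in combinations(companions, 2):
            pair_counts[(a, b)] += 1

    # Link the tags whose pair count clears the threshold.
    links = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_shared:
            links[a].add(b)
            links[b].add(a)

    # Walk the links: every connected group of tags is one cluster.
    clusters, seen = [], set()
    for start in sorted(links):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            tag = stack.pop()
            if tag in group:
                continue
            group.add(tag)
            stack.extend(links[tag] - group)
        seen |= group
        clusters.append(sorted(group))
    return clusters


# Hypothetical sample data: each set holds the tags on one photo.
photos = [
    {"water", "sea", "sand", "waves"},
    {"water", "sea", "waves"},
    {"water", "sand", "sea"},
    {"water", "reflection", "nature", "green"},
    {"water", "nature", "green"},
    {"water", "reflection", "green"},
    {"sky", "clouds"},
]
clusters = tag_clusters(photos, "water")
print(clusters)
```

On this toy data the water tag splits into a green/nature/reflection group and a sand/sea/waves group, which is the flavor of result the Flickr page shows. A real system would need far more data and a smarter measure of similarity than a raw count.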

Tag clusters are very powerful and quite different from tag clouds. Tag clouds, by their nature, are a blunt instrument. They only show you the most popular tags. Take a look at the tag cloud for the Library of Congress photostream on Flickr. I do learn something from this. I get a sense of the broad brush topics, time periods and locations. But if you look at the full list of Library of Congress Flickr tags you see what a small percentage the top 150 really are (and yes.. that page does take a while to load). Who else is now itching to ask Flickr to generate clusters within the LOC tag set?

Steve.Museum
Another example of cultural heritage images being tagged is the Steve Museum Art Museum Social Tagging Project which lets individuals tag objects from museums via Steve Tagger. It resembles the Library of Congress on Flickr project in that it includes existing metadata with each image and permits users to add any tags they deem appropriate. I think it would be fascinating to contrast the traffic of image taggers on Steve.Museum vs Flickr for a common set of images. Is it better to build a custom interface that users must seek out but where you have complete control over the user experience and collected data? Or is it better to put images in the already existing path of users familiar with tagging images? I have no answers of course. All I know is I wish I could see the tag clusters one could generate off the Steve.Museum tag database. Perhaps someday we will!

Del.icio.us Tags
[Image: del.icio.us related tags]

Del.icio.us, a web service for storing and tagging your bookmarks online, supports what they call ‘related tags’ and ‘tag bundles’. If you view the page for the tag ‘archives’ – you will see to the far right a list of related tags like those shown in the image here. What is interesting is that if I look at my own personal tag page for archives I see a much longer list of related tags (big surprise that I have a lot of links tagged archives!) and I am given the option of selecting additional tags to filter my list of links via a combination of tags.

Del.icio.us’s ‘tag bundles’ let me create my own named groupings of tags – but I must assemble these groups manually rather than have them generated or suggested. On the plus side, Del.icio.us is very open about publishing its data via APIs and therefore supporting third party tools. I think my favorite off that list for now has to be MySQLicious which mirrors your del.icio.us bookmarks into a MySQL database. Once those tags are in a database, all I need are the right queries to generate the clusters I want to see.
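To give a feel for what one of those queries might look like, here is a small sketch that finds the tags most often used alongside ‘archives’. It makes several assumptions: it uses Python's built-in SQLite module as a stand-in for MySQL, and the `bookmark_tags` table with one row per (url, tag) pair is a hypothetical schema, not the one MySQLicious actually creates. The sample bookmarks are invented.

```python
import sqlite3

# In-memory SQLite database standing in for the mirrored bookmarks.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per (bookmark URL, tag) pair.
conn.execute("CREATE TABLE bookmark_tags (url TEXT, tag TEXT)")
rows = [
    ("http://example.org/a", "archives"),
    ("http://example.org/a", "digitization"),
    ("http://example.org/a", "metadata"),
    ("http://example.org/b", "archives"),
    ("http://example.org/b", "digitization"),
    ("http://example.org/c", "archives"),
    ("http://example.org/c", "metadata"),
    ("http://example.org/c", "tagging"),
    ("http://example.org/d", "tagging"),
    ("http://example.org/d", "flickr"),
]
conn.executemany("INSERT INTO bookmark_tags VALUES (?, ?)", rows)

# Self-join: for every bookmark tagged with the seed tag, collect its
# other tags and count how many bookmarks each one shares with the seed.
related = conn.execute(
    """
    SELECT other.tag, COUNT(*) AS shared
    FROM bookmark_tags AS seed
    JOIN bookmark_tags AS other
      ON other.url = seed.url AND other.tag <> seed.tag
    WHERE seed.tag = ?
    GROUP BY other.tag
    ORDER BY shared DESC, other.tag
    """,
    ("archives",),
).fetchall()

for tag, shared in related:
    print(tag, shared)
```

This only produces a ranked list of related tags. Running the same join for every pair of tags would give the co-occurrence counts needed to go one step further and group tags into clusters.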

Clusty: Clustered Search Results
[Image: Clusty: clusters screen shot]

An example of what this might look like for search results can be seen via the search engine Clusty.com from the folks over at Vivisimo. For example – try a search on the term archives. This is one of those search terms for which general web searching is usually just infuriating. Clusty starts us with the same top 2 results as a search for archives on Google does, but it also gives us a list of clusters on the left sidebar. You can click on any of those clusters to filter the search results.

Those groups don’t look good to you? Click the ‘remix’ link in the upper right hand corner of the cluster list and you get a new list of clusters. In a blog post titled Introducing Clustering 2.0 Vivisimo CEO Raul Valdes-Perez explains what happens when you click remix:

With a single click, remix clustering answers the question: What other, subtler topics are there? It works by clustering again the same search results, but with an added input: ignore the topics that the user just saw. Typically, the user will then see new major topics that didn’t quite make the final cut at the last round, but may still be interesting.

I played for a while.. clicking remix over and over. It was as if it was slicing and dicing the facets for me – picking new common threads to highlight. I liked that I wasn’t stuck with what someone else thought was the right way to group things. It gave me the control to explore other groupings.
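The remix idea quoted above (cluster the same results again, but ignore the topics the user just saw) can be illustrated with a toy sketch. This is not Vivisimo's algorithm, which the quote does not describe in detail. It assumes each search result already comes with a few candidate topic labels and simply ranks topics by how many results mention them. The function `top_topics` and the sample results are hypothetical.

```python
from collections import Counter


def top_topics(results, k=3, ignore=()):
    """Return the k topics covering the most results, skipping `ignore`."""
    counts = Counter(
        topic
        for topics in results
        for topic in set(topics)  # count each topic once per result
        if topic not in ignore
    )
    # Most common first; ties are broken alphabetically so the order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [topic for topic, _ in ranked[:k]]


# Hypothetical candidate topics for six results of a search on "archives".
results = [
    ["national archives", "genealogy"],
    ["national archives", "records"],
    ["internet archive", "web"],
    ["internet archive", "genealogy"],
    ["newspaper archives", "genealogy"],
    ["records", "state archives"],
]

first_round = top_topics(results, k=2)
remixed = top_topics(results, k=2, ignore=first_round)
print(first_round)
print(remixed)
```

The second call surfaces the topics that did not quite make the cut the first time, which matches the experience of clicking remix over and over.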

Ontology is Overrated
Clay Shirky’s talk Ontology is Overrated: Categories, Links and Tags from the spring of 2005 ties a lot of these ideas together in a way that makes a lot of sense to me. I highly recommend you go read it through – but I am going to give away the conclusion here:

It’s all dependent on human context. This is what we’re starting to see with del.icio.us, with Flickr, with systems that are allowing for and aggregating tags. The signal benefit of these systems is that they don’t recreate the structured, hierarchical categorization so often forced onto us by our physical systems. Instead, we’re dealing with a significant break — by letting users tag URLs and then aggregating those tags, we’re going to be able to build alternate organizational systems, systems that, like the Web itself, do a better job of letting individuals create value for one another, often without realizing it.

I currently spend my days working with controlled vocabularies for websites, so please don’t think I am suggesting we throw it all away. And yes, you do need a lot of information to reach the critical mass needed to support the generation of useful clusters. But there is something here that can have a real and positive impact on users of cultural heritage materials actually finding and exploring information. We can’t know how everyone will approach our records. We can’t know what aspects of them they will find interesting.

There Is No Box
Archivists already know that much of the value of records is in the picture they paint as a group. A group of records shares a context, and that context gives the individual records meaning. Librarians and catalogers have long lived in a world of shelves. A book must be assigned a single physical location. Much has been made (both in the Clay Shirky talk and elsewhere) of the idea that on the web there is no shelf.

What if we take the analogy a step further and say that for an online archives there is no box? Of course, just as with books, we still need our metadata telling us who created this record originally (and when and why and which record comes before it and after it) – but picture a world where a single record can be virtually grouped many times over. Computer programs are only going to get better at generating clusters, be they of user assigned tags or search results or other metadata. From where I sit, the opportunity for leveraging clustering to do interesting things with archival records seems very high indeed.

MayDay 2008: Do you have a disaster plan?

[Image: MayDay 2008]

I couldn’t let MayDay 2008 pass without pointing everyone to the amazing annotated list of MayDay resources that the Society of American Archivists (SAA) has made available.

Does your institution have a disaster plan?
If not, the list of resources includes a detailed set of Free Disaster Plan Templates. Today is the perfect day to download one and start planning.

A full disaster plan too overwhelming? SAA also provides a tidy list of easy MayDay activity ideas including:

Create or Update Your Contact Lists
One of the most important elements of disaster response is knowing how to contact critical people – emergency responders, staff, and vendors. Make sure your staff members have an up-to-date list that includes as much contact information as possible: work and home phone numbers (including direct lines at work), mobile phone numbers, work and home email addresses, and any other relevant addresses. Staff at many institutions hit by hurricanes in 2005 discovered that they couldn’t use work email or phone numbers because work systems were completely out of commission; those who had an alternative phone number or email address often could connect.

Make Sure Boxes Are Off the Floor
Any number of causes – a broken pipe, a clogged toilet, fire sprinklers – may result in water in your storage areas. If shelf space is limited, use pallets for clearance. Make sure nothing is on the floor where it can be soaked.

Don’t have precious cultural heritage materials under your care? Okay then, how about you? Do you have a Family Disaster Plan and a Disaster Supplies Kit ready?

Image Credit: Society of American Archivists MayDay 2008 Logo.

Of Pirates, Treasure Chests and Keys: Improving Access to Digitized Materials

[Image: Key to Anything by Stoker Studios (flickr)]

Dan Cohen posted yesterday about what he calls The Pirate Problem. Basically the Pirate Problem can be summed up as “there are ways of acting and thinking that we can’t understand or anticipate.” Why is that a ‘Pirate Problem’? Because a pirate pub opened near his home and rather than folding shortly thereafter due to lack of interest from the ‘very serious professionals’ who populate DC suburbs – the pub was a rousing success due to the pirate aficionados who came out of the woodwork to sing sea shanties and drink grog. This surprising turn of events highlighted for him the fact that there are many ways of acting and thinking (some people even know all the words to sea shanties without needing sheet music).

Dan recently delivered the keynote speech at a workshop at the University of North Carolina at Chapel Hill. The workshop brought together dozens of historians to talk about how the 16 million archival documents of the Southern Historical Collection (SHC) should be put online. He devoted his keynote “to prodding the attendees into recognizing that the future of archives and research might not be like the past” and goes on in his post to explain:

The most memorable response from the audience was from an award-winning historian I know from my graduate school years, who said that during my talk she felt like “a crab being lowered into the warm water of the pot.” Behind the humor was the difficult fact that I was saying that her way of approaching an archive and understanding the past was about to be replaced by techniques that were new, unknown, and slightly scary.

This resistance to thinking in new ways about digital archives and research was reflected in the pre-workshop survey of historians. Extremely tellingly, the historians surveyed wanted the online version of the SHC to be simply a digital reproduction of the physical SHC.

Much of the stress of Dan’s article is on fear of new techniques of analysis. The choppy waters of text mining and pattern recognition threaten to wash away traditional methods of actually reading individual pages and “most historians just want to do their research they way they’ve always done it, by taking one letter out of the box at a time”.

I certainly like the idea of new technologically based ways of analyzing large sets of cultural heritage materials, but I also believe that reading individual letters will always be important. The trick is finding the right letter!

And of course – we still need the context. It isn’t as if when we digitize major collections like the SHC that we are going to scan and OCR each page without regard to which box it came out of. We can’t slice and dice archival records and manuscripts into their component parts to feed into text analysis with no way back to the originals.

I like to imagine the combination of all the new technology (be it digitization, cross collection searching, text mining or pattern recognition) as creating keys to different treasure chests. Humanities scholars are treasure hunters. Some will find their gems through careful reading of individual passages. Others will discover patterns spread across materials now co-existing virtually that before digitization would have been widely separated by space and time. Both methods will benefit from the digitization of materials and the creation of innovative search and text analysis tools. Both still require an understanding of a material’s origin. The importance of context isn’t going anywhere – we still need to know which box the letter came from (and in a perfect world, which page came before and which came after). I want scholars to still be able to read one page from the box – I just want them to be able to do it from home in the middle of the night if they are so inclined with their travel budget no worse for wear.

Dan ties his post together by pointing out that:

… in Chapel Hill I was the pirate with the strange garb and ways of behaving, and this is a good lesson for all boosters of digital methods within the humanities. We need to recognize that the digital humanities represent a scary, rule-breaking, swashbuckling movement for many historians and other scholars.

In my opinion, the core message should be that we just found more locked treasure chests – and for those who are interested, we have some new keys that just might open those locks. I enjoyed the Pirate metaphor (obviously) and I appreciate that there are real issues here relating to strong discomfort with the fast changing landscape of technology, but I have to believe that if we do something that prevents historians from being able to read one letter at a time we are abandoning the treasure chests that are already open for the new ones for which we haven’t yet found the right keys. I am greedy. I want all the treasure!

Image credit: key to anything by Stoker Studios via flickr

Copyright Slider: Quick Easy Access to Copyright Laws and Guidelines

[Image: ALA OITP Copyright Slider]

Thanks to Digitization 101’s post I learned about the Copyright Slider. A creation of the ALA’s Office for Information Technology Policy (OITP) – you can find more official information over on ALA’s Washington Office blog (Let the OITP Copyright Slider Answer Your Questions!) and order one of your own for only a bit more than $5 (less if you order in bulk).

The Copyright Slider lets you answer questions such as (quoting the post linked to above):

  • Is a work in the public domain?
  • Do you need permission to use it?
  • When does copyright expire?

Here is their example of how it might be used:

A library in rural Pennsylvania is digitizing its local historical collection on the copper mining industry in the region. One of the collection texts, Memoirs of a Copper Miner, was published in 1953 and is still protected by copyright. Or is it? Align the black arrow on the slide-chart to materials published between 1923 and 1963 and discover that works originally published in the U.S. between 1923 and 1977 without a copyright symbol are in the public domain! Memoirs of a Copper Miner was published in 1953 and does not have a copyright symbol. Let the digitizing begin!

This looks like a dandy little tool to have in your desk drawer and I plan to order one sometime soon.

My next question is how hard would it be to make a slick flash version of this that could live online and be updated as copyright rules change?
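As a rough sketch of how an updatable online version might be organized, the slider's contents could live in a small table of rules that a lookup function walks through. The example below encodes only the single rule stated in the quoted example above (U.S. works published between 1923 and 1977 without a copyright symbol are in the public domain) and returns "unknown" for everything else. The `Rule` class, the `lookup` function and their field names are all hypothetical, and the real slider covers many more cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    first_year: int   # earliest publication year the rule covers
    last_year: int    # latest publication year the rule covers
    has_notice: bool  # whether the work carries a copyright symbol
    status: str       # what the slider would report


# Only the rule from the quoted example. Updating the tool as the law
# changes would mean editing this table, not the lookup code.
RULES = [
    Rule(first_year=1923, last_year=1977, has_notice=False, status="public domain"),
]


def lookup(year, has_notice, rules=RULES):
    """Return the status of the first matching rule, or "unknown"."""
    for rule in rules:
        if rule.first_year <= year <= rule.last_year and rule.has_notice == has_notice:
            return rule.status
    return "unknown"


# Memoirs of a Copper Miner: published 1953, no copyright symbol.
print(lookup(1953, has_notice=False))
```

Keeping the rules as data rather than burying them in code is the design choice that would make the "updated as copyright rules change" part manageable.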

Image Credit: A cropped version of a photo from the District Dispatch blog post quoted above. 

SAA2008: PDFs of Conference Presentations

I found another reason recently to be excited about the progress of SAA’s online presence. Buried in the ARCHIVES 2008: Archival R/Evolution & Identities Checklist for Presenters are the first tidbits of a plan to provide access to PDF versions of conference presentations on the SAA website.

Send an Electronic Copy of Your Presentation to SAA. The conference organizers would like to offer meeting attendees the opportunity to view presentations after the conference on the SAA 2008 Annual Meeting website (www.archivists.org). If you’ll supply a copy of your presentation, we’ll convert it to a PDF and post it. Please note that by sending SAA a copy of your presentation in electronic format, you grant permission for your presentation to be viewed by all SAA 2008 Annual Meeting attendees.

I am so pleased! I have always wanted access to the presentations – both for those sessions I attend and those I cannot. I have often been that person hovering at the edge of the stage after a panel, waiting to request a soft copy of the presentation.

I do wonder what they mean when they say that the presentations will be “viewable by meeting attendees”. In my heart of hearts I hope they go a step further and let the speakers sign off on these presentations being shared with the world (or at least with all of SAA). I haven’t gone through every Session Page on the SAA 2007 Un-Official Wiki, but I believe that not very many presenters took the opportunity to provide links to soft copies of their presentations. I hope that SAA is more successful on this front.

No matter the choices made relating to immediate access – I see this as a big step forward in the commitment to using technology. I think one of the best ways to learn is through getting your hands dirty. Technology is listed as one of SAA’s strategic priorities. Every choice that SAA makes that encourages their membership to become more tech-savvy is a step towards supporting that priority.

Big Digital Step For SAA: American Archivist Online

[Image: SAA Logo]

The Society of American Archivists has officially launched American Archivist Online (also available via the Members Only page once you log in to archivists.org).

Here are a few key points that caught my eye from the FAQ:

  • Content is available as PDF files with embedded searchable text (one file per article or section of the journal)
  • It is hosted by MetaPress
  • The online version will be produced in parallel with the print version

What issues are online?

Fall/Winter 2000 (Volume 63 – Number 2) through the most recent issue – Fall/Winter 2007. The FAQ reports that additional back issues will be digitized over time.

How is it structured?

Each journal article is a separate PDF file. Talk about a boon to graduate students and archives professors everywhere! Even the front matter is there separated out – perfect for printing and attaching to your article printouts for future reference. Of course, if you are feeling green (and better at reading on screen than I am) you can bookmark them or save them locally for future reference.

Who can access it?

Officially, only members of SAA and individual or institutional subscribers to the journal can access all available issues. In reality, it appears most of the issues are available to everyone. Currently only the Fall/Winter issues of 2005, 2006 & 2007 restrict access to all the content. Even for these issues there is access to some of the articles – such as the Book Reviews section in both the 2005 and 2007 Fall/Winter issues.

The FAQ claims that non-subscribers must pay a fee to print an article – but I don’t see how they will enforce that. When viewing a PDF of an article from the most recent issue I was able to save it to my local desktop and print it without a problem. I am not sure if that is a bug or if that is how it will remain – or if maybe they are talking about official reprints that are sent through the mail?

Other features

  • Try the handy Article Category search links – like this one that shows all the Presidential Addresses.
  • Mark or save articles to your own private lists (if you are logged in)
  • Search the full text – either across the journal or within an individual issue.
  • Subscribe to the RSS feed (which I spotted on the All Issues page). The feed includes the article abstract, category, author and source issue information. Be the first archivist on your block to know the instant the new issue is published online!

Final Thoughts

I think that everyone who heard President Adkins announce at SAA in Chicago that the American Archivist was going online was excited (well.. there was lots of clapping – that is for sure). That announcement was a strong indication to me of SAA’s commitment to improving their online offerings.

Finally seeing it available online is even better – actions speak louder than words.

Image Credit: SAA Logo from http://archivists.org/

ISSUU: Interesting Platform for Online Publishing

Issuu, with the tag line ‘Read the world. Publish the world.’ and pronounced ‘issue’, gives anyone the ability to upload a PDF document and publish it as an online magazine. I am intrigued by the possibilities of using this service to publish digitized archival records – especially those that would lend themselves to a ‘book’ style presentation (thinking here of a ledger or equivalent).

I am not sure I totally understand the implications of the Issuu Terms of service… especially this part:

By distributing or disseminating Uploader Submissions through the Issuu Service, you hereby grant to Issuu a worldwide, non-exclusive, transferable, assignable, fully paid-up, royalty-free, license to host, transfer, display, perform, reproduce, distribute, and otherwise exploit your Uploader Submissions, in any media forms or formats, and through any media channels, now known or hereafter devised, including without limitation, RSS feeds, embeddable functionality, and syndication arrangements in order to distribute, promote or advertise your Uploader Submissions through the Issuu Service.

If I am following that properly, all the rights you are granting to the Issuu Service are only for the purposes of their distribution of your uploaded PDF.

Issuu has a special Copyright FAQ, which in combination with Peter Hirtle’s page on Copyright Term and the Public Domain in the United States, should support those trying to figure out if they can upload what they want to upload without getting into copyright related hot water.

So how is it different from a plain old PDF? Take a look at the embedded Issuu viewer below showing a 1908 copy of The Colonial Book of The Towle Manufacturing Company Silversmiths.

I don’t think this would ever be the way you would want to give online access to digitized records in general – but I do think that this could be a great way to highlight a particularly impressive set or volume of documents. If an archives featured one of these a month on their homepage – would people subscribe to their RSS feed just to see the new one? On the actual page on which I found the above document, Issuu makes it easy to subscribe to the RSS feed for the Issuu author ‘silverlibrary’.

I don’t know why Issuu has decided that I must create an account before I may view document author silverlibrary’s user profile. I would hope that there was an elegant way for visitors to see a group of Issuu documents created by the same author without having to create an account first (or ever).

Want to know what others think? Take a look at Finally, a Web-based PDF Viewer That Does Not Suck (Issuu) over on TechCrunch. One interesting tidbit I picked up from that review is that Issuu is based in Denmark. I wonder what impact that has on which copyright rules apply to the documents uploaded into Issuu.

Want to read more about their vision? Of course they have a press release in the form of an Issuu publication and I have embedded it below. I think my favorite line is that Issuu is intended to be ‘YouTube for Publications’.

I would love to see a highlighted section created for ‘cultural heritage materials’ (or something like that anyway). Take a look around Issuu and let me know what you think. Is this a viable tool for an archives or manuscript collection to use to highlight parts of their collection?

New Skills for a Digital Era: Official Proceedings Now Available

[Image: New Skills for a Digital Era Logo]

From May 31st through June 2nd of 2006, The National Archives, the Arizona State Library and Archives, and the Society of American Archivists hosted a colloquium to consider the question “What are the practical, technical skills that all library and records professionals must have to work with e-books, electronic records, and other digital materials?”. The website for the New Skills for a Digital Era colloquium already includes links to the eleven case studies considered over the course of the three days of discussion as well as a list of additional suggested readings. As mentioned over on The Ten Thousand Year Blog, the pre-print of the proceedings has been available since August, 2007.

As announced in SAA’s online newsletter, the Official Proceedings of the New Skills for a Digital Era Colloquium, edited by Richard Pearce-Moses and Susan E. Davis, is now available for free download. Published under Creative Commons Attribution, this document is 143 pages long and includes all the original case studies. I have a lot of reading to do!

The meat of the proceedings consists of a 32 page ‘Knowledge and Skills Inventory’ and a page and a half of reflections – both co-authored by Richard Pearce-Moses and Susan E. Davis. The Keynote Address by Margaret Hedstrom titled ‘Are We Ready for New Skills Yet?’ is also included.

I am very pleased with how much access has been provided to these materials. These topics are clearly of interest to many beyond the 60 individuals who were able to take part in the original gathering. As an archival studies student it has often been a great source of frustration that so few of the archives related conferences publish proceedings of any kind. It is part of what has driven me to attempt to assemble exhaustive session summaries for those sessions I have personally attended at the past two SAA Annual meetings (see SAA2006 and SAA2007). I think that the Unofficial Conference Wiki for SAA2007 was also a big step in the right direction and I hope it will continue to evolve and improve for the upcoming SAA2008 annual meeting in San Francisco.

The course I elected to take this term is dedicated to studying Communities of Practice. This announcement about the New Skills for a Digital Era’s proceedings has me thinking about the community of practice that seems to currently be taking form across the library, archives and records management communities. I will share more thoughts on this as I sort through them myself.

Finally, a question for anyone reading this post who attended the colloquium: Are you still discussing the case studies with others from that session two years ago? If not, do you wish you were?

Image Credit: The image at the top of this post is from the New Skills for a Digital Era website.