
Month: May 2008

Learn About Wikis on Second Life (May 25th, 2008)

In case you have always wondered how wikis can help archivists, this Sunday (May 25th, 2008) will see archivists gathering in Second Life to answer this question.

  • When: Sunday May 25th, 9pm-10:30pm GMT (5pm-6:30pm EDT)
  • Where: Open Air Auditorium at Cybrary City, Second Life

This sounds like a great way to kill two birds with one stone. If you have been looking for a reason to explore Second Life, or you have been wondering how wikis are being used to benefit archives and special collections (or both!), this event looks like a natural fit.

Learn more about this event via the Second Life Library Project post “How on Virtual Earth can Wikis Help Archivists?”.

In the interest of full disclosure, I admit that I won’t be there. The first (and last) time I tried to explore Second Life I got motion sick after about 15 minutes. I understand that this is not very common, but since I am one of those people who get motion sick watching others play 3D video games, I wasn’t too surprised. I have a theory about trying again one day with a Second Life expert at my side to help me tweak my settings to the least ‘handheld camera’ version of the Second Life experience; I just haven’t gotten there yet. Any tips from Second Life gurus welcome!

Clustering Data: Generating Organization from the Ground Up

[Image: Flickr water tag clusters]
My trip to the 2008 Information Architecture Summit (IA Summit) down in Miami has me thinking a lot about helping people find information. In this post I am going to examine clustering data.

Flickr Tag Clusters
Tag clusters are not new on Flickr – they were announced way back in August of 2005. The best way to understand tag clusters is to look at a few. Some of my favorites are the water clusters (shown in the image above). From this page you can view the reflection/nature/green cluster, the sky/lake/river cluster, the blue/beach/sun cluster or the sea/sand/waves cluster.

So what is going on here? Basically, Flickr is analyzing groupings of tags assigned to Flickr images and identifying common clusters of tags. In our water example above, they found four different sets of tags that occurred together and distinctly apart from other sets of tags. The proof is in the pudding: the groupings make sense. They capture very subtle differences even though the mass of data being analyzed comes from many different individuals with many different perspectives.
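Flickr has not spelled out exactly how it computes tag clusters, so what follows is only a minimal Python sketch of the general idea, built on my own assumptions: two tags are linked when they appear together on enough photos that also carry a seed tag (here ‘water’), and each connected group of linked tags becomes a cluster. The function name `tag_clusters`, the `min_cooccur` threshold and the sample photos are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def tag_clusters(photos, seed, min_cooccur=2):
    """Group the tags that co-occur with `seed` into clusters (hypothetical helper).

    photos: list of tag sets, one per photo (toy data, not real Flickr data).
    Two tags are linked when they appear together on at least `min_cooccur`
    photos that also carry `seed`; each connected group of linked tags is a cluster.
    """
    pair_counts = Counter()
    for tags in photos:
        if seed not in tags:
            continue
        others = sorted(tags - {seed})
        for a, b in combinations(others, 2):
            pair_counts[(a, b)] += 1

    # Build an adjacency list from the sufficiently frequent pairs only.
    neighbors = {}
    for (a, b), n in pair_counts.items():
        if n >= min_cooccur:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)

    # Find connected components with an iterative depth-first search.
    clusters, seen = [], set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            tag = stack.pop()
            if tag in component:
                continue
            component.add(tag)
            stack.extend(neighbors[tag] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


photos = [
    {"water", "sky", "lake", "river"},
    {"water", "sky", "lake"},
    {"water", "lake", "river"},
    {"water", "sea", "sand", "waves"},
    {"water", "sea", "sand"},
    {"water", "sand", "waves"},
    {"water", "sky", "sand"},  # a one-off pairing that falls below the threshold
]
print(tag_clusters(photos, "water"))
# [['lake', 'river', 'sky'], ['sand', 'sea', 'waves']]
```

The single photo tagged both sky and sand does not merge the two groups, because that pairing appears only once. Lowering `min_cooccur` to 1 collapses everything into one cluster, which hints at why clustering needs a lot of data before the groupings become meaningful.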

Tag clusters are very powerful and quite different from tag clouds. Tag clouds, by their nature, are a blunt instrument: they only show you the most popular tags. Take a look at the tag cloud for the Library of Congress (LOC) photostream on Flickr. I do learn something from this. I get a sense of the broad-brush topics, time periods and locations. But if you look at the full list of Library of Congress Flickr tags, you see what a small percentage the top 150 really are (and yes, that page does take a while to load). Who else is now itching to ask Flickr to generate clusters within the LOC tag set?
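To make the ‘blunt instrument’ point concrete, here is a minimal Python sketch of how a basic tag cloud might be built: keep the `top_n` most frequent tags, scale a font-size bucket by count, and drop everything else. The function names, the 1 to 5 size range and the sample counts are hypothetical; this is not Flickr’s code or the Library of Congress’s actual tag data.

```python
from collections import Counter


def tag_cloud(tag_counts, top_n=150, min_size=1, max_size=5):
    """Return {tag: size} for the top_n tags, sizes scaled linearly by count.

    Every tag below the top_n cutoff is simply left out of the cloud.
    """
    top = tag_counts.most_common(top_n)
    if not top:
        return {}
    hi, lo = top[0][1], top[-1][1]
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {
        tag: min_size + round((count - lo) * (max_size - min_size) / span)
        for tag, count in top
    }


def coverage(tag_counts, top_n=150):
    """Fraction of distinct tags that actually appear in a top_n cloud."""
    if not tag_counts:
        return 0.0
    return min(top_n, len(tag_counts)) / len(tag_counts)


# Toy counts for illustration only.
counts = Counter({"portrait": 40, "new york": 25, "1940s": 10,
                  "barn": 2, "lantern": 1, "quilt": 1})
print(tag_cloud(counts, top_n=3))  # {'portrait': 5, 'new york': 3, '1940s': 1}
print(coverage(counts, top_n=3))   # 0.5
```

Here only 3 of the 6 distinct tags make the cloud. With thousands of distinct tags and a 150-tag cutoff, the visible share shrinks to a sliver, and that long tail is exactly what a cloud hides.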

Steve.Museum
Another example of cultural heritage images being tagged is the Steve Museum Art Museum Social Tagging Project which lets individuals tag objects from museums via Steve Tagger. It resembles the Library of Congress on Flickr project in that it includes existing metadata with each image and permits users to add any tags they deem appropriate. I think it would be fascinating to contrast the traffic of image taggers on Steve.Museum vs Flickr for a common set of images. Is it better to build a custom interface that users must seek out but where you have complete control over the user experience and collected data? Or is it better to put images in the already existing path of users familiar with tagging images? I have no answers of course. All I know is I wish I could see the tag clusters one could generate off the Steve.Museum tag database. Perhaps someday we will!

Del.icio.us Tags
[Image: del.icio.us related tags]
Del.icio.us, a web service for storing and tagging your bookmarks online, supports what it calls ‘related tags’ and ‘tag bundles’. If you view the page for the tag ‘archives’, you will see a list of related tags to the far right, like those shown in the image here. What is interesting is that if I look at my own personal tag page for archives, I see a much longer list of related tags (big surprise that I have a lot of links tagged archives!) and I am given the option of selecting additional tags to filter my list of links via a combination of tags.
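How del.icio.us computes its related tags isn’t described here, but a simple co-occurrence count gets at the same idea. This Python sketch filters bookmarks down to those carrying every requested tag (the ‘combination of tags’ filter) and then ranks the other tags on those bookmarks by frequency. The bookmark structure, the function names and the example.org URLs are hypothetical.

```python
from collections import Counter


def filter_by_tags(bookmarks, *required):
    """Keep only bookmarks that carry every tag in `required`."""
    need = set(required)
    return [b for b in bookmarks if need <= b["tags"]]


def related_tags(bookmarks, *required, limit=10):
    """Rank the other tags found on bookmarks matching all `required` tags."""
    counts = Counter()
    for b in filter_by_tags(bookmarks, *required):
        counts.update(b["tags"] - set(required))
    # Most frequent first; ties broken alphabetically so output is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:limit]]


bookmarks = [
    {"url": "https://example.org/a", "tags": {"archives", "digitization", "wiki"}},
    {"url": "https://example.org/b", "tags": {"archives", "digitization"}},
    {"url": "https://example.org/c", "tags": {"archives", "secondlife"}},
    {"url": "https://example.org/d", "tags": {"wiki", "software"}},
]
print(related_tags(bookmarks, "archives"))
# ['digitization', 'secondlife', 'wiki']
print([b["url"] for b in filter_by_tags(bookmarks, "archives", "digitization")])
# ['https://example.org/a', 'https://example.org/b']
```

Adding a second tag narrows the set of bookmarks, which in turn narrows the related-tag list: the same drill-down a personal tag page offers.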

Del.icio.us’s ‘tag bundles’ let me create my own named groupings of tags, but I must assemble these groups manually rather than have them generated or suggested. On the plus side, Del.icio.us is very open about publishing its data via APIs and therefore supports third-party tools. I think my favorite from that list for now has to be MySQLicious, which mirrors your del.icio.us bookmarks into a MySQL database. Once those tags are in a database, all you need are the right queries to generate the clusters I want to see.
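As a sketch of what ‘the right queries’ might look like, the SQL below counts how often each pair of tags appears on the same bookmark, which is the raw material for building clusters. It uses Python’s built-in sqlite3 module as a stand-in for MySQL, and the two-table layout (`bookmark`, `bookmark_tag`) is hypothetical, not MySQLicious’s actual schema; the self-join itself is ordinary SQL.

```python
import sqlite3

# Hypothetical two-table layout; MySQLicious's real schema may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bookmark (id INTEGER PRIMARY KEY, url TEXT NOT NULL);
CREATE TABLE bookmark_tag (
    bookmark_id INTEGER REFERENCES bookmark(id),
    tag TEXT NOT NULL
);
""")

# Toy bookmarks for illustration only.
rows = {
    1: ("https://example.org/a", ["archives", "digitization", "wiki"]),
    2: ("https://example.org/b", ["archives", "digitization"]),
    3: ("https://example.org/c", ["archives", "secondlife"]),
}
for bid, (url, tags) in rows.items():
    conn.execute("INSERT INTO bookmark (id, url) VALUES (?, ?)", (bid, url))
    conn.executemany("INSERT INTO bookmark_tag (bookmark_id, tag) VALUES (?, ?)",
                     [(bid, t) for t in tags])

# Count how often each pair of tags lands on the same bookmark.
# The a.tag < b.tag condition counts each unordered pair exactly once.
PAIR_QUERY = """
SELECT a.tag, b.tag, COUNT(*) AS together
FROM bookmark_tag AS a
JOIN bookmark_tag AS b
  ON a.bookmark_id = b.bookmark_id AND a.tag < b.tag
GROUP BY a.tag, b.tag
HAVING COUNT(*) >= ?
ORDER BY together DESC, a.tag, b.tag
"""
pairs = conn.execute(PAIR_QUERY, (2,)).fetchall()
print(pairs)  # [('archives', 'digitization', 2)]
```

The `HAVING` threshold plays the same role as a minimum co-occurrence count: raise it and only the strongest pairings survive to seed a cluster.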

Clusty: Clustered Search Results
[Image: Clusty clusters screen shot]
An example of what this might look like for search results can be seen via the search engine Clusty.com from the folks over at Vivisimo. For example, try a search on the term archives. This is one of those search terms for which general web searching is usually just infuriating. Clusty starts us with the same top two results that a Google search for archives returns, but it also gives us a list of clusters in the left sidebar. You can click on any of those clusters to filter the search results.

Those groups don’t look good to you? Click the ‘remix’ link in the upper right-hand corner of the cluster list and you get a new list of clusters. In a blog post titled “Introducing Clustering 2.0”, Vivisimo CEO Raul Valdes-Perez explains what happens when you click remix:

With a single click, remix clustering answers the question: What other, subtler topics are there? It works by clustering again the same search results, but with an added input: ignore the topics that the user just saw. Typically, the user will then see new major topics that didn’t quite make the final cut at the last round, but may still be interesting.

I played for a while, clicking remix over and over. It was as if it was slicing and dicing the facets for me, picking new common threads to highlight. I liked that I wasn’t stuck with what someone else thought was the right way to group things. It gave me the control to explore other groupings.
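The quoted description of remix is high level, so this is a deliberately crude Python stand-in that mimics only the ‘added input’ it describes: label the same search results again, but ignore the topics the user just saw. Here a ‘topic’ is simply a term shared by at least two results, which is far simpler than real document clustering. The function names, term sets and made-up result titles are all hypothetical, not Vivisimo’s method.

```python
from collections import Counter


def cluster_topics(results, k=3, ignore=frozenset()):
    """Label up to k topics for a result set, skipping any topic in `ignore`.

    results: list of (title, set_of_terms). A topic is just a term shared by
    at least two results; each topic maps to the titles that contain it.
    """
    counts = Counter()
    for _, terms in results:
        counts.update(terms - ignore)
    # Most widely shared terms first; ties broken alphabetically.
    ranked = sorted((t for t, n in counts.items() if n >= 2),
                    key=lambda t: (-counts[t], t))
    return {t: [title for title, terms in results if t in terms]
            for t in ranked[:k]}


def remix(results, previous, k=3):
    """Cluster the same results again, ignoring the topics just shown."""
    return cluster_topics(results, k, ignore=frozenset(previous))


# Made-up search results for illustration only.
results = [
    ("Government records portal", {"government", "records", "history"}),
    ("Web history snapshots", {"web", "digital", "history"}),
    ("Mailing list archives", {"email", "web", "software"}),
    ("State archives guide", {"government", "records", "genealogy"}),
    ("Family history archives", {"genealogy", "history", "records"}),
    ("Web archiving tools", {"web", "software", "digital"}),
]
first = cluster_topics(results)
second = remix(results, first)
print(list(first))   # ['history', 'records', 'web']
print(list(second))  # ['digital', 'genealogy', 'government']
```

Because the results never change, each remix can only surface themes that were already present but lost out in the previous round, which matches the behavior described in the quote.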

Ontology is Overrated
Clay Shirky’s talk Ontology is Overrated: Categories, Links and Tags from the spring of 2005 ties a lot of these ideas together in a way that makes a lot of sense to me. I highly recommend you go read it through – but I am going to give away the conclusion here:

It’s all dependent on human context. This is what we’re starting to see with del.icio.us, with Flickr, with systems that are allowing for and aggregating tags. The signal benefit of these systems is that they don’t recreate the structured, hierarchical categorization so often forced onto us by our physical systems. Instead, we’re dealing with a significant break — by letting users tag URLs and then aggregating those tags, we’re going to be able to build alternate organizational systems, systems that, like the Web itself, do a better job of letting individuals create value for one another, often without realizing it.

I currently spend my days working with controlled vocabularies for websites, so please don’t think I am suggesting we throw controlled vocabularies away. And yes, you do need a lot of information to reach the critical mass needed to support the generation of useful clusters. But there is something here that can have a real and positive impact on how users of cultural heritage materials find and explore information. We can’t know how everyone will approach our records. We can’t know what aspects of them they will find interesting.

There Is No Box
Archivists already know that much of the value of records is in the picture they paint as a group. A group of records shares a context, and that context gives the individual records meaning. Librarians and catalogers have long lived in a world of shelves: a book must be assigned a single physical location. Much has been made (both in the Clay Shirky talk and elsewhere) of the idea that on the web there is no shelf.

What if we take the analogy a step further and say that for an online archives there is no box? Of course, just as with books, we still need our metadata telling us who created this record originally (and when and why and which record comes before it and after it), but picture a world where a single record can be virtually grouped many times over. Computer programs are only going to get better at generating clusters, be they of user-assigned tags, search results or other metadata. From where I sit, the opportunity for leveraging clustering to do interesting things with archival records seems very high indeed.
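One way to picture ‘no box’ as a data structure: each record keeps its fixed provenance metadata (creator, date, and its neighbors in the original order), while any number of named virtual groups simply point at record IDs. This Python sketch is hypothetical throughout; `Record`, `VirtualArchive` and the sample records are an illustration, not the model of any real archival system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Provenance that never changes, however the record is regrouped."""
    record_id: str
    creator: str
    date: str
    previous_id: Optional[str] = None  # neighbor in the original order
    next_id: Optional[str] = None


@dataclass
class VirtualArchive:
    records: dict = field(default_factory=dict)  # record_id -> Record
    groups: dict = field(default_factory=dict)   # group name -> set of record_ids

    def add(self, record):
        self.records[record.record_id] = record

    def group(self, name, *record_ids):
        """Add records to a named virtual group; the records themselves are untouched."""
        self.groups.setdefault(name, set()).update(record_ids)

    def groups_for(self, record_id):
        """Every virtual group a record belongs to, sorted by name."""
        return sorted(n for n, ids in self.groups.items() if record_id in ids)


archive = VirtualArchive()
archive.add(Record("r1", "Town Clerk", "1902-03-01", next_id="r2"))
archive.add(Record("r2", "Town Clerk", "1902-03-08", previous_id="r1"))
archive.group("user tag: floods", "r1")
archive.group("search cluster: water rights", "r1", "r2")
archive.group("user tag: town meetings", "r2")
print(archive.groups_for("r1"))
# ['search cluster: water rights', 'user tag: floods']
print(archive.records["r1"].next_id)  # r2 (original order is preserved)
```

Keeping provenance in a frozen record and groupings in a separate index means new clusters, whether from user tags or search results, can be added or discarded without ever disturbing the original order.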

MayDay 2008: Do you have a disaster plan?

[Image: MayDay 2008 logo]
I couldn’t let MayDay 2008 pass without pointing everyone to the amazing annotated list of MayDay resources that the Society of American Archivists (SAA) has made available.

Does your institution have a disaster plan?
If not, the list of resources includes a detailed set of Free Disaster Plan Templates. Today is the perfect day to download one and start planning.

A full disaster plan too overwhelming? SAA also provides a tidy list of easy MayDay activity ideas including:

Create or Update Your Contact Lists
One of the most important elements of disaster response is knowing how to contact critical people – emergency responders, staff, and vendors. Make sure your staff members have an up-to-date list that includes as much contact information as possible: work and home phone numbers (including direct lines at work), mobile phone numbers, work and home email addresses, and any other relevant addresses. Staff at many institutions hit by hurricanes in 2005 discovered that they couldn’t use work email or phone numbers because work systems were completely out of commission; those who had an alternative phone number or email address often could connect.

Make Sure Boxes Are Off the Floor
Any number of causes – a broken pipe, a clogged toilet, fire sprinklers – may result in water in your storage areas. If shelf space is limited, use pallets for clearance. Make sure nothing is on the floor where it can be soaked.

Don’t have precious cultural heritage materials under your care? Okay then, how about you? Do you have a Family Disaster Plan and a Disaster Supplies Kit ready?

Image Credit: Society of American Archivists MayDay 2008 Logo.