
Year: 2008

SAA Wiki 2008: Create an account and add your voice!

As of this writing, seventy-three individuals have created accounts on the UnOfficial Wiki of the 2008 SAA Annual Meeting in San Francisco. Where are the rest of you? For those of you wondering why you should create an account, here are some reasons to join the wiki fun:

Not presenting? There are still plenty of ways you can use the wiki to improve your conference experience.

Not going to the conference? Look through the Introductions page and take the opportunity to reconnect with your colleagues. The annual meeting gives everyone a chance to focus on the latest thoughts and activities in the archives community – no matter where you are. See a session you wish you could attend? Add a note to that session’s page – let the presenters and those who might blog the session know about your interest.

Have questions or need help? Drop me a message via my contact page and I will lend a hand. Remember: wikis are very sturdy, and you won’t break this one!

Online Interactive U.S. Copyright Slider


Remember when I posted about the Copyright Slider: Quick Easy Access to Copyright Laws and Guidelines? This was my last line in that post:

My next question is how hard would it be to make a slick flash version of this that could live online and be updated as copyright rules change?

Well, thanks to Digitization 101’s post “A digital version of the Copyright Slider”, I discovered that exactly what I wished for now exists. Go take the Digital Copyright Slider for a spin.

The interface is clear and simple – they did a great job of taking advantage of the interactive medium to do things they couldn’t do on the paper slider. If it won’t disturb your neighbors, turn up the volume to hear the satisfying click each time you move the slider to a new scenario. Make sure you click on some of the asterisks (*) to see more detailed information. Take note of the advice regarding more complicated scenarios, as well as the links directly to documents detailing specific copyright laws.

I love that it has been licensed under the Creative Commons Attribution-Noncommercial-Share Alike license. The creators have included their contact information, along with the suggestion that other institutions could host custom copies of the slider with their own copyright research contacts. The only downside I see is that if US copyright law changes, it will take time for updates to a central copy of the slider to propagate to local customized copies.

The final question is how fast they can update the slider in the event of changes to copyright law – but we will have to wait on changes to the US copyright landscape before we can find that out!

Image Credit: Image above is taken directly from a screen shot of the Digital Copyright Slider.

Group Looking for Accreditation of Archival Education by SAA

Just in case you haven’t seen the postings elsewhere – a group of archivists and archivists-in-training is gathering support for their Request to Appoint a Task Force to Examine the Feasibility of SAA Accreditation of Graduate Archival Education Programs. The plan is to submit this as an item for consideration by the Council of the Society of American Archivists at the annual meeting in San Francisco in August. The full text of the request is online, along with this recent update:

Update on July 29: We’ve received more time to collect feedback and support. If you are interested in signing on in support of this submission to SAA Council for their August 2008 meeting, please send an email to Christine Di Bella at cdibella@gmail.com by August 4, 2008. Please indicate whether you are an SAA member in your message.

Sound like something you might want to throw your support behind? Take note of that fast-approaching deadline and go take a look at the full text of the request.

Dipity: Easy Hosted Timelines

I discovered Dipity via the Reuters article An open-source timeline of the virtual world. The article discusses the creation of a Virtual Worlds Timeline on the Dipity website. Dipity lets anyone create an account and start building timelines. In the case of the Virtual Worlds Timeline, the creator chose to permit others to collaborate on the timeline. Dipity also provides four ways of viewing any timeline: a classic left-to-right scrolling view, a flipbook, a list and a map.

I chose to experiment by creating a timeline for Spellbound Blog. Dipity made this very easy – I just selected WordPress and provided my blog’s URL. This was supposed to grab my 20 most recent posts – but it seems to have taken 10 instead. I tried to provide a username/password so that Dipity could pull ‘more’ of my posts (they didn’t say how many – maybe all of them?). I couldn’t get it to work as of this writing – but if I figure it out you will see many more than 10 posts.

I particularly like the way they use the images I include in my posts in the various views. I also appreciate that you can read the full posts in-place without leaving the timeline interface. I assume this is because I publish my full articles to my RSS feed. It was also interesting to note that posts that mentioned a specific location put a marker on a map – both within the single post ‘event’ as well as the full map view.

Dipity also supports the streamlined addition of many other sources such as Flickr, Picasa, YouTube, Vimeo, Blogger, Tumblr, Pandora, Twitter and any RSS feed. They have also created some neat mashups. TimeTube uses your supplied phrase to query YouTube and generates a timeline based on the video creation dates. Tickr lets you generate an interactive timeline based on a keyword or user search of Flickr.

Why should archivists care? I always perk up anytime a new web service appears that makes it easy to present time- and location-sensitive information. I wrote a while ago about MIT’s SIMILE project and I like their Timeline software, but in some ways hosted services like Dipity throw the net wider. I particularly appreciate the opportunity for virtual collaboration that Dipity provides. Imagine if every online archives exhibit included a Dipity timeline! Dipity provides embed code for all of its timelines, so it should be easy both to feature a timeline within an online exhibit and to use it to attract a broader audience to your website.

There has been discussion in the past about creating custom Google Maps to show off archival records in new and different ways. During THATCamp there was a lot of enthusiasm for timelines and maps as two of the most accessible types of visualization. Anchoring information in time and/or location gives people a predictable way to approach new information.

Most of my initial thoughts about how archives could use Dipity related to individual collections and exhibits, but what if an archives created one of these timelines and added an entry for every one of its collections? The map view could be used when a collection comes from a single location. The timeline could let users see at a glance which time periods are the focus of collections within that archives. Each entry could include a link to the online finding aid for that collection or record group.

Dipity is still working out the kinks in some of its services, but if this sounds at all interesting I encourage you to go take a look at a few fun examples:

And finally, I have embedded the Internet Memes timeline below to give you a feel for what this looks like. Try clicking on any of the events that include a little film icon at the bottom edge and see how you can view the video right in place:

Image Credit: I found and ‘borrowed’ the Dipity image above from Dipity’s About page.

Flickr Terms of Service, Unwritten Guidelines and Safety Levels

As more cultural heritage institutions add photos to Flickr (such as these sets added by the Smithsonian), an AP article discussing freedom of expression in online public spaces identifies some issues that deserve attention. In ‘Public’ online spaces don’t carry speech, rights, Anick Jesdanun highlights a number of scenarios in which service providers (such as the Yahoo!-owned Flickr) clash with their users, including this one (italics my own):

Dutch photographer Maarten Dors met the limits of free speech at Yahoo Inc.’s photo-sharing service, Flickr, when he posted an image of an early-adolescent boy with disheveled hair and a ragged T-shirt, staring blankly with a lit cigarette in his mouth.

Without prior notice, Yahoo deleted the photo on grounds it violated an unwritten ban on depicting children smoking. Dors eventually convinced a Yahoo manager that – far from promoting smoking – the photo had value as a statement on poverty and street life in Romania. Yet another employee deleted it again a few months later.

This image on Flickr gives more details about the photo being removed – and this is the reinstated photo in question. The article points out “Service providers write their own rules for users worldwide and set foreign policy when they cooperate with regimes like China. They serve as prosecutor, judge and jury in handling disputes behind closed doors.” It makes me wonder if the ‘unwritten guidelines’ are applied evenly across Flickr. With the creation of The Commons area, it would be easy to create two standards – one for the general public and another for ‘blessed’ institutions. Images that are acceptable from the Brooklyn Museum (consider this set of Behind The Scenes photos of the Ron Mueck exhibition) might not be accepted from the average person. In my research I discovered a set of Public Domain photos from the National Archives. Some of the photos included in this set are historically valuable images that I would not necessarily want a child to see. Does this mean they shouldn’t be on Flickr? I don’t think so, but that certainly isn’t up to me.

Here are the relevant passages of the Yahoo! Terms of Service:

You agree to not use the Service to:

  1. upload, post, email, transmit or otherwise make available any Content that is unlawful, harmful, threatening, abusive, harassing, tortious, defamatory, vulgar, obscene, libelous, invasive of another’s privacy, hateful, or racially, ethnically or otherwise objectionable;
  2. harm minors in any way;

You acknowledge that Yahoo! may or may not pre-screen Content, but that Yahoo! and its designees shall have the right (but not the obligation) in their sole discretion to pre-screen, refuse, or remove any Content that is available via the Service. Without limiting the foregoing, Yahoo! and its designees shall have the right to remove any Content that violates the TOS or is otherwise objectionable.

That bit about ‘otherwise objectionable’ could be used to cover removal of anything. Being subject to the terms of service of Internet service providers is nothing new, but as archives, libraries and other cultural heritage institutions look for ways to increase their revenue streams and explore innovative ways to bring more eyes to their materials, it will become more important to understand these guidelines.

I understand (as the author of the article that inspired this post also points out) that Yahoo! is a business. Their priorities are not always going to be the same as those of the National Archives or the Brooklyn Museum. There are definitely images from history and the world of art that are only appropriate for adults, but isn’t that what Flickr’s content filter feature, named SafeSearch, is all about? These are the three ‘safety levels’ available on Flickr:

  • Safe – Content suitable for a global, public audience
  • Moderate – If you’re not sure whether your content is suitable for a global, public audience but you think that it doesn’t need to be restricted per se, this category is for you
  • Restricted – This is content you probably wouldn’t show to your mum, and definitely shouldn’t be seen by kids

It is interesting that Flickr has its own separate list of Community Guidelines, independent of Yahoo!’s terms of service. This is the passage from these guidelines about filtering content:

Take the opportunity to filter your content responsibly. If you would hesitate to show your photos or videos to a child, your mum, or Uncle Bob, that means it needs to be filtered. So, ask yourself that question as you upload your content and moderate accordingly. If you don’t, it’s likely that one of two things will happen. Your account will be reviewed then either moderated or terminated by Flickr staff.

I am still not sure what safety level I would use for a photo showing rows of dead in a concentration camp. I guess given the choices, ‘restricted’ is the best option – but that still doesn’t sit right with me somehow. I did an advanced Flickr search for ‘concentration camp’ with SafeSearch on – and those photos are not currently being marked as restricted. Who is it that we expect to be protecting using SafeSearch? From Flickr’s definition above it is supposed to at least be kids (and maybe your mom and Uncle Bob).

I think the question of the moment is how to know which images are appropriate to upload if some of the guidelines are unwritten. Flickr is a community and understanding the community is essential to success within that community. Once you believe your images are appropriate to include, then you must decide the right ‘safety level’. It is not clear to me how to tell the difference between an image that is not appropriate to be uploaded to Flickr and an image that is okay but needs to be marked with a safety level of ‘restricted’. I am very interested to see how this category of ‘appropriate but restricted’ evolves. For now, I am going to keep a watch on how the Flickr Commons grows and what range of content is included. The final answer for some of these images may be to only provide them via the institutions’ web sites rather than via service providers such as Flickr.

Image credit: Free Click by fikra (Sami Ben Gharbia) via Flickr

SAA2008: The Wiki is Online

As you may have heard elsewhere, the wiki to support the 2008 annual meeting of the Society of American Archivists is now online and waiting for your contributions.

Check out (or add to) the pages with Maps of San Francisco, hotel information and details about public transport. Look for a roommate or a rideshare. Learn about or organize an unofficial event.

New to wikis? Well, there is a page just for you!

New to SAA Conferences? Check out the SAA First-Timer Tips. Been a million times? Well then go make sure that the First-Timer Tips page includes everything it should!

What I mention above just scratches the surface of what is on the wiki… and remember, the goal isn’t only to read but also to update, add to and correct the wiki. Because a full history of every page is kept, nothing you do is beyond repair: we can easily roll back to a prior version. I am also offering help for anyone new to (and nervous about) wikis. Either post a question on my profile page on the wiki or send me a message via my contact page.

THATCamp 2008: Day 1 Dork Short Lightning Talks

During lunch on the first day of THATCamp, people volunteered to give lightning talks they called ‘Dork Shorts’. As we ate, a steady stream of folks paraded up to the podium and gave elevator-pitch-length demos. These are the projects for which I managed to type URLs and some other info into my laptop. If you are looking for examples of inspirational and innovative work at the intersection of technology and the humanities, these are a great place to start!

Have links to projects I missed? Please add them in the comments below.

Image credit: Lightning by thenss (Christopher Cacho) via flickr

THATCamp 2008: Text Mining and the Persian Carpet Effect

I attended a THATCamp session on Text Mining. There were between 15 and 20 people in attendance. I have done my best to attribute ideas to their originators wherever possible, but please forgive the fact that I did not catch the names of everyone who was part of this session.

What Is Text Mining?

Text mining is an umbrella phrase that covers many different techniques and types of tools.

The CHNM NEH-funded text mining initiative defined text mining as needing to support these three research functions:

  • Locating or finding: improving on search
  • Extraction: once you find a set of interesting documents, how do you extract information in new (and hopefully faster) ways? How do you pull data from unstructured bulk into structured sets?
  • Analysis: support analyzing the data, discovery of patterns, answering questions

The group discussed that there were both macro and micro aspects to text mining. Sometimes you are trying to explore a collection. Sometimes you are trying to examine a single document in great detail. Still other situations call for using text mining to generate automated classification of content using established vocabularies. Different kinds of tools will be important during different phases of research.

Projects, Tools, Examples & Cool Ideas

Andrea Eastman-Mullins, from Alexander Street Press, mentioned the University of Chicago’s ARTFL Project and these two tools:

  • PhiloLogic: An XML/SGML based full-text search, retrieval and analysis tool
  • PhiloMine: an extension being developed for PhiloLogic to provide support for “a variety of machine learning, text mining, and document clustering tasks”.

Dan Cohen directed us to his post about Mapping What Americans Did on September 11 and to Twistori which text mines Twitter.

Other Projects & Examples:

Some neat ideas that were mentioned for ways text mining could be used (lots of other great ideas were discussed – these are the two that made it into my notes):

  • Train a tool with collections of content from individual time periods, then use the tool to help identify the originating time period of new documents. The same setup could also be used to identify shifts in textual patterns by comparing large data sets from specific date ranges.
  • If you have a tool that has learned to classify certain types of content well, watch for when it breaks: this can point you toward interesting things to investigate.
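To make these two ideas a little more concrete, here is a minimal sketch in Python. Nobody in the session named a specific algorithm; I am assuming a simple multinomial naive Bayes classifier purely for illustration. The class name `PeriodClassifier`, the `min_margin` threshold and the tiny training snippets are all hypothetical toy examples, not data from any real collection. The "uncertain" flag is one possible way to notice when the classifier "breaks": the top two periods score too closely to call.

```python
import math
from collections import Counter, defaultdict

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase, split on whitespace and strip basic punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


class PeriodClassifier:
    """Hypothetical multinomial naive Bayes that guesses a text's time period."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # period -> word -> count
        self.doc_counts = Counter()              # period -> number of training docs
        self.vocab = set()

    def train(self, text, period):
        words = tokenize(text)
        self.word_counts[period].update(words)
        self.doc_counts[period] += 1
        self.vocab.update(words)

    def scores(self, text):
        """Log-probability score per period, with add-one (Laplace) smoothing."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        result = {}
        for period, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[period] / total_docs)
            for w in tokenize(text):
                score += math.log((counts[w] + 1) / (total_words + vocab_size))
            result[period] = score
        return result

    def predict(self, text, min_margin=1.0):
        """Return (best period, uncertain), where uncertain means the top two are close."""
        ranked = sorted(self.scores(text).items(), key=lambda kv: kv[1], reverse=True)
        best_period, best_score = ranked[0]
        uncertain = len(ranked) > 1 and best_score - ranked[1][1] < min_margin
        return best_period, uncertain


# Toy training data, invented for illustration only.
clf = PeriodClassifier()
clf.train("the telegraph office sent a dispatch by steamship", "1860s")
clf.train("the steamship carried letters and a telegraph", "1860s")
clf.train("she posted a tweet and updated her blog", "2000s")
clf.train("the blog linked to a wiki and a tweet", "2000s")

print(clf.predict("a new blog post about the wiki"))  # ('2000s', False)
print(clf.predict("a letter"))  # low margin, so flagged as uncertain
```

With real collections you would want far more training text per period and a better tokenizer, but the shape of the idea is the same: the confident guesses help date new documents, and the flagged, low-margin ones are the trails worth following up by hand.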

Barriers to Text Mining

All of the following were touched upon as being barriers or challenges to text mining:

  • access to raw text in gated collections (i.e., collections that require payment for access) such as JSTOR, Project MUSE and others
  • tools that are too difficult for non-programmers to use
  • questions relating to the validity of text mining as a technique for drawing legitimate conclusions

Next Steps

These ideas were put forward as important for moving the field of text mining in the humanities forward:

  • develop and share best practices for use when cultural heritage institutions make digitization and transcription deals with corporate entities
  • create frameworks that enable individuals to reproduce the work of others and provide transparency into the assumptions behind the research
  • create tools and techniques that smooth the path from digitization to transcription
  • develop focused, easy-to-use tools that bridge the gap between computer programmers and humanities researchers

My thoughts
During the session I drew a parallel with archaeology, where information can be gleaned from the air that cannot be seen on the ground. I discovered that this phenomenon has a name:

“Archaeologists call it the Persian carpet effect. Imagine you’re a mouse running across an elaborately decorated rug. The ground would merely be a blur of shapes and colors. You could spend your life going back and forth, studying an inch at a time, and never see the patterns. Like a mouse on a carpet, an archaeologist painstakingly excavating a site might easily miss the whole for the parts.” from Airborne Archaeology, Smithsonian magazine, December 2005 (emphasis mine)

While I don’t see any coffee-table books in the near future of text mining (such as The Past from Above: Aerial Photographs of Archaeological Sites), I do think this idea captures the promise we have before us in text mining tools. Everyone in our session seemed to agree that these tools will empower people to do things that no individual could have done by hand in a lifetime. The digital world is producing terabytes of text. We will need text mining tools just to find our way in this blizzard of content. It is all well and good to know that each snowflake is unique, but tell that to the 21st-century historian soon to be buried under the weight of blogs, tweets, wikis and all other manner of web content.

Image credit: Drift of Harrachov Mine by alarch via flickr

As is the case with all my session summaries from THATCamp 2008, please accept my apologies in advance for any cases in which I misquote, overly simplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via my contact form.