Category: software

Online Interactive U.S. Copyright Slider

Remember when I posted about the Copyright Slider: Quick Easy Access to Copyright Laws and Guidelines? This was my last line in that post:

My next question is how hard would it be to make a slick flash version of this that could live online and be updated as copyright rules change?

Well, thanks to Digitization 101’s A digital version of the Copyright Slider post I discovered that exactly what I wished for now exists. Go take the Digital Copyright Slider for a spin.

The interface is clear and simple – they did a great job of taking advantage of the interactive medium to do things they couldn’t do on the paper slider. If it won’t disturb your neighbors, turn up the volume to hear the satisfying click each time you move the slider to a new scenario. Make sure you click on some of the *s to see more detailed information. Take note of the advice regarding more complicated scenarios as well as the links directly to documents detailing specific copyright laws.

I love that it has been licensed under the Creative Commons Attribution-Noncommercial-Share Alike license. The creators have included their contact information along with the idea that other institutions could host custom copies of the slider with their own copyright research contacts. The only downside I see to this is that if there are changes to US copyright law, it will take time for updates to a central copy of the slider to propagate to local customized copies.

The final question is how fast they can update the slider in the event of changes to copyright law – but we will have to wait on changes to the US copyright landscape before we can find that out!

Image Credit: Image above is taken directly from a screen shot of the Digital Copyright Slider.

Flickr Terms of Service, Unwritten Guidelines and Safety Levels

As more cultural heritage institutions add photos to Flickr, such as these sets added by the Smithsonian, an AP article discussing freedom of expression in online public spaces identifies some issues that deserve attention. In ‘Public’ online spaces don’t carry speech, rights, Anick Jesdanun highlights a number of scenarios in which service providers (such as the Yahoo!-owned Flickr) clash with their users, including this one (italics my own):

Dutch photographer Maarten Dors met the limits of free speech at Yahoo Inc.’s photo-sharing service, Flickr, when he posted an image of an early-adolescent boy with disheveled hair and a ragged T-shirt, staring blankly with a lit cigarette in his mouth.

Without prior notice, Yahoo deleted the photo on grounds it violated an unwritten ban on depicting children smoking. Dors eventually convinced a Yahoo manager that – far from promoting smoking – the photo had value as a statement on poverty and street life in Romania. Yet another employee deleted it again a few months later.

This image on Flickr gives more details about the photo being removed – and this is the reinstated photo in question. The article points out “Service providers write their own rules for users worldwide and set foreign policy when they cooperate with regimes like China. They serve as prosecutor, judge and jury in handling disputes behind closed doors.” It makes me wonder if the ‘unwritten guidelines’ are applied evenly across Flickr. With the creation of The Commons area, it would be easy to create two standards – one for the general public and another for ‘blessed’ institutions. Images that are acceptable from the Brooklyn Museum (consider this set of Behind The Scenes photos of the Ron Mueck exhibition) might not be accepted from the average person. In my research I discovered a set of Public Domain photos from the National Archives. Some of the photos included in this set are historically valuable images that I would not necessarily want a child to see. Does this mean they shouldn’t be on Flickr? I don’t think so, but that certainly isn’t up to me.

Here are the relevant passages of the Yahoo! Terms of Service:

You agree to not use the Service to:

  1. upload, post, email, transmit or otherwise make available any Content that is unlawful, harmful, threatening, abusive, harassing, tortious, defamatory, vulgar, obscene, libelous, invasive of another’s privacy, hateful, or racially, ethnically or otherwise objectionable;
  2. harm minors in any way;

You acknowledge that Yahoo! may or may not pre-screen Content, but that Yahoo! and its designees shall have the right (but not the obligation) in their sole discretion to pre-screen, refuse, or remove any Content that is available via the Service. Without limiting the foregoing, Yahoo! and its designees shall have the right to remove any Content that violates the TOS or is otherwise objectionable.

That bit about ‘otherwise objectionable’ could be used to cover removal of anything. Being subject to the terms of service of Internet service providers is nothing new, but as archives, libraries and other cultural heritage institutions look for ways to increase their revenue streams and explore innovative ways to bring more eyes to their materials, it will become more important to understand these guidelines.

I understand (as the author of the article that inspired this post also points out) that Yahoo! is a business. Their priorities are not always going to be the same as those of the National Archives or the Brooklyn Museum. There are definitely images from history and the world of art that are only appropriate for adults, but isn’t that what Flickr’s content filter feature, named SafeSearch, is all about? These are the three ‘safety levels’ available on Flickr:

  • Safe – Content suitable for a global, public audience
  • Moderate – If you’re not sure whether your content is suitable for a global, public audience but you think that it doesn’t need to be restricted per se, this category is for you
  • Restricted – This is content you probably wouldn’t show to your mum, and definitely shouldn’t be seen by kids

It is interesting that Flickr has its own separate list of Community Guidelines, independent of Yahoo!’s terms of service. This is the passage from these guidelines about filtering content:

Take the opportunity to filter your content responsibly. If you would hesitate to show your photos or videos to a child, your mum, or Uncle Bob, that means it needs to be filtered. So, ask yourself that question as you upload your content and moderate accordingly. If you don’t, it’s likely that one of two things will happen. Your account will be reviewed then either moderated or terminated by Flickr staff.

I am still not sure what safety level I would use for a photo showing rows of dead in a concentration camp. I guess given the choices, ‘restricted’ is the best option – but that still doesn’t sit right with me somehow. I did an advanced Flickr search for ‘concentration camp’ with SafeSearch on – and those photos are not currently being marked as restricted. Who is it that we expect to be protecting using SafeSearch? From Flickr’s definition above it is supposed to at least be kids (and maybe your mom and Uncle Bob).

I think the question of the moment is how to know which images are appropriate to upload if some of the guidelines are unwritten. Flickr is a community and understanding the community is essential to success within that community. Once you believe your images are appropriate to include, then you must decide the right ‘safety level’. It is not clear to me how to tell the difference between an image that is not appropriate to be uploaded to Flickr and an image that is okay but needs to be marked with a safety level of ‘restricted’. I am very interested to see how this category of ‘appropriate but restricted’ evolves. For now, I am going to keep a watch on how the Flickr Commons grows and what range of content is included. The final answer for some of these images may be to only provide them via the institutions’ web sites rather than via service providers such as Flickr.

Image credit: Free Click by fikra (Sami Ben Gharbia) via Flickr

THATCamp 2008: Day 1 Dork Short Lightning Talks

During lunch on the first day of THATCamp, people volunteered to give lightning talks they called ‘Dork Shorts’. As we ate our lunch, a steady stream of folks paraded up to the podium and gave an elevator-pitch-length demo. These are the projects about which I managed to type URLs and some other info into my laptop. If you are looking for examples of inspirational and innovative work at the intersection of technology and the humanities – these are a great place to start!

Have links to projects I missed? Please add them in the comments below.

Image credit: Lightning by thenss (Christopher Cacho) via flickr

THATCamp 2008: Text Mining and the Persian Carpet Effect

I attended a THATCamp session on Text Mining. There were between 15 and 20 people in attendance. I have done my best to attribute ideas to their originators wherever possible – but please forgive the fact that I did not catch the names of everyone who was part of this session.

What Is Text Mining?

Text mining is an umbrella phrase that covers many different techniques and types of tools.

The CHNM NEH-funded text mining initiative defined text mining as needing to support these three research functions:

  • Locating or finding: improving on search
  • Extraction: once you find a set of interesting documents, how do you extract information in new (and hopefully faster) ways? How do you pull data from unstructured bulk into structured sets?
  • Analysis: support analyzing the data, discovery of patterns, answering questions

The group discussed that there were both macro and micro aspects to text mining. Sometimes you are trying to explore a collection. Sometimes you are trying to examine a single document in great detail. Still other situations call for using text mining to generate automated classification of content using established vocabularies. Different kinds of tools will be important during different phases of research.

Projects, Tools, Examples & Cool Ideas

Andrea Eastman-Mullins, from Alexander Street Press, mentioned the University of Chicago’s ARTFL Project and these two tools:

  • PhiloLogic: An XML/SGML based full-text search, retrieval and analysis tool
  • PhiloMine: an extension being developed for PhiloLogic to provide support for “a variety of machine learning, text mining, and document clustering tasks”.

Dan Cohen directed us to his post about Mapping What Americans Did on September 11 and to Twistori which text mines Twitter.

Other Projects & Examples:

Some neat ideas that were mentioned for ways text mining could be used (lots of other great ideas were discussed – these are the two that made it into my notes):

  • Train a tool with collections of content from individual time periods, then use the tool to assist in identification of originating time period for new documents. Also could use this same setup to identify shifts in patterns in text by comparing large data sets from specific date ranges
  • If you have a tool that has learned how to classify certain types of content well… then watch for when it breaks – this can give you interesting trails to things to investigate.
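The first idea above can be sketched in a few lines of Python. This is only a toy illustration, not anything shown at the session: it assumes a simple naive Bayes word-frequency model, and the class name `PeriodClassifier`, the period labels and the training snippets are all invented for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class PeriodClassifier:
    """Toy naive Bayes classifier that guesses a document's time period."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # period -> word frequencies
        self.doc_counts = Counter()              # period -> number of training docs
        self.vocabulary = set()

    def train(self, period, text):
        """Add one training document known to come from the given period."""
        tokens = tokenize(text)
        self.word_counts[period].update(tokens)
        self.doc_counts[period] += 1
        self.vocabulary.update(tokens)

    def scores(self, text):
        """Return a log-probability score for each known period."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for period, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[period] / total_docs)
            for token in tokens:
                # Add-one smoothing so unseen words do not rule a period out.
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            result[period] = score
        return result

    def predict(self, text):
        """Return the period with the highest score."""
        scores = self.scores(text)
        return max(scores, key=scores.get)


# Invented training snippets, purely for illustration.
classifier = PeriodClassifier()
classifier.train("1860s", "the regiment marched by telegraph order to the railroad depot")
classifier.train("1860s", "a letter arrived by steamer with news of the regiment")
classifier.train("2000s", "she posted a blog entry and emailed the link to her friends")
classifier.train("2000s", "the website and the blog were updated with new email addresses")

guess = classifier.predict("news came by telegraph to the depot")
print(guess)
```

A real project would need far more training text per period. The per-period scores returned by `scores` are also a natural place to look for the second idea: documents where the top two periods score nearly the same are the ones where the tool is close to "breaking".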

Barriers to Text Mining

All of the following were touched upon as being barriers or challenges to text mining:

  • access to raw text in gated collections (i.e., collections which require payment to permit access to resources) such as JSTOR, Project MUSE and others
  • tools that are too difficult for non-programmers to use
  • questions relating to the validity of text mining as a technique for drawing legitimate conclusions

Next Steps

These ideas were ones put forward as important to move forward the field of text mining in the humanities:

  • develop and share best practices for use when cultural heritage institutions make digitization and transcription deals with corporate entities
  • create frameworks that enable individuals to reproduce the work of others and provide transparency into the assumptions behind the research
  • create tools and techniques that smooth the path from digitization to transcription
  • develop focused, easy-to-use tools that bridge the gap between computer programmers and humanities researchers

My thoughts

During the session I drew a parallel between text mining and aerial archaeology, where information can be gleaned from the air that cannot be seen from the ground. I discovered this has a name:

“Archaeologists call it the Persian carpet effect. Imagine you’re a mouse running across an elaborately decorated rug. The ground would merely be a blur of shapes and colors. You could spend your life going back and forth, studying an inch at a time, and never see the patterns. Like a mouse on a carpet, an archaeologist painstakingly excavating a site might easily miss the whole for the parts.” from Airborne Archaeology, Smithsonian magazine, December 2005 (emphasis mine)

While I don’t see any coffee table books in the near future of text mining (such as The Past from Above: Aerial Photographs of Archaeological Sites), I do think that this idea captures the promise that we have before us in the form of the text mining tools. Everyone in our session seemed to agree that these tools will empower people to do things that no individual could have done in a lifetime by hand. The digital world is producing terabytes of text. We will need text mining tools just to find our way in this blizzard of content. It is all well and good to know that each snowflake is unique – but tell that to the 21st century historian soon to be buried under the weight of blogs, tweets, wikis and all other manner of web content.

Image credit: Drift of Harrachov Mine by alarch via flickr

As is the case with all my session summaries from THATCamp 2008, please accept my apologies in advance for any cases in which I misquote, overly simplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via my contact form.

Learn About Wikis on Second Life (May 25th, 2008)

In case you always wondered how wikis can help archivists, this Sunday (May 25th, 2008) will see archivists gathering in Second Life to answer this question.

  • When: Sunday May 25th, 9pm-10:30pm GMT (5pm-6:30pm EDT)
  • Where: Open Air Auditorium at Cybrary City, Second Life

This sounds like a great way to kill two birds with one stone. If you have been looking for a reason to explore Second Life or you have been wondering about how wikis are being used to benefit archives and special collections (or both!) – this looks like a great combination.

Learn more about this event via the Second Life Library Project post How on Virtual Earth can Wikis Help Archivists?.

In the interest of full disclosure – I admit that I won’t be there. The first (and last) time I tried to explore Second Life I got motion sick after about 15 minutes. I understand that this is not very common – but since I am one of those people who get motion sick watching others play 3D video games I wasn’t too surprised. I have a theory about trying again one day with a Second Life expert at my side to help me tweak my settings to the least ‘hand held camera’ version of the Second Life experience – I just haven’t gotten there yet. Any tips from Second Life gurus welcome!

ISSUU: Interesting Platform for Online Publishing

Issuu, with the tag line ‘Read the world. Publish the world.’ and pronounced ‘issue’, gives anyone the ability to upload a PDF document and publish it as an online magazine. I am intrigued by the possibilities of using this service to publish digitized archival records – especially those that would lend themselves to a ‘book’ style presentation (thinking here of a ledger or equivalent).

I am not sure I totally understand the implications of the Issuu Terms of service… especially this part:

By distributing or disseminating Uploader Submissions through the Issuu Service, you hereby grant to Issuu a worldwide, non-exclusive, transferable, assignable, fully paid-up, royalty-free, license to host, transfer, display, perform, reproduce, distribute, and otherwise exploit your Uploader Submissions, in any media forms or formats, and through any media channels, now known or hereafter devised, including without limitation, RSS feeds, embeddable functionality, and syndication arrangements in order to distribute, promote or advertise your Uploader Submissions through the Issuu Service.

If I am following that properly, all the rights you are granting to the Issuu Service are only for the purposes of their distribution of your uploaded PDF.

Issuu has a special Copyright FAQ, which in combination with Peter Hirtle’s page on Copyright Term and the Public Domain in the United States, should support those trying to figure out if they can upload what they want to upload without getting into copyright related hot water.

So how is it different from a plain old PDF? Take a look at the embedded Issuu viewer below showing a 1908 copy of The Colonial Book of The Towle Manufacturing Company Silversmiths.

I don’t think this would ever be the way you would want to give online access to digitized records in general – but I do think that this could be a great way to highlight a particularly impressive set or volume of documents. If an archives featured one of these a month on their homepage – would people subscribe to their RSS feed just to see the new one? On the actual page on which I found the above document, Issuu makes it easy to subscribe to the RSS feed for the Issuu author ‘silverlibrary’.

I don’t know why Issuu has decided that I must create an account before I may view document author silverlibrary’s user profile. I would hope that there was an elegant way for visitors to see a group of Issuu documents created by the same author without having to create an account first (or ever).

Want to know what others think? Take a look at Finally, a Web-based PDF Viewer That Does Not Suck (Issuu) over on TechCrunch. One interesting tidbit I picked up from that review is that Issuu is based in Denmark. I wonder what impact that has on which copyright rules apply to the documents uploaded into Issuu.

Want to read more about their vision? Of course they have a press release in the form of an Issuu publication and I have embedded it below. I think my favorite line is that Issuu is intended to be ‘YouTube for Publications’.

I would love to see a highlighted section created for ‘cultural heritage materials’ (or something like that anyway). Take a look around Issuu and let me know what you think. Is this a viable tool for an archives or manuscript collection to use to highlight parts of their collection?

Digital Preservation via Emulation – Dioscuri and the Prevention of Digital Black Holes

Available Online posted about the open source emulator project Dioscuri back in late September. In the course of researching Thoughts on Digital Preservation, Validation and Community I learned a bit about the Microsoft Virtual PC software. Virtual PC permits users to run multiple operating systems on the same physical computer and can therefore facilitate access to old software that won’t run on your current operating system. That emulator approach pales in comparison with what the folks over at Dioscuri are planning and building.

On the Digital Preservation page of the Dioscuri website I found this paragraph on their goals:

To prevent a digital black hole, the Koninklijke Bibliotheek (KB), National Library of the Netherlands, and the Nationaal Archief of the Netherlands started a joint project to research and develop a solution. Both institutions have a large amount of traditional documents and are very familiar with preservation over the long term. However, the amount of digital material (publications, archival records, etc.) is increasing with a rapid pace. To manage them is already a challenge. But as cultural heritage organisations, more has to be done to keep those documents safe for hundreds of years at least.

They are nothing if not ambitious… they go on to state:

Although many people recognise the importance of having a digital preservation strategy based on emulation, it has never been taken into practice. Of course, many emulators already exist and showed the usefulness and advantages it offer. But none of them have been designed to be digital preservation proof. For this reason the National Library and Nationaal Archief of the Netherlands started a joint project on emulation.

The aim of the emulation project is to develop a new preservation strategy based on emulation.

Dioscuri is part of Planets (Preservation and Long-term Access via NETworked Services) – run by the Planets consortium and coordinated by the British Library. The Dioscuri team has created an open source emulator that can be ported to any hardware that can run a Java Virtual Machine (JVM). Individual hardware components are implemented via separate modules. These modules should make it possible to mimic many different hardware configurations without creating separate programs for every possible combination.
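To picture why separate modules help, here is a rough sketch in Python. It is not Dioscuri's actual design or API (Dioscuri itself runs on the JVM); the `Module` and `EmulatedMachine` names and the hardware descriptions are hypothetical, chosen only to show how a handful of interchangeable parts can be combined into different machine configurations without writing one program per combination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """One emulated hardware component (hypothetical, for illustration)."""
    kind: str          # hardware role, e.g. "cpu", "memory", "video"
    description: str   # human-readable description of the emulated part


class EmulatedMachine:
    """A machine configuration assembled from interchangeable modules."""

    def __init__(self, name):
        self.name = name
        self.modules = {}  # hardware role -> Module filling that role

    def plug_in(self, module):
        """Install a module, replacing any module with the same role."""
        self.modules[module.kind] = module
        return self

    def summary(self):
        """Describe the configuration, listing modules sorted by role."""
        parts = ", ".join(
            f"{kind}: {module.description}"
            for kind, module in sorted(self.modules.items())
        )
        return f"{self.name} [{parts}]"


# Four modules are enough to describe two different machines.
cpu = Module("cpu", "16-bit processor")
small_ram = Module("memory", "1 MB RAM")
large_ram = Module("memory", "16 MB RAM")
text_video = Module("video", "text-mode display")

machine_a = EmulatedMachine("machine A").plug_in(cpu).plug_in(small_ram).plug_in(text_video)
machine_b = EmulatedMachine("machine B").plug_in(cpu).plug_in(large_ram).plug_in(text_video)

print(machine_a.summary())
print(machine_b.summary())
```

The two machines share the same processor and video modules and differ only in memory, which is the kind of reuse the modular approach is meant to allow.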

You can get a taste of the big thinking that is going into this work by reviewing the program overview and slide presentations from the first Emulation Expert Meeting (EEM) on digital preservation that took place on October 20th, 2006.

In the presentation given by Geoffrey Brown from Indiana University titled Virtualizing the CIC Floppy Disk Project: An Experiment in Preservation Using Emulation I found the following simple answer to the question ‘Why not just migrate?’:

  • Loss of information — e.g. word edits

  • Loss of fidelity — e.g. WordPerfect to Word isn’t very good

  • Loss of authenticity — users of migrated document need access to original to verify authenticity

  • Not always possible — closed proprietary formats

  • Not always feasible — costs may be too high

  • Emulation may be necessary to enable migration

After reading through Emulation at the German National Library, presented by Tobias Steinke, I found my way to the kopal website. With their great tagline ‘Data into the future’, they state their goal is “…to develop a technological and organizational solution to ensure the long-term availability of electronic publications.” The real gem for me on that site is what they call the kopal demonstrator. This is a well thought out Flash application that explains the kopal project’s ‘procedures for archiving and accessing materials’ within the OAIS Reference Model framework. But it is more than that – if you are looking for a great way to get your (or someone else’s) head around digital archiving, software and related processes – definitely take a look. They even include a full Glossary.

I liked what I saw in Defining a preservation policy for a multimedia and software heritage collection, a pragmatic attempt from the Bibliothèque nationale de France, a presentation by Grégory Miura, but felt like I was missing some of the substance by just looking at the slides. I was pleased to discover what appears to be a related paper on the same topic presented at IFLA 2006 in Seoul titled: Pushing the boundaries of traditional heritage policy: Maintaining long-term access to multimedia content by introducing emulation and contextualization instead of accepting inevitable loss. Hurrah for NOT ‘accepting inevitable loss’.

Vincent Joguin’s presentation, Emulating emulators for long-term digital objects preservation: the need for a universal machine, discussed a virtual machine project named Olonys. If I understood the slides correctly, the idea behind Olonys is to create a “portable and efficient virtual processor”. This would provide an environment in which to run programs such as emulators, but isolate the programs running within it from the disparities between the original hardware and the actual current hardware. Another benefit to this approach is that only the virtual processor need be ported to new platforms rather than each individual program or emulator.
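The general idea of a small virtual processor can be sketched as below. This is not Olonys itself: the instruction names (`PUSH`, `ADD`, `MUL`, `PRINT`) and the stack-machine design are assumptions made up for this example. The point is only that a program written for the virtual instruction set never touches the real hardware, so the `run` function is the one piece that would need porting to a new platform.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a tiny stack machine.

    Returns the list of values the program printed.
    """
    stack = []
    output = []
    for opcode, argument in program:
        if opcode == "PUSH":
            stack.append(argument)
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == "PRINT":
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return output


# A program for the virtual processor: compute (2 + 3) * 4 and print it.
program = [
    ("PUSH", 2), ("PUSH", 3), ("ADD", None),
    ("PUSH", 4), ("MUL", None), ("PRINT", None),
]
result = run(program)
print(result)  # [20]
```

The `program` list is plain data, so it could be stored alongside the records it serves and run later by any faithful reimplementation of `run`.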

Hilde van Wijngaarden presented an Introduction to Planets at EEM. I also found another introductory level presentation that was given by Jeffrey van der Hoeven at wePreserve in September of 2007 titled Dioscuri: emulation for digital preservation.

The wePreserve site is a gold mine for presentations on these topics. They bill themselves as “the window on the synergistic activities of DigitalPreservationEurope (DPE), Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval (CASPAR), and Preservation and Long-term Access through NETworked Services (PLANETS).” If you have time and curiosity on the subject of digital preservation, take a glance down their home page and click through to view some of the presentations.

On the site of The International Journal of Digital Curation there is a nice ten page paper that explains the most recent results of the Dioscuri project. Emulation for Digital Preservation in Practice: The Results was published in December 2007. I like being able to see slides from presentations (as linked to above), but without the notes or audio to go with them I am often left staring at really nice diagrams wondering what the author’s main point was. The paper is thorough and provides lots of great links to other reading, background and related projects.

There is a lot to dig into here. It is enough to make me wish I had a month (maybe a year?) to spend just following up on this topic alone. I found my struggle to interpret many of the PowerPoint slide decks that have no notes or audio very ironic. Here I was hunting for information about the preservation of born digital records and I kept finding that the records of the research provided didn’t give me the full picture. With no context beyond the text and images on the slides themselves, I was left to my own interpretation of their intended message. While I know that these presentations are not meant to be the official records of this research, I think that the effort obviously put into collecting and posting them makes it clear that others are as anxious as I am to see this information.

The best digital preservation model in the world will only preserve what we choose to save. I know the famous claim on the web is that ‘content is king’ – but I would hazard to suggest that in the cultural heritage community ‘context is king’.

What does this have to do with Dioscuri and emulators? Just that as we solve the technical problems related to preservation and access, I believe that we will circle back around to realize that digital records need the same careful attention to appraisal, selection and preservation of context as ‘traditional’ records. I would like to believe that the huge hurdles we now face on the technical and process side of things will fade over time due to the immense efforts of dedicated and brilliant individuals. The next big hurdle is the same old hurdle – making sure the records we fight to preserve have enough context that they will mean anything to those in the future. We could end up with just as severe a ‘digital black hole’ due to poorly selected or poorly documented records as we could due to records that are trapped in a format we can no longer access. We need both sides of the coin to succeed in digital preservation.

Did I mention the part about ‘Hurray for open source emulator projects with ambitious goals for digital preservation’? Right. I just wanted to be clear about that.

Image Credit: The image included at the top of this post was taken from a screen shot of Dioscuri itself, the original version of which may be seen here.