<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Spellbound Blog &#187; open source</title>
	<atom:link href="http://www.spellboundblog.com/category/open-source/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.spellboundblog.com</link>
	<description>Archives, Digital Humanities, Cultural Heritage, Technology</description>
	<lastBuildDate>Mon, 06 Feb 2012 14:49:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>ArchivesZ Needs You!</title>
		<link>http://www.spellboundblog.com/2010/07/07/archivesz-needs-you/</link>
		<comments>http://www.spellboundblog.com/2010/07/07/archivesz-needs-you/#comments</comments>
		<pubDate>Wed, 07 Jul 2010 04:48:24 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[archival community]]></category>
		<category><![CDATA[ArchivesZ]]></category>
		<category><![CDATA[learning technology]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[virtual collaboration]]></category>
		<category><![CDATA[what if]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/?p=996</guid>
		<description><![CDATA[I got a kind email today asking &#8220;Whither ArchivesZ?&#8221;. My reply was: &#8220;it is sleeping&#8221; (projects do need their rest) and &#8220;I just started a new job&#8221; (I am now a Metadata and Taxonomy Consultant at The World Bank) and &#8220;I need to find enthusiastic people to help me&#8221;. That final point brings me to [...]<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2010/07/07/archivesz-needs-you/">ArchivesZ Needs You!</a></p>
]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.spellboundblog.com/wp-content/uploads/2010/07/Unclesamwantyou2.jpg"><img class="alignright size-full wp-image-997" title="I Want You!" src="http://www.spellboundblog.com/wp-content/uploads/2010/07/Unclesamwantyou2.jpg" alt="" width="288" height="320" /></a>I got a kind email today asking &#8220;Whither ArchivesZ?&#8221;. My reply was: &#8220;it is sleeping&#8221; (projects do need their rest) and &#8220;I just started a new job&#8221; (I am now a Metadata and Taxonomy Consultant at The World Bank) and &#8220;I need to find enthusiastic people to help me&#8221;. That final point brings me to this post.</p>
<p>I find myself in the odd position of having finished my Master&#8217;s Degree and not wanting to sign on for the long haul of a PhD. So I have a big project that was born in academia, initially as a joint class project and more recently as independent research with a grant-funded programmer, but I am no longer in academia.</p>
<p>What happens to projects like ArchivesZ? Is there an evolutionary path towards it being a collaborative project among dispersed enthusiastic individuals? Or am I more likely to succeed by recruiting current graduate students at my former (and still nearby) institution? I have discussed this one-on-one with a number of individuals, but I haven&#8217;t thrown open the gates for those who follow me here online.</p>
<p>For those of you who have been waiting patiently, the <a title="ArchivesZ" href="http://zaphod.mindlab.umd.edu/ArchivesZ/Main.html">ArchivesZ version 2 prototype</a> is available online. I can&#8217;t promise it will stay online for long &#8211; it is definitely brittle for reasons I haven&#8217;t totally identified. A few things to be aware of:</p>
<ul>
<li>When you load the main page, you should see tags listed at the bottom &#8211; if you don&#8217;t see any, drop me an email via my contact form and I will try to get Tomcat and Solr back up. If you have a small screen, you may need to view your browser full screen to reach all the parts of the UI.</li>
<li>I know there are lots of bugs of various sizes. Some paths through  the app work &#8211; some don&#8217;t. Some screens are just placeholders. Feel free  to poke around and try things &#8211; you can&#8217;t break it for anyone else!</li>
</ul>
<p>I think there are a few key challenges to building what I would think of as the first &#8216;full&#8217; version of ArchivesZ &#8211; listed here in no particular order:</p>
<ul>
<li>In the process of creating version 2, I was too ambitious. The current version of ArchivesZ has lots of issues, some usability &#8211; some bugs (see prototype above!)</li>
<li>Wherever a collaborative workspace of ArchivesZ were going to live, it would need large data sets. I did a lot of work on data from eleven institutions in the spring of 2009, so there is a lot of data available &#8211; but it is still a challenge.</li>
<li>A lot of my future ideas for ArchivesZ are trapped in my head. The good news is that I am honestly open to others&#8217; ideas for where to take it in the future.</li>
<li>How do we build a community around the creation of ArchivesZ?</li>
</ul>
<p>I still feel that there is a lot to be gained by building a centralized visualization tool/service through which researchers and archivists could explore and discover archival materials. I even think there is promise to a freestanding tool that supports exploration of materials within a single institution. I can&#8217;t build it alone. This is a good thing &#8211; it will be much better in the end with the input, energy and knowledge of others. I am good at ideas and good at playing the devil&#8217;s advocate. I have lots of strength on the data side of things and visualization has been a passion of mine for years. I need smart people with new ideas, strong tech skills (or a desire to learn) and people who can figure out how to organize the herd of cats I hope to recruit.</p>
<p>So &#8211; what can you do to help ArchivesZ? Do you have mad ActionScript 3 skills? Do you want to dig into the scary little Ruby script that populates the database? Maybe you prefer to organize and coordinate? Have you always wanted to figure out how a project like this could grow from a happy (or awkward?) prototype into a real service that people depend on?</p>
<p>Do you have a vision for how to tackle this as a project? Open source? Grant funded? Something else clever?</p>
<p>Know any graduate students looking for good research topics? There are juicy bits here for those interested in data, classification, visualization and cross-repository search.</p>
<p>I will be at SAA in DC in August chairing a panel on search engine optimization of archival websites. If there is even just one of you out there who is interested, I would cheerfully organize an ArchivesZ summit of some sort in which I could show folks the good, bad and ugly of the prototype as it stands. Let me know in the comments below.</p>
<p>Won&#8217;t be at SAA but want to help? Chime in here too. I am happy to set up some shared desktop tours of whatever you would like to see.</p>
<p>PS: Yes, I do have all the version 2 code &#8211; and what is online at the <a title="Google Code: ArchivesZ" href="http://code.google.com/p/archivesz/">Google Code ArchivesZ page</a> is not up to date. Updating the <a title="ArchivesZ" href="http://www.archivesz.org">ArchivesZ website</a> and uploading the current code is on my to do list!</p>
<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2010/07/07/archivesz-needs-you/">ArchivesZ Needs You!</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2010/07/07/archivesz-needs-you/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Topic Modeling, Auto-Classification and Archival Description</title>
		<link>http://www.spellboundblog.com/2010/04/27/topic-modeling-auto-classification-archival-description/</link>
		<comments>http://www.spellboundblog.com/2010/04/27/topic-modeling-auto-classification-archival-description/#comments</comments>
		<pubDate>Tue, 27 Apr 2010 06:28:08 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[access]]></category>
		<category><![CDATA[interface design]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[text mining]]></category>
		<category><![CDATA[what if]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/?p=963</guid>
		<description><![CDATA[In an example of Twitter serendipity, @silverasm&#8216;s (Aditi Muralidharan) tweet pointed me to @historying&#8216;s blog post about Topic Modeling. In this post Cameron Blevins explains the results of using the topic modeling feature of UMass Amherst&#8216;s MAchine Learning for LanguagE Toolkit (MALLET) on the text of Martha Ballard’s Diary. I have spent a lot of time [...]<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2010/04/27/topic-modeling-auto-classification-archival-description/">Topic Modeling, Auto-Classification and Archival Description</a></p>
]]></description>
			<content:encoded><![CDATA[<p><a href="http://mallet.cs.umass.edu/index.php"><img class="alignright size-full wp-image-964" title="MALLET logo" src="http://www.spellboundblog.com/wp-content/uploads/2010/04/logo3.png" alt="" width="215" height="95" /></a>In an example of Twitter serendipity, <a title="Twitter: silverasm" href="http://twitter.com/silverasm">@silverasm</a>&#8216;s (Aditi Muralidharan) <a title="tweet about text mining" href="http://twitter.com/silverasm/statuses/12842112825">tweet</a> pointed me to <a title="Twitter: historying" href="http://twitter.com/historying">@historying</a>&#8216;s <a title="Topic Modeling Martha Ballard’s Diary" href="http://historying.org/2010/04/01/topic-modeling-martha-ballards-diary/">blog post about Topic Modeling</a>. In this post Cameron Blevins explains the results of using the <a title="MALLET: Topic Modeling" href="http://mallet.cs.umass.edu/topics.php">topic modeling</a> feature of <a title="UMass Amherst" href="http://www.umass.edu/">UMass Amherst</a>&#8216;s <a title="MAchine Learning for LanguagE Toolkit" href="http://mallet.cs.umass.edu/index.php">MAchine Learning for LanguagE Toolkit</a> (MALLET) on the text of <a title="Martha Ballard's Diary Online" href="http://dohistory.org/diary/">Martha Ballard’s Diary</a>.</p>
<p>I have spent a lot of time thinking about how to generate thematic overviews of groups of archival collections. My information visualization project, <a title="ArchivesZ Blog Posts" href="http://www.spellboundblog.com/category/archivesz/">ArchivesZ</a>, aims to provide ways of understanding aggregated archival description data, either from a single institution or across institutional boundaries. Now I find myself wondering if text mining with a tool like MALLET might generate smart topic groupings more elegantly than fighting with the wide range of non-standardized collection subjects.</p>
<p><strong>Topic Modeling with MALLET</strong></p>
<p>To get a sense of what MALLET generates, see the excerpt below from Blevins&#8217;s post:</p>
<blockquote><p>With some tinkering, MALLET generated a list of thirty topics  comprised of twenty words each, which I then labeled with a descriptive  title. Below is a quick sample of what the program<em> </em>“thinks” are  some of the topics in the diary:</p>
<ul>
<li><strong>MIDWIFERY:</strong> birth deld safe morn receivd calld left  cleverly pm labour fine reward arivd infant expected recd shee born  patient</li>
<li><strong>CHURCH: </strong>meeting attended  afternoon reverend worship foren mr famely performd vers attend public  supper st service lecture discoarst administred supt</li>
<li><strong>DEATH:</strong> day yesterday  informd morn years death ye hear expired expird weak dead las past heard  days drowned departed evinn</li>
<li><strong>GARDENING:</strong> gardin sett  worked clear beens corn warm planted matters cucumbers gatherd potatoes  plants ou sowd door squash wed seeds</li>
</ul>
</blockquote>
<p>He goes on to explain that &#8220;MALLET also allows us to track those topics across the text.&#8221; What if, instead of text mining a diary, we pumped the descriptions of every archival collection from a single institution into MALLET? Of course we would need a good list of stop words including such common terms as archives, history, sources and records. But I wonder how the topics MALLET suggests would compare to the official subjects associated with each collection? Could this give us a broad overview of the topics covered by a specific repository and give us a new way to build paths to the collections based on topic?</p>
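<p>To make the stop word idea a little more concrete, here is a minimal Python sketch of the cleaning step that would come before any topic modeling. It does not call MALLET at all. The collection identifiers and descriptions are invented for illustration, the general stop list is deliberately tiny, and only the four archival stop words (archives, history, sources, records) come from the paragraph above.</p>

```python
import re
from collections import Counter

# Domain stop words named in the post; a real list would be much longer.
ARCHIVAL_STOP_WORDS = {"archives", "history", "sources", "records"}
# A tiny general-purpose stop list, purely for illustration.
GENERAL_STOP_WORDS = {"a", "an", "and", "of", "the", "to", "in", "on", "for"}


def clean_description(text, stop_words):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word not in stop_words]


# Hypothetical collection descriptions, invented for this sketch.
descriptions = {
    "coll-001": "Records of the garden club: seeds, planting and harvest history.",
    "coll-002": "Sources on the history of midwifery and birth records.",
}

stop_words = ARCHIVAL_STOP_WORDS | GENERAL_STOP_WORDS
cleaned = {
    collection_id: clean_description(text, stop_words)
    for collection_id, text in descriptions.items()
}

# Count how often each surviving term appears across all collections.
term_counts = Counter(word for words in cleaned.values() for word in words)

for collection_id, words in cleaned.items():
    print(collection_id, " ".join(words))
```

<p>The cleaned word lists are what a topic modeling tool would then work from. The term counts are just a quick way to spot further stop word candidates that swamp everything else.</p>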
<p><strong>Auto-Classification Using Castanet</strong></p>
<p>Text miner <a title="Aditi Muralidharan" href="http://www.cs.berkeley.edu/~aditi/">Aditi Muralidharan</a> also posted recently on this theme in <a title="Castanet: automatically generating a browsing structure for a collection" href="http://mininghumanities.com/2010/04/24/castanet-automatically-generating-a-browsing-structure-for-a-collection/">Castanet: automatically generating a browsing structure for a collection</a> and explains:</p>
<blockquote><p>Castanet automatically carves a sub-structure from the hierarchical  concept dictionary, WordNet (<a href="http://wordnet.princeton.edu/">http://wordnet.princeton.edu</a>),  and matches items in the collection to one or many appropriate places  within that hierarchy. Then, after some automated trimming and  flattening, the result is a hierarchical browsing system.</p></blockquote>
<p>I have heard of Castanet before via the <a title="Flamenco Search Interface Project" href="http://flamenco.berkeley.edu/">Flamenco Search Interface Project</a>. Apparently Muralidharan did a project using Castanet last summer to create <a href="http://go2.wordpress.com/?id=725X1342&amp;site=textdigihum.wordpress.com&amp;url=http%3A%2F%2Forange.sims.berkeley.edu%2Fcgi-bin%2Fflamenco.cgi%2Fflickr%2FFlamenco&amp;sref=http%3A%2F%2Fmininghumanities.com%2F2010%2F04%2F24%2Fcastanet-automatically-generating-a-browsing-structure-for-a-collection%2F">a category system</a> for <a title="Flickr Commons" href="http://www.flickr.com/commons">Flickr Commons</a> images based on the images&#8217; tags, which is then rendered using a Flamenco interface. I include a partial screen-shot below to give you a taste of what the navigation of images feels like a few levels down in the hierarchy. I love the classification of &#8216;Group Action&#8217; then filtered by a sub-classification of &#8216;Commerce&#8217;. The first images shown are of &#8216;horse trading&#8217; &#8211; with additional headings and images beneath them as well as additional filter options on the left.</p>
<p style="text-align: center;"><a title="Flickr Commons: group_action &gt; commerce" href="http://orange.sims.berkeley.edu/cgi-bin/flamenco.cgi/flickr/Flamenco?q=actX:322&amp;group=actX"><img class="aligncenter size-full wp-image-966" title="Flickr Commons Images via Castanet &amp; Flamenco" src="http://www.spellboundblog.com/wp-content/uploads/2010/04/flickr-canasta.jpg" alt="" width="547" height="308" /></a></p>
<p><strong>What If?</strong></p>
<p>What if we pulled all the English language archival descriptions from around the world as our original data set? If we used this data for topic modeling, our subject clusters would be cross-institutional. Maybe we could map the subjects assigned by each local institution to the topics the topic model generates for each collection and get a sort of automated crosswalk for finding related collections. If we used those locally assigned subjects from the archival descriptions for Castanet-style auto-classification, maybe we could generate a way to hierarchically browse collections topically.</p>
<p>Both MALLET and Flamenco are open source (I am not sure of the status of Castanet) and, as I discovered working on ArchivesZ, many institutions will share their archival description data for a good cause. So &#8211; is this a good cause? I need to tease these ideas out a bit more, but what do you all think of it at first blush? Feasible? Interesting? Worthwhile experiments?</p>
<p><em>Image Credits:</em> MALLET logo from <a title="MALLET Homepage" href="http://mallet.cs.umass.edu/index.php">MALLET homepage</a>. Images in screen shot from <a title="Flickr Commons" href="http://www.flickr.com/commons">Flickr Commons</a> with no known copyright.</p>
<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2010/04/27/topic-modeling-auto-classification-archival-description/">Topic Modeling, Auto-Classification and Archival Description</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2010/04/27/topic-modeling-auto-classification-archival-description/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>THATCamp 2008: Day 1 Dork Short Lightning Talks</title>
		<link>http://www.spellboundblog.com/2008/06/14/thatcamp-2008-day-1-dork-short-lightening-talks/</link>
		<comments>http://www.spellboundblog.com/2008/06/14/thatcamp-2008-day-1-dork-short-lightening-talks/#comments</comments>
		<pubDate>Sun, 15 Jun 2008 03:09:28 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[information visualization]]></category>
		<category><![CDATA[interface design]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[THATCamp2008]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/2008/06/14/thatcamp-2008-day-1-dork-short-lightening-talks/</guid>
		<description><![CDATA[During lunch on the first day of THATCamp people volunteered to give lightning talks they called &#8216;Dork Shorts&#8217;. As we ate our lunch, a steady stream of folks paraded up to the podium and gave an elevator pitch length demo. These are the projects about which I managed to type URLs and some other info [...]<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2008/06/14/thatcamp-2008-day-1-dork-short-lightening-talks/">THATCamp 2008: Day 1 Dork Short Lightning Talks</a></p>
]]></description>
			<content:encoded><![CDATA[<p><a href="http://flickr.com/photos/thenss/2443187542/" title="Lightning by thenss (Christopher Cacho) via flickr"><img src="http://www.spellboundblog.com/wp-content/uploads/2008/06/2443187542_af1a3fe851.jpg" alt="lightning" align="right" height="247" width="330" /></a>During lunch on the first day of THATCamp people volunteered to give <a href="http://en.wikipedia.org/wiki/Lightning_Talk" title="Wikipedia: lightning talk">lightning talks</a> they called &#8216;Dork Shorts&#8217;. As we ate our lunch, a steady stream of folks paraded up to the podium and gave an elevator pitch length demo. These are the projects about which I managed to type URLs and some other info into my laptop. If you are looking for examples of inspirational and innovative work at the intersection of technology and the humanities &#8211; these are a great place to start!</p>
<ul>
<li><a href="http://www.worlddigitallibrary.org/project/english/index.html" title="World Digital Library">World Digital Library</a> (<a href="http://www.loc.gov" title="Library of Congress">Library of Congress</a>)</li>
<li><a href="http://www.piclens.com/" title="PicLens">PicLens</a> + Firefox + any search results page from the <a href="http://digitalgallery.nypl.org/nypldigital/index.cfm" title="NYPL Digital Gallery">New York Public Library Digital Gallery</a> = a 3D experience of ALL the photos at one time. PicLens uses the RSS feed to retrieve the full set of images along with their captions and will work with any RSS feed of images &#8211; such as RSS image feeds from <a href="http://flickr.com/" title="Flickr">Flickr</a> or <a href="http://smugmug.com/" title="Smugmug">Smugmug</a>.</li>
<li><a href="http://historywired.si.edu/" title="HistoryWired">HistoryWired</a> (<a href="http://americanhistory.si.edu/" title="National Museum of American History">National Museum of American History</a>): A new spin on a <a href="http://www.cs.umd.edu/hcil/treemap/" title="about treemaps">treemap</a> visualization built on top of museum metadata. One box is displayed per item and the box size is based on popularity. The rest of its innovations are just easier to experience than describe.</li>
<li><a href="http://objectofhistory.org/" title="The Object of History">The Object of History</a> (<a href="http://americanhistory.si.edu/" title="National Museum of American History">National Museum of American History</a> + <a href="http://chnm.gmu.edu/" title="CHNM">CHNM</a>)</li>
<li><a href="http://omeka.org/" title="Omeka">Omeka</a> (<a href="http://chnm.gmu.edu/" title="CHNM">CHNM</a>)</li>
<li><a href="http://exhibitions.nypl.org/eminent/" title="Eminent Domain">Eminent Domain</a> (<a href="http://www.nypl.org/" title="New York Public Library">NYPL</a> Online Exhibition): built on Omeka</li>
<li><a href="http://nocoma.grainger.uiuc.edu/" title="American Social History Online">American Social History Online</a> (<a href="http://www.diglib.org" title="Digital Library Federation">Digital Library Federation</a>): Zotero-enabled. They are on the <a href="http://wiki.dlib.indiana.edu/confluence/display/DLFAquifer/Collection+Submission" title="Collection Submission Guidelines">hunt for more MODS records</a>. Built on Ruby on Rails (RoR) and will be put out as open source software within a couple of months.</li>
<li><a href="http://www4.ncsu.edu/~dmrieder/typographia/" title="Typographia">Typographia</a> (David Rieder, NC State University)</li>
</ul>
<p>Have more links to projects I missed? Please add them in the comments below.</p>
<p><em>Image credit: <a href="http://flickr.com/photos/thenss/2443187542/" title="Lightning by thenss (Christopher Cacho) via flickr">Lightning</a> by <a href="http://flickr.com/people/thenss/" title="Flickr: thenss">thenss</a> (Christopher Cacho) via flickr</em></p>
<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2008/06/14/thatcamp-2008-day-1-dork-short-lightening-talks/">THATCamp 2008: Day 1 Dork Short Lightning Talks</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2008/06/14/thatcamp-2008-day-1-dork-short-lightening-talks/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>THATCamp 2008: Crowdsourced Transcription and Collaborative Annotation</title>
		<link>http://www.spellboundblog.com/2008/06/05/crowdsourced-transcription-collaborative-annotation/</link>
		<comments>http://www.spellboundblog.com/2008/06/05/crowdsourced-transcription-collaborative-annotation/#comments</comments>
		<pubDate>Fri, 06 Jun 2008 02:25:07 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[digitization]]></category>
		<category><![CDATA[interface design]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[oral history]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[THATCamp2008]]></category>
		<category><![CDATA[transcription]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[virtual collaboration]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/2008/06/05/crowdsourced-transcription-collaborative-annotation/</guid>
		<description><![CDATA[The THATCamp session officially titled &#8216;Crowdsourcing&#8217; on the schedule was actually aimed at discussing the intersection of crowdsourced transcription and collaborative annotation. The group was small &#8211; just six of us and Ben Brumfield got us going by giving us an overview of transcription software and projects: The FamilySearch Indexing Project is an LDS church [...]<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2008/06/05/crowdsourced-transcription-collaborative-annotation/">THATCamp 2008: Crowdsourced Transcription and Collaborative Annotation</a></p>
]]></description>
			<content:encoded><![CDATA[
<p style="text-align: center"><a title="Free pencils by zone41" href="http://flickr.com/photos/zone41/2302365649/"><img src="http://www.spellboundblog.com/wp-content/uploads/2008/06/2302365649_6facc7e838.jpg" alt="Free Pencils by zone41 on Flickr" width="351" height="263" /></a></p>
<p>The <a title="THATCamp" href="http://thatcamp.org">THATCamp</a> session officially titled &#8216;Crowdsourcing&#8217; on the <a title="THATCamp Schedule" href="http://thatcamp.org/schedule/">schedule</a> was actually aimed at discussing the intersection of <a title="THATCamp Blog: Session Idea - Crowdsourcing Transcriptions" href="http://thatcamp.org/2008/05/crowdsourcing-transcriptions/">crowdsourced transcription</a> and <a title="THATCamp Blog: Session Idea - Collaborative Annotation" href="http://thatcamp.org/2008/05/collaborative-annotation/">collaborative annotation</a>. The group was small &#8211; just six of us &#8211; and <a title="Ben Brumfield: Manuscript Transcription Blog" href="http://manuscripttranscription.blogspot.com">Ben Brumfield</a> got us going by giving us an overview of transcription software and projects:</p>
<ul>
<li>The <a title="FamilySearch Indexing Project" href="http://www.ldsindexing.org/">FamilySearch Indexing Project</a> is an <a title="The Church of Jesus Christ of Latter-day Saints" href="http://www.lds.org">LDS church</a> project put out by the <a title="FamilySearch Labs" href="http://labs.familysearch.org/">FamilySearch Labs</a>. Their goal: &#8220;Volunteers extract family history information from digital images of historical documents to create searchable indexes that assist everyone in finding their ancestors.&#8221;</li>
<li>The <a title="Manuscript Transcription Assistant" href="http://www.wpi.edu/Academics/Depts/IGSD/Projects/Venice/Center/Projects/MQP/Transcription/download.html">Manuscript Transcription Assistant</a> is based at <a title="Worcester Polytechnic Institute" href="http://www.wpi.edu">Worcester Polytechnic Institute</a> (WPI) and is described as &#8220;a tool to assist transcribers in creating transcriptions, and incorporate meta-data about each image and transcription that can then be used to search through an electronic library of transcriptions&#8221;. I found mention in the <a title="Manuscript Transcription Project: FAQ" href="http://www.wpi.edu/Academics/Depts/IGSD/Projects/Venice/Center/Projects/MQP/Transcription/faq.html#22">FAQ</a> of the desire to create a community so that &#8220;transcribers will be able to collaborate their work by rating the quality of other user&#8217;s transcriptions. By ranking the transcriptions, specific versions of transcriptions will emerge as an authority for that manuscript.&#8221; Unfortunately, a lot of the links on that site are broken and my attempt to register gave me an error. It is not clear to me that this project is actually still active.</li>
<li><a title="Soldier Studies" href="http://www.soldierstudies.org">Soldier Studies</a> is a website dedicated to posting transcriptions of Civil War letters and diaries. This is not a tool for transcribing, but is clearly a repository specifically targeting transcriptions (see their <a title="Soldier Studies: Mission Statement" href="http://www.soldierstudies.org/index.php?action=mission">Mission Statement</a> for more information).</li>
<li><a title="Oh No Robot" href="http://ohnorobot.com/">Oh No Robot</a> is a comics transcription and search tool. It provides a page to <a title="Oh No Robot: comics that need transcriptions" href="http://www.ohnorobot.com/helpout.pl">find comics needing transcription</a> and a great page to explain <a title="Oh No Robot: Transcription Explained" href="http://www.ohnorobot.com/transcriptionexplained.pl">how transcription works</a> on their site.</li>
</ul>
<p>After examining what was out there, Ben concluded that what he wanted didn&#8217;t exist &#8211; so he started to build it himself. He gave us a demo of his &#8220;very beta&#8221; software. His goal is to build a web-based tool to support collaborative manuscript transcription and annotation by individuals without a strong technical background. In its current (and private beta) state the software supports transcription, an innovative approach to linking individual words or phrases to collection-defined subjects and some basic community tools to let his virtual team discuss transcription issues. Ben is working hard on the software &#8211; if you are interested in his project, definitely <a title="Ben Brumfield" href="http://thatcamp.org/camper/benwbrum/">get in touch with him</a>.</p>
<p><a title="Travis Brown" href="http://thatcamp.org/camper/travis/">Travis Brown</a> showed us his creation: <a title="ecomma" href="http://ecomma.cwrl.utexas.edu/0.2.0/">eComma</a>. eComma aims to &#8220;enable groups of students, scholars, or general readers to build collaborative commentaries on a text and to search, display, and share those commentaries online&#8221;. He showed us how users could tag or add comments on individual words or phrases of a loaded text. Take a look at the <a title="eComma: Sonnet 18" href="http://ecomma.cwrl.utexas.edu/0.2.0/texts/comments/4">eComma page for Sonnet 18 by William Shakespeare</a>. The words highlighted in blue are those which are tagged or have comments associated with them. If you highlight &#8216;the eye of heaven&#8217; in line 5 you will see that it is tagged as a metaphor. Travis reported that he will have 2 other programmers working on eComma with him this summer and has his eye on improving some interface issues and adding a few more features.</p>
<p>We also talked about ways to display transcription. <a title="Elena Razlogova" href="http://elenarazlogova.org/">Elena Razlogova</a> guided us over to the <a title="DoHistory" href="http://dohistory.org">DoHistory</a> website. There she showed us the <a title="DoHistory: Magic Lens" href="http://dohistory.org/diary/exercises/lens/index.html">Magic Lens</a> interface. This interface displays the transcription of a handwritten diary page via a lens style overlay that you can move with your mouse. This reminded me of the <a title="Gilder Lehrman Battle Lines: Letters from America's Wars" href="http://www.gilderlehrman.org/collection/battlelines/index_good.html">Gilder Lehrman Battle Lines: Letters from America&#8217;s Wars</a> interface that I found when doing research for my <a title="Communicating Context in Online Collections Poster" href="http://www.spellboundblog.com/poster/">Communicating Context in Online Collections Poster</a>. If you haven&#8217;t seen it before &#8211; go examine the page showing the transcription of <a title="Nathanael Greene's letter to Catherine Greene dated July 17, 1778" href="http://www.gilderlehrman.org/collection/battlelines/chapter3/chapter3_1a.html">Nathanael Greene&#8217;s letter</a> to Catherine Greene dated July 17, 1778 (turn down your speakers if a reader&#8217;s voice will disturb those around you).</p>
<p>While on the DoHistory site I also found the <a title="DoHistory: Try Transcribing" href="http://dohistory.org/diary/exercises/tryTranscribing.html">Try Your Hand At Transcribing page</a>. This page shows the challenge of transcribing handwritten documents by giving you the chance to try it yourself and then lets you check your transcription with the click of a button.</p>
<p>We talked a bit about the technology behind eComma (forgive me Travis for not having enough details in my notes to explain your current architecture here) and the challenges inherent in wanting to annotate overlapping sets of words. Though he isn&#8217;t using it in the current implementation of eComma, Travis mentioned the <a title="LMNL" href="http://lmnl.net">Layered Markup and Annotation Language</a> (LMNL) which the <a title="LMNL: Tutorial" href="http://lmnl.net/prose/tutorial/index.html">tutorial page</a> describes as follows:</p>
<blockquote><p>&#8230;LMNL documents contain character data which is marked up using named and occasionally overlapping ranges. Ranges can have annotations, which can themselves be annotated and can have structured content. To support authoring, especially collaborative authoring, markup is namespaced and divided into layers, which might reflect different views on the text.</p></blockquote>
<p>I can definitely see how LMNL might be an interesting framework for building transcription and annotation software.</p>
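<p>To make the overlap problem concrete for myself, here is a minimal Python sketch of annotations stored as named character ranges over a text. This is not LMNL syntax and not how eComma is built (I don&#8217;t know its architecture); the names <code>Range</code>, <code>span_of</code>, <code>overlaps</code> and <code>covered_text</code> are hypothetical, as is the second reader&#8217;s comment. It only shows why two ranges that overlap without nesting are easy to hold as offsets, even though they cannot both be wrapped in ordinary XML elements.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Range:
    """A named span over the text, as half-open character offsets [start, end)."""
    name: str
    start: int
    end: int
    annotations: dict = field(default_factory=dict)


def span_of(text, phrase, name, **annotations):
    # Hypothetical helper: locate the first occurrence of the phrase
    # and record its offsets instead of wrapping it in an element.
    start = text.index(phrase)
    return Range(name, start, start + len(phrase), annotations)


def overlaps(a, b):
    # Two half-open spans overlap when their intersection is non-empty;
    # an empty range() object is falsy in Python.
    return bool(range(max(a.start, b.start), min(a.end, b.end)))


def covered_text(text, r):
    return text[r.start:r.end]


# Line 5 of Sonnet 18, the line mentioned in the eComma example above.
text = "Sometime too hot the eye of heaven shines"

# The tag described in the post, plus an invented comment from a second reader.
metaphor = span_of(text, "the eye of heaven", "tag", value="metaphor")
comment = span_of(text, "heaven shines", "comment", author="reader2")
opening = span_of(text, "Sometime", "tag", value="adverb")

print(covered_text(text, metaphor), "|", covered_text(text, comment))
print(overlaps(metaphor, comment), overlaps(metaphor, opening))
```

<p>Because each range only points at offsets, the text itself is never restructured, which is the property that makes overlapping annotations from several readers workable.</p>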
<p><a title="Krissy O'Hare" href="http://storytelling.concordia.ca/staff/krissy/">Krissy O&#8217;Hare</a> brought up the challenges of transcribing audio and video that she has faced working on <a title="Concordia University: Oral History" href="http://storytelling.concordia.ca/oralhistory/index.html">oral history projects at Concordia University</a>. This led to Travis (I think?) mentioning the <a title="Texas German Dialect Project" href="http://www.tgdp.org">Texas German Dialect Project</a> (TGDP) and the <a title="CMU Sphinx Group Speech Recognition Engine" href="http://cmusphinx.sourceforge.net/html/cmusphinx.php">CMU Sphinx Group Speech Recognition Engine</a>. TGDP has an online archive of recorded interviews along with their transcriptions and translations. CMU Sphinx&#8217;s introduction explains that their software tools are targeted at expert users wanting to build speech-using applications.</p>
<p>This was a great session. The small group gave everyone a chance to contribute and take over the keyboard in order to show off their favorite sites. It was immediately after the <a title="THATCamp 2008: Text Mining Session" href="http://www.spellboundblog.com/2008/06/01/thatcamp-2008-text-mining-and-the-persian-carpet-effect/">Text Mining</a> session, so our minds were already full of all the great things one could do with text once it is transcribed.</p>
<p>I am excited to watch the evolution of group transcription and annotation software. If you know of other transcription or annotation tools or projects &#8211; please post them to the comments.</p>
<p><em>Image credit: <a title="Free pencils by zone41 via flickr" href="http://flickr.com/photos/zone41/2302365649/">Free pencils by zone41 via flickr</a></em></p>
<p><em>As is the case with all my session summaries from THATCamp 2008, please accept my apologies in advance for any cases in which I misquote, oversimplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via</em> <a title="contact Jeanne Kramer-Smyth" href="http://www.spellboundblog.com/contact/"><em>my contact form</em></a><em>.</em></p>
<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2008/06/05/crowdsourced-transcription-collaborative-annotation/">THATCamp 2008: Crowdsourced Transcription and Collaborative Annotation</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2008/06/05/crowdsourced-transcription-collaborative-annotation/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>MIT&#8217;s SIMILE Project: Innovations in Metadata Interaction and Analysis</title>
		<link>http://www.spellboundblog.com/2008/01/13/mits-simile-project-innovations-in-metadata-interaction-and-analysis/</link>
		<comments>http://www.spellboundblog.com/2008/01/13/mits-simile-project-innovations-in-metadata-interaction-and-analysis/#comments</comments>
		<pubDate>Sun, 13 Jan 2008 06:37:16 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[access]]></category>
		<category><![CDATA[EAD]]></category>
		<category><![CDATA[information visualization]]></category>
		<category><![CDATA[interface design]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/2008/01/13/mits-simile-project-innovations-in-metadata-interaction-and-analysis/</guid>
		<description><![CDATA[Well-formed Data&#8217;s post on Exhibit led me to explore what was available from MIT&#8217;s Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE) project. I took a little time to examine some of the SIMILE project tools with an eye to how they could impact interaction with archival records and metadata, as well as [...]<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2008/01/13/mits-simile-project-innovations-in-metadata-interaction-and-analysis/">MIT&#8217;s SIMILE Project: Innovations in Metadata Interaction and Analysis</a></p>
]]></description>
			<content:encoded><![CDATA[<p><a href="http://simile.mit.edu/" title="MIT Simile Project"><img align="left" src="http://www.spellboundblog.com/wp-content/uploads/2008/01/logo.png" alt="MIT SIMILE project" title="MIT SIMILE project" /></a><a href="http://well-formed-data.net/archives/119/exhibit" title="Well-formed Data: Exhibit">Well-formed Data&#8217;s post on Exhibit</a> led me to explore what was available from <a href="http://mit.edu/" title="Massachusetts Institute of Technology">MIT</a>&#8217;s <a href="http://simile.mit.edu/" title="Semantic Interoperability of Metadata and Information in unLike Environments">Semantic Interoperability of Metadata and Information in unLike Environments</a> (SIMILE) project. I took a little time to examine some of the SIMILE project tools with an eye to how they could impact interaction with archival records and metadata, as well as how they might support the work of archivists. All the tools appear to be available via an open source <a href="http://www.opensource.org/licenses/bsd-license.php" title="Open Source: BSD License">BSD license</a>.</p>
<p><strong>Babel</strong></p>
<p><a href="http://simile.mit.edu/babel/" title="SIMILE: Babel">Babel</a> converts files from one format to another. I did a test to see if it would convert one of the <a href="http://lcweb2.loc.gov/faid/source.html" title="LOC: EAD Finding Aids in XML format">Library of Congress EAD Finding Aids</a> from XML to some other format &#8211; but it gave me an error (&#8216;unqualified attribute &#8216;repositoryencoding&#8217; not allowed&#8217;). I love the idea that I could just point this at an EAD finding aid and get something useful out the other side &#8211; but apparently that is a bit on the wishful thinking side &#8211; at least for the moment.</p>
<p><strong>Exhibit 2.0</strong></p>
<p><a href="http://simile.mit.edu/exhibit/" title="SIMILE: Exhibit 2.0">Exhibit 2.0</a> is described on the Exhibit homepage as follows:</p>
<blockquote>
<p class="blurb">Exhibit is a <em>three-tier web application framework</em> written in Javascript, which you can include like you would include Google Maps. If you just want to show a few hundred records of data on maps, timelines, scatter plots, interactive tables, etc., why bother learning SQL, ASP, PHP, CGI, or whatever when you can just use Exhibit? To use Exhibit, you write: a simple data file, and an HTML file in which you specify how the data should be shown. Data + Presentation. That&#8217;s all there is to publishing, as it should be.</p>
</blockquote>
<p>Sounds fabulous, doesn&#8217;t it? I wish I had a week to play with this tool. They have a whole slew of <a href="http://simile.mit.edu/wiki/Exhibit/Examples" title="SIMILE: Exhibit Examples">examples</a>, but I think the two I list below do a fine job of showing what you can create (not to mention being fairly thematic for those of you paying attention to the US Presidential Primaries news coverage):</p>
<ul>
<li><a href="http://ryanlee.org/2007/08/decide.html" title="2008 Presidential Election Candidates on the Issues">2008 Presidential Election Candidates on the Issues</a></li>
<li><a href="http://simile.mit.edu/exhibit/examples/presidents/presidents.html" title="US Presidents (in Exhibit)">US Presidents</a></li>
</ul>
<p><strong>Gadget</strong></p>
<p><a href="http://simile.mit.edu/wiki/Gadget" title="SIMILE: Gadget">Gadget</a> is an XML inspector designed to create useful summaries of vast pools of XML data. I didn&#8217;t download and play with this one &#8211; but it sounds like something that might be very interesting to pump a big pile of EAD XML format finding aids into to see what could be discovered from an aggregate point of view.</p>
<p><strong>Longwell &amp; RDFizers</strong></p>
<p><a href="http://simile.mit.edu/wiki/Longwell" title="SIMILE: Longwell">Longwell</a> is a <a href="http://simile.mit.edu/wiki/Faceted_Browser" title="faceted browser definition">faceted browser</a> for <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework" title="Wikipedia: RDF">RDF</a> formatted data, while <a href="http://simile.mit.edu/wiki/RDFizers" title="SIMILE: RDFizers">RDFizers</a> is actually a directory of tools which convert other data formats into the RDF format. It doesn&#8217;t exist now, but if there were an RDFizer that went from EAD to RDF then Longwell would become more interesting to archivists.</p>
<p>That said, they already do have both a <a href="http://simile.mit.edu/wiki/MARC/MODS_RDFizer" title="SIMILE: MARC/MODS RDFizer">MARC/MODS RDFizer</a> and an <a href="http://simile.mit.edu/wiki/OAI-PMH_RDFizer" title="SIMILE: OAI-PMH RDFizer">OAI-PMH RDFizer</a>. I suspect that many archivists could put their hands on archival data in one of these two formats &#8211; which makes experimenting with Longwell more plausible in the near term.</p>
<p><strong>Final Thoughts</strong></p>
<p>There are lots of other tools that are part of the SIMILE project (<a href="http://simile.mit.edu/wiki/Solvent" title="SIMILE: Solvent">screen scrapers</a> and <a href="http://simile.mit.edu/timeplot/" title="SIMILE: Timeplot">timeplotters</a> and <a href="http://simile.mit.edu/wiki/Referee" title="SIMILE: Referee">more</a>), but the ones listed above most ignited my imagination. Surely there are geek archivists even now rolling up their sleeves to figure out how to leverage free open source tools like these, both to improve access to records and increase understanding of what we have and how well it is (or isn&#8217;t) documented.</p>
<p>I hope to find time to play with each of these over the next few months &#8211; but I would love to know if anyone else out there has already tried any of these tools. Have suggestions for likely datasets? Have knowledge of existing archive related applications using these tools? Please post your comments below or drop me a line via <a href="http://www.spellboundblog.com/contact/" title="Spellbound Blog Contact Form">my contact form</a>!</p>
<p><em>Image Credit: The Simile Project logo displayed above is from MIT&#8217;s <a href="http://simile.mit.edu/" title="MIT Simile Project">Simile Project website</a>. </em></p>
<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2008/01/13/mits-simile-project-innovations-in-metadata-interaction-and-analysis/">MIT&#8217;s SIMILE Project: Innovations in Metadata Interaction and Analysis</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2008/01/13/mits-simile-project-innovations-in-metadata-interaction-and-analysis/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Digital Preservation via Emulation &#8211; Dioscuri and the Prevention of Digital Black Holes</title>
		<link>http://www.spellboundblog.com/2007/12/25/digital-preservation-via-emulation-dioscuri-and-the-prevention-of-digital-black-holes/</link>
		<comments>http://www.spellboundblog.com/2007/12/25/digital-preservation-via-emulation-dioscuri-and-the-prevention-of-digital-black-holes/#comments</comments>
		<pubDate>Tue, 25 Dec 2007 05:27:20 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[access]]></category>
		<category><![CDATA[at risk records]]></category>
		<category><![CDATA[born digital records]]></category>
		<category><![CDATA[context]]></category>
		<category><![CDATA[electronic records]]></category>
		<category><![CDATA[future-proofing]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[preservation]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/2007/12/25/digital-preservation-via-emulation-dioscuri-and-the-prevention-of-digital-black-holes/</guid>
		<description><![CDATA[Available Online posted about the open source emulator project Dioscuri back in late September. In the course of researching Thoughts on Digital Preservation, Validation and Community I learned a bit about the Microsoft Virtual PC software. Virtual PC permits users to run multiple operating systems on the same physical computer and can therefore facilitate access [...]<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2007/12/25/digital-preservation-via-emulation-dioscuri-and-the-prevention-of-digital-black-holes/">Digital Preservation via Emulation &#8211; Dioscuri and the Prevention of Digital Black Holes</a></p>
]]></description>
			<content:encoded><![CDATA[<p><a href="https://sourceforge.net/project/screenshots.php?group_id=200001&amp;ssid=62512" title="Dioscuri Screenshot"><img src="http://www.spellboundblog.com/wp-content/uploads/2007/12/dioscuri.JPG" title="dioscuri.JPG" alt="dioscuri.JPG" align="right" /></a><a href="http://availableonline.wordpress.com" title="Available Online">Available Online</a> posted about the open source emulator project <a href="http://dioscuri.sourceforge.net/" title="Dioscuri - Open Source Emulator">Dioscuri</a> back in <a href="http://availableonline.wordpress.com/2007/09/26/files-lost-on-wordperfect-51-drawperfect-11-and-norton-commander/" title="Available Online: Files lost on WordPerfect 5.1, DrawPerfect 1.1 and Norton Commander?">late September</a>. In the course of researching <a href="http://www.spellboundblog.com/2007/07/06/thoughts-on-digital-preservation-validation-and-community/" title="Spellbound Blog: Thoughts on Digital Preservation, Validation and Community">Thoughts on Digital Preservation, Validation and Community</a> I learned a bit about the <a href="http://www.microsoft.com/windows/products/winfamily/virtualpc/default.mspx" title="Microsoft Virtual PC 2007 software">Microsoft Virtual PC software</a>. Virtual PC permits users to run multiple operating systems on the same physical computer and can therefore facilitate access to old software that won&#8217;t run on your current operating system. That emulator approach pales in comparison with what the folks over at Dioscuri are planning and building.</p>
<p>On the <a href="http://dioscuri.sourceforge.net/preservation.html" title="Dioscuri: Digital Preservation">Digital Preservation</a> page of the Dioscuri website I found this paragraph on their goals:</p>
<blockquote><p>To prevent a digital black hole, the Koninklijke Bibliotheek (KB), National Library of the Netherlands, and the Nationaal Archief of the Netherlands started a joint project to research and develop a solution. Both institutions have a large amount of traditional documents and are very familiar with preservation over the long term. However, the amount of digital material (publications, archival records, etc.) is increasing with a rapid pace. To manage them is already a challenge. But as cultural heritage organisations, more has to be done to keep those documents safe for hundreds of years at least.</p></blockquote>
<p>They are nothing if not ambitious&#8230; they go on to state:</p>
<blockquote><p>Although many people recognise the importance of having a digital preservation strategy based on emulation, it has never been taken into practice. Of course, many emulators already exist and showed the usefulness and advantages it offer. But none of them have been designed to be digital preservation proof. For this reason the National Library and Nationaal Archief of the Netherlands started a joint project on emulation.</p>
<p>The aim of the emulation project is to develop a new preservation strategy based on emulation.</p></blockquote>
<p>Dioscuri is part of <a href="http://www.planets-project.eu/" title="The PLANETS project">Planets</a> (Preservation and Long-term Access via NETworked Services) &#8211; run by the <a href="http://www.planets-project.eu/about/#partners" title="Planets Partners">Planets consortium</a> and coordinated by the British Library. The Dioscuri team has created an open source emulator that can be ported to any hardware that can run a <a href="http://en.wikipedia.org/wiki/Java_Virtual_Machine" title="Wikipedia: Java Virtual Machine (JVM)">Java Virtual Machine</a> (JVM). Individual hardware components are implemented via separate modules. These modules should make it possible to mimic many different hardware configurations without creating separate programs for every possible combination.</p>
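<p>As a way of picturing that modular idea, here is a small Python sketch of my own. It is emphatically not Dioscuri&#8217;s code or API (Dioscuri runs on the JVM, and I have not read its source); every class name here, including <code>Module</code>, <code>Cpu16</code>, <code>Memory</code> and <code>Emulator</code>, is invented for illustration. The point is only that one emulator class can be assembled from interchangeable component modules, so a new hardware configuration means a new list of modules and not a new program.</p>

```python
class Module:
    """Hypothetical base class for one emulated hardware component."""
    kind = "generic"

    def describe(self):
        return self.kind


class Cpu16(Module):
    kind = "cpu"

    def describe(self):
        return "16-bit cpu"


class Memory(Module):
    kind = "memory"

    def __init__(self, kilobytes):
        self.kilobytes = kilobytes

    def describe(self):
        return f"{self.kilobytes} KB memory"


class Emulator:
    """Assembles a machine from whatever modules it is given."""

    def __init__(self, modules):
        # One module per kind; a later module of the same kind
        # replaces an earlier one.
        self.modules = {m.kind: m for m in modules}

    def configuration(self):
        # Alphabetically sorted descriptions of the assembled components.
        return sorted(m.describe() for m in self.modules.values())


# Two different machines built from the same emulator class.
small = Emulator([Cpu16(), Memory(640)])
large = Emulator([Cpu16(), Memory(1024)])

print(small.configuration())
print(large.configuration())
```

<p>A real emulator module would of course carry behavior (instruction decoding, memory reads and writes) and not just a description, but the composition pattern is the same.</p>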
<p>You can get a taste of the big thinking that is going into this work by reviewing the <a href="http://www.kb.nl/hrd/dd/dd_projecten/projecten_emulatie-eemprogramme-en.html" title="EEM: 2006 Slides and Program Overview">program overview and slide presentations</a> from the first Emulation Expert Meeting (EEM) on digital preservation that took place on October 20th, 2006.</p>
<p>In the presentation given by <a href="http://www.cs.indiana.edu/~geobrown/" title="Geoffrey Brown">Geoffrey Brown</a> from <a href="http://www.indiana.edu/" title="Indiana University">Indiana University</a> titled <a href="http://www.kb.nl/hrd/dd/dd_projecten/slides/eem_iu_gbrown.pdf" title="Virtualizing the CIC Floppy Disk Project: An Experiment in Preservation Using Emulation">Virtualizing the CIC Floppy Disk Project: An Experiment in Preservation Using Emulation</a> I found the following simple answer to the question &#8216;Why not just migrate?&#8217;:</p>
<ul dir="ltr">
<li>
<p style="margin-right: 0px">Loss of information &#8212; e.g. word edits</p>
</li>
<li>
<p style="margin-right: 0px">Loss of fidelity &#8212; e.g. WordPerfect to Word isn’t very good</p>
</li>
<li>
<p style="margin-right: 0px">Loss of authenticity &#8212; users of migrated document need access to original to verify authenticity</p>
</li>
<li>
<p style="margin-right: 0px">Not always possible &#8212; closed proprietary formats</p>
</li>
<li>
<p style="margin-right: 0px">Not always feasible &#8212; costs may be too high</p>
</li>
<li>
<p style="margin-right: 0px">Emulation may necessary to enable migration</p>
</li>
</ul>
<p>After reading through <a href="http://www.kb.nl/hrd/dd/dd_projecten/slides/eem_dnb_tsteinke.pdf" title="Emulation at the German National Library">Emulation at the German National Library</a>, presented by <a href="http://www.tobias-steinke.de/" title="Tobias Steinke">Tobias Steinke</a>, I found my way to the <a href="http://kopal.langzeitarchivierung.de/" title="kopal: data into the future">kopal</a> website. With their great tagline &#8216;Data into the future&#8217;, they state their <a href="http://kopal.langzeitarchivierung.de/index_ziel.php.en" title="kopal: goal">goal</a> is &#8220;&#8230;to develop a technological and organizational solution to ensure the long-term availability of electronic publications.&#8221; The real gem for me on that site is what they call the <a href="http://kopal.langzeitarchivierung.de/index_demonstrator.php.en" title="kopal demonstrator">kopal demonstrator</a>. This is a well thought out Flash application that explains the kopal project&#8217;s &#8216;procedures for archiving and accessing materials&#8217; within the <a href="http://ssdoo.gsfc.nasa.gov/nost/isoas/" title="OAIS">OAIS Reference Model</a> framework. But it is more than that &#8211; if you are looking for a great way to get your (or someone else&#8217;s) head around digital archiving, software and related processes &#8211; definitely take a look. They even include a full Glossary.</p>
<p>I liked what I saw in <a href="http://www.kb.nl/hrd/dd/dd_projecten/slides/eem_bnf_gmiura.pdf" title="EEM: Grégory Miura Presentation">Defining a preservation policy for a multimedia and software heritage collection, a pragmatic attempt from the Bibliothèque nationale de France</a>, a presentation by Grégory Miura, but felt like I was missing some of the guts by just looking at the slides. I was pleased to discover what appears to be a related paper on the same topic presented at IFLA 2006 in Seoul titled: <a href="http://www.ifla.org/IV/ifla72/papers/091-Miura-en.pdf" title="IFLA 2006 Seoul: Pushing the boundaries of traditional heritage policy">Pushing the boundaries of traditional heritage policy: Maintaining long-term access to multimedia content by introducing emulation and contextualization instead of accepting inevitable loss</a>. Hurrah for NOT &#8216;accepting inevitable loss&#8217;.</p>
<p>Vincent Joguin&#8217;s presentation, <a href="http://www.kb.nl/hrd/dd/dd_projecten/slides/eem_aconit_vjoguin.pdf" title="EEM: Vincent Joguin">Emulating emulators for long-term digital objects preservation: the need for a universal machine</a>, discussed a virtual machine project named Olonys. If I understood the slides correctly, the idea behind Olonys is to create a &#8220;portable and efficient virtual processor&#8221;. This would provide an environment in which to run programs such as emulators, but isolate the programs running within it from the disparities between the original hardware and the actual current hardware. Another benefit to this approach is that only the virtual processor need be ported to new platforms rather than each individual program or emulator.</p>
<p>Hilde van Wijngaarden presented an <a href="http://www.kb.nl/hrd/dd/dd_projecten/slides/eem_kb_hvwijngaarden.pdf" title="EEM: planets overview">Introduction to Planets</a> at EEM. I also found another introductory level presentation that was given by Jeffrey van der Hoeven at <a href="http://www.wepreserve.eu/events/fp6-2007/" title="wePreserve">wePreserve</a> in September of 2007 titled <a href="http://www.wepreserve.eu/events/fp6-2007/presentations/2007-09-05_emulation_wepreserve_portugal_jrvanderhoeven.pdf" title="Dioscuri: emulation for digital preservation">Dioscuri: emulation for digital preservation</a>.</p>
<p>The <a href="http://www.wepreserve.eu/events/fp6-2007/" title="wePreserve">wePreserve</a> site is a gold mine for presentations on these topics. They <a href="http://www.wepreserve.eu/about/" title="About wePreserve">bill themselves</a> as &#8220;the window on the synergistic activities of DigitalPreservationEurope (DPE), Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval (CASPAR), and Preservation and Long-term Access through NETworked Services (PLANETS).&#8221; If you have time and curiosity on the subject of digital preservation, take a glance down their home page and click through to view some of the presentations.</p>
<p>On the site of <a href="http://www.ijdc.net/ijdc" title="International Journal of Digital Curation">The International Journal of Digital Curation</a> there is a nice ten page paper that explains the most recent results of the Dioscuri project. <a href="http://www.ijdc.net/ijdc/article/viewFile/50/203" title="Emulation for Digital Preservation in Practice: The Results">Emulation for Digital Preservation in Practice: The Results</a> was published in December 2007. I like being able to see slides from presentations (as linked to above), but without the notes or audio to go with them I am often left staring at really nice diagrams wondering what the author&#8217;s main point was. The paper is thorough and provides lots of great links to other reading, background and related projects.</p>
<p>There is a lot to dig into here. It is enough to make me wish I had a month (maybe a year?) to spend just following up on this topic alone. I found my struggle to interpret many of the PowerPoint slide decks that have no notes or audio very ironic. Here I was hunting for information about the preservation of born digital records and I kept finding that the records of the research provided didn&#8217;t give me the full picture. With no context beyond the text and images on the slides themselves, I was left to my own interpretation of their intended message. While I know that these presentations are not meant to be the official records of this research, I think that the effort obviously put into collecting and posting them makes it clear that others are as anxious as I am to see this information.</p>
<p>The best digital preservation model in the world will only preserve what we choose to save. I know the famous claim on the web is that &#8216;content is king&#8217; &#8211; but I would venture to suggest that in the cultural heritage community &#8216;context is king&#8217;.</p>
<p>What does this have to do with Dioscuri and emulators? Just that as we solve the technical problems related to preservation and access, I believe that we will circle back around to realize that digital records need the same careful attention to appraisal, selection and preservation of context as &#8216;traditional&#8217; records. I would like to believe that the huge hurdles we now face on the technical and process side of things will fade over time due to the immense efforts of dedicated and brilliant individuals. The next big hurdle is the same old hurdle &#8211; making sure the records we fight to preserve have enough context that they will mean anything to those in the future. We could end up with just as severe a &#8216;digital black hole&#8217; due to poorly selected or poorly documented records as we could due to records that are trapped in a format we can no longer access. We need both sides of the coin to succeed in digital preservation.</p>
<p>Did I mention the part about &#8216;Hurray for open source emulator projects with ambitious goals for digital preservation&#8217;? Right. I just wanted to be clear about that.</p>
<p><em>Image Credit: The image included at the top of this post was taken from a screen shot of Dioscuri itself, the original version of which may be <a href="https://sourceforge.net/project/screenshots.php?group_id=200001&amp;ssid=62512" title="Dioscuri Screenshot">seen here</a>.</em></p>
<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2007/12/25/digital-preservation-via-emulation-dioscuri-and-the-prevention-of-digital-black-holes/">Digital Preservation via Emulation &#8211; Dioscuri and the Prevention of Digital Black Holes</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2007/12/25/digital-preservation-via-emulation-dioscuri-and-the-prevention-of-digital-black-holes/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The MemoryArchive Affiliate Program: A Wiki Engine for Collecting Memoirs</title>
		<link>http://www.spellboundblog.com/2007/11/14/the-memoryarchive-affiliate-program-a-wiki-engine-for-collecting-memoirs/</link>
		<comments>http://www.spellboundblog.com/2007/11/14/the-memoryarchive-affiliate-program-a-wiki-engine-for-collecting-memoirs/#comments</comments>
		<pubDate>Thu, 15 Nov 2007 02:17:23 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[access]]></category>
		<category><![CDATA[archival community]]></category>
		<category><![CDATA[born digital records]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[oral history]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[virtual collaboration]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/2007/11/14/the-memoryarchive-affiliate-program-a-wiki-engine-for-collecting-memoirs/</guid>
		<description><![CDATA[A Beautiful WWW posted A Review of MemoryArchive.org. MemoryArchive, founded by historian Marshall Poe, is a new MediaWiki based website aimed at collecting first person accounts that they term &#8216;memoirs&#8217;. In sharp contrast with the communal authorship approach of most wikis, MemoryArchive locks down edits of each entry after a format review. What sorts of [...]<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2007/11/14/the-memoryarchive-affiliate-program-a-wiki-engine-for-collecting-memoirs/">The MemoryArchive Affiliate Program: A Wiki Engine for Collecting Memoirs</a></p>
]]></description>
			<content:encoded><![CDATA[<p><a title="MemoryArchive" href="http://www.memoryarchive.org"><img title="MemoryArchive Logo" src="http://www.spellboundblog.com/wp-content/uploads/2007/11/wiki.png" alt="MemoryArchive Logo" hspace="10" align="left" /></a><a title="A Beautiful WWW" href="http://abeautifulwww.com">A Beautiful WWW</a> posted <a title="A Review of MemoryArchive.org" href="http://abeautifulwww.com/2007/11/05/a-review-of-memoryarchiveorg-3/">A Review of MemoryArchive.org</a>. <a title="MemoryArchive" href="http://www.memoryarchive.org/en/MemoryArchive">MemoryArchive</a>, founded by historian <a title="Wikipedia: Marshall Poe" href="http://en.wikipedia.org/wiki/Marshall_Poe">Marshall Poe</a>, is a new <a title="MediaWiki" href="http://www.mediawiki.org/wiki/MediaWiki">MediaWiki</a> based website aimed at collecting first person accounts that they term &#8216;memoirs&#8217;. In sharp contrast with the communal authorship approach of most wikis, MemoryArchive locks down edits of each entry after a format review.</p>
<p>What sorts of memoirs are they looking for? In their <a title="MemoryArchive FAQ" href="http://www.memoryarchive.org/en/FAQ">FAQ</a> they say they want &#8220;pretty much anything you remember that someone else might conceivably find interesting, now or in 500 years&#8221;.</p>
<p>I spent some time exploring. I read a very moving memorial titled <a title="MemoryArchive: Death by AIDS, 1992, by Jay Blotcher" href="http://www.memoryarchive.org/en/Death_by_AIDs%2C_1992%2C_by_Jay_Blotcher"><span style="text-decoration: line-through;">Death by Aids</span> The Goodbye Party, 1992, by Jay Blotcher</a> (<em>ed note: Jay emailed me with the correct title for this memoir</em>). I wandered through some 9/11 memories. Eventually something dawned on me. Maybe it is the fact that I am spending most of my days lately thinking deep thoughts about metadata and classification &#8212; or maybe my archives course work is to blame &#8212; whatever the reason, I realized that I wanted more information about the storytellers. Right now it appears that each memoir includes Who, What, When and Where data &#8211; to whatever degree the contributors choose to furnish such information. Categories are also available and seem to be frequently employed.</p>
<p>But I want to know more about the individuals who are telling the stories. I appreciate that some posts will be made more powerful through anonymity, but in those cases where an individual is willing to share additional biographic information, it would be great to have an easy place for that information to be captured.</p>
<p>I think the most interesting aspect of the Memory Archive to the archives community is the <a title="Memory Archive Affiliate Program" href="http://www.memoryarchive.org/en/Become_an_Affiliate">Memory Archive Affiliate Program</a>. The theory behind this program is to support the collection and archiving of personal histories online. It is described as being of interest to the following types of organizations:</p>
<ul>
<li>historical societies (urban, state, or national)</li>
<li>institutions interested in recording their own history (a club, society, or military unit)</li>
<li>educational institutions teaching history (high school or college)</li>
<li>public history projects (oral history gathering, or document collection)</li>
</ul>
<p>This is a powerful idea. Any time you can accumulate a critical mass of a single type of information on the web (in this case, memoirs) you have the chance of becoming a destination. There is also the added benefit of enabling smaller organizations to launch online memoir collection initiatives without needing to worry about the technology, costs and people-power that would usually be required.</p>
<p>There does need to be an easy way for the Memory Archive Affiliates to download these born digital memoirs for offline use and preservation purposes. This could be accomplished by an &#8216;export&#8217; or &#8216;format for printing&#8217; button on each memoir page, or perhaps some form of bulk download for all memoirs collected for a single affiliate&#8217;s project. I will say that the default print format isn&#8217;t bad. It seems to already do some special reformatting (such as displaying URL links in their entirety). I would still want more metadata, though perhaps the definition of attributes to be collected could be customized per project.</p>
<p>I am curious to see the overall quality of the memoirs a year from now. I suspect that memoirs collected in association with a topically focused program may be more compelling than the average &#8216;man-on-the-net&#8217; first person narratives. That isn&#8217;t to say that there is no value in the memories of someone who feels compelled to share their story &#8211; but a collection created around a theme would have the additional power of that common thread. The affiliate program memoirs would also be more likely to come with some contextual background explaining the source and origin of the solicited accounts. I am a fan of the existing thematic memory sites, such as <a title="April 16th Archive" href="http://april16archive.org/">The April 16 Archive</a> and the <a title="Hurricane Digital Memory Bank" href="http://hurricanearchive.org/">Hurricane Digital Memory Bank</a>. I love that the <a title="Omeka" href="http://omeka.org/">Omeka</a> software used to create these two example sites is open source and free. Unfortunately, I don&#8217;t think the average small historical society or public history project is likely to have the resources to build and support a site like this even with free software. I think that a program like the Memory Archive Affiliate Program (or something like it) could bridge the gap for these smaller organizations and make the creation of online memoir collection projects a reality.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2007/11/14/the-memoryarchive-affiliate-program-a-wiki-engine-for-collecting-memoirs/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>WordPress Blog Magic &#8211; A look under the hood</title>
		<link>http://www.spellboundblog.com/2007/06/22/wordpress-blog-magic-a-look-under-the-hood/</link>
		<comments>http://www.spellboundblog.com/2007/06/22/wordpress-blog-magic-a-look-under-the-hood/#comments</comments>
		<pubDate>Fri, 22 Jun 2007 14:25:03 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[blogs]]></category>
		<category><![CDATA[interface design]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/2007/06/22/wordpress-blog-magic-a-look-under-the-hood/</guid>
		<description><![CDATA[Spellbound Blog is served to you via the fabulous open source software that is WordPress. Last night I finally created a page about the major WordPress plugins I have used to customize this blog. If you are interested in such things, take a look at my new WordPress Customizations page.
]]></description>
			<content:encoded><![CDATA[<p>Spellbound Blog is served to you via the fabulous open source software that is <a href="http://wordpress.org/" title="Wordpress">WordPress</a>. Last night I finally created a page about the major WordPress plugins I have used to customize this blog. If you are interested in such things, take a look at my new <a href="http://www.spellboundblog.com/wordpress-customizations/" title="Spellbound Blog WordPress Customizations">WordPress Customizations</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2007/06/22/wordpress-blog-magic-a-look-under-the-hood/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>reCAPTCHA: crowdsourcing transcription comes to life</title>
		<link>http://www.spellboundblog.com/2007/05/28/recaptcha-crowdsourcing-transcription-comes-to-life/</link>
		<comments>http://www.spellboundblog.com/2007/05/28/recaptcha-crowdsourcing-transcription-comes-to-life/#comments</comments>
		<pubDate>Mon, 28 May 2007 20:20:02 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[access]]></category>
		<category><![CDATA[digitization]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[transcription]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/2007/05/28/recaptcha-crowdsourcing-transcription-comes-to-life/</guid>
		<description><![CDATA[With a tag-line like &#8216;Stop Spam, Read Books&#8217; &#8211; how can you not love reCAPTCHA? You might have already read about it on Boing Boing , NetworkWorld.com or digitizationblog &#8211; but I just couldn&#8217;t let it go by without talking about it. Haven&#8217;t heard about reCAPTCHA yet? Ok.. have you ever filled out an online [...]
]]></description>
			<content:encoded><![CDATA[<p>With a tag-line like &#8216;Stop Spam, Read Books&#8217; &#8211; how can you not love <a href="http://recaptcha.net/" title="reCAPTCHA: Stop Spam, Read Books">reCAPTCHA</a>? You might have already read about it on <a href="http://www.boingboing.net/2007/05/24/can-captchas-solve-b.html" title="Boing Boing: Can CAPTCHAs solve book-digitizing?">Boing Boing</a> , <a href="http://www.networkworld.com/community/?q=node/15522" title="NetworkWorld: You might be digitzing books on the Web without knowing it thanks to this stealthy anti-spam technology">NetworkWorld.com</a> or <a href="http://digitizationblog.interoperating.info/?p=390" title="Next-gen captcha aids digitization">digitizationblog</a> &#8211; but I just couldn&#8217;t let it go by without talking about it.</p>
<p>Haven&#8217;t heard about reCAPTCHA yet? OK, have you ever filled out an online form that made you look at an image and type the letters or numbers that you see? These &#8216;verify you are a human&#8217; sorts of challenges are used everywhere from on-line concert ticket purchase sites that don&#8217;t want <a href="http://en.wikipedia.org/wiki/Ticket_resale" title="Wikipedia: Ticket Resale (aka scalper or tout)">scalpers</a> to get too many of the tickets to blogs that are trying to prevent spam. What reCAPTCHA has done is harness this user effort to assist in the transcription of hard to <a href="http://en.wikipedia.org/wiki/Optical_character_recognition" title="Wikipedia: OCR: Optical Character Recognition">OCR</a> text from digitized books in the <a href="http://www.archive.org/" title="Internet Archive">Internet Archive</a>. Their website has a great explanation about <a href="http://recaptcha.net/learnmore.html" title="reCAPTCHA: What Is reCAPTCHA">what they are doing</a> &#8211; and they include the graphic below to show why human intervention is needed.</p>
<p><a href="http://recaptcha.net/learnmore.html" title="Why we need reCAPTCHA"><img src="http://www.spellboundblog.com/wp-content/uploads/2007/05/sample-ocr.gif" alt="Why we need reCAPTCHA" width="487" height="103" /></a></p>
<p>reCAPTCHA shows two words for each challenge &#8211; one that it knows the transcription of and a second that needs human verification. Slowly but surely all the words OCR doesn&#8217;t understand get transcribed and made available for indexing and search.</p>
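<p>The two-word idea described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the code reCAPTCHA actually runs: the function names, the <code>VOTES_NEEDED</code> threshold and the simple majority rule are all hypothetical.</p>

```python
from collections import Counter, defaultdict

# Hypothetical threshold: how many matching human answers are required
# before a transcription is trusted. The real rule is not described in this post.
VOTES_NEEDED = 3

# votes[word_id] counts the answers people have typed for an unknown word.
votes = defaultdict(Counter)


def check_challenge(control_answer, control_typed, unknown_id, unknown_typed):
    """Return True if the person passed the challenge.

    Only the control word (whose transcription is already known) decides
    pass or fail. The answer for the unknown word is recorded as a vote,
    and only when the control word was typed correctly.
    """
    passed = control_typed.strip().lower() == control_answer.strip().lower()
    if passed:
        votes[unknown_id][unknown_typed.strip().lower()] += 1
    return passed


def accepted_transcription(unknown_id):
    """Return the agreed transcription for a word, or None if still undecided."""
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= VOTES_NEEDED else None
```

<p>The point of the design is that a spammer who fails the known word never gets to influence the transcription of the unknown one, while every person who passes contributes one small vote.</p>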
<p>I have posted before about ideas for transcription using the power of many hands and eyes (see <a href="http://www.spellboundblog.com/2006/10/12/archival-transcriptions-for-the-public-by-the-public/" title="Archival Transcriptions: for the public, by the public">Archival Transcriptions: for the public, by the public</a>) &#8211; but my ideas were more along the lines of what the genealogists are doing on sites like <a href="http://www.rootsweb.com/~usgenweb/" title="USGenWeb Archives">USGenWeb</a>.  It is so exciting to me that a version of this is out there &#8211; and I LOVE their take on it. Rather than find people who want to do transcription, they have taken an action lots of folks are already used to performing and given it more purpose. The statistics behind this are powerful. Apparently 60 million of these challenges are entered every DAY.</p>
<p>Want to try it? Leave a comment on this post (or any post in my blog) and you will get to see and use reCAPTCHA. I can also testify that the installation of this on a WordPress blog is <a href="http://recaptcha.net/plugins/wordpress/" title="Wordpress reCAPTCHA installation">well documented</a>, fast and easy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2007/05/28/recaptcha-crowdsourcing-transcription-comes-to-life/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Book Review: Dreaming in Code (a book about why software is hard)</title>
		<link>http://www.spellboundblog.com/2007/05/24/book-review-dreaming-in-code-a-book-about-why-software-is-hard/</link>
		<comments>http://www.spellboundblog.com/2007/05/24/book-review-dreaming-in-code-a-book-about-why-software-is-hard/#comments</comments>
		<pubDate>Fri, 25 May 2007 02:09:44 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[book review]]></category>
		<category><![CDATA[context]]></category>
		<category><![CDATA[electronic records]]></category>
		<category><![CDATA[journalism]]></category>
		<category><![CDATA[learning technology]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[preservation]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/2007/05/24/book-review-dreaming-in-code-a-book-about-why-software-is-hard/</guid>
		<description><![CDATA[Dreaming in Code: Two Dozen Programmers, Three Years, 4,732 Bugs, and One Quest for Transcendent Software (or &#8220;A book about why software is hard&#8221;) by Scott Rosenberg Before I dive into my review of this book &#8211; I have to come clean. I must admit that I have lived and breathed the world of software [...]
]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.amazon.com/gp/product/1400082471?ie=UTF8&#038;tag=spellboundblog-20&#038;linkCode=as2&#038;camp=1789&#038;creative=390957&#038;creativeASIN=1400082471"><img src="http://www.spellboundblog.com/images/21jY8gsy4zL._AA_.jpg" border="0" alt="" align="left" /></a><img style="border: medium none  ! important; margin: 0px ! important" src="http://www.assoc-amazon.com/e/ir?t=csectionrecov-20&amp;l=as2&amp;o=1&amp;a=1400082463" border="0" alt="" width="1" height="1" /><a href="http://www.amazon.com/gp/product/1400082471?ie=UTF8&#038;tag=spellboundblog-20&#038;linkCode=as2&#038;camp=1789&#038;creative=390957&#038;creativeASIN=1400082471">Dreaming in Code: Two Dozen Programmers, Three Years, 4,732 Bugs, and One Quest for Transcendent Software</a><img src="http://www.assoc-amazon.com/e/ir?t=spellboundblog-20&#038;l=as2&#038;o=1&#038;a=1400082471" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /><br />
(or &#8220;A book about why software is hard&#8221;) by <a title="Scott Rosenberg" href="http://www.wordyard.com/about/">Scott Rosenberg</a></p>
<p>Before I dive into my review of this book &#8211; I have to come clean. I must admit that I have lived and breathed the world of software development for years. I have, in fact, dreamt in code. That is NOT to say that I was programming in my dream, rather that the logic of the dream itself was rooted in the logic of the programming language I was learning at the time (they didn&#8217;t call it Oracle Bootcamp for nothing).</p>
<p>With that out of the way I can say that I loved this book. This book was so good that I somehow managed to read it cover to cover while taking two graduate school courses and working full time. Looking back, I am not sure when I managed to fit in all 416 pages of it (ok, there are some appendices and such at the end that I merely skimmed).</p>
<p><a title="Scott Rosenberg" href="http://www.wordyard.com/about/">Rosenberg</a> reports on the creation of an open source software tool named <a href="http://chandler.osafoundation.org/">Chandler</a>. He got permission to report on the project much as an <a title="Wikipedia: Embedded Journalist" href="http://en.wikipedia.org/wiki/Embedded_journalist">embedded journalist</a> does for a military unit. He went to meetings. He interviewed team members. He documented the ups and downs and real-world challenges of building a complex software tool based on a <a title="Chandler Vision" href="http://chandler.osafoundation.org/1.0_vision.php">vision</a>.</p>
<p>If you have even a shred of interest in the software systems that are generating records that archivists will need to preserve in the future &#8211; read this book. It is well written &#8211; and it might just scare you. If there is that much chaos in the creation of these software systems (and such frequent failure in the process), what does that mean for the archivist charged with the preservation of the data locked up inside these systems?</p>
<p>I have written about some of this before (see <a href="http://www.spellboundblog.com/2007/02/17/understanding-born-digital-records-journalists-and-archivists-with-parallel-challenges/">Understanding Born Digital Records: Journalists and Archivists with Parallel Challenges</a>), but it bears repeating: If you think preserving records originating from standardized packages of off-the-shelf software is hard, then please consider that really understanding the meaning of all the data (and business rules surrounding its creation) in custom built software systems is harder still by a factor of 10 (or 100).</p>
<p>It is interesting for me to feel so pessimistic about finding (or rebuilding) appropriate contextual information for electronic records. I am usually such an optimist. I suspect it is a case of knowing too much for my own good. I also think that so many attempts at preservation of archival electronic records are in their earliest stages &#8211; perhaps in that phase in which you think you have all the pieces of the puzzle. I am sure there are others who have gotten further down the path only to discover that their map to the data does not bear any resemblance to the actual records they find themselves in charge of describing and arranging. I know that in some cases everything is fine. The records being accessioned are well documented and thoroughly understood.</p>
<p>The fear that, in many cases, we won&#8217;t know we are missing the pieces we need to decipher the data until many years down the road leads me to an even darker place. While I may sound alarmist, I don&#8217;t think I am overstating the situation. This comes from my firsthand experience in working with large custom built databases. Often (back in my life as a software consultant) I would be assigned to fix or add on to a program I had not written myself. That often felt like trying to crawl into someone else&#8217;s brain.</p>
<p>Imagine being told you must finish a 20 page paper tonight &#8211; but you don&#8217;t get to start from scratch and you have no access to the original author. You are provided a theoretically almost complete 18 page paper and piles of books with scraps of paper stuck in them. The citations are only partly done. The original assignment leaves room for original ideas &#8211; so you must discern the topic chosen by the original author by reading the paper itself. You decide that writing from scratch is foolish &#8211; but are then faced with figuring out what the person who originally was writing this was trying to say. You find half-finished sentences here and there. It seems clear they meant to add entire paragraphs in some sections. The final thorn in your side is being forced to write in a voice that matches that of the original author &#8211; one that is likely odd sounding and awkward for you. About halfway through the evening you start wishing you had started from scratch &#8211; but now it is too late to start over, you just have to get it done.</p>
<p>So back to the archivist tasked with ensuring that future generations can make use of the electronic records in their care. The challenges are great. This sort of thing is hard even when you have the people who wrote the code sitting next to you available to answer questions and a working program with which to experiment. It just makes my head hurt to imagine piecing together the meaning of data in custom built databases long after the working software and programmers are well beyond reach.</p>
<p>Does this sound interesting or scary or relevant to your world? <a href="http://www.amazon.com/gp/product/1400082471?ie=UTF8&#038;tag=spellboundblog-20&#038;linkCode=as2&#038;camp=1789&#038;creative=390957&#038;creativeASIN=1400082471">Dreaming in Code</a> is really a great read. The people are interesting. The issues are interesting. The author does a good job of explaining the inner workings of the software world by following one real world example and grounding it in the landscape of the history of software creation. And he manages to include great analogies to explain things to those looking in curiously from outside of the software world. I hope you enjoy it as much as I did.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2007/05/24/book-review-dreaming-in-code-a-book-about-why-software-is-hard/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

