<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Spellbound Blog &#187; software</title>
	<atom:link href="http://www.spellboundblog.com/category/software/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.spellboundblog.com</link>
	<description>Archives, Digital Humanities, Cultural Heritage, Technology</description>
	<lastBuildDate>Mon, 06 Feb 2012 14:49:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Digitization Program Site Visit: University of Maryland</title>
		<link>http://www.spellboundblog.com/2011/12/12/digitization-program-site-visit-university-of-maryland/</link>
		<comments>http://www.spellboundblog.com/2011/12/12/digitization-program-site-visit-university-of-maryland/#comments</comments>
		<pubDate>Mon, 12 Dec 2011 04:58:26 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[access]]></category>
		<category><![CDATA[digitization]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/?p=1194</guid>
		<description><![CDATA[I recently had the opportunity to visit with staff of the University of Maryland, College Park&#8217;s Digital Collections digitization program along with a group of my colleagues from the World Bank. This is a report on that site visit. It is my hope that these details can help others planning digitization projects &#8211; much as [...]<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2011/12/12/digitization-program-site-visit-university-of-maryland/">Digitization Program Site Visit: University of Maryland</a></p>
]]></description>
			<content:encoded><![CDATA[<p><a href="http://digital.lib.umd.edu/archivesum/?pid=umd:2258"><img class="size-full wp-image-1200    alignright" style="margin-left: 3px; margin-right: 3px;" title="University Archives, Special Collections, University of Maryland Libraries" src="http://www.spellboundblog.com/wp-content/uploads/2011/11/univarch.000969.0001.jpg" alt="" width="320" height="251" /></a></p>
<p style="text-align: left;">I recently had the opportunity to visit with staff of the University of Maryland, College Park&#8217;s Digital Collections digitization program along with a group of my colleagues from the World Bank. This is a report on that site visit. It is my hope that these details can help others planning digitization projects &#8211; much as it is informing our own internal planning.</p>
<p><strong>Date of Visit:</strong> October 13, 2011</p>
<p><strong>Destination:</strong> University of Maryland, Digital Collections</p>
<p><strong>University of Maryland Hosts:</strong></p>
<ul>
<li><a title="Jennie Levine Knies, Manager, Digital Collections" href="http://www.linkedin.com/in/jenniealevine">Jennie Levine Knies, Manager, Digital Collections</a></li>
<li><a title="Alexandra Carter, Digital Imaging Librarian" href="http://www.linkedin.com/pub/alexandra-carter/1a/814/1b3">Alexandra Carter, Digital Imaging Librarian</a></li>
</ul>
<p><strong>Summary: </strong> This two-hour visit consisted of a one-hour presentation and Q&amp;A session with Jennie Levine Knies, Manager of Digital Collections, followed by a one-hour tour and Q&amp;A session with Alexandra Carter, Digital Imaging Librarian.</p>
<p><strong>Background: </strong>The <a title="Digital Collections" href="http://digital.lib.umd.edu/">Digital Collections of the University of Maryland</a> was launched in 2006 using <a title="Fedora" href="http://www.fedora-commons.org/">Fedora Commons</a>. It is distinct from the ‘Digital Repository at the University of Maryland’, aka <a title="DRUM" href="http://drum.lib.umd.edu/">DRUM</a>, which is built on <a title="DSpace" href="http://www.dspace.org/">DSpace</a>. DRUM contains faculty-deposited documents, a library-managed collection of UMD theses and dissertations, and collections of technical reports. The Digital Collections project focuses on digitization of photographs, postcards, manuscripts &amp; correspondence – mostly based on patron demand. In addition, materials are selected for digitization based on the need for thematic collections to support events, such as their recent Civil War exhibition.</p>
<p>After a period of full funding, funding has fallen off, which has prevented any further changes to the Fedora system.</p>
<p>Another project at UMD involves digitization of Japanese children&#8217;s books (<a href="http://digital.lib.umd.edu/prange.jsp">George W. Prange Collection</a>) and currently uses “in house outsourcing”. In this scenario, contractors bring all their equipment and staff on site to perform the digitization process.</p>
<p><strong>Standard Procedures:</strong></p>
<ul>
<li>Requests must be made using a combination of the ‘Digital Request Cover Sheet’ and ‘<a title="Digital Surrogate Request Sheet" href="http://www.lib.umd.edu/special/forms/digorderform.pdf">Digital Surrogate Request Sheet</a>’. These sheets are then reviewed for completeness by the curator under whose jurisdiction the collection falls. Space on the request forms is provided so that the curator may add notes to aid in the digitization process. The curator decides whether it is worth digitizing an entire folder when only specific item(s) are requested. Standard policy is to aim for a two-week turnaround for digitization based on patron request.</li>
<li>The digital request is given a code name for easy reference. They choose these names alphabetically.</li>
<li>Staff are assigned to digitize materials. This work is often done by student workers using one of three <a title="Epson Expression 10000XL" href="http://www.epson.com/cgi-bin/Store/jsp/Product.do?BV_UseBVCookie=yes&amp;sku=E10000XL-PH">Epson 10000 XL</a> flatbed scanners. There is also a <a title="Zeutschel OS 12000" href="http://www.zeutschel.com/products/book_copiers_os12000_bc.html">Zeutschel OS 12000</a> overhead scanner available for materials which cannot be handled by the flatbed scanners.</li>
<li>Alexandra reviews all scans for quality.</li>
<li>Metadata is reviewed by another individual.</li>
<li>When both the metadata &amp; image quality have been reviewed, materials are published online.</li>
</ul>
<p><strong>Improvements/Changes they wish for: </strong></p>
<ul>
<li>Easier way to create a web ‘home’ for collections. Currently many do not have a main page, and creating one requires the involvement of the IT department.</li>
<li>Option for users to save images being viewed</li>
<li>Option to upload content to their website in PDF format</li>
<li>Way to associate transcriptions with individual pages</li>
<li>More granularity for workflow: currently the only status they have to indicate that a folder or item is ready for review is ‘Pending’. Since there are multiple quality control activities that must be performed by different staff, currently they must make manual lists to track what phases of QA are complete for which digitized content.</li>
<li>Reduce data entry.</li>
<li>Support for description at both the folder and item level at the same time. Currently description is only permitted either at the folder level OR at the item level.</li>
<li>Enable search and sorting by date added to system. This data is captured, but not exposed.</li>
</ul>
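<p>The wish for more workflow granularity could be modelled roughly as follows. This is only an illustrative Python sketch: the phase names, the <code>DigitizedItem</code> class and the sample identifier are hypothetical and are not part of UMD&#8217;s Fedora-based system, which only offers the single ‘Pending’ status.</p>

```python
from dataclasses import dataclass, field
from enum import Enum


class QAPhase(Enum):
    """Hypothetical review phases; the real system only has 'Pending'."""
    IMAGE_QUALITY = "image quality"
    METADATA = "metadata"


@dataclass
class DigitizedItem:
    identifier: str
    completed: set = field(default_factory=set)

    def mark_reviewed(self, phase: QAPhase) -> None:
        self.completed.add(phase)

    def outstanding(self) -> list:
        # Phases still waiting for a reviewer, in declaration order.
        return [p for p in QAPhase if p not in self.completed]

    def ready_to_publish(self) -> bool:
        # Publish only once every QA phase has been signed off.
        return not self.outstanding()


item = DigitizedItem("folder-aardvark")  # made-up code name
item.mark_reviewed(QAPhase.IMAGE_QUALITY)
print([p.value for p in item.outstanding()])
print(item.ready_to_publish())
```

<p>Tracking each review phase per item would replace the manual lists described above with a simple query for whatever is still outstanding.</p>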
<p><strong>Lessons Learned:</strong></p>
<ul>
<li>Should have adopted an existing metadata standard rather than creating their own.</li>
<li>People do not use the ‘browse terms’ – do not spend a lot of time working on them.</li>
</ul>
<p><strong>Resources:</strong></p>
<ul>
<li><a title="Digital Content Guidelines" href="http://www.lib.umd.edu/dcr/publications/SelectionCriteriaforDigitalObjects2010.pdf">Digital Content Guidelines: Selection Criteria for Digital Objects</a></li>
</ul>
<p><em>Image Credit:</em> Women students in a green house during a Horticulture class at the University of Maryland, 1925. University Archives, Special Collections, University of Maryland Libraries</p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2011/12/12/digitization-program-site-visit-university-of-maryland/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Rescuing 5.25&#8243; Floppy Disks from Oblivion</title>
		<link>http://www.spellboundblog.com/2011/07/25/rescuing-5-25-floppy-disks-from-oblivion/</link>
		<comments>http://www.spellboundblog.com/2011/07/25/rescuing-5-25-floppy-disks-from-oblivion/#comments</comments>
		<pubDate>Tue, 26 Jul 2011 02:42:03 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[at risk records]]></category>
		<category><![CDATA[electronic records]]></category>
		<category><![CDATA[future-proofing]]></category>
		<category><![CDATA[learning technology]]></category>
		<category><![CDATA[preservation]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/?p=1006</guid>
		<description><![CDATA[Step-by-step instructions for saving files from 5 1/4" floppy disks.
]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.spellboundblog.com/wp-content/uploads/2011/07/IMG_4121.jpg"><img class="alignright size-medium wp-image-1158" title="My 5 1/4&quot; Floppy Disks from the 1980s" src="http://www.spellboundblog.com/wp-content/uploads/2011/07/IMG_4121-300x225.jpg" alt="" width="300" height="225" /></a>This post is a careful log of how I rescued data trapped on 5 1/4&#8243; floppy disks, some dating back to 1984 (including those pictured here). While I have tried to make this detailed enough to help anyone who needs to try this, you will likely have more success if you are comfortable installing and configuring hardware and software.</p>
<p>I will break this down into a number of phases:</p>
<ul>
<li>Phase 1: Hardware</li>
<li>Phase 2: Pull the data off the disk</li>
<li>Phase 3: Extract the files from the disk image</li>
<li>Phase 4: Migrate or Emulate</li>
</ul>
<p><strong>Phase 1: Hardware</strong></p>
<p>Before you do anything else, you actually need a 5.25&#8243; floppy drive of some kind connected to your computer.  I was lucky &#8211; a friend had a floppy drive for us to work with. If you aren&#8217;t that lucky, you can generally find them on eBay for around $25 (sometimes less). A friend had been helping me by trying to connect the drive to my existing PC &#8211; but we could never get the communications working properly. Finally I found Device Side Data&#8217;s <a title="5.25&quot; Floppy Drive Controller" href="http://www.deviceside.com/fc5025.html">5.25&#8243; Floppy Drive Controller</a> which they <a title="Buy 5.25&quot; Floppy Drive Controller" href="http://shop.deviceside.com/prod/FC5025">sell online</a> for $55. The controller connects your 5.25&#8243; floppy drive to a USB 2.0 or USB 1.1 port. It comes with drivers for Windows, Mac and Linux systems.</p>
<p>If you don&#8217;t want to mess around with installing the disk drive into your computer, you can also purchase an <a title="Disk drive external enclosure and power supply" href="http://shop.deviceside.com/prod/CASE1">external drive enclosure and a tabletop power supply</a>. Remember, you still need the USB controller too.</p>
<p><em>Update:</em> I just found a <a title="Device Side's Drive Controller operation instructions" href="http://mith.umd.edu/vintage-computers/fc5025-operation-instructions">fantastic step-by-step guide to the hardware installation of Device Side&#8217;s drive controller</a> from the Maryland Institute for Technology in the Humanities (MITH), including tons of photographs, which should help you get the hardware install portion done right.</p>
<p><strong>Phase 2: Pull the data off the disk</strong></p>
<p>The next step, once you have everything installed, is to extract the bits (all those ones and zeroes) off those floppies. I found that creating a new folder for each disk I was extracting made things easier. In each folder I store the disk image, a copy of the extracted original files and a folder named &#8216;converted&#8217; in which to store migrated versions of the files.</p>
<p>Device Side provides software they call &#8216;Disk Image and Browse&#8217;. You can see an assortment of <a title="Disc Image &amp; Browse Screenshots" href="http://www.deviceside.com/screenshots.html">screenshots</a> of this software on their website, but this is what I see after putting a floppy in my drive and launching USB Floppy -&gt; Disk Image and Browse:</p>
<p><a href="http://www.spellboundblog.com/wp-content/uploads/2011/07/disk-image-and-browse.jpg"><img class="aligncenter size-full wp-image-1146" title="Disk Image and Browse" src="http://www.spellboundblog.com/wp-content/uploads/2011/07/disk-image-and-browse.jpg" alt="" width="385" height="337" /></a></p>
<p>You will need to select the &#8216;Disk Type&#8217; and indicate the destination in which to create your disk image. Make sure you create the destination directory <em>before</em> you click on the &#8216;Capture Disk File Image&#8217; button. This is what it may look like in progress:</p>
<p><a href="http://www.spellboundblog.com/wp-content/uploads/2011/07/disk-capture-in-progress.jpg"><img class="aligncenter size-full wp-image-1147" title="Disk Capture in Progress" src="http://www.spellboundblog.com/wp-content/uploads/2011/07/disk-capture-in-progress.jpg" alt="" width="289" height="160" /></a></p>
<p>Fair warning that this won&#8217;t always work. At least the developers of the software that comes with Device Side Data&#8217;s controller had a sense of humor. This is what I saw when one of my disk reads didn&#8217;t work 100%:</p>
<p><a href="http://www.spellboundblog.com/wp-content/uploads/2010/07/capture-disk-image-bummer.jpg"><img class="size-full wp-image-1007 aligncenter" title="Capturing Disk Image File... Bummer!" src="http://www.spellboundblog.com/wp-content/uploads/2010/07/capture-disk-image-bummer.jpg" alt="" width="289" height="159" /></a></p>
<p>If you are pressed for time and have many disks to work your way through, you can stop here and repeat this step for all the disks you have on hand.</p>
<p><strong>Phase 3: Extract the files from the disk image</strong></p>
<p>Now that you have a disk image of your floppy, how do you interact with it? For this step I used a free tool called <a title="Virtual Floppy Drive" href="http://vfd.sourceforge.net/">Virtual Floppy Drive</a>. After I got this installed properly, when my disk image appeared, it was tied to this program. Double clicking on the Floppy Image icon opens the floppy in a view like the one shown below:</p>
<p style="text-align: center;"><a href="http://www.spellboundblog.com/wp-content/uploads/2011/07/vfd-display.jpg"><img class="alignnone size-full wp-image-1143" title="Virtual Floppy Disk Display" src="http://www.spellboundblog.com/wp-content/uploads/2011/07/vfd-display.jpg" alt="" width="501" height="394" /></a></p>
<p style="text-align: left;">It looks like any other removable disk drive. Now you can copy any or all of the files to anywhere you like.</p>
<p><strong>Phase 4: Migrate or Emulate<br />
</strong></p>
<p>The last step is finding a way to open your files. Your choice for this phase will depend on the file formats of the files you have rescued. My files were almost all <a title="WordStar" href="http://www.wordstar.org/">WordStar</a> word processing documents. I found a <a title="tools for converting wordstar files" href="http://www.wordstar.org/index.php/wordstar-file-conversion/wordstar-for-dos">list of tools for converting WordStar files to other formats</a>.</p>
<p>The best one I found was <a title="HABit Version 3" href="http://www.hotkey.net.au/%7Ehambar/habit/wsc-ver3.htm">HABit version 3</a>.</p>
<p>It converts WordStar files into text or HTML and even keeps the spacing reasonably well if you choose that option. If you are interested in the content more than the layout, then not retaining spacing will be the better choice because it will not put artificial spaces in the middle of sentences to preserve indentation. In a perfect world I think I would capture it both with layout and without.</p>
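<p>For a sense of what such a conversion involves, here is a minimal Python sketch. It is not how HABit works internally. It assumes the commonly described layout of older WordStar document files: 7-bit ASCII text where the eighth bit is set on some characters as a formatting marker, low control codes toggle styles such as bold, and Ctrl-Z pads the end of the file. The function name and sample bytes are made up for illustration.</p>

```python
def wordstar_to_text(raw: bytes) -> str:
    """Rough WordStar-to-ASCII conversion under the assumptions stated above."""
    out = []
    for byte in raw:
        if byte == 0x1A:  # Ctrl-Z marks end of file / padding
            break
        byte = byte & 0x7F  # clear the high bit used as a formatting marker
        if byte in range(0x20, 0x7F) or byte in (0x09, 0x0A, 0x0D):
            out.append(chr(byte))  # keep printable ASCII, tabs and line breaks
        # any other control code (style toggles and the like) is dropped
    return "".join(out)


# "The end" with high bits on the last letter of each word and bold toggles.
sample = bytes([0x54, 0x68, 0xE5, 0x20, 0x02, 0x65, 0x6E, 0xE4, 0x02,
                0x0D, 0x0A, 0x1A, 0x1A])
print(wordstar_to_text(sample))
```

<p>A sketch like this discards all layout and formatting, and dot-command lines would still appear in the output, so a dedicated converter remains the better choice when spacing matters.</p>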
<p><strong>Summary</strong></p>
<p>So my rhythm of working with the floppies after I had all the hardware and software installed was as follows:</p>
<ul>
<li>create a new folder for each disk, with an empty &#8216;converted&#8217; folder within it</li>
<li>insert floppy into the drive</li>
<li>run DeviceSide&#8217;s Disk Image and Browse software (found on my PC running Windows under Start -&gt; Programs -&gt; USB Floppy)</li>
<li>paste the full path of the destination folder</li>
<li>name the disk image</li>
<li>click &#8216;Capture Disk Image&#8217;</li>
<li>double click on the disk image and view the files via vfd (virtual floppy drive)</li>
<li>copy all files into the folder for that disk</li>
<li>convert files to a stable format (I was going from WordStar to ASCII text) and save the files in the &#8216;converted&#8217; folder</li>
</ul>
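<p>The first step of that rhythm, creating a folder per disk with an empty &#8216;converted&#8217; folder inside it, is easy to script. The Python sketch below is one way to do it; the function name and the disk labels are hypothetical, and the demo writes to a temporary directory instead of a real rescue folder.</p>

```python
from pathlib import Path
import tempfile


def prepare_disk_folder(root: Path, disk_label: str) -> Path:
    """Create root/disk_label/converted and return the disk folder."""
    disk_dir = root / disk_label
    # parents=True builds the disk folder too; exist_ok makes reruns harmless.
    (disk_dir / "converted").mkdir(parents=True, exist_ok=True)
    return disk_dir


# Demo in a throwaway directory; point root at your own rescue folder instead.
root = Path(tempfile.mkdtemp())
for label in ["disk-001", "disk-002"]:
    folder = prepare_disk_folder(root, label)
    print(folder.resolve())  # full path to paste as the capture destination
```

<p>Because the folders exist before any capture starts, the destination directory is always in place when the disk image is written.</p>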
<p>These are the detailed instructions I tried to find when I started my own data rescue project. I hope this helps you rescue files currently trapped on 5 1/4&#8243; floppies. Please let me know if you have any questions about what I have posted here.</p>
<p><em>Update:</em> Another great source of information is Archive Team&#8217;s wiki page on <a title="Archive Team: Rescuing Floppy Disks" href="http://archiveteam.org/index.php?title=Rescuing_Floppy_Disks">Rescuing Floppy Disks</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2011/07/25/rescuing-5-25-floppy-disks-from-oblivion/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>ArchivesZ Needs You!</title>
		<link>http://www.spellboundblog.com/2010/07/07/archivesz-needs-you/</link>
		<comments>http://www.spellboundblog.com/2010/07/07/archivesz-needs-you/#comments</comments>
		<pubDate>Wed, 07 Jul 2010 04:48:24 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[archival community]]></category>
		<category><![CDATA[ArchivesZ]]></category>
		<category><![CDATA[learning technology]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[virtual collaboration]]></category>
		<category><![CDATA[what if]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/?p=996</guid>
		<description><![CDATA[I got a kind email today asking &#8220;Whither ArchivesZ?&#8221;. My reply was: &#8220;it is sleeping&#8221; (projects do need their rest) and &#8220;I just started a new job&#8221; (I am now a Metadata and Taxonomy Consultant at The World Bank) and &#8220;I need to find enthusiastic people to help me&#8221;. That final point brings me to [...]
]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.spellboundblog.com/wp-content/uploads/2010/07/Unclesamwantyou2.jpg"><img class="alignright size-full wp-image-997" title="I Want You!" src="http://www.spellboundblog.com/wp-content/uploads/2010/07/Unclesamwantyou2.jpg" alt="" width="288" height="320" /></a>I got a kind email today asking &#8220;Whither ArchivesZ?&#8221;. My reply was: &#8220;it is sleeping&#8221; (projects do need their rest) and &#8220;I just started a new job&#8221; (I am now a Metadata and Taxonomy Consultant at The World Bank) and &#8220;I need to find enthusiastic people to help me&#8221;. That final point brings me to this post.</p>
<p>I find myself in the odd position of having finished my Master&#8217;s Degree and not wanting to sign on for the long haul of a PhD. So I have a big project that was born in academia, initially as a joint class project and more recently as independent research with a grant-funded programmer, but I am no longer in academia.</p>
<p>What happens to projects like ArchivesZ? Is there an evolutionary path towards it being a collaborative project among dispersed enthusiastic individuals? Or am I more likely to succeed by recruiting current graduate students at my former (and still nearby) institution? I have discussed this one-on-one with a number of individuals, but I haven&#8217;t thrown open the gates for those who follow me here online.</p>
<p>For those of you who have been waiting patiently, the <a title="ArchivesZ" href="http://zaphod.mindlab.umd.edu/ArchivesZ/Main.html">ArchivesZ version 2 prototype</a> is available online. I can&#8217;t promise it will stay online for long &#8211; it is definitely brittle for reasons I haven&#8217;t totally identified. A few things to be aware of:</p>
<ul>
<li>When you load the main page, you should see tags listed at the bottom &#8211; if you don&#8217;t see any, drop me an email via my contact form and I will try to get Tomcat and Solr back up. If you have a small screen, you may need to view your browser full screen to get to all the parts of the UI.</li>
<li>I know there are lots of bugs of various sizes. Some paths through  the app work &#8211; some don&#8217;t. Some screens are just placeholders. Feel free  to poke around and try things &#8211; you can&#8217;t break it for anyone else!</li>
</ul>
<p>I think there are a few key challenges to building what I would think of as the first &#8216;full&#8217; version of ArchivesZ &#8211; listed here in no particular order:</p>
<ul>
<li>In the process of creating version 2, I was too ambitious. The current version of ArchivesZ has lots of issues: some are usability problems, some are bugs (see prototype above!)</li>
<li>Wherever a collaborative workspace for ArchivesZ ends up living, it will need large data sets. I did a lot of work on data from eleven institutions in the spring of 2009, so there is a lot of data available &#8211; but it is still a challenge.</li>
<li>A lot of my future ideas for ArchivesZ are trapped in my head. The good news is that I am honestly open to others&#8217; ideas for where to take it in the future.</li>
<li>How do we build a community around the creation of ArchivesZ?</li>
</ul>
<p>I still feel that there is a lot to be gained by building a centralized visualization tool/service through which researchers and archivists could explore and discover archival materials. I even think there is promise to a freestanding tool that supports exploration of materials within a single institution. I can&#8217;t build it alone. This is a good thing &#8211; it will be much better in the end with the input, energy and knowledge of others. I am good at ideas and good at playing the devil&#8217;s advocate. I have lots of strength on the data side of things and visualization has been a passion of mine for years. I need smart people with new ideas, strong tech skills (or a desire to learn) and people who can figure out how to organize the herd of cats I hope to recruit.</p>
<p>So &#8211; what can you do to help ArchivesZ? Do you have mad ActionScript 3 skills? Do you want to dig into the scary little Ruby script that populates the database? Maybe you prefer to organize and coordinate? Have you always wanted to figure out how a project like this could grow from a happy (or awkward?) prototype into a real service that people depend on?</p>
<p>Do you have a vision for how to tackle this as a project? Open source? Grant funded? Something else clever?</p>
<p>Know any graduate students looking for good research topics? There are juicy bits here for those interested in data, classification, visualization and cross-repository search.</p>
<p>I will be at SAA in DC in August chairing a panel on search engine optimization of archival websites. If there is even just one of you out there who is interested, I would cheerfully organize an ArchivesZ summit of some sort in which I could show folks the good, bad and ugly of the prototype as it stands. Let me know in the comments below.</p>
<p>Won&#8217;t be at SAA but want to help? Chime in here too. I am happy to set up some shared desktop tours of whatever you would like to see.</p>
<p>PS: Yes, I do have all the version 2 code &#8211; and what is online at the <a title="Google Code: ArchivesZ" href="http://code.google.com/p/archivesz/">Google Code ArchivesZ page</a> is not up to date. Updating the <a title="ArchivesZ" href="http://www.archivesz.org">ArchivesZ website</a> and uploading the current code is on my to do list!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2010/07/07/archivesz-needs-you/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Gridworks: Super Data Cleanup and Exploration Tool</title>
		<link>http://www.spellboundblog.com/2010/05/29/gridworks-data-cleanup-exploration-tool/</link>
		<comments>http://www.spellboundblog.com/2010/05/29/gridworks-data-cleanup-exploration-tool/#comments</comments>
		<pubDate>Sat, 29 May 2010 06:26:31 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[electronic records]]></category>
		<category><![CDATA[information visualization]]></category>
		<category><![CDATA[learning technology]]></category>
		<category><![CDATA[MARAC]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/?p=987</guid>
		<description><![CDATA[In my presentation at the Spring 2010 Mid-Atlantic Regional Archives Conference (MARAC), Whirlwind Tour of Visualization-Land,  I showed some screenshots of a tool called Gridworks. At the time, Gridworks was not available to the general public. The good news is that earlier this month Gridworks 1.0 was officially released and you can get Gridworks right [...]
]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><a href="http://code.google.com/p/freebase-gridworks/"><img class="size-full wp-image-988  aligncenter" title="ridworks" src="http://www.spellboundblog.com/wp-content/uploads/2010/05/gridworks.jpg" alt="" width="400" height="100" /></a></p>
<p>In my presentation at the Spring 2010 <a title="MARAC" href="http://www.marac.info">Mid-Atlantic Regional Archives Conference</a> (MARAC), <a title="Whirlwind Tour of Visualization-Land" href="http://www.slideshare.net/JKramerSmyth/marac-2010-visualization">Whirlwind Tour of  Visualization-Land</a>,  I showed some screenshots of a tool called Gridworks. At the time, Gridworks was not available to the general public. The good news is that earlier this month <a title="Gridworks 1.0 Announcment" href="http://blog.freebase.com/2010/05/10/announcing-the-release-of-freebase-gridworks-1-0/">Gridworks 1.0 was officially released</a> and you can <a title="Gridworks on Google Code" href="http://code.google.com/p/freebase-gridworks/">get Gridworks right now</a>.</p>
<p>For those of you who didn&#8217;t see my presentation, Gridworks is tool you run locally on your computer via a web browser. It permits you to load &#8216;grid-shaped data&#8217; for examination, filtering and data cleanup. That makes is sound so much less exciting than it is. The best way to get a sense of what you can do is to watch the <a title="Gridworks Videos" href="http://vimeo.com/groups/gridworks/videos">Gridworks Videos</a>.</p>
<p>What sort of data do I think there is in archives to be pumped  into Gridworks? How about collection descriptive data and electronic  record datasets? Since all the data is kept locally, you don&#8217;t need to worry about uploading your data to some anonymous server in order to work with it. It all stays safely on your local computer the whole time.</p>
<p>A quick list of things that Gridworks can do:</p>
<ul>
<li>Cluster data to find values that are almost the same so you can normalize your data (for example &#8211; NYC vs N.Y.C.)</li>
<li>Create instant faceted browsing based on any column in your data</li>
<li>Provide scatterplots of the values from any two numeric columns as well as a way to spot the most interesting combinations across many possible columns</li>
<li>Reconciliation and validation of values based on data from within <a title="Freebase.com" href="http://www.freebase.com/">Freebase.com</a></li>
<li>Pull data from Freebase.com based on a matched column &#8211; such as the population of a country, if you have a column in your dataset with country specified</li>
<li>Splitting data within a cell based on a specified delimiter</li>
<li>Application of <a title="Wikipedia: Regular Expressions" href="http://en.wikipedia.org/wiki/Regular_expression">regular expressions</a> and other simple code to data to create new columns</li>
</ul>
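<p>To make a few of these operations concrete, the Python sketch below imitates three of them outside Gridworks: splitting a cell on a delimiter, counting values the way a facet does, and deriving a new column with a regular expression. The sample rows and column names are invented for illustration, and this is not Gridworks code or its expression language.</p>

```python
import re
from collections import Counter

# Hypothetical rows standing in for collection descriptive data.
rows = [
    {"title": "Letters, 1861-1865", "subjects": "Civil War; Correspondence"},
    {"title": "Postcards, 1910", "subjects": "Postcards; Travel"},
    {"title": "Diary, 1863", "subjects": "Civil War; Diaries"},
]

# Split a multi-valued cell on a delimiter.
for row in rows:
    row["subject_list"] = [s.strip() for s in row["subjects"].split(";")]

# A facet is essentially a count of rows per distinct value.
facet = Counter(s for row in rows for s in row["subject_list"])

# Derive a new column with a regular expression: the first four-digit year.
for row in rows:
    match = re.search(r"\b(\d{4})\b", row["title"])
    row["start_year"] = int(match.group(1)) if match else None

print(facet.most_common())
print([row["start_year"] for row in rows])
```

<p>Gridworks does the same kind of work interactively, without any code, and keeps the results as new columns and facets in the browser.</p>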
<p>This list just scratches the surface, but it should give you a decent idea of the power of Gridworks. Even if the only feature you ever use is the one which lets you cluster and update your data to remove the &#8216;almost the same&#8217; values, Gridworks can save you hours of painstaking data cleanup.</p>
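<p>The clustering idea can be sketched in a few lines of Python. The version below is loosely modelled on the &#8216;key collision&#8217; approach, where values that reduce to the same normalized key are grouped together. It is an assumption-laden illustration, not Gridworks&#8217; actual implementation, and the function names are made up.</p>

```python
import string
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Grouping key: lowercase, punctuation removed, unique tokens sorted."""
    no_punct = str.maketrans("", "", string.punctuation)
    cleaned = value.strip().lower().translate(no_punct)
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Return groups of distinct spellings that share a fingerprint."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    # Only keys shared by more than one spelling are worth reviewing.
    return [sorted(g) for g in groups.values() if len(g) > 1]


print(cluster(["NYC", "N.Y.C.", "nyc", "Boston", "College Park", "Park, College"]))
```

<p>Each group returned is a candidate for merging into a single preferred value, which is the manual decision Gridworks asks you to confirm before it updates the data.</p>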
<p>Why is data cleanup exciting? Because once you have nice clean data with all the attributes that are useful to have for your data set &#8211; then you can start playing with the data in visualization tools! So go watch some <a title="Gridworks Videos" href="http://vimeo.com/groups/gridworks/videos">Gridworks Videos</a>, <a title="Gridworks on Google Code" href="http://code.google.com/p/freebase-gridworks/">get Gridworks for yourself</a> and start playing with data. It is free and it makes working with data fun!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2010/05/29/gridworks-data-cleanup-exploration-tool/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Topic Modeling, Auto-Classification and Archival Description</title>
		<link>http://www.spellboundblog.com/2010/04/27/topic-modeling-auto-classification-archival-description/</link>
		<comments>http://www.spellboundblog.com/2010/04/27/topic-modeling-auto-classification-archival-description/#comments</comments>
		<pubDate>Tue, 27 Apr 2010 06:28:08 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[access]]></category>
		<category><![CDATA[interface design]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[text mining]]></category>
		<category><![CDATA[what if]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/?p=963</guid>
		<description><![CDATA[In an example of Twitter serendipity, @silverasm&#8217;s (Aditi Muralidharan) tweet pointed me to @historying&#8217;s blog post about Topic Modeling. In this post Cameron Blevins explains the results of using the topic modeling feature of UMass Amherst&#8217;s MAchine Learning for LanguagE Toolkit (MALLET) on the text of Martha Ballard’s Diary. I have spent a lot of time [...]
]]></description>
			<content:encoded><![CDATA[<p><a href="http://mallet.cs.umass.edu/index.php"><img class="alignright size-full wp-image-964" title="MALLET logo" src="http://www.spellboundblog.com/wp-content/uploads/2010/04/logo3.png" alt="" width="215" height="95" /></a>In an example of Twitter serendipity, <a title="Twitter: silverasm" href="http://twitter.com/silverasm">@silverasm</a>&#8217;s (Aditi Muralidharan) <a title="tweet about text mining" href="http://twitter.com/silverasm/statuses/12842112825">tweet</a> pointed me to <a title="Twitter: historying" href="http://twitter.com/historying">@historying</a>&#8217;s <a title="Topic Modeling Martha Ballard’s Diary" href="http://historying.org/2010/04/01/topic-modeling-martha-ballards-diary/">blog post about Topic Modeling</a>. In this post Cameron Blevins explains the results of using the <a title="MALLET: Topic Modeling" href="http://mallet.cs.umass.edu/topics.php">topic modeling</a> feature of <a title="UMass Amherst" href="http://www.umass.edu/">UMass Amherst</a>&#8217;s <a title="MAchine Learning for LanguagE Toolkit" href="http://mallet.cs.umass.edu/index.php">MAchine Learning for LanguagE Toolkit</a> (MALLET) on the text of <a title="Martha Ballard's Diary Online" href="http://dohistory.org/diary/">Martha Ballard’s Diary</a>.</p>
<p>I have spent a lot of time thinking about how to generate thematic overviews of groups of archival collections. My information visualization project, <a title="ArchivesZ Blog Posts" href="http://www.spellboundblog.com/category/archivesz/">ArchivesZ</a>, aims to provide ways of understanding aggregated archival description data, either from a single institution or across institutional boundaries. Now I find myself wondering if text mining with a tool like MALLET might generate smart topic groupings more elegantly than fighting with the wide range of non-standardized collection subjects.</p>
<p><strong>Topic Modeling with MALLET</strong></p>
<p>To get a sense of what MALLET generates, see the excerpt below from Blevins&#8217;s post:</p>
<blockquote><p>With some tinkering, MALLET generated a list of thirty topics  comprised of twenty words each, which I then labeled with a descriptive  title. Below is a quick sample of what the program<em> </em>“thinks” are  some of the topics in the diary:</p>
<ul>
<li><strong>MIDWIFERY:</strong> birth deld safe morn receivd calld left  cleverly pm labour fine reward arivd infant expected recd shee born  patient</li>
<li><strong>CHURCH: </strong>meeting attended  afternoon reverend worship foren mr famely performd vers attend public  supper st service lecture discoarst administred supt</li>
<li><strong>DEATH:</strong> day yesterday  informd morn years death ye hear expired expird weak dead las past heard  days drowned departed evinn</li>
<li><strong>GARDENING:</strong> gardin sett  worked clear beens corn warm planted matters cucumbers gatherd potatoes  plants ou sowd door squash wed seeds</li>
</ul>
</blockquote>
<p>He goes on to explain that &#8220;MALLET also allows us to track those topics across the text.&#8221; What if, instead of text mining a diary, we pumped the descriptions of every archival collection from a single institution into MALLET? Of course we would need a good list of stop words including such common terms as archives, history, sources and records. But I wonder how the topics MALLET suggests would compare to the official subjects associated with each collection? Could this give us a broad overview of the topics covered by a specific repository and give us a new way to build paths to the collections based on topic?</p>
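<p>As a rough illustration of the stop word idea, here is a minimal Python sketch that strips common and archives-specific terms from a description before it would be handed to a topic modeling tool. It does not call MALLET at all; the word lists, the function name and the sample description are assumptions invented for this example.</p>

```python
import re

# Assumed extra stop words for archival descriptions; a real list would be longer.
ARCHIVAL_STOP_WORDS = {"archives", "history", "sources", "records"}
COMMON_STOP_WORDS = {"a", "an", "and", "of", "the", "to", "in", "on", "for"}


def prepare_description(text, stop_words=ARCHIVAL_STOP_WORDS | COMMON_STOP_WORDS):
    """Lowercase a description and drop stop words before topic modeling."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in stop_words]


# Hypothetical collection description, invented for illustration.
sample = "Records of the garden club, sources for the history of gardening"
tokens = prepare_description(sample)
```

<p>For the sample sentence only the words garden, club and gardening survive, which is the kind of residue a topic model could actually group on.</p>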
<p><strong>Auto-Classification Using Castanet</strong></p>
<p>Text miner <a title="Aditi Muralidharan" href="http://www.cs.berkeley.edu/~aditi/">Aditi Muralidharan</a> also posted recently on this theme in <a title="Castanet: automatically generating a browsing structure for a collection" href="http://mininghumanities.com/2010/04/24/castanet-automatically-generating-a-browsing-structure-for-a-collection/">Castanet: automatically generating a browsing structure for a collection</a> and explains:</p>
<blockquote><p>Castanet automatically carves a sub-structure from the hierarchical  concept dictionary, WordNet (<a href="http://wordnet.princeton.edu/">http://wordnet.princeton.edu</a>),  and matches items in the collection to one or many appropriate places  within that hierarchy. Then, after some automated trimming and  flattening, the result is a hierarchical browsing system.</p></blockquote>
<p>I have heard of Castanet before via the <a title="Flamenco Search Interface Project" href="http://flamenco.berkeley.edu/">Flamenco Search Interface Project</a>. Apparently Muralidharan did a project using Castanet last summer to create <a href="http://go2.wordpress.com/?id=725X1342&amp;site=textdigihum.wordpress.com&amp;url=http%3A%2F%2Forange.sims.berkeley.edu%2Fcgi-bin%2Fflamenco.cgi%2Fflickr%2FFlamenco&amp;sref=http%3A%2F%2Fmininghumanities.com%2F2010%2F04%2F24%2Fcastanet-automatically-generating-a-browsing-structure-for-a-collection%2F">a category system</a> for <a title="Flickr Commons" href="http://www.flickr.com/commons">Flickr Commons</a> images based on the images&#8217; tags, which is then rendered using a Flamenco interface. I include a partial screenshot below to give you a taste of what the navigation of images feels like a few levels down in the hierarchy. I love the classification of &#8216;Group Action&#8217; then filtered by a sub-classification of &#8216;Commerce&#8217;. The first images shown are of &#8216;horse trading&#8217; &#8211; with additional headings and images beneath them as well as additional filter options on the left.</p>
<p style="text-align: center;"><a title="Flickr Commons: group_action &gt; commerce" href="http://orange.sims.berkeley.edu/cgi-bin/flamenco.cgi/flickr/Flamenco?q=actX:322&amp;group=actX"><img class="aligncenter size-full wp-image-966" title="Flickr Commons Images via Castanet &amp; Flamenco" src="http://www.spellboundblog.com/wp-content/uploads/2010/04/flickr-canasta.jpg" alt="" width="547" height="308" /></a></p>
<p><strong>What If?</strong></p>
<p>What if we pulled all the English language archival descriptions from around the world as our original data set? If we used this data for topic modeling, our subject clusters would be cross-institutional. Maybe we could map the local institution assigned subjects to the topic model generated topics for each collection and get a sort of automated crosswalk for finding related collections. If we used the local institution assigned subjects from the archival descriptions for Castanet-style auto-classification, maybe we could generate a way to hierarchically browse collections topically.</p>
<p>Both MALLET and Flamenco are open source (I am not sure of the status of Castanet) and, as I discovered working on ArchivesZ, many institutions will share their archival description data for a good cause. So &#8211; is this a good cause? I need to tease these ideas out a bit more, but what do you all think of it at first blush? Feasible? Interesting? Worthwhile experiments?</p>
<p><em>Image Credits:</em> MALLET logo from <a title="MALLET Homepage" href="http://mallet.cs.umass.edu/index.php">MALLET homepage</a>. Images in screen shot from <a title="Flickr Commons" href="http://www.flickr.com/commons">Flickr Commons</a> with no known copyright.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2010/04/27/topic-modeling-auto-classification-archival-description/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>SEO Evaluation of an Archival Website: Looking at UMBC&#8217;s Digital Collections</title>
		<link>http://www.spellboundblog.com/2009/09/12/seo-evaluation-archival-websites-umbc/</link>
		<comments>http://www.spellboundblog.com/2009/09/12/seo-evaluation-archival-websites-umbc/#comments</comments>
		<pubDate>Sat, 12 Sep 2009 07:21:32 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[access]]></category>
		<category><![CDATA[context]]></category>
		<category><![CDATA[interface design]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/?p=719</guid>
		<description><![CDATA[Each week brings announcements of archives launching new websites. Today both my email and Twitter told me about University of Maryland, Baltimore County&#8217;s new Digital Collections site. Who can resist peeking at new materials available online? I have spent much of the past year learning the details of Search Engine Optimization. Usually shortened to SEO, [...]
]]></description>
			<content:encoded><![CDATA[<p><a title="Flickr Commons Nationaal Archief: Do-It-Yourself-Woman" href="http://www.flickr.com/photos/nationaalarchief/3333357969/"><img class="alignright size-full wp-image-732" title="Flickr Commons: Do-it-yourself-woman" src="http://www.spellboundblog.com/wp-content/uploads/2009/09/3333357969_99f9a5c49a.jpg" alt="Flickr Commons: Do-it-yourself-woman" width="282" height="370" /></a>Each week brings announcements of archives launching new websites. Today both my email and Twitter told me about  <a title="UMBC Digital Collections" href="http://contentdm.ad.umbc.edu/">University of Maryland, Baltimore County&#8217;s new Digital Collections</a> site. Who can resist peeking at new materials available online?</p>
<p>I have spent much of the past year learning the details of <a title="Wikipedia: Search Engine Optimization" href="http://en.wikipedia.org/wiki/Search_engine_optimization">Search Engine Optimization</a>. Usually shortened to SEO, this simply refers to the use of techniques which improve the traffic sent to a website via <a title="Wikipedia: Organic Search" href="http://en.wikipedia.org/wiki/Organic_search">organic search</a>. Want your webpage to show up at the top of the list for a specific search in Google? You want to work on your SEO.</p>
<p>So when I look at a new archives website, I can&#8217;t help but keep an eye open for how well the site is optimized for search engines.</p>
<p>I hope that UMBC will forgive me for nitpicking their new site. A lot of their choices are great for SEO, but they also have room for improvement.</p>
<p><strong>Things Done Well for SEO</strong></p>
<ul>
<li><strong>Home Page Title &amp; Description</strong>: The site&#8217;s home page has a good meta description. This is the text displayed below the link on a search results page &#8211; as shown below:<img class="alignnone size-full wp-image-723" title="UMBC Digital Collection Google Result" src="http://www.spellboundblog.com/wp-content/uploads/2009/09/umbc_google_result.jpg" alt="UMBC Digital Collection Google Result" width="450" height="83" /></li>
<li><strong>Unique Page Titles At Collection Level</strong>: Each photography collection homepage has a unique page title and a nice block of explanatory text. Google can only read words &#8211; so the more unique text on a page, the better the job Google can do in figuring out what your page is about. Example: <a title="Ardsley Park Album" href="http://contentdm.ad.umbc.edu/ardsley.php">Ardsley Park Album</a></li>
<li><strong>Good <a title="Wikipedia: Anchor Text" href="http://en.wikipedia.org/wiki/Anchor_text">anchor text</a></strong>: (also known as link text) The words used in anchor text tell search engines about the destination page. For example, the blue text below is anchor text. <a title="Back view of Bretz's portable wet plate case " href="http://contentdm.ad.umbc.edu/u?/georgebretz,63"><img class="size-full wp-image-724 aligncenter" title="UMBC Anchor Text Example" src="http://www.spellboundblog.com/wp-content/uploads/2009/09/UMBC-anchor-text.jpg" alt="UMBC Anchor Text Example" width="215" height="191" /></a></li>
</ul>
<p><strong>Areas for SEO Improvement</strong></p>
<ul>
<li><strong>Unique Page Titles At Item Level</strong>: Individual images and documents all use a generic page title such as &#8216;UMBC | Digital Archive | Document Viewer&#8217;. Document Example: <a title="Accidental Death of an Anarchist" href="http://contentdm.ad.umbc.edu/u?/theatreprod,1080">Accidental Death of an Anarchist</a> Image Example: <a title="Image: 10 year old Bootblack" href="http://contentdm.ad.umbc.edu/u?/hinecoll,3957">10 year old Bootblack</a></li>
<li><strong>H1 Tags</strong>: In the HTML of each page, the dominant heading of the page should use the &lt;h1&gt; tag. This helps Google know the phrase you are targeting with this page. It is your 2nd best place to emphasize your content after the page title. In the case of the item pages, there often seems to be a headline type title at the top of the page &#8211; but it is not currently marked up with an &lt;h1&gt; tag.</li>
<li><strong>Think About Search Results and Indexing</strong>: Pages displaying <a title="UMBC Digital Collections: Search for Bootblack" href="http://contentdm.ad.umbc.edu/cdm4/results.php?CISOOP1=all&amp;CISOBOX1=bootblack&amp;CISOFIELD1=CISOSEARCHALL&amp;CISOOP2=exact&amp;CISOBOX2=&amp;CISOFIELD2=CISOSEARCHALL&amp;CISOOP3=any&amp;CISOBOX3=&amp;CISOFIELD3=CISOSEARCHALL&amp;CISOOP4=none&amp;CISOBOX4=&amp;CISOFIELD4=CISOSEARCHALL&amp;CISOROOT=all&amp;t=a">results of internal searches</a> on your site are not likely to be useful as indexed pages in Google. The thinking here is that they can dilute the focus on the item and collection level pages on your site if Google also has many search results pages in the index. If UMBC wanted their search pages to be indexed, then those pages&#8217; URLs would need to be simplified and the search results pages would need a page title that somehow includes the search criteria. There are two ways that I know of to disable this indexing &#8211; <a title="Wikipedia: Robots Exclusion Standard" href="http://en.wikipedia.org/wiki/Robots_exclusion_standard">blocking via the site&#8217;s robots.txt file</a> or via a <a title="Robots Meta Tag" href="http://www.robotstxt.org/meta.html">robots meta tag</a> in the header of the search results page. The robots.txt file asks obliging search engines not to crawl certain parts of your site, while the meta tag asks them not to index a page they have already fetched.</li>
</ul>
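<p>To make the robots.txt option above a little more concrete, here is a minimal Python sketch that checks a made-up rule against two URLs using the standard library&#8217;s robots.txt parser. The rule itself is an assumption modeled on the search results URL linked above; I have not looked at what UMBC&#8217;s actual robots.txt file contains.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the path is modeled on the search URL above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cdm4/results.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

search_page = "http://contentdm.ad.umbc.edu/cdm4/results.php?CISOBOX1=bootblack"
collection_page = "http://contentdm.ad.umbc.edu/ardsley.php"

# can_fetch reports whether a crawler honoring these rules may request the URL.
search_allowed = parser.can_fetch("*", search_page)
collection_allowed = parser.can_fetch("*", collection_page)
```

<p>With this rule the search results page is off limits to obliging crawlers while the collection page stays open. The meta tag route would instead place something like &lt;meta name="robots" content="noindex"&gt; in the head of each search results page.</p>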
<p><strong>Final Thoughts</strong></p>
<p>There are plenty of other things that UMBC could do to support this new website. They could create an XML sitemap of all their pages and submit it to Google (maybe they already have). They might re-title some of their pages based on using a tool like <a title="Google Insight into Search" href="http://www.google.com/insights/search/#">Google Insight</a> to see which variations of a phrase are searched on most frequently. My goal here was to give you a taste of the sorts of things that catch my eye. Also, SEO is still more of an art than a science &#8211; so you will sometimes notice that what one SEO expert recommends is the opposite of what the next expert would tell you.</p>
<p>In many cases, changes such as the Unique Page Title at the Item Level mentioned above may not even be possible due to software or programmer resource limitations. The trick is to take advantage of every option that is available. There are also trade-offs to be made. UMBC&#8217;s site provides some very slick interfaces for viewing the details of a group of documents, such as <a title="Theatre Department Production Materials Archive" href="http://contentdm.ad.umbc.edu/cdm4/browse.php?CISOROOT=/theatreprod">theater programs and other materials related to a theatrical production</a>. The implementation elegantly handles the situation of multiple scanned images which relate to a coherent set of documents. Sometimes you can&#8217;t have both your innovative UI and perfect SEO. Then it gets down to what your goals are for your website. Are you trying to make a specific community of existing users happy by providing them with tools they can use? Or does your mission focus more on reaching out to a broader audience?</p>
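<p>For anyone curious what an XML sitemap amounts to, here is a minimal Python sketch that builds one from a list of page URLs using the sitemaps.org namespace. The function name is made up for this example, and the two URLs are simply pages linked elsewhere in this post rather than a real inventory of the site.</p>

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) with one url/loc entry per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Two page URLs mentioned in this post; a real sitemap would list every page.
pages = [
    "http://contentdm.ad.umbc.edu/ardsley.php",
    "http://contentdm.ad.umbc.edu/u?/hinecoll,3957",
]
sitemap_xml = build_sitemap(pages)
```

<p>The resulting string would be saved as a file on the site and then submitted to the search engine. Many content management systems can generate this file for you, so hand-rolling it is mostly useful for understanding what is inside.</p>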
<p>There is no silver bullet to search engine optimization. It just takes knowledge of the available tools and techniques combined with a willingness to keep learning and experimenting. Like the &#8216;<a title="Doe-het-zelf vrouw /Do-it-yourself-woman" href="http://www.flickr.com/photos/nationaalarchief/3333357969/">Do-It-Yourself-Woman</a>&#8217; pictured above in the <a title="Flickr Commons: Nationaal Archief" href="http://www.flickr.com/people/nationaalarchief/">Nationaal Archief</a>&#8217;s photo I found on the Flickr Commons, you too can learn the basics and do-it-yourself. A great starting point is <a title="Google SEO Guide" href="http://www.google.com/webmasters/docs/search-engine-optimization-starter-guide.pdf">Google&#8217;s free SEO Guide</a>. Also, please remember that the best time to plan your SEO strategy is before you have built your site in the first place!</p>
<p>I would love to do research on how much progress archives websites can make in their organic search traffic after SEO improvements. My thinking is to take a snapshot of a month of <a title="Wikipedia: Analytics" href="http://en.wikipedia.org/wiki/Analytics">analytics</a> (the statistics that tell you how many people are visiting your website) and then apply some SEO inspired changes. After a suitable delay (it takes some time for SEO to do its job) we would take another month of analytics and compare the two to measure any change in organic traffic.</p>
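<p>The comparison itself is simple arithmetic. Here is a minimal Python sketch of it; the visit counts are invented for illustration and do not come from any real website.</p>

```python
def percent_change(before, after):
    """Percentage change in visits from the baseline month to the later month."""
    if before == 0:
        raise ValueError("baseline month has no visits to compare against")
    return (after - before) / before * 100


# Hypothetical organic-search visit counts, invented for illustration.
baseline_visits = 1200
later_visits = 1500
change = percent_change(baseline_visits, later_visits)
```

<p>With these made-up numbers the later month shows a 25 percent increase. In practice you would also want to account for seasonal swings, since a single pair of months cannot separate the effect of SEO changes from ordinary variation.</p>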
<p>Do you want me to do a quick review of your archives website to see if there is room for SEO improvement? Please <a title="Contact Jeanne" href="http://www.spellboundblog.com/contact/">contact me</a> or add a comment to this post. I feel like there is a conference presentation in all this if we can find a good set of websites to optimize.</p>
<p>Finally, thank you to unsuspecting UMBC &#8211; your new website really is beautiful.</p>
<p><em>Image credit: <a title="Doe-het-zelf vrouw /Do-it-yourself-woman" href="http://www.flickr.com/photos/nationaalarchief/3333357969/">Doe-het-zelf vrouw /Do-it-yourself-woman</a> from Nationaal Archief on Flickr Commons.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2009/09/12/seo-evaluation-archival-websites-umbc/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>A History of Our Own, Representing Communities and Identities on the Web (SAA09: Session 202)</title>
		<link>http://www.spellboundblog.com/2009/09/08/representing-communities-and-identities-on-the-web-saa09-session-202/</link>
		<comments>http://www.spellboundblog.com/2009/09/08/representing-communities-and-identities-on-the-web-saa09-session-202/#comments</comments>
		<pubDate>Tue, 08 Sep 2009 06:21:27 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[archival community]]></category>
		<category><![CDATA[diversity]]></category>
		<category><![CDATA[SAA2009]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[virtual collaboration]]></category>
		<category><![CDATA[web 2.0]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/?p=683</guid>
		<description><![CDATA[Andrew Flinn, University College London (UCL), was the second speaker during SAA09&#8217;s Session 202 with his presentation &#8216;A History of Our Own, Representing Communities and Identities on the Web&#8217;. Flinn began with the idea that archives are &#8220;a place for creating and re-working memory&#8221;. While independent community archives are constituted around many purposes, Flinn&#8217;s main [...]
]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.flickr.com/photos/library_of_congress/2178249475/"><img class="alignright size-full wp-image-696" title="LOC Flickr Commons: Sylvia Sweets Tea Room" src="http://www.spellboundblog.com/wp-content/uploads/2009/09/sylvia-sweets-tea-room.jpg" alt="LOC Flickr Commons: Sylvia Sweets Tea Room" width="367" height="256" /></a><a title="Andrew Flinn" href="http://www.ucl.ac.uk/infostudies/andrew-flinn/">Andrew Flinn</a>, <a title="University College London" href="http://www.ucl.ac.uk/">University College London</a> (UCL), was the second speaker during <a title="SAA09 Session 202" href="http://saa.archivists.org/Scripts/4Disapi.dll/4DCGI/events/eventdetail.html?Action=Events_Detail&amp;Time=2192824&amp;SessionID=5763479740t67v3mg40224c6jc6w174s2g25g1687899940v3qm48167945yiyde&amp;InvID_W=1057">SAA09&#8217;s Session 202</a> with his presentation &#8216;A History of Our Own, Representing Communities and Identities on the Web&#8217;. Flinn began with the idea that archives are &#8220;a place for creating and re-working memory&#8221;. While independent community archives are constituted around many purposes, Flinn&#8217;s main interest is in communities focused on absences and mis-representation of a group or event in history. These are communities in which there is cultural, political, or artistic activism. Some of these communities may be considered &#8216;movements&#8217;.</p>
<p><strong>How should/can archivists support local archiving activities?</strong></p>
<p>Part of the challenge of online communities is the need to capture the interactions in order not to lose the full picture. The <a title="UK National Listing of Community Archives" href="http://www.communityarchives.org.uk/">National Listing of Community Archives in the UK</a>&#8217;s website states that they &#8220;seek to document the history of all manner of local, occupations, ethnic, faith and other diverse communities&#8221;.</p>
<p>The UCL&#8217;s <a title="ICARUS" href="http://www.ucl.ac.uk/infostudies/research/icarus/">International Centre for Archives and Records Management Research and User Studies</a> (ICARUS) &#8220;brings together researchers in user access and description, community archives and identity, concepts and contexts of records and archives, and information policy&#8221;. Flinn is the Principal Investigator on the ICARUS project <a title="Community archives and identities: documenting and sustaining community heritage" href="http://www.ucl.ac.uk/infostudies/research/icarus/community-archives/">Community archives and identities</a> which focuses on in depth interviews of 4 institutions which are &#8220;documenting and sustaining community heritage&#8221;.</p>
<p>These are some example online community sites:</p>
<ul>
<li><a title="Rukus" href="http://www.rukus.co.uk/content/view/12/27/">rukus</a> &#8211; black LGBT archives</li>
<li><a title="Moroccan Memories in Britain" href="http://www.moroccanmemories.org.uk/">Moroccan Memories in Britain</a></li>
<li><a title="Eastside Community Heritage" href="http://www.hidden-histories.org.uk/">Eastside Community Heritage</a> &#8211; east side working class community in London</li>
</ul>
<p><strong>Main Findings</strong></p>
<ul>
<li>proceed from a position that &#8216;knowing your own history&#8217; is beneficial to their communities as well as to the public at large</li>
<li>the quality of the work depends on individual passion and sacrifice; the work is voluntary</li>
<li>there is ambivalence to/about the mainstream archives sector &#8212; keen to work with mainstream archives, but scarred by past bad experiences</li>
<li>good practices now could lead to partnerships in the future</li>
<li>these are living archives &#8212; not static, but still alive and growing</li>
<li>these ideas prompt re-evaluation of conventional archives thinking</li>
<li>lots of access to digital objects &#8211; perhaps movement to online existence</li>
</ul>
<p>We need to understand that these communities evolve and are fluid. They have a broad variety of structures, sizes and methods of working. What are the patterns in participation &amp; ownership?</p>
<p>The site <a title="Urban 75" href="http://www.urban75.com/">urban 75</a> has hosted extended discussions about recent UK history. Efforts include identification of places and people in uploaded photos. The site connects people around issues of housing and local services &#8211; it is very practical but it also has evolved to include this historical documentation. One example post from the Brixton Forum shows a <a title="urban75: Old shop front revealed on Atlantic Road " href="http://www.urban75.net/vbulletin/showthread.php?t=300449">discussion about an Old shop front revealed on Atlantic Road</a>.</p>
<p><strong>A Short Aside</strong></p>
<p>Next Flinn apologized for taking his talk slightly off script. Setting his papers aside, he spoke to the audience about the <a title="eXHulme" href="http://www.exhulme.co.uk/">eXHulme</a> website which he had discovered the evening before while finishing his presentation. Having lived in Hulme, Manchester himself, he felt a great impact from looking through the site. He spent 4 hours looking at it &#8211; including photos such as the <a title="travellers living in their buses parked - otteburn close 1996" href="http://i34.tinypic.com/2z8u9t2.jpg">travellers living in their buses parked &#8211; otteburn close 1996</a> seen at the bottom of <a title="eXHulme Page" href="http://www.exhulme.co.uk/page2.php">this page</a>. His discovery and exploration of this site gave him a greater personal understanding of the impact of these types of community documentation projects. I felt he would have been happy to keep talking about this site and the directions it had sent his thoughts &#8212; but he then got back to his papers and continued.</p>
<p><strong>Building Community Online</strong></p>
<p>Interactions online are the historic record of the community itself. Archives evolve and change as the community builds and edits their online content. These heritage and archive sites work to shift from the idea of visitors to engaging users in interaction &#8212; they need users of the website to feel part of the community.</p>
<p>Examples of sites building community online:</p>
<ul>
<li><a title="My Brighton and Hove" href="http://www.mybrightonandhove.org.uk/index.aspx">My Brighton and Hove</a> &#8211; community history site</li>
<li><a title="Remembering Olive Collective" href="http://rememberolivemorris.wordpress.com/">Remembering Olive Collective</a> &#8211; &#8220;social production of collective knowledge&#8221;</li>
<li><a title="The Newham Story" href="http://www.newhamstory.com/">The Newham Story</a> &#8212; uses social tagging</li>
</ul>
<p>How do you successfully encourage participation (rather than a large number of passive observers), which is crucial to the success of these types of initiatives? Lurking without contributing is easy &#8211; even if joining requires action. The rate of uptake may correspond with the sense of ownership. Heritage projects might encourage and sustain such participation. See Elisa Giaccardi &amp; Leysia Palen&#8217;s article &#8211; <a title="The Social Production of Heritage through Cross-media Interaction: Making Place for Place-making " href="http://x.i-dat.org/~eg/research/pdf/GiaccardiPalen_IJHS08.pdf">The Social Production of Heritage through Cross-media Interaction: Making Place for Place-making</a>.</p>
<p><strong>Suggestions</strong></p>
<ul>
<li>encourage conversation and treat all stories as having value &#8211; value every account</li>
<li>promote a sense of ownership once a story has been shared</li>
<li>allow for multiple ways to engage with and share content and memories</li>
<li>recognize and let users shift from observer to active member</li>
</ul>
<p><strong>Flinn&#8217;s Conclusions</strong></p>
<ul>
<li>What are the challenges and perils facing community archives? Lack of resources. People are doing these things in unsustainable ways.</li>
<li>Why should we sustain independent community archives? Benefit to individuals, communities and broader society.</li>
<li>What can professional archivists do? Support and partnership with groups seeking this sort of partnership.</li>
</ul>
<p><strong>My Thoughts</strong></p>
<p>The image I included above is from the Library of Congress&#8217;s Flickr Commons project. If you <a title="Flickr Commons: Sylvia Sweets Tea Room" href="http://www.flickr.com/photos/library_of_congress/2178249475/">read through the comments on this photo</a> you can see a diverse group of individuals come together to document the history of Sylvia Sweets Tea Room. This is just another example of the process of documentation being as interesting as the original image itself.</p>
<p>There is still so much to learn in the arena of building productive online communities. Archivists working through how to archive what online communities create will need to understand how the process of creation is documented via various software tools. As the techniques for encouraging participation evolve &#8211; archivists will need to evolve right along with them. I think it is interesting to envision archivists working in this space and supporting these types of communities &#8212; becoming as much the champions of the community itself as preservers of a community&#8217;s collaborative creations.</p>
<p><em>Image Credit:</em> <a title="Flickr Commons Library of Congress: Sylvia Sweets Tea Room, corner of School and Main streets, Brockton, Mass" href="http://www.flickr.com/photos/library_of_congress/2178249475/">Flickr Commons Library of Congress: Sylvia Sweets Tea Room, corner of School and Main streets, Brockton, Mass</a></p>
<p><em>As is the case with all my session summaries from <a title="SAA2009 Posts" href="../category/saa2009/">SAA2009</a>, please accept my apologies in advance for any cases in which I misquote, oversimplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via <a title="Contact Jeanne" href="../contact/">my contact form</a>.</em></p>
<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2009/09/08/representing-communities-and-identities-on-the-web-saa09-session-202/">A History of Our Own, Representing Communities and Identities on the Web (SAA09: Session 202)</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2009/09/08/representing-communities-and-identities-on-the-web-saa09-session-202/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Archivists and New Technology: When Do The Records Matter?</title>
		<link>http://www.spellboundblog.com/2009/06/06/archivists-and-new-technology-when-do-the-records-matter/</link>
		<comments>http://www.spellboundblog.com/2009/06/06/archivists-and-new-technology-when-do-the-records-matter/#comments</comments>
		<pubDate>Sat, 06 Jun 2009 23:52:28 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[appraisal]]></category>
		<category><![CDATA[at risk records]]></category>
		<category><![CDATA[born digital records]]></category>
		<category><![CDATA[electronic records]]></category>
		<category><![CDATA[learning technology]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[what if]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/?p=216</guid>
		<description><![CDATA[Navigating the rapidly changing landscape of new technology is a major challenge for archivists. As quickly as new technologies come to market, people adopt them and use them to generate records. Businesses, non-profits and academic institutions constantly strive to find ways to be more efficient and to cut their budgets. New technology often offers the [...]<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2009/06/06/archivists-and-new-technology-when-do-the-records-matter/">Archivists and New Technology: When Do The Records Matter?</a></p>
]]></description>
			<content:encoded><![CDATA[<p>Navigating the rapidly changing landscape of new technology is a major challenge for archivists. As quickly as new technologies come to market, people adopt them and use them to generate records. Businesses, non-profits and academic institutions constantly strive to find ways to be more efficient and to cut their budgets. New technology often offers the promise of cost reductions. In this age of constantly evolving software and technological innovation, how do archivists know when a new technology is important or established enough to take note of? When do the records generated by the latest and greatest technology matter enough to save?</p>
<p>Below I have included two diagrams that seek to illustrate the process of adopting new technology. I think they are both useful in aiding our thinking on this topic.</p>
<p>The first is the &#8220;<a title="Hype Cycle" href="http://www.gartner.com/pages/story.php.id.8795.s.8.jsp">Hype Cycle</a>&#8221;, as <a title="WordSpy: Hype Cycle" href="http://www.wordspy.com/words/hypecycle.asp">proposed by analyst Jackie Fenn at Gartner Group</a>. It breaks down the phases that new technologies move through as they progress from their initial concept through to broad acceptance in the marketplace. The generic version of the Hype Cycle diagram below is from the <a title="Wikipedia: Hype Cycle" href="http://en.wikipedia.org/wiki/Hype_cycle">Wikipedia entry on hype cycle</a>.</p>
<p><a title="Gartner Hype Cycle (Jeremy Kemp via Wikipedia)" href="http://en.wikipedia.org/wiki/File:Gartner_Hype_Cycle.svg"><img class="aligncenter size-full wp-image-557" title="Gartner Hype Cycle (Wikipedia)" src="http://www.spellboundblog.com/wp-content/uploads/2009/06/559px-gartner_hype_cyclesvg.png" alt="Gartner Hype Cycle (Wikipedia)" width="484" height="314" /></a></p>
<p>Each summer, Gartner comes out with a new update on <a title="Tech Crunch: Where Are We In The Hype Cycle?" href="http://www.techcrunch.com/2008/08/18/where-are-we-in-the-hype-cycle/">Where Are We In The Hype Cycle?</a>. Last summer, microblogging was just entering the &#8216;Peak of Inflated Expectations&#8217;, public virtual worlds were sliding down into the &#8216;Trough of Disillusionment&#8217; and location aware applications were climbing back up the &#8216;Slope of Enlightenment&#8217;. There is even a book about it: <a href="http://www.amazon.com/gp/product/1422121100?ie=UTF8&amp;tag=spellboundblog-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=1422121100">Mastering the Hype Cycle: How to Choose the Right Innovation at the Right Time</a><img style="border:none !important; margin:0px !important;" src="http://www.assoc-amazon.com/e/ir?t=spellboundblog-20&amp;l=as2&amp;o=1&amp;a=1422121100" border="0" alt="" width="1" height="1" />.</p>
<p>The other diagram is the Technology Adoption Lifecycle from Geoffrey Moore&#8217;s <a href="http://www.amazon.com/gp/product/0060517123/?ie=UTF8&amp;tag=spellboundblog-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=1422121100">Crossing the Chasm</a>. This view of the technology cycle comes from the perspective of bringing new technology to market. How do you cross the chasm between early adopters and the general population?</p>
<p><a title="Technology Adoption Lifecycle (Craig Chelius via Wikipedia)" href="http://en.wikipedia.org/wiki/File:Technology-Adoption-Lifecycle.png"><img class="aligncenter size-full wp-image-554" title="Technology Adoption Lifecycle (Wikipedia)" src="http://www.spellboundblog.com/wp-content/uploads/2009/06/800px-technology-adoption-lifecycle.png" alt="Technology Adoption Lifecycle (Wikipedia)" width="515" height="205" /></a></p>
<p>Archivists need to consider new technology from two different perspectives: when to use it to further their own goals as archivists, and when to address the need to preserve records being generated by new technology. A fair bit of attention has been focused on figuring out how to get archivists up to speed on new web technology. In August 2008, ArchivesNext posted about <a title="ArchivesNext: Searching for 2.0-related sessions at the SAA Annual Meeting" href="http://www.archivesnext.com/?p=183">hunting for Web 2.0 related sessions</a> at SAA2008 and <a href="http://friendstoldme.blogspot.com/">Friends Told Me I Needed A Blog</a> posted about <a href="http://friendstoldme.blogspot.com/2008/08/hype-cycles-and-saa.html">SAA and the Hype Cycle</a> shortly thereafter.</p>
<p>But how do we know when a technology is &#8216;important enough&#8217; to start worrying about the records it generates? Do we focus our energy on technology that has crossed the chasm and been adopted by the &#8216;early majority&#8217;? Do we watch for signs of adoption by our target record creators?</p>
<p>I expect that the answer (such as there can be one answer!) will be community specific. As I learned in the <a title="SAA2007: Preserving Born Digital Records of the Design Community" href="http://www.spellboundblog.com/2007/09/08/saa2007-preserving-born-digital-records-of-the-design-community-session-106/">2007 SAA session about preserving digital records of the design community</a>, waiting for a single clear technology or software leader to appear can lead to lost or inaccessible records. Archivists working with similar records already come together to support one another through <a title="SAA Round Tables" href="http://saa.archivists.org/Scripts/4Disapi.dll/4DCGI/committees/Roundtables.html?Action=List_Committees&amp;CommWGStatus=Roundtables">round tables</a>, mailing lists and conference sessions. I have noticed that the presentations I find most interesting are often those that discuss the challenges a specific user community faces in preserving its digital records. The <a title="SAA2008: Preservation and Experimentation with Analog/Digital Hybrid Literary Collections" href="http://www.spellboundblog.com/2008/09/06/saa2008-preservation-and-experimentation-with-analogdigital-hybrid-literary-collections-session-203/">2008 SAA session about hybrid analog/digital literary collections</a> discussed issues related to digital records from authors. Those who worry about records captured in geographic information systems (GIS) were <a title="The Edges of the GIS Electronic Record" href="http://www.spellboundblog.com/2007/01/02/the-edges-of-the-gis-electronic-record/">trying to sort out how to define a single GIS electronic record</a> when last I dipped my toes into their corner of the world in the fall of 2006.</p>
<p>It is not feasible to imagine archivists staying ahead of every new type of technology and attempting to design a method for archiving every possible type of digital record being created. What we can do is make it a priority for a designated archivist within every &#8216;vertical&#8217; community (government, literary, architecture, etc.) to keep their ear to the ground about the use of technology within that community. This could be a community of practice of its own: a group that shares info about the latest trends they are seeing while sharing their best practices for handling the latest types of records being seen.</p>
<p>The good news is that archivists aren&#8217;t the only ones who want to be able to preserve access to born digital records. Consider <a title="Twitter" href="http://www.twitter.com">Twitter</a>, which only provides easy access to recent <a title="About.com: What is a tweet?" href="http://webtrends.about.com/od/glossary/g/what-is-a-tweet.htm">tweets</a>. A whole raft of <a title="MakeUseOf.com: How To Backup Your Twitter Archive" href="http://www.makeuseof.com/tag/how-to-backup-your-twitter-archive/">third-party tools built to archive data from Twitter</a> are already out there, answering the demand for a way to back up people&#8217;s tweets.</p>
<p>I don&#8217;t think archivists always have the luxury of waiting for technology to be adopted by the majority of people and to reach the &#8216;Plateau of Productivity&#8217;. If you are an archivist who works with a community that uses cutting edge technology, you owe it to your community to stay in the loop with how they do their work now. Just because most people don&#8217;t use a specific technology doesn&#8217;t mean that an individual community won&#8217;t pick it up and use it to the exclusion of more common tools.</p>
<p>The design community mentioned above spoke of working with those creating the tools for their community to ensure easy archiving down the line. In our fast-paced world of innovation, a subset of archivists needs to stay involved with the current business practices of each vertical being archived. This group can work together to identify challenges, brainstorm solutions, build relationships with the technology communities and then disseminate best practices throughout the archives community. I did find a web page for the SAA&#8217;s <a title="Technology Best Practices Task Force" href="http://www.archivists.org/saagroups/bptf/tech_best_practices_tf.asp">Technology Best Practices Task Force</a> and its document <a title="Managing Electronic Records and Assets: A Working Bibliography" href="http://www.archivists.org/saagroups/bptf/index.asp">Managing Electronic Records and Assets: A Working Bibliography</a>, but I think that I am imagining something more ongoing, more nimble and more tied into each of the major communities that archivists must support. Am I describing something that already exists?</p>
<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2009/06/06/archivists-and-new-technology-when-do-the-records-matter/">Archivists and New Technology: When Do The Records Matter?</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2009/06/06/archivists-and-new-technology-when-do-the-records-matter/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NARA Outgrows ARC: Researching New Catalog Software Options</title>
		<link>http://www.spellboundblog.com/2009/03/19/nara-outgrows-arc-researching-catalog-software/</link>
		<comments>http://www.spellboundblog.com/2009/03/19/nara-outgrows-arc-researching-catalog-software/#comments</comments>
		<pubDate>Thu, 19 Mar 2009 04:00:51 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[digitization]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/?p=388</guid>
		<description><![CDATA[The Archival Research Catalog (ARC) of the US National Archives and Records Administration (NARA) needs to be replaced. NARA has put out an official Request for Information (RFI) and plans a &#8220;Vendor Day&#8221; for April 6th with final responses required by April 24th, 2009. This is exciting for two very different reasons: New catalog software! [...]<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2009/03/19/nara-outgrows-arc-researching-catalog-software/">NARA Outgrows ARC: Researching New Catalog Software Options</a></p>
]]></description>
			<content:encoded><![CDATA[<p><a title="NARA: ARC" href="http://www.archives.gov/research/arc/"><img class="alignright size-full wp-image-389" title="ARC Logo" src="http://www.spellboundblog.com/wp-content/uploads/2009/03/search-box-logo.gif" alt="ARC Logo" width="165" height="64" /></a>The <a title="NARA: Archival Research Catalog" href="http://www.archives.gov/research/arc/">Archival Research Catalog</a> (ARC) of the US <a title="NARA" href="http://www.archives.gov">National Archives and Records Administration</a> (NARA) <a title="Federal Computer Week: NARA wants new catalog" href="http://fcw.com/articles/2009/03/16/nara-rfi.aspx">needs to be replaced</a>. NARA has put out an official <a title="NARA: RFI about ARC" href="https://www.fbo.gov/index?s=opportunity&amp;mode=form&amp;id=021aa8560a3ad1e0b8080d387afca195&amp;tab=core&amp;_cview=0&amp;cck=1&amp;au=&amp;ck=">Request for Information (RFI)</a> and plans a &#8220;Vendor Day&#8221; for April 6th with final responses required by April 24th, 2009.</p>
<p>This is exciting for two very different reasons:</p>
<ol>
<li>New catalog software!</li>
<li>Getting to read all the gory details about ARC!</li>
</ol>
<p>If this makes you curious, then go give the <a title="RFI" href="https://www.fbo.gov/index?s=opportunity&amp;mode=form&amp;id=021aa8560a3ad1e0b8080d387afca195&amp;tab=core&amp;_cview=0&amp;cck=1&amp;au=&amp;ck=">RFI</a> a read, but here are some juicy ARC tidbits to consider:</p>
<ul>
<li><a title="ARC Logical Data Model" href="https://www.fbo.gov/utils/view?id=0740aaca1594765177ef67aa26636d07">ARC&#8217;s Logical Data Model</a> &#8211; 20 pages worth of data model that I am sorely tempted to print out, tape together and hang on a very large wall</li>
<li>ARC was built as a customization of <a title="OCLC: OLIB" href="http://www.oclc.org/uk/en/olib/default.htm">OLIB</a> back in 2003 and has been upgraded along the way</li>
<li>ARC currently contains 2,478,259 archival descriptions and 8,810,938 authority records</li>
<li>An average of 25,000 archival descriptions are added to ARC each week</li>
</ul>
<p>The RFI states: &#8220;NARA has outgrown the existing ARC system and requires a more robust solution that’s capable of scaling to support at least 250 million archival descriptions and links to upwards of 500 million digital copies over the next 4-7 years.&#8221; Why so many records? Because all of NARA&#8217;s partners are digitizing records so quickly that they are creating a massive backlog of documents and the future only holds more of the same.</p>
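<p>To get a feel for the jump in scale, here is a rough back-of-the-envelope sketch using only the figures quoted above. It assumes, purely for illustration, that the 250 million figure is a volume to be reached within the 4-7 year window and that a year has 52 weeks; the variable names are made up for this example.</p>

```python
CURRENT_DESCRIPTIONS = 2_478_259    # archival descriptions in ARC today
WEEKLY_ADDITIONS = 25_000           # average descriptions added per week
TARGET_DESCRIPTIONS = 250_000_000   # scale the RFI asks the new system to support
WEEKS_PER_YEAR = 52                 # assumption for this estimate

remaining = TARGET_DESCRIPTIONS - CURRENT_DESCRIPTIONS

# Years needed to reach the target if the weekly rate never changed.
years_at_current_rate = remaining / WEEKLY_ADDITIONS / WEEKS_PER_YEAR

# Weekly rate needed to reach the target in 7 years (the long end of the window).
required_weekly_rate = remaining / (7 * WEEKS_PER_YEAR)

print(round(years_at_current_rate), round(required_weekly_rate))
```

<p>Under those assumptions, the current pace would take roughly 190 years to reach 250 million descriptions, while a 7 year horizon implies about 680,000 new descriptions a week, around 27 times the current average.</p>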
<p>This RFI is only for planning purposes, but I will definitely be following this story as it unfolds.</p>
<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2009/03/19/nara-outgrows-arc-researching-catalog-software/">NARA Outgrows ARC: Researching New Catalog Software Options</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2009/03/19/nara-outgrows-arc-researching-catalog-software/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>ArchivesZ Data Challenges: Oregon State University Archives</title>
		<link>http://www.spellboundblog.com/2009/02/22/archivesz-data-challenges-oregon-state-university/</link>
		<comments>http://www.spellboundblog.com/2009/02/22/archivesz-data-challenges-oregon-state-university/#comments</comments>
		<pubDate>Sun, 22 Feb 2009 07:48:45 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[ArchivesZ]]></category>
		<category><![CDATA[EAD]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/?p=344</guid>
		<description><![CDATA[The Oregon State University Archives has generously contributed 356 of their finding aids in EAD format for use in the development of version 2 of ArchivesZ. This is my first post in what will likely be a series of looks behind the scenes at the challenges facing a project like ArchivesZ on the data [...]<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2009/02/22/archivesz-data-challenges-oregon-state-university/">ArchivesZ Data Challenges: Oregon State University Archives</a></p>
]]></description>
			<content:encoded><![CDATA[<p><a href="http://osulibrary.oregonstate.edu/archives/"><img class="alignright size-full wp-image-346" title="OSU Archives" src="http://www.spellboundblog.com/wp-content/uploads/2009/02/osu_archives_home1.jpg" alt="OSU Archives" width="233" height="179" /></a>The <a title="Oregon State University Archives" href="http://osulibrary.oregonstate.edu/archives/archive/">Oregon State University Archives</a> has generously contributed 356 of their finding aids in EAD format for use in the development of version 2 of <a title="ArchivesZ" href="http://www.archivesz.com/">ArchivesZ</a>. This is my first post in what will likely be a series of looks behind the scenes at the challenges facing a project like ArchivesZ on the data level.</p>
<p>Version one of ArchivesZ only used finding aids from the University of Maryland and the Library of Congress. This was definitely a case of the path of least resistance. I attend the University of Maryland and the Library of Congress has a very convenient <a title="Library of Congress Finding Aid Source" href="http://lcweb2.loc.gov/faid/source.html">page providing links to all their Finding Aids source XML files</a>. A key aspect of creating version 2 of ArchivesZ is making sure that the scripts that pull data from EAD XML files are robust enough to handle the encoding practices of a very diverse range of institutions.</p>
<p>Please keep in mind that OSU is likely to bear the brunt of many basic data issues that I would have unearthed with whatever data sets I tried first!</p>
<p>There are 3 crucial data elements on which the visualizations of ArchivesZ depend: subject, inclusive dates, and collection size. Each element presents unique challenges. The script parsing issues I am uncovering with the OSU finding aids are currently worst for collection size. In order to make pretty charts which let people compare the quantity of materials in each collection (or record group &#8211; please forgive that I use the term &#8216;collection&#8217; to mean any set of records for which a finding aid has been created), we need to be able to assign a single number to represent the size of each collection. Based on the values used in the LOC and UMD finding aids, we chose to go with linear feet as our standard unit of measurement. So the trick is to translate whatever archivists choose to put into the &lt;physdesc&gt; element of their finding aid into some number of linear feet.</p>
<p>These are the size conversion rules we implemented for version 1 of ArchivesZ:</p>
<ul>
<li> 1 microfilm reel = 1 linear foot</li>
<li> Collections represented only by a number of items will be represented as .25 linear feet</li>
<li> If size only specified in number of boxes, then 1 box = .5 linear feet</li>
<li> When the size is given in several different types of units, they are prioritized in the following order: linear feet &gt; boxes &gt; microfilm reels &gt; items</li>
</ul>
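<p>The four rules above can be sketched roughly as follows. This is a minimal illustration in Python, not the version 1 script itself, so its results may differ from what that script reports. The function name, the regular expressions and the choice to return None for unrecognized values are all assumptions made for this example.</p>

```python
import re

# Conversion factors to linear feet, taken from the version 1 rules.
FEET_PER_UNIT = {
    "linear feet": 1.0,
    "boxes": 0.5,
    "microfilm reels": 1.0,
}
ITEMS_ONLY_SIZE = 0.25  # flat size for collections described only by item count

# Units in priority order: the first unit found in the description wins.
PRIORITY = ["linear feet", "boxes", "microfilm reels", "items"]

# Hypothetical patterns; real finding aids will need many more variants.
PATTERNS = {
    "linear feet": r"(\d*\.?\d+)\s*linear\s+f(?:oo|ee)t",
    "boxes": r"(\d*\.?\d+)\s*box(?:es)?",
    "microfilm reels": r"(\d*\.?\d+)\s*microfilm\s+reels?",
    "items": r"(\d*\.?\d+)\s*items?",
}


def size_in_linear_feet(description):
    """Return a size in linear feet, or None if no known unit is found."""
    text = description.lower()
    for unit in PRIORITY:
        match = re.search(PATTERNS[unit], text)
        if match is None:
            continue
        if unit == "items":
            # Item counts are not scaled; they map to one flat size.
            return ITEMS_ONLY_SIZE
        return float(match.group(1)) * FEET_PER_UNIT[unit]
    return None


print(size_in_linear_feet("10 boxes (4 linear feet)"))
```

<p>Because linear feet outrank boxes, the call above prints 4.0 rather than 5.0. Returning None for anything unrecognized, instead of guessing, makes it easy to list the descriptions that still need a rule.</p>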
<p>This works reasonably well when the physical description values are simple &#8211; it starts to fall apart when what is entered is more complicated. Here are some examples of the physical descriptions in the OSU finding aids:</p>
<p><a title="OSU Archives: Guide to the Phi Kappa Phi-OSU Chapter Records" href="http://nwda-db.wsulibs.wsu.edu/findaid/ark:/80444/xv95428">Guide to the Phi Kappa Phi-OSU Chapter Records</a>: The display in the &#8216;pretty&#8217; version of the finding aid online shows this: 5.5 cubic feet (9 boxes, including 2 oversize boxes) (3 microfilm reels)</p>
<p>The version in the XML file is this:</p>
<pre>&lt;physdesc&gt;
  &lt;extent&gt;5.5 cubic feet&lt;/extent&gt;
  &lt;extent&gt;9 boxes, including 2 oversize boxes&lt;/extent&gt;
  &lt;extent&gt;3 microfilm reels&lt;/extent&gt;
&lt;/physdesc&gt;</pre>
<p>With the current algorithm, this finding aid would be marked as being 3 linear feet in size. At a bare minimum, I must add &#8216;cubic feet&#8217; as another unit to be converted. More difficult to discern is whether I should have a value of 5.5 linear feet (assuming 1 cubic foot = 1 linear foot for the purposes of these comparisons) or a value of 8.5 linear feet (5.5 + 3 linear feet for the 3 microfilm reels). There is never going to be a perfect answer here, but clearly my logic needs to be more sophisticated than it is now.</p>
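<p>The two candidate values can be worked through with a short sketch. It takes the three &lt;extent&gt; values from the XML above as plain strings and assumes, as in the paragraph above, that 1 cubic foot and 1 microfilm reel each count as 1 linear foot. The helper name and patterns are hypothetical.</p>

```python
import re

# The three extent values from the Phi Kappa Phi finding aid.
EXTENTS = [
    "5.5 cubic feet",
    "9 boxes, including 2 oversize boxes",
    "3 microfilm reels",
]


def leading_quantity(extent, unit_pattern):
    """Return the number that opens an extent if the given unit follows it, else 0.0."""
    match = re.match(r"\s*(\d*\.?\d+)\s+" + unit_pattern, extent.lower())
    return float(match.group(1)) if match else 0.0


# Assumption for comparison only: 1 cubic foot = 1 linear foot.
cubic_feet = sum(leading_quantity(e, r"cubic\s+f(?:oo|ee)t") for e in EXTENTS)
# Version 1 rule: 1 microfilm reel = 1 linear foot.
reels = sum(leading_quantity(e, r"microfilm\s+reels?") for e in EXTENTS)

option_a = cubic_feet          # count the cubic feet only
option_b = cubic_feet + reels  # add the microfilm reels on top

print(option_a, option_b)
```

<p>This prints 5.5 and 8.5. The box count is left out of both options on the assumption that the boxes are the containers already measured by the cubic feet figure.</p>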
<p><a title="Harvey L. McAlister Collection" href="http://osulibrary.oregonstate.edu/archives/archive/mss/documents/OREmcalister.pdf">Harvey L. McAlister Collection</a>: The display in the pretty version of this finding aid online is this: 1 cubic foot, including 26 photographs (4 boxes, including 2 oversize boxes, and 1 map folder)</p>
<p>The version in the XML file is this:</p>
<pre>&lt;physdesc&gt;
  &lt;extent encodinganalog="300$a"&gt;1 cubic foot, including 26 photographs&lt;/extent&gt;
  &lt;extent encodinganalog="300$a"&gt;4 boxes, including 2 oversize boxes, and 1 map folder&lt;/extent&gt;
&lt;/physdesc&gt;</pre>
<p>With the current algorithm, this finding aid would be marked as being 1 linear foot in size. From looking at these two examples, it would seem that this would be fine and in fact &#8211; for the purposes of calculating a comparable size &#8211; only looking at the first &lt;extent&gt; value might be the way to go &#8211; at least for OSU finding aids.</p>
<p>There are some other simpler issues relating to standardization in the way that certain values are entered. For example, after ingesting 173 finding aids from OSU (the number I got through before my script flat out choked on a size designation), I ended up with five different repositories added to my REPOSITORIES table. I had expected only one. Each of these was entered as a repository name &#8212; and I have included the length of each value to show how extra spaces are causing part of the problem:</p>
<ul>
<li>Oregon State University                Libraries &#8211; length 36</li>
<li>Oregon State University    &#8211; length 23</li>
<li>Oregon State UniversityLibraries    &#8211; length 32</li>
<li>Oregon State University             Libraries  &#8211; length 36</li>
<li>Oregon State University Libraries    &#8211; length 33</li>
</ul>
<p>Some of these I can handle by adding smarter trimming of trailing spaces &#8211; but in this case it is clear that typos and inconsistency are also a challenge. I checked, and each of these different &lt;corpname&gt; values within the &lt;repository&gt; element is used by at least 10 finding aids. Perhaps they have been inherited over time from a template?</p>
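<p>A minimal sketch of what smarter trimming might look like, in Python. The sample strings only approximate the five values listed above (the exact spacing in the source files is an assumption), and the function name is made up for this example.</p>

```python
def normalize_name(raw):
    """Collapse runs of whitespace (spaces, tabs, newlines) and trim both ends."""
    return " ".join(raw.split())


# Approximations of the five repository names; spacing is illustrative only.
names = [
    "Oregon State University                Libraries",
    "Oregon State University   ",
    "Oregon State UniversityLibraries",
    "Oregon State University\n\t   Libraries  ",
    "Oregon State University Libraries",
]

distinct = sorted(set(normalize_name(n) for n in names))
for name in distinct:
    print(repr(name), len(name))
```

<p>Whitespace cleanup alone brings the five values down to three. The truncated name and the one missing a space survive, which is why trimming cannot be the whole answer.</p>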
<p>I have considered creating a repository definition file that could be used when loading finding aids from one repository at a time. This would remove dependence on perfect replication of these sorts of values while still supplying the data needed to let people limit their searches by a named repository.</p>
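<p>One possible shape for such a definition file, sketched in Python with a small JSON document. The field names, the &#8216;osu&#8217; identifier and the loader function are hypothetical and only meant to show the idea: the repository name comes from the definition, and whatever the finding aid says in &lt;corpname&gt; is kept just for reference.</p>

```python
import json

# Hypothetical definition file contents; the field names are assumptions.
DEFINITION_JSON = """
{
  "repository_id": "osu",
  "canonical_name": "Oregon State University Libraries"
}
"""


def load_finding_aid(parsed_corpname, definition):
    """Build a record whose repository name comes from the definition file.

    The corpname value parsed from the EAD file is kept only for reference.
    """
    return {
        "repository_id": definition["repository_id"],
        "repository": definition["canonical_name"],
        "corpname_as_encoded": parsed_corpname,
    }


definition = json.loads(DEFINITION_JSON)
records = [
    load_finding_aid(name, definition)
    for name in ["Oregon State University", "Oregon State UniversityLibraries"]
]
repositories = {r["repository"] for r in records}
print(repositories)
```

<p>Both records end up under a single repository name, no matter how the name was typed in the individual finding aid.</p>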
<p>The last issue is the most minor. There are many newline (\n) and tab (\t) characters throughout the XML documents. These I plan to simply strip out as the script parses the XML file.</p>
<p>A big thank you to <a title="Elizabeth Nielsen" href="http://osulibrary.oregonstate.edu/staff/nielseel">Elizabeth Nielsen</a>, Senior Staff Archivist at OSU Archives. Her response to my query about OSU&#8217;s comfort with my taking apart their finding aids in public on my blog was &#8220;Bring it on – we’re tough!&#8221;.</p>
<p>It is fascinating to dig into new finding aids and see how the parsing script handles what it finds. I plan to test the existing script on XML from more sources to see all the things that must be fixed. Then I get to wrap my head around code that someone else wrote (another member of the original ArchivesZ team wrote the version 1 Ruby script). For those of you who are not programmers, you can skim through my <a title="Book Review of Dreaming in Code" href="http://www.spellboundblog.com/2007/05/24/book-review-dreaming-in-code-a-book-about-why-software-is-hard/">Book Review of Dreaming in Code</a> to get a handle on why this can be harder than it sounds like it should be.</p>
<p>Want to share your institution&#8217;s EAD finding aids in XML format with the ArchivesZ project? Please drop me a line via <a title="Contact Jeanne" href="http://www.spellboundblog.com/contact/">my contact form</a>.</p>
<p><em>Image Credit: OSU Archives image above from the <a title="OSU Archives" href="http://osulibrary.oregonstate.edu/archives/">OSU Archives Home Page</a>.</em></p>
<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2009/02/22/archivesz-data-challenges-oregon-state-university/">ArchivesZ Data Challenges: Oregon State University Archives</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2009/02/22/archivesz-data-challenges-oregon-state-university/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

