<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Spellbound Blog &#187; THATCamp2008</title>
	<atom:link href="http://www.spellboundblog.com/category/thatcamp2008/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.spellboundblog.com</link>
	<description>Archives, Digital Humanities, Cultural Heritage, Technology</description>
	<lastBuildDate>Mon, 06 Feb 2012 14:49:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>THATCamp 2008: Day 1 Dork Short Lightning Talks</title>
		<link>http://www.spellboundblog.com/2008/06/14/thatcamp-2008-day-1-dork-short-lightening-talks/</link>
		<comments>http://www.spellboundblog.com/2008/06/14/thatcamp-2008-day-1-dork-short-lightening-talks/#comments</comments>
		<pubDate>Sun, 15 Jun 2008 03:09:28 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[information visualization]]></category>
		<category><![CDATA[interface design]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[THATCamp2008]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/2008/06/14/thatcamp-2008-day-1-dork-short-lightening-talks/</guid>
		<description><![CDATA[During lunch on the first day of THATCamp, people volunteered to give lightning talks they called &#8216;Dork Shorts&#8217;. As we ate our lunch, a steady stream of folks paraded up to the podium and gave elevator-pitch-length demos. These are the projects about which I managed to type URLs and some other info [...]<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2008/06/14/thatcamp-2008-day-1-dork-short-lightening-talks/">THATCamp 2008: Day 1 Dork Short Lightning Talks</a></p>
]]></description>
			<content:encoded><![CDATA[<p><a href="http://flickr.com/photos/thenss/2443187542/" title="Lightning by thenss (Christopher Cacho) via flickr"><img src="http://www.spellboundblog.com/wp-content/uploads/2008/06/2443187542_af1a3fe851.jpg" alt="lightning" align="right" height="247" width="330" /></a>During lunch on the first day of THATCamp, people volunteered to give <a href="http://en.wikipedia.org/wiki/Lightning_Talk" title="Wikipedia: lightning talk">lightning talks</a> they called &#8216;Dork Shorts&#8217;. As we ate our lunch, a steady stream of folks paraded up to the podium and gave elevator-pitch-length demos. These are the projects whose URLs and details I managed to type into my laptop. If you are looking for examples of inspirational and innovative work at the intersection of technology and the humanities, these are a great place to start!</p>
<ul>
<li><a href="http://www.worlddigitallibrary.org/project/english/index.html" title="World Digital Library">World Digital Library</a> (<a href="http://www.loc.gov" title="Library of Congress">Library of Congress</a>)</li>
<li><a href="http://www.piclens.com/" title="PicLens">PicLens</a> + Firefox + any search results page from the <a href="http://digitalgallery.nypl.org/nypldigital/index.cfm" title="NYPL Digital Gallery">New York Public Library Digital Gallery</a> = a 3D view of ALL the photos at once. PicLens uses the RSS feed to retrieve the full set of images along with their captions, and it works with any RSS feed of images, such as those from <a href="http://flickr.com/" title="Flickr">Flickr</a> or <a href="http://smugmug.com/" title="Smugmug">SmugMug</a>.</li>
<li><a href="http://historywired.si.edu/" title="HistoryWired">HistoryWired</a> (<a href="http://americanhistory.si.edu/" title="National Museum of American History">National Museum of American History</a>): A new spin on a <a href="http://www.cs.umd.edu/hcil/treemap/" title="about treemaps">treemap</a> visualization built on top of museum metadata. One box is displayed per item and the box size is based on popularity. The rest of its innovations are just easier to experience than describe.</li>
<li><a href="http://objectofhistory.org/" title="The Object of History">The Object of History</a> (<a href="http://americanhistory.si.edu/" title="National Museum of American History">National Museum of American History</a> + <a href="http://chnm.gmu.edu/" title="CHNM">CHNM</a>)</li>
<li><a href="http://omeka.org/" title="Omeka">Omeka</a> (<a href="http://chnm.gmu.edu/" title="CHNM">CHNM</a>)</li>
<li><a href="http://exhibitions.nypl.org/eminent/" title="Eminent Domain">Eminent Domain</a> (<a href="http://www.nypl.org/" title="New York Public Library">NYPL</a> Online Exhibition): built on Omeka</li>
<li><a href="http://nocoma.grainger.uiuc.edu/" title="American Social History Online">American Social History Online</a> (<a href="http://www.diglib.org" title="Digital Library Federation">Digital Library Federation</a>): Zotero-enabled. They are on the <a href="http://wiki.dlib.indiana.edu/confluence/display/DLFAquifer/Collection+Submission" title="Collection Submission Guidelines">hunt for more MODS records</a>. It is built on Ruby on Rails (RoR) and will be released as open source software within a couple of months.</li>
<li><a href="http://www4.ncsu.edu/~dmrieder/typographia/" title="Typographia">Typographia</a> (David Rieder, NC State University)</li>
</ul>
<p>Do you have links to projects I missed? Please add them in the comments below.</p>
<p><em>Image credit: <a href="http://flickr.com/photos/thenss/2443187542/" title="Lightning by thenss (Christopher Cacho) via flickr">Lightning</a> by <a href="http://flickr.com/people/thenss/" title="Flickr: thenss">thenss</a> (Christopher Cacho) via flickr</em></p>
<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2008/06/14/thatcamp-2008-day-1-dork-short-lightening-talks/">THATCamp 2008: Day 1 Dork Short Lightning Talks</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2008/06/14/thatcamp-2008-day-1-dork-short-lightening-talks/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>THATCamp 2008: Crowdsourced Transcription and Collaborative Annotation</title>
		<link>http://www.spellboundblog.com/2008/06/05/crowdsourced-transcription-collaborative-annotation/</link>
		<comments>http://www.spellboundblog.com/2008/06/05/crowdsourced-transcription-collaborative-annotation/#comments</comments>
		<pubDate>Fri, 06 Jun 2008 02:25:07 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[digitization]]></category>
		<category><![CDATA[interface design]]></category>
		<category><![CDATA[metadata]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[oral history]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[THATCamp2008]]></category>
		<category><![CDATA[transcription]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[virtual collaboration]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/2008/06/05/crowdsourced-transcription-collaborative-annotation/</guid>
		<description><![CDATA[The THATCamp session officially titled &#8216;Crowdsourcing&#8217; on the schedule was actually aimed at discussing the intersection of crowdsourced transcription and collaborative annotation. The group was small (just six of us), and Ben Brumfield got us going with an overview of transcription software and projects: The FamilySearch Indexing Project is an LDS church [...]<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2008/06/05/crowdsourced-transcription-collaborative-annotation/">THATCamp 2008: Crowdsourced Transcription and Collaborative Annotation</a></p>
]]></description>
			<content:encoded><![CDATA[<p style="text-align: center"><a title="Free pencils by zone41" href="http://flickr.com/photos/zone41/2302365649/"><img src="http://www.spellboundblog.com/wp-content/uploads/2008/06/2302365649_6facc7e838.jpg" alt="Free Pencils by zone41 on Flickr" width="351" height="263" /></a></p>
<p>The <a title="THATCamp" href="http://thatcamp.org">THATCamp</a> session officially titled &#8216;Crowdsourcing&#8217; on the <a title="THATCamp Schedule" href="http://thatcamp.org/schedule/">schedule</a> was actually aimed at discussing the intersection of <a title="THATCamp Blog: Session Idea - Crowdsourcing Transcriptions" href="http://thatcamp.org/2008/05/crowdsourcing-transcriptions/">crowdsourced transcription</a> and <a title="THATCamp Blog: Session Idea - Collaborative Annotation" href="http://thatcamp.org/2008/05/collaborative-annotation/">collaborative annotation</a>. The group was small (just six of us), and <a title="Ben Brumfield: Manuscript Transcription Blog" href="http://manuscripttranscription.blogspot.com">Ben Brumfield</a> got us going with an overview of transcription software and projects:</p>
<ul>
<li>The <a title="FamilySearch Indexing Project" href="http://www.ldsindexing.org/">FamilySearch Indexing Project</a> is an <a title="The Church of Jesus Christ of Latter-day Saints" href="http://www.lds.org">LDS church</a> project put out by <a title="FamilySearch Labs" href="http://labs.familysearch.org/">FamilySearch Labs</a>. Their stated goal: &#8220;Volunteers extract family history information from digital images of historical documents to create searchable indexes that assist everyone in finding their ancestors.&#8221;</li>
<li>The <a title="Manuscript Transcription Assistant" href="http://www.wpi.edu/Academics/Depts/IGSD/Projects/Venice/Center/Projects/MQP/Transcription/download.html">Manuscript Transcription Assistant</a> is based at <a title="Worcester Polytechnic Institute" href="http://www.wpi.edu">Worcester Polytechnic Institute</a> (WPI) and is described as &#8220;a tool to assist transcribers in creating transcriptions, and incorporate meta-data about each image and transcription that can then be used to search through an electronic library of transcriptions&#8221;. The <a title="Manuscript Transcription Project: FAQ" href="http://www.wpi.edu/Academics/Depts/IGSD/Projects/Venice/Center/Projects/MQP/Transcription/faq.html#22">FAQ</a> mentions a desire to build a community in which &#8220;transcribers will be able to collaborate their work by rating the quality of other user&#8217;s transcriptions. By ranking the transcriptions, specific versions of transcriptions will emerge as an authority for that manuscript.&#8221; Unfortunately, many of the links on that site are broken, and my attempt to register returned an error. It is not clear to me that the project is still active.</li>
<li><a title="Soldier Studies" href="http://www.soldierstudies.org">Soldier Studies</a> is a website dedicated to posting transcriptions of Civil War letters and diaries. It is not a transcription tool, but it is clearly a repository built specifically for transcriptions (see their <a title="Soldier Studies: Mission Statement" href="http://www.soldierstudies.org/index.php?action=mission">Mission Statement</a> for more information).</li>
<li><a title="Oh No Robot" href="http://ohnorobot.com/">Oh No Robot</a> is a comics transcription and search tool. It provides a page to <a title="Oh No Robot: comics that need transcriptions" href="http://www.ohnorobot.com/helpout.pl">find comics needing transcription</a> and a great page to explain <a title="Oh No Robot: Transcription Explained" href="http://www.ohnorobot.com/transcriptionexplained.pl">how transcription works</a> on their site.</li>
</ul>
<p>After examining what was out there, Ben concluded that what he wanted didn&#8217;t exist, so he started to build it himself. He gave us a demo of his &#8220;very beta&#8221; software. His goal is to build a web-based tool that supports collaborative manuscript transcription and annotation by people without a strong technical background. In its current private beta, the software supports transcription, an innovative approach to linking individual words or phrases to collection-defined subjects, and some basic community tools that let his virtual team discuss transcription issues. Ben is working hard on the software. If you are interested in his project, definitely <a title="Ben Brumfield" href="http://thatcamp.org/camper/benwbrum/">get in touch with him</a>.</p>
<p><a title="Travis Brown" href="http://thatcamp.org/camper/travis/">Travis Brown</a> showed us his creation: <a title="ecomma" href="http://ecomma.cwrl.utexas.edu/0.2.0/">eComma</a>. eComma aims to &#8220;enable groups of students, scholars, or general readers to build collaborative commentaries on a text and to search, display, and share those commentaries online&#8221;. He showed us how users can tag or comment on individual words or phrases of a loaded text. Take a look at the <a title="eComma: Sonnet 18" href="http://ecomma.cwrl.utexas.edu/0.2.0/texts/comments/4">eComma page for Sonnet 18 by William Shakespeare</a>. The words highlighted in blue are the ones that are tagged or have comments. If you highlight &#8216;the eye of heaven&#8217; in line 5, you will see that it is tagged as a metaphor. Travis reported that two other programmers will be working on eComma with him this summer, and that he hopes to fix some interface issues and add a few more features.</p>
<p>We also talked about ways to display transcriptions. <a title="Elena Razlogova" href="http://elenarazlogova.org/">Elena Razlogova</a> guided us to the <a title="DoHistory" href="http://dohistory.org">DoHistory</a> website, where she showed us the <a title="DoHistory: Magic Lens" href="http://dohistory.org/diary/exercises/lens/index.html">Magic Lens</a> interface. It displays the transcription of a handwritten diary page in a lens-style overlay that you can move with your mouse. This reminded me of the <a title="Gilder Lehrman Battle Lines: Letters from America's Wars" href="http://www.gilderlehrman.org/collection/battlelines/index_good.html">Gilder Lehrman Battle Lines: Letters from America&#8217;s Wars</a> interface, which I found while doing research for my <a title="Communicating Context in Online Collections Poster" href="http://www.spellboundblog.com/poster/">Communicating Context in Online Collections Poster</a>. If you haven&#8217;t seen it before, take a look at the page showing the transcription of <a title="Nathanael Greene's letter to Catherine Greene dated July 17, 1778" href="http://www.gilderlehrman.org/collection/battlelines/chapter3/chapter3_1a.html">Nathanael Greene&#8217;s letter</a> to Catherine Greene dated July 17, 1778. Turn down your speakers first if a reader&#8217;s voice would disturb the people around you.</p>
<p>While on the DoHistory site I also found the <a title="DoHistory: Try Transcribing" href="http://dohistory.org/diary/exercises/tryTranscribing.html">Try Your Hand At Transcribing page</a>. This page shows the challenge of transcribing handwritten documents by giving you the chance to try it yourself and then lets you check your transcription with the click of a button.</p>
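<p>DoHistory does not say how its checking button works, so the following is only one possible approach. It assumes a simple word-by-word comparison against a reference transcription and uses the <code>SequenceMatcher</code> class from Python&#8217;s standard <code>difflib</code> module. The function name <code>check_transcription</code> and the sample diary line are hypothetical.</p>

```python
import difflib


def check_transcription(attempt, reference):
    """Compare a transcription attempt with a reference transcription.

    Hypothetical helper: works word by word and returns a similarity
    ratio between 0.0 and 1.0 plus a list of (reference, attempt)
    pairs wherever the two transcriptions disagree.
    """
    ref_words = reference.split()
    att_words = attempt.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=att_words)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert"
            mismatches.append((" ".join(ref_words[i1:i2]), " ".join(att_words[j1:j2])))
    return matcher.ratio(), mismatches


# Hypothetical diary line and a transcriber's attempt with one misreading.
reference = "Clear and cold. Went to see the widow"
attempt = "Clear and cold. Went to see the window"
score, problems = check_transcription(attempt, reference)
print(f"similarity {score:.2f}; differences: {problems}")
```

<p>Comparing words rather than characters keeps the reported differences readable. Passing the two strings to <code>SequenceMatcher</code> directly would compare them character by character instead, which is more forgiving of small slips.</p>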
<p>We talked a bit about the technology behind eComma (forgive me, Travis, for not having enough detail in my notes to explain your current architecture here) and the challenges of annotating overlapping sets of words. Although he isn&#8217;t using it in the current implementation of eComma, Travis mentioned the <a title="LMNL" href="http://lmnl.net">Layered Markup and Annotation Language</a> (LMNL), which the <a title="LMNL: Tutorial" href="http://lmnl.net/prose/tutorial/index.html">tutorial page</a> describes this way:</p>
<blockquote><p>&#8230;LMNL documents contain character data which is marked up using named and occasionally overlapping ranges. Ranges can have annotations, which can themselves be annotated and can have structured content. To support authoring, especially collaborative authoring, markup is namespaced and divided into layers, which might reflect different views on the text.</p></blockquote>
<p>I can definitely see how LMNL might be an interesting framework for building transcription and annotation software.</p>
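<p>The quoted description turns on one idea: named ranges over character data that may overlap, which properly nested XML elements cannot express directly. The Python sketch below is not LMNL syntax and not part of any LMNL tool. It is a hypothetical standoff model in which each annotation is stored as a named, layered range of character offsets into a shared text. The example uses line 5 of Sonnet 18. The &#8216;metaphor&#8217; range mirrors the eComma example, and the overlapping &#8216;phrase&#8217; range and the layer names are invented for illustration.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Range:
    """A named span of character offsets [start, end) over shared text."""
    name: str
    start: int
    end: int
    layer: str = "default"
    annotations: dict = field(default_factory=dict)

    def covers(self, offset):
        return offset in range(self.start, self.end)


def overlap_without_nesting(a, b):
    """True when two ranges share characters but neither contains the other.

    This is the case that nested XML elements cannot represent directly.
    """
    overlap = min(a.end, b.end) > max(a.start, b.start)
    a_contains_b = b.start >= a.start and a.end >= b.end
    b_contains_a = a.start >= b.start and b.end >= a.end
    return overlap and not (a_contains_b or b_contains_a)


def make_range(text, phrase, name, layer, **annotations):
    """Build a range covering the first occurrence of phrase in text."""
    start = text.index(phrase)
    return Range(name, start, start + len(phrase), layer, dict(annotations))


text = "Sometime too hot the eye of heaven shines,"
ranges = [
    Range("line", 0, len(text), layer="verse"),
    make_range(text, "too hot the eye", "phrase", "reader-a"),
    make_range(text, "the eye of heaven", "metaphor", "reader-b", note="the sun"),
]

offset = text.index("eye")
print([r.name for r in ranges if r.covers(offset)])
print(overlap_without_nesting(ranges[1], ranges[2]))
```

<p>Because the annotations point into the text rather than wrapping it, any number of readers or layers can mark up the same words without having to agree on a single nesting order.</p>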
<p><a title="Krissy O'Hare" href="http://storytelling.concordia.ca/staff/krissy/">Krissy O&#8217;Hare</a> brought up the challenges of transcribing audio and video that she has faced working on <a title="Concordia University: Oral History" href="http://storytelling.concordia.ca/oralhistory/index.html">oral history projects at Concordia University</a>. This led to Travis (I think?) mentioning the <a title="Texas German Dialect Project" href="http://www.tgdp.org">Texas German Dialect Project</a> (TGDP) and the <a title="CMU Sphinx Group Speech Recognition Engine" href="http://cmusphinx.sourceforge.net/html/cmusphinx.php">CMU Sphinx Group Speech Recognition Engine</a>. TGDP has an online archive of recorded interviews along with their transcriptions and translations. CMU Sphinx&#8217;s introduction explains that their software tools are targeted at expert users wanting to build speech-using applications.</p>
<p>This was a great session. The small group gave everyone a chance to contribute and take over the keyboard in order to show off their favorite sites. It was immediately after the <a title="THATCamp 2008: Text Mining Session" href="http://www.spellboundblog.com/2008/06/01/thatcamp-2008-text-mining-and-the-persian-carpet-effect/">Text Mining</a> session, so our minds were already full of all the great things one could do with text once it is transcribed.</p>
<p>I am excited to watch the evolution of group transcription and annotation software. If you know of other transcription or annotation tools or projects &#8211; please post them to the comments.</p>
<p><em>Image credit: <a title="Free pencils by zone41 via flickr" href="http://flickr.com/photos/zone41/2302365649/">Free pencils by zone41 via flickr</a></em></p>
<p><em>As is the case with all my session summaries from THATCamp 2008, please accept my apologies in advance for any cases in which I misquote, overly simplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via</em> <a title="contact Jeanne Kramer-Smyth" href="http://www.spellboundblog.com/contact/"><em>my contact form</em></a><em>.</em></p>
<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2008/06/05/crowdsourced-transcription-collaborative-annotation/">THATCamp 2008: Crowdsourced Transcription and Collaborative Annotation</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2008/06/05/crowdsourced-transcription-collaborative-annotation/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>THATCamp 2008: Text Mining and the Persian Carpet Effect</title>
		<link>http://www.spellboundblog.com/2008/06/01/thatcamp-2008-text-mining-and-the-persian-carpet-effect/</link>
		<comments>http://www.spellboundblog.com/2008/06/01/thatcamp-2008-text-mining-and-the-persian-carpet-effect/#comments</comments>
		<pubDate>Sun, 01 Jun 2008 04:58:24 +0000</pubDate>
		<dc:creator>Jeanne</dc:creator>
				<category><![CDATA[digitization]]></category>
		<category><![CDATA[historical research]]></category>
		<category><![CDATA[information visualization]]></category>
		<category><![CDATA[learning technology]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[text mining]]></category>
		<category><![CDATA[THATCamp2008]]></category>

		<guid isPermaLink="false">http://www.spellboundblog.com/2008/06/01/thatcamp-2008-text-mining-and-the-persian-carpet-effect/</guid>
		<description><![CDATA[I attended a THATCamp session on Text Mining. There were between 15 and 20 people in attendance. I have done my best to attribute ideas to their originators wherever possible &#8211; but please forgive the fact that I did not catch the names of everyone who was part of this session. What Is Text Mining? [...]<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2008/06/01/thatcamp-2008-text-mining-and-the-persian-carpet-effect/">THATCamp 2008: Text Mining and the Persian Carpet Effect</a></p>
]]></description>
			<content:encoded><![CDATA[<p><a href="http://flickr.com/photos/alarch/308587800/" title="Drift of Harrachov Mine by alarch via flickr"><img src="http://www.spellboundblog.com/wp-content/uploads/2008/06/308587800_c8d0417f1e.jpg" alt="alarch: Drift of Harrachov mine (Flickr)" align="right" height="225" width="300" /></a>I attended a <a href="http://www.thatcamp.org" title="THATCamp">THATCamp</a> session on Text Mining. There were between 15 and 20 people in attendance. I have done my best to attribute ideas to their originators wherever possible &#8211; but please forgive the fact that I did not catch the names of everyone who was part of this session.</p>
<p><strong>What Is Text Mining?</strong></p>
<p>Text mining is an umbrella phrase that covers many different techniques and types of tools.</p>
<p>The <a href="http://chnm.gmu.edu/" title="CHNM">CHNM</a> NEH-funded text mining initiative defined text mining as supporting these three research functions:</p>
<ul>
<li>Locating or finding: improving on search</li>
<li>Extraction: once you find a set of interesting documents, how do you extract information in new (and hopefully faster) ways? How do you pull data from unstructured bulk into structured sets?</li>
<li>Analysis: supporting data analysis, the discovery of patterns and the answering of questions</li>
</ul>
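<p>Here is a rough illustration of the extraction step, pulling structured data out of unstructured text. It is not the CHNM initiative&#8217;s tooling. The hypothetical Python sketch below uses a regular expression to turn dates in running prose into structured records. It only recognizes dates written like &#8216;July 17, 1778&#8217;, and real extraction tools need far more robust methods.</p>

```python
import re

MONTHS = ("January February March April May June July August "
          "September October November December").split()

# Hypothetical pattern: only matches dates written as "Month D, YYYY".
DATE_PATTERN = re.compile(r"\b(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})\b")


def extract_dates(text):
    """Turn each matching date in free text into a structured record."""
    records = []
    for match in DATE_PATTERN.finditer(text):
        month_name, day, year = match.groups()
        records.append({
            "year": int(year),
            "month": MONTHS.index(month_name) + 1,
            "day": int(day),
        })
    return records


sample = "Hypothetical finding aid: one letter dated July 17, 1778, another dated March 3, 1779."
print(extract_dates(sample))
```

<p>Once dates are structured like this, they can be sorted, counted or plotted. That is the hand-off from extraction to analysis.</p>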
<p>The group noted that text mining has both macro and micro aspects. Sometimes you are trying to explore a whole collection; sometimes you are trying to examine a single document in great detail. Still other situations call for using text mining to classify content automatically against established vocabularies. Different kinds of tools will be important during different phases of research.</p>
<p><strong>Projects, Tools, Examples &amp; Cool Ideas</strong></p>
<p><a href="http://thatcamp.org/camper/aeastmanmullins/" title="Andrea Eastman-Mullins">Andrea Eastman-Mullins</a>, from <a href="http://www.alexanderstreet.com" title="Alexander Street Press">Alexander Street Press</a>, mentioned the <a href="http://humanities.uchicago.edu./orgs/ARTFL/" title="University of Chicago: ARTFL Project">University of Chicago&#8217;s ARTFL Project</a> and these two tools:</p>
<ul>
<li><a href="http://philologic.uchicago.edu/" title="PhiloLogic">PhiloLogic</a>: An XML/SGML based full-text search, retrieval and analysis tool</li>
<li><a href="http://philologic.uchicago.edu/philomine/" title="PhiloMine">PhiloMine</a>: an extension being developed for PhiloLogic to provide support for &#8220;a variety of machine learning, text mining, and document clustering tasks&#8221;.</li>
</ul>
<p><a href="http://www.dancohen.org" title="Dan Cohen">Dan Cohen</a> directed us to his post about <a href="http://www.dancohen.org/2006/08/08/mapping-what-americans-did-on-september-11/" title="Mapping What Americans Did on September 11">Mapping What Americans Did on September 11</a> and to <a href="http://twistori.com" title="Twistori">Twistori</a> which text mines Twitter.</p>
<p>Other Projects &amp; Examples:</p>
<ul>
<li><a href="http://www.monkproject.org/" title="MONK Project">MONK project</a> (Metadata Offer New Knowledge)</li>
<li><a href="http://www.opencontentalliance.org/" title="Open Content Alliance">Open Content Alliance</a> (OCA)</li>
<li>Library of Congress <a href="http://www.loc.gov/chroniclingamerica/" title="Library of Congress: Chronicling America">Chronicling America</a> &#8211; newspaper pages from 1897-1910</li>
<li>Tanya Clement&#8217;s project <a href="http://www.mith2.umd.edu/events/911-digital-dialogue-tanya-clement-using-digital-tools-to-not-read-gertrude-steins-the-making-of-americans" title="Using Digital Tools to Not-Read Gertrude Stein’s The Making of Americans">&#8220;Using Digital Tools to Not-Read Gertrude Stein’s The Making of Americans&#8221;</a> at University of Maryland, College Park</li>
<li>Two other University of Maryland, College Park projects that were not mentioned during the session, but may be of interest are <a href="http://www.cs.umd.edu/hcil/textvis/featurelens/" title="FeatureLens">FeatureLens</a> and <a href="http://www.cs.umd.edu/hcil/textvis/basketlens/" title="BasketLens">BasketLens</a></li>
<li><a href="http://docs.google.com/" title="Google Docs">Google Docs</a> now includes <a href="http://en.wikipedia.org/wiki/Flesch-Kincaid_Readability_Test" title="Wikipedia: Flesch-Kincaid Readability Test">Flesch-Kincaid Readability Tests</a> and the <a href="http://en.wikipedia.org/wiki/Automated_Readability_Index" title="Wikipedia: Automated Readability Index">Automated Readability Index</a> in the same window that shows your word count</li>
<li><a href="http://en.wikipedia.org/wiki/Spam_filter" title="Wikipedia: Spam Filters">Spam filters</a>, which use text mining to identify spam e-mail (<a href="http://en.wikipedia.org/wiki/Bayesian_spam_filtering" title="Wikipedia: Bayesian Spam Filtering">Bayesian spam filtering</a> is a well-known example)</li>
<li>Clustering &#8211; see my post on this: <a href="http://www.spellboundblog.com/2008/05/14/clustering-data-generating-organization-from-the-ground-up/" title="Clustering Data: Generating Organization from the Ground Up">Clustering Data: Generating Organization from the Ground Up</a> and also take a look at <a href="http://clusty.com/" title="Clusty.com">Clusty.com</a> and their &#8216;remix clusters&#8217; option.</li>
</ul>
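<p>The Flesch-Kincaid grade level and the Automated Readability Index mentioned above are simple formulas over counts of sentences, words, syllables and characters. Each one estimates a U.S. school grade level. The Python sketch below applies the standard published coefficients. It takes the counts as inputs because counting syllables is itself heuristic. It makes no claim about how Google Docs computes or rounds its figures, and the sample counts are hypothetical.</p>

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def automated_readability_index(characters, words, sentences):
    """Automated Readability Index; characters counts letters and digits."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43


# Hypothetical counts for a 100-word passage of 5 sentences.
print(round(flesch_kincaid_grade(words=100, sentences=5, syllables=150), 2))
print(round(automated_readability_index(characters=450, words=100, sentences=5), 2))
```

<p>Both formulas reward shorter sentences. Flesch-Kincaid also penalizes words with many syllables, while ARI uses characters per word as its proxy for word difficulty.</p>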
<p>Some neat ideas that were mentioned for ways text mining could be used (lots of other great ideas were discussed &#8211; these are the two that made it into my notes):</p>
<ul>
<li>Train a tool with collections of content from individual time periods, then use it to help identify the originating time period of new documents. The same setup could also be used to identify shifts in textual patterns by comparing large data sets from specific date ranges.</li>
<li>If you have a tool that has learned to classify certain types of content well, watch for when it breaks: its failures can point to interesting trails worth investigating.</li>
</ul>
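<p>The first idea is a standard text classification task, in the same family as the Bayesian spam filters listed above. The hypothetical sketch below is one minimal way to approach it. It is a multinomial naive Bayes classifier with add-one smoothing, written in plain Python, and it assumes a small set of training texts labeled by period. The labels and sentences are invented for illustration; real work would need far larger corpora and better tokenization.</p>

```python
import math
from collections import Counter, defaultdict


class PeriodClassifier:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # maps period to word frequencies
        self.doc_counts = Counter()              # maps period to number of training texts
        self.vocabulary = set()

    def train(self, text, period):
        words = text.lower().split()
        self.word_counts[period].update(words)
        self.doc_counts[period] += 1
        self.vocabulary.update(words)

    def log_scores(self, text):
        """Log prior plus summed log likelihoods for each period."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for period, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[period] / total_docs)
            for word in text.lower().split():
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            scores[period] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


clf = PeriodClassifier()
clf.train("thee thou hath doth wherefore", "early modern")
clf.train("thou art hath thee", "early modern")
clf.train("telegraph railway steam engine", "nineteenth century")
clf.train("railway telegraph steamship", "nineteenth century")
print(clf.classify("thou hath a railway"))
```

<p>The second idea could be approached with the same tool. A small gap between the two best scores from <code>log_scores</code> is a crude sign that the classifier is unsure, and such texts may be the &#8216;breaks&#8217; worth investigating.</p>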
<p><strong>Barriers to Text Mining</strong></p>
<p>All of the following were touched upon as being barriers or challenges to text mining:</p>
<ul>
<li>access to raw text in gated collections (i.e., collections that require payment for access), such as <a href="http://www.jstor.org/" title="JSTOR">JSTOR</a>, <a href="http://muse.jhu.edu/" title="Project MUSE">Project MUSE</a> and others</li>
<li>tools that are too difficult for non-programmers to use</li>
<li>questions relating to the validity of text mining as a technique for drawing legitimate conclusions</li>
</ul>
<p><strong>Next Steps</strong></p>
<p>These ideas were put forward as important for advancing text mining in the humanities:</p>
<ul>
<li>develop and share best practices for use when cultural heritage institutions make digitization and transcription deals with corporate entities</li>
<li>create frameworks that enable individuals to reproduce the work of others and provide transparency into the assumptions behind the research</li>
<li>create tools and techniques that smooth the path from digitization to transcription</li>
<li>develop focused, easy-to-use tools that bridge the gap between computer programmers and humanities researchers</li>
</ul>
<p><strong>My thoughts</strong><br />
During the session I drew a parallel with archaeology, where some patterns can be seen from the air that cannot be seen on the ground. I discovered this has a name:</p>
<blockquote><p>&#8220;Archaeologists call it the <strong>Persian carpet effect</strong>. Imagine you&#8217;re a mouse running across an elaborately decorated rug. The ground would merely be a blur of shapes and colors. You could spend your life going back and forth, studying an inch at a time, and never see the patterns. Like a mouse on a carpet, an archaeologist painstakingly excavating a site might easily miss the whole for the parts.&#8221; <em>from Airborne Archaeology, Smithsonian magazine, December 2005 (emphasis mine)</em></p></blockquote>
<p>While I don&#8217;t expect text mining to produce its own coffee table books any time soon (along the lines of <a href="http://www.amazon.com/gp/product/0892368756?ie=UTF8&amp;tag=spellboundblog-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=0892368756">The Past from Above: Aerial Photographs of Archaeological Sites</a>), I do think this idea captures the promise of text mining tools. Everyone in our session seemed to agree that these tools will let people do things that no individual could do by hand in a lifetime. The digital world is producing <a href="http://en.wikipedia.org/wiki/Terabyte" title="Wikipedia: Terabyte">terabytes</a> of text, and we will need text mining tools just to find our way through this blizzard of content. It is all well and good to know that each snowflake is unique, but tell that to the 21st-century historian soon to be buried under the weight of blogs, tweets, wikis and every other kind of web content.</p>
<p><em>Image credit: <a href="http://flickr.com/photos/alarch/308587800/" title="Drift of Harrachov Mine by alarch via flickr">Drift of Harrachov Mine by alarch via flickr</a></em></p>
<p><em>As is the case with all my session summaries from THATCamp 2008, please accept my apologies in advance for any cases in which I misquote, overly simplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via</em> <a href="http://www.spellboundblog.com/contact/" title="contact Jeanne Kramer-Smyth"><em>my contact form</em></a><em>.</em></p>
<p>This post is from: <a href="http://www.spellboundblog.com">Spellbound Blog</a>.<br/><br/><a href="http://www.spellboundblog.com/2008/06/01/thatcamp-2008-text-mining-and-the-persian-carpet-effect/">THATCamp 2008: Text Mining and the Persian Carpet Effect</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.spellboundblog.com/2008/06/01/thatcamp-2008-text-mining-and-the-persian-carpet-effect/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

