Spellbound Blog - https://www.spellboundblog.com

THATCamp 2008: Text Mining and the Persian Carpet Effect

I attended a THATCamp [2] session on Text Mining. There were between 15 and 20 people in attendance. I have done my best to attribute ideas to their originators wherever possible, but please forgive me for not catching the names of everyone who took part in this session.

What Is Text Mining?

Text mining is an umbrella term covering many different techniques and types of tools.

The CHNM [3] NEH-funded text mining initiative defined text mining as needing to support these three research functions:

The group noted that text mining has both macro and micro aspects. Sometimes you are trying to explore a whole collection; sometimes you are trying to examine a single document in great detail. Still other situations call for using text mining to classify content automatically against established vocabularies. Different kinds of tools will be important during different phases of research.
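To make these three situations a little more concrete (exploring a collection, examining a single document closely, and classifying content against an established vocabulary), here is a minimal Python sketch. It is my own illustration, not code from any tool discussed in the session: the naive tokenizer, the stop-word list, the sample documents and the little "industry"/"weather" vocabulary are all hypothetical, and real text mining tools use far more sophisticated methods.

```python
import re
from collections import Counter

# Hypothetical sample data: three tiny "documents" and a toy controlled vocabulary.
DOCS = {
    "letter1": "The mine flooded in spring. Miners left the mine.",
    "letter2": "Coal prices rose as the mine closed.",
    "diary": "Rain again. The river rose and the mill stopped.",
}
VOCABULARY = {
    "industry": {"mine", "coal", "mill"},
    "weather": {"rain", "flooded", "river"},
}
# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "and", "as", "in", "again"}


def tokenize(text: str) -> list[str]:
    """Naive tokenizer: lowercase, then keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def collection_overview(docs: dict[str, str], top_n: int = 3) -> list[tuple[str, int]]:
    """Macro view: the most frequent non-stop-word terms across the whole collection."""
    counts = Counter()
    for text in docs.values():
        counts.update(t for t in tokenize(text) if t not in STOP_WORDS)
    return counts.most_common(top_n)


def keyword_in_context(text: str, keyword: str, width: int = 2) -> list[str]:
    """Micro view: each occurrence of keyword in one document, with `width` words either side."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append(f"{left} [{tok}] {right}".strip())
    return hits


def classify(text: str, vocabulary: dict[str, set[str]]) -> list[str]:
    """Vocabulary-based classification: every label whose terms appear in the text."""
    tokens = set(tokenize(text))
    return sorted(label for label, terms in vocabulary.items() if tokens & terms)


if __name__ == "__main__":
    print(collection_overview(DOCS, top_n=2))           # macro: whole collection
    print(keyword_in_context(DOCS["letter1"], "mine"))  # micro: one document
    for name, text in DOCS.items():
        print(name, classify(text, VOCABULARY))         # classification
```

Even at this toy scale the split shows: the term counts summarize the collection as a whole, while the keyword-in-context lines keep you close to the exact wording of a single document.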

Projects, Tools, Examples & Cool Ideas

Andrea Eastman-Mullins [4], from Alexander Street Press [5], mentioned the University of Chicago’s ARTFL Project [6] and these two tools:

Dan Cohen [9] directed us to his post "Mapping What Americans Did on September 11" [10] and to Twistori [11], which text mines Twitter.

Other Projects & Examples:

Some neat ideas for ways text mining could be used (lots of other great ideas were discussed, but these are the two that made it into my notes):

Barriers to Text Mining

The following were all touched upon as barriers or challenges to text mining:

Next Steps

These ideas were put forward as important for moving the field of text mining in the humanities forward:

My thoughts
During the session I drew a parallel to archaeology, where patterns that are obvious from the air cannot be recognized on the ground. I later discovered that this phenomenon has a name:

“Archaeologists call it the Persian carpet effect. Imagine you’re a mouse running across an elaborately decorated rug. The ground would merely be a blur of shapes and colors. You could spend your life going back and forth, studying an inch at a time, and never see the patterns. Like a mouse on a carpet, an archaeologist painstakingly excavating a site might easily miss the whole for the parts.” from Airborne Archaeology, Smithsonian magazine, December 2005 (emphasis mine)

While I don’t see any coffee table books (such as The Past from Above: Aerial Photographs of Archaeological Sites [27]) in the near future of text mining, I do think this idea captures the promise that text mining tools hold for us. Everyone in our session seemed to agree that these tools will empower people to do things that no individual could do by hand in a lifetime. The digital world is producing terabytes [28] of text, and we will need text mining tools just to find our way through this blizzard of content. It is all well and good to know that each snowflake is unique, but tell that to the 21st-century historian soon to be buried under the weight of blogs, tweets, wikis and all other manner of web content.

Image credit: Drift of Harrachov Mine by alarch via Flickr [1]

As is the case with all my session summaries from THATCamp 2008, please accept my apologies in advance for any cases in which I misquote, overly simplify or miss points altogether in the post above. These sessions move fast and my main goal is to capture the core of the ideas presented and exchanged. Feel free to contact me about corrections to my summary either via comments on this post or via my contact form [29].

Comments Disabled To "THATCamp 2008: Text Mining and the Persian Carpet Effect"

#1 Pingback By To take away » Blog Archive » THATCamp 2008: Crowdsourced Transcription and Collaborative Annotation On June 6, 2008 @ 6:02 pm

[…] and take over the keyboard in order to show off their favorite sites. It was immediately after the Text Mining session, so our minds were already full of all the great things one could do with text once it is […]

#2 Pingback By THATCamp 2008: Crowdsourced Transcription and Collaborative Annotation – SpellboundBlog.com – spellbound by archival science and information technology in the digital age On September 25, 2008 @ 9:42 pm

[…] Matt « THATCamp 2008: Text Mining and the Persian Carpet Effect THATCamp 2008: Day 1 Dork Short Lightening Talks […]

#3 Pingback By Recent Links Tagged With “textmining” – JabberTags On October 14, 2008 @ 11:04 pm

[…] links >> textmining The Text Mining Can of Worms : Beyond Search Saved by xxc on Tue 14-10-2008 THATCamp 2008: Text Mining and the Persian Carpet Effect Saved by meximese on Mon 13-10-2008 The Text Mining Handbook: Advanced Approaches in Analyzing […]

#4 Pingback By IRC log from THATCamp 2008 | THATCamp On August 6, 2012 @ 9:43 am

#5 Pingback By THATCamp News » Blog Archive On November 21, 2012 @ 9:20 am
