reCAPTCHA: crowdsourcing transcription comes to life
With a tag-line like ‘Stop Spam, Read Books’ – how can you not love reCAPTCHA? You might have already read about it on Boing Boing, NetworkWorld.com or digitizationblog – but I just couldn’t let it go by without talking about it.
Haven’t heard about reCAPTCHA yet? OK, have you ever filled out an online form that made you look at an image and type the letters or numbers you see? These ‘verify you are a human’ challenges are used everywhere, from online concert ticket sites that don’t want scalpers to grab too many tickets to blogs that are trying to prevent spam. What reCAPTCHA has done is harness this user effort to help transcribe hard-to-OCR text from digitized books in the Internet Archive. Their website has a great explanation of what they are doing, including a graphic that shows why human intervention is needed.
reCAPTCHA shows two words for each challenge: one whose transcription it already knows and a second that needs human verification. Slowly but surely, all the words the OCR software couldn’t read get transcribed and made available for indexing and search.
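To make the two-word idea concrete, here is a toy sketch in Python of how such a scheme could work. It is only an illustration under my own assumptions, not reCAPTCHA’s actual code: the word IDs, the `submit` function, the exact-match comparison and the three-vote acceptance threshold are all hypothetical. The user passes or fails on the known word alone, and their answer for the unknown word counts as one vote toward its transcription.

```python
from collections import Counter, defaultdict

# Hypothetical: number of matching human answers required before an
# unknown word's transcription is accepted (not a documented reCAPTCHA value).
VOTES_NEEDED = 3

# Hypothetical control words whose transcriptions are already known.
known_words = {"ctrl-1": "morning"}

# Votes collected for words the OCR software could not read.
votes = defaultdict(Counter)
# Unknown words that have reached agreement, mapped to their transcription.
accepted = {}


def normalize(text):
    # Assumption: answers must match exactly apart from surrounding whitespace.
    return text.strip()


def submit(control_id, control_answer, unknown_id, unknown_answer):
    """Return True if the user passes the challenge.

    Only the control word decides pass/fail; the answer for the unknown
    word is recorded as one vote toward that word's transcription.
    """
    if normalize(control_answer) != normalize(known_words[control_id]):
        # Failed the known word: reject and ignore the other answer.
        return False
    if unknown_id not in accepted:
        guess = normalize(unknown_answer)
        votes[unknown_id][guess] += 1
        if votes[unknown_id][guess] >= VOTES_NEEDED:
            accepted[unknown_id] = guess
    return True
```

For example, three users who each type "morning" correctly for `ctrl-1` and "upon" for a scanned word `scan-42` would, under these assumptions, get `accepted["scan-42"]` set to "upon". Checking the known word first is what lets the system trust the second answer without knowing it in advance.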
I have posted before about ideas for transcription using the power of many hands and eyes (see Archival Transcriptions: for the public, by the public) – but my ideas were more along the lines of what the genealogists are doing on sites like USGenWeb. It is so exciting to me that a version of this is out there – and I LOVE their take on it. Rather than find people who want to do transcription, they have taken an action lots of folks are already used to performing and given it more purpose. The statistics behind this are powerful. Apparently 60 million of these challenges are entered every DAY.
Want to try it? Leave a comment on this post (or any post on my blog) and you will get to see and use reCAPTCHA. I can also testify that installing it on a WordPress blog is well documented, fast and easy.
Posted on 28th May 2007
Under: access, digitization, open source, transcription | 10 Comments »

This is so cool! I had heard about this last week, but I’m excited to be trying it out. One question: since we’re supposed to be helping with the OCR of scanned texts, do we need to worry about being case-sensitive? One of my two words below is capitalized; I’ve typed it in with the capital, but it’s not clear to me if that is required.
June 6th, 2007 at 6:57 am
Stephen,
That is a great question – and I do not know the answer. My hunch is that it is case-sensitive, because the text results of OCR definitely include capital letters – but I will post any definitive answer I can locate after some research.
Jeanne
June 7th, 2007 at 10:57 am
Hi Jeanne,
Thanks for cross-posting about this great use of human power. I’ve been enjoying your blog for only a few weeks now, but I am impressed and excited about many of the projects that you have mentioned (both your own and others’).
July 10th, 2007 at 11:26 am
Glad you are enjoying the blog. Positive feedback is a wonderful thing — thank you!
July 10th, 2007 at 11:46 am
This CAPTCHA is a good thing, but honestly I have failed some of these tests a couple of times. Maybe my vision isn’t what it used to be. I think using words and typefaces that we know cause OCR errors *is* the way to go here. At least we can interpret those CAPTCHAs more easily.
April 14th, 2008 at 11:36 pm
kinda frustrating as well when you are trying to buy tickets for a concert at 10:01 am
May 18th, 2008 at 11:09 am
[...] it in my blog post: Archival Transcriptions: for the public, by the public. While I do love what reCaptcha does at the word level and Footnote.com does with locations, names and dates – I still think there [...]
May 23rd, 2008 at 9:46 pm
I just found your blog, and from this first post I can’t wait to catch up on the rest of it. As for crowdsourcing, I’m a little scared of it. Seems like an emotional response, tho. Hackers? Goof-offs? Or maybe I’m a stick in the mud. Maybe it just seems too good to be true, too obvious. I’m posting so I can try it out.
Thanks, and looking forward to more…
Lauren
August 25th, 2008 at 6:07 am
This is a wonderful example of the power of the Interwebs – lots of people giving a little each to combine into a greater good. I work for the feds, so I can’t put it on our website, but I can go to the reCAPTCHA website for a few minutes each day and help out. I don’t know where I’ve been for the last year, but thanks for catching me up. – Jill.
August 26th, 2008 at 1:23 pm
I struggle with the captchas sometimes, as well.
September 11th, 2008 at 5:07 pm