reCAPTCHA your reading

Have you ever filled out a form on the Web with a “verification” field that requires you to decipher some odd-looking letters and/or numbers? This is the technology of CAPTCHA, which endeavors to foil spambots with a randomly generated image only human eyes can decipher. Key in the wrong letters or numbers, and you have to try another combination (and another) until you get it right.

The smart people at reCAPTCHA have shown that with this technology it is possible to kill two birds with one stone. As well as slowing down spammers, reCAPTCHA harnesses the skill of humans to decipher words from scanned text that OCR software cannot read.

OCR text is text that has been scanned (say, from a book) into a digital image and then converted word for word into machine-readable text by Optical Character Recognition software. It saves a LOT of typing!

The problem has always been the accuracy of the conversion. Some words can be difficult to recognize if, for example, the print is faint or there are stray marks. The human mind can distinguish the shapes of letters and other characters far more easily than software (nice to know we still have our uses).

There are a lot of books and texts being digitally converted for public access over the Internet, and if you’ve ever had to struggle through an academic text which has been rendered through OCR you will appreciate the value of quality assurance!

reCAPTCHA works by offering two words taken from scanned text. One has already been deciphered, so it is used to verify that you are human, not a spambot. The other is a word that OCR software has been unable to read, so YOU decipher it. Check it out! http://recaptcha.net/
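
For the curious, the two-word idea can be sketched in a few lines of Python. This is only a rough illustration, not reCAPTCHA's actual code: the function names, the `votes` store, the image id "img-42" and the rule that three matching answers settle a word are all assumptions made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical store: for each unreadable word image, count the human guesses.
votes = defaultdict(Counter)


def normalize(text):
    """Ignore case and surrounding spaces when comparing answers."""
    return text.strip().lower()


def check_response(control_answer, control_guess, unknown_id, unknown_guess):
    """Return True if the known (control) word was typed correctly.

    The guess for the unknown word is recorded only when the control
    word matches, since that is the only evidence a human is typing.
    """
    if normalize(control_guess) != normalize(control_answer):
        return False
    votes[unknown_id][normalize(unknown_guess)] += 1
    return True


def accepted_reading(unknown_id, min_votes=3):
    """Return the most common guess once enough people agree, else None.

    The threshold of three is an assumption chosen for this sketch.
    """
    if not votes[unknown_id]:
        return None
    guess, count = votes[unknown_id].most_common(1)[0]
    return guess if count >= min_votes else None
```

In this sketch, a visitor who types the control word correctly passes the check, and their answer for the second word counts as one vote. Once enough visitors give the same answer, that answer is treated as the correct reading of the word.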