Newegg, CAPTCHA, browsers = reCAPTCHA
Something curious I noticed today: log in to Newegg using Firefox and you are forced to solve a CAPTCHA; use IE and it’s not there. I’m going with IE here because I couldn’t figure out if it was showing me a “Z” or an “N”, and neither would work!
Now, while I am generally a fan of CAPTCHA, I am an even bigger fan of using the computer for good; enter reCAPTCHA. Essentially, a Carnegie Mellon team led by Luis von Ahn noticed that when books are scanned for digitization, some words cannot be converted to text by OCR software. “Each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA.”
Here’s how it works:
“Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.”
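To make the quoted flow concrete, here is a rough Python sketch of the idea. It is not reCAPTCHA’s actual code: the names (`UnknownWord`, `check_challenge`, `resolved_text`) and the vote thresholds are my own assumptions for illustration. The user passes only if they get the known word right, and their reading of the unknown word is then counted as one vote.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UnknownWord:
    """A scanned word OCR could not read, plus the human readings so far."""
    image_id: str
    votes: Counter = field(default_factory=Counter)


def normalize(text: str) -> str:
    # Compare answers case-insensitively and ignore stray whitespace.
    return text.strip().lower()


def check_challenge(control_answer: str, user_control: str,
                    unknown: UnknownWord, user_unknown: str) -> bool:
    """Pass the user only if they solved the word whose answer is known.

    On a pass, their reading of the unknown word is recorded as one vote.
    """
    if normalize(user_control) != normalize(control_answer):
        return False
    unknown.votes[normalize(user_unknown)] += 1
    return True


def resolved_text(unknown: UnknownWord, min_votes: int = 3,
                  min_share: float = 0.75) -> Optional[str]:
    """Return the agreed reading once enough users agree, else None.

    min_votes and min_share are made-up thresholds, not reCAPTCHA's.
    """
    total = sum(unknown.votes.values())
    if total < min_votes:
        return None
    answer, count = unknown.votes.most_common(1)[0]
    return answer if count / total >= min_share else None
```

Showing the same unknown image to several people and only accepting a reading once most of them agree is what the quote means by “higher confidence”.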
By reCAPTCHA’s estimates, about 60 million CAPTCHAs are solved by humans every day, equating to more than 150,000 hours of work each day worldwide. That’s a lot of potentially useful human effort, and reCAPTCHA puts it to work for the general good by channeling the time spent solving CAPTCHAs online into “reading” books.
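As a quick sanity check on those two figures, here is the arithmetic in Python. The only inputs are the numbers quoted above; the function name is just for illustration, and since the hours figure is stated as “more than”, the result is a lower bound.

```python
def seconds_per_captcha(captchas_per_day: int, hours_per_day: int) -> float:
    """Average time spent per CAPTCHA implied by the daily totals."""
    return hours_per_day * 3600 / captchas_per_day


# 150,000 hours spread over 60 million CAPTCHAs
print(seconds_per_captcha(60_000_000, 150_000))  # 9.0
```

So the estimate works out to roughly nine seconds per CAPTCHA.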

3 Comments
adam
I had looked at using reCAPTCHA for a web project I’m working on, but it’s hard to be comfortable outsourcing such a critical piece of the puzzle to a third party (non-profit at that!). It’d be neat if you could download a bulk ‘work piece’ of the words, like SETI@Home or the protein folding stuff, so that your web servers had a decent chunk of authentication capability should the reCAPTCHA service go offline for whatever reason.
Mar 26th, 2008
Ben Maurer
Hi Adam,
I’m one of the engineers on the reCAPTCHA team. I want to assure you that reCAPTCHA is a serious project and that we realize our users are trusting us with a critical piece of their website.
One of the first things we realized (long before the project launched) is that “this needs to be reliable”. Therefore, from the start we’ve designed a redundant, distributed system. We host our service in multiple datacenters so that the failure of a single ISP will not affect us. We have redundant monitoring that pages the team if there ever was an issue. Finally, our software is designed to handle various types of failure without human intervention.
reCAPTCHA is currently used by a number of large sites including Facebook, Ticketmaster, Bebo, and Imeem. We’ve earned the trust of these users by providing enterprise-class uptime.
Mar 26th, 2008
adam
Ben,
I knew the service was being used by a number of big names - I didn’t realize they were using ‘the standard’ service, I figured they would have done something like I suggested in my previous post.
I do think the project is pretty cool, and a very ingenious way to accomplish its goals. It’s nice to see that there is a decent engineering effort behind it as well. Is this detailed anywhere? I wasn’t able to find anything about it the last time I looked, just a description of how to use the site.
Thanks for the response!
Mar 26th, 2008