Webmonkey is a property of Wired Digital.
Looking Behind the Captcha-Cracking Scenes

Gimpy
So how does one break a Captcha? Plain old OCR software, like what comes with your scanner, doesn’t do the job.

I couldn’t get any answers from the Russian underground community that’s doing a lot of the cracking, but there are some white hats who reveal aspects of their methods.

PWNtcha is a Captcha-cracking project that reports excellent success against some of the major ones. Its methods are guarded lest they slip out into the wild, but it seems to involve manually analyzing a Captcha’s style, its font choice, character position, deformation pattern, rotation, background, etc., and then using image-correction tools to transform the image into normal-looking text, which can be successfully machine-read.
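PWNtcha's actual methods are not published, so the following is only a generic sketch of what an image-correction step might look like, not its implementation. It assumes the Captcha has already been loaded as a grayscale grid (rows of 0-255 values); the function names `binarize` and `despeckle` and the threshold of 128 are illustrative choices.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 for ink, 0 for background.

    Assumes dark text on a light background; the threshold is a guess that
    would have to be tuned per Captcha style.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def despeckle(bitmap):
    """Remove ink pixels that have no ink among their 8 neighbours (isolated noise dots)."""
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    cleaned = [row[:] for row in bitmap]  # copy so we read from the original
    for y in range(height):
        for x in range(width):
            if not bitmap[y][x]:
                continue
            neighbours = sum(
                bitmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                cleaned[y][x] = 0
    return cleaned
```

Real Captchas add rotation, warping and busy backgrounds, so a step like this would be only the first of several corrections before the text becomes machine-readable.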

Jeff Yan and Ahmad Salah El Ahmad have published a PDF describing a segmentation-then-recognition approach. In many cases, evidently, a mere pixel count suffices to guess what a letter’s supposed to be: an I has fewer pixels than an M.
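To make the pixel-count idea concrete, here is a minimal sketch, not Yan and El Ahmad's actual code. It assumes the image is already a binary grid (1 for ink, 0 for background), that letters are separated by at least one blank column, and that a table of per-letter pixel counts (`reference_counts`, a hypothetical name) has been measured beforehand for the Captcha's font.

```python
def segment_columns(bitmap):
    """Split a binary image into (start, end) column ranges separated by blank columns."""
    width = len(bitmap[0]) if bitmap else 0
    segments, start = [], None
    for x in range(width):
        has_ink = any(row[x] for row in bitmap)
        if has_ink and start is None:
            start = x                      # a new character begins
        elif not has_ink and start is not None:
            segments.append((start, x))    # blank column ends the character
            start = None
    if start is not None:                  # character touching the right edge
        segments.append((start, width))
    return segments


def pixel_count(bitmap, start, end):
    """Count ink pixels in columns start..end-1."""
    return sum(sum(row[start:end]) for row in bitmap)


def guess_by_pixel_count(bitmap, reference_counts):
    """Guess each segmented character as the letter whose known count is closest."""
    guesses = []
    for start, end in segment_columns(bitmap):
        count = pixel_count(bitmap, start, end)
        best = min(reference_counts, key=lambda ch: abs(reference_counts[ch] - count))
        guesses.append(best)
    return "".join(guesses)
```

Letters with similar pixel counts would collide under this scheme, and touching or overlapping letters defeat the blank-column split, which is presumably why harder Captchas crowd their characters together.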

aiCaptcha dates from 2005.

Here is a discussion of the Shape Contexts approach to cracking EZ-Gimpy, done by Mori and Malik back in 2002.

I’d be very curious to learn details of the techniques used against the much harder Google and Microsoft tests.
