Looking Behind the Captcha-Cracking Scenes

So how does one break a Captcha? Plain old OCR software, like what comes with your scanner, doesn’t do the job.
I couldn’t get any answers from the Russian underground community that’s doing a lot of the cracking, but there are some white hats who reveal aspects of their methods.
PWNtcha is a Captcha-cracking project that reports excellent success against some of the major ones. Its methods are closely guarded lest they slip out into the wild, but they appear to involve manually analyzing each Captcha's style (font choice, character position, deformation pattern, rotation, background, and so on) and then using image-correction tools to transform the image into normal-looking text that can be machine-read.
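PWNtcha's own tools are not public, so the following is only a minimal sketch of one generic cleanup step that such image-correction tools might include. It assumes a grayscale image stored as a list of rows of 0-255 values, thresholds it to separate dark "ink" from background, and then erases isolated noise pixels. The function names `binarize` and `remove_specks`, the threshold of 128, and the "no inked neighbour" rule are all illustrative assumptions, not PWNtcha's method.

```python
from typing import List

# Hypothetical representation: a grayscale image is a list of rows of 0-255 values,
# and a binary image is a list of rows of 0/1 values (1 = ink).
Gray = List[List[int]]
Binary = List[List[int]]


def binarize(img: Gray, threshold: int = 128) -> Binary:
    """Map dark pixels (assumed to be text) to 1 and light pixels to 0."""
    return [[1 if px < threshold else 0 for px in row] for row in img]


def remove_specks(bits: Binary) -> Binary:
    """Clear ink pixels that have no ink among their 8 neighbours (isolated noise)."""
    h = len(bits)
    w = len(bits[0]) if h else 0
    out = [row[:] for row in bits]
    for y in range(h):
        for x in range(w):
            if not bits[y][x]:
                continue
            has_neighbour = any(
                bits[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                out[y][x] = 0
    return out
```

A real Captcha would also need the style-specific steps mentioned above (undoing rotation and deformation), which depend on hand analysis of each scheme and are not shown.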
Jeff Yan and Ahmad Salah El Ahmad have published a PDF describing a segmentation-then-recognition approach. In many cases, evidently, a mere pixel count suffices to guess which letter a segment is supposed to be: an I has far fewer pixels than an M.
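A toy version of segmentation followed by pixel-count recognition might look like the sketch below. It assumes an already-cleaned binary image (1 = ink), splits characters at columns that contain no ink (a simple vertical-projection split chosen here for illustration; Yan and El Ahmad's actual segmentation may differ), and labels each piece with the letter whose reference count is nearest. The lookup `table` of reference pixel counts is hypothetical; in practice it would have to be measured for a specific Captcha scheme.

```python
from typing import Dict, List, Tuple

Binary = List[List[int]]  # rows of 0/1 values, 1 = ink


def segment_columns(bits: Binary) -> List[Tuple[int, int]]:
    """Split the image at ink-free columns; return (start, end) column spans."""
    width = len(bits[0]) if bits else 0
    ink = [any(row[x] for row in bits) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start = None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x  # a character begins
        elif not has_ink and start is not None:
            spans.append((start, x))  # a character ends at a blank column
            start = None
    if start is not None:
        spans.append((start, width))  # character touching the right edge
    return spans


def pixel_count(bits: Binary, span: Tuple[int, int]) -> int:
    """Count ink pixels inside one column span."""
    lo, hi = span
    return sum(sum(row[lo:hi]) for row in bits)


def guess_letters(bits: Binary, table: Dict[int, str]) -> str:
    """Label each segment with the letter whose (hypothetical) reference count is nearest."""
    letters = []
    for span in segment_columns(bits):
        n = pixel_count(bits, span)
        nearest = min(table, key=lambda ref: abs(ref - n))
        letters.append(table[nearest])
    return "".join(letters)
```

For example, a one-pixel-wide vertical bar next to a wider M-shaped blob would come back as "IM" given a table like `{5: "I", 13: "M"}`. Characters that touch or overlap defeat this naive column split.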
aiCaptcha dates from 2005.
Here is a discussion of the Shape Contexts approach to cracking EZ-Gimpy, done by Mori and Malik back in 2002.
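Mori and Malik's work builds on the shape context descriptor introduced by Belongie, Malik, and Puzicha. The sketch below computes only that core descriptor for a set of sampled edge points: for each point, a log-polar histogram of where the other points lie, with distances normalized by the mean pairwise distance, plus a chi-squared cost for comparing two histograms. The bin counts and radius limits are assumed defaults for illustration, and the full EZ-Gimpy attack involves further point-matching and word-hypothesis steps that are not shown here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def shape_contexts(
    points: Sequence[Point],
    r_bins: int = 5,         # assumed number of log-radius bins
    theta_bins: int = 12,    # assumed number of angle bins
    r_inner: float = 0.125,  # assumed inner radius, in units of mean distance
    r_outer: float = 2.0,    # assumed outer radius, in units of mean distance
) -> List[List[float]]:
    """Return one normalized log-polar histogram per point (needs >= 2 distinct points)."""
    n = len(points)
    # Dividing by the mean pairwise distance makes the descriptor scale-invariant.
    pair_dists = [math.dist(points[i], points[j])
                  for i in range(n) for j in range(n) if i != j]
    mean_d = sum(pair_dists) / len(pair_dists)
    log_lo, log_hi = math.log(r_inner), math.log(r_outer)
    descriptors = []
    for i, (xi, yi) in enumerate(points):
        hist = [0.0] * (r_bins * theta_bins)
        for j, (xj, yj) in enumerate(points):
            r = math.hypot(xj - xi, yj - yi) / mean_d
            if i == j or r == 0.0:
                continue  # skip the point itself and exact duplicates
            log_r = math.log(r)
            if not log_lo <= log_r < log_hi:
                continue  # outside the histogram's radial range
            r_idx = min(int((log_r - log_lo) / (log_hi - log_lo) * r_bins), r_bins - 1)
            theta = math.atan2(yj - yi, xj - xi) % (2 * math.pi)
            t_idx = min(int(theta / (2 * math.pi) * theta_bins), theta_bins - 1)
            hist[r_idx * theta_bins + t_idx] += 1.0
        total = sum(hist)
        descriptors.append([h / total for h in hist] if total else hist)
    return descriptors


def chi2_cost(g: Sequence[float], h: Sequence[float]) -> float:
    """Chi-squared distance between two normalized histograms (0 means identical)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(g, h) if a + b > 0)
```

Because positions are taken relative to each point and scaled by the mean distance, the same shape shifted or uniformly enlarged yields (up to rounding) the same descriptors, which is what lets a distorted letter be compared against a clean template.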
I’d be very curious to learn details of the techniques used against the much harder Google and Microsoft tests.