Cracking weak CAPTCHA implementations
Asking users to prove that they are not a machine by recognising words formed from distorted characters, a test known as a CAPTCHA, has grown in popularity in recent years. So too has the number of implementations that can be easily cracked by a bot intent on automating the submission of swathes of forms or spam comments.
One of these easy-to-crack implementations that I sometimes come across stores the solution to the CAPTCHA client-side in the form of a hash which, as we'll see, can be easily brute-forced.
When a CAPTCHA is served, the user is generally unaware that its image (containing the distorted words) is accompanied by some form of reference to the original words, which are stored on the server. In most popular implementations, such as reCAPTCHA, the reference is often a unique number or session ID. When the user submits their guess, the server uses this reference to locate the original words. Because the reference is unrelated to the words themselves, the words are never exposed to the user.
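To make the reference-based approach concrete, here is a minimal Python sketch. It is not how reCAPTCHA or any particular product works internally; the function names and the in-memory dictionary standing in for server-side session storage are assumptions made purely for illustration.

```python
import secrets

# Hypothetical in-memory stand-in for server-side session storage.
_challenges = {}

def issue_challenge(words):
    """Store the solution on the server and return an opaque reference."""
    token = secrets.token_urlsafe(16)  # random, unrelated to the words
    _challenges[token] = words
    return token

def check_guess(token, guess):
    """Look up the solution by its reference; each reference works only once."""
    words = _challenges.pop(token, None)
    if words is None:
        return False
    return secrets.compare_digest(words.encode("utf-8"), guess.encode("utf-8"))
```

The only value the client ever sees is the random token, so there is nothing to brute-force, and because the token is discarded after one use it cannot be replayed.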
I have come across a number of less effective implementations that, rather than using a reference unrelated to the words distorted in the CAPTCHA's image, use a hash (often MD5) of the words themselves. Once the user has submitted their guess, the server hashes it and compares the result with the precomputed hash that accompanied the CAPTCHA. If the two match, the guess is treated as correct.
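As a rough illustration of this flawed design, a hash-based check might look something like the Python sketch below. The field names `captcha_hash` and `guess` and both function names are hypothetical; real implementations vary in detail.

```python
import hashlib

def make_challenge(words):
    """Build the hidden form field that a weak CAPTCHA sends with its image."""
    digest = hashlib.md5(words.encode("utf-8")).hexdigest()
    return {"captcha_hash": digest}

def verify(form):
    """Compare a hash of the guess with the hash the client sent back.

    Nothing here is checked against server-side state, which is the flaw.
    """
    guess_hash = hashlib.md5(form["guess"].encode("utf-8")).hexdigest()
    return guess_hash == form["captcha_hash"]
```

Note that `verify` relies entirely on values supplied by the client, and that the hash of the solution is visible to anyone who inspects the form.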
Validating a CAPTCHA with a hash opens up two possible weaknesses. Firstly, the original word(s) can be derived by brute-forcing the hash, which is remarkably quick given the limited complexity of the words. Secondly, in a few cases you can simply replace the hash with your own and then supply the words that you used to create it. Both techniques are covered below.
Replacing the hash
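Replacing the CAPTCHA's hash with one taken from a previous challenge whose solution is known is often the quickest way to circumvent these poorly implemented CAPTCHAs. It works whenever the server trusts the hash submitted with the form and keeps no record of which challenge it actually issued.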
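The substitution can be sketched in a few lines of Python. Everything here is an assumption for illustration: `weak_server_check` is a stand-in for a vulnerable server-side check, and the field names `captcha_hash` and `guess` are hypothetical.

```python
import hashlib

def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def weak_server_check(form):
    """Stand-in for a vulnerable server: trusts the hash the client submits."""
    return md5_hex(form["guess"]) == form["captcha_hash"]

def forge_submission(known_words):
    """Ignore the challenge that was served and submit a matching hash/guess pair."""
    return {"captcha_hash": md5_hex(known_words), "guess": known_words}
```

A forged form built with `forge_submission("anything")` passes `weak_server_check` no matter which image the server actually displayed, because the hash and the guess are both chosen by the attacker.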
Brute-forcing a hash
Brute-forcing means hashing every combination of the characters used by the CAPTCHA until one produces the same hash as the one exposed by the CAPTCHA, and it is also effective. It can be extremely quick provided that the CAPTCHA doesn't use too many different characters: a five-character solution drawn from lowercase letters and digits, for example, has only 36^5 (about 60 million) possibilities. There are many tools and libraries available for brute-forcing different hashing algorithms, one of the most popular of which is John the Ripper.
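For short solutions a dedicated tool isn't even necessary. The following is a minimal Python sketch of the same idea using only the standard library; the function name, the default alphabet (lowercase letters and digits) and the maximum length are assumptions that would need adjusting to match the CAPTCHA being tested.

```python
import hashlib
import itertools
import string

def brute_force_md5(target_hash,
                    alphabet=string.ascii_lowercase + string.digits,
                    max_length=4):
    """Return the first string whose MD5 hash matches target_hash, or None."""
    target = target_hash.lower()
    for length in range(1, max_length + 1):
        # Try every combination of the alphabet at this length.
        for combo in itertools.product(alphabet, repeat=length):
            candidate = "".join(combo)
            if hashlib.md5(candidate.encode("utf-8")).hexdigest() == target:
                return candidate
    return None
```

The work grows exponentially with the length of the solution, so longer words or larger alphabets are where an optimised tool such as John the Ripper becomes worthwhile.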