How to bypass CAPTCHAs easily using Python and other methods
Internet service providers generally face the risk of authentication-related attacks, spam, Denial-of-Service attacks, and data mining bots. The Completely Automated Public Turing test to tell Computers and Humans Apart, popularly known as CAPTCHA, is a challenge-response test created to selectively restrict access to computer systems. As a type of Human Interaction Proof, or human authentication mechanism, a CAPTCHA generates challenges to identify users. In essence, a CAPTCHA test tells computers and humans apart. These risks have driven a heightened adoption of CAPTCHAs across various online businesses and services.
The concept of CAPTCHA depends on human sensory and cognitive skills. These skills enable humans to read distorted text in an image or pick specific images out of a set. Generally, computer programs such as bots cannot interpret a CAPTCHA, because the test presents distorted images of text or numbers that most Optical Character Recognition (OCR) technologies fail to make sense of. However, with the help of Artificial Intelligence, algorithms are getting smarter and bots are now capable of cracking these tests. For instance, there are bots that can solve a text CAPTCHA through letter segmentation. That said, there aren’t many automated CAPTCHA-solving algorithms available.
This article outlines the various methods of generating and verifying CAPTCHAs, their application, and multiple ways to bypass CAPTCHAs.
Web developers deploy CAPTCHAs on websites to ensure that they are protected against bots. CAPTCHAs are generally used to prevent:
The image below represents the common method of generating and verifying CAPTCHAs:
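As a rough sketch of that generate-and-verify cycle in code: the server creates a random challenge, remembers the expected answer against a one-time token, and later compares the user's response with it. The names `generate_captcha`, `verify_captcha` and `_pending` are hypothetical and not taken from any particular CAPTCHA library, and the step that renders the text as a distorted image is left out.

```python
import hashlib
import hmac
import secrets
import string

# Hypothetical in-memory store standing in for a server-side session table.
_pending = {}


def generate_captcha(length=6):
    """Create a challenge; return (token, text to render as a distorted image)."""
    alphabet = string.ascii_uppercase + string.digits
    text = ''.join(secrets.choice(alphabet) for _ in range(length))
    token = secrets.token_hex(16)
    # Keep only a hash of the expected answer, keyed by the token.
    _pending[token] = hashlib.sha256(text.encode()).hexdigest()
    return token, text


def verify_captcha(token, answer):
    """Check an answer once; the token is discarded whether or not it matches."""
    expected = _pending.pop(token, None)
    if expected is None:
        return False
    given = hashlib.sha256(answer.strip().upper().encode()).hexdigest()
    return hmac.compare_digest(expected, given)


token, text = generate_captcha()
print(verify_captcha(token, text))  # True: correct answer for a fresh token
print(verify_captcha(token, text))  # False: the token has already been used
```

Discarding the token after the first attempt is one possible design choice; it keeps a solved challenge from being replayed.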
Google reCAPTCHA is a free service offered to prevent spam and abuse of websites. It uses advanced risk analysis techniques and allows only valid users to proceed.
Browser extensions such as Buster help solve CAPTCHA verification challenges. Buster, for instance, uses speech recognition software to bypass reCAPTCHA audio challenges. reCAPTCHA allows users to download the audio file of a challenge. Once the file is downloaded, Google’s own Speech Recognition API can be used to solve the audio challenge.
Online CAPTCHA solving services offer human-based solving: they hire actual human beings to solve CAPTCHAs.
The jQuery Real Person CAPTCHA plugin is meant to prevent automated form submissions by bots. It renders a text-based CAPTCHA in a dotted font, which is intended to stop fake form submissions.
The following steps can be used to solve real person CAPTCHAs:
In this one-time process:
After successfully completing process A, set up a process to:
Example:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

# Maps each letter's dot pattern (its columns, read top to bottom) to the letter
dataset = {' * * * * * ******* ': 'J',
'******* * * * * * *': 'L',
'******** * ** * ** * ** * ** * * ** ** ': 'B',
'* * * **** * * * ': 'Y',
'* * * ******** * * ': 'T',
' ***** * ** ** ** ** * * * ': 'C',
'******** * ** * ** * ** ** ** *': 'E',
'******** ** ** ** ** * ***** ': 'D',
'* ** ** ********* ** ** *': 'I',
' ***** * ** ** ** ** * ***** ': 'O',
'******* * * * * * *******': 'M',
'******* * * * * * *******': 'N',
'******** * * * * * * * * ': 'F',
' ** * * * ** * ** * ** * ** * * * ** ': 'S',
' ***** * ** ** ** * ** * **** *': 'Q',
'******* * * * * * * * * * * *': 'K',
' ** ** ** * * * ** * ** **': 'A',
'****** * * * * ******* ': 'U',
'******* * * * * * *******': 'H',
'** ** ** * ** ** ** ': 'V',
'* ** *** * ** * ** * *** ** *': 'Z',
'******** * * * * * * * * * ** ': 'P',
'* * * * * * * * * * * * *': 'X',
' ***** * ** ** ** * ** * * * ** ': 'G',
'******** * * * * * * ** * * * ** *': 'R',
'******* * * * * * *******': 'W'}

def group_captcha_string(word_pos):
    # Walk the CAPTCHA column by column; word_pos holds one {index: char} dict per text row
    captcha_string = ''
    for i in range(len(word_pos[0])):
        temp_list = []
        temp_string = ''
        for j in range(len(word_pos)):
            val = word_pos[j][i]
            temp_string += val
            if val.strip():
                temp_list.append(val)
        if temp_list:
            # the column contains at least one dot, so it belongs to a letter
            captcha_string += temp_string
        else:
            # an empty column is recorded as the marker 'sp'
            captcha_string += 'sp'
    # two empty columns in a row separate one letter from the next
    return captcha_string.split("spsp")

# create client
client = webdriver.Chrome()
client.get("http://keith-wood.name/realPerson.html")
time.sleep(3)
# indexing text
_get = lambda _in: {index: val for index, val in enumerate(_in)}
# get text from html tag (Selenium 4 removed find_element_by_css_selector)
captcha = client.find_element(By.CSS_SELECTOR, 'form [class="realperson-text"]').text.split('\n')
word_pos = list(map(_get, captcha))
# group text
text = group_captcha_string(word_pos)
# look up each grouped pattern in the dataset
captcha_text = ''.join(list(map(lambda x: dataset[x] if x else '', text)))
print("captcha:", captcha_text)
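The column-grouping idea can also be tried without a browser. The sketch below is a simplified variant rather than the script above: it treats any single blank column as a letter boundary and uses a made-up three-row font (`FONT`) in place of the plugin's real dot patterns. The names `split_glyphs`, `decode`, `FONT` and `SAMPLE_ROWS` are hypothetical.

```python
def split_glyphs(rows):
    """Split equal-height text rows into glyph patterns at blank columns."""
    width = max(len(r) for r in rows)
    rows = [r.ljust(width) for r in rows]  # pad rows that lost trailing spaces
    glyphs, current = [], []
    for col in range(width):
        column = ''.join(r[col] for r in rows)  # read this column top to bottom
        if column.strip():
            current.append(column)
        elif current:  # a blank column closes the glyph collected so far
            glyphs.append('|'.join(current))
            current = []
    if current:
        glyphs.append('|'.join(current))
    return glyphs


# Made-up 3-row font: each key is a glyph's columns joined with '|'.
FONT = {
    '***|  *': 'L',
    '***': 'I',
    '*  |***|*  ': 'T',
}


def decode(rows):
    """Look up every glyph pattern; unknown patterns become '?'."""
    return ''.join(FONT.get(g, '?') for g in split_glyphs(rows))


# The word "LIT" in the toy font, with trailing spaces stripped as a browser might do.
SAMPLE_ROWS = [
    '*  * ***',
    '*  *  *',
    '** *  *',
]
print(decode(SAMPLE_ROWS))  # LIT
```

Padding the rows to a common width avoids an index error when trailing whitespace is missing from some rows of the scraped text.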
Text-based (text-in-image) CAPTCHAs are the most commonly deployed kind; they use distorted text rendered in an image. There are two types of text-based CAPTCHAs:
Simple CAPTCHAs can be bypassed using Optical Character Recognition (OCR), a technology that recognizes the text inside images, such as scanned documents and photographs, and converts it into machine-readable text data.
Example:
import argparse
from subprocess import check_output

import pytesseract

try:
    import Image
except ImportError:
    from PIL import Image

def resolve(path):
    # Resample the image in place with ImageMagick's convert tool, then run OCR on it
    print("Resampling the Image")
    check_output(['convert', path, '-resample', '600', path])
    return pytesseract.image_to_string(Image.open(path))

if __name__ == "__main__":
    argparser = argparse.ArgumentParser()
    argparser.add_argument('path', help='Captcha file path')
    args = argparser.parse_args()
    path = args.path
    print('Resolving Captcha')
    captcha_text = resolve(path)
    print('Extracted Text', captcha_text)

# command to run the script:
#   python3 captcha_resolver.py cap.jpg
More complex text-in-image CAPTCHAs cannot be solved reliably with OCR technology. Instead, the following measures can be considered:
This challenge involves solving a mathematical problem, typically finding the sum of integers.
To bypass this challenge, one can:
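One possible approach, sketched below, is to read the challenge text, pull out the integers and add them up. The sketch assumes the question is available as plain text in a form such as "What is 12 + 7?"; that format and the function name `solve_sum` are illustrative assumptions, and only addition of non-negative integers is handled.

```python
import re


def solve_sum(challenge):
    """Return the sum of all integers found in a challenge such as '12 + 7 = ?'."""
    numbers = [int(n) for n in re.findall(r'\d+', challenge)]
    if len(numbers) < 2:
        raise ValueError(f"no addition found in {challenge!r}")
    return sum(numbers)


print(solve_sum("What is 12 + 7?"))  # 19
```

The returned value would then be submitted in the CAPTCHA's answer field along with the rest of the form.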
In distributed denial-of-service (DDoS) attacks, cyber criminals target network resources and render them inaccessible to users. These attacks temporarily or indefinitely slow down the target resource by flooding it with incoming traffic from several hosts. To prevent such attacks, businesses use CAPTCHAs.
The following methods or programs can be used to bypass DDoS protected sites: