How to crawl a website without getting blocked or misled (cloaked)?
Written by Steve M
Updated over a week ago

What are the risks?

When a target website detects crawling activity from a proxy IP, it may automatically:

  • Mislead the crawler by serving incorrect information (cloaking)

  • Slow down the rate at which it returns data

  • Block the IP

How is crawling activity detected by the target websites?

When an IP visits the target website, the website automatically analyzes and logs the activity. Target websites can flag unusual activity such as the following:

  1. Request rate: the website counts the requests per second coming from the IP and checks whether the rate exceeds what a human could achieve in that interval.

  2. IP history: the website identifies the IP address and checks whether it has been used to crawl the website before.
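The two checks above can be illustrated with a minimal Python sketch of what a website might do on its side. This is not how any particular site works: the class `CrawlDetector`, the threshold of 5 requests per second, and the one-second sliding window are all hypothetical assumptions chosen for illustration.

```python
from collections import defaultdict, deque

# Hypothetical threshold: more requests per second than a human would plausibly make.
MAX_REQUESTS_PER_SECOND = 5
WINDOW_SECONDS = 1.0


class CrawlDetector:
    """Hypothetical per-IP detector combining a request-rate check and an IP-history check."""

    def __init__(self, max_rps=MAX_REQUESTS_PER_SECOND, window=WINDOW_SECONDS):
        self.max_rps = max_rps
        self.window = window
        self.recent = defaultdict(deque)  # ip -> timestamps of its requests inside the window
        self.flagged_before = set()       # IPs already identified as crawlers

    def is_suspicious(self, ip, now):
        """Record a request from `ip` at time `now` (in seconds); return True if it looks like a crawler."""
        # IP-history check: this IP has already been caught crawling the site.
        if ip in self.flagged_before:
            return True

        # Request-rate check: keep only timestamps that fall inside the sliding window.
        timestamps = self.recent[ip]
        timestamps.append(now)
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()

        if len(timestamps) > self.max_rps:
            self.flagged_before.add(ip)  # remember this IP for future visits
            return True
        return False
```

In this sketch an IP sending ten requests 50 ms apart is flagged on its sixth request, and it stays flagged afterwards even if it slows down. This is why both rate limiting and IP rotation matter on the crawler side.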

How to avoid the detection?

  1. Limit the number of connection requests per second from each IP. This slows down crawling, but makes the IP much less likely to be detected.

  2. Rotate IPs regularly so the target website cannot tie repeated crawling activity to a single IP.
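Both techniques can be combined in one minimal Python sketch. The hypothetical `ThrottledProxyPool` class cycles through a list of proxy IPs in round-robin order and enforces a minimum interval between requests sent through the same IP. The class name, proxy addresses and rate are illustrative assumptions. The sketch only decides which proxy to use and how long to wait. It leaves the HTTP request itself to whatever client you use.

```python
import itertools
import time


class ThrottledProxyPool:
    """Hypothetical helper: round-robin proxy rotation with a per-IP request-rate limit."""

    def __init__(self, proxies, max_requests_per_second=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)       # rotate IPs in a fixed order
        self._min_interval = 1.0 / max_requests_per_second
        self._last_used = {}                         # proxy -> time of its last request
        self._clock = clock                          # injectable for testing
        self._sleep = sleep

    def acquire(self):
        """Return the next proxy, waiting first if that proxy was used too recently."""
        proxy = next(self._cycle)
        last = self._last_used.get(proxy)
        if last is not None:
            wait = self._min_interval - (self._clock() - last)
            if wait > 0:
                self._sleep(wait)  # throttle: never exceed the per-IP rate
        self._last_used[proxy] = self._clock()
        return proxy
```

For example, with the `requests` library you could call `proxy = pool.acquire()` before each fetch and pass `proxies={"http": proxy, "https": proxy}` to `requests.get`. Because the rate limit is tracked per proxy, adding more proxies raises total throughput without raising the request rate any single IP shows to the target website.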

