Spider Trap - Detects and blocks bad bots [message #13]
Wed, 17 August 2005 08:37
Registered: May 2005
The following addition to your site blocks any robots (AKA spiders) that ignore the instructions in your robots.txt file, which well-behaved robots follow by convention. Offenders also get 5000 dud email addresses for their trouble (for the spam email harvesters out there).
Good robots will read your robots.txt file, and will do as you ask and ignore the trap. When a bad robot follows your hidden 1 pixel link, it lands in the trap.
The trap spawns random email addresses on the page requested, and updates your .htaccess file with a block on the source IP address of the spider. The IP addresses are blocked by being added automatically to the top of your .htaccess file, and can be deleted any time you wish if in error. You are emailed the IP and user agent for each IP that is blocked by the trap.
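As a rough illustration of deleting a block that was added in error, here is a minimal sketch of a helper that strips the rule for one IP address out of the .htaccess text. The function name remove_trap_rule and the example address are my own assumptions, not part of the trap itself, and it assumes the rule format the trap writes (SetEnvIf Remote_Addr ^...$ getout).

```php
<?php
// Hypothetical helper, not part of the original trap script.
// Removes the block rule the trap wrote for one IP address and
// returns the rest of the .htaccess text untouched.
function remove_trap_rule($htaccess, $ip)
{
    // The trap escapes each dot, so the rule looks like ^203\.0\.113\.7$
    $needle = 'SetEnvIf Remote_Addr ^' . str_replace('.', '\.', $ip) . '$ getout';
    $kept = array();
    foreach (preg_split('/\r\n|\n|\r/', $htaccess) as $line) {
        // Keep every line that does not start with the rule for this IP
        if (strpos($line, $needle) !== 0) {
            $kept[] = $line;
        }
    }
    return implode("\r\n", $kept);
}

// Example using a documentation-range address (an assumption for illustration)
$before = 'SetEnvIf Remote_Addr ^203\.0\.113\.7$ getout # BadBot/1.0' . "\r\n"
        . 'SetEnvIf Request_URI "^(/403.*\.htm|/robots\.txt)$" allowsome' . "\r\n"
        . 'deny from env=getout' . "\r\n"
        . 'allow from env=allowsome' . "\r\n";
$after = remove_trap_rule($before, '203.0.113.7');
echo $after;
```

In practice you would read your .htaccess file into a string, pass it through the helper, and write the result back. Editing the file by hand works just as well.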
In robots.txt add the following. Allow a few days after this before adding the trap to avoid trapping nice spiders. If your site does not yet have a robots.txt file, simply create one with the following in it, and upload it to the root directory of your website.
Basically, the instruction below is for all robots/spiders to stay out of this file, which is what the good bots will do (google, yahoo, etc.).
User-agent: *
Disallow: /getout.php
/getout.php is the path to your own trap file. You may wish to change this to another name and put it in a directory, e.g. /welcome/index.php, or wherever you have decided to put your trap. You must ensure that the file name ends in .php, though.
Once you are confident good bots have read this file and are abiding by it (allow at least 2-3 days), make the following additions:
Add this to the very top of your .htaccess file in site root:
SetEnvIf Request_URI "^(/403.*\.htm|/robots\.txt)$" allowsome
deny from env=getout
allow from env=allowsome
* Note: the above 1st line contains a pipe "|" after "htm", and not a small letter "L".
For your trap file, in this case getout.php, the contents are:
<?php
/////* CONFIGURATION START */////
$filename = '/home/username/public_html/.htaccess'; // Change username to your hosting account username to suit the path to your .htaccess file
$emailalert = 'firstname.lastname@example.org'; // Change to your email address
$emailfrom = 'as_above'; // Change to alternative email address that you want the alert to appear from, or leave as 'as_above'
$qtyemails = 5000; // How many dud emails do you want to generate?
/////* CONFIGURATION END */////
// Do not adjust below here! //
if ($emailfrom == 'as_above') $emailfrom = $emailalert;
// Some bots send no user agent or referer, so fall back to an empty string
$useragent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : '';
$referer = isset($_SERVER["HTTP_REFERER"]) ? $_SERVER["HTTP_REFERER"] : '';
// Build the block rule for this IP (dots escaped), then append the existing .htaccess contents so the new rule sits at the top
$content = "SetEnvIf Remote_Addr ^".str_replace(".","\.",$_SERVER["REMOTE_ADDR"])."$ getout # ".$useragent."\r\n";
$handle = fopen($filename, 'r');
$content .= fread($handle,filesize($filename));
fclose($handle);
// Write the combined contents back to .htaccess
$handle = fopen($filename, 'w+');
fwrite($handle, $content);
fclose($handle);
// Email the alert (the subject line is a placeholder; word it as you like)
mail($emailalert, "Spider trap alert",
"The following ip just got banned because it accessed the spider trap.\r\n\r\n".$_SERVER["REMOTE_ADDR"]."\r\n".$useragent."\r\n".$referer,
"From: ".$emailfrom);
// start free emails for spider
$page = '';
for ( $i = 0; $i < $qtyemails; $i++ )
$page .= new_email();
$page .= "Goodbye!";
echo $page;
// Returns one random address (10 letters, '@', 6 letters, then .com.au) wrapped in a mailto link
function new_email() {
$email = '';
$letters_array = array('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r',
's', 't', 'u', 'v', 'w', 'x', 'y', 'z');
for ( $i = 0; $i < 17; $i++ )
$email .= ( $i!== 10 )? $letters_array[ mt_rand( 0, 25) ] : '@';
$email .= '.com.au';
$email = '<a href="mailto:' . $email . '">' . $email . "</a>\n";
return $email;
}
?>
* Note: You MUST configure the above file for your installation - adjust the values for $filename & $emailalert. You may also adjust the other values, though this is not required.
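To show why the trap escapes the dots in the IP address and anchors the pattern with ^ and $, here is a small sketch you can run from the command line. The function name trap_pattern and the example addresses are assumptions for illustration only, and PHP's preg_match stands in for Apache's own regex matching.

```php
<?php
// Same expression the trap uses to turn an IP address into an anchored pattern.
// The function name is hypothetical; the trap does this inline.
function trap_pattern($ip)
{
    return '^' . str_replace('.', '\.', $ip) . '$';
}

// Documentation-range address, used purely as an example
$pattern = trap_pattern('203.0.113.7');
echo $pattern, "\n"; // prints ^203\.0\.113\.7$

// preg_match is used here only so the behaviour can be checked locally
$regex = '/' . $pattern . '/';
var_dump(preg_match($regex, '203.0.113.7'));   // int(1): the trapped address
var_dump(preg_match($regex, '203.0.113.70'));  // int(0): a neighbour is not blocked
var_dump(preg_match($regex, '203x0y113z7'));   // int(0): would match if the dots were left unescaped
```

Without the anchors, a rule for 203.0.113.7 would also catch 203.0.113.70 through 203.0.113.79, so both the escaping and the anchoring matter.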
Finally, you need to add a tiny link to your site for the bots to recklessly follow when they ignore your robots.txt file. In the following example, I use a 1 pixel transparent image (an invisible dot to your users) on the website:
<a href="http://www.mydomain.com.au/getout.php"><img src="http://www.mydomain.com.au/images/pixel_trans.gif" border=0></a>
If you want mine, you can get it here. Just right-click and choose "save as": 1 Pixel transparent image
That's it! You will now be alerted when a robot follows the link that you have instructed them not to, and it will be banned from your site thereafter! Bad Bot!
[Updated on: Wed, 21 December 2005 11:31]