Combating Referer Spam


the_pm
April 7th, 2006, 23:30
When I came online this afternoon, I noticed we had about 35 guests on IWDN. On top of that, when I checked who was online, I was shocked to discover hardly any of them were recognized spiders, they all had different IPs, and they were all looking at different stuff.

Sounds like we might have gotten a nice SE boost, right? So, I go to check on the last 300 visitors to see where they're all coming from. Debt consolidation sites (and a few from AssTraffic).

I did a little search on blocking referers. I know how it's done sort of informally, but I've never tried the markup myself. This was the best I could find:

RewriteEngine On
RewriteCond %{HTTP_HOST} !^example.com$ [NC]
RewriteCond %{REMOTE_ADDR} ^(.*)$ [NC]
RewriteRule ^(.*)$ http://%1 [R=301,L]

Now, I can get it so I can block individual domains, but when you're being spammed with hundreds of variations, this gets old, fast.

How do you write the REMOTE_ADDR line so that you can specify a single word with wildcards on either side, effectively blocking any domain name that contains that word? When I tried to do this, I blocked all traffic from the site for about 20 seconds :oops:

Martin
April 8th, 2006, 00:13
Some reading on the subject (http://diveintomark.org/archives/2003/02/26/how_to_block_spambots_ban_spybots_and_tell_unwanted_robots_to_go_to_hell)

May help. :)

Cameron
April 8th, 2006, 00:17
Well, that explains why I was being redirected to my IP :D

However, you would more than likely want to match REMOTE_HOST, and not REMOTE_ADDR (REMOTE_ADDR is the IP Address, while REMOTE_HOST should be the given hostname e.g. www.iwdn.net).


RewriteCond %{REMOTE_HOST} ^(.*).com$ [NC]

So something like that.

Also: Mod_rewrite Cheat Sheet (.PNG) (http://www.ilovejackdaniels.com/mod_rewrite_cheat_sheet.png) (PDF (http://www.ilovejackdaniels.com/mod_rewrite_cheat_sheet.pdf))

the_pm
April 8th, 2006, 00:50
I have a basic understanding of how it works. The one little bit of information I can't grasp is how to effectively apply the wildcard to both the front and the back of the term in question. I would have thought you could just drop a * on both ends, but that was causing problems.

I've read up on it, but something isn't clicking. Could someone give me that one line of markup, and set it so it blocks any referer with the word "debt" in it? If I can see it done properly, I'll understand how to format it.

Cameron
April 8th, 2006, 01:01
RewriteCond %{REMOTE_HOST} ^(.*(debt)*).com$ [NC]

the_pm
April 8th, 2006, 01:37
Got it! Thanks Cameron. The part I was missing was the second instance of parentheses around debt. Ok, let's see if we can't lick this referer spam issue, once and for all...

Cameron
April 8th, 2006, 02:18
No Problem.

the_pm
April 8th, 2006, 04:02
Just to see the results with my own eyes, I found a handy little script that allows me to spoof referers. I logged into the server through shell and did a series of telnet connections using various combinations of words within my Referer string. I also tested to make sure IWDN was accepting the telnet connection and specified port.

Everything works great! :)

the_pm
April 9th, 2006, 00:08
Ok, it appears this is not working :(

It defeats my own tests, but it is not stopping the bots. Here's what I have in my htaccess:
RewriteEngine On
RewriteCond %{HTTP_HOST} !^iwdn.net$ [NC]
RewriteCond %{REMOTE_HOST} ^(.*(debt)*).com$ [NC]
RewriteCond %{REMOTE_HOST} ^(.*(consolidation)*).com$ [NC]
RewriteCond %{REMOTE_HOST} ^(.*(asstraffic)*).com$ [NC]
RewriteCond %{REMOTE_HOST} ^(.*(wrongsideoftown)*).com$ [NC]
RewriteCond %{REMOTE_HOST} ^(.*(mortgage)*).com$ [NC]
RewriteCond %{REMOTE_HOST} ^(.*(loans)*).com$ [NC]
RewriteRule ^(.*)$ http://%1 [R=301,L]

(I removed a few of the more inappropriate ones :lol: )

Here's an example of one of the many bots that is bypassing the filters:
Host: 205.213.111.55

/index.php?p=4132
Http Code: 200 Date: Apr 08 16:29:40 Http Version: HTTP/1.1 Size in Bytes: 22216
Referer: http://equity-loans.mortgage-certificates.com/
Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Hotbar 3.0)

It is not being filtered for having either the word "loans" or the word "mortgage" in it. Any thoughts?
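
A likely explanation, offered as a sketch rather than a confirmed diagnosis: REMOTE_HOST is the hostname of the connecting client (or just its IP address when the server does no reverse lookup, as with 205.213.111.55 in the log above), so the spam domain in the Referer header is never compared against those patterns. Two other details work against the rules as posted. In (loans)* the * means "zero or more", so the word is not actually required. And the HTTP_HOST condition limits the whole rule to requests whose Host is not iwdn.net. A minimal version that tests the Referer instead, using a few of the words from the post above, might look like this:

```apache
RewriteEngine On
# Test the Referer header rather than the client's hostname.
# An unanchored pattern matches anywhere in the string, so no wildcards are needed.
# Conditions are ANDed by default; [OR] makes any single match sufficient.
RewriteCond %{HTTP_REFERER} debt [NC,OR]
RewriteCond %{HTTP_REFERER} consolidation [NC,OR]
RewriteCond %{HTTP_REFERER} mortgage [NC,OR]
RewriteCond %{HTTP_REFERER} loans [NC]
# Answer with 403 Forbidden instead of redirecting.
RewriteRule .* - [F]
```

The four conditions could also be collapsed into one with alternation, e.g. (debt|consolidation|mortgage|loans). Either way, any referer containing one of these words is refused, including legitimate ones such as a search results page for "loans", so the word list is worth keeping narrow.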

Pauly
April 10th, 2006, 16:21
I'm no expert in .htaccess but I have a moderate understanding of it. I've always seen this done differently though (Not ever done it myself so I don't know the differences or if it works).

Wildcard Versions:

RewriteCond %{HTTP_REFERER} (mywildcardword) [NC,OR]
RewriteCond %{HTTP_REFERER} (my?cardword) [NC]
RewriteRule .* - [F]

Specific:

RewriteCond %{HTTP_REFERER} (myurl\.com) [NC]
RewriteRule .* - [F]

I also believe when you have multiple ones, it should be [NC,OR] and only the last before the rewrite rule should be [NC]. I also found this interesting snippet, so if you accidentally block a real user rather than a bot, they can have a brief description why, and a link (Directions how) to continue.

RewriteRule .* bad_referrer.php [L]

Instead of the rewrite rule in the above examples. Again I'm not sure if it'll work that well or if it's even correct :( Great help huh? :lol:
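
For what it's worth, here is how the explanation-page variant could be put together. This is only a sketch: bad_referrer.php is the file name from the snippet above and is assumed to sit in the same directory as the .htaccess file, and the two blocked words are placeholders.

```apache
RewriteEngine On
# Guard: leave the explanation page itself alone so it is never rewritten onto itself.
RewriteCond %{REQUEST_URI} !bad_referrer\.php$
# Placeholder word list; conditions without [OR] are ANDed with the guard above.
RewriteCond %{HTTP_REFERER} (debt|mortgage) [NC]
# Internal rewrite: the visitor gets the explanation page instead of a bare 403.
RewriteRule .* bad_referrer.php [L]
```

The trade-off is that the spam bots are served a full page (status 200) instead of a short 403, so this costs a little more bandwidth than the [F] version.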

Pauly
April 10th, 2006, 17:06
The wildcard part where it says ? might actually be -? I'm not entirely sure.
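
On the ? question: in a regular expression, ? makes the item immediately before it optional, so (my?cardword) would match "mycardword" and "mcardword", which is probably not what was intended. Writing -? makes a hyphen optional, and a single . stands for any one character. A small sketch with made-up patterns to show the difference:

```apache
RewriteEngine On
# "-?" makes the hyphen optional: matches debtconsolidation and debt-consolidation.
RewriteCond %{HTTP_REFERER} debt-?consolidation [NC,OR]
# "." matches any single character: equity-loans, equity_loans, equity.loans.
RewriteCond %{HTTP_REFERER} equity.loans [NC]
RewriteRule .* - [F]
```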