Blocking spam domain referrals with .htaccess

After a few weeks of relative quiet on the web spam front over the holidays, I see that there’s been a huge uptick in new spam referrers turning up in the server logs here. I suppose the spam operators have come back from the holidays as well.

Referrer spam is fairly pointless on my site, since I don’t publish automatic lists of site referrers. However, it can chew up a lot of bandwidth and CPU cycles.

The first line of defense here is using .htaccess to completely block the domains and IP addresses that are known to be problematic. Additional filtering is done by individual applications such as WordPress (Akismet et al.), but if you block unwanted traffic with .htaccess, the server never even generates the web page. This is of particular interest to Dreamhost users, since Dreamhost seems to have started tracking CPU use more closely.

The method shown here will work on Dreamhost and most other hosted services that allow user-defined .htaccess files. Here’s the general approach. In .htaccess, we define the “bad referrer” patterns, which are matched against the incoming HTTP Referer header. When a pattern matches, the BadReferrer environment variable is set, and we then block any request that has it set, sending back an HTTP 403 response.

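# Flag requests whose Referer header matches a known spam domain
SetEnvIfNoCase Referer ".*\.baddomain\.com" BadReferrer
SetEnvIfNoCase Referer ".*anotherbaddomain\.com" BadReferrer

# Allow by default, but refuse (403) any request flagged above
order deny,allow
deny from env=BadReferrer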
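
The order and deny directives above are the Apache 2.2 access control syntax. Apache 2.4 still accepts them through mod_access_compat, but its own authorization directives can express the same rule. Here is a minimal sketch, assuming the server runs Apache 2.4 and the host permits AuthConfig overrides in .htaccess, neither of which is confirmed here for Dreamhost:

```apache
# Assumes Apache 2.4 (mod_authz_core) and AllowOverride AuthConfig.
# BadReferrer is the environment variable set by the SetEnvIfNoCase lines.
<RequireAll>
    # Let everyone in by default...
    Require all granted
    # ...except requests flagged as coming from a spam referrer
    Require not env BadReferrer
</RequireAll>
```

Mixing the old and new styles in one .htaccess file can get confusing, so it's probably best to pick one or the other.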

The patterns used in .htaccess are regular expressions, which can be a little hard to read. It’s important to precede the “.” in domain names with a “\”, because an unescaped “.” matches any character. It’s a good idea to save a working copy of .htaccess before you start editing it, since you can make your entire site inaccessible if you accidentally create a pattern that blocks all referrers rather than the unwanted ones. The first case shown above will block referrals from “www.baddomain.com” (or any other subdomain of baddomain.com), while the second will block any referrer containing “anotherbaddomain.com”, with or without a subdomain. See the Apache documentation for more complete information on using .htaccess.
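
Because the patterns are unanchored, they can also match a spam domain that merely appears in a path or query string, or inside a longer hostname. As a sketch of a tighter alternative, the pattern below anchors the match to the host portion of the referrer, so it covers the bare domain and any subdomain but nothing else. The domain name is a placeholder, as before.

```apache
# Matches http://baddomain.com/ and http://www.baddomain.com/page,
# but not http://notbaddomain.com/ or http://example.org/?q=baddomain.com
SetEnvIfNoCase Referer "^https?://([^/:]+\.)?baddomain\.com(:[0-9]+)?(/|$)" BadReferrer
```

The trade-off is readability: the looser patterns are easier to scan in a long blocklist, while the anchored form is less likely to block a legitimate referrer by accident.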

Here are the .htaccess blocklist rules I’m currently using to keep out spam domain referrals. The patterns don’t exactly match real domains in all cases, as wildcards are used occasionally to catch variations of the same domain name. (Warning: many of the domain names are predictably offensive.)

If you have specific IP addresses that you want to block, you can also deny them here:

deny from 10.0.0.1
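
The deny directive also accepts partial addresses and network ranges, which can help when a spammer rotates through addresses on one network. A brief sketch, using reserved documentation addresses as placeholders rather than real offenders:

```apache
# A single address
deny from 192.0.2.15
# Every address beginning with 198.51.100 (partial address)
deny from 198.51.100
# A whole network in CIDR notation
deny from 203.0.113.0/24
```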

Blocking individual IP addresses is less useful, since most spam traffic seems to originate from networks of hijacked PCs, with IP addresses all over the world that change frequently. That said, there also seem to be individual spammers who are running spamming applications on their personal computers, which are pretty easy to block in .htaccess.

Here are my previous notes on referrer spam:

2 Responses to “Blocking spam domain referrals with .htaccess”

  1. veridicus Says:

    Be aware blocking by domain name requires reverse DNS lookup to be on. Otherwise Apache only has the IP address to work with. Based on your experience it appears Dreamhost has it turned on. That’s surprising because it eats CPU cycles and network traffic. For efficiency most big hosts turn it off.

    If a host has it turned off and you’re using PHP to run your site you can do reverse DNS lookups through PHP code and handle all cases there. Of course this also eats up CPU and network traffic, but for small enough sites it may be worth it.

  2. Hacks and Gadgets by HJL » Blog Archive » Blizzard of comment and trackback spam Says:

    […] This round of spam seems better implemented than previous ones. The incoming comments and trackbacks are distributed across the entire site, rather than on one or two posts, and are also coming in from a broad set of IP addresses and user agents, which makes it hard to use .htaccess rules for blocking it. […]
