Reverse engineering a referer spam campaign

It looks like someone’s launched a new referrer spam campaign today, there’s a huge uptick in traffic here. The incoming requests are from all over the internet, presumably from a botnet of hijacked PCs, but it looks like all of the links point to a class C network at 85.255.114 somewhere in the Ukraine.
It’s interesting to think a little about link spam campaigns and what opportunity the operators hope to exploit. Two major types of link spam on blogs are comment spam and referrer spam. My perception is that comment spam is more common. Most blogs now wrap outgoing links in reader comments with “rel=nofollow” to prevent comments links from increasing Google rank for the linked items, but the links are still there for people to click on.
Referrer spam is more indirect. It is created by making an HTTP request with the REFERER header set to the URL being promoted. Most of the time, this will only be visible in the web server log.
Here is a typical HTTP log entry:
87.219.8.210 [04/Feb/2006:15:20:35 -0800]
GET /weblog/archives/2005/09/15/google-blog-search-referrers-working-now HTTP/1.1
403 - \"http://every-search.com\"
Some blogs and other web sites post an automatically generated list of “recent referrers” on their home page or on a sidebar. In normal use, this would show a list of the sites that had linked to the site being viewed. Recent referrer lists are less common now, because of the rise of referrer spam.
Referrer spam will also show up in web site statistic and traffic summaries. These are usually private, but are sometimes left open to the public and to search engines.
One presumed objective of a link spam campaign is to increase the target site’s search engine ranking. In general this requires building a collection of valid inbound links, preferably without the “nofollow” attribute. Referrer spam may be more effective for generating inbound links, since recent referrer lists and web site reports typically don’t wrap their links with nofollow.
The landing pages for the links in this campaign are interesting in that they don’t contain advertising at all. This suggests that this campaign is trying to build a sort of PageRank farm to promote something else.
The actual pages are all built on the same blog template, and contain a combination of gibberish and sidebar links to subdomains based on “valuable” keywords. Using the blog format automatically provides a lot of site interlinking, and they also have “recent” and “top referer” lists, which are all from other spam sites in the network.
It looks like the content text should be easy to identify as spam based on frequency analysis. Perhaps having a very large cloud of spam sites linking to each other along with a dispersed set of incoming referrer spam links makes the sites look more plausible to a search engine? These sites don’t appear to have any, but I have come across other spam sites and comment spam posts that have links to non-spam sites such as .gov and .edu sites, perhaps trying to look more credible to a search engine ranking algorithm. All the sites being on the same subnet makes them easier to spot, though.
Given that there aren’t that many public web site stat pages and recent referrer lists around, I’m surprised that referrer spamming is worth the effort. If the spam network can achieved good ranking in the Google and the other search engines, they can probably boost the ranking for a selected target site by pruning back some of their initial links and adding some links pointing at the sites that they want to promote. Affiliate links to porn, gambling, or online pharmacy sites must pay reasonably well for this to work out for the spammers.
More reading: A list of references on PageRank and link spam detection.
If you’re having referrer spam problems on your site, you may find my notes on blocking referer spam useful.
Here’s some sample text from “search-buy.com”:
I search-buy over least and and next train. Ne so at cruelty the search-buy in after anaesthesia difficulty general urinating. T pastry a ben for search-buy boy. An refuses trip search-buy romances seemed azusa pacific university ca. Stoc of my is and search-buy direct having sex teen titans. Kid philadelphiaa would and york search-buy. G search-buy wore shed i dads. obstacles future search-buy right had satire nineteenth. The that i ups this on search-buy least finds audio express richmond. have this window been wonderful me search-buy so. Surel in actually search-buy our boy deep franklin notions. An search-buy it of my has of. To at head boy that a search-buy. O james search-buy everywhere of but. Alread originate search-buy good about since.
Here are a few spam sites from this campaign and their IP addresses:
bikini-now.com A 85.255.114.212 babestrips.com A 85.255.114.229 search-biz.biz A 85.255.114.245 bustytart.com A 85.255.114.250 cjtalk.net A 85.255.114.227 search-galaxy.org A 85.255.114.252 moresearch.org A 85.255.114.237
Here is the WHOIS output for that netblock:
% Information related to '85.255.112.0 - 85.255.127.255' inetnum: 85.255.112.0 - 85.255.127.255 netname: inhoster descr: Inhoster hosting company descr: OOO Inhoster, Poltavskij Shliax 24, Kharkiv, 61000, Ukraine remarks: ----------------------------------- remarks: Abuse notifications to: abuse@inhoster.com remarks: Network problems to: noc@inhoster.com remarks: Peering requests to: peering@inhoster.com remarks: ----------------------------------- country: UA org: ORG-EST1-RIPE admin-c: AK4026-RIPE tech-c: AK4026-RIPE tech-c: FWHS1-RIPE status: ASSIGNED PI mnt-by: RIPE-NCC-HM-PI-MNT mnt-lower: RIPE-NCC-HM-PI-MNT mnt-by: RECIT-MNT mnt-routes: RECIT-MNT mnt-domains: RECIT-MNT mnt-by: DAV-MNT mnt-routes: DAV-MNT mnt-domains: DAV-MNT source: RIPE # Filtered organisation: ORG-EST1-RIPE org-name: INHOSTER org-type: NON-REGISTRY remarks: ************************************* remarks: * Abuse contacts: abuse@inhoster.com * remarks: ************************************* address: OOO Inhoster address: Poltavskij Shliax 24, Xarkov, address: 61000, Ukraine phone: +38 066 4633621 e-mail: support@inhoster.com admin-c: AK4026-RIPE tech-c: AK4026-RIPE mnt-ref: DAV-MNT mnt-by: DAV-MNT source: RIPE # Filtered person: Andrei Kislizin address: OOO Inhoster, address: ul.Antonova 5, Kiev, address: 03186, Ukraine phone: +38 044 2404332 nic-hdl: AK4026-RIPE source: RIPE # Filtered person: Fast Web Hosting Support address: 01110, Ukraine, Kiev, 20Á, Solomenskaya street. room 201. address: UA phone: +357 99 117759 e-mail: support@fwebhost.com nic-hdl: FWHS1-RIPE source: RIPE # FilteredTags: spam, search, seo, google, pagerank, spamrank, filtering



























