The Long Tail of Invalid Clicks and other Google click fraud concepts
Some fine weekend reading for search engineers, SEOs, and spam network operators:
A 47-page independent report on Google AdWords / AdSense click fraud, filed yesterday as part of a legal dispute between Lane’s Gifts and Google, provides a great overview of the history and current state of click fraud, invalid clicks of all types, and the four-layered filtering process that Google uses to detect them.
Google has built the following four “lines of defense” against invalid clicks: pre-filtering, online filtering, automated offline detection and manual offline detection, in that order. Google deploys different detection methods in each of these stages: the rule-based and anomaly-based approaches in the pre-filtering and the filtering stages, the combination of all the three approaches in the automated offline detection stage, and the anomaly-based approach in the offline manual inspection stage. This deployment of different methods in different stages gives Google an opportunity to detect invalid clicks using alternative techniques and thus increases their chances of detecting more invalid clicks in one of these stages, preferably proactively in the early stages.
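To make the "rule-based" idea concrete, here is a minimal sketch of one simple filter of that kind: flagging a click that repeats the same IP and ad within a short time window. The rule, the `Click` fields, the function name, and the 5-second window are all assumptions for illustration; the report does not disclose Google's actual rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    ip: str
    ad_id: str
    timestamp: float  # seconds since some reference point


def flag_repeat_clicks(clicks, window_seconds=5.0):
    """Return one bool per click, in input order: True means flagged.

    Hypothetical rule: a click is flagged when the same (ip, ad_id) pair
    was seen less than `window_seconds` earlier.
    """
    last_seen = {}
    flags = [False] * len(clicks)
    # Process in time order, but report flags in the caller's order.
    order = sorted(range(len(clicks)), key=lambda i: clicks[i].timestamp)
    for i in order:
        click = clicks[i]
        key = (click.ip, click.ad_id)
        previous = last_seen.get(key)
        if previous is not None and click.timestamp - previous < window_seconds:
            flags[i] = True
        last_seen[key] = click.timestamp
    return flags


if __name__ == "__main__":
    sample = [
        Click("10.0.0.1", "ad-1", 0.0),
        Click("10.0.0.2", "ad-1", 0.5),
        Click("10.0.0.1", "ad-1", 1.0),
        Click("10.0.0.1", "ad-1", 10.0),
    ]
    print(flag_repeat_clicks(sample))
```

A rule like this is cheap to run at click time, which is presumably why such checks would sit in the early stages, but it only catches the attack pattern it was written for.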
An interesting observation is that most click fraud can be eliminated through simple filters. Alexander Tuzhilin, author of the report, speculates that invalid clicks follow a Zipf-law distribution, with a Long Tail of less common attacks, and observes:
Despite its current reasonable performance, this situation may change significantly in the future if new attacks will shift towards the Long Tail of the Zipf distribution by becoming more sophisticated and diverse. This means that their effects will be more prominent in comparison to the current situation and that the current set of simple filters deployed by Google may not be sufficient in the future. Google engineers recognize that they should remain vigilant against new possible types of attacks and are currently working on the Next Generation filters to address this problem and to stay “ahead of the curve” in the never-ending battle of detecting new types of invalid clicks.
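The Long Tail argument can be illustrated with a little arithmetic. The sketch below assumes attack types are ranked by frequency and that the share of invalid clicks from the type at rank r is proportional to 1 / r^s (a Zipf distribution). It then computes what fraction of invalid clicks a fixed set of filters covers if those filters handle only the most common types. The function names, the number of attack types, and the exponent are hypothetical; none of these figures come from the report.

```python
def zipf_shares(num_types, exponent=1.0):
    """Share of invalid clicks per attack type, most common first."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def head_coverage(num_types, head_size, exponent=1.0):
    """Fraction of invalid clicks caught by filters for the top `head_size` types."""
    return sum(zipf_shares(num_types, exponent)[:head_size])


if __name__ == "__main__":
    # Same ten filters, increasingly diverse attacks.
    for num_types in (10, 100, 1000):
        coverage = head_coverage(num_types, head_size=10)
        print(f"{num_types} attack types: {coverage:.1%} covered")
```

Under these assumptions, the same ten filters cover a shrinking share of invalid clicks as the number of distinct attack types grows, which is the shift towards the tail that the quoted passage warns about.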
He also highlights the irreducible problem of click fraud in a PPC model:
- Click fraud and invalid clicks can be defined conceptually, but the only working definition is an operational one.
- The operational definition of invalid clicks cannot be fully disclosed to the general public, because doing so would lead to massive click fraud.
- If the operational definition is not disclosed to some degree, advertisers cannot verify or dispute why they have been charged for certain clicks.
The court settlement asks for an independent evaluation of whether Google’s efforts to combat click fraud are reasonable, which Tuzhilin believes they are. The more interesting question is whether they will continue to be sufficient as time progresses and the Long Tail of click fraud expands.
Links:
- Official Google Blog: Findings on invalid clicks
- Matt Cutts: Independent report on invalid clicks released
- The Lane’s Gifts v. Google Report (PDF)

July 29th, 2006 at 7:57 am
Matt’s group at Google is entirely devoted to fighting SEO spam. There’s really no reason why PPC click fraud is any different. The financial motive is present in both cases, the statistical models and loopholes should be similar, and so on. Even the idea of penalizing your competitor is there. The big differences are 1) advertisers who are able to actually track conversion rate have the power to simply stop advertising, and thus 2) Google has a great deal of incentive to identify and eliminate spammers.
As to the long tail, if an advertiser can accurately measure conversion (that is, how much you actually make in dollars from a PPC referral), it’s pretty easy to prune the under-performing ads from the tail. Our sites have a fairly simple system to identify AdWords ads that are not meeting our conversion targets, and we just nuke them. I know of several other companies that have much more sophisticated methods that go one step further and adjust bids on specific AdWords ads to optimize their performance. I would hope the biggest advertisers are all doing this or moving to such a model. This alone should make click fraud a tenuous business model, at best.
Indeed, Google is moving towards a more cooperative model with their natural search webmasters, with Sitemaps. I bet the AdWords groups could work with larger advertisers, or even just monitor which ads they are killing or that are underperforming, as another data point in their statistical models.
Tom