The Long Tail of Invalid Clicks and other Google click fraud concepts

Some fine weekend reading for search engineers, SEOs, and spam network operators:

A 47-page independent report on Google Adwords / Adsense click fraud, filed yesterday as part of a legal dispute between Lane’s Gifts and Google, provides a great overview of the history and current state of click fraud, invalid clicks of all types, and the four-layered filtering process that Google uses to detect them.

Google has built the following four “lines of defense” against invalid clicks: pre-filtering, online filtering, automated offline detection and manual offline detection, in that order. Google deploys different detection methods in each of these stages: the rule-based and anomaly-based approaches in the pre-filtering and the filtering stages, the combination of all the three approaches in the automated offline detection stage, and the anomaly-based approach in the offline manual inspection stage. This deployment of different methods in different stages gives Google an opportunity to detect invalid clicks using alternative techniques and thus increases their chances of detecting more invalid clicks in one of these stages, preferably proactively in the early stages.
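To get a feel for how layered filtering works, here is a minimal Python sketch with two stages, one rule-based and one anomaly-based. It models only the idea of passing clicks through successive filters, not the four stages in the report. The `Click` fields, the duplicate-click rule, and the per-IP traffic-share threshold are all hypothetical stand-ins, since the real rules are not public.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    ip: str           # hypothetical field: source address of the click
    ad_id: str        # hypothetical field: which ad was clicked
    timestamp: float  # seconds


def rule_based_filter(clicks, min_gap_seconds=1.0):
    """Rule-based stage (made-up rule): drop a click if the same IP clicked the
    same ad less than min_gap_seconds after its last accepted click."""
    last_accepted = {}
    valid, invalid = [], []
    for click in sorted(clicks, key=lambda c: c.timestamp):
        key = (click.ip, click.ad_id)
        previous = last_accepted.get(key)
        if previous is not None and click.timestamp - previous < min_gap_seconds:
            invalid.append(click)
        else:
            valid.append(click)
            last_accepted[key] = click.timestamp
    return valid, invalid


def anomaly_based_filter(clicks, max_share=0.5, min_clicks=4):
    """Anomaly-based stage (made-up threshold): flag all clicks from any IP
    that accounts for more than max_share of the traffic it sees."""
    total = len(clicks)
    if total < min_clicks:
        return list(clicks), []
    counts = Counter(c.ip for c in clicks)
    valid, invalid = [], []
    for click in clicks:
        if counts[click.ip] / total > max_share:
            invalid.append(click)
        else:
            valid.append(click)
    return valid, invalid


def run_pipeline(clicks, stages):
    """Run the stages in order; each stage only sees what earlier stages passed."""
    flagged = {}
    remaining = list(clicks)
    for name, stage in stages:
        remaining, invalid = stage(remaining)
        flagged[name] = invalid
    return remaining, flagged
```

Calling `run_pipeline(clicks, [("rule", rule_based_filter), ("anomaly", anomaly_based_filter)])` returns the clicks that survived plus a per-stage record of what was thrown out, so a click missed by the cheap rule can still be caught by the later stage.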

An interesting observation is that most click fraud can be eliminated through simple filters. Alexander Tuzhilin, author of the report, speculates that the less common attacks form a Zipf-law Long Tail of invalid clicks, and observes:

Despite its current reasonable performance, this situation may change significantly in the future if new attacks will shift towards the Long Tail of the Zipf distribution by becoming more sophisticated and diverse. This means that their effects will be more prominent in comparison to the current situation and that the current set of simple filters deployed by Google may not be sufficient in the future. Google engineers recognize that they should remain vigilant against new possible types of attacks and are currently working on the Next Generation filters to address this problem and to stay “ahead of the curve” in the never-ending battle of detecting new types of invalid clicks.
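To make the Long Tail argument concrete, here is a small sketch of how much of the total a handful of filters aimed at the most common attacks would cover if attack types followed a Zipf distribution. The number of attack types (1,000), the size of the head (10), and the exponents are illustrative assumptions, not figures from the report.

```python
def zipf_head_share(num_types, head, exponent=1.0):
    """Fraction of all invalid clicks produced by the `head` most common attack
    types, when the type ranked r has frequency proportional to 1 / r**exponent."""
    if not 0 <= head <= num_types:
        raise ValueError("head must be between 0 and num_types")
    weights = [1.0 / rank ** exponent for rank in range(1, num_types + 1)]
    return sum(weights[:head]) / sum(weights)


if __name__ == "__main__":
    # Smaller exponents mean a flatter distribution, i.e. a heavier tail.
    for exponent in (1.5, 1.0, 0.7):
        share = zipf_head_share(num_types=1000, head=10, exponent=exponent)
        print(f"exponent={exponent}: top 10 of 1000 attack types cover {share:.0%}")
```

The flatter the distribution, the smaller the share covered by the same ten filters, which is the scenario Tuzhilin is worried about.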

He also highlights the irreducible problem of click fraud in a PPC model:

  • Click fraud and invalid clicks can be defined conceptually, but the only working definition is an operational one.
  • The operational definition of invalid clicks cannot be fully disclosed to the general public, because that would lead to massive click fraud.
  • If the operational definition is not disclosed to some degree, advertisers cannot verify or dispute why they have been charged for certain clicks.

The court settlement asks for an independent evaluation of whether Google’s efforts to combat click fraud are reasonable, which Tuzhilin believes they are. The more interesting question is whether they will continue to be sufficient as time progresses and the Long Tail of click fraud expands.

More tea leaves from Google’s analyst day presentation

It seems that a lot of the interesting content from last week’s analyst event at Google is in the speaker notes from the PowerPoint slide deck. Greg Linden and others have already pointed out the notes about Google’s storage plans (GDrive, Lighthouse on slide 19).

This afternoon there’s another blip on CNBC about accidental communications in the slides.

The previously undisclosed notes stated that Google’s core advertising business was expected to grow by nearly 60 percent to $9.5 billion in 2006 but that profit margins in its mainstay AdSense business could be squeezed this year and beyond.

I didn’t remember seeing a revenue forecast in there, so I went back and looked to see what it actually said (slide 14).

Our ads business for the moment is healthy and growing and we’re on a strong trajectory
projected to grow from $6bn this year to $9.5bn next year based purely on trends in traffic and monetization growth

But strong competitors are attempting to aggregate traffic
AdSense margins will be squeezed in 2006 and beyond
Y! and MSN will do un-economic things to grow share
The ad network will be commoditized over time
So, we need to build a more complete ads system that is characterized by two words: wider and deeper. That is, cast the net wider to attract new customer types and deeper to enhance our relationship with existing customers.
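A quick check of the news report against the slide: growing from $6bn to $9.5bn works out to a bit over 58 percent, which lines up with the “nearly 60 percent” figure. The helper name below is just for illustration.

```python
def growth_rate(start, end):
    """Fractional growth from start to end."""
    if start == 0:
        raise ValueError("start must be non-zero")
    return (end - start) / start


# Figures in $bn, taken from the slide notes quoted above.
rate = growth_rate(6.0, 9.5)
print(f"{rate:.1%}")  # prints 58.3%
```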

Reuters says these particular notes were supposedly left in accidentally from internal planning discussions in late 2005.

“These notes were not created for financial planning purposes, and should not be regarded as financial guidance. Consistent with past practice, Google is not providing revenue guidance,” Google said in the filing.

I liked “Y! and MSN will do un-economic things to grow share”.

Don’t think we’ll be getting PowerPoint files from Google investor relations next time around. There’s a PDF file up now.

Update 03-08-2006 21:34 PDT: Paul Kedrosky has posted a copy of the original PPT slides.

Follow the Money – Microsoft Windows Live, Google, and Web 2.0


Some thoughts following the Microsoft splash this week:

The big PR launch for Windows Live last Tuesday announced a set of web services initiatives. It probably drives a lot of Microsoft people crazy to have the technology and business resources that they do, and to have so little mindshare in the “web 2.0” conversations that are going on. I haven’t read through or digested all the traffic in my feed reader, but it looks like a lot of people are unimpressed by the Microsoft pitch. Been there, done that. Which is true, as far as I can see. The more interesting question is whether this starts to change the flow of money and opportunities around developing for and with Microsoft products and technologies.

If I do a quick round of free association, I get something like this:

Microsoft:

  • corporate desktop
  • security update
  • vista delayed
  • who’s departed this week

Microsoft is a huge, wildly profitable company. It initially got there by being “good enough” to make a new class of applications and solution developers successful in addressing and building new markets using personal computers, doing things that previously required a minicomputer and an IT staff. Startup companies and individual developers that worked with Microsoft products made a lot of money, doing things that they couldn’t do before. All you needed was a PC and some relatively inexpensive development tools, and you could be off selling applications and utilities, or full business solutions built on packages like dBase or FoxPro.

Microsoft made a lot of money, but the software and solutions developers and other business partners and resellers also made a lot of money, and the customers got a new or cheaper capability than what they had before. Along the way, a huge and previously non-existent consumer market for IT equipment and services also emerged. Meanwhile, the market for expensive, low end minicomputers and applications disappeared (Wang, Data General, DEC Rainbow, HP 98xx) or moved on to engineering workstations (Sun, SGI, HP, DEC/MIPS) where they could still make money.

The current crop of lightweight web services and “web 2.0” sites feels a little like the early days of PC software. In addition to recognizable software companies, individual developers would build yet-another-text editor or game and upload it to USENET or a BBS somewhere, finding an audience of tens or hundreds of people, occasionally breaking out into mass awareness. Bits and pieces are still around, like ZIP compression, but most of it has disappeared or been absorbed and consolidated into other software somewhere. I have a CD snapshot of the old SIMTEL archive from years ago that’s full of freeware and shareware applications that all had a modest following somewhere or another. Very few people made any money that way. In the days before the internet, distribution of software was expensive, and payment meant writing and mailing a check, directly from the end user to the developer.

Google has become a huge, wildly profitable company so far by building a better search engine to draw in a large base of users, and using its platform to do a better job of matching relevant advertising to the content it’s indexing. Now, a small application can quickly find an audience by generating buzz on the blogging circuit, or through search engines, and receive two important kinds of feedback:

  • Usage data – what are the users doing and how is the application behaving
  • Economic data (money) – which advertising sponsors and affiliates provide the best return

Google’s Adsense and other affiliate sales programs are effectively a form of micropayments, providing incentives and funding for new content and applications, with no investment in direct sales or payment processing by the developers, and no commitment from the individual end user.

It’s simply a lot easier for a small consumer targeted startup to come up with a near term path to profitability based on maximizing the number of possible clients (=cross platform, browser based), being able to scale out easily by adding more boxes (not hassling with tracking and paying for additional licenses), and with a short path to revenue (i.e. Adsense, affiliate sales). A developer who might have coded a shareware app in the 80’s can now build a comparable web site or service and find an audience, and actually make a little (or a lot of) money. Google makes a lot of money from paid search ($675MM from Adsense partner sites in 3Q05), but now some of that money is flowing to teams building interesting web applications and content.

In contrast, in the corporate environment (where it’s effectively all Microsoft desktops now), things are different. Most organizations won’t let individuals or departments randomly throw new applications onto the network and see what happens. This is a space that usually requires deep domain expertise, and/or C-level friends, in order to get close enough to the problems to do something about them. But the desktops all have browsers, and the IT managers don’t want to pay for any more Windows or Oracle licenses than they are forced to, so there’s some economic pressure to move away from Windows. But there’s also huge infrastructure pain, if your company is built on Exchange. There’s less impetus here for new features; the issue is to keep it secure, keep it running, and make it cost less. Network management, security, and application management are all doing OK in the enterprise, along with line-of-business systems, but these are really solutions and consulting businesses in the end. The fastest way to get “web 2.0” into these environments is for Microsoft to build these capabilities into their products, preferably in as boring but useful a way as possible. Not a friendly place for trying out a whizzy new idea, and generally a hard place for a lightweight software project to crack.

On another front, Microsoft also has most of the consumer desktop market, but by default rather than by corporate policy. Mass market consumers are likely to use whatever came with their computer, which is usually Windows. They’re also much more likely to actually click on the advertisements. Jeremy Zawodny posted some data from his site showing that most of his search traffic comes from Google, but the highest conversion rates come from MSN and AOL. MSN users also turn out to be the most valuable on an individual basis, in terms of the effective CPM of those referrals on his site.
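For reference, effective CPM is just ad earnings per thousand page views, which is why a small referrer can beat a big one on a per-visitor basis. Here is a minimal sketch. The source names and numbers are hypothetical and are not taken from Jeremy’s data.

```python
def effective_cpm(earnings, impressions):
    """Ad earnings per thousand impressions (page views)."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return earnings * 1000 / impressions


# Hypothetical referral data: source -> (page views, ad earnings in dollars).
referrals = {
    "big_engine": (10_000, 20.00),
    "small_engine": (1_000, 5.00),
}

ecpm_by_source = {
    source: effective_cpm(earned, views)
    for source, (views, earned) in referrals.items()
}

# The smaller source sends a tenth of the traffic but earns more per visit.
for source, ecpm in sorted(ecpm_by_source.items(), key=lambda item: -item[1]):
    print(f"{source}: ${ecpm:.2f} eCPM")
```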

So let’s see:

  • Many new application developers are following the shortest path to money, presently leading away from Microsoft and toward open source platforms, with revenue generated by integrating Google and other advertising and affiliate services.
  • Microsoft has access to corporate desktops, as well as mainstream consumer desktops, where it’s been increasingly difficult for independent software developers to make any money selling applications.
  • Microsoft is launching a lot of new services that are me-too in terms of technical capability, but which will get some uptake by default in the corporate and mass markets.
  • Microsoft’s corporate users and MSN users are likely to be later adopters, but may be more likely to be paying customers for the services offered by advertisers.
  • Microsoft could attract more new web service development if there were some technical or economic incentives to do so; at present it costs more to build a new service on Microsoft products, and there’s little alignment of financial incentives between Microsoft, prospective web application developers, and their common customers and partners.

Mike Arrington at TechCrunch has a great set of play-by-play notes from the presentation and a followup summary. He thinks the desktop gadgets and VOIP integration are exciting.

what really got me today was the Gadget extensibility and the full VOIP IM integration.

In the past, Microsoft grew and made a lot of money by helping a lot of other people make money. Today, the developers are following the money and heading elsewhere, mostly to Google. This could quickly change if Microsoft comes up with a way to steer some of their valuable customers and associated indirect revenue toward new web application developers. They are the incumbent, with huge market share and distribution reach. I don’t think they’ll ever have the “cool” factor of today’s web 2.0 startups, and I don’t think they’ll regain the levels of market share they have had in the past with Windows, Office, and Internet Explorer. But they could be getting back in the game, and if they come up with a plan to make some real money for 3rd party web developers we’ll know they’re serious.

Google Search Result Page Changes?

[Image: Google alternate search results page]

Google seems to be trying out some alternate layouts for the search results pages. This morning, I got one page with just a small Google logo next to the text box, which keeps more results on the screen, and a couple of pages with a larger box of text ads at the top, which was bad, because it pushed the useful results down the page.

I hope they keep the small logo, without the big text ads at the top. The text ads at the top would probably generate some incremental revenue for Google, but they hurt usability. For me, this is partially because I’ve gotten used to Google’s page layout, so I can’t scan the results page as quickly.

Adsense

I’m doing a little experimenting with AdSense. So far most of my pages come up with ads for “Start your blog now” or “Sexy Girls & Sexy Guys”. It’s interesting to see which posts trigger a keyword match. I have observed a few posts that have switched from generic blog ads to a topical ad after a followup visit from the Mediapartners-Google crawler. You’d think that a post on the Blackdog Linux Server, the Yahoo-Alibaba deal, or visiting the Mona Lisa at the Louvre would trip a keyword or two.

The banners are only on the single post templates at the moment, so you’ll need to click on a post to see them. There’s also a set of vertical text ads at the bottom of the sidebar. I can tell I’m probably going to end up starting on a round of site revisions by the time I’m done with this, although I’m just interested in getting a better handle on the advertising and affiliate space at the moment.

Update: 08-15-2005 23:58 – At least this post has gotten tagged with Adsense ads. It will be interesting to see which pages actually trigger clickthroughs, vs which pages get reasonable keyword tags from Adsense.