Search referrals – July 2006 snapshot


Here’s a quick snapshot of incoming search engine referrals for the past few weeks. Compare this with a post from last year on search engine referral share, which was recently referenced in a post at Alexa noting the discrepancy between published search engine traffic reports and webmasters’ anecdotal observations.

Is it just me, or are these charts a bit goofy? Does Yahoo really still have 23% of the search market? Is Google at less than half the search market?

I don’t believe it. Any webmaster will tell you that Google represents almost ALL of the search engine traffic. Yahoo is nowhere near 23%. Just read the blogs, here, here, here and here and on countless other blogs.

Google, already at 82% of incoming search traffic here last October, has since risen to 92%, largely at the expense of “Other”. In the fall, it looked like those were mostly miscellaneous Chinese search engines, so perhaps my site is not getting indexed or ranked well there anymore, or Google is picking up market share, or both.
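For anyone wanting to run the same comparison on their own logs, here is a minimal sketch of how referral share per engine can be tallied. It assumes the referrer URLs have already been filtered down to search engine referrals. The `ENGINES` host markers, the function names, and the sample URLs are illustrative assumptions, not the data behind the numbers above.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical mapping from a substring of the referrer host to an engine name.
ENGINES = {
    "google.": "Google",
    "yahoo.": "Yahoo",
    "msn.": "MSN/Live",
    "live.com": "MSN/Live",
}

def engine_for(referrer):
    """Return the engine name for a referrer URL, or "Other" if unrecognized."""
    host = urlparse(referrer).netloc.lower()
    for marker, name in ENGINES.items():
        if marker in host:
            return name
    return "Other"

def referral_share(referrers):
    """Return each engine's percentage share of the given search referrals."""
    counts = Counter(engine_for(r) for r in referrers)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}

# Made-up sample referrers, for illustration only.
sample = [
    "http://www.google.com/search?q=long+tail",
    "http://www.google.co.uk/search?q=alexa",
    "http://www.google.com/search?q=vertical+search",
    "http://search.yahoo.com/search?p=alexa",
]
shares = referral_share(sample)
for name, pct in sorted(shares.items()):
    print(f"{name}: {pct:.0f}%")
```

A single site's shares computed this way reflect that site's audience and topics, so they are not directly comparable to published market-wide figures.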

Some of the commenters at the Alexa post noted increasing traffic from Microsoft / MSN / Live search, including one who got most of their traffic through MSN search. I’m a little surprised that I don’t see more traffic from Yahoo and Microsoft search here, but that may also be a function of who’s likely to be searching for a given topic.

See also Greg Linden’s comments on the competitiveness of Yahoo and Microsoft search efforts.

Filtering, aggregating, searching, and monetizing the Long Tail

David Hornik asks: Where’s the Money in the Long Tail?

It is certainly the case that in the aggregate, Long Tail content is extraordinarily valuable. The question for VCs and entrepreneurs is “for whom?”

The real money is in aggregation and filtering and those will continue to be interesting businesses for the foreseeable future.

He points out that aggregators are building convenient one-stop shopping for people looking for topically-focused content, and derive economic value even when the content publishers do not.

David Beisel follows the money a little further:

…in the long run, the value of the network is not only determined by the number of nodes in it, but in the ability for the network to monetize those nodes.
…in calculating the value of a network, any equation describing it should contain a variable with the monetization rate (or proxied by the value to the user which can be monetized in the future). So while the number of nodes in a network surely is a fundamental (if not the majority, in many cases) driver of value, the value of the network itself to the user is also a very important component to the overall total.

Being the provider of a filtered view of online content is somewhat analogous to being an editor at a magazine or newspaper, a program director at a radio station, or an A&R rep at a record label. It usually doesn’t make sense to pursue some topics or styles as there’s either no audience, or a very low value audience, or an audience that’s too hard to reach.

Conversely, some publications do well on a very small base (financial newsletters and independent musicians come to mind). When the individual publisher (writer, musician, artist) develops their own audience, they are able to capture more of the value placed on their content by the consumers of content (readers, listeners, viewers) than when they are simply one of many aggregated content producers. People seek out their favorite writers in newspapers and magazines, talk show hosts on television, or musicians in local concerts. The content producers gain relative power over the distributors and a few can become their own branded media empire. (Think “Oprah”.)

From an investment point of view, it’s difficult to justify betting on any particular content producer becoming an online media star, for the same reasons aspiring writers/musicians/actors don’t get VC investment. (How are you going to know when you’ve got the next J.K. Rowling or Dan Brown on your doorstep looking for seed funding to write their book?)

In contrast, search, filtering and aggregation services can be built for specific audiences. The trick, though, is not just to find an audience, but to provide a service that is valuable over time to the audience, service provider, and content publishers. The Alexa Web Search Platform announcement this week is interesting not because it’s the best general purpose search engine, but because it may drop the effective cost of building some targeted filtering and aggregation services low enough to uncover some interesting new niches, in addition to the areas that are already being addressed by vertical search startups. Many of these niches, however, may be profitable short-term projects for a small team (or a single person) without being durable enough to be investable.

Greg Linden adds:

Massive selection isn’t enough. To make the long tail accessible, irrelevant items should be hidden. Interesting items should be emphasized. Millions of poor choices should be reduced to tens of good ones. The value is in surfacing the gems from the sea of noise.

David Beisel has some suggestions:

Where’s my “social portal” for me as a skier enthusiast? Better yet, where’s the “About.com of social portals?” Or why isn’t About.com more social?

I suspect that someone will have that social portal for skiing enthusiasts in limited beta somewhere real soon now…

See also: The Home Pages of this New Era

Deconstructing search at Alexa

Wow! Although the basic idea is straightforward, crawling and indexing for a general purpose search engine requires huge resources. Web crawlers are effectively downloading copies of the entire internet over and over, turning them over to indexing applications which scan the contents for structure and meaning.

The sheer scale of the task is a substantial barrier to entry for anyone wanting to develop a new indexing or retrieval application. Some projects have narrowed the problem domain, which can reduce the problem scope to a manageable level, but this announcement from Alexa looks like it may offer an exciting alternative for building new search applications.

John Battelle writes:

Alexa, an Amazon-owned search company started by Bruce Gilliat and Brewster Kahle (and the spider that fuels the Internet Archive), is going to offer its index up to anyone who wants it (details are not up yet, but soon). Alexa has about 5 billion documents in its index – about 100 terabytes of data.

Anyone can also use Alexa’s servers and processing power to mine its index to discover things – perhaps, to outsource the crawl needed to create a vertical search engine, for example. Or maybe to build new kinds of search engines entirely, or …well, whatever creative folks can dream up. And then, anyone can run that new service on Alexa’s (er…Amazon’s) platform, should they wish.

The service will be priced on a usage basis: $1 per CPU hour, $1 per GB stored or uploaded, $1 per 50GB data processed.
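To get a feel for what those rates mean for a small project, here is a rough cost sketch. The three rates are the ones quoted above; the function name and the example workload are hypothetical, and the real billing rules (rounding, minimums, how storage is measured over time) are not covered by the quoted announcement.

```python
# Rates as quoted in the announcement summary above.
USD_PER_CPU_HOUR = 1.0
USD_PER_GB_STORED_OR_UPLOADED = 1.0
GB_PROCESSED_PER_USD = 50.0  # $1 per 50 GB of data processed

def estimate_cost(cpu_hours, gb_stored_or_uploaded, gb_processed):
    """Estimate the usage charge in USD, assuming simple linear pricing."""
    return (
        cpu_hours * USD_PER_CPU_HOUR
        + gb_stored_or_uploaded * USD_PER_GB_STORED_OR_UPLOADED
        + gb_processed / GB_PROCESSED_PER_USD
    )

# Hypothetical workload: 10 CPU hours, 5 GB stored, 500 GB processed.
example = estimate_cost(cpu_hours=10, gb_stored_or_uploaded=5, gb_processed=500)
print(f"Estimated cost: ${example:.2f}")
```

By the same arithmetic, one pass over the full 100 terabyte index would come to roughly $2,000 in data processing charges alone (taking 1 TB as 1,000 GB), before any CPU or storage costs.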

There’s no announcement posted on the Alexa or Amazon sites yet; it’s apparently due out overnight. (Updated 12-13-2005 00:25 – the site is up now)

Not every search and retrieval application is going to fit the way Alexa has built its crawler and indexing infrastructure, or any other search engine platform, for that matter. But opening up access to more of the platform should make it possible for a lot of new ideas to be tried out quickly without having to build yet another crawler for each project. Up to this point, many search ideas couldn’t be evaluated without working at one of the major search engines. I suspect most development teams would prefer to get access to Google’s crawl and index data, but I’m certainly looking forward to seeing what’s available at Alexa when they get their documentation online in the morning.

More from Om Malik, TechCrunch, ReadWrite Web