Deconstructing search at Alexa

Wow! Although the basic idea is straightforward, crawling and indexing for a general-purpose search engine requires huge resources. Web crawlers effectively download copies of the entire web over and over and hand them to indexing applications, which scan the contents for structure and meaning.

The sheer scale of the task is a substantial barrier to entry for anyone wanting to develop a new indexing or retrieval application. Some projects narrow the problem domain to bring the scope down to a manageable level, but this announcement from Alexa looks like it may offer an exciting alternative for building new search applications.

John Battelle writes:

Alexa, an Amazon-owned search company started by Bruce Gilliat and Brewster Kahle (and the spider that fuels the Internet Archive), is going to offer its index up to anyone who wants it (details are not up yet, but soon). Alexa has about 5 billion documents in its index – about 100 terabytes of data.

Anyone can also use Alexa’s servers and processing power to mine its index to discover things – perhaps, to outsource the crawl needed to create a vertical search engine, for example. Or maybe to build new kinds of search engines entirely, or …well, whatever creative folks can dream up. And then, anyone can run that new service on Alexa’s (er…Amazon’s) platform, should they wish.

The service will be priced on a usage basis: $1 per CPU hour, $1 per GB stored or uploaded, $1 per 50GB data processed.
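To get a feel for what those rates mean in practice, here is a minimal back-of-the-envelope sketch in Python. Only the three rates come from the quoted announcement; the function name, the sample workload, and the assumption that charges are simply summed with no minimums or rounding are hypothetical, since the actual billing rules are not described here.

```python
# Rates as quoted in the announcement (US dollars).
DOLLARS_PER_CPU_HOUR = 1.0
DOLLARS_PER_GB_STORED_OR_UPLOADED = 1.0
GB_PROCESSED_PER_DOLLAR = 50.0  # $1 per 50 GB of data processed


def estimate_cost(cpu_hours, gb_stored_or_uploaded, gb_processed):
    """Estimate a bill in dollars for a hypothetical job.

    Assumes the three charges are simply added together; the real
    service may round or apply minimums that are not modeled here.
    """
    return (
        cpu_hours * DOLLARS_PER_CPU_HOUR
        + gb_stored_or_uploaded * DOLLARS_PER_GB_STORED_OR_UPLOADED
        + gb_processed / GB_PROCESSED_PER_DOLLAR
    )


# Hypothetical workload: one full pass over the roughly 100 TB index
# (treating 1 TB as 1000 GB), 200 CPU hours, 10 GB of stored results.
full_pass_gb = 100 * 1000
print(estimate_cost(cpu_hours=200, gb_stored_or_uploaded=10, gb_processed=full_pass_gb))
```

Under these assumptions, the data-processing charge for a single pass over the whole index comes to about $2,000, which is small next to the cost of running a crawl of that size yourself.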

There’s no announcement posted on the Alexa or Amazon sites yet; it’s apparently due out overnight. (Updated 12-13-2005 00:25 – the site is up now)

Not every search and retrieval application will fit the way Alexa has built its crawler and indexing infrastructure, or any other search engine platform, for that matter. But opening up access to more of the platform should make it possible to try out a lot of new ideas quickly without having to build yet another crawler for each project. Up to this point, many search ideas couldn’t be evaluated without working at one of the major search engines. I suspect most development teams would prefer access to Google’s crawl and index data, but I’m certainly looking forward to seeing what’s available at Alexa when they get their documentation online in the morning.

More from Om Malik, TechCrunch, ReadWrite Web

4 comments to Deconstructing search at Alexa

  • [...] And this news is tickling the tech community’s belly like you would not believe. Everyone and their mother is commenting on it. All the big hitters are weighing in. There it is atop memeorandum, Michael Arrington at TechCrunch says “Amazon Gets It,” Richard MacManus thinks this’ll make the Big Three, Yahoo, Google and Microsoft, cast their bloodshot eyes Amazon’s way, Om Malik goes Shakespeare and sees this as Amazon’s attempt to inflict death on the Big Three by a thousand small cuts. Dan Farber, Mashable, John Ho Lee, Social Patterns, Conversation Rater, The Stalwart, and The Tech Beat all weigh in as well. And it’s not even the afternoon yet. [...]

  • Amazon Opens Up Alexa Search API, Revolutionizes Search Overnight

    Well, I hate to say I was right. But I was right! Sort of. Amazon is turning its Alexa search engine into a web service. It’s called the Alexa Web Search Platform. John Battelle writes:
    Alexa, an Amazon-owned search company started by Br…

  • Amazon Opens Up Alexa Database (Alexa Websearch Platform)

    The Alexa Web Search Platform provides public access to the vast web crawl collected by Alexa Internet. Users can search and process billions of documents—even create their own search engines – using Alexa’s search and publication to…
