Newsweek on white hat and black hat search engine optimization

via Seomoz:

This week’s Newsweek (December 12, 2005) features an article on white hat vs black hat search engine optimization. Among other things, it’s interesting that the topic has made it into the mainstream media.

A “black hat” anecdote:

Using an illicit software program he downloaded from the Net, he forcibly injected a link to his own private-detectives referral site onto the site of Long Island’s Stony Brook University. Most search engines give a higher value to a link on a reputable university site.

The site in question appears to be “www.private-detectives.org”, still currently #1 at MSN and #4 at Yahoo for searches on “private detectives”. It appears to have been sandboxed on Google.

Another interesting post at Seomoz features comments from “randfish” and “EarlGrey”, the two SEO consultants interviewed by Newsweek on the merits of “White Hat” vs “Black Hat” search engine optimization, and gives further perspective on the motivation and outlook of the two approaches.

In some ways one can think of the difference between search engine optimization approaches as a “trading” approach vs a “building” approach to investment. The “Black Hat” approach articulated in the Seomoz article tends to focus purely on a tactical present cash return to the operator, while the “White Hat” approach presumes that the operator will realize ongoing future value by developing a useful information asset and making it visible to the search engines. This makes an implicit assumption that the site itself offers some unique and valuable information content, which can’t usually be the case in the long run.

From an information retrieval point of view, I’m obviously in the latter camp of thinking that identifying the most relevant results for the search user is a good thing. However, the black hat approach makes perfect sense if you consider it in terms of optimizing the short term value return to the publisher (cash as information), while possibly still presenting a useable information return to the search user. This is especially the case for commodity information or products, in which the actual information or goods are identical, such as affiliate sales.

I’m a little curious about the link from Stony Brook University. I took a quick look but wasn’t able to turn up a backlink. One of the problems with simply relying on trusted link sources is that they can be gamed, corrupted, or hacked.

See also: A reading list on PageRank and search algorithms

Update 12-12-2005 00:30 PST: Lots of comments on Matt Cutt’s post, plus Slashdot

A reading list on PageRank and search algorithms

If you’re subscribed to the full feed, you’ll notice I collected some background reading on PageRank, search crawlers, search personalization, and spam detection in the daily links section yesterday. Here are some references that are worth highlighting for those who have an interest in the innards of search in general and Google in particular.

  • Deeper Inside PageRank (PDF) – Internet Mathematics Vol. 1, No. 3: 335-380 Amy N. Langville and Carl D. Meyer. Detailed 46-page overview of PageRank and search analysis. This is the best technical introduction I’ve come across so far, and it has a long list of references which are also worth checking out.
  • Online Reputation Systems: The Cost of Attack of PageRank (PDF)
    Andrew Clausen. A detailed look by at the value and costs of reputation and some speculation on how much it costs to purchase higher ranking through spam, link brokering, etc. Somewhere in this paper or a related note he argues that raising search ranking is theoretically too expensive to be effective, which turned out not to be the case, but the basic ideas around reputation are interesting
  • SpamRank – Fully Automatic Link Spam Detection – Work in progress (PDF)
    András A. Benczúr, Károly Csalogány, Tamás Sarlós, Máté Uher. Proposes a SpamRank metric based on personalized pagerank and local pagerank distribution of linking sites.
  • Detecting Duplicate and near duplicate files – William Pugh presentation slides on US patent 6,658,423 (assigned to Google) for an approach using shingles (sliding windowed text fragments) to compare content similarity. This work was done during an internship at Google and he doesn’t know if this particular method is being used in production (vs some other method).

I’m looking at a fairly narrow search application at the moment, but the general idea of using subjective reputation to personalize search results and to filter out spammy content seems fundamentally sound, especially if a network of trust (social or professionally edited) isn’t too big.