Mobile Search = US$1 billion 411 calls per year


Today, mobile search in the US = $1 billion per year in 411 calls.

Well, that’s a gross oversimplification, but it gets to one of the main points from this evening’s sold-out, standing-room-only joint Search SIG and Mobile Monday session on Mobile Search, held at Google.

The panel discussion was moderated by David Weiden from Morgan Stanley, with panelists

  • Elad Gil (Google)
  • Mihir Shah (Yahoo)
  • Mark Grandcolas (Caboodle)
  • Ted Burns (4info)
  • Jack Denenberg (Cingular)

Jack Denenberg from Cingular was the lone representative from the carrier world. During the panel, he observed that 411 “voice search” runs at least 2-3x the volume of SMS- and WAP-based search, and that Cingular (US) handles around 1 million 411 calls per day at an average billing of $1.25 to $1.40 per call. All US carriers combined handle around 3 million 411 calls per day.

This works out to more than $1 billion per year in 411 fees!
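As a quick sanity check on that number, here’s the back-of-the-envelope arithmetic in a few lines of Python (using only the call volume and per-call prices quoted above; nothing here is official carrier data):

# Rough annual US 411 revenue from the figures quoted above.
calls_per_day = 3_000_000            # all US carriers combined
price_low, price_high = 1.25, 1.40   # average billed per call, in dollars
days_per_year = 365

low = calls_per_day * price_low * days_per_year
high = calls_per_day * price_high * days_per_year
print(f"roughly ${low/1e9:.2f}B to ${high/1e9:.2f}B per year")
# prints: roughly $1.37B to $1.53B per year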

Other comments from Jack: Wireless 411 use is still rising. Wireline 411 use is starting to decline. Today mobile search is based on user fees (airtime) and search fees (411). In the future, we may see some movement toward advertiser listing fees. The carrier provides a channel for businesses to communicate with prospective customers.

Pithy comment from the audience: “I see 4 guys trying to make the best of a bad situation, and 1 guy creating a bad situation.” More comments followed on why there are no location-based services, why SMS is still limited to 160 characters, and why the pricing is so partner-unfriendly. Why pay $1.40 for an address when Google is free (aside from roughly $0.20 in data fees)?

Mihir from Yahoo mentioned that they have been running trials of paid mobile search listings, using the Overture back end, on Vodafone in the UK, and it’s going well, so mobile paid listings are starting to happen already.

Lots of comments about user interfaces being too complicated. Mark from Caboodle says that for each click into the menu system, 50% of users will give up trying to buy something, such as a ringtone. They have a system for simplifying this, but unfortunately they weren’t able to get his demo onto the big screen, so we never got to see it live.
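To see how brutal that kind of drop-off is, here’s a toy illustration assuming the 50% abandonment rate applies independently at every click (my assumption for illustration, not Caboodle’s data):

# Toy funnel: how many of 1,000 users survive an n-click purchase flow
# if roughly half of the remaining users give up at each click.
users = 1000.0
for click in range(1, 5):
    users *= 0.5
    print(f"after click {click}: about {users:.0f} users left")
# after 4 clicks, only about 6% of the original users remain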

Some discussion of the carriers’ general preference for simplifying the user experience by giving users a single, carrier-branded aggregated search, taking advantage of the proprietary clickstream and data traffic information available to them through the data billing system.

The Yahoo mobile applications seemed the most plausibly useful. The send-to-phone feature allows you to send driving directions and other info from Yahoo Local to your phone. The mobile shopping application could be used for price comparison while shopping in person (although this doesn’t do much for the online merchants today). Some of their SMS search result messages allow you to reply to get an update, so you can send an SMS with “2” in it to get an update of a previous weather forecast.

For his demo, Jack ran an impromptu contest among 3 audience volunteers, to see who could find the address of the New York Metropolitan Museum of Art the fastest, using voice (411), 2-way SMS search, or WAP search. All of them got the answer, although 411 was the quickest by perhaps a minute or a bit less.


The 4info demo looked interesting. They use an SMS short code (44636) with the query text in the message body, geared toward sports, weather, and addresses. They also provide recipes and pointers to local bars if you key in the name of a drink. Someone in the audience pointed out that searching 4info for “Linux” returned a drink recipe, which Ted reproduced on the big screen. Not sure what the drink promo or the Linux recipe was about.

During open mike time, someone (mumble) from IBM did a 15-second demo of their speech-activated mobile search, in which he looked up the address of the New York Metropolitan Museum of Art by speaking into the phone, and the results came back as a text page. Very slick.

Lots of interesting data-oriented mobile search projects are being built for the future. But $1 billion in 411 calls right now is pretty interesting too. Who makes those 411 calls? Are they happy paying that much?

I avoid calling 411 because I feel ripped off afterwards, and often get a wrong number or address anyway. But it’s not always possible to key in a search.

Photos at Flickr

Image Search Referrals Increasing Lately

In addition to the huge volume of referrer spam in the past week, I’ve also noticed I’m starting to get referrals from Google image search. Not sure if something in the index changed, or if this is just a typical delay between content being spidered and turning up in the search results. I’ve noticed that blog postings can turn up in the main Google search results as well as in Google Blog Search in under a day. I’m not sure how quickly images from the posts get added to the image search index, though. I don’t recall seeing any traffic via Google image search before around the end of September 2005. The posts that are picking up the most traffic are from July/August 2005, and are a bit random.

  • Potter Puppet Pals
  • Blackdog Linux
  • Mona Lisa at the Salle des Etats

Quick Take on Google Reader

My quick notes on trying out Google Reader:

Summary:

  • The AJAX user interface is whizzy and fun, and is similar to an e-mail reader.
  • Importing feeds is really slow.
  • Keyboard navigation shortcuts are great.
  • Searching through your own feeds or for new feeds is convenient using Google
  • I hate having a single item displayed at a time.
  • “Blog This” action is handy, if you use Blogger. They could easily make this go to other blogging services later.
  • This could be a good “starter” service for introducing someone to feed readers, but:
  • No apparent subscription export mechanism.
  • Doesn’t deal well with organizing a large number of feeds.

More notes:
I started importing the OPML subscription file from Bloglines into Google Reader on Friday evening. I have around 500 subscriptions in that list, and I’m not sure how long it ended up taking to import. It took more than 15 minutes (at which point I headed off to bed), and had completed sometime before this afternoon.

I love having keyboard navigation shortcuts. The AJAX-based user interface is zippy and “fun”. Unfortunately, Google Reader displays articles one at a time, a little like reading e-mail. I’m in the habit of scanning sections of the subscription lists to see which sections I want to look at, then scanning and scrolling through lists of articles in Bloglines. Even though this requires mousing and clicking, it’s a lot faster than flashing one article at a time in Google Reader.

I don’t think the current feed organization system works on Google Reader, at least for me. My current (bad) feed groupings from Bloglines show up on Google Reader as “Labels” for groups of feeds, which is nice. It’s hard to just read a set of feeds, though. Postings show up in chronological order, or by relevance. This is totally unusable for a large set of feeds, especially when several of them are high-traffic, low-priority (e.g. Metafilter, del.icio.us, USGS earthquakes). If I could get the “relevance” tuned by context (based on label or tag?) it might be useful.

When you add a new feed, it starts out empty, and appears to add articles only as they are posted. It would be nice to have them start out with whatever Google has cached already. I’m sure I’m not the first subscriber to most of the feeds on my list.

On the positive side, this seems like a good starting point for someone who’s new to feed readers and wants a web-based solution. It looks nice, people have heard of Google, and the default behaviors probably play better with a modest number of feeds. Up to this point, I’ve been steering people toward Bloglines, and more recently pointing them at Rojo.

I wish the Bloglines user interface could be revised to make it quicker to get around. I really like keyboard navigation. I can also see some potential in Google Reader’s listing by “relevance” rather than by date, and in its improved search and blogging integration. I’m frequently popping up another window to run searches while reading in Bloglines.

Google Reader doesn’t seem like it’s quite what I’m looking for just now, but I’ll keep an eye on it.

Wishful thinking:
I think I want something to manage even more feeds than I have now, but where I’m reading a few regularly, a few articles from a pool of feeds based on “relevance”, and articles from the “neighborhood” of my feeds when they hit some “relevance” criteria. I’d also like to search my pool of identified / tagged feeds, along with some “neighborhood” of feeds and other links. I think a lot of this is about establishing context, intent, and some sort of “authoritativeness”, to augment the usual search keyword matching.

Ungoogleable to #1 in six months

Despite having been online for a very long time by today’s standards (since ~1980), I was difficult to find in search engines until fairly recently.

There are basically four reasons for this:

  1. The components of my name, “Ho”, “John”, and “Lee” are all short and common in several different contexts, so there are a vast number of indexed documents with those components.
  2. Papers I’ve published are listed under “Lee, H.J.” or something similar, lumping them together with the thousands of other Korean “Lee, H.J.”s. Something like 14% of all Koreans have the “Lee” surname, and “Ho” and “Lee” are both common surnames in Chinese as well. Various misspellings, manglings and transcriptions mean that old papers don’t turn up in searches even when they do eventually make it online.
  3. Much of the work that I’ve done resides behind various corporate firewalls, and is unlikely to be indexed, ever. A fair amount of it is on actual paper, and not digitized at all.
  4. I’ve generally been conscious that everything going into the public space gets recorded or logged somewhere, so even back in the Usenet days I have tended to stay on private networks and e-mail lists rather than posting everything to “world”.

Searching for “Ho John Lee” (no quotes) at the beginning of 2005 would have gotten you a page full of John Lee Hooker and Wen Ho Lee articles. Click here for an approximation. With quotes, you would have seen a few citations here and there from print media working its way online, along with miscellaneous RFCs.

Among various informal objectives for starting a public web site, one was to make myself findable again, especially for people I know but haven’t stayed in contact with. After roughly six months, I’m now the top search result for my name, on all search engines.

As Steve Martin says in The Jerk (upon seeing his name in the phone book for the first time), “That really makes me somebody! Things are going to start happening to me now…”

Wired this month on people who are Ungoogleable:

As the internet makes greater inroads into everyday life, more people are finding they’re leaving an accidental trail of digital bread crumbs on the web — where Google’s merciless crawlers vacuum them up and regurgitate them for anyone who cares to type in a name. Our growing Googleability has already changed the face of dating and hiring, and has become a real concern to spousal-abuse victims and others with life-and-death privacy needs.

But despite Google’s inarguable power to dredge up information, some people have succeeded — either by luck, conscious effort or both — in avoiding the search engine’s all-seeing eye.

Yahoo Site Explorer

Yahoo Search Blog announces Yahoo Site Explorer, a handy alternative to searching with “site:” or “link:” to see what’s getting indexed and linked at Yahoo Search. It’s billed as a work in progress; at the moment you can:

  • Show all subpages within a URL indexed by Yahoo!, which you can see for stanford.edu, here. You can also see subpages under a path, such as for Professor Knuth’s pages.
  • Show inlinks indexed by Yahoo! to a URL, such as for Professor Knuth’s pages, or for an entire site like stanford.edu.
  • Submit missing URLs to Yahoo

There is also a web service API for programmatic queries.
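For the programmatically inclined, here’s a sketch of hitting that web service. I’m going from memory on the SiteExplorerService V1 endpoint and parameter names (pageData, appid, query, results), so treat them as assumptions and check Yahoo’s developer documentation; you also need to register for an application ID.

# Sketch: list pages indexed under a URL via the Site Explorer web service.
# Endpoint and parameter names are from memory and may not be exact; verify
# against Yahoo's developer docs. The response comes back as XML.
import urllib.parse
import urllib.request

APP_ID = "YOUR_APP_ID"  # placeholder; obtain a real application ID from Yahoo

def page_data(url, results=20):
    params = urllib.parse.urlencode({
        "appid": APP_ID,
        "query": url,
        "results": results,
    })
    endpoint = "http://search.yahooapis.com/SiteExplorerService/V1/pageData"
    with urllib.request.urlopen(endpoint + "?" + params) as resp:
        return resp.read().decode("utf-8")

print(page_data("http://www.stanford.edu/"))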

Discussion at Search Engine Watch, Webmaster World.

Danny Sullivan at Search Engine Watch posted a synopsis on the SEW Forum:

I’ve done a summary of things over here on the blog, which also links to a detailed look for SEW paid members.

Here are my top line thoughts:

You can see all pages from all domains, one domain, or a directory/section within a domain. Thumbs up!

You can NOT pattern match to find all URLs from a domain. That would be nice.

You can see all links to a specific page or a domain. Thumbs up!

You can NOT exclude your own links, very unfortunately. Two thumbs down!

You can export data, but only the first 50 items, unfortunately. Thumbs down!

More wish list stuff:

Search commands such as link: aren’t supported, and I hope that might come.

You can get a feed of your top pages, but I want a feed of backlinks to inform me of new ones that are found. Site owners deserve just as much fun as blog owners in knowing about new links to them!

Some of the other posts discuss interesting things you can do with the existing “advanced search” options. I’ll have to try some out, both through Yahoo Site Explorer and using some of the suggested link queries which apparently can’t be done yet through Site Explorer.

Search Attenuation and Rollyo

“Search attenuation” is a new term to me, but seems like a good description of the process of filtering feeds and search results to a manageable size. As more content becomes available in RSS, I tend to subscribe to anything that looks interesting, but am looking for improved methods for searching and filtering content within that set.

Catching up a little on the feed aggregator, I see an article at O’Reilly about Rollyo, a new “Roll Your Own Search Engine” site from Dave Pell of Davenetics.

ROLLYO is the latest mind warp from Dave Pell. Rollyo affords anyone the ability to roll their own Yahoo!-powered search engine, attenuating results to a set of up to 25 sites. And while the searchrolls (as they’re called) you create are around a particular topic (e.g. Food and Dining), they are also attached to a real person (e.g. Food and Dining is by Jason Kottke). The result is a topic-specific search created and maintained by a trusted source.

Rollyo’s basic premise is one I’ve been preaching of late: attenuation is the next aggregation …

Recently, I’ve been looking at this from a related angle, which is how to infer topical relevance among people or sources you trust, based on links, tagging, search, and named entity discovery. People are already linking, tagging, and searching, so some data is available as a byproduct of work that they’re already doing. On the other hand, if enough people you trust take the additional step of explicitly declaring the sources they think are relevant, that would help a lot.

See also Memeorandum, Findory, Personal Bee.

More on this from TechCrunch

Dredging for Search Relevancy

I am apparently a well trained, atypical search user.

In a recently published paper, the users studied clicked on the top search result almost half the time. That’s not new, but in this study the researchers also swapped the result order for some users, and people still mostly clicked on the top search result.

I routinely scan the full page of search results, especially when I’m not sure where I’m going to find the information I’m looking for. I often randomly click on the deeper results pages as well, especially when looking for material from less-visible sites. This works for me because I’m able to scan the text on the page quickly, and the additional search pages also return quickly. This seems to work especially well on blog search, where many sites are essentially unranked for relevancy.

This approach doesn’t work well if you’re not used to scanning over pages of text, and also doesn’t work if the search page response time is slow.

On the other hand, I gave some of the examples in the research paper a quick try, and my queries (on Google) generally had the answer in the top 1-2 results already.

From Jakob Nielsen’s Alertbox, September 2005:

Professor Thorsten Joachims and colleagues at Cornell University conducted a study of search engines. Among other things, their study examined the links users followed on the SERP (search engine results page). They found that 42% of users clicked the top search hit, and 8% of users clicked the second hit. So far, no news. Many previous studies, including my own, have shown that the top few entries in search listings get the preponderance of clicks and that the number one hit gets vastly more clicks than anything else.

What is interesting is the researchers’ second test, wherein they secretly fed the search results through a script before displaying them to users. This script swapped the order of the top two search hits. In other words, what was originally the number two entry in the search engine’s prioritization ended up on top, and the top entry was relegated to second place.

In this swapped condition, users still clicked on the top entry 34% of the time and on the second hit 12% of the time.

For reference, here are the questions that were asked in the original study (182KB, PDF)

Navigational

  • Find the homepage of Michael Jordan, the statistician.
  • Find the page displaying the route map for Greyhound buses.
  • Find the homepage of the 1000 Acres Dude Ranch.
  • Find the homepage for graduate housing at Carnegie Mellon University.
  • Find the homepage of Emeril – the chef who has a television cooking program.

Informational

  • Where is the tallest mountain in New York located?
  • With the heavy coverage of the democratic presidential primaries, you are excited to cast your vote for a candidate. When are democratic presidential primaries in New York?
  • Which actor starred as the main character in the original Time Machine movie?
  • A friend told you that Mr. Cornell used to live close to campus – near University and Steward Ave. Does anybody live in his house now? If so, who?
  • What is the name of the researcher who discovered the first modern antibiotic?

Tagging and Searching: How transparent do you want to be?

This note captures some thoughts in progress, feel free to chip in with your comments…

Here’s a feature wish list for link tagging:

  • Private-only links – only I can see them at all
  • Group-only links – only members of the group can see them
  • Group-only tags – only members of the group can see my application of a set of tags
  • Unattributed links – link counts and tags are visible to the public, but not the contributor or comments

Tagged bookmarking services such as del.icio.us allow individuals to save and organize their own collection of web links, along with user-defined short descriptions and tags. This is already convenient for the individual user, but the interesting part comes from being able to search the entire universe of saved bookmarks by user-defined tags as an alternative or adjunct to conventional search engines.

Bits of collective wisdom embodied in a community can be captured by aggregating user actions that represent their attention, i.e. the click streams, bookmarks, tags, and other incremental choices that are incidental to whatever they happened to be doing online. The results of a tag search are typically much smaller, but often more focused or topically relevant, than a search on Google or Yahoo.

It’s also interesting to browse the bookmarks of other people who have tagged or saved similar items. To some extent the bookmark and tag collection can be treated as a proxy for that person’s set of interests and attention.

In a similar fashion, clicking on a link (or actually purchasing an item) can be treated as an indication of interest. This is part of what makes Google Adsense, Yahoo Publisher Network, and Amazon’s Recommendations work. The individual decisions are incidental to any one person’s experience, and taken on their own have little value, but they can be combined to form information sets which are mutually beneficial to the individual and the aggregator. Web 2.0 thrives on the sharing of “privately useless but socially valuable” information, the contribution of individuals toward a shared good.

In the case of bookmarking services, the exchange of value is: I get a convenient way to save my links, and del.icio.us gets my link and tag data to share with other users.

One problem I run into regularly is that everything is public on del.icio.us. For most links I add, I am happy to share them, along with the fact that I looked at them and cared to save them, and any comments and tags I might add. Del.icio.us starts out with the assumption that everyone who bookmarks something there wants to share it. As I use it more regularly, though, I sometimes find situations where I want to save something, but not necessarily in public. Typically, I either:

      a) don’t want to make the URL visible to the public, or
      b) don’t mind sharing the link, but don’t want to leave a detailed trail open to the public.

The first case, in which I’d like to save a link for my private use, is arguably just private information and shouldn’t actually be in a “social bookmarks” system to begin with. However, there is a social variant of the private link, which is when I’d like to share my link data with a group, but not all users. This might be people such as members of a project team, or family or friends. It’s analogous to the various photo sharing models, in which photos are typically shared to the public, or with varying systems of restrictions.

The second case, in which I’m willing to share my link data, but would like to do so without attribution, is interesting. In thinking about my link bookmarking, I find that I’m actually willing to share my link, and possibly my tag and comment data, but don’t want to have someone browse my bookmark list and find the aggregated collection there, as it probably introduces too much transparency into what I’m working on. At some point in time, it’s also likely that I would be happy to make the link data fully visible, tags, comments, and all, perhaps after some project or activity is completed and the presence of that information is no longer as sensitive.

The feature wish list above would address some of the not-quite-public link data problems, while continuing to accrete community contributed data. In the meantime, I’m still accumulating links back behind the firewall.

Another useful change to existing systems would be to aggregate tag or search results based on a selected set of users to improve relevance. This is along the lines of Memeorandum, which uses a selected set of more-authoritative blogs as a starting point to gauge the relevance of blog posts. In the tagged search case, it would be interesting if I could select a number of people as “better” or “more relevant” at generating useful links, and have search results ranked with a bias toward results in the neighborhood of links tagged by my preferred community of taggers.

It’s possible to subscribe to specific tags or users on del.icio.us, but what I had in mind was more like being able to tag the users as “favorites” or by topic and then rank my search results based on their link and tag neighborhoods. I don’t actually want to look at all of their bookmarks all the time.
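Here’s a minimal sketch of the kind of re-ranking I have in mind; all of the data structures, example URLs, and the boost weight are invented purely for illustration:

# Sketch: bias a ranked result list toward pages bookmarked by trusted taggers.
# base_results: list of (url, engine_score); bookmarks: {tagger: set of urls}.
def rerank(base_results, bookmarks, trusted_taggers, boost=0.25):
    trusted_urls = set()
    for tagger in trusted_taggers:
        trusted_urls |= bookmarks.get(tagger, set())
    rescored = [(url, score + (boost if url in trusted_urls else 0.0))
                for url, score in base_results]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

results = [("http://example.com/a", 0.71), ("http://example.com/b", 0.70)]
bookmarks = {"alice": {"http://example.com/b"}, "bob": set()}
print(rerank(results, bookmarks, trusted_taggers=["alice"]))
# b now outranks a because someone I trust bookmarked it

A real version would also need the “neighborhood” part (links near the tagged pages) and some spam resistance, but the shape is the same.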

Something similar might also work with search result page clickthroughs. These sorts of approaches seem attractive, but also seem too messy to scale very well.

Unattributed links may be too vulnerable to spamming to be useful. One possible fix could be to filter unattributed links based on the authority of the source, without disclosing the source to the public.

I was at the TechCrunch meetup last night and didn’t have a chance to talk with the del.icio.us folks, who were apparently around somewhere, but Ofer Ben-Shachar from RawSugar did mention that they were looking at providing some sort of group-only access option for their tagging system.

A lot of this could be hacked onto the existing systems to solve the end-user problem easily, but some of the initial approaches that come to mind start to break the social value creation. I think that value could be preserved, while making better provisions for “private” or “group” restrictions, by working on the platform side.

Google Secure Access

via Om Malik:

Google seems to have developed a secure WiFi VPN software tool – Google Secure Access Client. The information can be found here. Google Rumors has all the details. To sum it up, what they are doing is giving away a VPN tool that takes some of the security risks out of open WiFi. Companies like JiWire and Boingo also have these type of secure WiFi software solutions. While on paper this sounds like a perfectly good deal, Inside Google says not so fast, and writes, “Google Secure Access has the same benefits for Google as Web Accelerator did, with fewer of the things that scared away people the first time.” They dig deep into the GSA privacy policy …

Another take at Inside Google:

Located at wifi.google.com, GSA connects you to a Google-run Virtual Private Network. Your internet traffic becomes encrypted when you send it out, decrypted by Google, the requested data downloaded by Google, encrypted and sent to you, and decrypted on your machine. This has the effect of protecting your traffic data from others who may want to access it. GSA’s FAQ describes it as a Google engineer’s 20% project

Google Secure Access FAQ

Google Blog Search – Referrers Working Now

Looks like Google Blog Search took out the redirects that were breaking the referrer headers.

Now the search keywords are visible again. Here’s a typical log entry:

xxx.xxx.xxx.xxx - - [15/Sep/2005:15:58:13 -0700]
"GET /weblog/archives/2005/09/15/podcasting-and-audio-search-at-sdforum-searchsig-september-2005/
HTTP/1.1" 200 26981 "http://blogsearch.google.com/blogsearch?hl=en&q=odeo&btnG=Search+Blogs&scoring=d"
"Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.10) Gecko/20050716
Firefox/1.0.6"
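If you’re wondering how the keywords fall out of a referrer like that, it’s just the q parameter in the query string. A couple of lines of Python, assuming the blogsearch.google.com referrer format shown above:

# Extract the search query from a Google Blog Search referrer URL.
from urllib.parse import parse_qs, urlparse

referrer = ("http://blogsearch.google.com/blogsearch"
            "?hl=en&q=odeo&btnG=Search+Blogs&scoring=d")
query = parse_qs(urlparse(referrer).query).get("q", [""])[0]
print(query)  # prints: odeo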

Blogger Buzz says the redirect was in place during development to help keep the project under wraps.

Podcasting and Audio Search at SDForum SearchSig September 2005

IMG_4360 IMG_4361

Discussion and demos on podcasting and audio search last night at the inaugural SDForum Search SIG meeting.

I want to like podcasting, but to date, I haven’t really gotten into it, either as a listener or as a producer. In theory, I should be all over this, since it combines some of my favorite topics: media technology (mostly audio, but eventually video), internet publishing and distribution, and search (which seems to be under everything on the internet). In practice, I haven’t found a good fit in my daily routine yet, partly because I don’t have a long drive to work. I sometimes listen to music while I’m working, but often need to be on the phone. I’m somewhat interested in the conversational programming, such as the shows on IT Conversations, but I can read similar content in a fraction of the time required to listen to a podcast segment, and I find that I can’t listen while I’m working: I either get distracted from my work or totally tune out the content. I tend not to use the iPod much either, for some of the same reasons.

All that aside, the demos of Loomia, Odeo, and Audioblog showed how rapidly the tools and services for creating and distributing podcasts and personal media are improving. They all provide directories and search services for finding podcasts, and are moving toward providing tools and hosting services for individuals to create and publish their own audio podcasts.

David Marks showed Loomia, which provides an extensive, personalizable directory of podcast feeds. The site features an inlined Flash audio player, so you can play the feeds directly in the browser page, which I’m trying out while I’m writing this. He also mentioned that their site makes use of the Dojo open source library for implementing AJAX features.

Photo by Niall Kennedy

Ev Williams showed the Odeo Studio application, which isn’t yet available on their web site. It puts a simple audio production app in the web page, allowing you to record with a computer microphone and mix in audio such as theme music, applause, and sound effects. Looks like fun, and removes another barrier for potential podcast creators who might not have the inclination to go find and learn to use an audio editor.

Eric Rice showed Audioblog’s video clip publishing service, along with how to “dial in” and create a podcast by leaving a voice message. Although their site is called “Audioblog”, they are developing similar capabilities for video. Their site will also transcode video from a variety of formats, including 3GP; this allows them to accept video uploads from mobile phones. They will host the media data, and it sounded like they were looking into handling media rights clearances with ASCAP and other artists’ agencies on behalf of their publishers sometime in the future.

The opening panel discussion, moderated by Doug Kaye, should turn up on IT Conversations in a while; I’ll try checking it out later.

As a bonus, I also met Munjal Shah and Tara Hunt in person afterwards. I’m looking forward to trying the Ojos image search alpha when they get it cooked enough.

Update 09-15-2005 20:01 PDT: added a photo of Odeo Studio from Niall Kennedy, plus a followup on iTunes and video podcasting

Update 10-20-2005 21:26 PDT: Links to the audio at IT Conversations are posted at Yahoo Search Blog

Google Blog Search – No Referrer Keywords?

Feature request to Google Blog Search team: please add search query info to the referrer string.

Lots of coverage this morning from people trying out Google Blog Search. (Search Engine Watch, Anil Dash, lots more)

I’m seeing some traffic from Google Blog Search overnight, but it looks like they don’t send the search query in the referrer. Here’s a sample log entry:

xxx.xxx.xxx.xxx - - [14/Sep/2005:00:51:09 -0700] "GET /weblog/archives/2005/09/14/google-blog-search-launches/ HTTP/1.1" 200 22964 "http://www.google.com/url?sa=D&q=http://www.hojohnlee.com/weblog/archives/2005/09/14/google-blog-search-launches/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4"

So there’s no way to know the original search query. I have a pretty good idea how the overnight traffic looking for the Google post got here, but there are also people landing on fairly obscure pages here and I’m always curious how they found them. I’m sure the SEO crowd will be all over this shortly.

There have been a number of comments that Google Blog Search is sort of boring, but I’m finding that there’s good novelty value in having really fast search result pages. Haven’t used it enough to get a sense of how good the coverage is, or how fast it updates, but it will be a welcome alternative to Technorati and the others.

Update 09-14-2005 14:01 PDT: These guys think Google forgot to remove some redirect headers.

Update 09-14-2005 23:25 PDT: Over at Blogger Buzz, Google says they left the redirect in by accident, will be taking them out shortly:

“After clicking on a result in Blog Search, I’m being passed through a redirect. Why?”
Sadly, this wasn’t part of an overly clever click-harvesting scheme. We had the redirects in place during testing to prevent referrer-leaking and simply didn’t remove them prior to launch. But they should be gone in the next 24 hours … which will have the advantage of improving click-through time.

Google Blog Search Launches

Google’s entry into blog search launched this evening, go try it out or read their help page.

This will be interesting competition for the existing blog search companies. It definitely responds fast at the moment, let’s see how it holds up when the next flash news crowd turns up…

via Niall Kennedy and Kevin Burton

Fun del.icio.us visualizations

SpaceNav del.icio.us visualizer
Revealicious (via Social Software Weblog):

Revealicious is a set of graphic visualisations for your del.icio.us account that allow you to browse, search and select tags, as well as viewing posts matching them.

There are three different visualization modes, SpaceNav, TagsCloud, and Grouper, which depict the relationship of tag use and frequency among your del.icio.us bookmark collection.

The Revealicious page also has links to a previous project by one of the authors, DeliciousSoup, and a post elsewhere with an extensive list of del.icio.us tools.

Del.icio.us is interesting / frustrating in that it has almost no user interface, but exposes enough of an API for 3rd parties to try building their own applications on top of the data.
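As a taste of what building on that API looks like, here’s a sketch that pulls recent bookmarks over the v1 REST interface. The endpoint path, attribute names, and authentication details are as I remember them from the del.icio.us docs, so double-check them (and throttle your requests) before relying on this.

# Sketch: fetch recent del.icio.us bookmarks via the v1 REST API (HTTP basic
# auth). Endpoint and attribute names are from memory; verify against the docs.
import urllib.request
import xml.etree.ElementTree as ET

USER, PASSWORD = "username", "password"  # placeholders for your own account

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "https://api.del.icio.us/", USER, PASSWORD)
opener = urllib.request.build_opener(
    urllib.request.HTTPBasicAuthHandler(password_mgr))

with opener.open("https://api.del.icio.us/v1/posts/recent?count=10") as resp:
    for post in ET.fromstring(resp.read()).findall("post"):
        print(post.get("href"), post.get("tag"))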

I don’t find these visualizations particularly useful on my own bookmarks, but they point toward interesting ways of exploring large sets of tags and other link relationships. Plus they look cool.

Lazy Sheep considered harmful?

Rashmi just posted some thoughts about the Lazy Sheep bookmarklet.

From the Lazy Sheep page:

Using the tags and descriptions shared by other del.icio.us users, Lazy Sheep makes tagging a page a one-click operation. In order to best suit any user, Lazy Sheep also includes a comprehensive set of options that can be configured to your exact specifications.

Rashmi’s comments:

It makes some sense at the individual level – I can gain from the wisdom of the others, without doing any work. But even at the individual level, there are disadvantages. First, the auto-tags might not capture my idiosyncratic associations (reducing findability when I look for the article later on). Second, it replaces the self-knowledge with social knowledge. Instead of a moment of reflection on my current interests, I simply find out how others think about the topic. Social knowledge in the context of self-knowledge is a beautiful thing, mere social knowledge just encourages the sheep mentality (which is the point of the bookmarklet I guess).

At the social level (which is what worries me more), if enough people started doing this, the value of del.icio.us would be diluted. We would lose some of the richness of the longtail, and just reinforce what the majority is saying. The first few people who tagged the article would set the trend – others would merely follow.

I seem to be having a lot of conversations with people lately about tagging and group search. I think of the auto tagging embodied in Lazy Sheep as an amplifier for the biases of the first few taggers. A less problematic solution would be to only use your own tags as input to the Lazy Sheep, or perhaps to select some “similar-thinking” taggers as a starting point.

I’ve been thinking about something like the latter for building a better personal search and tagging system. I’d like to be able to bias the search results based on the attention choices of people I think might be relevant, not the entire world. On the other hand, I don’t want to give up my entire clickstream for public consumption.
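As a toy illustration of the “use only your own tags” variant, here’s a sketch that suggests tags for a new bookmark from my own tagging history, scored by word overlap with past titles. The data and the scoring are made up purely for illustration:

# Toy tag suggester: rank my own past tags by how often they co-occur with
# words from the new page's title. Purely illustrative scoring.
from collections import Counter

def suggest_tags(title, my_bookmarks, top_n=3):
    # my_bookmarks: list of (past_title, [tags]) pairs from my own history
    words = set(title.lower().split())
    scores = Counter()
    for past_title, tags in my_bookmarks:
        overlap = len(words & set(past_title.lower().split()))
        for tag in tags:
            scores[tag] += overlap
    return [tag for tag, score in scores.most_common(top_n) if score > 0]

history = [
    ("Revealicious del.icio.us tag visualizer", ["tagging", "visualization"]),
    ("Google Blog Search launches", ["search", "blogs"]),
]
print(suggest_tags("Fun del.icio.us visualizations", history))
# prints: ['tagging', 'visualization']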

An aside on the tagging bias issue: Hal Abelson mentioned to me the other day that “IRC” and “Mouse” are closely related in some tag-relatedness searches, because “IRC” is associated with “Chat”, and “chat” in French means “cat”, which is related to “Mouse”.

In my case, I consciously tend not to look at what tags have already been applied, because I’m hoping in the future to apply some sort of clustering or other relatedness filters on my own bookmarks to improve searches if I eventually accumulate enough data and motivation.

I think auto tagging can be very helpful, but it might be like using PowerPoint templates: after a while everything starts turning out the same way if you’re not careful.

Google Purge – Destroying all Unindexed Information

Google Announces Plan To Destroy All Information It Can’t Index. (via Battelle’s Searchblog)

MOUNTAIN VIEW, CA—Executives at Google, the rapidly growing online-search company that promises to “organize the world’s information,” announced Monday the latest step in their expansion effort: a far-reaching plan to destroy all the information it is unable to index.

I haven’t looked at the Onion in a long time. Good fun…

The Inevitability of Blog Outsourcing

The blog outsourcing topic has rolled along while I’ve been spending the day at the Blog Business Summit, listening to discussions on commercializing blogs. There’s now a post about it (Outsourcing bloggers in China) at CNET, which turned up a few other skeptics, and it’s looking like the Blogoriented guys are probably a hoax.

Despite that, I also think it’s inevitable that we’ll see at least a couple of real projects along these lines within a year, not aimed at simulating teenaged girls, but rather at building blog networks, filled and buzzed by creating inexpensive original content and editing search feeds that target specific niches.

David Sifry at Technorati has a good summary on the growing problems of spam blogs and fake blogs, and all the search engines are likely to make progress against what are essentially the next generation of link farms. Unfortunately, as discussed in this afternoon’s sessions on web advertising and affiliate models, if you can get traffic, there’s potential for a lot of money to be made by simple manipulations of the system, at least until the search engines improve. Content picked up by the blog search engines gets indexed immediately, leaving a way around some of the sandboxing and other mechanisms used by Google and others, and making profitable links visible immediately.

It’s cheap and apparently effective to implement spam and fake blogs. I’ve noticed the volume of junk e-mail is decreasing, while the number of spam blogs in search results seems to be increasing. It’s going to take cooperation among multiple parties to fix this, but everyone recognizes this as a problem, so it’s going to get better. (Here’s Mark Cuban’s take.)

I think a follow-on issue is that genuinely “original” content, in the “first author” sense rather than the “new idea” sense, can probably be reliably cranked out through a well-defined process. Think of something like an Indian call center or coding shop crossed with a daily news bureau, supervised by an editor who picks topics with some guidance from Wordtracker, Google, and others. You’d get low-cost, original writing around an editorially consistent, topically relevant set of themes, perhaps even with some interesting domain expertise, all tuned to be informative and keyworded to be search engine friendly.

Many of the same processes used at Wipro, Infosys, and other software and BPO outsourcers could be adapted to this application. Why cheat the search engine rankings when you can just reduce the cost of production, and actually receive a ranking benefit once the search engines get better at filtering for contextually better results and get rid of the “really fake” blogs? The Weblogs Inc blog network model seems to be working so far – Jason Calacanis says they’ve just hit a $1M annual ad revenue rate. Reducing the content production costs can’t hurt. I’m sure they could apply some of these ideas, if they haven’t already, and if they don’t, some other new blog network will certainly try.

This approach to farming out process-oriented writing tasks should apply equally to a number of periodicals, such as magazines and newspapers. The difference in news content between many newspapers is often already just the local editor’s preferences applied to the AP or Reuters newsfeeds, and whatever fit in between the committed ad inches.

I don’t think this sort of blog or content outsourcing would be “bad” or “evil” in the sense of creating lower quality content, at least in some topic domains, since a pool of skilled professionals already exists offshore, and is growing rapidly. If you got a good editor in place, it might even improve the overall quality of online content. It’s not misrepresentation, unless you tried to pass off your authors as being something they’re not. But I wouldn’t even bother attempting the nuances of local US culture with a staff of offshore bloggers, despite the availability of the cultural indoctrination programs they run call center trainees through. That would work about as well as having US bloggers cover cricket or Bollywood gossip or Korean K-pop singers for their respective local audiences.

This seems to leave American pop culture as a secure niche for a while. Unfortunately, I’m incredibly bad at celebrity gossip. Although, now that I think about it, I did meet Cher once at her house in Malibu…

Putting on my evil genius hat, here’s a hypothetical approach for building an astroturfing blog empire, filled with posts from simulated teenaged (18-35) girls. Start by extracting common phrases, topics, and contexts from some LiveJournal and MySpace blogs. Next, build some auto-blogging agents resembling Weizenbaum’s Eliza program crossed with some modern chatterbots. Finally, set them loose on LiveJournal, Xanga, and MySpace and have them start forming their own blogrings and online cliques, responding to filtered inputs from comments, selected feeds, and topical news, biased for the current hot keywords and with statistically plausible content and linkage… any Emacs Lisp and SQL hackers want to take this on?
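I’m not volunteering for the Emacs Lisp or the SQL, but the core of such an agent is almost embarrassingly simple. A toy sketch, with templates and topics entirely invented (a real one would mine them from scraped feeds and comments):

# Toy auto-blogging agent: drop a currently hot keyword into a canned,
# Eliza-style template and emit a plausible-looking post.
import random

TEMPLATES = [
    "omg I can't stop thinking about {topic} today!!",
    "so everyone is talking about {topic}... thoughts?",
    "new obsession: {topic}. that is all.",
]
HOT_TOPICS = ["ringtones", "the new Harry Potter", "Laguna Beach"]  # invented

def generate_post():
    return random.choice(TEMPLATES).format(topic=random.choice(HOT_TOPICS))

for _ in range(3):
    print(generate_post())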

See also: Outsource your Blog, Reasons I Still Read Newspapers

Update 08-19-2005 12:32 – some discussion at My Heart’s in Accra

Update 08-27-2005 00:10 – See also Goofy algorithm generates web page about “Prostitute Phobia” (at BoingBoing), which comments on this site, which is one of a collection of automatically generated pages.

Google Search Result Page Changes?

[Screenshot: Google alternate search results page]

Google seems to be trying out some alternate layouts for the search results pages. This morning, I got one page with just a small Google logo next to the text box, which keeps more results on the screen, and a couple of pages with a larger box of text ads at the top, which was bad, because it pushed the useful results down the page.

I hope they keep the small logo, without the big text ads at the top. The text ads at the top would probably generate some incremental revenue for Google, but they hurt usability. For me, this is partly because I’ve gotten used to Google’s page layout, so I can’t scan the results page as quickly.

Amazon A9 Maps with Block Photo View

A first version of Amazon A9’s photo mapping project is open for business at maps.a9.com.

The block-by-block view is available for selected US metro areas, and provides a street-level view of storefronts, houses, parks, and whatever else happened to be in view when they drove by.

Here are a few sample locations to try:

  • MIT Great Dome
  • 59th street side of Central Park, New York
  • Union Square, San Francisco
Unfortunately, there’s no easy way to bookmark a location yet, so saving a particular location requires a bit of trial and error on the street address once you come across an interesting view.

via Battelle’s Searchblog

Also, at Search Engine Watch, Gary Price comments on the early coverage of Fargo, North Dakota:

So, why Fargo? A couple of weeks ago A9’s CEO, Udi Manber, told Danny:

“The reason we have Fargo is one of the engineers lives there. He took the equipment home and did the whole place in a day.”

So, I’m thinking that if you don’t have an A9 engineer living in your small town, don’t expect to see Block View imagery anytime soon. (-:

Adsense

I’m doing a little experimenting with AdSense. So far most of my pages come up with ads for “Start your blog now” or “Sexy Girls & Sexy Guys”. It’s interesting to see which posts trigger a keyword match. I have observed a few posts that have switched from generic blog ads to a topical ad after a followup visit from the Mediapartners-Google crawler. You’d think that a post on the Blackdog Linux Server, the Yahoo-Alibaba deal, or visiting the Mona Lisa at the Louvre would trip a keyword or two.

The banners are only on the single post templates at the moment, so you’ll need to click on a post to see them. There’s also a set of vertical text ads at the bottom of the sidebar. I can tell I’m probably going to end up starting on a round of site revisions by the time I’m done with this, although I’m just interested in getting a better handle on the advertising and affiliate space at the moment.

Update: 08-15-2005 23:58 – At least this post has gotten tagged with Adsense ads. It will be interesting to see which pages actually trigger clickthroughs, vs which pages get reasonable keyword tags from Adsense.
