Randomly exploring the long tail of search results

I sometimes click on a random “deep” search result page to see if anything interesting turns up, because of the limitations of popularity and PageRank for some queries.

Paul Kedrosky points at a recent paper from CMU which suggests randomly mixing in some low ranking pages may improve search results over time.

Unfortunately, the correlation between popularity and quality
is very weak for newly-created pages that have few
visits and/or in-links. Worse, the process by which new,
high-quality pages accumulate popularity is actually inhibited
by search engines. Since search engines dole out
a limited number of clicks per unit time among a large
number of pages, always listing highly popular pages at
the top, and because users usually focus their attention on
the top few results, newly-created but high-quality
pages are “shut out.”

We propose a simple and elegant solution to
this problem: the introduction of a controlled
amount of randomness into search result ranking
methods. Doing so offers new pages a chance
to prove their worth, although clearly using too
much randomness will degrade result quality and
annul any benefits achieved. Hence there is a
tradeoff between exploration to estimate the quality
of new pages and exploitation of pages already
known to be of high quality. We study this tradeoff
both analytically and via simulation, in the context
of an economic objective function based on
aggregate result quality amortized over time. We
show that a modest amount of randomness leads
to improved search results.

Link: Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results
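
The exploration/exploitation tradeoff described in the abstract can be sketched in a few lines of Python. This is only a toy illustration, not the scheme analyzed in the paper: the function name `mix_results`, the `epsilon` parameter, and the slot-by-slot coin flip are all assumptions made for the example.

```python
import random


def mix_results(ranked, candidates, epsilon=0.1, k=10, rng=None):
    """Build a result list of up to k pages.

    ranked     -- pages already ordered by popularity (best first)
    candidates -- new or low-ranked pages with no popularity signal yet
    epsilon    -- hypothetical exploration rate: the chance that a slot
                  is given to a randomly chosen candidate page
    """
    rng = rng or random.Random()
    ranked = list(ranked)          # copy so the caller's lists are untouched
    candidates = list(candidates)
    results = []
    while len(results) < k and (ranked or candidates):
        # Explore if the coin flip says so, or if no ranked pages remain.
        explore = bool(candidates) and (not ranked or rng.random() < epsilon)
        if explore:
            results.append(candidates.pop(rng.randrange(len(candidates))))
        else:
            results.append(ranked.pop(0))
    return results
```

With `epsilon=0.0` this reduces to plain popularity ranking, and raising `epsilon` trades result quality today for more information about new pages, which is the tradeoff the authors study.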

Will Google grow at this rate forever? No? Then DIE!!

Today was a moderately exciting or irritating day to be an investor in public technology companies. Google’s CFO, George Reyes, apparently forgot that he was webcasting to a public group of investors rather than conferencing with an in-house team at the Googleplex during the Q&A session at the Merrill Lynch Internet, Advertising, Information, & Education conference: (Yahoo/AP News)

Q: Looking back to Q3 2005, was there anything in there that was maybe sort of one-time in nature that accounted for such strong revenue growth…?

A: So we went through a period of probably 18 months where we thought we had…well, let me characterize it…we had what was called a RevForce initiative–Revenue Force–which was really a team of really very bright technical engineers that were trying to tweak and optimize the ad system, and not–you know in very very responsible ways [Don't Be Evil!]–and that sort of paid off nicely with the fruits of that labor.

And what’s happened since then is that we got so good and so efficient at that back then that really most of what’s left is just organic growth, which means you have to grow your traffic and you have to grow your monetization.

But so, I think, we’re now, clearly our growth rates are slowing. And you see that each and every quarter. And we’re going to have to find other ways, you know, to monetize the business.

Later in the Q&A there’s something about the “law of large numbers” ultimately limiting growth due to running out of people to look at advertising. These are high class problems to have, and these sound like perfectly intelligent comments for an internal coffeetalk or private discussion. But when your stock is trading at 72x earnings, it’s a bad thing when the CFO says “growth is slowing” to a room of investors looking for extreme growth. The response is going to be “shoot first and figure it out later”, which is what happened this morning.

Reminds me of a scene in Ghostbusters:

Gozer: Are you a God?
Ray: No.
Gozer: Then — DIE!!

Winston: Ray, when someone asks if you’re a God, you say YES!


How big is the growth rate? Pulling some data from Google’s IR site, this graph shows GOOG’s quarterly gross revenue growth for 2003-2005. The maroon line is Adsense sites, the light blue line is for Google-owned sites, and the dark blue line is the total.

One simplistic lower bound for future growth at Google would be to assume that it tracks the overall growth of internet use. I’ve inserted an additional blue line just above 4%, which is a rough estimate of the overall quarterly growth rate of the internet. I haven’t tried to find detailed data; this is from Jakob Nielsen’s Alertbox, which cites an 18% annualized growth rate from 2002 through 2005.
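
For anyone checking the arithmetic, the "just above 4%" line is the 18% annualized figure converted to a compounded quarterly rate. A minimal sketch, assuming simple compounding over four equal quarters (the function name is made up for this example):

```python
def annual_to_quarterly(annual_rate):
    """Convert an annualized growth rate to the equivalent quarterly rate,
    assuming growth compounds evenly over four quarters."""
    return (1.0 + annual_rate) ** 0.25 - 1.0


quarterly = annual_to_quarterly(0.18)
print(f"{quarterly:.2%}")  # roughly 4.2% per quarter
```

Dividing 18% by four would give 4.5%, which overstates the quarterly rate slightly because it ignores compounding.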

“We are getting to the point where the law of large numbers start to take root,” Reyes said Tuesday. “At the end of the day, growth will slow. Will it be precipitous? I doubt it.”

Google issued a press statement late in the afternoon:

As we have stated before, monetization improvements will continue to be a key factor in driving future revenue growth. We still see significant opportunities to improve monetization and intend to continue to focus our efforts in this area.

Moreover, as we have stated in our SEC filings, our revenue growth rate has generally declined over time and we expect that it will continue to do so as a result of the difficulty of maintaining growth rates on a percentage basis as our revenues increase to higher levels.

Hey, how’s that GBuy project going, anyway…

Webcast of the conference presentation (registration required)

Henry Blodget has a number of interesting posts on Google, including why he doesn’t own it, approaches to valuation, the most recent earnings, and today’s adventures.

The Google analyst day coming up this Thursday should be pretty interesting. Might be worth trying to catch the webcast. Bet George is getting some extra practice in.

Harmony and Disharmony – Organizational issues in Al-Qaida and startups

There’s an interesting new report out today from the Combating Terrorism Center at West Point (the US Military Academy), titled “Harmony and Disharmony: Exploiting Al-Qa’ida’s Organizational Vulnerabilities”, which has some useful insights for entrepreneurs and corporate managers as well as for those dealing with global jihadist movements or with a general interest in global security issues.

The report is based on a collection of captured documents which have been recently declassified, and examines some of the strengths and weaknesses of the Al-Qa’ida organizational structure. The merits of a 21st-century, networked, mobile, internet-enabled insurgency have been observed elsewhere at length, as summarized by James Na at Korea Liberator:

Martin van Creveld of Hebrew University, the author of the highly influential Transformation of War who has been lauded (including by me) as a leading prophet of military transformation, even went on to suggest that the small/weak would always beat the big/strong in a long war. (The stronger side is more constrained in methods; it also loses morale more rapidly from inability to defeat the weak completely over a long period of time; on the other hand, the weaker side often enjoys a more flexible, networked organization, and has a faster decision making cycle, i.e. the OODA loop).

The captured documents (available online in both original Arabic and translated English) have a remarkably familiar feel to them. Take out the parts about politics, religion, and carrying out jihad, and it looks kind of like an odd startup, with position descriptions (“must have work experience of no less than 5 years and have complete military operational experience in the battlefront and bases”), employment contracts (“vacation requests must be submitted two and a half months before the travel date”), and bylaws (“Goals – To spread the feeling of Jihad throughout the Muslim nation”).

Part of what makes the report interesting is that it’s based on Al-Qa’ida’s own self-assessment of what’s working and what isn’t working. Here are some sample items from a post-mortem summary of Al-Qa’ida’s experience in Syria:

  1. Absence of an advanced comprehensive plan and strategy
  2. The faithful mujahideen were spread among numerous organizations
  3. Failure to explain the mujahid revolutionary theory and clarify its objectives on an ideological level
  4. Low level of religious instruction and scarcity of revolutionary and political awareness
  5. Dependence on quantity after the 1st blow did away with the quality
  6. Weak public relations campaign both inside and out
  7. Dependence of the mujahideen on outside sources for support instead of being self-sufficient
  8. Getting bogged down in long term gang warfare unsuitable for the country
  9. Moving out of the country for an extended period of time, losing touch with the masses, and the decline of the religious and revolutionary level among the members
  10. Not benefiting from the Islamic and international gang warfare experiences
  11. Dealing with the neighboring regimes as if they were permanent supporters of jihad
  12. Operating publicly was a grave error
  13. Deficiency of military operations on the outside and failure to deter the enemy and their friends
  14. No planning for the aftermath of the regime
  15. Not rallying around the religious scholars and benefiting from them

A lot of this looks like the “before” part of a management consulting project.

Some items here remind me of Noel Tichy’s views on management, on the need for aligning ideas and values to achieve effective action within the organization. At the same time, many of their operational problems are linked to “agency” problems: situations in which individuals or affiliates have an incentive to act in their own interest rather than the organization’s, and which get worse in the presence of personal risk and operational secrecy. This tends not to happen as much in companies, but there are still spectacular failures from time to time (think Enron’s SPEs).

If you’re interested in thinking about startup organizations and competition from a very different perspective, check it out.

Update 02-14-2006 23:37 PST: You may also be interested in “Unrestricted Warfare”, on asymmetric warfare, a 1999 paper by senior Chinese PLA officers, and Scott Maxwell’s recent series of posts, “How David can beat Goliath”.

Update 03-08-2006 10:58 PST: You may be interested in “Stealing Al-Qaida’s Playbook” which reviews other writings from active jihadists, also from the Combating Terrorism Center, although it’s probably less useful in a business context than the ideas on asymmetric warfare.

25 years of the HP12C

Today’s Wall Street Journal has an ad from HP noting the 25th anniversary of the HP12C calculator.

Unlike most contemporary personal computing technology, the old HP calculators have been nearly indestructible and are utterly reliable. This may have limited the market for HP calculators, in that there aren’t any consumables and there isn’t much of a replacement cycle either, but it’s a relic of the old-school HP that also made indestructible electronic bench equipment and atomic clocks (and mostly turned into Agilent). HP still seems to sell enough new units to keep them in production.

I’m not sure exactly how old my calculator is at this point, but it dates back to some time in the early ’80s, in the days before personal computers and ubiquitous internet access on college campuses, when being able to run repeated calculations without heading to the computer lab was both a luxury and a competitive advantage. At the time I also had an HP 15C and 16C, which were well-used in various projects before going on “permanent loan” years ago.

At this point my remaining 12C has been around the world several times, and the batteries haven’t been changed since sometime around the dot-com boom.

Some very good calculator software applications (including emulations of various HP calculators) are now freely available for PCs, and nearly-disposable plastic calculators are often distributed as promotional novelties.

I suspect that calculators like the 12C may be turning into something like fine pens. There’s little intrinsic, functional rationale for them at this point, but I enjoy using mine nonetheless. It turns on and off instantly without a fuss, it is dense and substantial without being too heavy, has the best keyclick feel ever, and is much better at being a calculator than a cellphone, PDA, or notebook computer is ever likely to be (…once you learn RPN). Like everyone else, I often write with a word processor of some sort, but I like to draft on paper from time to time, because writing with a good pen can make you think differently than typing into a display. I find that working with calculator and paper can have a similar feel. Sometimes computer productivity tools are better at creating the appearance of substance than at facilitating the creation of actual substance.

Google and magazine covers as a contrary indicator

Is Google headed for a downturn? Not only is it featured in a generally negative cover article in this week’s Barron’s, but now it’s featured on the cover of Time as well. These magazines cater to very different audiences, so turning up on both at the same time could be considered a sign that Google is reaching a peak of sorts on both the financial and general cultural fronts.

There’s a long tradition of things going badly for companies and people after getting this sort of high profile magazine cover treatment. If Google turns up next on the cover of People or Entertainment Weekly they’re probably doomed…

Update 02-12-2006 18:31 PST: John Battelle suggests that having made the cover of Time, Google has “jumped the shark”, while Matt Cutts offers a recent historical perspective of Google’s non-shark-jumping behavior while simultaneously demonstrating effective link baiting technique.

I don’t consider myself an expert on shark-jumping, but I do think that hitting the covers of Barron’s and Time is qualitatively different than the counter-examples that Matt offers. Google is transitioning out of being loved for being better, new, and whizzy, and into a stage where people expect it to “just work”. Google has gotten large enough that people are developing a love/hate relationship with it (and web services in general) like they have with e-mail, and where the discussion about privacy, media, and commerce is just starting to get some critical attention from people outside tech land.

Dae Han Min Kook!

dae-han-min-kook costa-rican-section

Yesterday I went to see an exhibition match between the Korean and Costa Rican National Teams at the Oakland Coliseum. These are basically training games for the World Cup series starting later this year.

The Korean team did unexpectedly well in the last World Cup series in 2002, making it all the way to the semifinals, which precipitated huge street celebrations and instant celebrity status for the entire team. My wife, who generally has no interest in organized sports, was getting up at 3 in the morning to watch the games on Telemundo, which is representative of the level of interest among the general Korean community.

It’s fascinating to me to see that many Koreans in one place. As you can see in the pictures, the Korean fans mostly wore red (the team is called the “Red Devils”). Many people also had those red plastic things which seem to be mostly for the clapping part of the cheer “Dae Han Min Kook – clapclap clapclapclap”. It’s extremely loud when it gets going, and very impressive. I enjoyed the fact that everyone from young kids to elderly halmonis and harobogis were there and having an enthusiastic time together. Judging from the vehicles in the parking lot yesterday, some of the Korean churches in the area organized carpools for their members to the game in church vans.

There was a much smaller section of Costa Rican fans. Costa Rica won the match, 1-0, which gave them something to cheer about too, but the Korean side appeared to play better overall, with about 10 attempts on goal (of which two bounced off the post) vs 1 for the Costa Ricans, and seemed to have the ball most of the time.

Next week the Korean team is playing the Mexican National Team in Los Angeles. I suspect there will be a larger turnout on behalf of the Mexican team down there, although there are also many more Koreans in L.A. than here in the Bay Area. The good news is, the match is being carried live on Telemundo, so we get to watch it up here.

Future of Web Apps workshop


I had been trying to arrange my schedule to get to the Future of Web Apps workshop this week in London, but sadly things didn’t work out. Actually, I didn’t even manage to get to last night’s SearchSIG to see edgeio’s first public demo here in the Bay Area, so perhaps it’s not surprising I couldn’t get a trip to the UK sorted out.

The good news is, there’s a conference wiki with lots of presentation notes, including comments on del.icio.us, discussions on how Flickr evolved, some thoughts on approaches to building discoverable URLs for data, the merits of Ruby on Rails, and a detailed discussion on the implementation approach and specific costs for the DropSend service.

A Digital Pantheon with D&D character alignments

This randomly turned up while I was looking into something else and will make absolutely no sense to you unless you have played Dungeons and Dragons at some point in your life.

Digital Pantheon

Lawful Good: Steve Jobs (Apple / Pixar)

Neutral Good: Larry Page/Sergey Brin (Google)

Chaotic Good: Linus Torvalds (Linux)

Lawful Neutral: Bill Gates (Microsoft)

True Neutral: Peter Norton (Norton Utilities / Antivirus)

Chaotic Neutral: Shawn Fanning (Napster)

Lawful Evil: Nobuyuki Idei (Sony)

Neutral Evil: Steve Case (AOL)

Chaotic Evil: Ruslan Ibragimov (Spammer / SoBig virus)

Original post at LiveJournal, with comments.

See also: Wikipedia entry on character alignment in role playing games.

Reverse engineering a referer spam campaign

It looks like someone’s launched a new referrer spam campaign today; there’s a huge uptick in traffic here. The incoming requests are from all over the internet, presumably from a botnet of hijacked PCs, but it looks like all of the links point to a class C network at 85.255.114 somewhere in Ukraine.

It’s interesting to think a little about link spam campaigns and what opportunity the operators hope to exploit. Two major types of link spam on blogs are comment spam and referrer spam. My perception is that comment spam is more common. Most blogs now wrap outgoing links in reader comments with “rel=nofollow” to prevent comment links from increasing Google rank for the linked items, but the links are still there for people to click on.

Referrer spam is more indirect. It is created by making an HTTP request with the Referer header set to the URL being promoted. Most of the time, this will only be visible in the web server log.

Here is a typical HTTP log entry:

87.219.8.210 	[04/Feb/2006:15:20:35 	-0800]
    GET 	/weblog/archives/2005/09/15/google-blog-search-referrers-working-now 	HTTP/1.1
    403 	- 	"http://every-search.com"

Some blogs and other web sites post an automatically generated list of “recent referrers” on their home page or on a sidebar. In normal use, this would show a list of the sites that had linked to the site being viewed. Recent referrer lists are less common now, because of the rise of referrer spam.

Referrer spam will also show up in web site statistic and traffic summaries. These are usually private, but are sometimes left open to the public and to search engines.
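
As a rough illustration of how a log entry like the one above could be screened, here is a minimal Python sketch. It assumes the referrer is the last double-quoted field on the line, as in the sample entry; the blocklist and the function names are hypothetical, not taken from any real log analysis tool.

```python
import re
from urllib.parse import urlparse

# The sample log entry from above, joined onto a single line.
LOG_ENTRY = (
    '87.219.8.210 [04/Feb/2006:15:20:35 -0800] '
    'GET /weblog/archives/2005/09/15/google-blog-search-referrers-working-now '
    'HTTP/1.1 403 - "http://every-search.com"'
)

# Hypothetical blocklist of referrer hosts already identified as spam.
BLOCKED_HOSTS = {"every-search.com"}


def referrer_host(entry):
    """Return the host name of the last double-quoted field, or None."""
    quoted = re.findall(r'"([^"]*)"', entry)
    if not quoted:
        return None
    # hostname is None for values that are not URLs, such as "-".
    return urlparse(quoted[-1]).hostname


def is_referrer_spam(entry, blocked=BLOCKED_HOSTS):
    """True if the entry's referrer host appears in the blocklist."""
    host = referrer_host(entry)
    return host is not None and host.lower() in blocked


print(referrer_host(LOG_ENTRY), is_referrer_spam(LOG_ENTRY))
```

A fixed list of hosts is the simplest possible check. A campaign like this one rotates through many domains, so matching on the resolved IP range of the referrer is likely to hold up better.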

One presumed objective of a link spam campaign is to increase the target site’s search engine ranking. In general this requires building a collection of valid inbound links, preferably without the “nofollow” attribute. Referrer spam may be more effective for generating inbound links, since recent referrer lists and web site reports typically don’t wrap their links with nofollow.

The landing pages for the links in this campaign are interesting in that they don’t contain advertising at all. This suggests that this campaign is trying to build a sort of PageRank farm to promote something else.

The actual pages are all built on the same blog template, and contain a combination of gibberish and sidebar links to subdomains based on “valuable” keywords. Using the blog format automatically provides a lot of site interlinking, and they also have “recent” and “top referer” lists, which are all from other spam sites in the network.

It looks like the content text should be easy to identify as spam based on frequency analysis. Perhaps having a very large cloud of spam sites linking to each other along with a dispersed set of incoming referrer spam links makes the sites look more plausible to a search engine? These sites don’t appear to have any, but I have come across other spam sites and comment spam posts that have links to non-spam sites such as .gov and .edu sites, perhaps trying to look more credible to a search engine ranking algorithm. All the sites being on the same subnet makes them easier to spot, though.

Given that there aren’t that many public web site stat pages and recent referrer lists around, I’m surprised that referrer spamming is worth the effort. If the spam network can achieve good rankings in Google and the other search engines, they can probably boost the ranking for a selected target site by pruning back some of their initial links and adding some links pointing at the sites that they want to promote. Affiliate links to porn, gambling, or online pharmacy sites must pay reasonably well for this to work out for the spammers.

More reading: A list of references on PageRank and link spam detection.

If you’re having referrer spam problems on your site, you may find my notes on blocking referer spam useful.

Here’s some sample text from “search-buy.com”:

I search-buy over least and and next train. Ne so at cruelty the search-buy in after anaesthesia difficulty general urinating. T pastry a ben for search-buy boy. An refuses trip search-buy romances seemed azusa pacific university ca. Stoc of my is and search-buy direct having sex teen titans. Kid philadelphiaa would and york search-buy. G search-buy wore shed i dads. obstacles future search-buy right had satire nineteenth. The that i ups this on search-buy least finds audio express richmond. have this window been wonderful me search-buy so. Surel in actually search-buy our boy deep franklin notions. An search-buy it of my has of. To at head boy that a search-buy. O james search-buy everywhere of but. Alread originate search-buy good about since.

Here are a few spam sites from this campaign and their IP addresses:

bikini-now.com          A       85.255.114.212
babestrips.com          A       85.255.114.229
search-biz.biz          A       85.255.114.245
bustytart.com           A       85.255.114.250
cjtalk.net              A       85.255.114.227
search-galaxy.org       A       85.255.114.252
moresearch.org          A       85.255.114.237

Here is the WHOIS output for that netblock:

% Information related to '85.255.112.0 - 85.255.127.255'

inetnum:        85.255.112.0 - 85.255.127.255
netname:        inhoster
descr:          Inhoster hosting company
descr:          OOO Inhoster, Poltavskij Shliax 24, Kharkiv, 61000, Ukraine
remarks:        -----------------------------------
remarks:        Abuse notifications to: abuse@inhoster.com
remarks:        Network problems to: noc@inhoster.com
remarks:        Peering requests to: peering@inhoster.com
remarks:        -----------------------------------
country:        UA
org:            ORG-EST1-RIPE
admin-c:        AK4026-RIPE
tech-c:         AK4026-RIPE
tech-c:         FWHS1-RIPE
status:         ASSIGNED PI
mnt-by:         RIPE-NCC-HM-PI-MNT
mnt-lower:      RIPE-NCC-HM-PI-MNT
mnt-by:         RECIT-MNT
mnt-routes:     RECIT-MNT
mnt-domains:    RECIT-MNT
mnt-by:         DAV-MNT
mnt-routes:     DAV-MNT
mnt-domains:    DAV-MNT
source:         RIPE # Filtered

organisation:   ORG-EST1-RIPE
org-name:       INHOSTER
org-type:       NON-REGISTRY
remarks:        *************************************
remarks:        * Abuse contacts: abuse@inhoster.com *
remarks:        *************************************
address:        OOO Inhoster
address:        Poltavskij Shliax 24, Xarkov,
address:        61000, Ukraine
phone:          +38 066 4633621
e-mail:         support@inhoster.com
admin-c:        AK4026-RIPE
tech-c:         AK4026-RIPE
mnt-ref:        DAV-MNT
mnt-by:         DAV-MNT
source:         RIPE # Filtered

person:         Andrei Kislizin
address:        OOO Inhoster,
address:        ul.Antonova 5, Kiev,
address:        03186, Ukraine
phone:          +38 044 2404332
nic-hdl:        AK4026-RIPE
source:         RIPE # Filtered

person:       Fast Web Hosting Support
address:      01110, Ukraine, Kiev, 20A, Solomenskaya street. room 201.
address:      UA
phone:        +357 99 117759
e-mail:       support@fwebhost.com
nic-hdl:      FWHS1-RIPE
source:       RIPE # Filtered
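
The observation that all of these sites sit in one netblock can be checked mechanically. The sketch below uses Python's standard `ipaddress` module with the addresses and the inetnum range listed above; the helper name `in_netblock` is an assumption for the example.

```python
import ipaddress

# Site addresses copied from the list above.
SPAM_SITES = {
    "bikini-now.com": "85.255.114.212",
    "babestrips.com": "85.255.114.229",
    "search-biz.biz": "85.255.114.245",
    "bustytart.com": "85.255.114.250",
    "cjtalk.net": "85.255.114.227",
    "search-galaxy.org": "85.255.114.252",
    "moresearch.org": "85.255.114.237",
}

# The inetnum range from the WHOIS record, converted to CIDR form.
first = ipaddress.ip_address("85.255.112.0")
last = ipaddress.ip_address("85.255.127.255")
netblocks = list(ipaddress.summarize_address_range(first, last))


def in_netblock(ip, blocks):
    """True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in block for block in blocks)


flagged = {host: in_netblock(ip, netblocks) for host, ip in SPAM_SITES.items()}
print(netblocks)
print(all(flagged.values()))
```

The same test run against the address of an incoming request, such as 87.219.8.210 from the log entry earlier, comes back negative. That fits the picture of a scattered botnet making the requests on behalf of a tightly clustered set of landing sites.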

Wireless at the car dealer

More overhead of life today, since the house is temporarily shut down. This morning I’m at the car dealer for annual maintenance. Fortunately, they’ve remodelled since last year, added free wireless service and fixed the coffee machine which brews Starbucks on demand, so I’m having a relatively productive morning. Power is still up at the house so I can get to those systems and everything else is on hosted services already.

Take that, termites!

IMG_5821 IMG_5827

The rains a few weeks ago seem to have prompted the arrival of roving termites, so yesterday we vacated the house and filled it with Vikane. I spent part of the day working in the office while they were wrapping the house with a tent. By the time they finished, it produced an interesting lighting effect.
IMG_5824

Update 02-02-2006 20:55 PST:
Safe to re-enter?
We’re back in the house today, after the inspector declared it safe to re-enter. I’m vaguely disappointed that there aren’t more random dead bugs lying around the house, although the termites wouldn’t be visible in the walls and underground, and there actually were a few dead moths and stray houseflies here and there. Of course, now we’ve let in a whole new set of wandering insects, since we’ve had the windows and doors open all afternoon to air out the house.
IMG_5833
Here’s a good photo of some visible termite activity. Not what you want to see in your linen closet.

Consumables and the decline of recording studios

Today’s Wall Street Journal (January 24, 2006) has a short profile of Paul Motian, an outstanding jazz drummer who was part of the Bill Evans Trio in the early 1960s. (If you haven’t heard of Bill Evans and have any interest in jazz piano, I highly recommend checking out their recordings).

What caught my attention, however, was this comment from Paul Motian on the decline of the recording studio business:

“A lot of recording studios are closing because people don’t use tape anymore, and that’s where the recording studios make their money. Everyone comes in with their hard drive, puts it on their computers.”

I still have a bunch of 1-inch 16-track master tapes somewhere out in the garage and remember spending a relative fortune on studio time and services, back in the 80s, probably the waning days of multitracking and overdubbing by hand on a mixing board. The Cars were wildly successful at the time and had opened a state-of-the-art studio at Synchro Sound, which was starting to use digital recording systems, but which far exceeded our band’s budget.

There’s still no substitute for good microphones, but these days digital mastering to hard disk is a big win over tape.

I’d never thought about the recording tape as being a critical profit driver for a recording studio, but in retrospect it makes some sense. When the only copy of your work is on a little strip of magnetic film shuttling back and forth on open reels, who’s going to buy cheap tape?

No Bluepulse for you!


The other day Oliver Starr at MobileCrunch wrote a rave review of Bluepulse, a new mobile application platform. In a quick read through their website, it looks like they’re trying to offer a carrier-independent path for 3rd party mobile application developers to reach mobile users.

Bluepulse is planning to develop applications for customers, as well as rev share with 3rd party developers, and offers a free SDK. Getting applications onto a wireless carrier’s network is a pain, and getting paid for them is also painful, so there are some good opportunities here, and I thought I would give it a try on my Nokia 6820.

The application downloaded and installed, but nothing happened, so after a few tries I sent off a message on the Bluepulse web site, and got a quick response from Stuart Hely, their general manager.

Unfortunately, it turns out that while the Nokia 6820 is capable of downloading and installing the Bluepulse application (which is needed to use other Bluepulse-hosted applications), it can’t actually run the Bluepulse application. No Bluepulse for you!

Our technical support guys have looked into the issue you raised and the bad news is that we can’t squeeze bluepulse onto the Nokia 6610 as the memory size required JUST exceeds the phones capacity even though the bluepulse file is very small.

Thanks for your query and sorry about the fact that you can’t have bluepulse on your current phone. We hope that with your next phone, you will be able to enjoy bluepulse.

Sounds like an interesting idea, but there might be some handset deployment issues for a while. I haven’t been keeping close track of handset capabilities, but mine’s about a year old, so anything since then is probably OK.

I’ve been having good results with my new Bluetooth headset, so I may consider switching to a phone with a bigger screen that is better at running applications sometime. I’ve been moving steadily toward carrying less and smaller equipment for the past few years, though, and have been resisting switching to a Treo, Blackberry, or any other PDA-like device, partly because of the bulk.

P.R.A.S.E. – PageRank assisted search engine – compare ranking on Google, Yahoo, and MSN

P.R.A.S.E., aka “Prase” is a new web tool for examining the PageRank assigned to top search results at Google, Yahoo, and MSN Search. Search terms are entered in the usual way, but a combined list of results from the three search engines is presented in PageRank order, from highest to lowest, along with the search engine and result rank.
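
The presentation described above amounts to a merge and sort. Here is a minimal sketch of that idea in Python; it is not how Prase itself is implemented, and the `Result` type, the function name, the URLs, and the PageRank values are all placeholders invented for the example.

```python
from typing import NamedTuple


class Result(NamedTuple):
    engine: str    # which search engine returned the page
    rank: int      # position in that engine's own result list
    url: str
    pagerank: int  # toolbar-style PageRank score, 0 to 10


def merge_by_pagerank(*result_lists):
    """Combine several result lists and order them by PageRank,
    highest first, breaking ties by the engine's own rank."""
    merged = [r for results in result_lists for r in results]
    return sorted(merged, key=lambda r: (-r.pagerank, r.rank))


# Placeholder data, not real search results.
google = [Result("Google", 1, "http://example.com/a", 7)]
yahoo = [Result("Yahoo", 1, "http://example.org/b", 0),
         Result("Yahoo", 2, "http://example.net/c", 8)]

for r in merge_by_pagerank(google, yahoo):
    print(r.pagerank, r.engine, r.rank, r.url)
```

Laid out this way, a top-ranked result with a PageRank of 0 drops to the bottom of the combined list.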

I tried a few search queries, such as “web 2.0”, “palo alto”, “search algorithm”, “martin luther king”, and was surprised to see how quickly the PageRank 0 pages start turning up in the search results. For “web 2.0”, the top result on Yahoo is the Wikipedia entry on Web 2.0, which seems reasonable, but it’s also a PR0 page, which is surprising to me.

As a further experiment, I tried a few keywords from this list of top paying search terms, with generally similar results.

PageRank is only used by Google, which no longer uses the original PageRank algorithm for ranking results, but it’s still interesting to see the top search results from the three major search engines laid out with PR scores to get some sense of the page linkage.


Tagnautica – fun Flickr tag navigator

Tagnautica is a fun and interesting Flash user interface for exploring and navigating among tags, in this case on Flickr. After keying in an initial tag, related tags are displayed in a circle, with a sample image from each tag category displayed in a representative size.

When you move the cursor over a tag bubble, it temporarily becomes larger so you can get a look at it. The other bubbles keep resizing as well, giving the interface a very fluid appearance. When you find something you like, you can click on the Tagnautica bubble to view the tag page over at Flickr.

I always enjoy these sorts of user interfaces for semi-random exploration. I’ve noticed that I don’t really use any of the cool visualization tools when I actually want to find something, though. Not sure if that’s because they don’t represent a useful set of questions as implemented yet, or simply because my brain doesn’t work that way.

I find I experience these interfaces more as pleasant interactive art than as useful data navigation tools. One of these days I’m sure something is going to click, though.

Watching 4th graders use search engines

Last Friday I spent an hour with my daughter’s 4th grade class, helping them do online research for reports on early California explorers. They were individually assigned an explorer, and were looking for basic biographical information such as dates and places of birth and death, and notable historical achievements or other interesting items to write about. From my perspective, this turned out to be a sort of small focus group on using search engines.

I spend most of my time around people who are pretty good at using search engines and online research tools, so it was interesting to see what they would do with this assignment.

The kids are all familiar with computers to varying degrees. They have had classroom activities using the computer at least once a week since kindergarten, and most of them have some experience using computers at home (this is Palo Alto, after all). I don’t think they’ve done any organized “internet research” in school up to this point, though.

They all started with their research subject’s name written on a piece of paper and had about 20 minutes to find some useful information.

Here are some observations:

  • Simply typing in the names of the explorers was challenging for many of them (“Joseph Joaquin Moraga”, “Ivan Alexandrovich Kuskov”, and others I can’t recall).
  • They often tried to type the search phrase into the address bar. I also saw at least one person try to type the search phrase into a form entry field in an advertisement.
  • Their default home page is set to Yahooligans!, which is kid friendly but seems to sharply limit the search results. I had the kids try their queries there first, but most of them returned zero search results.
  • I then let the kids choose which search engine they wanted to use. About a third of the kids voluntarily expressed a preference for using Google, most of the rest didn’t know or care (I sent about half to Yahoo and half to Google), and one kid really wanted to use A9 (strange, I didn’t have a chance to find out why).
  • None of the kids were familiar with using quote marks to specify exact phrase matching. Some of the explorers’ names contain commonly occurring components and return a large number of irrelevant results without quotes.
  • None of the kids were familiar with the advanced search operators for excluding or qualifying search results. I had to help out in a couple of cases where they were having trouble finding relevant pages.
  • Some of them didn’t understand the difference between page content and the ads in the headers, footers, and sidebars.
  • Some of them were already familiar with Wikipedia, and with both the benefit and the problem that anyone can change the page. One person wanted to look exclusively on Wikipedia after the subject came up.
  • The absence of a bookmarking system for the students to use tends to force them to print out pages they want to use later. This isn’t wonderful at a school lab, since the content is semi-disposable and they’re usually scrounging to conserve printer consumables like toner and paper. The kids liked having something to take back to the classroom with them, though.
  • The variations in spelling for the mostly Spanish names caused problems for some queries. Google’s “did you mean” suggestions were helpful. At least one query (which I can’t recall) consisted entirely of common Hispanic names, which matched several famous people other than the intended query subject. This is similar to the problem of searching on common Asian names (like mine).
  • Some students quickly clicked themselves into a rathole of completely unrelated pages, usually after clicking on an ad.
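
For anyone who hasn't used them, the two operators mentioned in the list above (quote marks for an exact phrase, and a leading minus sign to exclude a word) are just part of the query text. A minimal Python sketch, where `build_query` is a hypothetical helper and the extra and excluded words are arbitrary examples:

```python
def build_query(phrase, extra_terms=(), exclude=()):
    """Build a search query string.

    phrase      -- matched as an exact phrase (wrapped in quote marks)
    extra_terms -- additional keywords to narrow the results
    exclude     -- words to exclude (each prefixed with a minus sign)
    """
    parts = [f'"{phrase}"']
    parts.extend(extra_terms)
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


# Arbitrary example terms; only the explorer's name comes from the class.
query = build_query("Ivan Alexandrovich Kuskov",
                    extra_terms=["biography"],
                    exclude=["genealogy"])
print(query)
```

Typing the resulting string into the search box is all it takes, but that is exactly the kind of refinement that was hard for the kids to do on their own.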

Watching the kids trying to find useful pages highlighted the differences with my usual search behavior, which is to quickly scan the search results page, then refine the query using additional keywords and/or search operators, both of which are hard for 9- and 10-year-olds to do. In “research mode” I usually open results in a new browser tab or window. The kids actually click through the link, making it hard to work through a list of candidate results.

Coincidentally, earlier this week I came across a post on Google Blogoscoped which points to a recent dissertation on search user interface design geared towards kids, by Hilary Browne Hutchinson at the University of Maryland, which has some interesting observations and ideas.

VoicePulse – Hasn’t signed new subscribers since November 2005 due to E911?


The VoicePulse signup problem I described earlier today seems both worse and sillier than before. They apparently stopped signing up new subscribers at the end of November 2005, due to non-compliance with the FCC E911 requirements. They’re currently doing integration testing with Intrado for 911 service as well as negotiating with the FCC on what constitutes an acceptable solution, with an expected resolution sometime in January 2006.

Here’s someone who ran into a similar signup problem (although I didn’t get a warning prompt about no E911 today):

It turns out that Voicepulse isn’t selling new service at all right now. Of course it’s all the big bad FCC’s fault (never mind the fact that many other VOIP providers are selling new service at the moment, and many of them are providing usable 911 service.) I’m sure the FCC is making it hard on these providers, since the old-line phone companies are pulling the strings, but a) other companies are currently selling new service (I proved this to myself, I ordered VOIP service from a known-good provider) and b) many of these other companies are providing 911 and E911 services.

I spoke to a Voicepulse representative who did confirm that they’re not selling ANY new service at all, and don’t know when they will be again. Of course, he said it would be “soon” and the delay was entirely because they were waiting for replies from the FCC. When I commented that it might be a good idea to announce that BEFORE potential customers spend 20 minutes filling out information on their site only to be told that they couldn’t buy anything, he said that “had been discussed in meetings and it was decided to put the message where it is because that’s where the 911 disclaimer already was in the ordering process.” I suggested that he start looking at the help-wanted ads, because I didn’t think an inbound phone sales rep was going to have a job very long at a company that isn’t selling anything, and it couldn’t be satisfying to answer calls from irritated potential customers all day.

My existing VoicePulse line has been working fine, and they haven’t asked for E911 location profile data yet. I have been following the news on VOIP E911 requirements over the past few weeks, but was under the (false) impression that most of the US VOIP service providers had gotten various combinations of deadline extensions from the FCC and technical solutions in place.

This thread lists the current E911 status of US VOIP providers as of January 8th:

[VoicePulse] Not Taking New Orders? (DSL Reports)

Tried to order VP today and was rejected because of the 911 fiasco. So I can’t even order it even if I understand and agree to the 911 situation?

Nope, thanks to the FCC they need to get e911 before they can sell service again.

No, applies only to the VOIPs that failed to get their 911 house in order during the time allowed by the FCC. Of the well-known brands that would include Voicepulse, Lingo, Nuvio. The others managed to get it done and are selling right now: Vonage, Sunrocket, Viatalk, Packet 8, Broadvox, ATT CallVantage (in about 70% of their markets.)

I’m astonished that VoicePulse appears to have gone for nearly two months with a known-broken signup process (and presumably no new subscribers) without mentioning that detail on their website. They also appear to have a lot of company.

It looks like I’ll need to do a bit of work to find an alternate provider, assuming that VoicePulse isn’t able to take orders by tomorrow. I’m trying to set up a phone number in the Malibu, California service area, and would prefer to use an existing SPA-2002 or SPA-3000, rather than buying another adapter. The E911 aspect is irrelevant as the physical IP connection will be here in the Bay Area most of the time but forwarded to various other locations.

VoicePulse – how not to implement a customer feature transition


I just got off the phone with VoicePulse, my current VOIP service provider. They are demonstrating how not to manage a web service feature transition today, by both turning away new customers and annoying their existing ones.

I’ve been relatively happy with VoicePulse, having signed up with them a few months ago for commercial US PSTN access. The voice quality and stability have been OK, and they also offer IAX access, which I was thinking about using for future integration with our Asterisk implementation.

All day today I’ve been trying to add a new device and a new number to my existing account. The sign up process requires entering the serial number and MAC address from the VOIP adapter (in this case, a SPA-2002 I picked up a few days ago), selecting a telephone number, and providing contact and billing information. I noticed that since I signed up for my account a few months ago they’ve started collecting E911 contact information, and added some verbiage explaining the limitations of VOIP’s 911 service (i.e. they don’t really have any idea at all where you are).

The process only takes a few minutes, so I’ve been trying it in between various other tasks today, expecting that it wouldn’t take very long. Each time I’ve tried it, I get an error page at the end.

Sorry!

You have encountered a problem while going through the ordering process. This is usually due to your session expiring if the browser was left unattended for too long.

If you have encountered an error with our ordering system, VoicePulse’s development team has been automatically notified.

Please close this window, go back to www.voicepulse.com in a few minutes and try again. If you continue experiencing problems, please call 732-339-5100 M-F 9am-7pm EST to place your order with a customer service representative.

The first couple of times it seemed vaguely plausible that the session might have timed out, but the third time I went straight through all the forms, now well practiced and fully equipped with all the information. Still got the error message. This time I called the customer service number.

According to the VoicePulse phone rep, their system is unable to accept any new orders at all today. They’re apparently rolling out changes to their order application, related to the E911 service that I observed during the signup process. Here are some observations:

  • The VoicePulse customer service rep I spoke with didn’t learn about their phone order application being out of service until this morning. You’d think that they’d give their own CSR team advance notification about a planned application outage.
  • The VoicePulse web application team didn’t bother to build a page indicating that they were unable to accept new orders, and that customers keying in any user account data (like me) would be wasting their time.
  • The VoicePulse web application team left the existing failed-signup message in place. Although “true”, it’s misleading, since the site failure has absolutely nothing to do with the session timeout, and they know that the order process could never have worked in its current state.
  • It didn’t sound like they had a committed “time to fix”. The CSR said it should be tomorrow afternoon sometime, but the fact that the CSRs weren’t told about the outage until this morning makes me think it might not have been planned. They suggested I call back tomorrow to see if it was working before trying to place an order. Ugh.

I can’t think of a good rationale for not blocking new orders on their site and putting up a maintenance message of some sort. Maybe they didn’t want people to know they couldn’t take orders?

I can’t think of a good rationale for not telling the customer service department ahead of time.

I suspect that most customers would be unhappy about keying in the 12-digit MAC address and 12-digit serial number, along with their credit card data, and having VoicePulse’s order processing application choke on it repeatedly, especially when VoicePulse already knows it won’t work. A lot of them don’t know how to cut and paste from the Sipura’s configuration page, and are vaguely uncomfortable with giving out their credit card numbers online as well.
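
As an aside on how fiddly that 12-digit MAC entry is: a signup form can at least accept the address in whatever format the customer copies it and check it up front. The following Python sketch is purely illustrative and has nothing to do with how VoicePulse's form actually works; `normalize_mac` is a hypothetical helper and the sample addresses are made up.

```python
import re


def normalize_mac(text):
    """Return a MAC address as 12 uppercase hex digits, or None if invalid.

    Common separators (colons, hyphens, dots, spaces) are stripped first,
    so a value pasted from a device label or configuration page still works.
    """
    digits = re.sub(r"[\s:.\-]", "", text).upper()
    if re.fullmatch(r"[0-9A-F]{12}", digits):
        return digits
    return None


print(normalize_mac("00:11:22:aa:bb:cc"))  # 001122AABBCC
print(normalize_mac("001122AABBC"))        # None (only 11 digits)
```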

I am a relatively patient person, but I’m astonished at the poor planning and execution exhibited at VoicePulse today. They either can’t plan and manage basic site upgrades, or they’re trying to hide some unexpected maintenance work.

If anyone has a VOIP carrier that they actually like, as opposed to simply tolerate, let me know. I may be looking for a new service provider soon.

SearchSIG – January 2006

This evening’s SearchSIG featured a panel discussion on tagging and social bookmarking.

L-R: Joshua Schachter (del.icio.us), Kevin Rose (Digg), Michael Tanne (Wink), Manish Chandra (Kaboodle)

Charlene Li (from Forrester) moderated.

The room at Yahoo was full — standing room only. A quick show of hands indicated nearly everyone in the room had used tagging services before.

Some discussion about “how can we trust the tags”, tag spam (Charlene’s term was “spag”), discerning intent from user tagging and other actions, and the problems of tagging users and the range of social gestures built into the various systems.

Joshua used the example of receiving LinkedIn connection requests from someone whose name you don’t recognize. You don’t want to accept it, because you don’t know who it is. You don’t want to reject it, because it would be rude, and you might actually know them. So he has a huge backlog of random connection requests piling up in his inbox.

Someone in the audience commented that between keyworded search and tagging, people are starting to lose grammar, coming up with “restaurant san francisco cool” instead of complete sentences.

Participation rates: Wink assumed 5-8% of their users would tag, actual is 30-40% active (but they’re just launching and are picking up a lot of knowledgeable early adopters from word of mouth). Digg has around 20% of their traffic from registered users (they don’t exactly tag, just digg). Kevin says Digg has around 140K registered users, generating around 4M pageviews per day.
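
For a rough sense of scale, the Digg numbers above can be combined into a back-of-the-envelope estimate. This is only a sketch using the round figures quoted at the panel, and it assumes that “traffic” means pageviews, which wasn't stated.

```python
registered_users = 140_000      # "around 140K registered users"
pageviews_per_day = 4_000_000   # "around 4M pageviews per day"
registered_share = 0.20         # "around 20% of their traffic"

# Assumption: "traffic" is measured in pageviews.
registered_pageviews = registered_share * pageviews_per_day
per_user = registered_pageviews / registered_users

print(f"{registered_pageviews:,.0f} pageviews/day from registered users")
print(f"{per_user:.1f} pageviews/day per registered user")
```

Under that assumption, registered users would account for roughly 800,000 pageviews a day, or a bit under 6 per registered user.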

Charlene wrapped up the Q&A with some predictions for the upcoming year:
1. The rise of some sort of social link and social standing system to “rate” users
2. Some sort of social “disaster” will occur on one of the new services, despite best efforts to prevent social disease from creeping in.
3. Today’s companies are mostly small, smart startups. In a year there will be a different cast of characters from mainstream media, search engines, bigger players.

Thanks to Jeff Clavier and Dave McClure for organizing another great session.

IBM T60 and X60 will run for 11 hours on a charge?

I’ve been pretty happy with my T42P, but I think nearly everyone wants longer battery life. I’ve been debating switching to a smaller form factor for a while, so it might be time to keep an eye out for the X60. Something like an X41 with an 11 hour run time would be really tempting. The best I can manage on the T42 is around 5-6 hours with the 9-cell battery.

CES: Lenovo says new ThinkPads go 11 hours on battery power

Update 1-10-2006 22:01 PST: Specifications and photos of the X60 and T60 from NotebookReview.com. It looks like an antenna sticks out slightly on the right side of the T60 display. Perhaps it’s for the EVDO service?
