When you come to a fork in the road…

Crossroads of the World at the Beach Bar, Waikiki

As some of you know, I have been exploring a variety of paths forward for SocialQuant, my real time social search and analytics project. My family, friends, and colleagues have given me much support, patience, and advice during this process, which has reached a crossroads, and as Yogi Berra says, “When you come to a fork in the road, take it!”

The rise of Twitter, Facebook, and other social media, combined with web-based applications, smartphones, and cloud computing, has set the stage for new applications and use models based on social discovery, collaboration, and communications, in addition to traditional search. What we’re all calling “real time search” lately isn’t exactly real time, nor is it exactly search, in which you find a definitive/authoritative answer. Much of the opportunity revolves around discovering people, discussions, and events that are relevant to you and bringing them to your attention in a timely, actionable fashion. Information streams from social media are transient, unreliable, and noisy. At the same time, the sheer volume of data can help provide the basis for building better filters. As an added bonus, you can ask questions of people in the social graph itself, and there are numerous examples of communities of interest forming around current events such as Barack Obama’s inauguration, the Iran elections, or even Michael Jackson’s funeral, all of which help surface information content, opinion, and sentiment that were previously inaccessible online. One interesting aspect of real time social media is that it’s not just algorithmic, it’s based on human connections and emotions. So a message that “feels right” from people you trust can be more relevant than one that is “correct” at times.

The challenge then is in filtering and ranking the massive flow of information so that it directs the user’s limited (and non-expanding) time and attention to what’s most valuable to them. With today’s information technology, amazing things are possible with limited resources. I personally have more computing and storage resources than the facility we launched HP’s original photo site with (for millions of dollars), at a fraction of the cost, routinely pushing around datasets of millions of rows on the local development servers. Unfortunately, that’s just the ante to get started on the problem. Running ranking, clustering, and semantic analysis for filtering the ever-growing stream of social media eventually requires web scale computing, even with careful problem selection and data pruning. The bar is also going up every day as the social media user base grows, and as well funded teams make progress on their platforms (+Google). So very shortly, being competitive in real time social search and discovery is going to require access to lots of data and either getting a datacenter or working with someone who has one.

In my case, I have recently chosen the latter path, and will be joining the Microsoft Bing search team, focusing on real time and social search. Microsoft itself has been showing signs of a renaissance, with search relaunching, Windows 7 looking leaner, Azure becoming non-vaporous, more web APIs getting published, core online applications starting to turn up, and a cool Office 2010 video. Even Mini-Microsoft is getting positive recently. And Google is starting to have “bigness” issues.

I look forward to working with Sean Suchter and the Microsoft Bing search team (and likely expanding their carbon footprint) in pursuit of new applications and services as the social media and online application space evolves.

You can follow along on Twitter (@hjl). As always, any and all opinions here are solely mine and do not reflect the position of any past, present, or future employer, partner, or business associate.

Follow suggested users, attract instant spamcloud

Despite Twitter’s amazing growth rate, there is general agreement that the Suggested Users List and the new user experience have shortcomings. As an experiment, I created a new Twitter account. I wanted to see what the experience might look like for someone interested in, but otherwise completely unfamiliar with the service. During the signup process, it automatically picks some suggested users (apparently random), which I selected all of, about a dozen or so. Then it asked for my email credentials to check for other people I know on Twitter, which I declined, since I generally don’t give web applications access to my email services. Then I went back to “Suggested Users” under the “Find People” section, and selected all of them. In total, the Suggested Users list got me up to 237 friends in my incoming stream.

Within a few minutes of completing this process, I already had 13 spam followers offering affiliate links for cameras, porn, and twitter followers. A day later I was up to 41 spam followers, plus 4 follow-backs from accounts I followed in addition to the Suggested Users List.

There are two different issues here: 1) finding a set of interesting / relevant people for new users to follow, and 2) limiting the impact of spam and affiliate marketers, who appear to be scanning the follower lists of the Suggested Users to identify new accounts to spam.

Link posts seem to be working again

The automatic nightly link posts from del.icio.us stopped working properly sometime last year. The links would get posted, but had extra “\n” inserted at every line break. Here’s an example. An unexpected side effect of having “ugly” link posts is that I mostly stopped posting links to del.icio.us for a while.

As part of the recent blog platform update, I’ve switched from the del.icio.us “experimental” nightly blog posting to Postalicious, which seems to be working nicely. You can see the new link post style (and the old ones too, unless I get around to cleaning them up) here.

New and improved

This evening I’m rolling out a long overdue update to the blogging platform. It’s been a little complicated, because I’ve been running a heavily customized WordPress 1.5.2 for a long time, and there have been a lot of changes since then to WordPress, various plugins, and the underlying database (the current release is 2.7.1).

hjl-weblog-feb09-before hjl-weblog-feb09-after

The new version is based on Atahualpa, which has many customizable options. The Recent Posts, Tag Cloud, Recent Links, Twitter status, and permalinks are all working as before. The new template doesn’t have a place for the randomly selected banner thumbnail images from my Flickr account, but does incorporate a larger random image at the top, which currently selects from a few photos I picked out of my snapshot collection. I may figure out some other way of sharing some photos here. I’ve also added a random quote widget. You have to provide your own collection of quotes, so there aren’t many in there yet.

It might be a little slower than the old platform for a while until I get the caching set up, since all those customizable options use a lot of database queries.

Let me know what you think, and if you have any suggestions or are having problems viewing things. I’ve mostly been looking at this with Firefox 3, so people with other browsers may have a different experience.

My Twitter follower tag cloud from Twittersheep

hjl twitter follower cloud
Twittersheep builds a tag cloud from the profile descriptions of your Twitter followers. In my case, the tags suggest that many people following my Twitter feed are technology entrepreneurs and traders with an interest in markets and social media. Sounds about right.
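
For the curious, the basic idea behind a tag cloud like this can be sketched in a few lines of Python. This is only my guess at the general approach, not how Twittersheep actually works: the `tag_counts` function, the stop-word list, and the sample descriptions are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "at", "to", "for", "i", "my", "is", "with"}

def tag_counts(descriptions, top_n=10):
    """Count how often each word appears across profile descriptions."""
    counts = Counter()
    for text in descriptions:
        # Lowercase the text and split it into simple word tokens.
        words = re.findall(r"[a-z0-9']+", text.lower())
        # Skip stop words and very short tokens.
        counts.update(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(top_n)

# Made-up profile descriptions, for illustration only.
sample = [
    "Technology entrepreneur and trader",
    "Trader interested in markets and social media",
    "Entrepreneur, markets, technology",
]
top_tags = tag_counts(sample, top_n=5)
```

Each (word, count) pair would then be drawn with a font size that grows with the count.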

via Webware

140 characters is nice but doesn’t always work

I haven’t been posting here in a while, but think I will try picking up the keyboard here a little more frequently. I added a twitter box on the sidebar a while back, as I have been experimenting with that more, along with friendfeed, facebook, etc. I like the brevity and immediacy of twitter, but not everything fits in 140 characters. You can find me on twitter and friendfeed as “hjl”, also on Facebook.

HJL at the inauguration

HJL at obama inauguration

Me at Barack Obama’s inauguration, via FotoFlexer’s MyInauguralPhoto service. Just call me Zelig. (via TechCrunch)

The Ultimate Captcha

“No Premium User. Please solve the Riemann Hypothesis.”




Random Palo Alto stuff – wheelchair bandit, chickens, Comcast

It’s the time of spring when all the flowering trees bloom. There are a lot of cherry and wisteria trees in our neighborhood. It looks nice, and when the petals start falling in a few weeks it will look like every home held a wedding recently. Good weather for being out and about. Speaking of which…

The Wachovia Bank (formerly World Savings) branch over at the Stanford Shopping Center was robbed last Thursday. This is already a little unusual, but what caught my attention was that they were robbed by an elderly man in an electric wheelchair. And he got away! He apparently stopped by The Sharper Image and asked for a shopping bag on his way over to the bank.

Mike’s comment about Comcast and chickens wandering in Keith’s yard reminded me about my former neighbors. When we first moved into our current home, we soon discovered that the neighbors bordering our back yard owned several chickens. During the summer when we left the windows open overnight, we would hear their rooster crowing first thing in the morning. Their chickens never made it into our yard, although their cats came through regularly. They were an interesting couple, living kind of like they were homesteaders on a mountain farm, with a rickety greenhouse, garden, and a yard full of debris, on an oversized lot in the middle of Old Palo Alto. They sold a few years ago, and at the moment there’s a brand new house going up. The chickens are long gone, but we have had random construction work going on for a while.

We also have Comcast here. I still use PacBell (now AT&T) DSL for the office network, but the house network uses the cable modem service. The download speeds are higher, but it does go offline sometimes, making me reluctant to run my office on Comcast’s internet service. It is a great fit for the rest of our family, who mostly surf the web, watching online video, browsing web pages, or chatting. The DSL service is relatively clunky (I have one of the first lines rolled out in Palo Alto) and slow, but the continuous uptime is similar to my Linux servers in the back of the closet, running for years with uninterrupted service.

Looking at this heatmap, Palo Alto and Stanford are apparently a little blue oasis of solvency in the map of real estate foreclosures, surrounded by a sea of red.


Hacked by keymachine.de

I just noticed that my WordPress installation got hacked by a search engine spam injection attack sometime in the past few weeks. This particular one inserts invisible text with lots of keywords in footer.php. The changes to the file were made using the built-in theme editor, originating from ns.km20725.keymachine.de, which is currently at The spam campaign automatically updates the spam payload every day or so. The links point to a variety of servers that have also been hacked to host the spam content. Here is a sample: http://www.nanosolar.com/feb3/talk.php?28/82138131762.html
I’ve sent an e-mail to Nanosolar, so they’ll probably have that content cleaned up before long. But the automated SEO spam campaign updates the keyword and link payload regularly, so any affected WordPress sites will be updated to point at the new hosting victims.
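
If you want to check your own theme files for this kind of injection, here is a rough sketch in Python. It assumes the spam block hides itself with an inline `display:none` or `visibility:hidden` style and packs several links onto one line; that is an assumption on my part, and other campaigns may hide their text differently. The `suspicious_lines` function and the sample footer are hypothetical.

```python
import re

# Assumption: the injected block hides itself with an inline CSS rule.
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.IGNORECASE)
LINK = re.compile(r"<a\s[^>]*href=", re.IGNORECASE)

def suspicious_lines(source, min_links=3):
    """Return (line_number, link_count) for lines that hide text and hold many links."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        link_count = len(LINK.findall(line))
        if HIDDEN_STYLE.search(line) and link_count >= min_links:
            hits.append((number, link_count))
    return hits

# Made-up footer content, for illustration only.
sample_footer = (
    '<div id="footer">Powered by WordPress</div>\n'
    '<div style="display:none"><a href="http://example.com/1">x</a> '
    '<a href="http://example.com/2">y</a> <a href="http://example.com/3">z</a></div>\n'
)
report = suspicious_lines(sample_footer)
```

In practice you would read the contents of footer.php (and the other theme files) and pass them in as `source`. This is a line-by-line heuristic, so hidden blocks that span several lines would slip past it.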

From a quick check on Google, it looks like keymachine.de is a regular offender.

Hey, it’s an earthquake

We had a noticeable earthquake a few minutes ago this evening. Nothing too severe, but the hanging lamps were swinging back and forth a few inches, and the house was shaking for 15-20 seconds. Apparently it was magnitude 5.6, somewhere near Alum Rock.

Volvo’s pointlessly paranoid heartbeat sensor

A few days ago, the first time I saw the television ad for the new Volvo S80’s heartbeat sensor alarm, I thought it was a parody. It shows a woman walking up to her car in a dark parking lot, then turning away after the heartbeat detector shows that someone is hiding in her car. I’m sure they test marketed this before including the feature, but I totally don’t get it.

Here’s what the Volvo site says about the feature:

The Personal Car Communicator (PCC) is your car key’s smart connection with your Volvo S80 applying the latest in two-way radio technology. When in range, you’ll always know the status of your car. Locked or unlocked. Alarm activated or not. If the alarm has been activated, the heart beat sensor will also tell you if there is someone inside the car. The PCC also includes keyless entry and keyless drive.

So…the heartbeat detector will tell you if someone’s unexpectedly locked themselves in the car? It isn’t going to do anything if it’s turned off, and you’d think anyone trying to break into the car would set off the alarm on the way in, or have a way to turn it off. The least likely thing I can imagine is someone successfully breaking into the car, and waiting there with the alarm still turned on. Even if it works with the alarm turned off, I still don’t see how this is useful.

Volvo has a reputation for safety, but I really did think the ad was a parody or a joke of some kind. I’m obviously not in the core demographic for this feature…but who is?

Hello stealthy readers

Hello, dear readers. I had lunch with some friends the other day and they mentioned that I hadn’t posted in a while. Sorry I haven’t been paying much attention to this site lately, other than knocking back comment and link spam. I recently saw that Google Reader is starting to report subscription statistics, which prompted me to take a look. It’s been a while since I looked over the server logs, and I was surprised at the number of RSS subscriptions that have accumulated (i.e. it’s more than I can account for by friends, family, and random acquaintances). I didn’t know you were out there, but now that you’re decloaked and I can see you, I wanted to say hello.

I ended up taking a break from posting for a few weeks (since the beginning of the year). Not by coincidence, I’ve also ramped up my running since the beginning of the year, prepping for this year’s Big Sur Marathon, while holding other obligations roughly constant.

Anyway, I think I’ll try some different approaches to posting here and see how it works out.

Back to school 2006

Today is the first day of school in Palo Alto. It feels like we just started summer vacation, but it’s fun seeing everyone after the break. I’m always surprised by how much the kids grow in just a few weeks.

Amazon aStore – custom storefronts for Amazon affiliates

Amidst the speculation about the Amazon Unbox video download service, Amazon has quietly launched aStores, a service providing custom online storefronts for Amazon affiliates. (You may not be able to view the link unless you’re an Amazon affiliate.)

aStore by Amazon is a new Associates product that gives you the power to create a professional online store, in minutes and without the need for programming skills, that can be embedded within or linked to from your website.

Here’s a link to their demo store.

You get to pick up to nine “featured items” to put on the home page of the store, choose product categories, and add reviews and editorial content. The shopping cart and fulfillment are handled by Amazon, with standard referral fees going back to the affiliate. There’s a browser based interface for building a store on the Amazon Affiliates site. The resulting store can be hosted by Amazon or on your own site.

This sort of functionality has been available for a while for those willing and able to customize their site using Amazon’s web services API, but the aStores program will make custom stores broadly accessible to all of the Amazon affiliates base (just in time for the holiday shopping season). I suspect we’ll see an explosion of niche shopping sites in short order, since it looks pretty easy to set one up.

More on the America Online search query data

The search query data that America Online posted over the weekend has been removed from their site following a blizzard of posts regarding the privacy issues. AOL officially regards this as “a screw up”, according to spokesperson Andrew Weinstein, who responded in comments on several sites:

All –

This was a screw up, and we’re angry and upset about it. It was an innocent enough attempt to reach out to the academic community with new research tools, but it was obviously not appropriately vetted, and if it had been, it would have been stopped in an instant.

Although there was no personally-identifiable data linked to these accounts, we’re absolutely not defending this. It was a mistake, and we apologize. We’ve launched an internal investigation into what happened, and we are taking steps to ensure that this type of thing never happens again.

I pulled down a copy of the data last night before the link went down, but didn’t get around to actually looking it over until this evening. In a casual glance at random sections of the data, I see a surprising (to me) number of people typing in complete URLs, a range of sex-related queries (some of which I don’t actually understand), shopping-related queries, celebrity-related queries, and a lot of what looks like homework projects by high school or college students.

In the meantime, many other people have found interesting / problematic entries among the data, including probable social security numbers, driver’s license numbers, addresses, and other personal information. Here’s a list of queries about how to kill your wife from Paradigm Shift.

More samples culled from the data here, here, and here.

#479 Looks like a student at Prairie State University who like playing EA Sports Baseball 2006, is a White Sox fan, and was planning going to Ozzfest. When nothing else is going on, he likes to watch Nip/Tuck.

#507 likes to bargain on eBay, is into ghost hunting, currently drives a 2001 Dodge, but plans on getting a Mercedes. He also lives in the Detroit area.

#1021 is unemployed and living in New Jersey. But that didn’t get him down because with his new found time, he’s going to finally get to see the Sixers.

#1521 like the free porn.

Based on my own eclectic search patterns, I’d be reluctant to infer specific intent based only on a series of search queries, but it’s still interesting, puzzling, and sometimes troubling to see the clusters of queries that appear in the data.

Up to this point, in order to have a good data set of user query behavior, you’d probably need to work for one of the large search engines such as Google or Yahoo (or perhaps a spyware or online marketing company). I still think sharing the data was well-intentioned in spirit (albeit a massive business screwup).

Sav, commenting over at TechCrunch (#67) observes:

The funny part here is that the researchers, accustomed to looking at data like this every day, didn’t realize that you could identify people by their search queries. (Why would you want to do that? We’ve got everyone’s screenname. We’ll just hide those for the public data.) The greatest discoveries in research always happen by accident…
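
To see why hiding the screennames isn’t enough, consider how little code it takes to gather every query made under one anonymous ID. The sketch below assumes a tab-separated layout with the user ID in the first column and the query text in the second; the `queries_by_user` function and the sample rows are made up for illustration and are not taken from the AOL data.

```python
from collections import defaultdict

def queries_by_user(lines):
    """Group query strings by the anonymous ID in the first tab-separated column."""
    grouped = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip the header row and malformed lines
        grouped[fields[0]].append(fields[1])
    return dict(grouped)

# Made-up rows, for illustration only.
sample_rows = [
    "AnonID\tQuery\n",
    "1001\tmarathon training plan\n",
    "1001\tpizza delivery palo alto\n",
    "1002\tused car prices\n",
]
profiles = queries_by_user(sample_rows)
```

Once the queries are grouped this way, each ID reads like a small diary, which is how the profiles quoted above were put together.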

A broader issue in the privacy context is that all this information and more is already routinely collected by search engines, search toolbars, assorted desktop widget/pointer/spyware downloads, online shopping sites, etc. I don’t think most people have internalized how much personal information and behavioral data is already out there in private data warehouses. Most of the time you have to pay something to get at it, though.

I expect to see more interesting nuggets mined out of the query data, and some vigorous policy discussion regarding the collection and sharing of personal attention gestures such as search queries and link clickthroughs in the coming days.

See also: AOL Research publishes 20 million search queries

Update Tuesday 08-08-2006 05:58 PDT – The first online interface for exploring the AOL search query data is up at www.aolsearchdatabase.com (via TechCrunch).

Update Tuesday 08-08-2006 14:18 PDT – Here’s another online interface at dontdelete.com (via Infectious Greed)

Update Wednesday 08-09-2006 19:14 PDT – A profile of user 4417749, Thelma Arnold, a 62-year-old widow who lives in Lilburn, GA, along with a discussion of the AOL query database in the New York Times.

Back from the mobile office

At the mobile office Ocean kayaking at Malibu

Spent most of the past weekend on the beach in Malibu. Emily and I tried a little surfing, ocean kayaking, and also got a good look at some dolphins while we were paddling around.

I brought the Thinkpad, but left the charger at home, the idea being to limit my computer use while on vacation. We decided to stay a couple extra days, so I was effectively offline after running on batteries for 5 hours or so. Next time I’ll bring the charger anyway.

If you’ve been having trouble getting at this site while I’ve been away, Dreamhost posted a narrative of their recent adventures in data hosting, some of which have been power-related, and some not.

Google is having problems this evening?

This evening I’ve been getting slow responses or connection timeouts from Google for the past half hour or so (20:30 – 21:00 PDT). Usually this means that the local network is having problems, but other major sites (Yahoo, CNN) are running as quickly as ever, along with various SSH sessions around the world, so it seems to be specific to Google.

So far I get slow or no response from the main search page, Gmail, Adsense, Adwords, Analytics, and Finance.

Pages that do respond are coming back in 10+ seconds, and some pages are loading without graphics or with templates only and no content.

Anyone else seeing these problems? This is the first time I’ve seen Google unusable for more than a minute or two. (Unlike this site, which has been bouncing up and down due to problems at Dreamhost lately.)

Search referrals – July 2006 snapshot

Here’s a quick snapshot of incoming search engine referrals for the past few weeks. Compare this with another post last year on search engine referral share, recently referenced in a post at Alexa noting the discrepancy between the published search engine traffic reports and anecdotal observations by webmasters.

Is it just me, or are these charts a bit goofy? Does Yahoo really still have 23% of the search market? Is Google at less than half the search market?

I don’t believe it. Any webmaster will tell you that Google represents almost ALL of the search engine traffic. Yahoo is nowhere near 23%. Just read the blogs, here, here, here and here and on countless other blogs.

Google’s share of the incoming search traffic here, already 82% last October, has increased to 92%, largely at the expense of “Other”. In the fall, it looked like those were mostly miscellaneous Chinese search engines, so perhaps my site is not getting indexed or ranked well there anymore, or Google is picking up market share, or both.
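
If you want to compute the same kind of breakdown from your own logs, here is a minimal sketch in Python. It assumes you have already pulled the referrer URLs out of the server logs; the host-to-engine mapping, the function names, and the sample referrers are hypothetical and would need adjusting for real traffic.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical mapping from a referrer host fragment to a search engine label.
ENGINES = {"google.": "Google", "yahoo.": "Yahoo", "msn.": "Microsoft", "live.": "Microsoft"}

def engine_for(referrer):
    """Label a referrer URL with a search engine name, or "Other"."""
    host = urlparse(referrer).netloc.lower()
    for fragment, name in ENGINES.items():
        if fragment in host:
            return name
    return "Other"

def referral_share(referrers):
    """Return each engine's share of the referrals, as a percentage."""
    counts = Counter(engine_for(r) for r in referrers)
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

# Made-up referrers, for illustration only.
sample_referrers = [
    "http://www.google.com/search?q=palo+alto",
    "http://www.google.co.uk/search?q=wordpress",
    "http://search.yahoo.com/search?p=volvo",
    "http://www.baidu.com/s?wd=test",
]
shares = referral_share(sample_referrers)
```

With the four sample referrers above, Google comes out at 50% and Yahoo and “Other” at 25% each.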

Some of the commenters at the Alexa post noted increasing traffic from Microsoft / MSN / Live search, including one who got most of their traffic through MSN search. I’m a little surprised that I don’t see more traffic from Yahoo and Microsoft search here, but that may also be a function of who’s likely to be searching for a given topic.

See also Greg Linden’s comments on the competitiveness of Yahoo and Microsoft search efforts

Who carries three cell phones?

I was out for dinner at Fukisushi in Palo Alto this evening, enjoying some excellent spider rolls and giant clam sushi. A few minutes after we were served, a young couple came in, perhaps meeting for a date after work. At first I noticed that the man had the same cell phone (a Nokia 6682) as my wife, as he took it out and set it on the table next to him. Then he took out a Motorola Razr, flipped it open, and set it on the table next to the Nokia. I’m thinking that this is somewhat geeky and he should be paying more attention to his attractive blonde companion, but he looks like an engineering or tech operations kind of guy, and this is Silicon Valley, so maybe he has a work phone for being on call and a personal phone. But then he pulls out yet another phone, flips it open and sets it down next to the other two, creating a sort of mini-console of cell phones on the table next to the sushi plates.

Now I’m confused. I can think of lots of reasons why someone might have two cell phones. I can’t think of any good reason to park three cell phones on the table while on a date, though.

I don’t think he actually used any of them, except to take a photo of his companion with the Nokia.

Personally, I’ve been cutting down on the hardware I carry for some time now. At one point a few years ago, I often carried two PDAs, two cell phones, and a pager. That didn’t last long. These days I try to stick with one phone, as small as practical.

This episode makes me laugh, because I’m more puzzled by this guy carting three phones around than by him parking them on the table in the middle of his dinner date.
