Bookmarks for April 28th from 05:35 to 14:24

These are my links for April 28th from 05:35 to 14:24:

  • Official Google Blog: Adding search power to public data – Interesting. Wonder if the underlying public data sets will eventually become available on Google App Engine as well, sort of like the public data sets available for use with Amazon EC2 applications.
  • MySQL And Search At Craigslist – Jeremy Zawodny's slides on MySQL, Sphinx, and free text search implementation at Craigslist, from last week's MySQL conference.
  • Skew, The Frontend Engineer’s Misery @ Irrational Exuberance – For mashups and the like, the distinction between a FE engineer and web dev is rather small in terms of technical skills; they are both using the same skillset, they are both interacting with APIs, and so on. However, there are important distinctions between the two: 1. web developers tend to move in small groups or as individuals, whereas fe engineers work in larger groups, 2. web developers tend to design a product on top of an existing backend service (api, etc), while fe engineers are usually working in parallel with the backend being developed.
  • Study: Twitter Audience Does Not Have A Return Policy – Over 60 percent of people who sign up to use the popular (and tremendously discussed) micro-blogging platform do not return to using it the following month, according to new data released by Nielsen Online. In other words, Twitter currently has just a 40 percent retention rate, up from just 30 percent in previous months–indicating an “I don’t get it factor” among new users that is reminiscent of the similarly over-hyped Second Life from a few years ago.
  • Hey Americans, Appreciate Your Freedom Of Speech : NPR – Firoozeh Dumas on the underappreciated freedoms of speech and expression we have in the US vs journalists and bloggers in Iran.

Bookmarks for April 11th through April 12th

These are my links for April 11th through April 12th:

  • Wordle – Beautiful Word Clouds – Wordle is a toy for generating “word clouds” from text that you provide. The clouds give greater prominence to words that appear more frequently in the source text. You can tweak your clouds with different fonts, layouts, and color schemes.
  • The dark side of Dubai – Johann Hari, Commentators – The Independent – "Dubai was meant to be a Middle-Eastern Shangri-La, a glittering monument to Arab enterprise and western capitalism. But as hard times arrive in the city state that rose from the desert sands, an uglier story is emerging."
  • Topless Robot – Hot Girls Have Lightsaber Strip-Fight for Your Viewing Pleasure – Star Wars CGI meets fake body spray ad
  • Poll Result: Best VPN to leap China’s Great Firewall? – Thomas Crampton – Witopia: undisputed winner for quality of service and speed of surfing, though it is said to be relatively expensive at US$50 to US$60 per year. Hotspot Shield: bandwidth limits can be painful, forcing you to wait until the next month if you use it too much. Ultrasurf. StrongVPN.
  • InfoQ: Facebook: Science and the Social Graph – In this presentation filmed during QCon SF 2008 (November 2008), Aditya Agarwal discusses Facebook’s architecture, more exactly the software stack used, presenting the advantages and disadvantages of its major components: LAMP (PHP, MySQL), Memcache, Thrift, Scribe.
  • The Running Man, Revisited § SEEDMAGAZINE.COM – a handful of scientists think that these ultra-marathoners are using their bodies just as our hominid forbears once did, a theory known as the endurance running hypothesis (ER). ER proponents believe that being able to run for extended lengths of time is an adapted trait, most likely for obtaining food, and was the catalyst that forced Homo erectus to evolve from its apelike ancestors.
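The Wordle entry above boils down to mapping word frequency onto font size. Here is a minimal sketch of that mapping in Python, assuming a simple linear scale between a hypothetical minimum and maximum point size. Wordle’s real scaling and layout aren’t described in the link, so `word_sizes`, `min_pt`, `max_pt`, and `top_n` are all made up for illustration.

```python
import re
from collections import Counter


def word_sizes(text, min_pt=10, max_pt=48, top_n=20, stopwords=frozenset()):
    """Map the top_n most frequent words in `text` to font sizes in points.

    Hypothetical linear scaling: the most frequent word gets max_pt, the
    least frequent of the top_n gets min_pt, everything else in between.
    """
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in stopwords]
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    hi = counts[0][1]   # highest count among the kept words
    lo = counts[-1][1]  # lowest count among the kept words
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {w: min_pt + (c - lo) * (max_pt - min_pt) / span for w, c in counts}


sizes = word_sizes("the cat sat on the mat with the other cat", stopwords=frozenset({"the"}))
print(sizes)
```

A real word cloud would add a layout step that packs the sized words without overlap; this only covers the “more frequent means bigger” part.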

Bookmarks for April 9th through April 10th

These are my links for April 9th through April 10th:

Bookmarks for April 9th from 08:07 to 17:53

These are my links for April 9th from 08:07 to 17:53:

Bookmarks for February 18th through February 19th

These are my links for February 18th through February 19th:

Volvo’s pointlessly paranoid heartbeat sensor

A few days ago, the first time I saw the television ad for the new Volvo S80’s heartbeat sensor alarm, I thought it was a parody. It shows a woman walking up to her car in a dark parking lot, then turning away after the heartbeat detector shows that someone is hiding in her car. I’m sure they test marketed this before including the feature, but I totally don’t get it.

Here’s what the Volvo site says about the feature:

The Personal Car Communicator (PCC) is your car key’s smart connection with your Volvo S80 applying the latest in two-way radio technology. When in range, you’ll always know the status of your car. Locked or unlocked. Alarm activated or not. If the alarm has been activated, the heart beat sensor will also tell you if there is someone inside the car. The PCC also includes keyless entry and keyless drive.

So…the heartbeat detector will tell you if someone’s unexpectedly locked themselves in the car? It isn’t going to do anything if it’s turned off, and you’d think anyone trying to break into the car would set off the alarm on the way in, or have a way to turn it off. The least likely thing I can imagine is someone successfully breaking into the car, and waiting there with the alarm still turned on. Even if it works with the alarm turned off, I still don’t see how this is useful.

Volvo has a reputation for safety, but I really did think the ad was a parody or a joke of some kind. I’m obviously not in the core demographic for this feature…but who is?

The Bridge to Terabithia

My 10-year-old daughter and I went to see The Bridge to Terabithia yesterday. She read the book last year and wanted to see the movie, which has been advertised regularly over the past few months.

For movies that are based on a book, my general rule for my daughter is that you should try to read the book before you see the movie. In this case, I didn’t follow my own advice. Although this book is well known in children’s literature (winner of the 1978 Newbery Medal), I never got around to reading it, and thus was utterly blindsided by the movie.

The movie advertisements make it look like mostly a fantasy and adventure story, kind of like Chronicles of Narnia or perhaps Neverending Story. It’s not. It’s mostly about friendship and pointless tragedy in middle school. I found it enormously disturbing. It pushed a lot of my emotional buttons, both as a parent today, and in recollection of being an odd kid out in a rural school system in the past.

I don’t think I was the only one who got caught off guard at the movie theater, either. I think this is actually a better-than-average family/kids story (for perhaps 4th-5th grade and up); it just isn’t what they marketed. Parents should be prepared for a conversation about death, which might not work for everyone.

When I was in high school, I used to enjoy (emotionally authentic, depressing) movies like this more. Now, I’d rather just see stylized fantasy or heroic death (Kill Bill, Lord of the Rings) or entertaining family cartoons (Cars, The Incredibles). There’s enough authentic tragedy in the world; I don’t need more of it from the movies, and I don’t find it enlightening or uplifting.

In reading the Wikipedia entry on the movie, I see that the issue with the marketing has come up before:

The filmmakers have disavowed the advertisement campaign for the movie saying that the advertising is deliberately misleading; making the movie seem like it was about or occurring in a fantasy world like that of Harry Potter or Chronicles of Narnia[3]. David L. Paterson in the SCI FI Wire article was surprised by the trailer but understood the marketing reasoning behind it saying:

“Although there is a generation that is very familiar with book, if you are over 40, then you probably haven’t, and we need to reach them. … Everyone who read the book and sees the trailer says, ‘What is this? This is nothing like the book. What are you doing, Dave?’ And I say, ‘You know what you’re seeing is 15 seconds of a 90-minute film. Give me a little leeway and respect. Go see it, and then tell me what you think.’”

I’m generally positive on the movie, but I wish I’d read the book first.

The Long Tail of Invalid Clicks and other Google click fraud concepts

Some fine weekend reading for search engineers, SEOs, and spam network operators:

A 47-page independent report on Google Adwords / Adsense click fraud, filed yesterday as part of a legal dispute between Lane’s Gifts and Google, provides a great overview of the history and current state of click fraud, invalid clicks of all types, and the four-layered filtering process that Google uses to detect them.

Google has built the following four “lines of defense” against invalid clicks: pre-filtering, online filtering, automated offline detection and manual offline detection, in that order. Google deploys different detection methods in each of these stages: the rule-based and anomaly-based approaches in the pre-filtering and the filtering stages, the combination of all the three approaches in the automated offline detection stage, and the anomaly-based approach in the offline manual inspection stage. This deployment of different methods in different stages gives Google an opportunity to detect invalid clicks using alternative techniques and thus increases their chances of detecting more invalid clicks in one of these stages, preferably proactively in the early stages.
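The report names the stages and the mix of rule-based and anomaly-based methods, but not the actual rules. As a rough illustration of how a layered pipeline narrows the click stream, here is a toy Python sketch with two hypothetical stages: a rule that flags repeat clicks from the same IP on the same ad within a few seconds, and an anomaly check that flags IPs whose click volume sits far above the average. Both rules and both thresholds are invented for the example; they are not Google’s filters.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Click:
    ip: str
    ad_id: str
    ts: float  # seconds since epoch


def rule_duplicate(clicks, window=10.0):
    """Hypothetical rule-based stage: flag a click if the same IP clicked the
    same ad less than `window` seconds after its previous click."""
    last_seen = {}
    flagged = set()
    for i, c in sorted(enumerate(clicks), key=lambda p: p[1].ts):
        key = (c.ip, c.ad_id)
        if key in last_seen and c.ts - last_seen[key] < window:
            flagged.add(i)
        last_seen[key] = c.ts
    return flagged


def anomaly_ip_volume(clicks, z_threshold=3.0):
    """Hypothetical anomaly-based stage: flag every click from an IP whose
    click count is more than z_threshold standard deviations above the mean."""
    per_ip = defaultdict(list)
    for i, c in enumerate(clicks):
        per_ip[c.ip].append(i)
    counts = [len(idxs) for idxs in per_ip.values()]
    if len(counts) < 2:
        return set()
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return set()
    return {i for idxs in per_ip.values() if (len(idxs) - mu) / sigma > z_threshold for i in idxs}


def run_stages(clicks, stages):
    """Apply stages in order; each stage sees only clicks that survived the
    earlier ones. Returns {stage_name: set of indices into `clicks`}."""
    remaining = list(range(len(clicks)))
    report = {}
    for name, stage in stages:
        subset = [clicks[i] for i in remaining]
        flagged = {remaining[j] for j in stage(subset)}
        report[name] = flagged
        remaining = [i for i in remaining if i not in flagged]
    return report
```

Ordering matters here: the cheap rule runs first, so the statistical check only has to look at what the rule let through, which is the same “catch it early if you can” idea the report describes.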

An interesting observation is that most click fraud can be eliminated through simple filters. Alexander Tuzhilin, author of the report, speculates on a Zipf-law Long Tail of invalid clicks of less common attacks, and observes:

Despite its current reasonable performance, this situation may change significantly in the future if new attacks will shift towards the Long Tail of the Zipf distribution by becoming more sophisticated and diverse. This means that their effects will be more prominent in comparison to the current situation and that the current set of simple filters deployed by Google may not be sufficient in the future. Google engineers recognize that they should remain vigilant against new possible types of attacks and are currently working on the Next Generation filters to address this problem and to stay “ahead of the curve” in the never-ending battle of detecting new types of invalid clicks.
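Tuzhilin’s Long Tail argument can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that the r-th most common attack type accounts for a share of invalid clicks proportional to 1/r^s. The report doesn’t give a number of attack types or an exponent, so `n_types`, `k`, and `s` are made-up parameters.

```python
def zipf_head_share(n_types, k, s=1.0):
    """Fraction of all invalid clicks covered by the k most common attack
    types, assuming the r-th type's frequency is proportional to 1 / r**s."""
    weights = [1.0 / r**s for r in range(1, n_types + 1)]
    return sum(weights[:k]) / sum(weights)


# Hypothetical: 1,000 attack types, and simple filters that catch the top 10.
for s in (2.0, 1.5, 1.0):
    print(f"s={s}: top 10 types cover {zipf_head_share(1000, 10, s):.0%} of invalid clicks")
```

With these made-up numbers, the ten most common attack types cover roughly 94% of invalid clicks when s = 2, but only about 39% when s = 1. A smaller exponent is one way to model attacks that are “more sophisticated and diverse”: the same handful of simple filters covers much less of the total.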

He also highlights the irreducible problem of click fraud in a PPC model:

  • Click fraud and invalid clicks can be defined conceptually, but the only working definition is an operational one.
  • The operational definition of invalid clicks cannot be fully disclosed to the general public, because that would lead to massive click fraud.
  • If the operational definition is not disclosed to some degree, advertisers cannot verify or dispute why they have been charged for certain clicks.

The court settlement asks for an independent evaluation of whether Google’s efforts to combat click fraud are reasonable, which Tuzhilin believes they are. The more interesting question is whether they will continue to be sufficient as time progresses and the Long Tail of click fraud expands.


Sure sign of a boomlet underway…

…the business and technology magazines are getting thicker again. The latest issue of Wired magazine is 294 pages, Forbes is 280. Not in the phonebook-sized range yet, but noticeably heavier than they’ve been in a while.

Apparently, Adsense hasn’t sucked up all the advertising money. Plus there’s no way to put cardboard inserts and perfume samples onto a web page yet.

Update 12-03-2005 19:15 PST: This guy plotted Wired page counts vs the Nasdaq index, and some similar comments here as well.

Pandora is now free

I spent a lot of time digging up new music a couple of months ago during the pre-launch beta test period of the Pandora music service. I put together a list of interesting music that I found and ended up purchasing a number of new albums, so I put off signing up for their paid subscription service until I had worked through all that new music. I thought the fee was OK ($12/quarter or $36/year), but I simply had too much other stuff to listen to, so it would have been wasted money until the backlog cleared a bit (a backlog made up of all the CDs I found by listening to Pandora in the first place).

Given my experience (liked the service, liked the music, put off signing up temporarily when the fees started), and the opportunity for affiliate referral fees from Amazon and others, this move to an ad- and affiliate-supported service could end up generating more revenue in the end.

In addition to many new features (bookmarking, station editing, playlist improvements, etc.), Pandora v2.0 includes a free, ad-supported version. Listeners have the choice to subscribe and stay clear of ads, or use the free service, which will gradually incorporate advertising. What does this mean for you? You can now come back and listen to Pandora as much as you’d like for free–and all the stations you’ve created remain intact.

At a referral fee of 6% of sales, it would take around $50 a month of CD sales to directly replace the old $36/year subscription fee. However, many more users who would turn away if even a small payment were required might try using a free service. And Pandora is the sort of service that creates demand for new music that those “free” users might be happy to purchase from Amazon (or iTunes). I don’t know what their conversion rates look like, but if they look anything like my behavior, Pandora is far better off working on bringing in more music-loving users than trying to collect subscription fees.
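The break-even arithmetic is just the fee divided by the referral rate. A quick Python check, using the $36/year fee and 6% referral rate from this post (the function name is made up for illustration):

```python
def breakeven_sales(subscription_fee, referral_rate):
    """Referral-driven sales needed to earn the same amount as one
    subscription fee over the same period."""
    return subscription_fee / referral_rate


ANNUAL_FEE = 36.00     # $36/year subscription, from the post
REFERRAL_RATE = 0.06   # 6% referral fee, from the post

per_year = breakeven_sales(ANNUAL_FEE, REFERRAL_RATE)
per_month = per_year / 12
print(f"${per_year:.2f} per year, ${per_month:.2f} per month")
# prints "$600.00 per year, $50.00 per month"
```

The $12/quarter option works out a bit higher, to $200 a quarter, or about $67 a month in sales.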

See also:

Beauty is only Pixel Deep

I’m not very good at Photoshop, but this portfolio of photo retouching projects by Glenn Feron nicely illustrates the disconnect between reality and the beautiful Photoshop-enhanced images that fill today’s advertising and print media. You can view his before-and-after images by moving your mouse back and forth; some of the differences are quite striking. These images were all part of various commercial projects, but if you have a favorite photo you can apparently send it to him for the full treatment. I’m not sure how well this works when you start with normal-looking people, though. All of the “before” photos are of professional models who look pretty good to start with.

For those who want to play along at home, you can read more about how to remove wrinkles and blemishes, plump up lips, whiten teeth, tidy up loose hair, add contours, and generally glamourize your photos in these articles:

Maybe there should be a service splicing the Amazon Mechanical Turk with GIMP and HotOrNot to help people who need to boost their photo appeal?

Word of Blog

Word of Blog:

Word of Blog is a new and free service that helps you spread the word about things you like, events you care about and worthy causes you want to support.

Bloggers: You can pick and choose any of the ads appearing on this site and display them into your blog or website. Simply copy the HTML code appearing below the ad and paste it where you wish it to appear. The ads have been formatted to fit into most blog columns.

Organizations: If you want to post an ad on this site so that bloggers can start spreading the “word of blog” about you, please go to the “Submit Ad” section.

This site provides a clearinghouse for non-profit organizations to post their ads for use by bloggers and web site publishers that would like to contribute their support.

It appears to be just getting started. The organizations listed so far include mainstream NGOs such as Red Cross, Grameen Foundation, CARE and United Way, along with assorted political groups.

There’s nothing prohibiting commercial use, though, so it may be swamped with commercial placements before too long. There are already ads for free breast implants, NeoPets, and other sites that just seem to be looking for visibility. Nothing wrong with that, but most bloggers and publishers aren’t likely to place free ads for a commercial site.

The Inevitability of Blog Outsourcing

The blog outsourcing topic has rolled along while I’ve been spending the day at the Blog Business Summit, listening to discussions on commercializing blogs. There’s now a post about it (Outsourcing bloggers in China) at CNET, which turned up a few other skeptics, and it’s looking like the Blogoriented guys are probably a hoax.

Despite that, I also think it’s inevitable that we’ll see at least a couple of real projects along these lines within a year, aimed not at simulating teenaged girls, but at building blog networks filled with, and buzzed through, inexpensive original content and edited search feeds that target specific niches.

David Sifry at Technorati has a good summary on the growing problems of spam blogs and fake blogs, and all the search engines are likely to make progress against what are essentially the next generation of link farms. Unfortunately, as discussed in this afternoon’s sessions on web advertising and affiliate models, if you can get traffic, there’s potential for a lot of money to be made by simple manipulations of the system, at least until the search engines improve. Content picked up by the blog search engines gets indexed immediately, which provides a way around some of the sandboxing and other mechanisms used by Google and others, and makes profitable links visible immediately.

It’s cheap and apparently effective to implement spam and fake blogs. I’ve noticed the volume of junk e-mail is decreasing, while the number of spam blogs in search results seems to be increasing. It’s going to take cooperation among multiple parties to fix this, but everyone recognizes this as a problem, so it’s going to get better. (Here’s Mark Cuban’s take.)

I think that a follow-on issue is that genuinely “original” content, in the “first author” sense rather than the “new idea” sense, can probably be reliably cranked out through a well-defined process. Think of something like an Indian call center or coding shop crossed with a daily news bureau, supervised by an editor who picked topics with some guidance from Wordtracker, Google and others. You’d get low cost, original writing, around an editorially consistent, topically relevant set of themes, and perhaps even with some interesting domain expertise, all tuned to be informative and keyworded to be search engine friendly.

Many of the same processes used at Wipro, Infosys, and other software and BPO outsourcers could be adapted to this application. Why cheat the search engine rankings when you can just reduce the cost of production and actually receive ranking benefit when the search engines get better at filtering for contextually better results and get rid of the “really fake” blogs? The Weblogs Inc blog network model seems to be working so far – Jason Calacanis says they’ve just hit a $1M annual ad revenue rate. Reducing the content production costs can’t hurt. I’m sure they could apply some of these ideas, if they haven’t already, and if they don’t, some other new blog network will certainly try.

This approach to farming out the process-oriented writing tasks should apply equally to a number of periodicals, such as magazines and newspapers. The difference between the news content in many newspapers is already often just the local editor’s preferences on the AP or Reuters newsfeeds and what fit in between the committed ad inches.

I don’t think this sort of blog or content outsourcing would be “bad” or “evil” in the sense of creating lower quality content, at least in some topic domains, since a pool of skilled professionals already exists offshore, and is growing rapidly. If you got a good editor in place, it might even improve the overall quality of online content. It’s not misrepresentation, unless you tried to pass off your authors as being something they’re not. But I wouldn’t even bother with attempting the nuances of local US culture with a staff of offshore bloggers, despite the availability of cultural indoctrination programs they run call center trainees through. That would work about as well as having US bloggers cover cricket or Bollywood gossip or Korean K-pop singers for their respective local audiences.

This seems to leave American pop culture as a secure niche for a while. Unfortunately, I’m incredibly bad at celebrity gossip. Although, now that I think about it, I did meet Cher once at her house in Malibu…

Putting on my evil genius hat, here’s a hypothetical approach for building an astroturfing blog empire, filled with posts from simulated teenaged (18-35) girls. Start by extracting common phrases, topics, and contexts from some LiveJournal and MySpace blogs. Next, build some auto-blogging agents resembling Weizenbaum’s ELIZA program crossed with some modern chatterbots. Finally, set it loose on LiveJournal, Xanga, and MySpace and have it start forming its own blogrings and online cliques, responding to filtered inputs from comments, selected feeds, and topical news, biased for the current hot keywords and with statistically plausible content and linkage…any Emacs Lisp and SQL hackers want to take this on?
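For anyone who hasn’t seen the trick behind Weizenbaum’s ELIZA: it matches keyword patterns in the input and echoes fragments back with the pronouns swapped. Here is a toy Python version in that spirit. The rules and reflection table are invented for illustration; this is not Weizenbaum’s original DOCTOR script, and it’s a long way from anything that could pass as a real blogger.

```python
import random
import re

# Pronoun swaps so "my job" can be echoed back as "your job" (illustrative table).
REFLECTIONS = {"i": "I" if False else "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my", "are": "am"}

# (regex, response templates); {0} is filled with the reflected first capture.
# These rules are hypothetical examples, not ELIZA's actual script.
RULES = [
    (r"i need (.*)", ["Why do you need {0}?", "Would {0} really help?"]),
    (r"i am (.*)", ["How long have you been {0}?", "Why are you {0}?"]),
    (r"(.*) mother(.*)", ["Tell me more about your family."]),
    (r"(.*)", ["Please go on.", "I see. Tell me more."]),  # catch-all
]


def reflect(fragment):
    """Swap first- and second-person words in a captured fragment."""
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())


def respond(text, rng=random):
    """Return a canned response from the first rule whose pattern matches."""
    cleaned = text.lower().strip(" .!?")
    for pattern, templates in RULES:
        m = re.fullmatch(pattern, cleaned)
        if m:
            groups = [reflect(g) for g in m.groups()]
            return rng.choice(templates).format(*groups)  # extra args are ignored
    return "Please go on."


print(respond("I am sad about my job."))
```

The whole illusion is in the rule table; the matching loop stays the same no matter how many patterns you add.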

See also: Outsource your Blog, Reasons I Still Read Newspapers

Update 08-19-2005 12:32 – some discussion at My Heart’s in Accra

Update 08-27-2005 00:10 – See also Goofy algorithm generates web page about “Prostitute Phobia” (at BoingBoing), which comments on this site, which is one of a collection of automatically generated pages.

Outsource Your Blog

I had been speculating on something like this after reading an article last month about outsourcing personal website maintenance to India.

via Marginal Revolution, Content to Go

As I write this entry my partner Jeff is in the air on the way to our office in Shanghai. What Jeff and I are doing is simple but as far as I know we are the first. We are outsourcing blogs to China.

Our general business model is a two tiered effort to hire Chinese citizens to write blogs en masse for us at a valued wage. The first tier is to create original blogs. These blogs will pop up in various areas of the net and appear to the unknowing reader to be written by your standard American. Our short term goal for these original blogs is to generate a steady stream of revenue through traditional blog advertising like google adwords. We estimate that our current blogforce of 25 can support around 500 unrelated blogs. Hopefully a few of those will be hits. The long term goal is to generate a large untraceable astroturfing mechanism for launching of various products. When a vendor needs to promote a new product to the internet demographic we will be able to create a believable buzz across hundreds of ‘reputable’ blogs and countless message boards. We can offer a legitimacy to advertisers that doesen’t exist anywhere else.

The second tier of our plan is a blog vacation service where our employees fill in for established bloggers who need to take a break from regular posting. As all bloggers know, an unupdated blog is quickly forgotten. For a nominal fee we can provide seamless integration of filler.

I’m not entirely sure that the project is real: they claim to have raised $5 million US, but the domain was registered just 3 days ago. Still, this caught my eye because I think there are some real possibilities for something like this.

Personally, I don’t have a problem with commercial blogging or professional blogging. However…their plan calls for deliberate misrepresentation of commercial interests as personal ones, on a large scale. This could be blog spam taken to the next level.

If they’re really heading off to put together an offshored blog content network, I think it could be done without heading straight for the “astroturf” market, which might give it a slower start but longer legs.

In my quick take on this idea, I’d probably choose India or the Philippines over China for basic English language skills, since the target audience is in the US, and have content editors with actual domain knowledge working with lower cost writers. This might not work for simulating teen LiveJournal sites, but should fit pretty well for topical blogs of most sorts. Hmm. That sounds like the direction the newspaper and magazine business is already heading…

Update 08-19-2005 – Followed up with more comments, plus ideas on how to build the evil astroturfing network in a new post.