More on the America Online search query data

August 7th, 2006 7:58pm

The search query data that America Online posted over the weekend has been removed from their site following a blizzard of posts regarding the privacy issues. AOL officially regards this as “a screw up”, according to spokesperson Andrew Weinstein, who responded in comments on several sites:

All –

This was a screw up, and we’re angry and upset about it. It was an innocent enough attempt to reach out to the academic community with new research tools, but it was obviously not appropriately vetted, and if it had been, it would have been stopped in an instant.

Although there was no personally-identifiable data linked to these accounts, we’re absolutely not defending this. It was a mistake, and we apologize. We’ve launched an internal investigation into what happened, and we are taking steps to ensure that this type of thing never happens again.

AOL Research publishes 20 million search queries

August 6th, 2006 3:45pm

More raw data for search engineers and SEOs, and fodder for online privacy debates: AOL Research has released a collection of roughly 20 million search queries, which includes all searches done by a randomly selected set of around 500,000 users from March through May 2006.

This should be a great data set to work with if you’re doing research on search engines, but it seems problematic from a privacy perspective. The data is anonymized in the sense that AOL user names are replaced with a numerical user ID:

The data set includes {UserID, Query, QueryTime, ClickedRank, DestinationDomainUrl}.

I suspect it may be possible to reverse engineer some of the query clusters to identify specific users or other personal data. If nothing else, I occasionally observe people accidentally typing user names or passwords into search boxes, so there are likely to be some of those in the mix. “Anonymous” in the comments over at Greg Linden’s blog thinks there will be a lot of those. The destination URLs have apparently been clipped as well, so you won’t be able to see the exact page that resulted in a click-through.
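To make the clustering concern a bit more concrete, here is a minimal Python sketch that groups queries by user ID, which is the first step anyone would take before trying to piece together who a given ID belongs to. The field names come from the description above; the tab-separated layout, the lack of a header row, and the sample records are all assumptions for illustration, not the actual format or contents of the AOL files.

```python
import csv
import io
from collections import defaultdict

# Field order follows the description in the post. The tab delimiter and the
# absence of a header row are assumptions made for this sketch.
FIELDS = ["UserID", "Query", "QueryTime", "ClickedRank", "DestinationDomainUrl"]


def queries_by_user(lines):
    """Group distinct query strings by user ID, preserving first-seen order."""
    grouped = defaultdict(list)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        user_id, query = row[0], row[1].strip()
        if query and query not in grouped[user_id]:
            grouped[user_id].append(query)
    return dict(grouped)


# Made-up sample records; none of these come from the real data set.
SAMPLE = (
    "1001\tpizza delivery springfield\t2006-03-01 12:00:00\t1\thttp://www.example.com\n"
    "1001\tjane doe springfield\t2006-03-02 09:30:00\t\t\n"
    "1002\tweather\t2006-03-02 10:00:00\t\t\n"
)

if __name__ == "__main__":
    for user_id, queries in queries_by_user(io.StringIO(SAMPLE)).items():
        print(user_id, queries)
```

Even in this toy sample, a place name and a personal name showing up under the same numeric ID hints at how a cluster of queries could narrow down to one person.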

A primer on the evolving media industry from Carl Icahn and friends

February 7th, 2006 11:01pm


The proposal on the table is to split Time Warner into four pieces, undoing years of mergers and acquisitions. The (massive) report from Carl Icahn’s investment banking team at Lazard is worth a look for anyone with an interest in online or traditional media businesses or who simply lived through the dot-com boom and crash. I’ve only skimmed through it so far, but it’s practically a textbook on the evolution and current state of the media industry.

TWX is at the center of the storm that has and will continue to jolt American industry. Technology, regulation and competition are changing at an accelerated pace. The markets are increasingly rewarding companies—across all industries—with a well-defined vision, as shareholder expectations on transparency, capital returns, appreciation and corporate governance increase. Against this backdrop, anticipating and harnessing change is critical for success.

Googlepark: the battle for AOL

December 19th, 2005 3:17pm


More business comics: the latest installment of Googlepark is up at Channel 9 (via Google Blogoscoped).

If you haven’t seen the previous episodes of Googlepark, here are links to the other installments: Googlepark.

GooglePark

October 11th, 2005 12:45pm

Brad Feld points out this awesome comic series that went by on Channel 9 recently, featuring Larry, Sergey, and Scoble (among others) as the South Park kids.

Update 11-06-2005 19:39 PST A new installment! GooglePark: Disruption
Update 12-19-2005 14:35 PST The Battle For AOL

Update 02-13-2006 18:33 PST The Spaghetti Code


 
    © 2004-2008 Ho John Lee