Bookmarks for January 20th through January 23rd

These are my links for January 20th through January 23rd:

  • Data.gov – Featured Datasets: Open Government Directive Agency – Datasets required under the Open Government Directive through the end of the day, January 22, 2010. Freedom of Information Act request logs, Treasury TARP and derivative activity logs, crime, income, agriculture datasets.
  • All Your Twitter Bot Needs Is Love – The bot’s name? Jason Thorton. He’s been humming along for months now, sending out over 1250 tweets to some 174 followers. His tweets, while not particularly creative, manage to be both believable and timely. And he’s powered by a single word: Love.

    Thorton is the creation of developer Ryan Merket, who built him as a side project in around three hours. Merket has just posted the code that powers him, and has also divulged how he made Thorton seem somewhat realistic: the bot looks for tweets with the word “love” in them and tweets them as its own.

  • Building a Twitter Bot – "Meet Jason Thorton. To people who know Jason, he is a successful entrepreneur in San Francisco who tweets 4-5 times a day. But Jason has a secret, he’s not really a human, he’s the product of my simple algorithm in PHP.

    Jason tweets A LOT about the word “love” – that’s because Jason actually steals tweets from the public timeline that contain the word “love” and posts them as his own.

    Jason also @replies to people who use the word “love” in their tweets, and asks them random questions or says something arbitrary.

    It took me about 3 hours to code Jason, imagine what a real engineer could do with real AI algorithms? Now realize that it’s already a reality. Sites like Twitter are full of side projects, company initiatives, spambots and AI robots. When the free flow of information becomes open, the amount of disinformation increases. There’s a real need for someone to vet the people we ‘meet’ on social sites – will be interesting to see how this market grows in the next year."

  • Website monitoring status – Public API Status – Health monitor for 26 APIs from popular Web services, including Google Search, Google Maps, Bing, Facebook, Twitter, SalesForce, YouTube, Amazon, eBay and others
  • PG&E Electrical System Outage Map – This map shows the current outages in our 70,000-square-mile service area. To see more details about an outage, including the cause and estimated time of restoration, click on the color-coded icon associated with that outage.
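
As a rough illustration of the keyword trick behind the Jason Thorton bot described above, here is a minimal Python sketch. The original was written in PHP and this is not Merket's code. It works on a hard-coded list rather than the live Twitter API, and the function names, sample tweets, and canned questions are all made up for illustration.

```python
import random


def pick_love_tweets(timeline, keyword="love"):
    """Return the tweets whose text contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [tweet for tweet in timeline if keyword in tweet["text"].lower()]


def build_reply(tweet, questions, rng=random):
    """Compose an @reply that asks the tweet's author a random canned question."""
    return f"@{tweet['user']} {rng.choice(questions)}"


# Hypothetical stand-in for the public timeline.
timeline = [
    {"user": "alice", "text": "I LOVE this coffee shop"},
    {"user": "bob", "text": "Stuck in traffic again"},
    {"user": "carol", "text": "Love is in the air"},
]
questions = ["What are you up to today?", "Really?"]

matches = pick_love_tweets(timeline)
reposts = [tweet["text"] for tweet in matches]   # text the bot would post as its own
replies = [build_reply(tweet, questions) for tweet in matches]
```

A plain substring test also matches words like "glove" or "clover", which is probably fine for a bot that only needs to sound plausible.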

Bookmarks for June 6th through June 8th

These are my links for June 6th through June 8th:

  • Latin motto generator: make your own catchy slogans! – Create your own life mottos and slogans in Latin! (Learning Latin not required, some vague idea for a desired motto a plus)
  • A Map Of Social (Network) Dominance – Using Alexa and Google Trend data, Cosenza color-coded the map based on which social network is the most popular in each country. All of the light green countries belong to Facebook. But there are still pockets of resistance in Russia (where V Kontakte rules), China (QQ), Brazil and India (Orkut), Central America, Peru, Mongolia, and Thailand (hi5), South Korea (Cyworld), Japan (Mixi), the Middle East (Maktoob), and the Philippines (Friendster).
  • Microsoft Releases Bing API – With No Usage Quotas – Updated search API, with no quotas and some improvements.
    * Developers can now request data in JSON and XML formats. The SOAP interface that the Live Search API required has also been retained.
    * Requested data can be narrowed to one of the following source types: web, news, images, phonebook, spell-checker, related queries, and Encarta instant answer.
    * It is now possible to send requests in OpenSearch-compliant RSS format for web, news, image and phonebook queries.
    * Client applications will be able to combine any number of different data source types into a single request with a single query string.
  • Twitter Limits Getting Ridiculous! « Verwon’s Blog – Anecdotal reports of Twitter users running into problems with rate limiting, either API or max posts/tweets/follows/directs.
  • flot – Google Code – Flot is a pure Javascript plotting library for jQuery. It produces graphical plots of arbitrary datasets on-the-fly client-side. The focus is on simple usage (all settings are optional), attractive looks and interactive features like zooming and mouse tracking. The plugin is known to work with Internet Explorer 6/7/8, Firefox 2.x+, Safari 3.0+, Opera 9.5+ and Konqueror 4.x+. If you find a problem, please report it. Drawing is done with the canvas tag introduced by Safari and now available on all major browsers, except Internet Explorer where the excanvas Javascript emulation helper is used.

Bookmarks for June 1st through June 2nd

These are my links for June 1st through June 2nd:

  • jqPlot – Pure Javascript Plotting – jqPlot is a plotting plugin for the jQuery Javascript framework. jqPlot produces beautiful line and bar charts with many features including: Numerous chart style options. Date axes with customizable formatting. Rotated axis text. Automatic trend line computation. Tooltips and data point highlighting. Sensible defaults for ease of use.
  • New Twitter Research: Men Follow Men and Nobody Tweets – Conversation Starter – HarvardBusiness.org – "Although men and women follow a similar number of Twitter users, men have 15% more followers than women. Men also have more reciprocated relationships, in which two users follow each other. This "follower split" suggests that women are driven less by followers than men, or have more stringent thresholds for reciprocating relationships. This is intriguing, especially given that females hold a slight majority on Twitter: we found that men comprise 45% of Twitter users, while women represent 55%."
  • Shirky: Power Laws, Weblogs, and Inequality – 2003 article on popularity / traffic on blogs, which was then the latest emerging social media format. "Once a power law distribution exists, it can take on a certain amount of homeostasis, the tendency of a system to retain its form even against external pressures. Is the weblog world such a system? Are there people who are as talented or deserving as the current stars, but who are not getting anything like the traffic? Doubtless. Will this problem get worse in the future? Yes."
  • well-formed.eigenfactor.org : Visualizing information flow in science – Some nice visualization ideas using hierarchical clustering to explore patterns in citation networks.
  • Bing API, Version 2.0 – Updated API documentation for Microsoft Bing (formerly Live Search) web services.

Google is having problems this evening?

For the past half hour or so (20:30 – 21:00 PDT) I’ve been getting slow responses or connection timeouts from Google. Usually this means that the local network is having problems, but other major sites (Yahoo, CNN) are running as quickly as ever, as are various SSH sessions around the world, so it seems to be specific to Google.

So far I get slow or no response from the main search page, Gmail, Adsense, Adwords, Analytics, and Finance.

Pages that do respond are coming back in 10+ seconds, and some pages are loading without graphics or with templates only and no content.

Anyone else seeing these problems? This is the first time I’ve seen Google unusable for more than a minute or two. (Unlike this site, which has been bouncing up and down lately due to problems at Dreamhost.)

Deconstructing search at Alexa

Wow! Although the basic idea is straightforward, crawling and indexing for a general purpose search engine requires huge resources. Web crawlers effectively download copies of the entire internet over and over, then hand them to indexing applications that scan the contents for structure and meaning.

The sheer scale of the task is a substantial barrier to entry for anyone wanting to develop a new indexing or retrieval application. Some projects have narrowed the problem domain, which can reduce the problem scope to a manageable level, but this announcement from Alexa looks like it may offer an exciting alternative for building new search applications.

John Battelle writes:

Alexa, an Amazon-owned search company started by Bruce Gilliat and Brewster Kahle (and the spider that fuels the Internet Archive), is going to offer its index up to anyone who wants it (details are not up yet, but soon). Alexa has about 5 billion documents in its index – about 100 terabytes of data.

Anyone can also use Alexa’s servers and processing power to mine its index to discover things – perhaps, to outsource the crawl needed to create a vertical search engine, for example. Or maybe to build new kinds of search engines entirely, or …well, whatever creative folks can dream up. And then, anyone can run that new service on Alexa’s (er…Amazon’s) platform, should they wish.

The service will be priced on a usage basis: $1 per CPU hour, $1 per GB stored or uploaded, $1 per 50GB data processed.

There’s no announcement posted on the Alexa or Amazon sites yet; it’s apparently due out overnight. (Updated 12-13-2005 00:25 – the site is up now.)

Not every search and retrieval application will fit the way Alexa has built its crawler and indexing infrastructure, or any other search engine platform, for that matter. But opening up access to more of the platform should make it possible to try out a lot of new ideas quickly, without having to build yet another crawler for each project. Until now, many search ideas couldn’t be evaluated without working at one of the major search engines. I suspect most development teams would prefer to get access to Google’s crawl and index data, but I’m certainly looking forward to seeing what’s available at Alexa when they get their documentation online in the morning.
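
To get a feel for the usage-based pricing quoted above, here is a back-of-the-envelope Python sketch. It uses only the three rates from the quote. Everything else is my assumption: that charges simply add up, that there is no rounding or minimum fee, and that a terabyte counts as 1000 GB. The function name `alexa_cost` is made up.

```python
def alexa_cost(cpu_hours=0.0, gb_stored_or_uploaded=0.0, gb_processed=0.0):
    """Estimate a bill in dollars from the three quoted rates."""
    cpu_charge = cpu_hours * 1.0                   # $1 per CPU hour
    storage_charge = gb_stored_or_uploaded * 1.0   # $1 per GB stored or uploaded
    processing_charge = gb_processed / 50.0        # $1 per 50 GB processed
    return cpu_charge + storage_charge + processing_charge


# One pass over the whole index, taking "about 100 terabytes" as 100,000 GB.
full_index_gb = 100 * 1000
full_scan = alexa_cost(gb_processed=full_index_gb)

# A small experiment: 10 CPU hours, 5 GB of results stored, 500 GB processed.
small_job = alexa_cost(cpu_hours=10, gb_stored_or_uploaded=5, gb_processed=500)

print(f"Full index scan (processing only): ${full_scan:,.2f}")
print(f"Small experiment: ${small_job:,.2f}")
```

Under these assumptions the processing charge for a single pass over the full index comes to about $2,000 before any CPU or storage charges, and the small experiment to $25.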

More from Om Malik, TechCrunch, ReadWrite Web

Amazon Mechanical Turk – Putting Humans in the Loop

I came across a cryptic link to mturk.com on supr.c.ilio.us, asking “Isn’t that how the Matrix came to be?”

Amazon Mechanical Turk provides a web services API for computers to integrate “artificial, artificial intelligence” directly into their processing by making requests of humans. Developers use the Amazon Mechanical Turk web services API to submit tasks to the Amazon Mechanical Turk web site, approve completed tasks, and incorporate the answers into their software applications. To the application, the transaction looks very much like any remote procedure call: the application sends the request, and the service returns the results. In reality, a network of humans fuels this artificial, artificial intelligence by coming to the web site, searching for and completing tasks, and receiving payment for their work.

All software developers need to do is write normal code. The pseudo code below illustrates how simple this can be.

 read (photo);
 photoContainsHuman = callMechanicalTurk(photo);
 if (photoContainsHuman == TRUE) {
   acceptPhoto;
 }
 else {
   rejectPhoto;
 }
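
For comparison, here is the same flow as a small runnable Python sketch. It does not call the real Mechanical Turk web service. `FakeTurk` is a hypothetical in-memory stand-in I made up to show the submit, complete, approve cycle described above, and the "worker" is just a function that pretends to be a person.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Task:
    task_id: int
    question: str
    payload: str
    answer: Optional[str] = None
    approved: bool = False


@dataclass
class FakeTurk:
    """Hypothetical in-memory stand-in for a human task marketplace."""
    tasks: Dict[int, Task] = field(default_factory=dict)
    next_id: int = 1

    def submit(self, question: str, payload: str) -> int:
        """Post a task and return its id."""
        task = Task(self.next_id, question, payload)
        self.tasks[task.task_id] = task
        self.next_id += 1
        return task.task_id

    def work(self, worker: Callable[[Task], str]) -> None:
        """Let a 'human' answer every task that is still open."""
        for task in self.tasks.values():
            if task.answer is None:
                task.answer = worker(task)

    def approve(self, task_id: int) -> str:
        """Approve a completed task and return its answer."""
        task = self.tasks[task_id]
        if task.answer is None:
            raise ValueError("task has not been completed yet")
        task.approved = True
        return task.answer


def photo_contains_human(turk: FakeTurk, photo: str, worker) -> bool:
    task_id = turk.submit("Does this photo contain a human?", photo)
    turk.work(worker)  # a real marketplace would answer later, not immediately
    return turk.approve(task_id) == "yes"


def pretend_worker(task: Task) -> str:
    # Stand-in for a person looking at the photo.
    return "yes" if "portrait" in task.payload else "no"


turk = FakeTurk()
decisions = {
    photo: "accept" if photo_contains_human(turk, photo, pretend_worker) else "reject"
    for photo in ["portrait_01.jpg", "landscape_02.jpg"]
}
print(decisions)
```

The main thing the pseudo code glosses over is time: a person may take minutes or hours to answer, so a real application would have to poll or wait for results rather than block on a single call.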

Given the source of the link, I was a little skeptical at first read, but it appears to be a legitimate beta project that just launched yesterday at Amazon. At least, the documentation links point back into Amazon Web Services, and at least one person seems to know someone there.

This is an interesting idea that should find some useful applications. Spammers have supposedly been doing something like this to defeat the image-based Turing tests used to screen comment posting systems, offering access to porn in exchange for solving the puzzles, and there are other anecdotes of using low cost offshore labor for similar tasks. Having a simpler web service interface for finding a human key operator somewhere will probably allow smaller and more experimental applications to emerge.

Update 11-04-2005 08:09 PST – Slashdot, TechDirt, Google Blogoscoped on Mechanical Turk, pointer to BoingBoing on porn puzzles and spam, captcha.net