Bookmarks for February 4th through February 11th

  • Schneier on Security: Interview with a Nigerian Internet Scammer – "We had something called the recovery approach. A few months after the original scam, we would approach the victim again, this time pretending to be from the FBI, or the Nigerian Authorities. The email would tell the victim that we had caught a scammer and had found all of the details of the original scam, and that the money could be recovered. Of course there would be fees involved as well. Victims would often pay up again to try and get their money back."
  • xkcd – Frequency of Strip Versions of Various Games – n = Google hits for "strip <game name>" / Google hits for "<game name>"
  • PeteSearch: How to split up the US – Visualization of social network clusters in the US. "information by location, with connections drawn between places that share friends. For example, a lot of people in LA have friends in San Francisco, so there's a line between them.

    Looking at the network of US cities, it's been remarkable to see how groups of them form clusters, with strong connections locally but few contacts outside the cluster. For example Columbus, OH and Charleston WV are nearby as the crow flies, but share few connections, with Columbus clearly part of the North, and Charleston tied to the South."

  • Redis: Lightweight key/value Store That Goes the Extra Mile | Linux Magazine – Sort of like memcache. "Calling redis a key/value store doesn’t quite due it justice. It’s better thought of as a “data structures” server that supports several native data types and operations on them. That’s pretty much how creator Salvatore Sanfilippo (known as antirez) describes it in the documentation. Let’s dig in and see how it works."
  • Op-Ed Contributor – Microsoft’s Creative Destruction – Unlike other companies, Microsoft never developed a true system for innovation. Some of my former colleagues argue that it actually developed a system to thwart innovation. Despite having one of the largest and best corporate laboratories in the world, and the luxury of not one but three chief technology officers, the company routinely manages to frustrate the efforts of its visionary thinkers.

A last look at Twitter userbase growth (through June 2009)

A number of people have been asking about updates to the earlier posts on Twitter’s user profile population as well as some statistical analysis.  I’m joining the Microsoft Bing search team so I probably won’t be sharing as much data in the future, but I wanted to get a couple of charts out first.

Here’s an updated look at Twitter’s user base growth, through June 2009. This survey has many spam accounts pruned out, so the actual number of user profiles at any point in time is probably higher than the graph plotted here. Up and to the right, heading past 13M, is the main takeaway. Also note that the majority of Twitter profiles have been created within the past few months. Compare with the graph through May 2009.


Here’s the corresponding estimate of new user accounts per day. That first big spike is the Oprah show featuring Twitter. Not sure exactly which media events go with the more recent spike, likely some combination of Ashton Kutcher vs CNN and other celebrities on a campaign to get more followers. As a reminder, the graphs don’t really drop off at the right edge; that’s just from new users not being discovered immediately.


Unfortunately I probably won’t be putting together any stats visualizations here as I transition the SocialQuant work to Microsoft Bing. But I’m looking forward to helping bring some interesting applications for Twitter and other social media to the Bing platform, and hope you’ll be able to enjoy some results there in the near future.

Twitter’s user growth per day

Twitter estimated new users per day through May 2009

Here is a companion to the Twitter user population growth chart from last week. This chart shows an estimate of the number of new users per day. The dashed blue bar is the 2009 US inauguration of Barack Obama, and the extreme spike is the Oprah Winfrey show featuring Twitter.

The data used for this chart isn’t as complete for the last week or so at the right-hand edge, so the apparent drop there is an artifact: the rate of new user signups hasn’t gone to zero. In fact it remains quite high, not 100K users per day, but well above the “pre-mainstream adoption” user signup rates, in the range of 30-50K users/day. As of mid June, more than 8M Twitter user accounts have been created.
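
For anyone curious how a chart like this can be put together, here is a minimal sketch in Python. It assumes the only input is a list of account creation dates taken from sampled profiles, and it simply counts accounts per day and keeps a running total. The function names and the tiny sample are hypothetical, made up for illustration; this is not the actual pipeline behind the charts, and it does no spam pruning.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import accumulate


def daily_signups(created_dates):
    """Count sampled accounts by creation date, filling empty days with 0."""
    counts = Counter(created_dates)
    if not counts:
        return []
    first, last = min(counts), max(counts)
    days = [first + timedelta(days=i) for i in range((last - first).days + 1)]
    return [(day, counts.get(day, 0)) for day in days]


def cumulative_users(per_day):
    """Running total of sampled accounts, one entry per day."""
    totals = accumulate(n for _, n in per_day)
    return [(day, total) for (day, _), total in zip(per_day, totals)]


# Hypothetical sample: creation dates of five observed profiles.
sample = [
    date(2009, 4, 16),
    date(2009, 4, 17),
    date(2009, 4, 17),
    date(2009, 4, 19),
    date(2009, 4, 19),
]
per_day = daily_signups(sample)      # new accounts per day
growth = cumulative_users(per_day)   # user base over time
```

Because recently created accounts take a while to show up in a sample, the last few days of `per_day` would undercount in practice, which is the same right-edge effect described above.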

Twitter’s amazing user growth

Twitter estimated userbase through May 2009

The graph above shows an estimate of Twitter’s user population from its launch in March 2006 through May 2009, based on a sample of around 6 million observed user profiles. The dashed blue line is around the 2009 US inauguration of Barack Obama and where the transition from early adopter to early mass audience seems to have taken off.

The entire user population of Twitter appears to have reached 1 million sometime in January but today there are several accounts that have over 1M followers each.

Stated another way, if you signed up before February 2009, you can consider yourself something of an early adopter on Twitter, and among the earliest 15% or so of the entire user population.

The numbers in this survey are inexact but representative, taken from research I’ve been doing for SocialQuant and FailWatch.  There is some survivor bias built in, since I’m pruning spam and suspended accounts. Only Twitter knows the true state of the user base and the social graph, of course.

The initial Twitter users tend to know each other more in real life, since much of the social network grew from friends of founders, SXSW attendees, and the San Francisco / Silicon Valley tech community. The more recent (post-Obama) arrivals tend not to have connections to those networks, and often don’t know anyone else to follow. They arrive via mass media and celebrity campaigns, and end up following mass media and celebrities, either from the suggested users list or because those are the only people they know of.

If you look carefully, you can see the rate of increase slows down toward the end of the graph. There was a huge ramp in  new user signups around the time of the Oprah show, which has receded somewhat. This has led to blog posts about Twitter’s impending demise, but looking back, there have been previous surges in the user base (typically around SXSW etc) which led to a peak, then a drop in new user signups to an off-peak but higher-than-before average. So far the current surge is the largest, but seems to be following the pattern. In the absence of any  new driver, user growth should continue at an off-peak but higher level, until the next big jump, or something better comes along.

Bookmarks for June 6th through June 8th

  • Latin motto generator: make your own catchy slogans! – Create your own life mottos and slogans in Latin! (Learning Latin not required, some vague idea for a desired motto a plus)
  • A Map Of Social (Network) Dominance – Using Alexa and Google Trend data, Cosenza color-coded the map based on which social network is the most popular in each country. All of the light green countries belong to Facebook. But there are still pockets of resistance in Russia (where V Kontakte rules), China (QQ), Brazil and India (Orkut), Central America, Peru, Mongolia, and Thailand (hi5), South Korea (Cyworld), Japan (Mixi), the Middle East (Maktoob), and the Philippines (Friendster).
  • Microsoft Releases Bing API – With No Usage Quotas – Updated search API, with no quotas and some improvements.
    * Developers can now request data in JSON and XML formats. The SOAP interface that the Live Search API required has also been retained.
    * Requested data can be narrowed to one of the following source types: web, news, images, phonebook, spell-checker, related queries, and Encarta instant answer.
    * It is now possible to send requests in OpenSearch-compliant RSS format for web, news, image and phonebook queries.
    * Client applications will be able to combine any number of different data source types into a single request with a single query string.
  • Twitter Limits Getting Ridiculous! « Verwon’s Blog – Anecdotal reports of Twitter users running into problems with rate limiting, either API or max posts/tweets/follows/directs.
  • flot – Google Code – Flot is a pure Javascript plotting library for jQuery. It produces graphical plots of arbitrary datasets on-the-fly client-side. The focus is on simple usage (all settings are optional), attractive looks and interactive features like zooming and mouse tracking. The plugin is known to work with Internet Explorer 6/7/8, Firefox 2.x+, Safari 3.0+, Opera 9.5+ and Konqueror 4.x+. If you find a problem, please report it. Drawing is done with the canvas tag introduced by Safari and now available on all major browsers, except Internet Explorer where the excanvas Javascript emulation helper is used.

Bookmarks for June 3rd through June 4th

Bookmarks for June 1st through June 2nd

  • jqPlot – Pure Javascript Plotting – jqPlot is a plotting plugin for the jQuery Javascript framework. jqPlot produces beautiful line and bar charts with many features including: Numerous chart style options. Date axes with customizable formatting. Rotated axis text. Automatic trend line computation. Tooltips and data point highlighting. Sensible defaults for ease of use.
  • New Twitter Research: Men Follow Men and Nobody Tweets – Conversation Starter – "Although men and women follow a similar number of Twitter users, men have 15% more followers than women. Men also have more reciprocated relationships, in which two users follow each other. This "follower split" suggests that women are driven less by followers than men, or have more stringent thresholds for reciprocating relationships. This is intriguing, especially given that females hold a slight majority on Twitter: we found that men comprise 45% of Twitter users, while women represent 55%."
  • Shirky: Power Laws, Weblogs, and Inequality – 2003 article on popularity / traffic on blogs, which was then the latest emerging social media format. "Once a power law distribution exists, it can take on a certain amount of homeostasis, the tendency of a system to retain its form even against external pressures. Is the weblog world such a system? Are there people who are as talented or deserving as the current stars, but who are not getting anything like the traffic? Doubtless. Will this problem get worse in the future? Yes. "
  • Visualizing information flow in science – Some nice visualization ideas using hierarchical clustering to explore patterns in citation networks.
  • Bing API, Version 2.0 – Updated API documentation for Microsoft Bing (formerly Live Search) web services.

Bookmarks for May 8th through May 12th

Bookmarks for May 5th through May 6th

Bookmarks for May 4th through May 5th

Bookmarks for May 3rd through May 4th

  • Dilbert comic strip for 05/04/2009 from the official Dilbert comic strips archive. – Secretary to Pointy Haired Boss: "I live in a rented trailer and all of my money is in my checking account. Your investments are worthless and your mortgage is underwater. My net worth is higher than yours now. I guess promiscuity and a G.E.D. was a pretty good strategy after all." Reminded me of a thought I had earlier this year, that much of Western Civilization is built on valuing delayed gratification, which hasn't worked out so well recently as opposed to immediate consumption in many cases.
  • Without Warning, Twitter Kills StatTweets (Businesses Beware) – ChangeLog – Owner of StatTweets post regarding his network of sports-related Twitter handles being banned. They had several hundred accounts, one for stats for each team. This makes sense for users, given the way Twitter works, but they don't like mass account creation. Interested to see how this sorts out, there seem to be at least a few similar Twitter networks with team/region/topic-specific handles.
  • Dooley Online: What URL Shortener Should I Use? – Comparison of features and some usage data for URL shorteners such as tinyurl and used on twitter and other services.
  • Obesity and Overweight: Trends: U.S. Obesity Trends 1985-2007 | DNPAO | CDC – During the past 20 years there has been a dramatic increase in obesity in the United States. This slide set illustrates this trend by mapping the increased prevalence of obesity across each of the states. In 2007, only one state (Colorado) had a prevalence of obesity less than 20%. Thirty states had a prevalence equal to or greater than 25%; three of these states (Alabama, Mississippi and Tennessee) had a prevalence of obesity equal to or greater than 30%. The animated map below shows the United States obesity prevalence from 1985 through 2007.
  • Why text messages are limited to 160 characters | Technology | Los Angeles Times – A look back to the beginnings of SMS in 1985 – Would the 160-character maximum be enough space to prove a useful form of communication? Having zero market research, they based their initial assumptions on two "convincing arguments," Hillebrand said. For one, they found that postcards often contained fewer than 150 characters. Second, they analyzed a set of messages sent through Telex, a then-prevalent telegraphy network for business professionals. Despite not having a technical limitation, Hillebrand said, Telex transmissions were usually about the same length as postcards.

Bookmarks for April 28th from 05:35 to 14:24

  • Official Google Blog: Adding search power to public data – Interesting. Wonder if the underlying public data sets will eventually become available on Google App Engine as well, sort of like the public data sets available for use with Amazon EC2 applications.
  • MySQL And Search At Craigslist – Jeremy Zawodny's slides on MySQL, Sphinx, and free text search implementation at Craigslist, from last week's MySQL conference.
  • Skew, The Frontend Engineer’s Misery @ Irrational Exuberance – For mashups and the like, the distinction between a FE engineer and web dev is rather small in terms of technical skills; they are both using the same skillset, they are both interacting with APIs, and so on. However, there are important distinctions between the two: 1. web developers tend to move in small groups or as individuals, whereas fe engineers work in larger groups, 2. web developers tend to design a product on top of an existing backend service (api, etc), while fe engineers are usually working in parallel with the backend being developed.
  • Study: Twitter Audience Does Not Have A Return Policy – Over 60 percent of people who sign up to use the popular (and tremendously discussed) micro-blogging platform do not return to using it the following month, according to new data released by Nielsen Online. In other words, Twitter currently has just a 40 percent retention rate, up from just 30 percent in previous months–indicating an “I don’t get it factor” among new users that is reminiscent of the similarly-over hyped Second Life from a few years ago.
  • Hey Americans, Appreciate Your Freedom Of Speech : NPR – Firoozeh Dumas on the underappreciated freedoms of speech and expression we have in the US vs journalists and bloggers in Iran.

Bookmarks for April 12th from 17:02 to 19:13

Bookmarks for April 11th through April 12th

  • Wordle – Beautiful Word Clouds – Wordle is a toy for generating “word clouds” from text that you provide. The clouds give greater prominence to words that appear more frequently in the source text. You can tweak your clouds with different fonts, layouts, and color schemes.
  • The dark side of Dubai – Johann Hari, Commentators – The Independent – "Dubai was meant to be a Middle-Eastern Shangri-La, a glittering monument to Arab enterprise and western capitalism. But as hard times arrive in the city state that rose from the desert sands, an uglier story is emerging."
  • Topless Robot – Hot Girls Have Lightsaber Strip-Fight for Your Viewing Pleasure – Star Wars CGI meets fake body spray ad
  • Poll Result: Best VPN to leap China’s Great Firewall? – Thomas Crampton – Witopia: Undisputed winner. Quality of service, speed of surfing, though it is said to be relatively expensive at US$50 to US$60 per year. Hotspot Shield: Bandwidth limits can be painful. Force you to wait until the next month if you use it too much. Ultrasurf. StrongVPN.
  • InfoQ: Facebook: Science and the Social Graph – In this presentation filmed during QCon SF 2008 (November 2008), Aditya Agarwal discusses Facebook’s architecture, more exactly the software stack used, presenting the advantages and disadvantages of its major components: LAMP (PHP, MySQL), Memcache, Thrift, Scribe.
  • The Running Man, Revisited § SEEDMAGAZINE.COM – a handful of scientists think that these ultra-marathoners are using their bodies just as our hominid forbears once did, a theory known as the endurance running hypothesis (ER). ER proponents believe that being able to run for extended lengths of time is an adapted trait, most likely for obtaining food, and was the catalyst that forced Homo erectus to evolve from its apelike ancestors.

Bookmarks for April 3rd through April 7th

  • Agile Testing: Experiences deploying a large-scale infrastructure in Amazon EC2 – Practical guidance on using cloud computing at EC2. Expect failures, automate deployment, more.
  • joshua’s blog: on url shorteners – Joshua Schachter (founder of summary on the state of URL shorteners (tinyurl,, etc), and issues with 3rd party redirects, link sharing through twitter, etc.
  • Control Yourself » coming soon – On, plans for hosting sites, and federating microblogging status networks
  • There must be some way out of here (Scripting News) – Comments on the rise of celebrity accounts on Twitter, increasing spam/noise, and alternative models for and
  • Stochastic Models of User-Contributory Web Sites – Tad Hogg, Kristina Lerman 31 Mar 2009 Abstract: We describe a general stochastic processes-based approach to modeling user-contributory web sites, where users create, rate and share content. These models describe aggregate measures of activity and how they arise from simple models of individual users. This approach provides a tractable method to understand user activity on the web site and how this activity depends on web site design choices, especially the choice of what information about other users' behaviors is shown to each user. We illustrate this modeling approach in the context of user-created content on the news rating site Digg.

Bookmarks for March 16th through April 2nd

Bookmarks for March 12th through March 16th

Genius, in search of lab coat


Didn’t attend ETech this week, but thanks to a Twitter pointer from Gene Becker,  I did take a few breaks to participate in a collaborative future forecasting experiment at the event, organized by Institute For the Future / Signtific Labs. The general idea is to enlist game players to offer Twitter-like short notes with outlier ideas regarding a scenario under discussion, in this case the consequences of inexpensive ($100) 1kg microsatellites (“CubeSats”) capable of high speed networking and remote sensing. The same game framework could be used for any scenario, though. Bonus points are awarded to “Super-Interesting” ideas and ideas that result in additional discussion, which helped me out on the scoreboard.

Gene (“ubik“) won a “Feynman” award on the first day, and I managed to end up with a high score at ETech, thus winning a lab coat to go with my “Genius” label.

Some of my favorite future forecast contributions from “What will you do when space is as cheap and accessible as the Web is today?” (slide summary here):

Jurisdiction-free data haven built with csats full of rad-hard flash memory, hbase-style distributed replication across multiple nodes. Subpoena-proof anonymizers, for better or worse. Alternative, universal internet currency evolves, outside any government’s central bank control. Following forced disclosure of banking client list, Swiss government recognizes anonymous cSat net IDs, followed by Cayman, Bermuda etc.

CSats deorbited in vacant areas of oceans as impulse input to passive sonar imaging. Oceanographers get great maps, submarines lose stealth. Depending on how accurately you can drop a CSat, you can effectively “ping” a region and listen to the return signal through existing arrays. This really messes with strategic deterrence since now subs are vulnerable to first strike. But CSat deorbit is cheap WMD for all. On the positive side, detailed acoustic propagation data leads to new insights on ocean dynamics – bathymetrics, thermoclines, currents, etc. A similar version of dropping CSats on land might yield useful seismic imaging. But these would all be surface impulse, not at depth.

Csat data networks circumvent the Great Firewall of China and other govt access controls, leading to broader/safer citizen engagement online

CSat operating interface is marketed as a toy, like Tamagotchi. Recharge, collect interesting data, avoid mean csats, team with friends. Organizations might post cash prize/rewards for things like locating missing ships, oil/trash dumping at sea, smokestack emissions, etc.

Commodity traders are early adopters of CSat operator networks. Looking for crop yield data, mine production volumes, freight shipments etc. Among other things, CSat observations could give a more accurate estimate of “floating” oil parked in tankers as well as ongoing demand. Similarly, you’d get a decent idea of iron ore production by watching BHP’s railway in Australia, and the demand side in China, Korea etc. CSat data could improve the market visibility into supply/demand. But one might start creating Potemkin mining/farming operations etc… Sadly, credit derivative risk is not observable via CSat.

Ubiquitous, near real time satellite surveillance. No more privacy outdoors. But really good Google Maps. Ultra high resolution terrain maps of the world synthesized from multiple satellite passes/viewing aspects. Long term studies of effects of erosion, farming, development, earthquakes, flooding, drought, etc. Insurgents, militias, and terrorists get real time tactical data feeds, make use of homebrew UAVs, sensors, and in-field dispatch from afar. Turf wars among poppy and marijuana growers who now know where each other’s fields are. All vehicles – car, truck, rail, container, airplanes, etc – get a sky-facing ID plate. Maybe these should just be really big QR codes with an authoritative registry to foil car thieves from painting on bogus “plates”.

Now I need to figure out how to collect that lab coat.

Bookmarks for March 9th through March 12th

Bookmarks for March 6th through March 8th

