These are my links for February 4th through February 11th:
- Schneier on Security: Interview with a Nigerian Internet Scammer – "We had something called the recovery approach. A few months after the original scam, we would approach the victim again, this time pretending to be from the FBI, or the Nigerian Authorities. The email would tell the victim that we had caught a scammer and had found all of the details of the original scam, and that the money could be recovered. Of course there would be fees involved as well. Victims would often pay up again to try and get their money back."
- xkcd – Frequency of Strip Versions of Various Games – n = Google hits for "strip <game name>" / Google hits for "<game name>"
- PeteSearch: How to split up the US – Visualization of social network clusters in the US. "information by location, with connections drawn between places that share friends. For example, a lot of people in LA have friends in San Francisco, so there's a line between them.
Looking at the network of US cities, it's been remarkable to see how groups of them form clusters, with strong connections locally but few contacts outside the cluster. For example Columbus, OH and Charleston WV are nearby as the crow flies, but share few connections, with Columbus clearly part of the North, and Charleston tied to the South."
- Redis: Lightweight key/value Store That Goes the Extra Mile | Linux Magazine – Sort of like memcache. "Calling redis a key/value store doesn’t quite do it justice. It’s better thought of as a “data structures” server that supports several native data types and operations on them. That’s pretty much how creator Salvatore Sanfilippo (known as antirez) describes it in the documentation. Let’s dig in and see how it works."
- Op-Ed Contributor – Microsoft’s Creative Destruction – NYTimes.com – Unlike other companies, Microsoft never developed a true system for innovation. Some of my former colleagues argue that it actually developed a system to thwart innovation. Despite having one of the largest and best corporate laboratories in the world, and the luxury of not one but three chief technology officers, the company routinely manages to frustrate the efforts of its visionary thinkers.
A number of people have been asking about updates to the earlier posts on Twitter’s user profile population as well as some statistical analysis. I’m joining the Microsoft Bing search team so I probably won’t be sharing as much data in the future, but I wanted to get a couple of charts out first.
Here’s an updated look at Twitter’s user base growth, through June 2009. This survey has many spam accounts pruned out, so the actual number of user profiles at any point in time is probably higher than the graph plotted here. Up and to the right, heading past 13M, is the main takeaway. Also note that the majority of Twitter profiles have been created within the past few months. Compare with the graph through May 2009.
Here’s the corresponding estimate of new user accounts per day. That first big spike is the Oprah show featuring Twitter. I’m not sure exactly which media events go with the more recent spike; it’s likely some combination of Ashton Kutcher vs CNN and other celebrities campaigning to get more followers. As a reminder, the graphs don’t really drop off at the right edge; that’s just an artifact of new users not being discovered immediately.
Unfortunately I probably won’t be putting together any stats visualizations here as I transition the SocialQuant work to Microsoft Bing. But I’m looking forward to helping bring some interesting applications for Twitter and other social media to the Bing platform, and I hope you’ll be able to enjoy some results there in the near future.
Twitter estimated new users per day through May 2009
Here is a companion to the Twitter user population growth chart from last week. This chart shows an estimate of the number of new users per day. The dashed blue bar is the 2009 US inauguration of Barack Obama, and the extreme spike is the Oprah Winfrey show featuring Twitter.
The data used for this chart is less complete for the last week or so at the right-hand edge; the rate of new user signups hasn’t actually gone to zero. In fact it remains quite high: not 100K users per day, but well above the “pre-mainstream adoption” signup rates, in the range of 30-50K users/day. As of mid June, more than 8M Twitter user accounts have been created.
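A minimal sketch of how a per-day signup estimate could be derived from a crawl, assuming each observed profile carries an account creation date. The input format and the `new_users_per_day` function are hypothetical, not the actual SocialQuant code, and counts are for the observed sample only. The function also flags the trailing days whose counts are likely low because recently created accounts haven’t been discovered yet.

```python
from collections import Counter
from datetime import date, timedelta


def new_users_per_day(created_dates, as_of, incomplete_days=7):
    """Count observed profiles by creation date.

    created_dates: iterable of datetime.date, one per observed profile
        (hypothetical input format).
    as_of: date of the most recent crawl.
    incomplete_days: size of the trailing window whose counts are likely
        undercounted because new accounts haven't been discovered yet.

    Returns (day, count, is_incomplete) tuples in date order, including
    days with zero observed signups.
    """
    counts = Counter(created_dates)
    if not counts:
        return []
    cutoff = as_of - timedelta(days=incomplete_days)
    rows = []
    day = min(counts)
    while day <= as_of:
        rows.append((day, counts.get(day, 0), day > cutoff))
        day += timedelta(days=1)
    return rows
```

Flagging rather than dropping the trailing days keeps the chart honest: the right edge still plots, but it can be drawn differently so it isn’t read as a real decline.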
Twitter estimated userbase through May 2009
The graph above shows an estimate of Twitter’s user population from its launch in March 2006 through May 2009, based on a sample of around 6 million observed user profiles. The dashed blue line is around the 2009 US inauguration of Barack Obama and where the transition from early adopter to early mass audience seems to have taken off.
The entire user population of Twitter appears to have reached 1 million sometime in January, but today there are several accounts that have over 1M followers each.
Stated another way, if you signed up before February 2009, you can consider yourself something of an early adopter on Twitter, and among the earliest 15% or so of the entire user population.
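The “earliest 15%” figure is just a percentile over account creation dates. A minimal sketch, assuming the same kind of hypothetical list of creation dates for observed (non-spam, non-suspended) profiles, so the results inherit the survivor bias described below:

```python
from bisect import bisect_left
from collections import Counter
from datetime import date
from itertools import accumulate


def cumulative_user_base(created_dates):
    """Running total of observed profiles, as (creation_date, total) pairs."""
    counts = Counter(created_dates)
    days = sorted(counts)
    return list(zip(days, accumulate(counts[d] for d in days)))


def share_created_before(created_dates, cutoff):
    """Fraction of observed profiles created strictly before `cutoff`."""
    ordered = sorted(created_dates)
    if not ordered:
        return 0.0
    # bisect_left gives the number of dates strictly less than cutoff
    return bisect_left(ordered, cutoff) / len(ordered)
```

Plotting `cumulative_user_base` gives the growth curve, and `share_created_before(dates, date(2009, 2, 1))` gives the early-adopter fraction.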
The numbers in this survey are inexact but representative, taken from research I’ve been doing for SocialQuant and FailWatch. There is some survivor bias built in, since I’m pruning spam and suspended accounts. Only Twitter knows the true state of the user base and the social graph, of course.
The initial Twitter users tend to know each other more in real life, since much of the social network grew from friends of founders, SXSW attendees, and the San Francisco / Silicon Valley tech community. The more recent (post-Obama) arrivals tend not to have connections to those networks, and often don’t know anyone else to follow. They arrive via mass media and celebrity campaigns, and end up following mass media and celebrities, either from the suggested users list or because those are the only people they know of.
If you look carefully, you can see the rate of increase slows down toward the end of the graph. There was a huge ramp in new user signups around the time of the Oprah show, which has receded somewhat. This has led to blog posts about Twitter’s impending demise, but looking back, there have been previous surges in the user base (typically around SXSW etc) which led to a peak, then a drop in new user signups to an off-peak but higher-than-before average. So far the current surge is the largest, but seems to be following the pattern. In the absence of any new driver, user growth should continue at an off-peak but higher level, until the next big jump, or something better comes along.
These are my links for June 6th through June 8th:
- Latin motto generator: make your own catchy slogans! – Create your own life mottos and slogans in Latin! (Learning Latin not required, some vague idea for a desired motto a plus)
- A Map Of Social (Network) Dominance – Using Alexa and Google Trend data, Cosenza color-coded the map based on which social network is the most popular in each country. All of the light green countries belong to Facebook. But there are still pockets of resistance in Russia (where V Kontakte rules), China (QQ), Brazil and India (Orkut), Central America, Peru, Mongolia, and Thailand (hi5), South Korea (Cyworld), Japan (Mixi), the Middle East (Maktoob), and the Philippines (Friendster).
- Microsoft Releases Bing API – With No Usage Quotas – Updated search API, with no quotas and some improvements.
* Developers can now request data in JSON and XML formats. The SOAP interface that the Live Search API required has also been retained.
* Requested data can be narrowed to one of the following source types: web, news, images, phonebook, spell-checker, related queries, and Encarta instant answer.
* It is now possible to send requests in OpenSearch-compliant RSS format for web, news, image and phonebook queries.
* Client applications will be able to combine any number of different data source types into a single request with a single query string.
- Twitter Limits Getting Ridiculous! « Verwon’s Blog – Anecdotal reports of Twitter users running into problems with rate limiting, either API or max posts/tweets/follows/directs.
These are my links for June 3rd through June 4th:
These are my links for June 1st through June 2nd:
- New Twitter Research: Men Follow Men and Nobody Tweets – Conversation Starter – HarvardBusiness.org – "Although men and women follow a similar number of Twitter users, men have 15% more followers than women. Men also have more reciprocated relationships, in which two users follow each other. This "follower split" suggests that women are driven less by followers than men, or have more stringent thresholds for reciprocating relationships. This is intriguing, especially given that females hold a slight majority on Twitter: we found that men comprise 45% of Twitter users, while women represent 55%."
- Shirky: Power Laws, Weblogs, and Inequality – 2003 article on popularity / traffic on blogs, which was then the latest emerging social media format. "Once a power law distribution exists, it can take on a certain amount of homeostasis, the tendency of a system to retain its form even against external pressures. Is the weblog world such a system? Are there people who are as talented or deserving as the current stars, but who are not getting anything like the traffic? Doubtless. Will this problem get worse in the future? Yes. "
- well-formed.eigenfactor.org : Visualizing information flow in science – Some nice visualization ideas using hierarchical clustering to explore patterns in citation networks.
- Bing API, Version 2.0 – Updated API documentation for Microsoft Bing (formerly Live Search) web services.
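To make the Shirky point about power laws concrete, here is a small sketch of how concentrated such a distribution is, assuming the k-th ranked blog gets traffic proportional to 1/k^s with a hypothetical exponent s = 1 (a Zipf-style distribution; real exponents vary):

```python
def top_share(n_sites, top_k, exponent=1.0):
    """Fraction of total traffic captured by the top_k of n_sites,
    assuming traffic at rank r is proportional to 1 / r**exponent."""
    weights = [1.0 / rank ** exponent for rank in range(1, n_sites + 1)]
    return sum(weights[:top_k]) / sum(weights)
```

With 1,000 sites and s = 1, `top_share(1000, 10)` is roughly 0.39: the top 1% of sites get about 39% of the traffic.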
These are my links for May 8th through May 12th:
These are my links for May 5th through May 6th:
- Coding Horror: I Just Logged In As You: How It Happened – On good password management, why forums should mostly not be storing user passwords in general, and how re-use of passwords on multiple sites can lead to vulnerability on other sites.
- Arc Forum | Arc – Arc is a version of Lisp. Among other things it is used to implement Hacker News.
- John Graham-Cumming: Can you trust Paul Graham with your password? – On best practices for storing password hashes to resist attacks on compromised password files, including the use of rainbow tables, in a look at the Hacker News password implementation.
- Deliberate Ambiguity: How *not* to rate a search engine – Search engines have very simple user interfaces, but are used in many different contexts, most of which don't resemble the way people often try out a new search engine.
- The Slow Erosion of Google Search – Bokardo – On changes in internet user behaviors over time, more social media (ask your Twitter friends) vs directed search (send a keyword query) etc.
- Brynn Marie Evans » Why social search won’t topple Google (anytime soon) – On differences between searching through social media such as Twitter, Facebook etc, vs Google etc.
- The Financial Services Club’s Blog: Stock picking with real-time news – Looking at real time social media trends for trading ideas.
- Lisp’s reputation is so bad that many people don’t even take a look at Lisp | International Lisp Conference 2009 – I haven't touched Lisp in years, except maybe for configuring emacs. A list of possible reasons why Lisp is not more widely used, e.g. "Lisp is old and moldy. It must be primitive by today's standards.", "The exciting languages to learn now are Python, Ruby, Groovy, etc."
- Peering into North Korea – The Big Picture – Boston.com – A collection of recent photos of scenes from North Korea.
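For the two password storage links above, a minimal sketch of salted, deliberately slow password hashing using Python’s standard `hashlib.pbkdf2_hmac`. A random per-user salt means a precomputed rainbow table can’t be reused across accounts, and the iteration count slows down brute force on a leaked password file. PBKDF2 and the 200,000 iteration count are assumptions for illustration; this is not how Hacker News or any other site in these links actually stores passwords.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; store it with the hash so it can be raised later


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return (salt, digest). A fresh 16-byte random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected, iterations=ITERATIONS):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

Only the salt, digest, and iteration count are stored; the plaintext password never is.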
These are my links for May 4th through May 5th:
- Influential Nodes in a Diffusion Model for Social Networks (icalp05-inf.pdf) – Kempe, Kleinberg, Tardos. Algorithm for a greedy approximation of the most influential nodes in a social network (at least 63% of optimal) under various conditions.
- Maximizing the Spread of Influence through a Social Network (kdd03-inf.pdf) – Kempe, Kleinberg, Tardos. Selecting the most influential nodes to maximize propagation is NP-hard, but a greedy approximation can work well (at least 63% of optimal) under various conditions.
- Notification Strategies for Social Networks – Discussion on approaches to maximizing use of a limited number of notifications within social networks e.g. Facebook
- James Smith • loopj.com » Blog Archive » jQuery Plugin: Tokenizing Autocomplete Text Entry – Looks handy – "This is a jQuery plugin to allow users to select multiple items from a predefined list, using autocompletion as they type to find each item. You may have seen a similar type of text entry when filling in the recipients field sending messages on facebook."
- Google Code FAQ – Using cURL to interact with Google data services – Step by step tutorial on using curl with Google data APIs.
- Behind The Business Plan Of Pirates Inc. : NPR – It takes around $250K to fund a Somali pirate operation. About 20 percent goes to pay off officials who look the other way. About 50 percent is for expenses and payroll. The leader of an attack makes $10,000 to $20,000 (the average Somali family lives on $500 a year). The initial investor — who put in $250,000 of seed capital — gets 30 percent, sometimes up to $500,000.
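The greedy idea from the two Kempe, Kleinberg, Tardos papers can be sketched as: repeatedly add the node that most increases the expected number of activated nodes. The 63% figure is the 1 - 1/e ≈ 0.632 bound for greedy maximization of a monotone submodular function. This sketch assumes the independent cascade model with a single uniform activation probability `p` and estimates spread by Monte Carlo simulation, so its guarantee is only approximate; the graph format (a dict of adjacency lists) and the function names are hypothetical.

```python
import random


def simulate_cascade(graph, seeds, p, rng):
    """One independent-cascade run: each newly activated node gets a single
    chance to activate each inactive neighbor with probability p.
    Returns the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)


def expected_spread(graph, seeds, p, runs, rng):
    """Monte Carlo estimate of the expected cascade size."""
    return sum(simulate_cascade(graph, seeds, p, rng) for _ in range(runs)) / runs


def greedy_seeds(graph, k, p=0.1, runs=200, seed=0):
    """Pick k seed nodes, each time adding the node with the largest estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for n in sorted(nodes - set(chosen)):
            spread = expected_spread(graph, chosen + [n], p, runs, rng)
            if spread > best_spread:
                best, best_spread = n, spread
        chosen.append(best)
    return chosen
```

Each greedy step costs one batch of simulations per candidate node, which is why later work focused on making this loop cheaper.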
These are my links for May 3rd through May 4th:
- Dilbert comic strip for 05/04/2009 from the official Dilbert comic strips archive. – Secretary to Pointy Haired Boss: "I live in a rented trailer and all of my money is in my checking account. Your investments are worthless and your mortgage is underwater. My net worth is higher than yours now. I guess promiscuity and a G.E.D. was a pretty good strategy after all." It reminded me of a thought I had earlier this year: much of Western civilization is built on valuing delayed gratification, which recently hasn't worked out as well as immediate consumption in many cases.
- Without Warning, Twitter Kills StatTweets (Businesses Beware) – StatSheet.com ChangeLog – The owner of StatTweets posts about his network of sports-related Twitter handles being banned. They had several hundred accounts, one for each team's stats. This makes sense for users, given the way Twitter works, but Twitter doesn't like mass account creation. I'm interested to see how this sorts out; there seem to be at least a few similar Twitter networks with team/region/topic-specific handles.
- Dooley Online: What URL Shortener Should I Use? – Comparison of features and some usage data for URL shorteners such as tinyurl and bit.ly used on twitter and other services.
- Obesity and Overweight: Trends: U.S. Obesity Trends 1985-2007 | DNPAO | CDC – During the past 20 years there has been a dramatic increase in obesity in the United States. This slide set illustrates this trend by mapping the increased prevalence of obesity across each of the states. In 2007, only one state (Colorado) had a prevalence of obesity less than 20%. Thirty states had a prevalence equal to or greater than 25%; three of these states (Alabama, Mississippi and Tennessee) had a prevalence of obesity equal to or greater than 30%. The animated map below shows the United States obesity prevalence from 1985 through 2007.
- Why text messages are limited to 160 characters | Technology | Los Angeles Times – A look back to the beginnings of SMS in 1985 – Would the 160-character maximum be enough space to prove a useful form of communication? Having zero market research, they based their initial assumptions on two "convincing arguments," Hillebrand said. For one, they found that postcards often contained fewer than 150 characters. Second, they analyzed a set of messages sent through Telex, a then-prevalent telegraphy network for business professionals. Despite not having a technical limitation, Hillebrand said, Telex transmissions were usually about the same length as postcards.
These are my links for April 28th from 05:35 to 14:24:
- Official Google Blog: Adding search power to public data – Interesting. Wonder if the underlying public data sets will eventually become available on Google App Engine as well, sort of like the public data sets available for use with Amazon EC2 applications.
- MySQL And Search At Craigslist – Jeremy Zawodny's slides on MySQL, Sphinx, and free text search implementation at Craigslist, from last week's MySQL conference.
- Skew, The Frontend Engineer’s Misery @ Irrational Exuberance – For mashups and the like, the distinction between a FE engineer and web dev is rather small in terms of technical skills; they are both using the same skillset, they are both interacting with APIs, and so on. However, there are important distinctions between the two: 1. web developers tend to move in small groups or as individuals, whereas fe engineers work in larger groups, 2. web developers tend to design a product on top of an existing backend service (api, etc), while fe engineers are usually working in parallel with the backend being developed.
- Study: Twitter Audience Does Not Have A Return Policy – Over 60 percent of people who sign up to use the popular (and tremendously discussed) micro-blogging platform do not return to using it the following month, according to new data released by Nielsen Online. In other words, Twitter currently has just a 40 percent retention rate, up from just 30 percent in previous months–indicating an “I don’t get it factor” among new users that is reminiscent of the similarly-over hyped Second Life from a few years ago.
- Hey Americans, Appreciate Your Freedom Of Speech : NPR – Firoozeh Dumas on the underappreciated freedoms of speech and expression we have in the US vs journalists and bloggers in Iran.
These are my links for April 12th from 17:02 to 19:13:
These are my links for April 11th through April 12th:
- Wordle – Beautiful Word Clouds – Wordle is a toy for generating “word clouds” from text that you provide. The clouds give greater prominence to words that appear more frequently in the source text. You can tweak your clouds with different fonts, layouts, and color schemes.
- The dark side of Dubai – Johann Hari, Commentators – The Independent – "Dubai was meant to be a Middle-Eastern Shangri-La, a glittering monument to Arab enterprise and western capitalism. But as hard times arrive in the city state that rose from the desert sands, an uglier story is emerging."
- Topless Robot – Hot Girls Have Lightsaber Strip-Fight for Your Viewing Pleasure – Star Wars CGI meets fake body spray ad
- Poll Result: Best VPN to leap China’s Great Firewall? – Thomas Crampton – Witopia: undisputed winner for quality of service and speed of surfing, though it is said to be relatively expensive at US$50 to US$60 per year. Hotspot Shield: bandwidth limits can be painful, forcing you to wait until the next month if you use it too much. Also mentioned: Ultrasurf, StrongVPN.
- InfoQ: Facebook: Science and the Social Graph – In this presentation filmed during QCon SF 2008 (November 2008), Aditya Agarwal discusses Facebook’s architecture, more exactly the software stack used, presenting the advantages and disadvantages of its major components: LAMP (PHP, MySQL), Memcache, Thrift, Scribe.
- The Running Man, Revisited § SEEDMAGAZINE.COM – a handful of scientists think that these ultra-marathoners are using their bodies just as our hominid forbears once did, a theory known as the endurance running hypothesis (ER). ER proponents believe that being able to run for extended lengths of time is an adapted trait, most likely for obtaining food, and was the catalyst that forced Homo erectus to evolve from its apelike ancestors.
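Wordle’s own sizing and layout rules aren’t described in the link above, but the basic idea of giving frequent words more prominence can be sketched by counting words and scaling a font size linearly between a minimum and a maximum. The stopword list, size range, and linear scaling are all assumptions for illustration:

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def word_sizes(text, top_n=20, min_size=10, max_size=48):
    """Map the top_n most frequent non-stopwords to font sizes, scaled
    linearly from min_size (least frequent) to max_size (most frequent)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    common = counts.most_common(top_n)
    if not common:
        return {}
    hi, lo = common[0][1], common[-1][1]
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {
        w: round(min_size + (c - lo) * (max_size - min_size) / span)
        for w, c in common
    }
```

For example, `word_sizes("cloud cloud cloud word word the tag")` sizes "cloud" at 48, "word" at 29, and "tag" at 10, and drops "the" as a stopword.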
These are my links for April 3rd through April 7th:
- Agile Testing: Experiences deploying a large-scale infrastructure in Amazon EC2 – Practical guidance on using cloud computing at EC2. Expect failures, automate deployment, more.
- joshua’s blog: on url shorteners – Joshua Schachter (founder of del.icio.us) summary on the state of URL shorteners (tinyurl, bit.ly, etc), and issues with 3rd party redirects, link sharing through twitter, etc.
- Control Yourself » status.net coming soon – On status.net, plans for hosting laconi.ca sites, and federating microblogging status networks
- There must be some way out of here (Scripting News) – Comments on the rise of celebrity accounts on Twitter, increasing spam/noise, and alternative models for laconi.ca and status.net
- Stochastic Models of User-Contributory Web Sites – Tad Hogg, Kristina Lerman 31 Mar 2009 Abstract: We describe a general stochastic processes-based approach to modeling user-contributory web sites, where users create, rate and share content. These models describe aggregate measures of activity and how they arise from simple models of individual users. This approach provides a tractable method to understand user activity on the web site and how this activity depends on web site design choices, especially the choice of what information about other users' behaviors is shown to each user. We illustrate this modeling approach in the context of user-created content on the news rating site Digg.
These are my links for March 16th through April 2nd:
- Google uncloaks once-secret server | Business Tech – CNET News – Photo and more comments on the Google data center server configuration, 12V DC only, local battery, shown at yesterday's data center power conference.
- Google’s Custom Web Server, Revealed « Data Center Knowledge – 1:30 video of current server configuration, from Google Data Center Energy Summit, April 1, 2009. Open shelf, power supply with built in battery (per-unit UPS) rather than centralized UPS.
- HerHotSpot Uses Facebook Connect to Block Boys Out – Relies on Facebook profile data to limit boys' access to a site targeting girls only. Uses FBConnect as the exclusive login method.
- SandHill.com | Opinion : Cloud Computing Ecosystem Map v1.0: Standing on the Shoulders of Giants – Collection of pointers to maps of the cloud computing ecosystem, and a merged map, as of March 2009
- Penny Arcade! – Le Twittre
These are my links for March 12th through March 16th:
Didn’t attend ETech this week, but thanks to a Twitter pointer from Gene Becker, I did take a few breaks to participate in a collaborative future forecasting experiment at the event, organized by Institute For the Future / Signtific Labs. The general idea is to enlist game players to offer Twitter-like short notes with outlier ideas regarding a scenario under discussion, in this case the consequences of inexpensive ($100) 1kg microsatellites (“CubeSats”) capable of high speed networking and remote sensing. The same game framework could be used for any scenario, though. Bonus points are awarded to “Super-Interesting” ideas and ideas that result in additional discussion, which helped me out on the scoreboard.
Gene (“ubik”) won a “Feynman” award on the first day, and I managed to end up with a high score at ETech, thus winning a lab coat to go with my “Genius” label.
Some of my favorite future forecast contributions from “What will you do when space is as cheap and accessible as the Web is today?” (slide summary here):
Jurisdiction-free data haven built with CSats full of rad-hard flash memory, with HBase-style distributed replication across multiple nodes. Subpoena-proof anonymizers, for better or worse. An alternative, universal internet currency evolves, outside any government’s central bank control. Following forced disclosure of its banking client list, the Swiss government recognizes anonymous CSat net IDs, followed by the Caymans, Bermuda, etc.
CSats deorbited in vacant areas of oceans as impulse input to passive sonar imaging. Oceanographers get great maps, submarines lose stealth. Depending on how accurately you can drop a CSat, you can effectively “ping” a region and listen to the return signal through existing arrays. This really messes with strategic deterrence since now subs are vulnerable to first strike. But CSat deorbit is cheap WMD for all. On the positive side, detailed acoustic propagation data leads to new insights on ocean dynamics – bathymetrics, thermoclines, currents, etc. A similar version of dropping CSats on land might yield useful seismic imaging. But these would all be surface impulse, not at depth.
CSat data networks circumvent the Great Firewall of China and other government access controls, leading to broader/safer citizen engagement online.
CSat operating interface is marketed as a toy, like Tamagotchi. Recharge, collect interesting data, avoid mean CSats, team with friends. Organizations might post cash prizes/rewards for things like locating missing ships, oil/trash dumping at sea, smokestack emissions, etc.
Commodity traders are early adopters of CSat operator networks. Looking for crop yield data, mine production volumes, freight shipments etc. Among other things, CSat observations could give a more accurate estimate of “floating” oil parked in tankers as well as ongoing demand. Similarly, you’d get a decent idea of iron ore production by watching BHP’s railway in Australia, and the demand side in China, Korea etc. CSat data could improve the market visibility into supply/demand. But one might start creating Potemkin mining/farming operations etc… Sadly, credit derivative risk is not observable via CSat.
Ubiquitous, near real time satellite surveillance. No more privacy outdoors. But really good Google Maps. Ultra high resolution terrain maps of the world synthesized from multiple satellite passes/viewing aspects. Long term studies of effects of erosion, farming, development, earthquakes, flooding, drought, etc. Insurgents, militias, and terrorists get real time tactical data feeds, make use of homebrew UAVs, sensors, and in-field dispatch from afar. Turf wars among poppy and marijuana growers who now know where each other’s fields are. All vehicles – car, truck, rail, container, airplanes, etc – get a sky-facing ID plate. Maybe these should just be really big QR codes with an authoritative registry to foil car thieves from painting on bogus “plates”.
Now I need to figure out how to collect that lab coat.
These are my links for March 9th through March 12th:
- Google Friend Connect APIs – Google Code
- Geek And Poke – Mostly twitter and cloud computing themed cartoons.
- Official Google Blog: Here comes Google Voice – GrandCentral makes a comeback, after disappearing into Google a while back. Now with voice transcription, SMS folders, and integration with GMail address book.
- Amazon Web Services Blog: Announcing Amazon EC2 Reserved Instances – AWS introduces pricing structure for longer term, reserved capacity. Upfront payment, plus a (lower) incremental hourly charge, net savings for continuous 24×7 clients, and guaranteed availability of instances for backup or surge capacity.
- How To Monetize a Social Network: MySpace and Facebook Should Follow TenCent « abovethecrowd.com – Bill Gurley on the case for virtual goods and casual gaming as revenue vehicles on US-based social networking sites, in a look at China-based QQ / TenCent.
- Too Big Has Failed – Thomas Hoenig, Kansas City Federal Reserve Bank, March 6, 2009 (PDF) – Hoenig argues that too-big-to-fail institutions have failed and that US banks will eventually require some form of nationalization.
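The EC2 Reserved Instances trade-off above comes down to a break-even point: the upfront payment divided by the hourly saving over on-demand pricing. A minimal sketch with made-up prices (not actual AWS rates):

```python
def break_even_hours(upfront, on_demand_hourly, reserved_hourly):
    """Hours of usage after which a reservation costs less than on-demand."""
    saving_per_hour = on_demand_hourly - reserved_hourly
    if saving_per_hour <= 0:
        raise ValueError("reserved hourly rate must be below the on-demand rate")
    return upfront / saving_per_hour


def total_cost(hours, upfront=0.0, hourly=0.0):
    """Total cost of running one instance for `hours`."""
    return upfront + hours * hourly
```

With a hypothetical $500 upfront fee, $0.25/hour on demand, and $0.125/hour reserved, the reservation pays for itself after 4,000 hours, well under the 8,760 hours in a year of continuous 24×7 use; over that year it costs $1,595 versus $2,190 on demand.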
These are my links for March 6th through March 8th:
- Wolfram Blog : Wolfram|Alpha Is Coming!
- Wolfram Alpha is Coming — and It Could be as Important as Google | Twine
- Wolfram Alpha — it’s like plugging into an electronic brain » VentureBeat
- If browsers were women – Sharenator.org – "[Chrome] Extremely skinny, but very cool and friendly. However, when it comes to the bedroom, she is very inexperienced and has little to offer. [IE] For most, she's the first woman they tried. She's really easy but can get you infected." etc etc
- Rough Type: Nicholas Carr’s Blog: The coming of the megacomputer – Nick Carr commentary on Rick Rashid's statement that 20% of servers were going to major cloud data centers. Also some interesting discussion in comments.
- FT.com | Tech Blog | How many computers does the world need? – According to Microsoft research chief Rick Rashid, around 20 per cent of all the servers sold around the world each year are now being bought by a small handful of internet companies – he named Microsoft, Google, Yahoo and Amazon.
- The New Hot Cuisine: Korean – WSJ.com – Korean food is slowly making its way into mainstream awareness, both high end (French Laundry, Le Bernardin) and everyday (CPK, Kogi BBQ).
- WriteOnIt – Fake pictures – Build fake magazine covers, newspapers, and photos.