Bookmarks for December 31st through January 17th

These are my links for December 31st through January 17th:

  • Khan Academy – The Khan Academy is a not-for-profit organization with the mission of providing a high quality education to anyone, anywhere.

    We have 1000+ videos on YouTube covering everything from basic arithmetic and algebra to differential equations, physics, chemistry, biology and finance which have been recorded by Salman Khan.

  • StarCraft AI Competition | Expressive Intelligence Studio – AI bot warfare competition using a hacked API to run StarCraft, to be held at AIIDE2010 in October 2010.
    The competition will use StarCraft Brood War 1.16.1. Bots for StarCraft can be developed using the Broodwar API, which provides hooks into StarCraft and enables the development of custom AI for StarCraft. A C++ interface enables developers to query the current state of the game and issue orders to units. An introduction to the Broodwar API is available here. Instructions for building a bot that communicates with a remote process are available here. There is also a Forum. We encourage submission of bots that make use of advanced AI techniques. Some ideas are:
    * Planning
    * Data Mining
    * Machine Learning
    * Case-Based Reasoning
  • Measuring Measures: Learning About Statistical Learning – A "quick start guide" for statistical and machine learning systems, good collection of references.
  • Berkowitz et al : The use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems (2006) – Berkowitz, Steven D., Woodward, Lloyd H., & Woodward, Caitlin. (2006). Use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems. Originally intended for publication in updating the 1988 volume, eds., Wellman and Berkowitz, Social Structures: A Network Approach (Cambridge University Press). Steve died in November, 2003. See Barry Wellman’s “Steve Berkowitz: A Network Pioneer has passed away,” in Connections 25(2), 2003. It has not been possible to add the updating of references or of the quality of graphics that might have been possible if Berkowitz were alive. An early version of the article appeared in the Proceedings of the Session on Combating Terrorist Networks: Current Research in Social Network Analysis for the New War Fighting Environment. 8th International Command and Control Research and Technology Symposium. National Defense University, Washington, D.C., June 17-19, 2003.
  • SSH Tunneling through web filters – Step-by-step tutorial on using PuTTY and an EC2 instance to set up a private web proxy on demand.
  • PyDroid GUI automation toolkit – GitHub – What is Pydroid?

    Pydroid is a simple toolkit for automating and scripting repetitive tasks, especially those involving a GUI, with Python. It includes functions for controlling the mouse and keyboard, finding colors and bitmaps on-screen, as well as displaying cross-platform alerts.
    Why use Pydroid?

    * Testing a GUI application for bugs and edge cases
        o You might think your app is stable, but what happens if you press that button 5000 times?
    * Automating games
        o Writing a script to beat that crappy flash game can be so much more gratifying than spending hours playing it yourself.
    * Freaking out friends and family
        o Well maybe this isn't really a practical use, but…

  • Time Series Data Library – More data sets – "This is a collection of about 800 time series drawn from many different fields. Agriculture, Chemistry, Crime, Demography, Ecology, Finance, Health, Hydrology, Industry, Labour Market, Macro-Economics, Meteorology, Micro-Economics, Miscellaneous, Physics, Production, Sales, Simulated series, Sport, Transport & Tourism, Tree-rings, Utilities"
  • How informative is Twitter? » SemanticHacker Blog – "We undertook a small study to characterize the different types of messages that can be found on Twitter. We downloaded a sample of tweets over a two-week period using the Twitter streaming API. This resulted in a corpus of 8.9 million messages (“tweets”) posted by 2.6 million unique users. About 2.7 million of these tweets, or 31%, were replies to a tweet posted by another user, while half a million (6%) were retweets. Almost 2 million (22%) of the messages contained a URL."
  • Gremlin – a Turing-complete, graph-based programming language – GitHub – Gremlin is a Turing-complete, graph-based programming language developed in Java 1.6+ for key/value-pair multi-relational graphs known as property graphs. Gremlin makes extensive use of the XPath 1.0 language to support complex graph traversals. This language has applications in the areas of graph query, analysis, and manipulation. Connectors exist for the following data management systems:

    * TinkerGraph in-memory graph
    * Neo4j graph database
    * Sesame 2.0 compliant RDF stores
    * MongoDB document database

    The documentation for Gremlin can be found at this location. Finally, please visit TinkerPop for other software products.

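  • The C Programming Language: 4.10 – by Kernighan & Ritchie & Lovecraft –

    void swap(int mene[], int i, int j);  /* exchanges mene[i] and mene[j]; defined elsewhere */

    /* Rlyeh: sort mene[wgah]..mene[nagl] into increasing order (quicksort) */
    void Rlyeh(int mene[], int wgah, int nagl) {
        int Ia, fhtagn;

        if (wgah >= nagl)                      /* fewer than two elements: nothing to do */
            return;
        swap(mene, wgah, (wgah + nagl) / 2);   /* move the partition element to mene[wgah] */
        fhtagn = wgah;
        for (Ia = wgah + 1; Ia <= nagl; Ia++)  /* partition around mene[wgah] */
            if (mene[Ia] < mene[wgah])
                swap(mene, ++fhtagn, Ia);
        swap(mene, wgah, fhtagn);              /* restore the partition element */
        Rlyeh(mene, wgah, fhtagn - 1);
        Rlyeh(mene, fhtagn + 1, nagl);
    }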


  • How to convert email addresses into name, age, ethnicity, sexual orientation – This is so Meta – "Save your email list as a CSV file (just comma separate those email addresses). Upload this file to your facebook account as if you wanted to add them as friends. Voila, facebook will give you all the profiles of all those users (in my test, about 80% of my email lists have facebook profiles). Now, click through each profile, and because of the new default facebook settings, which makes all information public, about 95% of the user info is available for you to harvest."
  • Microsoft Security Development Lifecycle (SDL): Tools Repository – A collection of previously internal-only security tools from Microsoft, including anti-xss, fuzz test, fxcop, threat modeling, binscope, now available for free download.
  • Analytics X Prize – Home – Forecast the murder rate in Philadelphia – The Analytics X Prize is an ongoing contest to apply analytics, modeling, and statistics to solve the social problems that affect our cities. It combines the fields of statistics, mathematics, and social science to understand the root causes of dysfunction in our neighborhoods. Understanding these relationships and discovering the most highly correlated variables allows us to deploy our limited resources more effectively and target the variables that will have the greatest positive impact on improvement.
  • PeteSearch: How to find user information from an email address – FindByEmail code released as open-source. You pass it an email address, and it queries 11 different public APIs to discover what information those services have on the user with that email address.
  • Measuring Measures: Beyond PageRank: Learning with Content and Networks – Conclusion: learning based on content and network data is the current state of the art. There is a great paper and talk about personalization in Google News: they use content for this purpose, and then user click streams to provide personalization, i.e. recommend specific articles within each topical cluster. The issue is content filtering is typically (as we say in research) "way harder." Suppose you have a social graph, a bunch of documents, and you know that some users in the social graph like some documents, and you want to recommend other documents that you think they will like. Using approaches based on Networks, you might consider clustering users based on co-visitation (they have co-liked some of the documents). This scales great, and it internationalizes great. If you start extracting features from the documents themselves, then what you build for English may not work as well for the Chinese market. In addition, there is far more data in the text than there is in the social graph.
  • mikemaccana’s python-docx at master – GitHub – MIT-licensed Python library to read/write Microsoft Word docx format files. "The docx module reads and writes Microsoft Office Word 2007 docx files. These are referred to as 'WordML', 'Office Open XML' and 'Open XML' by Microsoft. They can be opened in Microsoft Office 2007, Microsoft Mac Office 2008, 2.2, and Apple iWork 08. The module was created when I was looking for a Python support for MS Word .doc files, but could only find various hacks involving COM automation, calling .net or Java, or automating OpenOffice or MS Office."

Bookmarks for May 14th through May 15th

These are my links for May 14th through May 15th:

  • Congratulations, Google staff: $210k in profit per head in 2008 | Royal Pingdom – Google had $209,624 in profit per employee in 2008, which beats all the other large tech companies we looked at, including big hitters like Microsoft ($194K), Apple ($151K), Intel ($64K) and IBM ($30K).
  • Statistical Data Mining Tutorials – A nice collection of presentations reviewing topics in data mining and machine learning. e.g. "HillClimbing, Simulated Annealing and Genetic Algorithms. Some very useful algorithms, to be used only in case of emergency." These include classification algorithms such as decision trees, neural nets, Bayesian classifiers, Support Vector Machines and case-based (aka non-parametric) learning. They include regression algorithms such as multivariate polynomial regression, MARS, Locally Weighted Regression, GMDH and neural nets. And they include other data mining operations such as clustering (mixture models, k-means and hierarchical), Bayesian networks and Reinforcement Learning.
  • Dare Obasanjo aka Carnage4Life – Why Twitter’s Engineers Hate the @replies feature – Looking at the infrastructure overhead required for Twitter's attempted change to @reply behavior.
  • Scratch Helps Kids Get With the Program – Gadgetwise Blog – On my candidate list for 7th grade introductory programming and analysis. "Scratch, an M.I.T.-developed computer-programming language for children, is the focus of worldwide show-and-tell sessions this Saturday."
  • jLinq – Javascript Query Language – For manipulating data sets in Javascript, sort of like jQuery

Bookmarks for May 5th through May 6th

These are my links for May 5th through May 6th:

Bookmarks for May 4th through May 5th

These are my links for May 4th through May 5th:

Bookmarks for May 3rd through May 4th

These are my links for May 3rd through May 4th:

  • Dilbert comic strip for 05/04/2009 from the official Dilbert comic strips archive. – Secretary to Pointy Haired Boss: "I live in a rented trailer and all of my money is in my checking account. Your investments are worthless and your mortgage is underwater. My net worth is higher than yours now. I guess promiscuity and a G.E.D. was a pretty good strategy after all." Reminded me of a thought I had earlier this year, that much of Western Civilization is built on valuing delayed gratification, which hasn't worked out so well recently as opposed to immediate consumption in many cases.
  • Without Warning, Twitter Kills StatTweets (Businesses Beware) – ChangeLog – Post by the owner of StatTweets about his network of sports-related Twitter handles being banned. They had several hundred accounts, one carrying stats for each team. This makes sense for users, given the way Twitter works, but Twitter doesn't like mass account creation. Interested to see how this sorts out; there seem to be at least a few similar Twitter networks with team/region/topic-specific handles.
  • Dooley Online: What URL Shortener Should I Use? – Comparison of features and some usage data for URL shorteners such as tinyurl and used on twitter and other services.
  • Obesity and Overweight: Trends: U.S. Obesity Trends 1985-2007 | DNPAO | CDC – During the past 20 years there has been a dramatic increase in obesity in the United States. This slide set illustrates this trend by mapping the increased prevalence of obesity across each of the states. In 2007, only one state (Colorado) had a prevalence of obesity less than 20%. Thirty states had a prevalence equal to or greater than 25%; three of these states (Alabama, Mississippi and Tennessee) had a prevalence of obesity equal to or greater than 30%. The animated map below shows the United States obesity prevalence from 1985 through 2007.
  • Why text messages are limited to 160 characters | Technology | Los Angeles Times – A look back to the beginnings of SMS in 1985 – Would the 160-character maximum be enough space to prove a useful form of communication? Having zero market research, they based their initial assumptions on two "convincing arguments," Hillebrand said. For one, they found that postcards often contained fewer than 150 characters. Second, they analyzed a set of messages sent through Telex, a then-prevalent telegraphy network for business professionals. Despite not having a technical limitation, Hillebrand said, Telex transmissions were usually about the same length as postcards.

Bookmarks for April 13th through April 15th

These are my links for April 13th through April 15th:

Bookmarks for March 12th through March 16th

These are my links for March 12th through March 16th:

Bookmarks for March 9th through March 12th

These are my links for March 9th through March 12th:

Bookmarks for February 28th through March 1st

These are my links for February 28th through March 1st:

  • Community Data – Swivel – User contributed datasets, for visualization and graphs with Swivel
  • Obamameter – Map visualization of economic stimulus outlays. "Keep tabs on the US economy, the global economy and the stimulus through our dashboard for the economy."
  • – Slide presentation on data sources and construction of initial site in Jan 2009, from talk at Transparency Camp.
  • Virtual Hoff : DoxPara Research – Slides from Dan Kaminsky's talk at CloudCamp Seattle on network and application security issues in cloud and virtualized computing environments.
  • Can You Buy a Silicon Valley? Maybe. – from Paul Graham – "If you could get startups to stick to your town for a million apiece, then for a billion dollars you could bring in a thousand startups. That probably wouldn't push you past Silicon Valley itself, but it might get you second place. For the price of a football stadium, any town that was decent to live in could make itself one of the biggest startup hubs in the world."
  • Berkshire Hathaway 2008 shareholders letter (PDF) – Warren Buffett reviews the state of the financial markets, his worst year ever, and the outlook for 2009.
  • White House 2: Where YOU set the nation’s priorities – Not the actual White House, but an interesting experiment in collaborative input for setting government agenda.
  • Python for Lisp Programmers – Peter Norvig examines Python. "(Although it wasn't my intent, Python programers have told me this page has helped them learn Lisp.) Basically, Python can be seen as a dialect of Lisp with "traditional" syntax (what Lisp people call "infix" or "m-lisp" syntax). One message on comp.lang.python said "I never understood why LISP was a good idea until I started playing with python." Python supports all of Lisp's essential features except macros, and you don't miss macros all that much because it does have eval, and operator overloading, and regular expression parsing, so you can create custom languages that way. "

Bookmarks for February 26th through February 27th

These are my links for February 26th through February 27th:

Registered for SF MusicTech 2009

Took advantage of the discounted early registration ($49 through the end of February) for SF MusicTech, coming up on May 18th.

The SanFran MusicTech Summit will bring together the best and brightest developers in the Music/Technology Space, along with the musicians, entrepreneurial business people, press, investors, service providers, and organizations who work with them at the convergence of culture and commerce. We will meet to discuss the evolving music/business/technology ecosystem in a proactive, conducive to dealmaking environment.

Unfortunately, it overlaps with ICWSM09; I will try to make both, though.

Bookmarks for February 23rd through February 24th

These are my links for February 23rd through February 24th:

Bookmarks for February 21st from 13:59 to 21:55

These are my links for February 21st from 13:59 to 21:55:

Bookmarks for February 16th through February 17th

These are my links for February 16th through February 17th:

  • Top 100 Network Security Tools – Many many security testing and hacking tools.
  • FRONTLINE: inside the meltdown: watch the full program – "On Thursday, Sept. 18, 2008, the astonished leadership of the U.S. Congress was told in a private session by the chairman of the Federal Reserve that the American economy was in grave danger of a complete meltdown within a matter of days. "There was literally a pause in that room where the oxygen left," says Sen. Christopher Dodd"
  • The Dark Matter of a Startup – "Every successful startup that I have seen has someone within their ranks that just kinda “does stuff.” No one really knows specifically what they do, but its vital to the success of the startup."
  • Why I Hate Frameworks – "A hammer?" he asks. "Nobody really buys hammers anymore. They're kind of old fashioned…we started selling schematic diagrams for hammer factories, enabling our clients to build their own hammer factories, custom engineered to manufacture only the kinds of hammers that they would actually need."
  • Mining The Thought Stream – Lots of comments around what is Twitter good for and how will it make money, revolving around real/near-time search, analytics, marketing, etc.
  • Understanding Web Operations Culture – the Graph & Data Obsession … – Comparison of traffic at Flickr, Google, Twitter, during the Obama inauguration. "One of the most interesting parts of running a large website is watching the effects of unrelated events affecting user traffic in aggregate."

Bookmarks for February 14th through February 15th

These are my links for February 14th through February 15th:

How to make a small fortune

…start with a large one.

This weekend’s news that JP Morgan will take over Bear Stearns for around $2/share is astonishing. Last Friday BSC closed at around $30. A week ago it was around $60. A year ago it was around $150/share. So in a year the shares are down by 99%, and even the bargain shoppers who got in on Friday are down by something like 85%, based on today’s close.
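
For anyone who wants to check the percentages, here is a rough sketch in Python. The share prices are the approximate ones quoted above; the function and variable names are mine, and `implied_close` is only a back-of-the-envelope figure worked out from the "something like 85%" estimate, not an actual quote.

```python
def percent_decline(start_price: float, end_price: float) -> float:
    """Return the percentage drop from start_price to end_price."""
    return (start_price - end_price) / start_price * 100.0


# Approximate prices from the post, in dollars per share.
takeover_price = 2.0
history = [("a year ago", 150.0), ("a week ago", 60.0), ("last Friday", 30.0)]

for label, price in history:
    drop = percent_decline(price, takeover_price)
    print(f"From {label} (${price:.0f}) to the $2 offer: down {drop:.1f}%")

# The ~85% figure is measured against the day's market close, which the post
# does not state. Working backwards from Friday's $30 gives a rough estimate.
implied_close = 30.0 * (1 - 0.85)
print(f"Close implied by an 85% drop from $30: about ${implied_close:.2f}")
```

The gap between the offer price and the implied close is why the Friday buyers are "only" down about 85% rather than the 93% the $2 offer would suggest.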


Global Markets Daily Trading Schedule


Global financial markets are linked more closely than ever. Here’s a crib sheet of a few markets of interest and their opening/closing times in Pacific Time (US West Coast).

PST      EST      Market
12:00am  3:00am   London stock exchange open (8:00am local)
                  Frankfurt stock exchange open (9:00am local)
                  Hong Kong stock exchange afternoon session close (4:00pm local)
1:00am   4:00am   Singapore stock exchange afternoon session close (5:00pm local)
3:30am   6:30am   Bombay stock exchange close (3:30pm local)
5:00am   8:00am   US ECN premarket open
6:30am   9:30am   NYSE, NASDAQ, AMEX, TSE markets open
8:30am   11:30am  London stock exchange close (4:30pm local)
                  Frankfurt stock exchange close (5:30pm local)
1:00pm   4:00pm   NYSE, NASDAQ, AMEX, TSE market close, US afterhours ECN trading open
1:15pm   4:15pm   CME close (US Globex electronic futures daily close)
1:30pm   4:30pm   CME open (US Globex electronic futures open)
3:00pm   6:00pm   CME Sunday/Holiday open (US Globex electronic futures weekly open)
4:00pm   7:00pm   Tokyo stock exchange morning session open (9:00am local)
                  Korean stock exchange open (9:00am local)
                  Australian stock exchange open (10:00am local)
5:00pm   8:00pm   Singapore stock exchange morning session open (9:00am local)
                  Taiwan stock exchange open (9:00am local)
                  US afterhours ECN trading close
5:30pm   8:30pm   Shanghai stock exchange morning session open (9:30am local)
6:00pm   9:00pm   Hong Kong stock exchange morning session open (10:00am local)
                  Tokyo stock exchange morning session close (11:00am local)
7:30pm   10:30pm  Tokyo stock exchange afternoon session open (12:30pm local)
                  Shanghai stock exchange morning session close (11:30am local)
8:30pm   11:30pm  Hong Kong stock exchange morning session close (12:30pm local)
                  Singapore stock exchange morning session close (12:30pm local)
9:00pm   12:00am  Shanghai stock exchange afternoon session open (1:00pm local)
9:25pm   12:25am  Bombay stock exchange open (9:55am local)
10:00pm  1:00am   Tokyo stock exchange afternoon session close (3:00pm local)
                  Australian stock exchange close (4:00pm local)
                  Singapore stock exchange afternoon session open (2:00pm local)
10:15pm  1:15am   Korean stock exchange close (3:15pm local)
10:30pm  1:30am   Hong Kong stock exchange afternoon session open (2:30pm local)
                  Taiwan stock exchange close (1:30pm local)
11:00pm  2:00am   Shanghai stock exchange afternoon session close (3:00pm local)


There are often interesting interactions at major market opens and closes, especially during the overlap between the US market open and the European market close. In addition, US index futures, particularly the ES (S&P 500 e-mini futures), trade nearly around the clock, closing only for daily settlement between 4:15 and 4:30pm US East Coast time (plus weekends and holidays).

Note that many Asian markets have morning and afternoon sessions, and close for lunch. Different countries also observe differing practices with respect to Daylight Saving Time, so the relative timing may change seasonally. You may find it useful to check with a World Clock for the current times. Also remember that Asia begins its week on Sunday evening in the US, and is closed for the week by the time it’s Friday in the US.
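
If you would rather compute the conversions than read them off a crib sheet, here is a minimal Python sketch. It assumes fixed standard-time UTC offsets (the `UTC_OFFSETS` table and the `convert` helper are my own illustration), so it deliberately ignores the seasonal shifts mentioned above; for those, a time zone database such as the one behind Python's `zoneinfo` module is the safer route.

```python
from datetime import datetime, timedelta, timezone

# Assumed standard-time offsets from UTC, in hours. Daylight saving is NOT
# handled here, so results can be off by an hour for part of the year.
UTC_OFFSETS = {
    "London": 0,
    "Frankfurt": 1,
    "Hong Kong": 8,
    "Tokyo": 9,
    "US Pacific": -8,
    "US Eastern": -5,
}


def convert(local_time: str, from_place: str, to_place: str) -> str:
    """Convert an 'HH:MM' wall-clock time between two places (fixed offsets)."""
    src = timezone(timedelta(hours=UTC_OFFSETS[from_place]))
    dst = timezone(timedelta(hours=UTC_OFFSETS[to_place]))
    hour, minute = map(int, local_time.split(":"))
    # The calendar date is arbitrary; only the time of day is reported.
    moment = datetime(2008, 1, 15, hour, minute, tzinfo=src)
    return moment.astimezone(dst).strftime("%H:%M")


print("London open, Pacific:", convert("08:00", "London", "US Pacific"))
print("London open, Eastern:", convert("08:00", "London", "US Eastern"))
print("Tokyo open, Pacific: ", convert("09:00", "Tokyo", "US Pacific"))
```

Note that the Tokyo result lands on the previous calendar day in the US, which is the "Asia begins its week on Sunday evening" effect.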

Jerome Kerviel’s not-so-excellent adventure in the futures market


A cautionary tale gets added to market lore. This is going to make a good movie at some point…

In one of the banking world’s most unsettling recent disclosures, France’s Société Générale SA said Mr. Kerviel had cost the bank €4.9 billion, equal to $7.2 billion, by making huge unauthorized trades that he hid for months by hacking into computers. The combined trading positions he built up over recent months, say people close to the situation, totaled some €50 billion, or $73 billion.

Mr. Kerviel is no trading legend who let a transaction get out of hand. He was a low-level trader in the bank’s “Delta One” desk in western Paris, earning about €100,000 ($145,000) a year. His job was to make bets on how large European stock indexes would move, according to bank officials. His expertise was trading baskets of stocks such as the Euro Stoxx 50.

At $7.2 billion, this loss is larger than the estimated 2006 GDP of 65 of the 183 countries tracked by the World Bank. It’s just about the entire output of Cambodia ($7.193 billion), and greater than the combined output of Seychelles, Liberia, Grenada, Gambia, Saint Kitts and Nevis, Saint Vincent and the Grenadines, Samoa, Comoros, Vanuatu, East Timor, Solomon Islands, Guinea-Bissau, Dominica, Micronesia, Tonga, Palau, Marshall Islands, São Tomé and Príncipe, and Kiribati ($6.846 billion).

I’ve noticed that the news coverage keeps reporting “fraud”, which is apparently true (he made up fictitious trades with outside partners of the bank), but it mostly sounds like internal risk controls failed in more than one place.

Of course, if it had gone the other way and turned a profit, we never would have heard about it.

Current NYSE circuit breaker levels: -1350, -2700

Looks like no one was impressed by Bernanke’s non-intervention on Thursday and the Bush/Paulson send-everyone-800-bucks stimulus package. US markets are closed for Martin Luther King Day, but the rest of the world is open, and down hard. DJ futures are showing something like -520 for tomorrow’s open, around 11586. The NYSE circuit breaker rules don’t kick in until a 10% move (1350 points), which would be somewhere around 10740. (The thresholds get reset every quarter.)
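
A quick sketch of that arithmetic in Python, using only the figures quoted above. One assumption to flag: I am treating the breaker as a point drop measured from the prior close, and backing that close out of the futures numbers, so the result is a ballpark like the "around 10740" above rather than an official level. The names are mine.

```python
# Figures quoted in the post, in Dow Jones index points.
implied_open = 11586   # where the futures point for tomorrow's open
futures_move = -520    # indicated change from the prior close
breaker_points = {"level 1": 1350, "level 2": 2700}  # from the post title

# Assumption: breakers are measured as a drop from the prior close, which is
# backed out here from the futures figures rather than taken from a quote.
prior_close = implied_open - futures_move


def trigger_level(close: int, drop: int) -> int:
    """Index level at which a breaker of `drop` points would trip."""
    return close - drop


for name, drop in breaker_points.items():
    level = trigger_level(prior_close, drop)
    remaining = implied_open - level
    print(f"{name}: trips near {level}, {remaining} points below the implied open")
```

So even after a 520-point gap down, the first breaker would still be several hundred points away.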

Don’t think we’ll see that tomorrow, but the way things have been going recently, it’s not out of the question. I wouldn’t be surprised to see a last-ditch central bank intervention tomorrow morning before the open, either. I think they missed their chance on Thursday, but I also don’t think the Fed has the luxury of waiting until the official FOMC meeting January 29 for their next move.

Hey, remind me again, who’s the Fed Chair?

Fed chair Ben Bernanke appeared before the House Budget Committee this morning, giving a prepared statement, then taking questions from the panel members. Aside from the content of his comments (growth is slowing, we’re not in a recession, some quick economic stimulus would be good), I always find it unsettling to see and hear the questions from our elected officials on the budget committee, as they tend to make speeches posing as questions, that sometimes border on the absurd. Basically, they pretend to ask questions, and the Fed Chair pretends to give answers.

One congresswoman had Ben Bernanke confused with Hank Paulson (former head of Goldman Sachs, now Treasury Secretary) in a prepared question asking if the bankers who caused the credit market problems would repay their bonuses and salaries to the American people. You’d think at least her staff would be able to keep track of who was at Treasury and who was at the Fed. Ben probably wishes he had the bonuses she wanted him to repay.

The short-term trading question tonight is whether we see the widely expected “surprise” rate cut premarket tomorrow to ambush the index option traders before the open, like the discount window cut before the August 17 options expiration. Unlike equity options, US index options mostly settle based on the opening trades on expiration day. Futures are creeping up overnight, just in case. But they already pulled that trick once, and everyone is watching for it, which means that even if they do it again, it won’t work as well as last time.

2008 so far: S&P down 9.2%, DJ -8.33%, Nas -11.51%

Update 01-19-2008 09:15 PT – The confused congresswoman is Marcy Kaptur, currently on her 13th (!) term as US Representative from Ohio.

“CEO of the Princeton Economics Department”. At least he has a sense of humor.
