Bookmarks for December 31st through January 17th

These are my links for December 31st through January 17th:

  • Khan Academy – The Khan Academy is a not-for-profit organization with the mission of providing a high quality education to anyone, anywhere.

    We have 1000+ videos on YouTube covering everything from basic arithmetic and algebra to differential equations, physics, chemistry, biology and finance which have been recorded by Salman Khan.

  • StarCraft AI Competition | Expressive Intelligence Studio – AI bot warfare competition using a hacked API to run StarCraft, will be held at AIIDE2010 in October 2010.
    The competition will use StarCraft Brood War 1.16.1. Bots for StarCraft can be developed using the Broodwar API, which provides hooks into StarCraft and enables the development of custom AI for StarCraft. A C++ interface enables developers to query the current state of the game and issue orders to units. An introduction to the Broodwar API is available here. Instructions for building a bot that communicates with a remote process are available here. There is also a Forum. We encourage submission of bots that make use of advanced AI techniques. Some ideas are:
    * Planning
    * Data Mining
    * Machine Learning
    * Case-Based Reasoning
  • Measuring Measures: Learning About Statistical Learning – A "quick start guide" for statistical and machine learning systems, good collection of references.
  • Berkowitz et al : The use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems (2006) – Berkowitz, Steven D., Woodward, Lloyd H., & Woodward, Caitlin. (2006). Use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems. Originally intended for publication in updating the 1988 volume, eds., Wellman and Berkowitz, Social Structures: A Network Approach (Cambridge University Press). Steve died in November, 2003. See Barry Wellman’s “Steve Berkowitz: A Network Pioneer has passed away,” in Connections 25(2), 2003. It has not been possible to add the updating of references or of the quality of graphics that might have been possible if Berkowitz were alive. An early version of the article appeared in the Proceedings of the Session on Combating Terrorist Networks: Current Research in Social Network Analysis for the New War Fighting Environment. 8th International Command and Control Research and Technology Symposium. National Defense University, Washington, D.C June 17-19, 2003
  • SSH Tunneling through web filters | s-anand.net – Step by step tutorial on using Putty and an EC2 instance to set up a private web proxy on demand.
  • PyDroid GUI automation toolkit – GitHub – What is Pydroid?

    Pydroid is a simple toolkit for automating and scripting repetitive tasks, especially those involving a GUI, with Python. It includes functions for controlling the mouse and keyboard, finding colors and bitmaps on-screen, as well as displaying cross-platform alerts.
    Why use Pydroid?

    * Testing a GUI application for bugs and edge cases
    o You might think your app is stable, but what happens if you press that button 5000 times?
    * Automating games
    o Writing a script to beat that crappy flash game can be so much more gratifying than spending hours playing it yourself.
    * Freaking out friends and family
    o Well maybe this isn't really a practical use, but…

  • Time Series Data Library – More data sets – "This is a collection of about 800 time series drawn from many different fields.Agriculture Chemistry Crime Demography Ecology Finance Health Hydrology Industry Labour Market Macro-Economics Meteorology Micro-Economics Miscellaneous Physics Production Sales Simulated series Sport Transport & Tourism Tree-rings Utilities"
  • How informative is Twitter? » SemanticHacker Blog – "We undertook a small study to characterize the different types of messages that can be found on Twitter. We downloaded a sample of tweets over a two-week period using the Twitter streaming API. This resulted in a corpus of 8.9 million messages (”tweets”) posted by 2.6 million unique users. About 2.7 million of these tweets, or 31%, were replies to a tweet posted by another user, while half a million (6%) were retweets. Almost 2 million (22%) of the messages contained a URL."
  • Gremlin – a Turing-complete, graph-based programming language – GitHub – Gremlin is a Turing-complete, graph-based programming language developed in Java 1.6+ for key/value-pair multi-relational graphs known as property graphs. Gremlin makes extensive use of the XPath 1.0 language to support complex graph traversals. This language has applications in the areas of graph query, analysis, and manipulation. Connectors exist for the following data management systems:

    * TinkerGraph in-memory graph
    * Neo4j graph database
    * Sesame 2.0 compliant RDF stores
    * MongoDB document database

    The documentation for Gremlin can be found at this location. Finally, please visit TinkerPop for other software products.

  • The C Programming Language: 4.10 – by Kernighan & Ritchie & Lovecraft – void Rlyeh
    (int mene[], int wgah, int nagl) {
    int Ia, fhtagn;
    if (wgah>=nagl) return;
    swap (mene,wgah,(wgah+nagl)/2);
    fhtagn = wgah;
    for (Ia=wgah+1; Ia<=nagl; Ia++)
    if (mene[Ia]<mene[wgah])
    swap (mene,++fhtagn,Ia);
    swap (mene,wgah,fhtagn);
    Rlyeh (mene,wgah,fhtagn-1);
    Rlyeh (mene,fhtagn+1,nagl);

    } // PH'NGLUI MGLW'NAFH CTHULHU!

  • How to convert email addresses into name, age, ethnicity, sexual orientation – This is so Meta – "Save your email list as a CSV file (just comma separate those email addresses). Upload this file to your facebook account as if you wanted to add them as friends. Voila, facebook will give you all the profiles of all those users (in my test, about 80% of my email lists have facebook profiles). Now, click through each profile, and because of the new default facebook settings, which makes all information public, about 95% of the user info is available for you to harvest."
  • Microsoft Security Development Lifecycle (SDL): Tools Repository – A collection of previously internal-only security tools from Microsoft, including anti-xss, fuzz test, fxcop, threat modeling, binscope, now available for free download.
  • Analytics X Prize – Home – Forecast the murder rate in Philadelphia – The Analytics X Prize is an ongoing contest to apply analytics, modeling, and statistics to solve the social problems that affect our cities. It combines the fields of statistics, mathematics, and social science to understand the root causes of dysfunction in our neighborhoods. Understanding these relationships and discovering the most highly correlated variables allows us to deploy our limited resources more effectively and target the variables that will have the greatest positive impact on improvement.
  • PeteSearch: How to find user information from an email address – FindByEmail code released as open-source. You pass it an email address, and it queries 11 different public APIs to discover what information those services have on the user with that email address.
  • Measuring Measures: Beyond PageRank: Learning with Content and Networks – Conclusion: learning based on content and network data is the current state of the art There is a great paper and talk about personalization in Google News they use content for this purpose, and then user click streams to provide personalization, i.e. recommend specific articles within each topical cluster. The issue is content filtering is typically (as we say in research) "way harder." Suppose you have a social graph, a bunch of documents, and you know that some users in the social graph like some documents, and you want to recommend other documents that you think they will like. Using approaches based on Networks, you might consider clustering users based on co-visitaion (they have co-liked some of the documents). This scales great, and it internationalizes great. If you start extracting features from the documents themselves, then what you build for English may not work as well for the Chinese market. In addition, there is far more data in the text than there is in the social graph
  • mikemaccana’s python-docx at master – GitHub – MIT-licensed Python library to read/write Microsoft Word docx format files. "The docx module reads and writes Microsoft Office Word 2007 docx files. These are referred to as 'WordML', 'Office Open XML' and 'Open XML' by Microsoft. They can be opened in Microsoft Office 2007, Microsoft Mac Office 2008, OpenOffice.org 2.2, and Apple iWork 08. The module was created when I was looking for a Python support for MS Word .doc files, but could only find various hacks involving COM automation, calling .net or Java, or automating OpenOffice or MS Office."

Bookmarks for May 3rd through May 4th

These are my links for May 3rd through May 4th:

  • Dilbert comic strip for 05/04/2009 from the official Dilbert comic strips archive. – Secretary to Pointy Haired Boss: "I live in a rented trailer and all of my money is in my checking account. Your investments are worthless and your mortgage is underwater. My net worth is higher than yours now. I guess promiscuity and a G.E.D. was a pretty good strategy after all." Reminded me of a thought I had earlier this year, that much of Western Civilization is built on valuing delayed gratification, which hasn't worked out so well recently as opposed to immediate consumption in many cases.
  • Without Warning, Twitter Kills StatTweets (Businesses Beware) – StatSheet.com ChangeLog – Owner of StatTweets post regarding his network of sports-related Twitter handles being banned. They had several hundred accounts, one for stats for each team. This makes sense for users, given the way Twitter works, but they don't like mass account creation. Interested to see how this sorts out, there seem to be at least a few similar Twitter networks with team/region/topic-specific handles.
  • Dooley Online: What URL Shortener Should I Use? – Comparison of features and some usage data for URL shorteners such as tinyurl and bit.ly used on twitter and other services.
  • Obesity and Overweight: Trends: U.S. Obesity Trends 1985-2007 | DNPAO | CDC – During the past 20 years there has been a dramatic increase in obesity in the United States. This slide set illustrates this trend by mapping the increased prevalence of obesity across each of the states. In 2007, only one state (Colorado) had a prevalence of obesity less than 20%. Thirty states had a prevalence equal to or greater than 25%; three of these states (Alabama, Mississippi and Tennessee) had a prevalence of obesity equal to or greater than 30%. The animated map below shows the United States obesity prevalence from 1985 through 2007.
  • Why text messages are limited to 160 characters | Technology | Los Angeles Times – A look back to the beginnings of SMS in 1985 – Would the 160-character maximum be enough space to prove a useful form of communication? Having zero market research, they based their initial assumptions on two "convincing arguments," Hillebrand said. For one, they found that postcards often contained fewer than 150 characters. Second, they analyzed a set of messages sent through Telex, a then-prevalent telegraphy network for business professionals. Despite not having a technical limitation, Hillebrand said, Telex transmissions were usually about the same length as postcards.

Bookmarks for March 12th through March 16th

These are my links for March 12th through March 16th:

Bookmarks for March 9th through March 12th

These are my links for March 9th through March 12th:

Bookmarks for February 23rd through February 24th

These are my links for February 23rd through February 24th:

Bookmarks for February 21st from 13:59 to 21:55

These are my links for February 21st from 13:59 to 21:55:

Bookmarks for February 16th through February 17th

These are my links for February 16th through February 17th:

  • Top 100 Network Security Tools – Many many security testing and hacking tools.
  • FRONTLINE: inside the meltdown: watch the full program – "On Thursday, Sept. 18, 2008, the astonished leadership of the U.S. Congress was told in a private session by the chairman of the Federal Reserve that the American economy was in grave danger of a complete meltdown within a matter of days. "There was literally a pause in that room where the oxygen left," says Sen. Christopher Dodd"
  • The Dark Matter of a Startup – "Every successful startup that I have seen has someone within their ranks that just kinda “does stuff.” No one really knows specifically what they do, but its vital to the success of the startup."
  • Why I Hate Frameworks – "A hammer?" he asks. "Nobody really buys hammers anymore. They're kind of old fashioned…we started selling schematic diagrams for hammer factories, enabling our clients to build their own hammer factories, custom engineered to manufacture only the kinds of hammers that they would actually need."
  • Mining The Thought Stream – Lots of comments around what is Twitter good for and how will it make money, revolving around real/near-time search, analytics, marketing, etc.
  • Understanding Web Operations Culture – the Graph & Data Obsession … – Comparison of traffic at Flickr, Google, Twitter, last.fm during the Obama inauguration. "One of the most interesting parts of running a large website is watching the effects of unrelated events affecting user traffic in aggregate."

May as well put this guy in charge of the banks

Another day, another subprime-related fiasco. Today GE Asset Management announced that one of its not-quite-money-market short bond funds, the Enhanced Cash Trust, took a loss from subprime holdings, and is offering customer redemptions at 96 cents on the dollar. Normally these funds are considered to be a higher-yielding version of a money market fund. This would make you pretty unhappy if you were looking for 5%-ish stable returns while waiting for the stock market to settle down.

Along these lines, here are British comedians John Fortune & John Bird chatting about the state of the banking system, Northern Rock, and subprime in another interview of “George Parr, investment banker” from last month.

See also: Subprime crisis explained, by British comedians

Prosper – a social lending marketplace


This evening I’ve been looking over Prosper (formerly known as CircleOne), a social lending site similar to Zopa, which provides an eBay-like marketplace for borrowers and lenders to transact loans.

Prosper manages credit scoring, loan servicing, and provides social and economic incentives for borrower groups to build their reputation as good lending risks. All loans are 36 months with no prepayment penalty, and Prosper charges a 1% origination fee from the borrower, and 0.5% loan servicing fee from the lender. Groups with good reputations can get some incentive payments for loan performance and lower rates over time.

In addition to using credit scoring, social lending and finance approaches can be effective and much less risky than the borrowers might otherwise appear. Informal lending clubs are common among many Asian and other immigrant communities, and something like this might provide an online venue for a more transparent and widely accessible model. It’s harder to bail out on a debt if everyone knows about it and is also a creditor.

Aside from the philosophical and community aspects, looking at this from the lender’s perspective, it looks like it would behave sort of like buying 36-month bonds with a call option. It’s not like you can do a lot of due diligence on the borrowers, and for the amounts involved it’s not going to be worth the effort, so some combination of reputation and diversification is needed. If there were lots of good-but-unrated credit risks in the borrowing pool, you could build a portfolio of sub-prime loans and possibly achieve something in the range of junk bond returns.

Returning to the philosophical, I like the idea of community and socially based lending, because it values good reputations and provides social incentives for people to perform. On the other hand, it looks like a lot of work for a prospective lender. If they’re looking purely at a financial investment, it’s a lot easier going after a portfolio of bonds or a bond fund, so I think you’d have to want to support the model to participate. The borrower’s case looks much more straightforward, since consumer credit tends to be readily available, but expensive. The listings posted on the site so far include a number of “pay off my credit card” loans, which seems quite sensible.

In a slightly different context, it would be interesting to see something similar which matched borrower groups in relatively poor and developing areas with lenders in relatively wealthy areas. Grameen Bank has done amazing things with microfinance in Bangaladesh, in the sense of helping the borrowing communities building new businesses and opportunities for themselves and making a positive economic return for the bank.

In another different context, it would be interesting to see some kind of market making approach for investing in and financing speculative early stage startups. This wouldn’t work for capital intensive projects, but perhaps there’s some standard terms that could be worked out for asset-light software startups (aka web 2.0). The levels of funding required are too small to justify the level of effort, and there is (or should be) a high mortality rate, which would argue for a lighterweight way to build portfolios of small investments.

(via TechCrunch)

See also:
Zopa – eBay for money?