Ho John Lee | January 18th, 2010 | Comments are closed
These are my links for December 31st through January 17th:
Khan Academy – The Khan Academy is a not-for-profit organization with the mission of providing a high quality education to anyone, anywhere.
We have 1000+ videos on YouTube covering everything from basic arithmetic and algebra to differential equations, physics, chemistry, biology and finance which have been recorded by Salman Khan.
StarCraft AI Competition | Expressive Intelligence Studio – AI bot warfare competition using a hacked API to run StarCraft, will be held at AIIDE2010 in October 2010.
The competition will use StarCraft Brood War 1.16.1. Bots for StarCraft can be developed using the Broodwar API, which provides hooks into StarCraft and enables the development of custom AI for StarCraft. A C++ interface enables developers to query the current state of the game and issue orders to units. An introduction to the Broodwar API is available here. Instructions for building a bot that communicates with a remote process are available here. There is also a Forum. We encourage submission of bots that make use of advanced AI techniques. Some ideas are:
* Planning
* Data Mining
* Machine Learning
* Case-Based Reasoning
Berkowitz et al : The use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems (2006) – Berkowitz, Steven D., Woodward, Lloyd H., & Woodward, Caitlin. (2006). Use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems. Originally intended for publication in updating the 1988 volume, eds., Wellman and Berkowitz, Social Structures: A Network Approach (Cambridge University Press). Steve died in November, 2003. See Barry Wellman’s “Steve Berkowitz: A Network Pioneer has passed away,” in Connections 25(2), 2003. It has not been possible to add the updating of references or of the quality of graphics that might have been possible if Berkowitz were alive. An early version of the article appeared in the Proceedings of the Session on Combating Terrorist Networks: Current Research in Social Network Analysis for the New War Fighting Environment. 8th International Command and Control Research and Technology Symposium. National Defense University, Washington, D.C., June 17-19, 2003.
Pydroid is a simple toolkit for automating and scripting repetitive tasks, especially those involving a GUI, with Python. It includes functions for controlling the mouse and keyboard, finding colors and bitmaps on-screen, as well as displaying cross-platform alerts.
Why use Pydroid?
* Testing a GUI application for bugs and edge cases
  * You might think your app is stable, but what happens if you press that button 5000 times?
* Automating games
  * Writing a script to beat that crappy flash game can be so much more gratifying than spending hours playing it yourself.
* Freaking out friends and family
  * Well maybe this isn't really a practical use, but…
Time Series Data Library – More data sets – "This is a collection of about 800 time series drawn from many different fields. Agriculture, Chemistry, Crime, Demography, Ecology, Finance, Health, Hydrology, Industry, Labour Market, Macro-Economics, Meteorology, Micro-Economics, Miscellaneous, Physics, Production, Sales, Simulated series, Sport, Transport & Tourism, Tree-rings, Utilities"
How informative is Twitter? » SemanticHacker Blog – "We undertook a small study to characterize the different types of messages that can be found on Twitter. We downloaded a sample of tweets over a two-week period using the Twitter streaming API. This resulted in a corpus of 8.9 million messages (“tweets”) posted by 2.6 million unique users. About 2.7 million of these tweets, or 31%, were replies to a tweet posted by another user, while half a million (6%) were retweets. Almost 2 million (22%) of the messages contained a URL."
Gremlin – a Turing-complete, graph-based programming language – GitHub – Gremlin is a Turing-complete, graph-based programming language developed in Java 1.6+ for key/value-pair multi-relational graphs known as property graphs. Gremlin makes extensive use of the XPath 1.0 language to support complex graph traversals. This language has applications in the areas of graph query, analysis, and manipulation. Connectors exist for the following data management systems:
The documentation for Gremlin can be found at this location. Finally, please visit TinkerPop for other software products.
The C Programming Language: 4.10 – by Kernighan & Ritchie & Lovecraft –
void Rlyeh (int mene[], int wgah, int nagl) {
    int Ia, fhtagn;
    if (wgah>=nagl) return;
    swap (mene,wgah,(wgah+nagl)/2);
    fhtagn = wgah;
    for (Ia=wgah+1; Ia<=nagl; Ia++)
        if (mene[Ia]<mene[wgah])
            swap (mene,++fhtagn,Ia);
    swap (mene,wgah,fhtagn);
    Rlyeh (mene,wgah,fhtagn-1);
    Rlyeh (mene,fhtagn+1,nagl);
} // PH'NGLUI MGLW'NAFH CTHULHU!
How to convert email addresses into name, age, ethnicity, sexual orientation – This is so Meta – "Save your email list as a CSV file (just comma separate those email addresses). Upload this file to your facebook account as if you wanted to add them as friends. Voila, facebook will give you all the profiles of all those users (in my test, about 80% of my email lists have facebook profiles). Now, click through each profile, and because of the new default facebook settings, which makes all information public, about 95% of the user info is available for you to harvest."
Analytics X Prize – Home – Forecast the murder rate in Philadelphia – The Analytics X Prize is an ongoing contest to apply analytics, modeling, and statistics to solve the social problems that affect our cities. It combines the fields of statistics, mathematics, and social science to understand the root causes of dysfunction in our neighborhoods. Understanding these relationships and discovering the most highly correlated variables allows us to deploy our limited resources more effectively and target the variables that will have the greatest positive impact on improvement.
PeteSearch: How to find user information from an email address – FindByEmail code released as open-source. You pass it an email address, and it queries 11 different public APIs to discover what information those services have on the user with that email address.
Measuring Measures: Beyond PageRank: Learning with Content and Networks – Conclusion: learning based on content and network data is the current state of the art. There is a great paper and talk about personalization in Google News: they use content for this purpose, and then user click streams to provide personalization, i.e. recommend specific articles within each topical cluster. The issue is content filtering is typically (as we say in research) "way harder." Suppose you have a social graph, a bunch of documents, and you know that some users in the social graph like some documents, and you want to recommend other documents that you think they will like. Using approaches based on Networks, you might consider clustering users based on co-visitation (they have co-liked some of the documents). This scales great, and it internationalizes great. If you start extracting features from the documents themselves, then what you build for English may not work as well for the Chinese market. In addition, there is far more data in the text than there is in the social graph.
mikemaccana’s python-docx at master – GitHub – MIT-licensed Python library to read/write Microsoft Word docx format files. "The docx module reads and writes Microsoft Office Word 2007 docx files. These are referred to as 'WordML', 'Office Open XML' and 'Open XML' by Microsoft. They can be opened in Microsoft Office 2007, Microsoft Mac Office 2008, OpenOffice.org 2.2, and Apple iWork 08. The module was created when I was looking for a Python support for MS Word .doc files, but could only find various hacks involving COM automation, calling .net or Java, or automating OpenOffice or MS Office."
Dilbert comic strip for 05/04/2009 from the official Dilbert comic strips archive. – Secretary to Pointy Haired Boss: "I live in a rented trailer and all of my money is in my checking account. Your investments are worthless and your mortgage is underwater. My net worth is higher than yours now. I guess promiscuity and a G.E.D. was a pretty good strategy after all." It reminded me of a thought I had earlier this year: much of Western civilization is built on valuing delayed gratification, which in many cases hasn't worked out as well recently as immediate consumption.
Without Warning, Twitter Kills StatTweets (Businesses Beware) – StatSheet.com ChangeLog – A post by the owner of StatTweets about the banning of his network of sports-related Twitter handles. They had several hundred accounts, one for stats for each team. This makes sense for users, given the way Twitter works, but Twitter doesn't like mass account creation. I'm interested to see how this sorts out; there seem to be at least a few similar Twitter networks with team/region/topic-specific handles.
Obesity and Overweight: Trends: U.S. Obesity Trends 1985-2007 | DNPAO | CDC – During the past 20 years there has been a dramatic increase in obesity in the United States. This slide set illustrates this trend by mapping the increased prevalence of obesity across each of the states. In 2007, only one state (Colorado) had a prevalence of obesity less than 20%. Thirty states had a prevalence equal to or greater than 25%; three of these states (Alabama, Mississippi and Tennessee) had a prevalence of obesity equal to or greater than 30%. The animated map below shows the United States obesity prevalence from 1985 through 2007.
Why text messages are limited to 160 characters | Technology | Los Angeles Times – A look back to the beginnings of SMS in 1985 – Would the 160-character maximum be enough space to prove a useful form of communication? Having zero market research, they based their initial assumptions on two "convincing arguments," Hillebrand said. For one, they found that postcards often contained fewer than 150 characters. Second, they analyzed a set of messages sent through Telex, a then-prevalent telegraphy network for business professionals. Despite not having a technical limitation, Hillebrand said, Telex transmissions were usually about the same length as postcards.
site admin | March 16th, 2009
These are my links for March 12th through March 16th:
Aig Systemic 090309 – AIG memo outlining systemic risks of their various businesses to the global economy.
FT.com / Companies / Insurance – AIG publishes counterparty list – AIG caved in to political pressure Sunday and released a list of some of the financial counterparties that benefited from its $160bn US government rescue, including some of Europe’s largest banks.
Geek And Poke – Mostly twitter and cloud computing themed cartoons.
Official Google Blog: Here comes Google Voice – GrandCentral makes a comeback, after disappearing into Google a while back. Now with voice transcription, SMS folders, and integration with GMail address book.
Amazon Web Services Blog: Announcing Amazon EC2 Reserved Instances – AWS introduces pricing structure for longer term, reserved capacity. Upfront payment, plus a (lower) incremental hourly charge, net savings for continuous 24×7 clients, and guaranteed availability of instances for backup or surge capacity.
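The trade-off here (pay something upfront, then pay a lower hourly charge) comes down to a break-even number of hours. Here is a minimal sketch of that arithmetic. The function names are mine, and the dollar figures are made-up placeholders for illustration, not actual AWS prices:

```python
HOURS_PER_YEAR = 24 * 365  # continuous 24x7 usage for one year


def break_even_hours(upfront, on_demand_rate, reserved_rate):
    """Hours of use after which the reserved plan becomes the cheaper option."""
    if reserved_rate >= on_demand_rate:
        raise ValueError("reserved rate must be lower than the on-demand rate")
    return upfront / (on_demand_rate - reserved_rate)


def yearly_cost(hours, hourly_rate, upfront=0.0):
    """Total cost of running one instance for the given number of hours."""
    return upfront + hours * hourly_rate


# Hypothetical prices for illustration only (NOT actual AWS pricing).
UPFRONT = 325.00   # one-time reservation payment, in dollars
ON_DEMAND = 0.10   # dollars per hour without a reservation
RESERVED = 0.03    # dollars per hour with a reservation

print("break-even hours:", round(break_even_hours(UPFRONT, ON_DEMAND, RESERVED)))
print("24x7 on-demand cost:", round(yearly_cost(HOURS_PER_YEAR, ON_DEMAND), 2))
print("24x7 reserved cost:", round(yearly_cost(HOURS_PER_YEAR, RESERVED, UPFRONT), 2))
```

With numbers shaped like these, an instance that runs around the clock passes the break-even point partway through the year, while a lightly used one may never get there.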
Recipe for Disaster: The Formula That Killed Wall Street – Felix Salmon writes about David Li, the Gaussian copula function, and its widespread (mis)use in pricing derivatives based on correlation only, leading to parts of today's financial crisis.
A Tutorial on Support Vector Machines for Pattern Recognition – Christopher J.C. Burges (PDF) – Appeared in: Data Mining and Knowledge Discovery 2, 121-167, 1998. The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, there are several arguments which support the observed high accuracy of SVMs, which we review.
YouTube – The Crisis of Credit Visualized – Part 1 – Nice animated video attempting to present a simplified explanation of the credit crisis and the relationship between home mortgage lending, bank leverage, and risk.
FRONTLINE: inside the meltdown: watch the full program – "On Thursday, Sept. 18, 2008, the astonished leadership of the U.S. Congress was told in a private session by the chairman of the Federal Reserve that the American economy was in grave danger of a complete meltdown within a matter of days. 'There was literally a pause in that room where the oxygen left,' says Sen. Christopher Dodd."
The Dark Matter of a Startup – "Every successful startup that I have seen has someone within their ranks that just kinda 'does stuff.' No one really knows specifically what they do, but it's vital to the success of the startup."
Why I Hate Frameworks – "A hammer?" he asks. "Nobody really buys hammers anymore. They're kind of old fashioned…we started selling schematic diagrams for hammer factories, enabling our clients to build their own hammer factories, custom engineered to manufacture only the kinds of hammers that they would actually need."
Mining The Thought Stream – Lots of comments about what Twitter is good for and how it will make money, revolving around real-time and near-real-time search, analytics, marketing, etc.
Understanding Web Operations Culture – the Graph & Data Obsession … – Comparison of traffic at Flickr, Google, Twitter, last.fm during the Obama inauguration. "One of the most interesting parts of running a large website is watching the effects of unrelated events affecting user traffic in aggregate."
Ho John Lee | November 14th, 2007
Another day, another subprime-related fiasco. Today GE Asset Management announced that one of its not-quite-money-market short bond funds, the Enhanced Cash Trust, took a loss from subprime holdings, and is offering customer redemptions at 96 cents on the dollar. Normally these funds are considered to be a higher-yielding version of a money market fund. This would make you pretty unhappy if you were looking for 5%-ish stable returns while waiting for the stock market to settle down.
Along these lines, here are British comedians John Fortune & John Bird chatting about the state of the banking system, Northern Rock, and subprime in another interview of “George Parr, investment banker” from last month.
This evening I’ve been looking over Prosper (formerly known as CircleOne), a social lending site similar to Zopa, which provides an eBay-like marketplace for borrowers and lenders to transact loans.
Prosper manages credit scoring and loan servicing, and provides social and economic incentives for borrower groups to build their reputation as good lending risks. All loans are 36 months with no prepayment penalty. Prosper charges the borrower a 1% origination fee and the lender a 0.5% loan servicing fee. Groups with good reputations can get some incentive payments for loan performance and lower rates over time.
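To get a feel for those terms, here is a rough sketch of the arithmetic for a single 36-month loan. The loan amount and interest rate are hypothetical, and two things are my own assumptions rather than anything Prosper states: that the 1% origination fee is deducted from the borrower's proceeds, and that the 0.5% servicing fee behaves like an annual rate taken out of the lender's yield.

```python
def monthly_payment(principal, annual_rate, months=36):
    """Level monthly payment on a fully amortizing loan (standard annuity formula)."""
    r = annual_rate / 12  # monthly interest rate
    if r == 0:
        return principal / months
    return principal * r / (1 - (1 + r) ** -months)


def borrower_proceeds(principal, origination_fee=0.01):
    """Cash the borrower receives, ASSUMING the fee is deducted up front."""
    return principal * (1 - origination_fee)


def lender_net_rate(annual_rate, servicing_fee=0.005):
    """Approximate lender yield, ASSUMING the fee is an annual rate on the balance."""
    return annual_rate - servicing_fee


# Hypothetical example loan: $5,000 at 12% APR.
loan, apr = 5000.0, 0.12
print("monthly payment:", round(monthly_payment(loan, apr), 2))
print("borrower receives:", round(borrower_proceeds(loan), 2))
print("lender's approximate net rate:", round(lender_net_rate(apr), 4))
```

Since there is no prepayment penalty, the borrower can pay off the balance early at any time, so the payment schedule above is the longest the loan can run, not necessarily how long it will.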
Beyond credit scoring, social lending and finance approaches can be effective, and the loans can be much less risky than the borrowers might otherwise appear. Informal lending clubs are common among many Asian and other immigrant communities, and something like this might provide an online venue for a more transparent and widely accessible model. It’s harder to bail out on a debt if everyone knows about it and is also a creditor.
Aside from the philosophical and community aspects, looking at this from the lender’s perspective, it looks like it would behave sort of like buying 36-month bonds with a call option. It’s not like you can do a lot of due diligence on the borrowers, and for the amounts involved it’s not going to be worth the effort, so some combination of reputation and diversification is needed. If there were lots of good-but-unrated credit risks in the borrowing pool, you could build a portfolio of sub-prime loans and possibly achieve something in the range of junk bond returns.
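The diversification argument can be made a bit more concrete with a toy model. This is only a sketch under strong assumptions of my own: each loan either pays its full interest or defaults with a fixed recovery, the whole term is treated as a single period, all loans are the same size, and defaults are independent of each other (which real credit markets often violate). The rate, default probability, and recovery figures are invented for illustration.

```python
import math


def expected_return(rate, default_prob, recovery):
    """Expected return per dollar lent in a one-period, two-outcome model."""
    good = (1 - default_prob) * rate        # loan pays in full with interest
    bad = default_prob * (recovery - 1)     # loan defaults, only part is recovered
    return good + bad


def portfolio_stdev(rate, default_prob, recovery, n_loans):
    """Standard deviation of return for n equal-sized loans, ASSUMING independent defaults."""
    spread = rate - (recovery - 1)          # gap between the good and bad outcomes
    single = math.sqrt(default_prob * (1 - default_prob)) * spread
    return single / math.sqrt(n_loans)


# Hypothetical inputs: 18% interest, 10% chance of default, 20% recovered on default.
rate, p_default, recovery = 0.18, 0.10, 0.20
print("expected return:", round(expected_return(rate, p_default, recovery), 4))
for n in (1, 10, 100):
    print(n, "loans, stdev:", round(portfolio_stdev(rate, p_default, recovery, n), 4))
```

The expected return does not change with the number of loans, but the spread of outcomes shrinks with the square root of the portfolio size, which is why small positions across many borrowers matter more than careful diligence on any one of them.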
Returning to the philosophical, I like the idea of community and socially based lending, because it values good reputations and provides social incentives for people to perform. On the other hand, it looks like a lot of work for a prospective lender. If they’re looking purely at a financial investment, it’s a lot easier going after a portfolio of bonds or a bond fund, so I think you’d have to want to support the model to participate. The borrower’s case looks much more straightforward, since consumer credit tends to be readily available, but expensive. The listings posted on the site so far include a number of “pay off my credit card” loans, which seems quite sensible.
In a slightly different context, it would be interesting to see something similar which matched borrower groups in relatively poor and developing areas with lenders in relatively wealthy areas. Grameen Bank has done amazing things with microfinance in Bangladesh, in the sense of helping the borrowing communities build new businesses and opportunities for themselves and making a positive economic return for the bank.
In yet another context, it would be interesting to see some kind of market-making approach for investing in and financing speculative early stage startups. This wouldn’t work for capital intensive projects, but perhaps there are some standard terms that could be worked out for asset-light software startups (aka web 2.0). The levels of funding required are too small to justify the effort of negotiating custom terms for each deal, and there is (or should be) a high mortality rate, which would argue for a lighter-weight way to build portfolios of small investments.