Ho John Lee | January 18th, 2010 | Comments are closed
These are my links for December 31st through January 17th:
Khan Academy – The Khan Academy is a not-for-profit organization with the mission of providing a high quality education to anyone, anywhere.
We have 1000+ videos on YouTube covering everything from basic arithmetic and algebra to differential equations, physics, chemistry, biology and finance which have been recorded by Salman Khan.
StarCraft AI Competition | Expressive Intelligence Studio – AI bot warfare competition using a hacked API to run StarCraft, will be held at AIIDE2010 in October 2010.
The competition will use StarCraft Brood War 1.16.1. Bots for StarCraft can be developed using the Broodwar API, which provides hooks into StarCraft and enables the development of custom AI for StarCraft. A C++ interface enables developers to query the current state of the game and issue orders to units. An introduction to the Broodwar API is available here. Instructions for building a bot that communicates with a remote process are available here. There is also a Forum. We encourage submission of bots that make use of advanced AI techniques. Some ideas are:
* Planning
* Data Mining
* Machine Learning
* Case-Based Reasoning
Berkowitz et al.: The use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems (2006) – Berkowitz, Steven D., Woodward, Lloyd H., & Woodward, Caitlin. (2006). Use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems. Originally intended for publication in updating the 1988 volume, eds., Wellman and Berkowitz, Social Structures: A Network Approach (Cambridge University Press). Steve died in November, 2003. See Barry Wellman’s “Steve Berkowitz: A Network Pioneer has passed away,” in Connections 25(2), 2003. It has not been possible to add the updating of references or of the quality of graphics that might have been possible if Berkowitz were alive. An early version of the article appeared in the Proceedings of the Session on Combating Terrorist Networks: Current Research in Social Network Analysis for the New War Fighting Environment. 8th International Command and Control Research and Technology Symposium. National Defense University, Washington, D.C., June 17-19, 2003.
Pydroid is a simple toolkit for automating and scripting repetitive tasks, especially those involving a GUI, with Python. It includes functions for controlling the mouse and keyboard, finding colors and bitmaps on-screen, as well as displaying cross-platform alerts.
Why use Pydroid?
* Testing a GUI application for bugs and edge cases
  * You might think your app is stable, but what happens if you press that button 5000 times?
* Automating games
  * Writing a script to beat that crappy flash game can be so much more gratifying than spending hours playing it yourself.
* Freaking out friends and family
  * Well maybe this isn't really a practical use, but…
Time Series Data Library – More data sets – "This is a collection of about 800 time series drawn from many different fields: Agriculture, Chemistry, Crime, Demography, Ecology, Finance, Health, Hydrology, Industry, Labour Market, Macro-Economics, Meteorology, Micro-Economics, Miscellaneous, Physics, Production, Sales, Simulated series, Sport, Transport & Tourism, Tree-rings, Utilities."
How informative is Twitter? » SemanticHacker Blog – "We undertook a small study to characterize the different types of messages that can be found on Twitter. We downloaded a sample of tweets over a two-week period using the Twitter streaming API. This resulted in a corpus of 8.9 million messages (“tweets”) posted by 2.6 million unique users. About 2.7 million of these tweets, or 31%, were replies to a tweet posted by another user, while half a million (6%) were retweets. Almost 2 million (22%) of the messages contained a URL."
Gremlin – a Turing-complete, graph-based programming language – GitHub – Gremlin is a Turing-complete, graph-based programming language developed in Java 1.6+ for key/value-pair multi-relational graphs known as property graphs. Gremlin makes extensive use of the XPath 1.0 language to support complex graph traversals. This language has applications in the areas of graph query, analysis, and manipulation. Connectors exist for the following data management systems:
The documentation for Gremlin can be found at this location. Finally, please visit TinkerPop for other software products.
The C Programming Language: 4.10 – by Kernighan & Ritchie & Lovecraft –
void Rlyeh (int mene[], int wgah, int nagl) {
    int Ia, fhtagn;
    if (wgah>=nagl) return;
    swap (mene,wgah,(wgah+nagl)/2);
    fhtagn = wgah;
    for (Ia=wgah+1; Ia<=nagl; Ia++)
        if (mene[Ia]<mene[wgah])
            swap (mene,++fhtagn,Ia);
    swap (mene,wgah,fhtagn);
    Rlyeh (mene,wgah,fhtagn-1);
    Rlyeh (mene,fhtagn+1,nagl);
} // PH'NGLUI MGLW'NAFH CTHULHU!
How to convert email addresses into name, age, ethnicity, sexual orientation – This is so Meta – "Save your email list as a CSV file (just comma separate those email addresses). Upload this file to your facebook account as if you wanted to add them as friends. Voila, facebook will give you all the profiles of all those users (in my test, about 80% of my email lists have facebook profiles). Now, click through each profile, and because of the new default facebook settings, which makes all information public, about 95% of the user info is available for you to harvest."
Analytics X Prize – Home – Forecast the murder rate in Philadelphia – The Analytics X Prize is an ongoing contest to apply analytics, modeling, and statistics to solve the social problems that affect our cities. It combines the fields of statistics, mathematics, and social science to understand the root causes of dysfunction in our neighborhoods. Understanding these relationships and discovering the most highly correlated variables allows us to deploy our limited resources more effectively and target the variables that will have the greatest positive impact on improvement.
PeteSearch: How to find user information from an email address – FindByEmail code released as open-source. You pass it an email address, and it queries 11 different public APIs to discover what information those services have on the user with that email address.
Measuring Measures: Beyond PageRank: Learning with Content and Networks – Conclusion: learning based on content and network data is the current state of the art. There is a great paper and talk about personalization in Google News: they use content for this purpose, and then user click streams to provide personalization, i.e. recommend specific articles within each topical cluster. The issue is content filtering is typically (as we say in research) "way harder." Suppose you have a social graph, a bunch of documents, and you know that some users in the social graph like some documents, and you want to recommend other documents that you think they will like. Using approaches based on Networks, you might consider clustering users based on co-visitation (they have co-liked some of the documents). This scales great, and it internationalizes great. If you start extracting features from the documents themselves, then what you build for English may not work as well for the Chinese market. In addition, there is far more data in the text than there is in the social graph.
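To make the co-visitation idea concrete, here is a minimal sketch that scores pairs of users by how much their liked documents overlap, then recommends from the most similar user. The user names, document IDs, and the choice of Jaccard similarity are my own assumptions for illustration; the linked post does not prescribe a particular measure.

```python
from itertools import combinations

# Hypothetical data: which documents each user has liked.
likes = {
    "alice": {"d1", "d2", "d3"},
    "bob": {"d2", "d3", "d4"},
    "carol": {"d5"},
}

def jaccard(a, b):
    """Overlap between two sets of liked documents, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def covisitation_scores(likes):
    """Similarity for every pair of users, based only on co-liked documents."""
    return {
        (u, v): jaccard(likes[u], likes[v])
        for u, v in combinations(sorted(likes), 2)
    }

def recommend(user, likes):
    """Suggest documents liked by the most similar other user."""
    others = [u for u in likes if u != user]
    best = max(others, key=lambda u: jaccard(likes[user], likes[u]))
    return sorted(likes[best] - likes[user])

scores = covisitation_scores(likes)
suggestions = recommend("alice", likes)
```

Nothing here looks at document text, which is why this style of approach carries across languages without extra work.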
mikemaccana’s python-docx at master – GitHub – MIT-licensed Python library to read/write Microsoft Word docx format files. "The docx module reads and writes Microsoft Office Word 2007 docx files. These are referred to as 'WordML', 'Office Open XML' and 'Open XML' by Microsoft. They can be opened in Microsoft Office 2007, Microsoft Mac Office 2008, OpenOffice.org 2.2, and Apple iWork 08. The module was created when I was looking for a Python support for MS Word .doc files, but could only find various hacks involving COM automation, calling .net or Java, or automating OpenOffice or MS Office."
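For a feel of what such a library has to deal with: a .docx file is a zip archive of XML parts, with the main text in word/document.xml. The sketch below uses only the Python standard library and is not the python-docx API. It builds a tiny in-memory archive (hypothetical content, and not a complete .docx that Word would open) and reads the paragraph text back out.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML namespace used by the main document part.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def make_minimal_archive(paragraphs):
    """Build an in-memory zip holding only word/document.xml (not a full .docx)."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>" for text in paragraphs
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf

def read_paragraphs(fileobj):
    """Return the text of each paragraph found in word/document.xml."""
    with zipfile.ZipFile(fileobj) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    result = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread across one or more w:t elements.
        result.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return result

paragraphs = read_paragraphs(make_minimal_archive(["Hello", "World"]))
```

The same read_paragraphs function should work on a real .docx opened in binary mode, though real files carry styles, tables, and other parts that this sketch ignores.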
Statistical Data Mining Tutorials – A nice collection of presentations reviewing topics in data mining and machine learning. For example: "HillClimbing, Simulated Annealing and Genetic Algorithms. Some very useful algorithms, to be used only in case of emergency." These include classification algorithms such as decision trees, neural nets, Bayesian classifiers, Support Vector Machines and case-based (aka non-parametric) learning. They include regression algorithms such as multivariate polynomial regression, MARS, Locally Weighted Regression, GMDH and neural nets. And they include other data mining operations such as clustering (mixture models, k-means and hierarchical), Bayesian networks and Reinforcement Learning.
Scratch Helps Kids Get With the Program – Gadgetwise Blog – NYTimes.com – On my candidate list for 7th grade introductory programming and analysis. "Scratch, an M.I.T.-developed computer-programming language for children, is the focus of worldwide show-and-tell sessions this Saturday."
Coding Horror: I Just Logged In As You: How It Happened – On good password management, why forums should mostly not be storing user passwords in general, and how re-use of passwords on multiple sites can lead to vulnerability on other sites.
Arc Forum | Arc – Arc is a version of Lisp. Among other things it is used to implement Hacker News.
Deliberate Ambiguity: How *not* to rate a search engine – Search engines have very simple user interfaces, but are used in many different contexts, most of which don't resemble the way people often try out a new search engine.
The Slow Erosion of Google Search – Bokardo – On changes in internet user behaviors over time, more social media (ask your Twitter friends) vs directed search (send a keyword query) etc.
James Smith • loopj.com » Blog Archive » jQuery Plugin: Tokenizing Autocomplete Text Entry – Looks handy – "This is a jQuery plugin to allow users to select multiple items from a predefined list, using autocompletion as they type to find each item. You may have seen a similar type of text entry when filling in the recipients field sending messages on facebook."
Behind The Business Plan Of Pirates Inc. : NPR – It takes around $250K to fund a Somali pirate operation. About 20 percent goes to pay off officials who look the other way. About 50 percent is for expenses and payroll. The leader of an attack makes $10,000 to $20,000 (the average Somali family lives on $500 a year). The initial investor — who put in $250,000 of seed capital — gets 30 percent, sometimes up to $500,000.
Dilbert comic strip for 05/04/2009 from the official Dilbert comic strips archive. – Secretary to Pointy Haired Boss: "I live in a rented trailer and all of my money is in my checking account. Your investments are worthless and your mortgage is underwater. My net worth is higher than yours now. I guess promiscuity and a G.E.D. was a pretty good strategy after all." Reminded me of a thought I had earlier this year, that much of Western Civilization is built on valuing delayed gratification, which hasn't worked out so well recently as opposed to immediate consumption in many cases.
Without Warning, Twitter Kills StatTweets (Businesses Beware) – StatSheet.com ChangeLog – Post by the owner of StatTweets about his network of sports-related Twitter handles being banned. They had several hundred accounts, one for stats for each team. This makes sense for users, given the way Twitter works, but Twitter doesn't like mass account creation. Interested to see how this sorts out; there seem to be at least a few similar Twitter networks with team/region/topic-specific handles.
Obesity and Overweight: Trends: U.S. Obesity Trends 1985-2007 | DNPAO | CDC – During the past 20 years there has been a dramatic increase in obesity in the United States. This slide set illustrates this trend by mapping the increased prevalence of obesity across each of the states. In 2007, only one state (Colorado) had a prevalence of obesity less than 20%. Thirty states had a prevalence equal to or greater than 25%; three of these states (Alabama, Mississippi and Tennessee) had a prevalence of obesity equal to or greater than 30%. The animated map below shows the United States obesity prevalence from 1985 through 2007.
Why text messages are limited to 160 characters | Technology | Los Angeles Times – A look back to the beginnings of SMS in 1985 – Would the 160-character maximum be enough space to prove a useful form of communication? Having zero market research, they based their initial assumptions on two "convincing arguments," Hillebrand said. For one, they found that postcards often contained fewer than 150 characters. Second, they analyzed a set of messages sent through Telex, a then-prevalent telegraphy network for business professionals. Despite not having a technical limitation, Hillebrand said, Telex transmissions were usually about the same length as postcards.
site admin | April 15th, 2009 | Comments are closed
These are my links for April 13th through April 15th:
On Netflix’s video streaming security framework – A black-box look at Netflix’s video streaming implementation for "Watch Instantly", built with Microsoft Silverlight for Windows/Mac or various hardware clients, and various authentication and DRM services plus HTTPS.
Nati Shalom’s Blog: Designing a Scalable Twitter – A look at using an in-memory data grid vs. a memcached architecture for a simplified Twitter-like service, with goals of 10 billion tweets/day, 100 GB stored/day (with 10:1 compression), immutable tweets (i.e., there are no updates, only inserts), and client applications limited to 70 requests per hour.
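A quick back-of-the-envelope pass over those design goals, as a sketch. The tweet volume, storage figure, and compression ratio are from the summary above; treating 100 GB as the post-compression number and using decimal gigabytes are my assumptions.

```python
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = 10_000_000_000      # from the post: 10 billion tweets/day
stored_bytes_per_day = 100 * 10**9   # from the post: 100 GB/day (decimal GB assumed)
compression_ratio = 10               # from the post: 10:1
# Assumption: the 100 GB figure is measured after compression.

# Average write rate the service has to sustain (peaks would be higher).
avg_tweets_per_second = tweets_per_day / SECONDS_PER_DAY

# Storage footprint per tweet, after and before compression.
stored_bytes_per_tweet = stored_bytes_per_day / tweets_per_day
raw_bytes_per_tweet = stored_bytes_per_tweet * compression_ratio

# Growth of the stored data set over a year, in decimal terabytes.
stored_tb_per_year = stored_bytes_per_day * 365 / 10**12

print(f"{avg_tweets_per_second:,.0f} tweets/s on average")
print(f"{raw_bytes_per_tweet:.0f} raw bytes per tweet")
print(f"{stored_tb_per_year:.1f} TB stored per year")
```

Under those assumptions the average works out to a bit over 115,000 inserts per second, which is the number that makes an in-memory design interesting.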
agile approach | World Bank Open API 2.0 Launched – Economic development indicators for 200 countries from World Bank. 1000 API calls/day free for non-commercial use, info at developer.worldbank.org
site admin | March 16th, 2009 | Comments are closed
These are my links for March 12th through March 16th:
Aig Systemic 090309 – AIG memo outlining systemic risks of their various businesses to the global economy.
FT.com / Companies / Insurance – AIG publishes counterparty list – AIG caved in to political pressure Sunday and released a list of some of the financial counterparties that benefited from its $160bn US government rescue, including some of Europe’s largest banks.
Geek And Poke – Mostly twitter and cloud computing themed cartoons.
Official Google Blog: Here comes Google Voice – GrandCentral makes a comeback, after disappearing into Google a while back. Now with voice transcription, SMS folders, and integration with Gmail address book.
Amazon Web Services Blog: Announcing Amazon EC2 Reserved Instances – AWS introduces pricing structure for longer term, reserved capacity. Upfront payment, plus a (lower) incremental hourly charge, net savings for continuous 24×7 clients, and guaranteed availability of instances for backup or surge capacity.
site admin | March 1st, 2009 | Comments are closed
These are my links for February 28th through March 1st:
Community Data – Swivel – User contributed datasets, for visualization and graphs with Swivel
Obamameter – Map visualization of economic stimulus outlays. "Keep tabs on the US economy, the global economy and the stimulus through our dashboard for the economy."
recovery.gov.pdf – Slide presentation on data sources and construction of initial Recovery.gov site in Jan 2009, from talk at Transparency Camp.
Virtual Hoff : DoxPara Research – Slides from Dan Kaminsky's talk at CloudCamp Seattle on network and application security issues in cloud and virtualized computing environments.
Can You Buy a Silicon Valley? Maybe. – from Paul Graham – "If you could get startups to stick to your town for a million apiece, then for a billion dollars you could bring in a thousand startups. That probably wouldn't push you past Silicon Valley itself, but it might get you second place. For the price of a football stadium, any town that was decent to live in could make itself one of the biggest startup hubs in the world."
Python for Lisp Programmers – Peter Norvig examines Python. "(Although it wasn't my intent, Python programmers have told me this page has helped them learn Lisp.) Basically, Python can be seen as a dialect of Lisp with "traditional" syntax (what Lisp people call "infix" or "m-lisp" syntax). One message on comp.lang.python said "I never understood why LISP was a good idea until I started playing with python." Python supports all of Lisp's essential features except macros, and you don't miss macros all that much because it does have eval, and operator overloading, and regular expression parsing, so you can create custom languages that way."
Beatles Unknown "A Hard Day’s Night" Chord Mystery Solved Using Fourier Transform – Four years ago, inspired by reading news coverage about the song’s 40th anniversary, Jason Brown of Dalhousie’s Department of Mathematics decided to try and see if he could apply a mathematical calculation known as Fourier transform to solve the Beatles’ riddle. The process allowed him to decompose the sound into its original frequencies using computer software and parse out which notes were on the record.
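The decomposition step can be sketched in a few lines. This is not Brown's actual analysis or software: it is a naive discrete Fourier transform (computed directly, no FFT library) applied to a made-up signal of two mixed sine waves, and the 110 Hz and 220 Hz tones and the 800 Hz sample rate are hypothetical values chosen so the arithmetic comes out clean.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(n^2) discrete Fourier transform; magnitudes for bins 0..n/2-1."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)
        )
        mags.append(abs(total) / n)
    return mags

def strongest_frequencies(samples, sample_rate, count):
    """Return the `count` frequencies (Hz) with the largest magnitudes, ascending."""
    mags = dft_magnitudes(samples)
    n = len(samples)
    top_bins = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:count]
    return sorted(k * sample_rate / n for k in top_bins)

# Hypothetical test signal: two sine waves mixed together, one second long.
SAMPLE_RATE = 800
signal = [
    math.sin(2 * math.pi * 110 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 220 * t / SAMPLE_RATE)
    for t in range(SAMPLE_RATE)
]
peaks = strongest_frequencies(signal, SAMPLE_RATE, 2)
```

A real recording is much messier (overtones, several instruments, noise), so picking out the notes takes judgment on top of the transform.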
Took advantage of the discounted ($49 through end February) early registration for SF MusicTech, coming up on May 18th.
The SanFran MusicTech Summit will bring together the best and brightest developers in the Music/Technology Space, along with the musicians, entrepreneurial business people, press, investors, service providers, and organizations who work with them at the convergence of culture and commerce. We will meet to discuss the evolving music/business/technology ecosystem in a proactive, conducive to dealmaking environment.
Unfortunately, it overlaps with ICWSM09, but I'll try to make both.
Recipe for Disaster: The Formula That Killed Wall Street – Felix Salmon writes about David Li, the Gaussian copula function, and its widespread (mis)use in pricing derivatives based on correlation only, leading to parts of today's financial crisis.
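To see how much weight a single correlation number carries in this kind of model, here is a simplified simulation of a two-name Gaussian copula. It is a sketch under my own assumptions, not Li's pricing model or anything taken from the article: each name gets the same constant hazard rate, and the hazard rate, horizon, trial count, and seed are all hypothetical.

```python
import math
import random
from statistics import NormalDist

def joint_default_probability(rho, hazard_rate, horizon, trials, seed=0):
    """Estimate P(both names default before `horizon`) under a Gaussian copula."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    # Marginal probability that one name defaults before the horizon,
    # assuming an exponential default time with a constant hazard rate.
    p_default = 1 - math.exp(-hazard_rate * horizon)
    both = 0
    for _ in range(trials):
        # Two standard normals with correlation rho.
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho**2) * rng.gauss(0, 1)
        # Map to uniforms; a default before the horizon means u < p_default.
        if phi(z1) < p_default and phi(z2) < p_default:
            both += 1
    return both / trials

# Hypothetical inputs: 5% annual hazard rate, 5-year horizon.
independent = joint_default_probability(0.0, 0.05, 5, 20000)
correlated = joint_default_probability(0.9, 0.05, 5, 20000)
print(f"rho=0.0: {independent:.3f}   rho=0.9: {correlated:.3f}")
```

The marginal default probabilities are identical in both runs; only the correlation input changes, and the joint default estimate moves a lot with it.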
A Tutorial on Support Vector Machines for Pattern Recognition – Christopher J.C. Burges (PDF) – Appeared in: Data Mining and Knowledge Discovery 2, 121-167, 1998. The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, there are several arguments which support the observed high accuracy of SVMs, which we review.
YouTube – The Crisis of Credit Visualized – Part 1 – Nice animated video attempting to present a simplified explanation of the credit crisis and the relationship between home mortgage lending, bank leverage, and risk.
FRONTLINE: inside the meltdown: watch the full program – "On Thursday, Sept. 18, 2008, the astonished leadership of the U.S. Congress was told in a private session by the chairman of the Federal Reserve that the American economy was in grave danger of a complete meltdown within a matter of days. "There was literally a pause in that room where the oxygen left," says Sen. Christopher Dodd"
The Dark Matter of a Startup – "Every successful startup that I have seen has someone within their ranks that just kinda “does stuff.” No one really knows specifically what they do, but it's vital to the success of the startup."
Why I Hate Frameworks – "A hammer?" he asks. "Nobody really buys hammers anymore. They're kind of old fashioned…we started selling schematic diagrams for hammer factories, enabling our clients to build their own hammer factories, custom engineered to manufacture only the kinds of hammers that they would actually need."
Mining The Thought Stream – Lots of comments around what is Twitter good for and how will it make money, revolving around real/near-time search, analytics, marketing, etc.
Understanding Web Operations Culture – the Graph & Data Obsession … – Comparison of traffic at Flickr, Google, Twitter, last.fm during the Obama inauguration. "One of the most interesting parts of running a large website is watching the effects of unrelated events affecting user traffic in aggregate."
This weekend’s news that JP Morgan will take over Bear Stearns for around $2/share is astonishing. Last Friday BSC closed at around $30. A week ago it was around $60. A year ago it was around $150/share. So in a year the shares are down by 99%, and even the bargain shoppers who got in on Friday are down by something like 85%, based on today’s close.
Global financial markets are linked more closely than ever. Here’s a crib sheet of a few markets of interest and their opening/closing times in Pacific Time (US West Coast).
There are often interesting interactions at major market opens and closes, especially during the overlap between the US market open and European market close. In addition, US index futures, particularly the ES (S&P 500 e-mini futures), trade nearly around the clock, closing only for daily settlement between 4:15 and 4:30pm US East Coast time (plus weekends and holidays).
Note that many Asian markets have morning and afternoon sessions, and close for lunch. Different countries also observe differing practices with respect to Daylight Saving Time, so the relative timing may change seasonally. You may find it useful to check with a World Clock for the current times. Also remember that Asia begins its week on Sunday evening in the US, and is closed for the week by the time it’s Friday in the US.
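If you'd rather compute the conversions than keep a crib sheet current, here is a minimal sketch using Python's zoneinfo module (Python 3.9+, and it needs a time zone database on the system). The opening times in the table are illustrative assumptions and should be checked against each exchange; the date is just an example.

```python
from datetime import date, datetime, time
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")

# Illustrative local opening times; verify against each exchange before relying on them.
MARKET_OPENS = {
    "Tokyo": (time(9, 0), ZoneInfo("Asia/Tokyo")),
    "London": (time(8, 0), ZoneInfo("Europe/London")),
    "New York": (time(9, 30), ZoneInfo("America/New_York")),
}

def open_in_pacific(market, local_date):
    """Convert a market's local opening time on `local_date` to Pacific Time."""
    local_time, tz = MARKET_OPENS[market]
    local_dt = datetime.combine(local_date, local_time, tzinfo=tz)
    return local_dt.astimezone(PACIFIC)

# Example date: a Monday when the US had already switched clocks but the UK had not.
day = date(2008, 3, 17)
opens = {market: open_in_pacific(market, day) for market in MARKET_OPENS}

for market, dt in opens.items():
    print(f"{market:9s} opens {dt:%a %H:%M} Pacific")
```

For that date, the Tokyo open lands on Sunday evening Pacific Time, and the London open is an hour earlier than it would be once the UK also changes its clocks.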
Mr. Kerviel is no trading legend who let a transaction get out of hand. He was a low-level trader in the bank’s “Delta One” desk in western Paris, earning about €100,000 ($145,000) a year. His job was to make bets on how large European stock indexes would move, according to bank officials. His expertise was trading baskets of stocks such as the Euro Stoxx 50.
I’ve noticed that the news coverage keeps reporting “fraud”, which is apparently true (he made up fictitious trades with outside partners of the bank), but it mostly sounds like internal risk controls failed in more than one place.
Of course, if it had gone the other way and turned a profit, we never would have heard about it.
Looks like no one was impressed by Bernanke’s non-intervention on Thursday and the Bush/Paulson send-everyone-800-bucks stimulus package. US markets are closed for Martin Luther King Day, but the rest of the world is open, and down hard. DJ futures are showing something like -520 for tomorrow’s open, around 11586. The NYSE circuit breaker rules don’t kick in until a 10% move (1350 points), which would be somewhere around 10740. (The thresholds get reset every quarter.)
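The arithmetic behind those numbers, as a rough sketch. The three inputs are the figures quoted above; the prior close is not stated, so it is backed out from them, and the resulting halt level is only approximate (it lands a few points away from the "around 10740" estimate).

```python
threshold_points = 1350   # from the post: the 10% level, fixed for the quarter
futures_change = -520     # from the post: indicated move for the open
indicated_open = 11586    # from the post: where that move would put the index

# Assumption: the prior close is whatever makes the two quoted figures agree.
implied_prior_close = indicated_open - futures_change

# Index level at which the 10% circuit breaker would trigger.
halt_level = implied_prior_close - threshold_points

# Size of the indicated drop, and how much room is left before a halt.
indicated_decline_pct = -futures_change / implied_prior_close * 100
points_remaining = indicated_open - halt_level

print(f"implied prior close: {implied_prior_close}")
print(f"halt level: {halt_level}")
print(f"indicated decline: {indicated_decline_pct:.1f}%")
print(f"points left before a halt: {points_remaining}")
```

So even a 520-point gap down would leave roughly 830 points of room before the first halt.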
Don’t think we’ll see that tomorrow, but the way things have been going recently, it’s not out of the question. I wouldn’t be surprised to see a last-ditch central bank intervention tomorrow morning before the open, either. I think they missed their chance on Thursday, but I also don’t think the Fed has the luxury of waiting until the official FOMC meeting January 29 for their next move.
Fed chair Ben Bernanke appeared before the House Budget Committee this morning, giving a prepared statement, then taking questions from the panel members. Aside from the content of his comments (growth is slowing, we’re not in a recession, some quick economic stimulus would be good), I always find it unsettling to see and hear the questions from our elected officials on the budget committee, as they tend to make speeches posing as questions, that sometimes border on the absurd. Basically, they pretend to ask questions, and the Fed Chair pretends to give answers.
One congresswoman had Ben Bernanke confused with Hank Paulson (former head of Goldman Sachs, now Treasury Secretary) in a prepared question asking if the bankers who caused the credit market problems would repay their bonuses and salaries to the American people. You’d think at least her staff would be able to keep track of who was at Treasury and who was at the Fed. Ben probably wishes he had the bonuses she wanted him to repay.
The short-term trading question tonight is whether we see the widely-expected “surprise” rate cut premarket tomorrow to ambush the index option traders before the open, like the discount window cut before the August 17 options expiration. Unlike equity options, US index options mostly settle based on the opening trades on expiration day. Futures are creeping up overnight, just in case. But they already pulled that trick once, and everyone is watching for it, which means that even if they do it again, it won’t work as well as last time.
2008 so far: S&P down 9.2%, DJ -8.33%, Nas -11.51%