|
|
Ho John Lee | January 18th, 2010 | Comments are closed
These are my links for December 31st through January 17th:
- Khan Academy – The Khan Academy is a not-for-profit organization with the mission of providing a high quality education to anyone, anywhere.
We have 1000+ videos on YouTube covering everything from basic arithmetic and algebra to differential equations, physics, chemistry, biology and finance which have been recorded by Salman Khan.
- StarCraft AI Competition | Expressive Intelligence Studio – AI bot warfare competition using a hacked API to run StarCraft, will be held at AIIDE2010 in October 2010.
The competition will use StarCraft Brood War 1.16.1. Bots for StarCraft can be developed using the Broodwar API, which provides hooks into StarCraft and enables the development of custom AI for StarCraft. A C++ interface enables developers to query the current state of the game and issue orders to units. An introduction to the Broodwar API is available here. Instructions for building a bot that communicates with a remote process are available here. There is also a Forum. We encourage submission of bots that make use of advanced AI techniques. Some ideas are:
* Planning
* Data Mining
* Machine Learning
* Case-Based Reasoning
- Measuring Measures: Learning About Statistical Learning – A "quick start guide" for statistical and machine learning systems, good collection of references.
- Berkowitz et al : The use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems (2006) – Berkowitz, Steven D., Woodward, Lloyd H., & Woodward, Caitlin. (2006). Use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems. Originally intended for publication in updating the 1988 volume, eds., Wellman and Berkowitz, Social Structures: A Network Approach (Cambridge University Press). Steve died in November, 2003. See Barry Wellman’s “Steve Berkowitz: A Network Pioneer has passed away,” in Connections 25(2), 2003. It has not been possible to add the updating of references or of the quality of graphics that might have been possible if Berkowitz were alive. An early version of the article appeared in the Proceedings of the Session on Combating Terrorist Networks: Current Research in Social Network Analysis for the New War Fighting Environment. 8th International Command and Control Research and Technology Symposium. National Defense University, Washington, D.C June 17-19, 2003
- SSH Tunneling through web filters | s-anand.net – Step by step tutorial on using Putty and an EC2 instance to set up a private web proxy on demand.
- PyDroid GUI automation toolkit – GitHub – What is Pydroid?
Pydroid is a simple toolkit for automating and scripting repetitive tasks, especially those involving a GUI, with Python. It includes functions for controlling the mouse and keyboard, finding colors and bitmaps on-screen, as well as displaying cross-platform alerts.
Why use Pydroid?
* Testing a GUI application for bugs and edge cases
o You might think your app is stable, but what happens if you press that button 5000 times?
* Automating games
o Writing a script to beat that crappy flash game can be so much more gratifying than spending hours playing it yourself.
* Freaking out friends and family
o Well maybe this isn't really a practical use, but…
- Time Series Data Library – More data sets – "This is a collection of about 800 time series drawn from many different fields.Agriculture Chemistry Crime Demography Ecology Finance Health Hydrology Industry Labour Market Macro-Economics Meteorology Micro-Economics Miscellaneous Physics Production Sales Simulated series Sport Transport & Tourism Tree-rings Utilities"
- How informative is Twitter? » SemanticHacker Blog – "We undertook a small study to characterize the different types of messages that can be found on Twitter. We downloaded a sample of tweets over a two-week period using the Twitter streaming API. This resulted in a corpus of 8.9 million messages (”tweets”) posted by 2.6 million unique users. About 2.7 million of these tweets, or 31%, were replies to a tweet posted by another user, while half a million (6%) were retweets. Almost 2 million (22%) of the messages contained a URL."
- Gremlin – a Turing-complete, graph-based programming language – GitHub – Gremlin is a Turing-complete, graph-based programming language developed in Java 1.6+ for key/value-pair multi-relational graphs known as property graphs. Gremlin makes extensive use of the XPath 1.0 language to support complex graph traversals. This language has applications in the areas of graph query, analysis, and manipulation. Connectors exist for the following data management systems:
* TinkerGraph in-memory graph
* Neo4j graph database
* Sesame 2.0 compliant RDF stores
* MongoDB document database
The documentation for Gremlin can be found at this location. Finally, please visit TinkerPop for other software products.
- The C Programming Language: 4.10 – by Kernighan & Ritchie & Lovecraft – void Rlyeh
(int mene[], int wgah, int nagl) {
int Ia, fhtagn;
if (wgah>=nagl) return;
swap (mene,wgah,(wgah+nagl)/2);
fhtagn = wgah;
for (Ia=wgah+1; Ia<=nagl; Ia++)
if (mene[Ia]<mene[wgah])
swap (mene,++fhtagn,Ia);
swap (mene,wgah,fhtagn);
Rlyeh (mene,wgah,fhtagn-1);
Rlyeh (mene,fhtagn+1,nagl);
} // PH'NGLUI MGLW'NAFH CTHULHU!
- How to convert email addresses into name, age, ethnicity, sexual orientation – This is so Meta – "Save your email list as a CSV file (just comma separate those email addresses). Upload this file to your facebook account as if you wanted to add them as friends. Voila, facebook will give you all the profiles of all those users (in my test, about 80% of my email lists have facebook profiles). Now, click through each profile, and because of the new default facebook settings, which makes all information public, about 95% of the user info is available for you to harvest."
- Microsoft Security Development Lifecycle (SDL): Tools Repository – A collection of previously internal-only security tools from Microsoft, including anti-xss, fuzz test, fxcop, threat modeling, binscope, now available for free download.
- Analytics X Prize – Home – Forecast the murder rate in Philadelphia – The Analytics X Prize is an ongoing contest to apply analytics, modeling, and statistics to solve the social problems that affect our cities. It combines the fields of statistics, mathematics, and social science to understand the root causes of dysfunction in our neighborhoods. Understanding these relationships and discovering the most highly correlated variables allows us to deploy our limited resources more effectively and target the variables that will have the greatest positive impact on improvement.
- PeteSearch: How to find user information from an email address – FindByEmail code released as open-source. You pass it an email address, and it queries 11 different public APIs to discover what information those services have on the user with that email address.
- Measuring Measures: Beyond PageRank: Learning with Content and Networks – Conclusion: learning based on content and network data is the current state of the art There is a great paper and talk about personalization in Google News they use content for this purpose, and then user click streams to provide personalization, i.e. recommend specific articles within each topical cluster. The issue is content filtering is typically (as we say in research) "way harder." Suppose you have a social graph, a bunch of documents, and you know that some users in the social graph like some documents, and you want to recommend other documents that you think they will like. Using approaches based on Networks, you might consider clustering users based on co-visitaion (they have co-liked some of the documents). This scales great, and it internationalizes great. If you start extracting features from the documents themselves, then what you build for English may not work as well for the Chinese market. In addition, there is far more data in the text than there is in the social graph
- mikemaccana’s python-docx at master – GitHub – MIT-licensed Python library to read/write Microsoft Word docx format files. "The docx module reads and writes Microsoft Office Word 2007 docx files. These are referred to as 'WordML', 'Office Open XML' and 'Open XML' by Microsoft. They can be opened in Microsoft Office 2007, Microsoft Mac Office 2008, OpenOffice.org 2.2, and Apple iWork 08. The module was created when I was looking for a Python support for MS Word .doc files, but could only find various hacks involving COM automation, calling .net or Java, or automating OpenOffice or MS Office."
site admin | May 23rd, 2009 | Comments are closed
These are my links for May 22nd through May 23rd:
- Improve MySQL Insert Performance – Summary – use LOAD DATA INFILE
- Scratch | Home | imagine, program, share – Scratch is designed to help young people (ages 8 and up) develop 21st century learning skills. As they create and share Scratch projects, young people learn important mathematical and computational ideas, while also learning to think creatively, reason systematically, and work collaboratively
- Alice.org – Programming language environment for teaching kids, built on Java, geared toward a story telling approach.
- Jason R Briggs | Snake Wrangling for Kids – “Snake Wrangling for Kids” is a printable electronic book, for children 8 years and older, who would like to learn computer programming. It covers the very basics of programming, and uses the Python 3 programming language to teach the concepts.
- Benchmarking BDB, CDB and Tokyo Cabinet on large datasets – CDB comes out significantly faster. (It's for unchanging data though, so not totally surprising) Benchmark data for 11M key-value pair dataset stored in Berkeley DB, CDB, and Tokyo Cabinet.
site admin | May 15th, 2009 | Comments are closed
These are my links for May 14th through May 15th:
- Congratulations, Google staff: $210k in profit per head in 2008 | Royal Pingdom – Google had $209,624 in profit per employee in 2008, which beats all the other large tech companies we looked at, including big hitters like Microsoft ($194K), Apple ($151K), Intel ($64K) and IBM ($30K).
- Statistical Data Mining Tutorials – A nice collection of presentations reviewing topics in data mining and machine learning. e.g. "HillClimbing, Simulated Annealing and Genetic Algorithms. Some very useful algorithms, to be used only in case of emergency." These include classification algorithms such as decision trees, neural nets, Bayesian classifiers, Support Vector Machines and cased-based (aka non-parametric) learning. They include regression algorithms such as multivariate polynomial regression, MARS, Locally Weighted Regression, GMDH and neural nets. And they include other data mining operations such as clustering (mixture models, k-means and hierarchical), Bayesian networks and Reinforcement Learning.
- Dare Obasanjo aka Carnage4Life – Why Twitter’s Engineers Hate the @replies feature – Looking at the infrastructure overhead required for Twitter's attempted change to @reply behavior.
- Scratch Helps Kids Get With the Program – Gadgetwise Blog – NYTimes.com – On my candidate list for 7th grade introductory programming and analysis. "Scratch, an M.I.T.-developed computer-programming language for children, is the focus of worldwide show-and-tell sessions this Saturday. "
- jLinq – Javascript Query Language – For manipulating data sets in Javascript, sort of like jQuery
site admin | May 5th, 2009 | Comments are closed
These are my links for May 4th through May 5th:
- Influential Nodes in a Diffusion Model for Social Networks (icalp05-inf.pdf) – Kempe, Kleinberg, Tardos. Algorithm for greedy approximation of most influential nodes in social network (63% of optimal) under various conditions.
- Maximizing the Spread of Influence through a Social Network (kdd03-inf.pdf) – Kempe, Kleinberg, Tardos. Maximizing propagation by selecting most influential nodes is NP-hard, but a greedy approximation can work well (63% of optimal) under various conditions.
- Notification Strategies for Social Networks – Discussion on approaches to maximizing use of a limited number of notifications within social networks e.g. Facebook
- James Smith • loopj.com » Blog Archive » jQuery Plugin: Tokenizing Autocomplete Text Entry – Looks handy – "This is a jQuery plugin to allow users to select multiple items from a predefined list, using autocompletion as they type to find each item. You may have seen a similar type of text entry when filling in the recipients field sending messages on facebook."
- Google Code FAQ – Using cURL to interact with Google data services – Step by step tutorial on using curl with Google data APIs.
- Behind The Business Plan Of Pirates Inc. : NPR – It takes around $250K to fund a Somali pirate operation. About 20 percent goes to pay off officials who look the other way. About 50 percent is for expenses and payroll. The leader of an attack makes $10,000 to $20,000 (the average Somali family lives on $500 a year). The initial investor — who put in $250,000 of seed capital — gets 30 percent, sometimes up to $500,000.
site admin | April 19th, 2009 | Comments are closed
These are my links for April 18th through April 19th:
- Why Programmers Suck at CSS Design – Stefano’s Linotype – A practical approach to CSS for non-designers (programmers).
- The Art & Science of Seductive Interactions – Presentation slides on improving application user experience by making them more game like (points, levels, scarcity), social interaction, and other ideas.
- Stephen Marsland – Python code from "Machine Learning: An Algorithmic Perspective", assorted clustering and estimation algorithms.
- Firediff – In Case of Stairs – Firediff implements a change monitor that records all of the changes made by firebug and the application itself to CSS and the DOM. This
provides insight into the functionality of the application as well as provide a record of the changes that were required to debug and tweak the page’s display.
- Crowdsourcing the semantic web | lexanderA – "Currently, all attempts at providing semantic metadata require server-side changes which means that we need to rely on page authors to implement them. This, of course, is a major obstacle. But what if we could change that? What if we could bypass page authors and have the crowd add semantic metadata to existing pages?"
- Just How Important is the Valley? Let’s Look at some Data. – Tony Wright dot com – Is the silicon valley entrepreneurship model specific to SV? List of acquisitions in 2007 and 2008.
site admin | April 15th, 2009 | Comments are closed
These are my links for April 13th through April 15th:
site admin | March 1st, 2009 | Comments are closed
These are my links for February 28th through March 1st:
- Community Data – Swivel – User contributed datasets, for visualization and graphs with Swivel
- Obamameter – Map visualization of economic stimulus outlays. "Keep tabs on the the US economy, the global economy and the stimulus through our dashboard for the economy."
- recovery.gov.pdf – Slide presentation on data sources and construction of initial Recover.gov site in Jan 2009, from talk at Transparency Camp.
- Virtual Hoff : DoxPara Research – Slides from Dan Kaminsky's talk at CloudCamp Seattle on network and application security issues in cloud and virtualized computing environments.
- Can You Buy a Silicon Valley? Maybe. – from Paul Graham – "If you could get startups to stick to your town for a million apiece, then for a billion dollars you could bring in a thousand startups. That probably wouldn't push you past Silicon Valley itself, but it might get you second place. For the price of a football stadium, any town that was decent to live in could make itself one of the biggest startup hubs in the world."
- Berkshire Hathaway 2008 shareholders letter (PDF) – Warren Buffet reviews the state of the financial markets, his worst year ever, and the outlook for 2009.
- White House 2: Where YOU set the nation’s priorities – Not the actual White House, but an interesting experiment in collaborative input for setting government agenda.
- Python for Lisp Programmers – Peter Norvig examines Python. "(Although it wasn't my intent, Python programers have told me this page has helped them learn Lisp.) Basically, Python can be seen as a dialect of Lisp with "traditional" syntax (what Lisp people call "infix" or "m-lisp" syntax). One message on comp.lang.python said "I never understood why LISP was a good idea until I started playing with python." Python supports all of Lisp's essential features except macros, and you don't miss macros all that much because it does have eval, and operator overloading, and regular expression parsing, so you can create custom languages that way. "
site admin | February 21st, 2009 | Comments are closed
These are my links for February 21st from 13:59 to 21:55:
- Non Sequitur — Gocomics.com – "Hi. My name is Bob, and I'm a Twitter addict…"
- A Tutorial on Support Vector Machines for Pattern Recognition – Christopher J.C. Burges (PDF) – Appeared in: Data Mining and Knowledge Discovery 2, 121-167, 1998. The tutorial starts with an overview of the concepts of VC dimension and structural risk
minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable
data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss
when SVM solutions are unique and when they are global. We describe how support vector training can
be practically implemented, and discuss in detail the kernel mapping technique which is used to construct
SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large
(even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian
radial basis function kernels. While very high VC dimension would normally bode ill for generalization
performance, there are several arguments which support the observed high accuracy of SVMs,
which we review.
- Data Mining Research – dataminingblog.com: Data Miners on Twitter – A list of data mining people on twitter.
- YouTube – The Crisis of Credit Visualized – Part 1 – Nice animated video attempting to present a simplified explanation of the credit crisis and the relationship between home mortgage lending, bank leverage, and risk.
- “10 Obstacles to Cloud Computing” by UC Berkeley & How GoGrid Hurdles Them | GoGrid Blog – Another commentary on the recent UCB cloud computing overview paper
site admin | February 15th, 2009 | Comments are closed
These are my links for February 14th through February 15th:
|
|