Bookmarks for December 31st through January 17th

These are my links for December 31st through January 17th:

  • Khan Academy – The Khan Academy is a not-for-profit organization with the mission of providing a high quality education to anyone, anywhere.

    We have 1000+ videos on YouTube covering everything from basic arithmetic and algebra to differential equations, physics, chemistry, biology and finance which have been recorded by Salman Khan.

  • StarCraft AI Competition | Expressive Intelligence Studio – AI bot warfare competition using a hacked API to run StarCraft, will be held at AIIDE2010 in October 2010.
    The competition will use StarCraft Brood War 1.16.1. Bots for StarCraft can be developed using the Broodwar API, which provides hooks into StarCraft and enables the development of custom AI for StarCraft. A C++ interface enables developers to query the current state of the game and issue orders to units. An introduction to the Broodwar API is available here. Instructions for building a bot that communicates with a remote process are available here. There is also a Forum. We encourage submission of bots that make use of advanced AI techniques. Some ideas are:
    * Planning
    * Data Mining
    * Machine Learning
    * Case-Based Reasoning
  • Measuring Measures: Learning About Statistical Learning – A "quick start guide" for statistical and machine learning systems, good collection of references.
  • Berkowitz et al : The use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems (2006) – Berkowitz, Steven D., Woodward, Lloyd H., & Woodward, Caitlin. (2006). Use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems. Originally intended for publication in updating the 1988 volume, eds., Wellman and Berkowitz, Social Structures: A Network Approach (Cambridge University Press). Steve died in November, 2003. See Barry Wellman’s “Steve Berkowitz: A Network Pioneer has passed away,” in Connections 25(2), 2003. It has not been possible to add the updating of references or of the quality of graphics that might have been possible if Berkowitz were alive. An early version of the article appeared in the Proceedings of the Session on Combating Terrorist Networks: Current Research in Social Network Analysis for the New War Fighting Environment. 8th International Command and Control Research and Technology Symposium. National Defense University, Washington, D.C June 17-19, 2003
  • SSH Tunneling through web filters | s-anand.net – Step by step tutorial on using Putty and an EC2 instance to set up a private web proxy on demand.
  • PyDroid GUI automation toolkit – GitHub – What is Pydroid?

    Pydroid is a simple toolkit for automating and scripting repetitive tasks, especially those involving a GUI, with Python. It includes functions for controlling the mouse and keyboard, finding colors and bitmaps on-screen, as well as displaying cross-platform alerts.
    Why use Pydroid?

    * Testing a GUI application for bugs and edge cases
    o You might think your app is stable, but what happens if you press that button 5000 times?
    * Automating games
    o Writing a script to beat that crappy flash game can be so much more gratifying than spending hours playing it yourself.
    * Freaking out friends and family
    o Well maybe this isn't really a practical use, but…

  • Time Series Data Library – More data sets – "This is a collection of about 800 time series drawn from many different fields.Agriculture Chemistry Crime Demography Ecology Finance Health Hydrology Industry Labour Market Macro-Economics Meteorology Micro-Economics Miscellaneous Physics Production Sales Simulated series Sport Transport & Tourism Tree-rings Utilities"
  • How informative is Twitter? » SemanticHacker Blog – "We undertook a small study to characterize the different types of messages that can be found on Twitter. We downloaded a sample of tweets over a two-week period using the Twitter streaming API. This resulted in a corpus of 8.9 million messages (”tweets”) posted by 2.6 million unique users. About 2.7 million of these tweets, or 31%, were replies to a tweet posted by another user, while half a million (6%) were retweets. Almost 2 million (22%) of the messages contained a URL."
  • Gremlin – a Turing-complete, graph-based programming language – GitHub – Gremlin is a Turing-complete, graph-based programming language developed in Java 1.6+ for key/value-pair multi-relational graphs known as property graphs. Gremlin makes extensive use of the XPath 1.0 language to support complex graph traversals. This language has applications in the areas of graph query, analysis, and manipulation. Connectors exist for the following data management systems:

    * TinkerGraph in-memory graph
    * Neo4j graph database
    * Sesame 2.0 compliant RDF stores
    * MongoDB document database

    The documentation for Gremlin can be found at this location. Finally, please visit TinkerPop for other software products.

  • The C Programming Language: 4.10 – by Kernighan & Ritchie & Lovecraft – void Rlyeh
    (int mene[], int wgah, int nagl) {
    int Ia, fhtagn;
    if (wgah>=nagl) return;
    swap (mene,wgah,(wgah+nagl)/2);
    fhtagn = wgah;
    for (Ia=wgah+1; Ia<=nagl; Ia++)
    if (mene[Ia]<mene[wgah])
    swap (mene,++fhtagn,Ia);
    swap (mene,wgah,fhtagn);
    Rlyeh (mene,wgah,fhtagn-1);
    Rlyeh (mene,fhtagn+1,nagl);

    } // PH'NGLUI MGLW'NAFH CTHULHU!

  • How to convert email addresses into name, age, ethnicity, sexual orientation – This is so Meta – "Save your email list as a CSV file (just comma separate those email addresses). Upload this file to your facebook account as if you wanted to add them as friends. Voila, facebook will give you all the profiles of all those users (in my test, about 80% of my email lists have facebook profiles). Now, click through each profile, and because of the new default facebook settings, which makes all information public, about 95% of the user info is available for you to harvest."
  • Microsoft Security Development Lifecycle (SDL): Tools Repository – A collection of previously internal-only security tools from Microsoft, including anti-xss, fuzz test, fxcop, threat modeling, binscope, now available for free download.
  • Analytics X Prize – Home – Forecast the murder rate in Philadelphia – The Analytics X Prize is an ongoing contest to apply analytics, modeling, and statistics to solve the social problems that affect our cities. It combines the fields of statistics, mathematics, and social science to understand the root causes of dysfunction in our neighborhoods. Understanding these relationships and discovering the most highly correlated variables allows us to deploy our limited resources more effectively and target the variables that will have the greatest positive impact on improvement.
  • PeteSearch: How to find user information from an email address – FindByEmail code released as open-source. You pass it an email address, and it queries 11 different public APIs to discover what information those services have on the user with that email address.
  • Measuring Measures: Beyond PageRank: Learning with Content and Networks – Conclusion: learning based on content and network data is the current state of the art There is a great paper and talk about personalization in Google News they use content for this purpose, and then user click streams to provide personalization, i.e. recommend specific articles within each topical cluster. The issue is content filtering is typically (as we say in research) "way harder." Suppose you have a social graph, a bunch of documents, and you know that some users in the social graph like some documents, and you want to recommend other documents that you think they will like. Using approaches based on Networks, you might consider clustering users based on co-visitaion (they have co-liked some of the documents). This scales great, and it internationalizes great. If you start extracting features from the documents themselves, then what you build for English may not work as well for the Chinese market. In addition, there is far more data in the text than there is in the social graph
  • mikemaccana’s python-docx at master – GitHub – MIT-licensed Python library to read/write Microsoft Word docx format files. "The docx module reads and writes Microsoft Office Word 2007 docx files. These are referred to as 'WordML', 'Office Open XML' and 'Open XML' by Microsoft. They can be opened in Microsoft Office 2007, Microsoft Mac Office 2008, OpenOffice.org 2.2, and Apple iWork 08. The module was created when I was looking for a Python support for MS Word .doc files, but could only find various hacks involving COM automation, calling .net or Java, or automating OpenOffice or MS Office."

Bookmarks for June 11th through June 12th

These are my links for June 11th through June 12th:

Bookmarks for June 9th through June 10th

These are my links for June 9th through June 10th:

Bookmarks for June 3rd through June 4th

These are my links for June 3rd through June 4th:

Bookmarks for June 1st through June 2nd

These are my links for June 1st through June 2nd:

  • jqPlot – Pure Javascript Plotting – jqPlot is a plotting plugin for the jQuery Javascript framework. jqPlot produces beautiful line and bar charts with many features including: Numerous chart style options. Date axes with customizable formatting. Rotated axis text. Automatic trend line computation. Tooltips and data point highlighting. Sensible defaults for ease of use.
  • New Twitter Research: Men Follow Men and Nobody Tweets – Conversation Starter – HarvardBusiness.org – "Although men and women follow a similar number of Twitter users, men have 15% more followers than women. Men also have more reciprocated relationships, in which two users follow each other. This "follower split" suggests that women are driven less by followers than men, or have more stringent thresholds for reciprocating relationships. This is intriguing, especially given that females hold a slight majority on Twitter: we found that men comprise 45% of Twitter users, while women represent 55%."
  • Shirky: Power Laws, Weblogs, and Inequality – 2003 article on popularity / traffic on blogs, which was then the latest emerging social media format. "Once a power law distribution exists, it can take on a certain amount of homeostasis, the tendency of a system to retain its form even against external pressures. Is the weblog world such a system? Are there people who are as talented or deserving as the current stars, but who are not getting anything like the traffic? Doubtless. Will this problem get worse in the future? Yes. "
  • well-formed.eigenfactor.org : Visualizing information flow in science – Some nice visualization ideas using hierarchical clustering to explore patterns in citation networks.
  • Bing API, Version 2.0 – Updated API documentation for Microsoft Bing (formerly Live Search) web services.

Bookmarks for May 29th from 05:17 to 12:45

These are my links for May 29th from 05:17 to 12:45:

Bookmarks for May 24th through May 27th

These are my links for May 24th through May 27th:

  • Formulas and game mechanics – WoWWiki – Your guide to the World of Warcraft – Formulas and game mechanics rules and guidelines for developing role playing games
  • Manchester United’s Park Has the Endurance to Persevere – NYTimes.com – Korean soccer player Park Ji-Sung – On Wednesday night in Rome, Park is expected to become the first Asian player to participate in the European Champions League final when Manchester United faces Barcelona.
  • mloss.org – Machine Learning Open Source Software – Big collection of open source packages for machine learning, data mining, statistical analysis
  • The Datacenter as Computer – Luiz André Barroso and Urs Hölzle 2009 (PDF) – 120 pages on large scale computing lessons from Google. "These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC). We describe the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base."
  • Geeking with Greg: The datacenter is the new mainframe – Pointer to a paper by Googlers Luiz Andre Barroso and Urs Holzle on the evolution of warehouse scale computing and the management and use of computing resources in a contemporary datacenter.

Bookmarks for May 22nd from 06:31 to 07:14

These are my links for May 22nd from 06:31 to 07:14:

  • Javascript Malware Analysis: A Case Study – "This particular beast was found in the wild in May 2009 on a site phishing for Facebook user credentials, and is a particularly-nasty bugger. Note the number of strangely-named variables created up front, many of which are not even referenced in the code blocks that follow. Additionally notice the odd ternary statements which have no impact on the operation of the code, and presumably must exist to trip up scanners (unless there is a fancy form of string replacement on the body of some functions, in which case the functions could be mutated before execution – and that would be scary. A cipher based on the body of the function has also been seen.)"
  • MySQL: Forked beyond repair? | Developer World – InfoWorld – Now that MySQL is part of Oracle, will the forks take over? "if MySQL's approval ratings are slumping, all the more reason for Oracle to move decisively. Oracle must work to regain the trust and support of the MySQL community or risk losing mindshare to a fork, such as Drizzle or MariaDB. To do that, it has to avoid making the mistakes that Sun made when it acquired MySQL. In a sense, to succeed with MySQL, Oracle will have to stop acting like Oracle."
  • Scott Hanselman’s Computer Zen – Less Virtual, More Machine – Windows 7 and the magic of Boot to VHD – Notes on using Windows virtual hard drives to manage instances of multiple version of Windows in parallel, e.g. Windows 7 beta, WinXP, etc.
  • How Opera’s business model works – Communication Breakdown – David Meyer’s Blog at ZDNet.co.uk Community – Around 40M users, "Most of our revenue — 75-80 percent — comes from mobile devices, fom a free browser. We provide the browser for free, like Opera desktop and Mini, and then we generate revenue through our content partners. We provide the search in the right corner and things like that, and that generates revenues in the free distributions. Then you get paid by OEMs [original equipment manufacturers] for distribution — companies like Nokia and Motorola. Most of the mobile OEMs and a fair amount of the other OEMs. We signed up Ford recently and we're now in Ford trucks."
  • Digicorp » Blog Archive » Prevention of Sql Injection with PHP – Notes on good coding hygiene for avoiding SQL injection attacks while processing web form input such as passwords and other text fields.

Bookmarks for May 21st from 06:07 to 22:34

These are my links for May 21st from 06:07 to 22:34:

Bookmarks for May 20th from 19:50 to 22:03

These are my links for May 20th from 19:50 to 22:03:

Bookmarks for May 19th from 08:04 to 19:24

These are my links for May 19th from 08:04 to 19:24:

  • List of Really Useful Free Tools For JavaScript Developers | W3Avenue
  • When Korean Culture Flourished – WSJ.com – In the geography of the Metropolitan Museum of Art, the gallery devoted to Korea acts as a sort of land bridge between China and South Asia that all too often serves as passage rather than destination. The first in a series of shows to be held over the next 10 to 15 years, "Art of the Korean Renaissance, 1400-1600" may change this. With only 47 objects(!), the exhibition explores a fertile 200-year period in Korea's cultural history, revealing as much through its choice of works as it does through the order in which it displays them. The show's modest size makes the point that, sadly, little has survived from this period, when the Joseon — or Fresh Dawn — dynasty (1392-1910) united the Korean peninsula militarily, established Confucianism as the national ideology and introduced a phonetic alphabet.
  • Axiis : Data Visualization Framework – Axiis provides both pre-built visualization components as well as abstract layout patterns and rendering classes that allow you to create your own unique visualizations. Axiis is built upon the Degrafa graphics framework and Adobe Flex 3.
  • Report: Mint Considers Selling Anonymized Data from Its Users – ReadWriteWeb – A lot of people would be interested in that dataset. Tricky to balance data exposure with consumer privacy.
  • Lendingclub.com: A De-anonymization Walkthrough « 33 Bits of Entropy – Step by step look at de-anonymizing a consumer data set. Given alternate sources, you can fill in a lot of gaps.

Bookmarks for May 13th from 06:26 to 22:36

These are my links for May 13th from 06:26 to 22:36:

Bookmarks for May 12th from 10:52 to 21:56

These are my links for May 12th from 10:52 to 21:56:

Bookmarks for May 8th through May 12th

These are my links for May 8th through May 12th:

Bookmarks for April 3rd through April 7th

These are my links for April 3rd through April 7th:

  • Agile Testing: Experiences deploying a large-scale infrastructure in Amazon EC2 – Practical guidance on using cloud computing at EC2. Expect failures, automate deployment, more.
  • joshua’s blog: on url shorteners – Joshua Schachter (founder of del.icio.us) summary on the state of URL shorteners (tinyurl, bit.ly, etc), and issues with 3rd party redirects, link sharing through twitter, etc.
  • Control Yourself » status.net coming soon – On status.net, plans for hosting laconi.ca sites, and federating microblogging status networks
  • There must be some way out of here (Scripting News) – Comments on the rise of celebrity accounts on Twitter, increasing spam/noise, and alternative models for laconi.ca and status.net
  • Stochastic Models of User-Contributory Web Sites – Tad Hogg, Kristina Lerman 31 Mar 2009 Abstract: We describe a general stochastic processes-based approach to modeling user-contributory web sites, where users create, rate and share content. These models describe aggregate measures of activity and how they arise from simple models of individual users. This approach provides a tractable method to understand user activity on the web site and how this activity depends on web site design choices, especially the choice of what information about other users' behaviors is shown to each user. We illustrate this modeling approach in the context of user-created content on the news rating site Digg.

Bookmarks for March 12th through March 16th

These are my links for March 12th through March 16th:

Bookmarks for February 28th through March 1st

These are my links for February 28th through March 1st:

  • Community Data – Swivel – User contributed datasets, for visualization and graphs with Swivel
  • Obamameter – Map visualization of economic stimulus outlays. "Keep tabs on the the US economy, the global economy and the stimulus through our dashboard for the economy."
  • recovery.gov.pdf – Slide presentation on data sources and construction of initial Recover.gov site in Jan 2009, from talk at Transparency Camp.
  • Virtual Hoff : DoxPara Research – Slides from Dan Kaminsky's talk at CloudCamp Seattle on network and application security issues in cloud and virtualized computing environments.
  • Can You Buy a Silicon Valley? Maybe. – from Paul Graham – "If you could get startups to stick to your town for a million apiece, then for a billion dollars you could bring in a thousand startups. That probably wouldn't push you past Silicon Valley itself, but it might get you second place. For the price of a football stadium, any town that was decent to live in could make itself one of the biggest startup hubs in the world."
  • Berkshire Hathaway 2008 shareholders letter (PDF) – Warren Buffet reviews the state of the financial markets, his worst year ever, and the outlook for 2009.
  • White House 2: Where YOU set the nation’s priorities – Not the actual White House, but an interesting experiment in collaborative input for setting government agenda.
  • Python for Lisp Programmers – Peter Norvig examines Python. "(Although it wasn't my intent, Python programers have told me this page has helped them learn Lisp.) Basically, Python can be seen as a dialect of Lisp with "traditional" syntax (what Lisp people call "infix" or "m-lisp" syntax). One message on comp.lang.python said "I never understood why LISP was a good idea until I started playing with python." Python supports all of Lisp's essential features except macros, and you don't miss macros all that much because it does have eval, and operator overloading, and regular expression parsing, so you can create custom languages that way. "

Bookmarks for February 23rd through February 24th

These are my links for February 23rd through February 24th:

Bookmarks for February 16th through February 17th

These are my links for February 16th through February 17th:

  • Top 100 Network Security Tools – Many many security testing and hacking tools.
  • FRONTLINE: inside the meltdown: watch the full program – "On Thursday, Sept. 18, 2008, the astonished leadership of the U.S. Congress was told in a private session by the chairman of the Federal Reserve that the American economy was in grave danger of a complete meltdown within a matter of days. "There was literally a pause in that room where the oxygen left," says Sen. Christopher Dodd"
  • The Dark Matter of a Startup – "Every successful startup that I have seen has someone within their ranks that just kinda “does stuff.” No one really knows specifically what they do, but its vital to the success of the startup."
  • Why I Hate Frameworks – "A hammer?" he asks. "Nobody really buys hammers anymore. They're kind of old fashioned…we started selling schematic diagrams for hammer factories, enabling our clients to build their own hammer factories, custom engineered to manufacture only the kinds of hammers that they would actually need."
  • Mining The Thought Stream – Lots of comments around what is Twitter good for and how will it make money, revolving around real/near-time search, analytics, marketing, etc.
  • Understanding Web Operations Culture – the Graph & Data Obsession … – Comparison of traffic at Flickr, Google, Twitter, last.fm during the Obama inauguration. "One of the most interesting parts of running a large website is watching the effects of unrelated events affecting user traffic in aggregate."

Nvu – open source FrontPage-like tool

This looks interesting, if it works.
http://www.nvu.com/
From the web site:
Nvu (pronounced N-view, for a “new view”) is an Open Source project started by Linspire, Inc. Linspire is committed exclusively to bringing Desktop Linux to the masses, and realized that an easy-to-use web authoring system was needed for Linux to continue its expansion to the Desktop. Linspire contributes significant capital, expertise, servers, bandwidth, marketing, and other resources to guarantee the continuation and success of the Nvu project.