Bookmarks for January 30th through February 4th

These are my links for January 30th through February 4th:

  • Op-Ed Contributor – Microsoft’s Creative Destruction – NYTimes.com – Unlike other companies, Microsoft never developed a true system for innovation. Some of my former colleagues argue that it actually developed a system to thwart innovation. Despite having one of the largest and best corporate laboratories in the world, and the luxury of not one but three chief technology officers, the company routinely manages to frustrate the efforts of its visionary thinkers.
  • Leonardo da Vinci’s Resume Explains Why He’s The Renaissance Man For the Job – Davinci – Gizmodo – At one time in history, even da Vinci himself had to pen a resume to explain why he was a qualified applicant. Here's a translation of his letter to the Duke of Milan, delineating his many talents and abilities. "Most Illustrious Lord, Having now sufficiently considered the specimens of all those who proclaim themselves skilled contrivers of instruments of war, and that the invention and operation of the said instruments are nothing different from those in common use: I shall endeavor, without prejudice to any one else, to explain myself to your Excellency, showing your Lordship my secret, and then offering them to your best pleasure and approbation to work with effect at opportune moments on all those things which, in part, shall be briefly noted below." The document, written when da Vinci was 30, is actually more of a cover letter than a resume; he leaves out many of his artistic achievements and instead focuses on what he can provide for the Duke in technologies of war.
  • jsMath: jsMath Home Page – The jsMath package provides a method of including mathematics in HTML pages that works across multiple browsers under Windows, Macintosh OS X, Linux and other flavors of unix. It overcomes a number of the shortcomings of the traditional method of using images to represent mathematics: jsMath uses native fonts, so they resize when you change the size of the text in your browser, they print at the full resolution of your printer, and you don't have to wait for dozens of images to be downloaded in order to see the mathematics in a web page. There are also advantages for web-page authors, as there is no need to preprocess your web pages to generate any images, and the mathematics is entered in TeX form, so it is easy to create and maintain your web pages. Although it works best with the TeX fonts installed, jsMath will fall back on a collection of image-based fonts (which can still be scaled or printed at high resolution) or unicode fonts when the TeX fonts are not available.
  • Josh on the Web » Blog Archive » Abusing the Cache: Tracking Users without Cookies – To track a user I make use of three URLs: the container, which can be any website; a shim file, which contains a unique code; and a tracking page, which stores (and in this case displays) requests. The trick lies in making the browser cache the shim file indefinitely. When the file is requested for the first – and only – time a unique identifier is embedded in the page. The shim embeds the tracking page, passing it the unique ID every time it is loaded. See the source code.

    One neat thing about this method is that JavaScript is not strictly required. It is only used to pass the message and referrer to the tracker. It would probably be possible to replace the iframes with CSS and images to gain JS-free HTTP referrer logging but would lose the ability to store messages so easily.

  • Panopticlick – Your browser fingerprint appears to be unique among the 342,943 tested so far.

    Currently, we estimate that your browser has a fingerprint that conveys at least 18.39 bits of identifying information.

    The measurements we used to obtain this result are listed below. You can read more about the methodology here, and about some defenses against fingerprinting here
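
    As a rough check on the Panopticlick figure above: a fingerprint that is unique among N equally likely browsers carries log2(N) bits of identifying information. The sketch below assumes that this is how the quoted estimate relates to the sample size; the function name is only illustrative.

```python
import math


def identifying_bits(population):
    """Bits of information in being one specific item out of `population`."""
    return math.log2(population)


# 342,943 is the number of browsers tested, as quoted in the entry above.
bits = identifying_bits(342_943)
print(f"{bits:.2f} bits")  # prints 18.39 bits
```

    Under that assumption, the "at least" in the quoted estimate makes sense: a larger sample in which the fingerprint stayed unique would only push the number up.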

Bookmarks for December 31st through January 17th

These are my links for December 31st through January 17th:

  • Khan Academy – The Khan Academy is a not-for-profit organization with the mission of providing a high quality education to anyone, anywhere.

    We have 1000+ videos on YouTube covering everything from basic arithmetic and algebra to differential equations, physics, chemistry, biology and finance which have been recorded by Salman Khan.

  • StarCraft AI Competition | Expressive Intelligence Studio – AI bot warfare competition using a hacked API to run StarCraft, will be held at AIIDE2010 in October 2010.
    The competition will use StarCraft Brood War 1.16.1. Bots for StarCraft can be developed using the Broodwar API, which provides hooks into StarCraft and enables the development of custom AI for StarCraft. A C++ interface enables developers to query the current state of the game and issue orders to units. An introduction to the Broodwar API is available here. Instructions for building a bot that communicates with a remote process are available here. There is also a Forum. We encourage submission of bots that make use of advanced AI techniques. Some ideas are:
    * Planning
    * Data Mining
    * Machine Learning
    * Case-Based Reasoning
  • Measuring Measures: Learning About Statistical Learning – A "quick start guide" for statistical and machine learning systems, good collection of references.
  • Berkowitz et al : The use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems (2006) – Berkowitz, Steven D., Woodward, Lloyd H., & Woodward, Caitlin. (2006). Use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems. Originally intended for publication in updating the 1988 volume, eds., Wellman and Berkowitz, Social Structures: A Network Approach (Cambridge University Press). Steve died in November, 2003. See Barry Wellman’s “Steve Berkowitz: A Network Pioneer has passed away,” in Connections 25(2), 2003. It has not been possible to add the updating of references or of the quality of graphics that might have been possible if Berkowitz were alive. An early version of the article appeared in the Proceedings of the Session on Combating Terrorist Networks: Current Research in Social Network Analysis for the New War Fighting Environment. 8th International Command and Control Research and Technology Symposium. National Defense University, Washington, D.C June 17-19, 2003
  • SSH Tunneling through web filters | s-anand.net – Step by step tutorial on using Putty and an EC2 instance to set up a private web proxy on demand.
  • PyDroid GUI automation toolkit – GitHub – What is Pydroid?

    Pydroid is a simple toolkit for automating and scripting repetitive tasks, especially those involving a GUI, with Python. It includes functions for controlling the mouse and keyboard, finding colors and bitmaps on-screen, as well as displaying cross-platform alerts.
    Why use Pydroid?

    * Testing a GUI application for bugs and edge cases
    o You might think your app is stable, but what happens if you press that button 5000 times?
    * Automating games
    o Writing a script to beat that crappy flash game can be so much more gratifying than spending hours playing it yourself.
    * Freaking out friends and family
    o Well maybe this isn't really a practical use, but…

  • Time Series Data Library – More data sets – "This is a collection of about 800 time series drawn from many different fields. Agriculture, Chemistry, Crime, Demography, Ecology, Finance, Health, Hydrology, Industry, Labour Market, Macro-Economics, Meteorology, Micro-Economics, Miscellaneous, Physics, Production, Sales, Simulated series, Sport, Transport & Tourism, Tree-rings, Utilities"
  • How informative is Twitter? » SemanticHacker Blog – "We undertook a small study to characterize the different types of messages that can be found on Twitter. We downloaded a sample of tweets over a two-week period using the Twitter streaming API. This resulted in a corpus of 8.9 million messages (“tweets”) posted by 2.6 million unique users. About 2.7 million of these tweets, or 31%, were replies to a tweet posted by another user, while half a million (6%) were retweets. Almost 2 million (22%) of the messages contained a URL."
  • Gremlin – a Turing-complete, graph-based programming language – GitHub – Gremlin is a Turing-complete, graph-based programming language developed in Java 1.6+ for key/value-pair multi-relational graphs known as property graphs. Gremlin makes extensive use of the XPath 1.0 language to support complex graph traversals. This language has applications in the areas of graph query, analysis, and manipulation. Connectors exist for the following data management systems:

    * TinkerGraph in-memory graph
    * Neo4j graph database
    * Sesame 2.0 compliant RDF stores
    * MongoDB document database

    The documentation for Gremlin can be found at this location. Finally, please visit TinkerPop for other software products.

  • The C Programming Language: 4.10 – by Kernighan & Ritchie & Lovecraft –

        void Rlyeh (int mene[], int wgah, int nagl) {
            int Ia, fhtagn;
            if (wgah>=nagl) return;
            swap (mene,wgah,(wgah+nagl)/2);
            fhtagn = wgah;
            for (Ia=wgah+1; Ia<=nagl; Ia++)
                if (mene[Ia]<mene[wgah])
                    swap (mene,++fhtagn,Ia);
            swap (mene,wgah,fhtagn);
            Rlyeh (mene,wgah,fhtagn-1);
            Rlyeh (mene,fhtagn+1,nagl);
        } // PH'NGLUI MGLW'NAFH CTHULHU!

  • How to convert email addresses into name, age, ethnicity, sexual orientation – This is so Meta – "Save your email list as a CSV file (just comma separate those email addresses). Upload this file to your facebook account as if you wanted to add them as friends. Voila, facebook will give you all the profiles of all those users (in my test, about 80% of my email lists have facebook profiles). Now, click through each profile, and because of the new default facebook settings, which makes all information public, about 95% of the user info is available for you to harvest."
  • Microsoft Security Development Lifecycle (SDL): Tools Repository – A collection of previously internal-only security tools from Microsoft, including anti-xss, fuzz test, fxcop, threat modeling, binscope, now available for free download.
  • Analytics X Prize – Home – Forecast the murder rate in Philadelphia – The Analytics X Prize is an ongoing contest to apply analytics, modeling, and statistics to solve the social problems that affect our cities. It combines the fields of statistics, mathematics, and social science to understand the root causes of dysfunction in our neighborhoods. Understanding these relationships and discovering the most highly correlated variables allows us to deploy our limited resources more effectively and target the variables that will have the greatest positive impact on improvement.
  • PeteSearch: How to find user information from an email address – FindByEmail code released as open-source. You pass it an email address, and it queries 11 different public APIs to discover what information those services have on the user with that email address.
  • Measuring Measures: Beyond PageRank: Learning with Content and Networks – Conclusion: learning based on content and network data is the current state of the art. There is a great paper and talk about personalization in Google News; they use content for this purpose, and then user click streams to provide personalization, i.e. recommend specific articles within each topical cluster. The issue is content filtering is typically (as we say in research) "way harder." Suppose you have a social graph, a bunch of documents, and you know that some users in the social graph like some documents, and you want to recommend other documents that you think they will like. Using approaches based on Networks, you might consider clustering users based on co-visitation (they have co-liked some of the documents). This scales great, and it internationalizes great. If you start extracting features from the documents themselves, then what you build for English may not work as well for the Chinese market. In addition, there is far more data in the text than there is in the social graph.
  • mikemaccana’s python-docx at master – GitHub – MIT-licensed Python library to read/write Microsoft Word docx format files. "The docx module reads and writes Microsoft Office Word 2007 docx files. These are referred to as 'WordML', 'Office Open XML' and 'Open XML' by Microsoft. They can be opened in Microsoft Office 2007, Microsoft Mac Office 2008, OpenOffice.org 2.2, and Apple iWork 08. The module was created when I was looking for a Python support for MS Word .doc files, but could only find various hacks involving COM automation, calling .net or Java, or automating OpenOffice or MS Office."
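
The co-visitation idea from the "Beyond PageRank" entry above can be sketched in a few lines. This is only an illustration under stated assumptions: the user names and document IDs are made up, and Jaccard similarity is my own stand-in for "have co-liked some of the documents", not something the linked post specifies.

```python
from itertools import combinations

# Hypothetical data: which documents each user has liked.
likes = {
    "ann": {"d1", "d2", "d3"},
    "bob": {"d2", "d3", "d4"},
    "cat": {"d7", "d8"},
}


def jaccard(a, b):
    """Share of liked documents two users have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def co_visitation(likes):
    """Similarity score for every pair of users, based only on co-likes."""
    return {
        (u, v): jaccard(likes[u], likes[v])
        for u, v in combinations(sorted(likes), 2)
    }


def recommend(user, likes, min_similarity=0.3):
    """Suggest documents liked by similar users that `user` has not liked yet."""
    suggestions = set()
    for other, docs in likes.items():
        if other != user and jaccard(likes[user], docs) >= min_similarity:
            suggestions |= docs - likes[user]
    return sorted(suggestions)


print(co_visitation(likes))
print(recommend("ann", likes))
```

Nothing here looks inside the documents, which is why this style of approach carries over between languages so easily; the trade-off noted in the entry is that it also ignores all the information in the text.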

Bookmarks for May 30th through May 31st

These are my links for May 30th through May 31st:

Bookmarks for May 29th from 05:17 to 12:45

These are my links for May 29th from 05:17 to 12:45:

Bookmarks for May 24th through May 27th

These are my links for May 24th through May 27th:

  • Formulas and game mechanics – WoWWiki – Your guide to the World of Warcraft – Formulas and game mechanics rules and guidelines for developing role playing games
  • Manchester United’s Park Has the Endurance to Persevere – NYTimes.com – Korean soccer player Park Ji-Sung – On Wednesday night in Rome, Park is expected to become the first Asian player to participate in the European Champions League final when Manchester United faces Barcelona.
  • mloss.org – Machine Learning Open Source Software – Big collection of open source packages for machine learning, data mining, statistical analysis
  • The Datacenter as Computer – Luiz André Barroso and Urs Hölzle 2009 (PDF) – 120 pages on large scale computing lessons from Google. "These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC). We describe the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base."
  • Geeking with Greg: The datacenter is the new mainframe – Pointer to a paper by Googlers Luiz Andre Barroso and Urs Holzle on the evolution of warehouse scale computing and the management and use of computing resources in a contemporary datacenter.

Bookmarks for May 12th from 10:52 to 21:56

These are my links for May 12th from 10:52 to 21:56:

Bookmarks for May 6th through May 7th

These are my links for May 6th through May 7th:

Bookmarks for April 12th through April 13th

These are my links for April 12th through April 13th:

Bookmarks for April 11th through April 12th

These are my links for April 11th through April 12th:

  • Wordle – Beautiful Word Clouds – Wordle is a toy for generating “word clouds” from text that you provide. The clouds give greater prominence to words that appear more frequently in the source text. You can tweak your clouds with different fonts, layouts, and color schemes.
  • The dark side of Dubai – Johann Hari, Commentators – The Independent – "Dubai was meant to be a Middle-Eastern Shangri-La, a glittering monument to Arab enterprise and western capitalism. But as hard times arrive in the city state that rose from the desert sands, an uglier story is emerging."
  • Topless Robot – Hot Girls Have Lightsaber Strip-Fight for Your Viewing Pleasure – Star Wars CGI meets fake body spray ad
  • Poll Result: Best VPN to leap China’s Great Firewall? – Thomas Crampton –

    * Witopia – Undisputed winner. Quality of service, speed of surfing, though it is said to be relatively expensive at US$50 to US$60 per year.
    * Hotspot Shield – Bandwidth limits can be painful. Force you to wait until the next month if you use it too much.
    * Ultrasurf
    * StrongVPN

  • InfoQ: Facebook: Science and the Social Graph – In this presentation filmed during QCon SF 2008 (November 2008), Aditya Agarwal discusses Facebook’s architecture, more exactly the software stack used, presenting the advantages and disadvantages of its major components: LAMP (PHP, MySQL), Memcache, Thrift, Scribe.
  • The Running Man, Revisited § SEEDMAGAZINE.COM – a handful of scientists think that these ultra-marathoners are using their bodies just as our hominid forbears once did, a theory known as the endurance running hypothesis (ER). ER proponents believe that being able to run for extended lengths of time is an adapted trait, most likely for obtaining food, and was the catalyst that forced Homo erectus to evolve from its apelike ancestors.

Bookmarks for April 9th from 08:07 to 17:53

These are my links for April 9th from 08:07 to 17:53:

Bookmarks for March 4th through March 6th

These are my links for March 4th through March 6th:

  • Welcome to VIPERdb – Scripps – VIPERdb is a database for icosahedral virus capsid structures. The emphasis of the resource is on providing data from structural and computational analyses on these systems, as well as high quality renderings for visual exploration.
  • Virus images at VIPERdb – If you have ever wanted to make beautiful images of viruses, in colors of your choice, then go to VIPERdb, the virus particle explorer.
  • Reverse HTTP – IETF draft-lentczner-rhttp-00.txt – Formal description of the reverse HTTP proposal for initiating connections through firewalls then reversing server and client roles.
  • Reverse HTTP – Second Life Wiki – Experimental protocol which takes advantage of the HTTP/1.1 Upgrade: header to turn one HTTP socket around. When a client makes a request to a server with the Upgrade: PTTH/0.9 header, the server may respond with an Upgrade: PTTH/1.0 header, after which point the server starts using the socket as a client, and the client starts using the socket as a server.
  • WTFs/m – The only valid measurement of code quality, WTFs/min
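
A minimal sketch of the Reverse HTTP handshake described in the Second Life Wiki entry above. Only the two Upgrade header values come from that description; the POST method, the path, the Connection header, and the 101 status line in the sample response are assumptions borrowed from the general HTTP/1.1 Upgrade mechanism, and both function names are hypothetical.

```python
def build_upgrade_request(host, path="/"):
    """Client side: ask the server to take over this socket as the client."""
    lines = [
        f"POST {path} HTTP/1.1",   # method and path are assumptions
        f"Host: {host}",
        "Upgrade: PTTH/0.9",       # header value quoted in the entry above
        "Connection: Upgrade",     # standard companion to Upgrade in HTTP/1.1
        "Content-Length: 0",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def server_accepted_reversal(raw_response):
    """True if the response headers carry an Upgrade: PTTH/... token."""
    head = raw_response.split(b"\r\n\r\n", 1)[0].decode("ascii", "replace")
    for line in head.split("\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "upgrade" and value.strip().upper().startswith("PTTH/"):
            return True
    return False


# Assumed shape of a positive reply; after it, the roles on the socket swap.
sample_reply = (
    b"HTTP/1.1 101 Switching Protocols\r\n"
    b"Upgrade: PTTH/1.0\r\n"
    b"Connection: Upgrade\r\n"
    b"\r\n"
)

print(build_upgrade_request("example.com").decode("ascii"))
print(server_accepted_reversal(sample_reply))
```

Once the reply is accepted, the original client would start parsing incoming requests on the same socket, which is what lets a machine behind a firewall act as a server.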

Bookmarks for March 3rd from 05:48 to 12:10

These are my links for March 3rd from 05:48 to 12:10:

Bookmarks for February 27th through February 28th

These are my links for February 27th through February 28th:

Bookmarks for February 26th through February 27th

These are my links for February 26th through February 27th:

Bookmarks for February 18th through February 19th

These are my links for February 18th through February 19th:

Can’t see the lunar eclipse from here

I was thinking about taking a look at tonight’s lunar eclipse, but it turns out that it will be difficult to get a good look from here. The eclipse will last from 7:00 to 7:52pm PST, but moonrise isn’t until 7:42pm PST. By the time it clears the hills, trees and houses, I don’t think there will be much to see from here.

No more fisheye? A better security camera lens


A team at Honam University in Korea has developed a low cost wide angle lens that provides the wide field of view associated with fisheye lenses, but with much lower distortion. The image above is from a wide angle camera mounted on the ceiling of a university book store. Notice the relatively straight lines of the book shelves, in contrast to the curving distortion associated with a fisheye lens.

There are already various software solutions for remapping lens distortion from captured images, but this is a much more elegant approach, performing the mapping in analog space before the image is sampled. There is still a blind spot at the center of the image, where the camera blocks the conical mirror.

Optics.org says the lens costs around $100, although I suspect that may be the cost of materials for the development team, and probably doesn’t include the cost of the camera. The lens assembly looks more fragile than a typical security camera, but I could also see this making a nice webcam, especially if they come up with a way to minimize/mask/move the blind spot.

Speaking to optics.org, Prof. Gyeong-il Kweon of Honam University said: “We have successfully designed a wide angle lens that can provide a FoV of over 150 degrees with less than 1% distortion, and are very excited about its potential in the security arena.”

Dubbed as a “catadioptric” wide-angle lens, it is made up of a mirror that reflects the light from a wide area (catoptric), and lenses that focus this light on the sensor of a small camera (dioptric).

The setup consists of cone-shaped mirror fixed inside a hemispherical glass dome. At the top of the dome are a series of lenses leading up to a slot for connecting a small camera. Light entering from the dome strikes the mirror and is directed toward the lens. Here, it is focused to form a sharp image at the exact location of the camera’s sensor.

Looking at some of the sample images one can’t help but notice a small black spot at the centre of every picture. This phenomenon, called central obscuration, is actually a reflection of the camera appearing on the mirror. Kweon and his research partner Milton Laikin are looking for ways to overcome this problem. Currently, they have designed a purely dioptric lens that doesn’t suffer from this problem and has a FoV of 120 °.

Link (requires free registration)

Better Eavesdropping with Microwaves


Although there’s no working system described in any articles I can find about this, the patent application that goes with this is filed on behalf of NASA, so it might not be total vaporware.

From Audio DesignLine:

At last, you think that you have a secure room for conversations. No windows to bounce laser beams off as a means to eavesdrop. The doors are sealed and air tight. But don’t rest too easy. Now there’s a new way of snooping using Gigahertz waves.

Reflected electromagnetic signals can be used to detect audible sound. Electromagnetic radiation reflected by a vibrating object includes an amplitude modulated component that represents the object’s vibrations. The new audio interception method works by illuminating an object with an RF beam that does not include any amplitude modulation. Reflections of the RF beam include amplitude modulation that provide information about vibrations or movements of the object. Audio information can be extracted from the amplitude modulated information and used to reproduce any sound pressure waves striking the object. Interestingly enough, the object can be something as unlikely as a piece of clothing. Thus, something as intensely personal as your heart beat can be intercepted by reflected RF waves in addition to audio sounds.
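
To make the description above concrete, here is a toy simulation of the idea, not the method from the patent application. It assumes a simple envelope detector (rectify, then average over one carrier cycle), and every number in it is made up: the "RF" carrier is scaled down from gigahertz to 10 kHz so that plain Python can sample it.

```python
import math

SAMPLE_RATE = 100_000  # samples per second
CARRIER_HZ = 10_000    # stand-in for the RF beam, scaled far down from GHz
AUDIO_HZ = 100         # vibration of the reflecting object
DEPTH = 0.3            # modulation depth, chosen arbitrarily
N = 5_000              # 0.05 s of signal, i.e. five audio cycles


def audio(n):
    """The sound pressure wave striking the object, at sample n."""
    return math.cos(2 * math.pi * AUDIO_HZ * n / SAMPLE_RATE)


def reflected(n):
    """Unmodulated carrier, amplitude-modulated by the object's vibration."""
    carrier = math.cos(2 * math.pi * CARRIER_HZ * n / SAMPLE_RATE)
    return (1 + DEPTH * audio(n)) * carrier


def envelope(samples, window):
    """Rectify, then smooth with a moving average spanning one carrier cycle."""
    rectified = [abs(s) for s in samples]
    return [
        sum(rectified[i:i + window]) / window
        for i in range(len(rectified) - window + 1)
    ]


def correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


signal = [reflected(n) for n in range(N)]
recovered = envelope(signal, SAMPLE_RATE // CARRIER_HZ)
original = [audio(n) for n in range(len(recovered))]
print(f"correlation with original audio: {correlation(recovered, original):.3f}")
```

The recovered envelope tracks the original tone closely in this idealized setting; a real system would also have to cope with noise, multipath reflections, and very small modulation depths.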

More from New Scientist, discussion at Slashdot, Bruce Schneier (see comments)

Mini windmills for powering very small devices


There are many applications for remote sensors and other small electronic devices in remote locations without access to the electrical grid, and where batteries may be unsuitable. A group from the University of Texas, Arlington has built a miniature windmill that is 10 cm (a little less than 4 inches) in diameter and can provide a power output of 7.5 milliwatts in a breeze of 16 km/hour (10 mph).

The novel aspect of this design is in its use of piezoelectric crystals rather than a conventional generator. Piezo crystals generate a voltage when they are deformed, and are commonly found in cigarette lighters and barbeque ignitions. This piezoelectric windmill brushes a series of cymbal-shaped transducers as it rotates to generate electricity.

A conventional generator that used a 10-centimetre turbine would convert only 1% of the available wind energy directly into electricity. A piezoelectric generator ups that to 18%, which is comparable to the average efficiency of the best large-scale windmills, says Priya.

Details are published in:

  • Energy Harvesting Using a Piezoelectric “Cymbal” Transducer in Dynamic Environment,
    Hyeoung Woo Kim, Amit Batra, Shashank Priya, Kenji Uchino, Douglas Markley,
    Robert E. Newnham and Heath F. Hofmann (PDF)
  • Piezoelectric Windmill: A Novel Solution to Remote Sensing Shashank Priya, Chih-Ta Chen, Darren Fye and Jeff Zahnd (PDF)
  • (via Nature)

Hemo the Magnificent

I found Hemo the Magnificent in the Netflix catalog a while back.

I remember watching these in elementary school, in the days before VHS videos, DVD players, or the internet. The classroom window shades would come down, a projector cart would be rolled into the back of the room, and we all got to watch film strips or slides, or occasionally, movies, which would be shown on an ancient 16mm projector with a built-in amplifier and detachable speaker built into the cover. Aside from the entertainment value of the film, it also meant a change from the usual class schedule, plus nap time for one or two in the back of the room.

I think I saw Hemo the Magnificent, Our Mister Sun, and Powers of Ten a couple of times in 4th and 5th grade or so, and hadn’t seen them since.

I just watched Hemo the Magnificent with my 4th grade daughter and her two friends. It wasn’t a big hit with them (“Too Educational!”), although the animated sections helped. I still like the explanation of the circulatory system, with the little musclemen and the brain on the telephone. The girls did make it all the way through the movie, but bailed out on the second half of the DVD, Unchained Goddess, which is about weather. They were probably wondering about the “magic screen”, the ancient telephone, and the reel-change prompts in the movie, too. I was thinking it needed the movie projector sprocket rattling and the random flutter, distortion, and midrange boominess to recreate the experience.

For me, it was fun and interesting to watch. The movie was made in the late 50s, and has the titling, musical soundtrack, set decor, and character mannerisms that capture the sense of confidence, optimism, and vaguely happy goofiness that I remember. Elements of the visual style are recognizable to today’s kids from watching the Powerpuff Girls, and the friendly scientist-narrators could easily be swapped with The Professor.

In general, they just seem so pleased with things. You can practically see them thinking “We are Scientists! Isn’t that Great! We are Thinking Big Thoughts!”

At this point, these movies may be more fun for adults than kids, but I may give these another try when things are a little quieter around here. The Palo Alto school district is having a 5-day weekend, so the neighborhood kids are running around and I think their attention span may be too short for this sort of movie today.

In the meantime, I’ll be busy Thinking Big Thoughts.