Bookmarks for May 14th through May 15th

These are my links for May 14th through May 15th:

  • Congratulations, Google staff: $210k in profit per head in 2008 | Royal Pingdom – Google had $209,624 in profit per employee in 2008, which beats all the other large tech companies we looked at, including big hitters like Microsoft ($194K), Apple ($151K), Intel ($64K) and IBM ($30K).
  • Statistical Data Mining Tutorials – A nice collection of presentations reviewing topics in data mining and machine learning, e.g. "HillClimbing, Simulated Annealing and Genetic Algorithms. Some very useful algorithms, to be used only in case of emergency." These include classification algorithms such as decision trees, neural nets, Bayesian classifiers, Support Vector Machines and case-based (aka non-parametric) learning. They include regression algorithms such as multivariate polynomial regression, MARS, Locally Weighted Regression, GMDH and neural nets. And they include other data mining operations such as clustering (mixture models, k-means and hierarchical), Bayesian networks and Reinforcement Learning.
  • Dare Obasanjo aka Carnage4Life – Why Twitter’s Engineers Hate the @replies feature – Looking at the infrastructure overhead required for Twitter's attempted change to @reply behavior.
  • Scratch Helps Kids Get With the Program – Gadgetwise Blog – NYTimes.com – On my candidate list for 7th grade introductory programming and analysis. "Scratch, an M.I.T.-developed computer-programming language for children, is the focus of worldwide show-and-tell sessions this Saturday."
  • jLinq – Javascript Query Language – For manipulating data sets in JavaScript, sort of like jQuery.

Watching 4th graders use search engines

Last Friday I spent an hour with my daughter’s 4th grade class, helping them do online research for reports on early California explorers. They were individually assigned an explorer, and were looking for basic biographical information such as dates and places of birth and death, and notable historical achievements or other interesting items to write about. From my perspective, this turned out to be a sort of small focus group on using search engines.

I spend most of my time around people who are pretty good at using search engines and online research tools, so it was interesting to see what the kids would do with this assignment.

The kids are all familiar with computers to varying degrees. They have had classroom activities using the computer at least once a week since kindergarten, and most of them have some experience using computers at home (this is Palo Alto, after all). I don’t think they’ve done any organized “internet research” in school up to this point, though.

They all started with their research subject’s name written on a piece of paper and had about 20 minutes to find some useful information.

Here are some observations:

  • Simply typing in the names of the explorers was challenging for many of them (“Joseph Joaquin Moraga”, “Ivan Alexandrovich Kuskov”, and others I can’t recall).
  • They often tried to type the search phrase into the address bar. I also saw at least one person try to type the search phrase into a form entry field in an advertisement.
  • Their default home page is set to Yahooligans!, which is kid friendly but seems to sharply limit the search results. I had the kids try their queries there first, but most of them returned zero search results.
  • I then let the kids choose which search engine they wanted to use. About a third of the kids volunteered a preference for Google, most of the rest didn’t know or care (I sent about half to Yahoo and half to Google), and one kid really wanted to use A9 (strangely, and I didn’t have a chance to find out why).
  • None of the kids were familiar with using quote marks to specify exact phrase matching. Some of the explorers’ names contain commonly occurring components and return a large number of irrelevant results without quotes.
  • None of the kids were familiar with the advanced search operators for excluding or qualifying search results. I had to help out in a couple of cases where they were having trouble finding relevant pages.
  • Some of them didn’t understand the difference between page content and the ads in the headers, footers, and sidebars.
  • Some of them were already familiar with both Wikipedia and the benefit and problem that anyone can change a page. One person wanted to look exclusively on Wikipedia after the subject came up.
  • The absence of a bookmarking system for the students tends to force them to print out pages they want to use later. This isn’t wonderful at a school lab, since the content is semi-disposable and the lab is usually scrounging to conserve printer consumables like toner and paper. The kids liked having something to take back to the classroom with them, though.
  • The variations in spelling for the mostly Spanish names caused problems for some queries. Google’s “did you mean” suggestions were helpful. At least one query (which I can’t recall) consisted entirely of common Hispanic names, which matched several famous people other than the intended query subject. This is similar to the problem of searching on common Asian names (like mine).
  • Some students quickly clicked themselves into a rathole of completely unrelated pages, usually after clicking on an ad.
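The quoting and exclusion operators mentioned in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name build_query and the excluded term "hockey" are made up for the example, and the exact operator syntax a given search engine accepts may differ.

```python
from urllib.parse import quote_plus


def build_query(phrase, exclude=()):
    """Build a search string: the phrase in double quotes for exact
    matching, followed by each unwanted term prefixed with a minus sign.

    build_query is a hypothetical helper, not part of any search API.
    """
    parts = ['"{}"'.format(phrase)]
    parts.extend("-" + term for term in exclude)
    return " ".join(parts)


# "hockey" is an invented example of a term to exclude.
query = build_query("Ivan Alexandrovich Kuskov", exclude=["hockey"])
print(query)              # the text a student would type into the search box
print(quote_plus(query))  # the same text encoded for use in a URL query string
```

Without the quotes, each part of the name is matched separately, which is why names built from common components returned so many irrelevant results.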

Watching the kids try to find useful pages highlighted the differences from my usual search behavior. I quickly scan the search results page, then refine the query using additional keywords and/or search operators, both of which are hard for 9- and 10-year-olds to do. In “research mode” I usually open results in a new browser tab or window. The kids actually click through the link, which makes it hard to work through a list of candidate results.

Coincidentally, earlier this week I came across a post on Google Blogoscoped that points to a recent dissertation on search user interface design geared towards kids, by Hilary Browne Hutchinson at the University of Maryland. It has some interesting observations and ideas.