<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Ho John Lee's Weblog</title>
	<atom:link href="http://www.hojohnlee.com/weblog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.hojohnlee.com/weblog</link>
	<description>Living at the intersection of technology, finance, culture, and markets</description>
	<lastBuildDate>Tue, 08 Jun 2010 06:00:43 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Slides from the Social Graph Symposium panel</title>
		<link>http://www.hojohnlee.com/weblog/archives/2010/06/07/slides-from-the-social-graph-symposium-panel/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2010/06/07/slides-from-the-social-graph-symposium-panel/#comments</comments>
		<pubDate>Tue, 08 Jun 2010 06:00:43 +0000</pubDate>
		<dc:creator>Ho John Lee</dc:creator>
				<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[social software]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[The Internet]]></category>
		<category><![CDATA[bing]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[graph analysis]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[presentations]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[sgs]]></category>
		<category><![CDATA[sgs10]]></category>
		<category><![CDATA[social graph]]></category>
		<category><![CDATA[socialnetworks]]></category>
		<category><![CDATA[socialsearch]]></category>
		<category><![CDATA[socialsoftware]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1569</guid>
		<description><![CDATA[<p>Some introductory slides from a panel session at the <a href="http://socialgraphsymp.com/">Social Graph Symposium</a>. </p>
<div id="__ss_4435191" style="width: 425px;"><strong style="display: block; margin: 12px 0 4px;"><a title="Social Graph Symposium Panel - May 2010" href="http://www.slideshare.net/hojohnlee/social-graph-symposium-panel-may-2010">Social Graph Symposium Panel &#8211; May 2010</a></strong>
</div>
<p>Social Graph&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Some introductory slides from a panel session at the <a href="http://socialgraphsymp.com/">Social Graph Symposium</a>. </p>
<div id="__ss_4435191" style="width: 425px;"><strong style="display: block; margin: 12px 0 4px;"><a title="Social Graph Symposium Panel - May 2010" href="http://www.slideshare.net/hojohnlee/social-graph-symposium-panel-may-2010">Social Graph Symposium Panel &#8211; May 2010</a></strong><object id="__sse4435191" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sgs10-may2010-100607235918-phpapp02&amp;stripped_title=social-graph-symposium-panel-may-2010" /><param name="name" value="__sse4435191" /><param name="allowfullscreen" value="true" /><embed id="__sse4435191" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sgs10-may2010-100607235918-phpapp02&amp;stripped_title=social-graph-symposium-panel-may-2010" name="__sse4435191" allowscriptaccess="always" allowfullscreen="true"></embed></object>
</div>
<p>Social Graph Symposium Panel &#8211; May 2010 &#8211; Presentation Transcript</p>
<p>1. Social Graph Symposium Panel<br />
Ho John Lee | Principal Program Manager | Bing Social Search<br />
2. About me:<br />
Ho John Lee<br />
hojohn . lee @ microsoft . com<br />
twitter.com/hjl<br />
Past: Bing Twitter (v1), SocialQuant, trading, investing/consulting (China, India)<br />
HP Labs, MIT, Stanford, Harvard<br />
Current: Bing Social Search &#8211; graph and time series analysis, data mining<br />
Twitter, Facebook, new products, technical planning<br />
3. What can we do by observing social networks?<br />
On the internet, no one knows you’re a dog.<br />
But in social networks, we can tell if you act like a dog, what groups you belong to, and some of your interests<br />
4. How many Twitter users are there?<br />
from a search on twopular, May 2009<br />
5. Graph analysis for relevance and ranking<br />
Spam marketing campaign<br />
(teeth whitening)<br />
Naturally connected community (#smx)<br />
Real time relevance needs data mining to filter and rank based on history<br />
Spammy communities can be highly visible<br />
Social graph, topic/concept graph, and behavior/gesture graphs are all useful tools<br />
6. Information diffusion in the graph<br />
Observed incidence network of retweets in Twitter<br />
Kwak, Lee, et al., "What is Twitter, a Social Network or a News Media?" WWW 2010<br />
Information flow and behaviors form an implicit interaction graph<br />
7. Topic / sentiment range, volume, trend analysis<br />
What is the baseline rate of mentions / sentiment per unit time?<br />
Look for changes in attention flow around a subject, location, topic<br />
Watch for correlated signals from multiple sources<br />
Consider source relevance and authority as well<br />
8. Applying graph analysis<br />
Attention flow vs information flow<br />
Leads to utility functions, cost functions<br />
Variable diffusion rates by actor / network / info type<br />
Predicting interests and affiliations<br />
Content creation follows attention<br />
Self-organized communities of attention<br />
If there’s no content, you can ask for some<br />
Observable propagation of information<br />
9. Clustering and fuzzing properties and identities<br />
* Frequently used terms can identify interests, affinities, latent query intent<br />
* But can potentially be used to identify likely individual users!<br />
* Infochaff – fuzzing out identity, behavior, properties<br />
10. Thank You<br />
Ho John Lee<br />
hojohn . lee @ microsoft . com<br />
twitter.com/hjl</p>
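<p>Slide 7 asks what the baseline rate of mentions per unit time is and suggests watching for changes against it. A minimal sketch of one common way to do that, assuming mentions have already been bucketed into hourly counts: compare each bucket with a trailing window and flag counts that sit several standard deviations above the window's mean. The function name, window size, threshold, and sample counts are illustrative assumptions, not the method or data behind the slides.</p>

```python
from statistics import mean, pstdev


def flag_spikes(counts, window=6, threshold=3.0):
    """Return indices of buckets whose count jumps above the trailing baseline.

    counts: mention counts per time bucket (e.g. per hour), oldest first.
    window: number of preceding buckets used as the baseline (illustrative).
    threshold: standard deviations above the baseline that count as a spike.
    """
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        baseline = mean(history)
        spread = pstdev(history) or 1.0  # avoid dividing by zero on a flat history
        z_score = (counts[i] - baseline) / spread
        if z_score >= threshold:
            spikes.append(i)
    return spikes


# Hypothetical hourly mention counts with one burst of attention.
hourly_mentions = [10, 12, 11, 9, 10, 11, 12, 48, 13, 11]
print(flag_spikes(hourly_mentions))  # prints [7]
```

<p>A fixed z-score threshold is easy to reason about, but it ignores daily and weekly cycles; a real system would probably need a baseline that models those before the slide's other suggestions (correlating sources, weighting by authority) come into play.</p>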
<blockquote><p><a href="http://socialgraphsymp.com/panels/">RESEARCH: Insights from the latest social graph studies</a><br />
Moderator: Eric Siegel – President at Prediction Impact and Conference Chair at Predictive Analytics World<br />
Speakers:<br />
Sharad Goel – Research Scientist at Yahoo<br />
Ho John Lee – Principal Program Manager at Microsoft<br />
DJ Patil – Chief Scientist at LinkedIn<br />
Marc Smith – Chief Social Scientist at Connected Action Consulting Group</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2010/06/07/slides-from-the-social-graph-symposium-panel/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bookmarks for February 4th through February 11th</title>
		<link>http://www.hojohnlee.com/weblog/archives/2010/02/11/bookmarks-for-february-4th-through-february-11th/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2010/02/11/bookmarks-for-february-4th-through-february-11th/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 17:00:14 +0000</pubDate>
		<dc:creator>Ho John Lee</dc:creator>
				<category><![CDATA[Links]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[application]]></category>
		<category><![CDATA[Business]]></category>
		<category><![CDATA[career]]></category>
		<category><![CDATA[comics]]></category>
		<category><![CDATA[crime]]></category>
		<category><![CDATA[culture]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[datamining]]></category>
		<category><![CDATA[demographics]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[fraud]]></category>
		<category><![CDATA[funny]]></category>
		<category><![CDATA[gametheory]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[Humor]]></category>
		<category><![CDATA[innovation]]></category>
		<category><![CDATA[Management]]></category>
		<category><![CDATA[map]]></category>
		<category><![CDATA[marketing]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[memcache]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[nigeria]]></category>
		<category><![CDATA[nosql]]></category>
		<category><![CDATA[organization]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[privacy]]></category>
		<category><![CDATA[psychology]]></category>
		<category><![CDATA[random]]></category>
		<category><![CDATA[redis]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[risk]]></category>
		<category><![CDATA[sales]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[socialnetworks]]></category>
		<category><![CDATA[socialsoftware]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[spam]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[trends]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[xkcd]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1544</guid>
		<description><![CDATA[<p>These are my links for February 4th through February 11th:</p>
<ul>
<li><a href="http://www.schneier.com/blog/archives/2010/02/interview_with_16.html">Schneier on Security: Interview with a Nigerian Internet Scammer</a> &#8211; &#34;We had something called the recovery approach. A few months after the original scam, we would approach the victim</li></ul><p>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>These are my links for February 4th through February 11th:</p>
<ul>
<li><a href="http://www.schneier.com/blog/archives/2010/02/interview_with_16.html">Schneier on Security: Interview with a Nigerian Internet Scammer</a> &#8211; &quot;We had something called the recovery approach. A few months after the original scam, we would approach the victim again, this time pretending to be from the FBI, or the Nigerian Authorities. The email would tell the victim that we had caught a scammer and had found all of the details of the original scam, and that the money could be recovered. Of course there would be fees involved as well. Victims would often pay up again to try and get their money back.&quot;</li>
<li><a href="http://xkcd.com/696/">xkcd &#8211; Frequency of Strip Versions of Various Games</a> &#8211; n = Google hits for &quot;strip &lt;game name&gt;&quot; / Google hits for &quot;&lt;game name&gt;&quot;</li>
<li><a href="http://petewarden.typepad.com/searchbrowser/2010/02/how-to-split-up-the-us.html">PeteSearch: How to split up the US</a> &#8211; Visualization of social network clusters in the US. &quot;information by location, with connections drawn between places that share friends. For example, a lot of people in LA have friends in San Francisco, so there&#39;s a line between them.
<p>Looking at the network of US cities, it&#39;s been remarkable to see how groups of them form clusters, with strong connections locally but few contacts outside the cluster. For example Columbus, OH and Charleston WV are nearby as the crow flies, but share few connections, with Columbus clearly part of the North, and Charleston tied to the South.&quot;</li>
<li><a href="http://www.linux-mag.com/cache/7496/1.html">Redis: Lightweight key/value Store That Goes the Extra Mile | Linux Magazine</a> &#8211; Sort of like memcache. &quot;Calling redis a key/value store doesn&rsquo;t quite do it justice. It&rsquo;s better thought of as a &ldquo;data structures&rdquo; server that supports several native data types and operations on them. That&rsquo;s pretty much how creator Salvatore Sanfilippo (known as antirez) describes it in the documentation. Let&rsquo;s dig in and see how it works.&quot;</li>
<li><a href="http://www.nytimes.com/2010/02/04/opinion/04brass.html?partner=rss&amp;emc=rss">Op-Ed Contributor &#8211; Microsoft&rsquo;s Creative Destruction &#8211; NYTimes.com</a> &#8211; Unlike other companies, Microsoft never developed a true system for innovation. Some of my former colleagues argue that it actually developed a system to thwart innovation. Despite having one of the largest and best corporate laboratories in the world, and the luxury of not one but three chief technology officers, the company routinely manages to frustrate the efforts of its visionary thinkers.</li>
</ul>
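<p>The city clustering in the PeteSearch post can be approximated very roughly: treat cities as nodes, weight each pair by the number of shared friendships, drop weak links, and take the connected components of what remains. The sketch below does this with union-find. The function name, the cutoff, and the counts are made up for illustration, and Pete Warden's actual analysis may well use a different method.</p>

```python
def cluster_cities(links, min_shared=50):
    """Group cities into clusters joined by strong shared-friend links.

    links: dict mapping (city_a, city_b) to a count of shared friends.
    min_shared: illustrative cutoff; links below it are ignored.
    Returns a sorted list of clusters, each a sorted list of city names.
    """
    parent = {}

    def find(city):
        # Register unseen cities, then walk to the root, halving the path as we go.
        parent.setdefault(city, city)
        while parent[city] != city:
            parent[city] = parent[parent[city]]
            city = parent[city]
        return city

    def union(a, b):
        parent[find(a)] = find(b)

    for (a, b), shared in links.items():
        find(a)
        find(b)  # weakly linked cities still appear as their own cluster
        if shared >= min_shared:
            union(a, b)

    clusters = {}
    for city in list(parent):
        clusters.setdefault(find(city), []).append(city)
    return sorted(sorted(group) for group in clusters.values())


# Made-up shared-friend counts, for illustration only.
example_links = {
    ("Columbus, OH", "Cleveland, OH"): 120,
    ("Columbus, OH", "Detroit, MI"): 80,
    ("Charleston, WV", "Knoxville, TN"): 90,
    ("Columbus, OH", "Charleston, WV"): 5,
}
print(cluster_cities(example_links))
# prints [['Charleston, WV', 'Knoxville, TN'], ['Cleveland, OH', 'Columbus, OH', 'Detroit, MI']]
```

<p>A hard cutoff keeps the sketch dependency-free; weighted community detection would likely separate clusters more gracefully when links are only moderately strong.</p>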
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2010/02/11/bookmarks-for-february-4th-through-february-11th/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bookmarks for January 30th through February 4th</title>
		<link>http://www.hojohnlee.com/weblog/archives/2010/02/04/bookmarks-for-january-30th-through-february-4th/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2010/02/04/bookmarks-for-january-30th-through-february-4th/#comments</comments>
		<pubDate>Thu, 04 Feb 2010 17:00:12 +0000</pubDate>
		<dc:creator>Ho John Lee</dc:creator>
				<category><![CDATA[Links]]></category>
		<category><![CDATA[art]]></category>
		<category><![CDATA[browser]]></category>
		<category><![CDATA[Business]]></category>
		<category><![CDATA[career]]></category>
		<category><![CDATA[cookie]]></category>
		<category><![CDATA[culture]]></category>
		<category><![CDATA[datamining]]></category>
		<category><![CDATA[davinci]]></category>
		<category><![CDATA[education]]></category>
		<category><![CDATA[eff]]></category>
		<category><![CDATA[engineering]]></category>
		<category><![CDATA[equations]]></category>
		<category><![CDATA[fonts]]></category>
		<category><![CDATA[history]]></category>
		<category><![CDATA[innovation]]></category>
		<category><![CDATA[internet]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[jsmath]]></category>
		<category><![CDATA[latex]]></category>
		<category><![CDATA[library]]></category>
		<category><![CDATA[Management]]></category>
		<category><![CDATA[marketing]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[mathml]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[organization]]></category>
		<category><![CDATA[policy]]></category>
		<category><![CDATA[privacy]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[random]]></category>
		<category><![CDATA[renaissance]]></category>
		<category><![CDATA[resume]]></category>
		<category><![CDATA[risk]]></category>
		<category><![CDATA[science]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[socialsoftware]]></category>
		<category><![CDATA[society]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[tex]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[tracking]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1541</guid>
		<description><![CDATA[<p>These are my links for January 30th through February 4th:</p>
<ul>
<li><a href="http://www.nytimes.com/2010/02/04/opinion/04brass.html?partner=rss&#38;emc=rss">Op-Ed Contributor &#8211; Microsoft&#8217;s Creative Destruction &#8211; NYTimes.com</a> &#8211; Unlike other companies, Microsoft never developed a true system for innovation. Some of my former colleagues argue that it actually</li></ul><p>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>These are my links for January 30th through February 4th:</p>
<ul>
<li><a href="http://www.nytimes.com/2010/02/04/opinion/04brass.html?partner=rss&amp;emc=rss">Op-Ed Contributor &#8211; Microsoft&rsquo;s Creative Destruction &#8211; NYTimes.com</a> &#8211; Unlike other companies, Microsoft never developed a true system for innovation. Some of my former colleagues argue that it actually developed a system to thwart innovation. Despite having one of the largest and best corporate laboratories in the world, and the luxury of not one but three chief technology officers, the company routinely manages to frustrate the efforts of its visionary thinkers.</li>
<li><a href="http://gizmodo.com/5460442/leonardo-da-vincis-resume-explains-why-hes-the-renaissance-man-for-the-job">Leonardo da Vinci&#8217;s Resume Explains Why He&#8217;s The Renaissance Man For the Job &#8211; Davinci &#8211; Gizmodo</a> &#8211; At one time in history, even da Vinci himself had to pen a resume to explain why he was a qualified applicant. Here&#39;s a translation of his letter to the Duke of Milan, delineating his many talents and abilities. &quot;Most Illustrious Lord, Having now sufficiently considered the specimens of all those who proclaim themselves skilled contrivers of instruments of war, and that the invention and operation of the said instruments are nothing different from those in common use: I shall endeavor, without prejudice to any one else, to explain myself to your Excellency, showing your Lordship my secret, and then offering them to your best pleasure and approbation to work with effect at opportune moments on all those things which, in part, shall be briefly noted below.&quot; The document, written when da Vinci was 30, is actually more of a cover letter than a resume; he leaves out many of his artistic achievements and instead focuses on what he can provide for the Duke in technologies of war.</li>
<li><a href="http://www.math.union.edu/~dpvc/jsMath/welcome.html">jsMath: jsMath Home Page</a> &#8211; The jsMath package provides a method of including mathematics in HTML pages that works across multiple browsers under Windows, Macintosh OS X, Linux and other flavors of unix. It overcomes a number of the shortcomings of the traditional method of using images to represent mathematics: jsMath uses native fonts, so they resize when you change the size of the text in your browser, they print at the full resolution of your printer, and you don&#39;t have to wait for dozens of images to be downloaded in order to see the mathematics in a web page. There are also advantages for web-page authors, as there is no need to preprocess your web pages to generate any images, and the mathematics is entered in TeX form, so it is easy to create and maintain your web pages. Although it works best with the TeX fonts installed, jsMath will fall back on a collection of image-based fonts (which can still be scaled or printed at high resolution) or unicode fonts when the TeX fonts are not available.</li>
<li><a href="http://joshduck.com/blog/2010/01/29/abusing-the-cache-tracking-users-without-cookies/">Josh on the Web &raquo; Blog Archive &raquo; Abusing the Cache: Tracking Users without Cookies</a> &#8211; To track a user I make use of three URLs: the container, which can be any website; a shim file, which contains a unique code; and a tracking page, which stores (and in this case displays) requests. The trick lies in making the browser cache the shim file indefinitely. When the file is requested for the first &#8211; and only &#8211; time a unique identifier is embedded in the page. The shim embeds the tracking page, passing it the unique ID every time it is loaded.
<p>One neat thing about this method is that JavaScript is not strictly required. It is only used to pass the message and referrer to the tracker. It would probably be possible to replace the iframes with CSS and images to gain JS-free HTTP referrer logging but would lose the ability to store messages so easily.</li>
<li><a href="http://panopticlick.eff.org/index.php?action=log&amp;js=yes">Panopticlick</a> &#8211; Your browser fingerprint appears to be unique among the 342,943 tested so far.
<p>Currently, we estimate that your browser has a fingerprint that conveys at least 18.39 bits of identifying information.</p>
</li>
</ul>
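<p>The two Panopticlick numbers are connected. If a fingerprint is unique among N tested browsers, its estimated frequency is at most 1/N, so the identifying information it carries (its surprisal) is at least log<sub>2</sub>(N) bits. For N = 342,943 that works out to about 18.39 bits, matching the figure quoted. The short sketch below reproduces the arithmetic with a hypothetical helper; it assumes this simple frequency-based estimate rather than anything specific to EFF's code.</p>

```python
import math


def surprisal_bits(matching, total):
    """Bits of identifying information carried by a fingerprint that
    `matching` of `total` tested browsers share (hypothetical helper)."""
    return math.log2(total / matching)


# A fingerprint seen once among the 342,943 browsers tested so far.
print(f"{surprisal_bits(1, 342_943):.2f}")  # prints 18.39

# The same fingerprint shared by two browsers would carry one bit less.
print(f"{surprisal_bits(2, 342_943):.2f}")  # prints 17.39
```

<p>Each extra bit halves the pool of browsers that could share a fingerprint, which is why a figure near 18 bits is enough to stand out among a few hundred thousand visitors.</p>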
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2010/02/04/bookmarks-for-january-30th-through-february-4th/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bookmarks for January 23rd through January 30th</title>
		<link>http://www.hojohnlee.com/weblog/archives/2010/01/31/bookmarks-for-january-23rd-through-january-30th/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2010/01/31/bookmarks-for-january-23rd-through-january-30th/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 20:00:13 +0000</pubDate>
		<dc:creator>Ho John Lee</dc:creator>
				<category><![CDATA[Links]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[authentication]]></category>
		<category><![CDATA[browser]]></category>
		<category><![CDATA[crypto]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[datamining]]></category>
		<category><![CDATA[datasets]]></category>
		<category><![CDATA[eff]]></category>
		<category><![CDATA[encryption]]></category>
		<category><![CDATA[federal]]></category>
		<category><![CDATA[gov20]]></category>
		<category><![CDATA[government]]></category>
		<category><![CDATA[hashing]]></category>
		<category><![CDATA[hmac]]></category>
		<category><![CDATA[internet]]></category>
		<category><![CDATA[marketing]]></category>
		<category><![CDATA[md5]]></category>
		<category><![CDATA[open]]></category>
		<category><![CDATA[opendata]]></category>
		<category><![CDATA[password]]></category>
		<category><![CDATA[privacy]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[reference]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[sha1]]></category>
		<category><![CDATA[socialsoftware]]></category>
		<category><![CDATA[society]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[tracking]]></category>
		<category><![CDATA[us]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1532</guid>
		<description><![CDATA[<p>These are my links for January 23rd through January 30th:</p>
<ul>
<li><a href="http://gizmodo.com/5460442/leonardo-da-vincis-resume-explains-why-hes-the-renaissance-man-for-the-job">Leonardo da Vinci&#8217;s Resume Explains Why He&#8217;s The Renaissance Man For the Job &#8211; Davinci &#8211; Gizmodo</a> &#8211; At one time in history, even da Vinci himself had to</li></ul><p>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>These are my links for January 23rd through January 30th:</p>
<ul>
<li><a href="http://gizmodo.com/5460442/leonardo-da-vincis-resume-explains-why-hes-the-renaissance-man-for-the-job">Leonardo da Vinci&#8217;s Resume Explains Why He&#8217;s The Renaissance Man For the Job &#8211; Davinci &#8211; Gizmodo</a> &#8211; At one time in history, even da Vinci himself had to pen a resume to explain why he was a qualified applicant. Here&#39;s a translation of his letter to the Duke of Milan, delineating his many talents and abilities. &quot;Most Illustrious Lord, Having now sufficiently considered the specimens of all those who proclaim themselves skilled contrivers of instruments of war, and that the invention and operation of the said instruments are nothing different from those in common use: I shall endeavor, without prejudice to any one else, to explain myself to your Excellency, showing your Lordship my secret, and then offering them to your best pleasure and approbation to work with effect at opportune moments on all those things which, in part, shall be briefly noted below.&quot; The document, written when da Vinci was 30, is actually more of a cover letter than a resume; he leaves out many of his artistic achievements and instead focuses on what he can provide for the Duke in technologies of war.</li>
<li><a href="http://www.math.union.edu/~dpvc/jsMath/welcome.html">jsMath: jsMath Home Page</a> &#8211; The jsMath package provides a method of including mathematics in HTML pages that works across multiple browsers under Windows, Macintosh OS X, Linux and other flavors of unix. It overcomes a number of the shortcomings of the traditional method of using images to represent mathematics: jsMath uses native fonts, so they resize when you change the size of the text in your browser, they print at the full resolution of your printer, and you don&#39;t have to wait for dozens of images to be downloaded in order to see the mathematics in a web page. There are also advantages for web-page authors, as there is no need to preprocess your web pages to generate any images, and the mathematics is entered in TeX form, so it is easy to create and maintain your web pages. Although it works best with the TeX fonts installed, jsMath will fall back on a collection of image-based fonts (which can still be scaled or printed at high resolution) or unicode fonts when the TeX fonts are not available.</li>
<li><a href="http://joshduck.com/blog/2010/01/29/abusing-the-cache-tracking-users-without-cookies/">Josh on the Web &raquo; Blog Archive &raquo; Abusing the Cache: Tracking Users without Cookies</a> &#8211; To track a user I make use of three URLs: the container, which can be any website; a shim file, which contains a unique code; and a tracking page, which stores (and in this case displays) requests. The trick lies in making the browser cache the shim file indefinitely. When the file is requested for the first &#8211; and only &#8211; time a unique identifier is embedded in the page. The shim embeds the tracking page, passing it the unique ID every time it is loaded.
<p>One neat thing about this method is that JavaScript is not strictly required. It is only used to pass the message and referrer to the tracker. It would probably be possible to replace the iframes with CSS and images to gain JS-free HTTP referrer logging but would lose the ability to store messages so easily.</li>
<li><a href="http://panopticlick.eff.org/index.php?action=log&amp;js=yes">Panopticlick</a> &#8211; Your browser fingerprint appears to be unique among the 342,943 tested so far.
<p>Currently, we estimate that your browser has a fingerprint that conveys at least 18.39 bits of identifying information.</p>
</li>
<li><a href="http://benlog.com/articles/2008/06/19/dont-hash-secrets/">Benlog &raquo; Don&rsquo;t Hash Secrets</a> &#8211; If I tell you that SHA1(foo) is X, then it turns out in a lot of cases to be quite easy for you to determine what SHA1(foo || bar) is. You don&rsquo;t need to know what foo is. Because SHA1 is iterative and works block by block, if you know the hash of foo, then you can extend the computation to determine the hash of foo || bar.
<p>That means that if you know SHA1(secret || message), you can compute SHA1(secret || message || ANYTHING), which is a valid signature for message || ANYTHING. So to break this system, you just need to see one signature from SuperAnnoyingPoke, then you can impersonate SuperAnnoyingPoke for lots of other messages.</p>
<p>What you should be using is HMAC: Hash-function Message Authentication Code. You don&rsquo;t need to know exactly how it works; you just need to know that HMAC is specifically built for message authentication codes and the use case of SuperAnnoyingPoke/MyFace. Under the hood, what&rsquo;s approximately going on is two hashes, with the secret combined after the first hash.</li>
<li><a href="http://www.data.gov/ogd">Data.gov &#8211; Featured Datasets: Open Government Directive Agency</a> &#8211; Datasets required under the Open Government Directive, posted through the end of the day on January 22, 2010. They include Freedom of Information Act request logs, Treasury TARP and derivative activity logs, and crime, income, and agriculture datasets.</li>
</ul>
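<p>Ben Adida's advice boils down to signing with HMAC instead of hashing secret || message directly. Below is a minimal sketch using Python's standard <code>hmac</code> and <code>hashlib</code> modules. The secret, the message format, and the helper names are hypothetical, and SHA-256 is swapped in for the SHA-1 discussed in the post, since the point is the HMAC construction rather than the particular hash. Verification uses <code>hmac.compare_digest</code> so that the comparison does not leak timing information about how much of the tag matched.</p>

```python
import hashlib
import hmac


def sign(secret: bytes, message: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for message, keyed with secret."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(secret, message), tag)


secret = b"shared-app-secret"        # hypothetical key known to the app and the platform
message = b"from=alice,action=poke"  # hypothetical message format
tag = sign(secret, message)

print(verify(secret, message, tag))  # True
# Reusing the tag for an extended message fails verification. With a naive
# SHA1(secret || message) scheme, an attacker could instead forge a valid tag
# for such an extension without knowing the secret.
print(verify(secret, message + b",action=delete", tag))  # False
```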
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2010/01/31/bookmarks-for-january-23rd-through-january-30th/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bookmarks for January 20th through January 23rd</title>
		<link>http://www.hojohnlee.com/weblog/archives/2010/01/23/bookmarks-for-january-20th-through-january-23rd/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2010/01/23/bookmarks-for-january-20th-through-january-23rd/#comments</comments>
		<pubDate>Sun, 24 Jan 2010 00:00:09 +0000</pubDate>
		<dc:creator>Ho John Lee</dc:creator>
				<category><![CDATA[Links]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[bot]]></category>
		<category><![CDATA[california]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[electricity]]></category>
		<category><![CDATA[fun]]></category>
		<category><![CDATA[health]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[internet]]></category>
		<category><![CDATA[map]]></category>
		<category><![CDATA[mashup]]></category>
		<category><![CDATA[monitor]]></category>
		<category><![CDATA[network]]></category>
		<category><![CDATA[outage]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[pge]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[power]]></category>
		<category><![CDATA[reporting]]></category>
		<category><![CDATA[social]]></category>
		<category><![CDATA[socialnetworks]]></category>
		<category><![CDATA[socialsoftware]]></category>
		<category><![CDATA[spam]]></category>
		<category><![CDATA[status]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[webservices]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1529</guid>
		<description><![CDATA[<p>These are my links for January 20th through January 23rd:</p>
<ul>
<li><a href="http://www.data.gov/ogd">Data.gov &#8211; Featured Datasets: Open Government Directive Agency</a> &#8211; Datasets required under the Open Government Directive through the end of the day, January 22, 2010. Freedom of Information Act</li></ul><p>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>These are my links for January 20th through January 23rd:</p>
<ul>
<li><a href="http://www.data.gov/ogd">Data.gov &#8211; Featured Datasets: Open Government Directive Agency</a> &#8211; Datasets required under the Open Government Directive, posted through the end of the day on January 22, 2010. They include Freedom of Information Act request logs, Treasury TARP and derivative activity logs, and crime, income, and agriculture datasets.</li>
<li><a href="http://www.techcrunch.com/2010/01/22/twitter-bot-love/">All Your Twitter Bot Needs Is Love</a> &#8211; The bot&rsquo;s name? Jason Thorton. He&rsquo;s been humming along for months now, sending out over 1250 tweets to some 174 followers. His tweets, while not particularly creative, manage to be both believable and timely. And he&rsquo;s powered by a single word: Love.
<p>Thorton is the creation of developer Ryan Merket, who built him as a side project in around three hours. Merket has just posted the code that powers him, and has also divulged how he made Thorton seem somewhat realistic: the bot looks for tweets with the word &ldquo;love&rdquo; in them and tweets them as its own.</li>
<li><a href="http://ryanmerket.com/blog/2010/01/22/building-a-twitter-bot/">Building a Twitter Bot</a> &#8211; &quot;Meet Jason Thorton. To people who know Jason, he is a successful entrepreneur in San Francisco who tweets 4-5 times a day. But Jason has a secret, he&rsquo;s not really a human, he&rsquo;s the product of my simple algorithm in PHP.
<p>Jason tweets A LOT about the word &ldquo;love&rdquo; &#8211; that&rsquo;s because Jason actually steals tweets from the public timeline that contain the word &ldquo;love&rdquo; and posts them as his own</p>
<p>Jason also @replies to people who use the word &ldquo;love&rdquo; in their tweets, and asks them random questions or says something arbitrary</p>
<p>It took me about 3 hours to code Jason, imagine what a real engineer could do with real AI algorithms? Now realize that it&rsquo;s already a reality. Sites like Twitter are full of side projects, company initiatives, spambots and AI robots. When the free flow of information becomes open, the amount of disinformation increases. There&rsquo;s a real need for someone to vet the people we &lsquo;meet&rsquo; on social sites &#8211; will be interesting to see how this market grows in the next year.&quot;</p></li>
<li><a href="http://api-status.com/">Website monitoring status &#8211; Public API Status</a> &#8211; Health monitor for 26 APIs from popular Web services, including Google Search, Google Maps, Bing, Facebook, Twitter, Salesforce, YouTube, Amazon, eBay and others.</li>
<li><a href="http://www.pge.com/myhome/customerservice/energystatus/outagemap/">PG&amp;E Electrical System Outage Map</a> &#8211; This map shows the current outages in our 70,000-square-mile service area. To see more details about an outage, including the cause and estimated time of restoration, click on the color-coded icon associated with that outage.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2010/01/23/bookmarks-for-january-20th-through-january-23rd/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bookmarks for January 17th through January 20th</title>
		<link>http://www.hojohnlee.com/weblog/archives/2010/01/20/bookmarks-for-january-17th-through-january-20th/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2010/01/20/bookmarks-for-january-17th-through-january-20th/#comments</comments>
		<pubDate>Wed, 20 Jan 2010 21:00:08 +0000</pubDate>
		<dc:creator>Ho John Lee</dc:creator>
				<category><![CDATA[Links]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[audio]]></category>
		<category><![CDATA[availability]]></category>
		<category><![CDATA[Blogging]]></category>
		<category><![CDATA[california]]></category>
		<category><![CDATA[clips]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[dataset]]></category>
		<category><![CDATA[ecosystem]]></category>
		<category><![CDATA[effects]]></category>
		<category><![CDATA[electricity]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[fun]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[history]]></category>
		<category><![CDATA[Humor]]></category>
		<category><![CDATA[map]]></category>
		<category><![CDATA[marketing]]></category>
		<category><![CDATA[media]]></category>
		<category><![CDATA[metrics]]></category>
		<category><![CDATA[mp3]]></category>
		<category><![CDATA[network]]></category>
		<category><![CDATA[outage]]></category>
		<category><![CDATA[pge]]></category>
		<category><![CDATA[power]]></category>
		<category><![CDATA[publishing]]></category>
		<category><![CDATA[random]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[social]]></category>
		<category><![CDATA[socialmedia]]></category>
		<category><![CDATA[socialsoftware]]></category>
		<category><![CDATA[sound]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[traffic]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[usability]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1522</guid>
		<description><![CDATA[<p>These are my links for January 17th through January 20th:</p>
<ul>
<li><a href="http://www.pge.com/myhome/customerservice/energystatus/outagemap/">PG&#38;E Electrical System Outage Map</a> &#8211; This map shows the current outages in our 70,000-square-mile service area. To see more details about an outage, including the cause and estimated</li></ul><p>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>These are my links for January 17th through January 20th:</p>
<ul>
<li><a href="http://www.pge.com/myhome/customerservice/energystatus/outagemap/">PG&amp;E Electrical System Outage Map</a> &#8211; This map shows the current outages in our 70,000-square-mile service area. To see more details about an outage, including the cause and estimated time of restoration, click on the color-coded icon associated with that outage.</li>
<li><a href="http://www.avc.com/a_vc/2010/01/twittercom-vs-the-twitter-ecosystem.html">Twitter.com vs The Twitter Ecosystem</a> &#8211; Fred Wilson comments on data from John Borthwick indicating that the Twitter ecosystem reaches roughly 3-5x as many unique visitors as Twitter.com itself.
<p>&quot;John&#39;s chart estimates that Twitter.com is about 20mm uvs a month in the US (comScore has it at 60mm uvs worldwide) and the Twitter ecosystem at about 60mm uvs in the US.</p>
<p>That says that across all web services, not just AVC, the Twitter ecosystem is about 3x Twitter.com. And on this blog, whose audience is certainly power users, that ratio is 5x.&quot;</li>
<li><a href="http://staffweb.cms.gre.ac.uk/~c.walshaw/partition/">Chris Walshaw :: Research :: Partition Archive</a> &#8211; Welcome to the University of Greenwich Graph Partitioning Archive. The archive consists of the best partitions found to date for a range of graphs and its aim is to provide a benchmark, against which partitioning algorithms can be tested, and a resource for experimentation.
<p>The partition archive has been in operation since the year 2000 and includes results from most of the major graph partitioning software packages. Researchers developing experimental partitioning algorithms regularly submit new partitions for possible inclusion. </p>
<p>Most of the test graphs arise from typical partitioning applications, although the archive also includes results computed for a graph-colouring test suite [Wal04] contained in a separate annex.</p>
<p>The archive was originally set up as part of a research project into very high quality partitions and authors wishing to refer to the partitioning archive should cite the paper [SWC04].</li>
<li><a href="http://tpgblog.com/2010/01/17/quickux-usability-page-load-time-twitter/">Twitter&rsquo;s Crawl &laquo; The Product Guy</a> &#8211; &quot;A list of incidents that affected the Page Load Time of the Twitter product, distinguishing between total downtime, and partial downtime and information inaccessibility, based upon the public posts on Twitter&rsquo;s blog.
<p>http://status.twitter.com/archive</p>
<p>I did my best to not double count any problems, but it was difficult since many of the problems occur so frequently, and it is often difficult to distinguish, from these status blog posts alone, between a persisting problem being experienced or fixed, from that of a new emergence of a similar or same problem. Furthermore, I also excluded the impact on Page Load Time arising from scheduled maintenance/downtime &ndash; periods of time over which the user expectation would be most aligned with the product&rsquo;s promise of Page Load Time. &quot;</li>
<li><a href="http://www.soundboard.com/index.aspx">Soundboard.com</a> &#8211; Soundboard.com is the web&#39;s largest catalog of free sounds and soundboards &#8211; in over 20 categories, for mobile or PC. 252,858 free sounds on 17,171 soundboards from movies to sports, sound effects, television, celebrities, history and travel. Or build, customize, embed and manage your own.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2010/01/20/bookmarks-for-january-17th-through-january-20th/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bookmarks for December 31st through January 17th</title>
		<link>http://www.hojohnlee.com/weblog/archives/2010/01/18/bookmarks-for-december-31st-through-january-16th/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2010/01/18/bookmarks-for-december-31st-through-january-16th/#comments</comments>
		<pubDate>Mon, 18 Jan 2010 08:00:00 +0000</pubDate>
		<dc:creator>Ho John Lee</dc:creator>
				<category><![CDATA[Links]]></category>
		<category><![CDATA[address]]></category>
		<category><![CDATA[ai]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[automation]]></category>
		<category><![CDATA[banking]]></category>
		<category><![CDATA[collaborative]]></category>
		<category><![CDATA[competition]]></category>
		<category><![CDATA[content]]></category>
		<category><![CDATA[contest]]></category>
		<category><![CDATA[culture]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[datamining]]></category>
		<category><![CDATA[dataset]]></category>
		<category><![CDATA[demographic]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[docx]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[education]]></category>
		<category><![CDATA[email]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[filtering]]></category>
		<category><![CDATA[finance]]></category>
		<category><![CDATA[game]]></category>
		<category><![CDATA[geek]]></category>
		<category><![CDATA[gis]]></category>
		<category><![CDATA[global]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[gremlin]]></category>
		<category><![CDATA[gui]]></category>
		<category><![CDATA[hack]]></category>
		<category><![CDATA[hacks]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[http]]></category>
		<category><![CDATA[Humor]]></category>
		<category><![CDATA[identity]]></category>
		<category><![CDATA[information]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[learning]]></category>
		<category><![CDATA[library]]></category>
		<category><![CDATA[lists]]></category>
		<category><![CDATA[machinelearning]]></category>
		<category><![CDATA[marketing]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[money]]></category>
		<category><![CDATA[msword]]></category>
		<category><![CDATA[neo4j]]></category>
		<category><![CDATA[online]]></category>
		<category><![CDATA[opensource]]></category>
		<category><![CDATA[parody]]></category>
		<category><![CDATA[prediction]]></category>
		<category><![CDATA[privacy]]></category>
		<category><![CDATA[probability]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[proxy]]></category>
		<category><![CDATA[putty]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[random]]></category>
		<category><![CDATA[rdf]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[retrieval]]></category>
		<category><![CDATA[school]]></category>
		<category><![CDATA[science]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[semantic]]></category>
		<category><![CDATA[series]]></category>
		<category><![CDATA[sna]]></category>
		<category><![CDATA[social]]></category>
		<category><![CDATA[socialnetworking]]></category>
		<category><![CDATA[socialnetworks]]></category>
		<category><![CDATA[socialsearch]]></category>
		<category><![CDATA[socialsoftware]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[ssh]]></category>
		<category><![CDATA[starcraft]]></category>
		<category><![CDATA[statistical]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[teaching]]></category>
		<category><![CDATA[terrorism]]></category>
		<category><![CDATA[testing]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[time]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[tunnel]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[word]]></category>
		<category><![CDATA[xml]]></category>
		<category><![CDATA[xpath]]></category>
		<category><![CDATA[xss]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/archives/2010/01/18/bookmarks-for-december-31st-through-january-16th/</guid>
		<description><![CDATA[<p>These are my links for December 31st through January 17th:</p>
<ul>
<li><a href="http://www.khanacademy.org/">Khan Academy</a> &#8211; The Khan Academy is a not-for-profit organization with the mission of providing a high quality education to anyone, anywhere.
<p>We have 1000+ videos on YouTube covering</p></li></ul><p>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>These are my links for December 31st through January 17th:</p>
<ul>
<li><a href="http://www.khanacademy.org/">Khan Academy</a> &#8211; The Khan Academy is a not-for-profit organization with the mission of providing a high quality education to anyone, anywhere.
<p>We have 1000+ videos on YouTube covering everything from basic arithmetic and algebra to differential equations, physics, chemistry, biology and finance which have been recorded by Salman Khan.</li>
<li><a href="http://eis.ucsc.edu/StarCraftAICompetition">StarCraft AI Competition | Expressive Intelligence Studio</a> &#8211; AI bot warfare competition using a hacked API to run StarCraft, to be held at AIIDE 2010 in October 2010.<br />
The competition will use StarCraft: Brood War 1.16.1. Bots for StarCraft can be developed using the Broodwar API, which provides hooks into StarCraft and enables the development of custom AI for StarCraft. A C++ interface enables developers to query the current state of the game and issue orders to units. An introduction to the Broodwar API, instructions for building a bot that communicates with a remote process, and a forum are available on the competition site. We encourage submission of bots that make use of advanced AI techniques. Some ideas are:
<ul>
<li>Planning</li>
<li>Data Mining</li>
<li>Machine Learning</li>
<li>Case-Based Reasoning</li>
</ul></li>
<li><a href="http://measuringmeasures.blogspot.com/2010/01/learning-about-statistical-learning.html">Measuring Measures: Learning About Statistical Learning</a> &#8211; A &quot;quick start guide&quot; for statistical and machine learning systems, good collection of references.</li>
<li><a href="http://escholarship.org/uc/item/8589j79h">Berkowitz et al.: The use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems (2006)</a> &#8211; Berkowitz, Steven D., Woodward, Lloyd H., &amp; Woodward, Caitlin. (2006). Use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems. Originally intended for publication in an update of the 1988 volume Social Structures: A Network Approach, eds. Wellman and Berkowitz (Cambridge University Press). Steve died in November 2003. See Barry Wellman&rsquo;s &ldquo;Steve Berkowitz: A Network Pioneer has passed away,&rdquo; in Connections 25(2), 2003. It has not been possible to update the references or improve the quality of the graphics, as might have been done had Berkowitz been alive. An early version of the article appeared in the Proceedings of the Session on Combating Terrorist Networks: Current Research in Social Network Analysis for the New War Fighting Environment, 8th International Command and Control Research and Technology Symposium, National Defense University, Washington, D.C., June 17-19, 2003.</li>
<li><a href="http://www.s-anand.net/blog/ssh-tunneling-through-web-filters/">SSH Tunneling through web filters | s-anand.net</a> &#8211; Step-by-step tutorial on using PuTTY and an Amazon EC2 instance to set up a private web proxy on demand.</li>
<li><a href="http://github.com/msanders/pydroid">PyDroid GUI automation toolkit &#8211; GitHub</a> &#8211; What is Pydroid?
<p>Pydroid is a simple toolkit for automating and scripting repetitive tasks, especially those involving a GUI, with Python. It includes functions for controlling the mouse and keyboard, finding colors and bitmaps on-screen, as well as displaying cross-platform alerts.</p>
<p>Why use Pydroid?</p>
<ul>
<li>Testing a GUI application for bugs and edge cases
<ul><li>You might think your app is stable, but what happens if you press that button 5000 times?</li></ul></li>
<li>Automating games
<ul><li>Writing a script to beat that crappy flash game can be so much more gratifying than spending hours playing it yourself.</li></ul></li>
<li>Freaking out friends and family
<ul><li>Well maybe this isn&#39;t really a practical use, but&#8230;</li></ul></li>
</ul></li>
<li><a href="http://robjhyndman.com/TSDL/index.htm">Time Series Data Library</a> &#8211; More data sets &#8211; &quot;This is a collection of about 800 time series drawn from many different fields: Agriculture, Chemistry, Crime, Demography, Ecology, Finance, Health, Hydrology, Industry, Labour Market, Macro-Economics, Meteorology, Micro-Economics, Miscellaneous, Physics, Production, Sales, Simulated series, Sport, Transport &amp; Tourism, Tree-rings, Utilities.&quot;</li>
<li><a href="http://blog.textwise.com/?p=222">How informative is Twitter? &raquo; SemanticHacker Blog</a> &#8211; &quot;We undertook a small study to characterize the different types of messages that can be found on Twitter. We downloaded a sample of tweets over a two-week period using the Twitter streaming API. This resulted in a corpus of 8.9 million messages (&ldquo;tweets&rdquo;) posted by 2.6 million unique users. About 2.7 million of these tweets, or 31%, were replies to a tweet posted by another user, while half a million (6%) were retweets. Almost 2 million (22%) of the messages contained a URL.&quot;</li>
<li><a href="http://github.com/tinkerpop/gremlin">Gremlin &#8211; a Turing-complete, graph-based programming language &#8211; GitHub</a> &#8211; Gremlin is a Turing-complete, graph-based programming language developed in Java 1.6+ for key/value-pair multi-relational graphs known as property graphs. Gremlin makes extensive use of the XPath 1.0 language to support complex graph traversals. This language has applications in the areas of graph query, analysis, and manipulation. Connectors exist for the following data management systems:
<ul>
<li>TinkerGraph in-memory graph</li>
<li>Neo4j graph database</li>
<li>Sesame 2.0-compliant RDF stores</li>
<li>MongoDB document database</li>
</ul>
<p>The Gremlin documentation is linked from the project page. Finally, please visit TinkerPop for other software products.</p></li>
<li><a href="http://www.bobhobbs.com/files/kr_lovecraft.html">The C Programming Language: 4.10 &#8211; by Kernighan &amp; Ritchie &amp; Lovecraft</a> &#8211;
<pre><code>void Rlyeh
(int mene[], int wgah, int nagl) {
    int Ia, fhtagn;
    if (wgah&gt;=nagl) return;
    swap (mene,wgah,(wgah+nagl)/2);
    fhtagn = wgah;
    for (Ia=wgah+1; Ia&lt;=nagl; Ia++)
        if (mene[Ia]&lt;mene[wgah])
            swap (mene,++fhtagn,Ia);
    swap (mene,wgah,fhtagn);
    Rlyeh (mene,wgah,fhtagn-1);
    Rlyeh (mene,fhtagn+1,nagl);
} // PH&#39;NGLUI MGLW&#39;NAFH CTHULHU!</code></pre></li>
<li><a href="http://maxklein.posterous.com/how-to-convert-email-addresses-into-name-age">How to convert email addresses into name, age, ethnicity, sexual orientation &#8211; This is so Meta</a> &#8211; &quot;Save your email list as a CSV file (just comma separate those email addresses). Upload this file to your facebook account as if you wanted to add them as friends. Voila, facebook will give you all the profiles of all those users (in my test, about 80% of my email lists have facebook profiles). Now, click through each profile, and because of the new default facebook settings, which makes all information public, about 95% of the user info is available for you to harvest.&quot;</li>
<li><a href="http://msdn.microsoft.com/en-us/security/sdl-tools-download.aspx">Microsoft Security Development Lifecycle (SDL): Tools Repository</a> &#8211; A collection of previously internal-only security tools from Microsoft, including Anti-XSS, fuzz testing, FxCop, threat modeling, and BinScope, now available for free download.</li>
<li><a href="http://analyticsx.com/">Analytics X Prize &#8211; Home</a> &#8211; Forecast the murder rate in Philadelphia &#8211; The Analytics X Prize is an ongoing contest to apply analytics, modeling, and statistics to solve the social problems that affect our cities.  It combines the fields of statistics, mathematics, and social science to understand the root causes of dysfunction in our neighborhoods.  Understanding these relationships and discovering the most highly correlated variables allows us to deploy our limited resources more effectively and target the variables that will have the greatest positive impact on improvement.</li>
<li><a href="http://petewarden.typepad.com/searchbrowser/2010/01/how-to-find-user-information-from-an-email-address.html">PeteSearch: How to find user information from an email address</a> &#8211; FindByEmail code released as open-source. You pass it an email address, and it queries 11 different public APIs to discover what information those services have on the user with that email address.</li>
<li><a href="http://measuringmeasures.blogspot.com/2010/01/beyond-pagerank-learning-with-content.html">Measuring Measures: Beyond PageRank: Learning with Content and Networks</a> &#8211; Conclusion: learning based on content and network data is the current state of the art. There is a great paper and talk about personalization in Google News: they use content for this purpose, and then user click streams to provide personalization, i.e. to recommend specific articles within each topical cluster. The issue is that content filtering is typically (as we say in research) &quot;way harder.&quot; Suppose you have a social graph, a bunch of documents, and you know that some users in the social graph like some documents, and you want to recommend other documents that you think they will like. Using approaches based on networks, you might consider clustering users based on co-visitation (they have co-liked some of the documents). This scales great, and it internationalizes great. If you start extracting features from the documents themselves, then what you build for English may not work as well for the Chinese market. In addition, there is far more data in the text than there is in the social graph.</li>
<li><a href="http://github.com/mikemaccana/python-docx">mikemaccana&#8217;s python-docx at master &#8211; GitHub</a> &#8211; MIT-licensed Python library to read/write Microsoft Word docx format files. &quot;The docx module reads and writes Microsoft Office Word 2007 docx files. These are referred to as &#39;WordML&#39;, &#39;Office Open XML&#39; and &#39;Open XML&#39; by Microsoft. They can be opened in Microsoft Office 2007, Microsoft Mac Office 2008, OpenOffice.org 2.2, and Apple iWork 08. The module was created when I was looking for Python support for MS Word .doc files, but could only find various hacks involving COM automation, calling .net or Java, or automating OpenOffice or MS Office.&quot;</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2010/01/18/bookmarks-for-december-31st-through-january-16th/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bookmarks for June 13th through January 16th</title>
		<link>http://www.hojohnlee.com/weblog/archives/2010/01/16/bookmarks-for-june-13th-through-january-16th/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2010/01/16/bookmarks-for-june-13th-through-january-16th/#comments</comments>
		<pubDate>Sun, 17 Jan 2010 07:14:22 +0000</pubDate>
		<dc:creator>site admin</dc:creator>
				<category><![CDATA[Links]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[regex]]></category>
		<category><![CDATA[sed]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[tools]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1416</guid>
		<description><![CDATA[<p>These are my links for June 13th through January 16th:</p>
<ul>
<li><a href="http://eis.ucsc.edu/StarCraftAICompetition">StarCraft AI Competition &#124; Expressive Intelligence Studio</a> &#8211; AI bot warfare competition using a hacked API to run StarCraft, will be held at AIIDE2010 in October 2010.<br />
The</li></ul><p>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>These are my links for June 13th through January 16th:</p>
<ul>
<li><a href="http://eis.ucsc.edu/StarCraftAICompetition">StarCraft AI Competition | Expressive Intelligence Studio</a> &#8211; AI bot warfare competition using a hacked API to run StarCraft, to be held at AIIDE 2010 in October 2010.<br />
The competition will use StarCraft: Brood War 1.16.1. Bots for StarCraft can be developed using the Broodwar API, which provides hooks into StarCraft and enables the development of custom AI for StarCraft. A C++ interface enables developers to query the current state of the game and issue orders to units. An introduction to the Broodwar API, instructions for building a bot that communicates with a remote process, and a forum are available on the competition site. We encourage submission of bots that make use of advanced AI techniques. Some ideas are:
<ul>
<li>Planning</li>
<li>Data Mining</li>
<li>Machine Learning</li>
<li>Case-Based Reasoning</li>
</ul></li>
<li><a href="http://measuringmeasures.blogspot.com/2010/01/learning-about-statistical-learning.html">Measuring Measures: Learning About Statistical Learning</a> &#8211; A &quot;quick start guide&quot; for statistical and machine learning systems, good collection of references.</li>
<li><a href="http://escholarship.org/uc/item/8589j79h">Berkowitz et al.: The use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems (2006)</a> &#8211; Berkowitz, Steven D., Woodward, Lloyd H., &amp; Woodward, Caitlin. (2006). Use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems. Originally intended for publication in an update of the 1988 volume Social Structures: A Network Approach, eds. Wellman and Berkowitz (Cambridge University Press). Steve died in November 2003. See Barry Wellman&rsquo;s &ldquo;Steve Berkowitz: A Network Pioneer has passed away,&rdquo; in Connections 25(2), 2003. It has not been possible to update the references or improve the quality of the graphics, as might have been done had Berkowitz been alive. An early version of the article appeared in the Proceedings of the Session on Combating Terrorist Networks: Current Research in Social Network Analysis for the New War Fighting Environment, 8th International Command and Control Research and Technology Symposium, National Defense University, Washington, D.C., June 17-19, 2003.</li>
<li><a href="http://www.s-anand.net/blog/ssh-tunneling-through-web-filters/">SSH Tunneling through web filters | s-anand.net</a> &#8211; Step-by-step tutorial on using PuTTY and an Amazon EC2 instance to set up a private web proxy on demand.</li>
<li><a href="http://github.com/msanders/pydroid">PyDroid GUI automation toolkit &#8211; GitHub</a> &#8211; What is Pydroid?
<p>Pydroid is a simple toolkit for automating and scripting repetitive tasks, especially those involving a GUI, with Python. It includes functions for controlling the mouse and keyboard, finding colors and bitmaps on-screen, as well as displaying cross-platform alerts.</p>
<p>Why use Pydroid?</p>
<ul>
<li>Testing a GUI application for bugs and edge cases
<ul><li>You might think your app is stable, but what happens if you press that button 5000 times?</li></ul></li>
<li>Automating games
<ul><li>Writing a script to beat that crappy flash game can be so much more gratifying than spending hours playing it yourself.</li></ul></li>
<li>Freaking out friends and family
<ul><li>Well maybe this isn&#039;t really a practical use, but&#8230;</li></ul></li>
</ul></li>
<li><a href="http://robjhyndman.com/TSDL/index.htm">Time Series Data Library</a> &#8211; More data sets &#8211; &quot;This is a collection of about 800 time series drawn from many different fields: Agriculture, Chemistry, Crime, Demography, Ecology, Finance, Health, Hydrology, Industry, Labour Market, Macro-Economics, Meteorology, Micro-Economics, Miscellaneous, Physics, Production, Sales, Simulated series, Sport, Transport &amp; Tourism, Tree-rings, Utilities.&quot;</li>
<li><a href="http://blog.textwise.com/?p=222">How informative is Twitter? &raquo; SemanticHacker Blog</a> &#8211; &quot;We undertook a small study to characterize the different types of messages that can be found on Twitter. We downloaded a sample of tweets over a two-week period using the Twitter streaming API. This resulted in a corpus of 8.9 million messages (&ldquo;tweets&rdquo;) posted by 2.6 million unique users. About 2.7 million of these tweets, or 31%, were replies to a tweet posted by another user, while half a million (6%) were retweets. Almost 2 million (22%) of the messages contained a URL.&quot;</li>
<li><a href="http://github.com/tinkerpop/gremlin">Gremlin &#8211; a Turing-complete, graph-based programming language &#8211; GitHub</a> &#8211; Gremlin is a Turing-complete, graph-based programming language developed in Java 1.6+ for key/value-pair multi-relational graphs known as property graphs. Gremlin makes extensive use of the XPath 1.0 language to support complex graph traversals. This language has applications in the areas of graph query, analysis, and manipulation. Connectors exist for the following data management systems:
<ul>
<li>TinkerGraph in-memory graph</li>
<li>Neo4j graph database</li>
<li>Sesame 2.0-compliant RDF stores</li>
<li>MongoDB document database</li>
</ul>
<p>The Gremlin documentation is linked from the project page. Finally, please visit TinkerPop for other software products.</p></li>
<li><a href="http://www.bobhobbs.com/files/kr_lovecraft.html">The C Programming Language: 4.10 &#8211; by Kernighan &amp; Ritchie &amp; Lovecraft</a> &#8211;
<pre><code>void Rlyeh
(int mene[], int wgah, int nagl) {
    int Ia, fhtagn;
    if (wgah&gt;=nagl) return;
    swap (mene,wgah,(wgah+nagl)/2);
    fhtagn = wgah;
    for (Ia=wgah+1; Ia&lt;=nagl; Ia++)
        if (mene[Ia]&lt;mene[wgah])
            swap (mene,++fhtagn,Ia);
    swap (mene,wgah,fhtagn);
    Rlyeh (mene,wgah,fhtagn-1);
    Rlyeh (mene,fhtagn+1,nagl);
} // PH&#039;NGLUI MGLW&#039;NAFH CTHULHU!</code></pre></li>
<li><a href="http://maxklein.posterous.com/how-to-convert-email-addresses-into-name-age">How to convert email addresses into name, age, ethnicity, sexual orientation &#8211; This is so Meta</a> &#8211; &quot;Save your email list as a CSV file (just comma separate those email addresses). Upload this file to your facebook account as if you wanted to add them as friends. Voila, facebook will give you all the profiles of all those users (in my test, about 80% of my email lists have facebook profiles). Now, click through each profile, and because of the new default facebook settings, which makes all information public, about 95% of the user info is available for you to harvest.&quot;</li>
<li><a href="http://msdn.microsoft.com/en-us/security/sdl-tools-download.aspx">Microsoft Security Development Lifecycle (SDL): Tools Repository</a> &#8211; A collection of previously internal-only security tools from Microsoft, including the Anti-XSS library, a fuzz-testing tool, FxCop, a threat modeling tool, and BinScope, now available as free downloads.</li>
<li><a href="http://analyticsx.com/">Analytics X Prize &#8211; Home</a> &#8211; Forecast the murder rate in Philadelphia &#8211; The Analytics X Prize is an ongoing contest to apply analytics, modeling, and statistics to solve the social problems that affect our cities.  It combines the fields of statistics, mathematics, and social science to understand the root causes of dysfunction in our neighborhoods.  Understanding these relationships and discovering the most highly correlated variables allows us to deploy our limited resources more effectively and target the variables that will have the greatest positive impact on improvement.</li>
<li><a href="http://petewarden.typepad.com/searchbrowser/2010/01/how-to-find-user-information-from-an-email-address.html">PeteSearch: How to find user information from an email address</a> &#8211; FindByEmail code released as open-source. You pass it an email address, and it queries 11 different public APIs to discover what information those services have on the user with that email address.</li>
<li><a href="http://measuringmeasures.blogspot.com/2010/01/beyond-pagerank-learning-with-content.html">Measuring Measures: Beyond PageRank: Learning with Content and Networks</a> &#8211; Conclusion: learning based on content and network data is the current state of the art. There is a great paper and talk about personalization in Google News; they use content for this purpose, and then user click streams to provide personalization, i.e. recommend specific articles within each topical cluster. The issue is that content filtering is typically (as we say in research) &quot;way harder.&quot; Suppose you have a social graph, a bunch of documents, and you know that some users in the social graph like some documents, and you want to recommend other documents that you think they will like. Using approaches based on Networks, you might consider clustering users based on co-visitation (they have co-liked some of the documents). This scales great, and it internationalizes great. If you start extracting features from the documents themselves, then what you build for English may not work as well for the Chinese market. In addition, there is far more data in the text than there is in the social graph.</li>
<li><a href="http://github.com/mikemaccana/python-docx">mikemaccana&#8217;s python-docx at master &#8211; GitHub</a> &#8211; MIT-licensed Python library to read/write Microsoft Word docx format files. &quot;The docx module reads and writes Microsoft Office Word 2007 docx files. These are referred to as &#039;WordML&#039;, &#039;Office Open XML&#039; and &#039;Open XML&#039; by Microsoft. They can be opened in Microsoft Office 2007, Microsoft Mac Office 2008, OpenOffice.org 2.2, and Apple iWork 08. The module was created when I was looking for a Python support for MS Word .doc files, but could only find various hacks involving COM automation, calling .net or Java, or automating OpenOffice or MS Office.&quot;</li>
<li><a href="http://www-h.eng.cam.ac.uk/help/tpl/unix/sed.html">Handy one-liners for SED</a> &#8211; Sed expressions are powerful, but somewhat obscure and easy to screw up. A handy cheat sheet for common tasks.</li>
</ul>
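<p>For readers who want to trace what the Lovecraft snippet actually does: it is the recursive quicksort from section 4.10 of Kernighan &amp; Ritchie with the identifiers renamed. In the snippet, mene is the array, wgah and nagl are the left and right bounds, and fhtagn marks the end of the &#8220;less than pivot&#8221; region. It also calls a swap helper that the excerpt doesn&#8217;t show. Below is a minimal Python transliteration with conventional names. It is a sketch for illustration, not the book&#8217;s code.</p>

```python
def swap(v, i, j):
    """Exchange v[i] and v[j] in place (the helper the excerpt omits)."""
    v[i], v[j] = v[j], v[i]


def quicksort(v, left, right):
    """Sort v[left..right] in place, transliterated from K&R section 4.10."""
    if left >= right:                   # fewer than two elements: nothing to do
        return
    swap(v, left, (left + right) // 2)  # move the middle element to v[left] as pivot
    last = left                         # v[left+1..last] holds items < pivot
    for i in range(left + 1, right + 1):
        if v[i] < v[left]:
            last += 1                   # grow the "less than pivot" region
            swap(v, last, i)
    swap(v, left, last)                 # put the pivot into its final position
    quicksort(v, left, last - 1)        # sort the items smaller than the pivot
    quicksort(v, last + 1, right)       # sort the remaining items


data = [5, 3, 9, 1, 5, 8, 2]
quicksort(data, 0, len(data) - 1)
print(data)  # [1, 2, 3, 5, 5, 8, 9]
```

<p>Choosing the middle element as the pivot keeps the recursion balanced on input that is already sorted.</p>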
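<p>The network-only approach in the Measuring Measures excerpt groups users by co-visitation, meaning documents they have both liked. It can be sketched without looking at any document text. The minimal Python sketch below assumes likes arrive as a dictionary mapping each user to a set of document IDs. The names recommend and jaccard are hypothetical, and Jaccard overlap is an illustrative similarity choice that the excerpt does not specify.</p>

```python
from collections import Counter


def jaccard(a, b):
    """Share of documents two users have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(likes, user, top_k=2):
    """Rank documents liked by co-visiting users, weighted by similarity.

    likes maps each user to a set of liked document IDs (hypothetical input).
    Only the like graph is consulted; no document text is inspected.
    """
    mine = likes[user]
    scores = Counter()
    for other, theirs in likes.items():
        if other == user:
            continue
        sim = jaccard(mine, theirs)
        if sim == 0.0:
            continue                   # no co-liked documents, so no signal
        for doc in theirs - mine:      # only suggest documents the user hasn't liked
            scores[doc] += sim
    return [doc for doc, _ in scores.most_common(top_k)]


likes = {
    "ann": {"d1", "d2", "d3"},
    "bob": {"d1", "d2", "d4"},
    "cat": {"d2", "d5"},
    "dan": {"d6"},
}
print(recommend(likes, "ann"))  # ['d4', 'd5']
```

<p>Only the like graph is used, so the same code works regardless of the documents&#8217; language. This is the internationalization advantage the excerpt points out.</p>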
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2010/01/16/bookmarks-for-june-13th-through-january-16th/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My slides from the Real Time Search Panel at SES Chicago last week</title>
		<link>http://www.hojohnlee.com/weblog/archives/2009/12/14/slides-from-real-time-search-panel-at-ses-chicago/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2009/12/14/slides-from-real-time-search-panel-at-ses-chicago/#comments</comments>
		<pubDate>Mon, 14 Dec 2009 19:30:48 +0000</pubDate>
		<dc:creator>Ho John Lee</dc:creator>
				<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[social software]]></category>
		<category><![CDATA[bing]]></category>
		<category><![CDATA[graph analysis]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[presentations]]></category>
		<category><![CDATA[realtime]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[ses]]></category>
		<category><![CDATA[seschi]]></category>
		<category><![CDATA[socialnetworks]]></category>
		<category><![CDATA[socialsearch]]></category>
		<category><![CDATA[socialsoftware]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1498</guid>
		<description><![CDATA[<div style="width: 425px; text-align: left;">Although real time search is fairly new, as we end 2009, the ability to index and search fresh results is rapidly becoming a commodity, with Bing, various startups, and now Google all integrating status feeds from</div><p>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<div style="width: 425px; text-align: left;">Although real time search is fairly new, as we end 2009, the ability to index and search fresh results is rapidly becoming a commodity, with Bing, various startups, and now Google all integrating status feeds from social networking services. The next set of challenges in 2010 will be around providing better relevance, information discovery, and topic exploration for social search, using signals from the dynamic behavior of users and their interaction with the social and topic graphs.</div>
<p>
<div style="width: 425px; text-align: left;">I gave a short talk on real time and social search for a panel at SES Chicago last week. I&#8217;ve been heads down for the past few months working on Bing Twitter Search, so now that the first launch is out the door it was a nice chance to talk with people about some of the work we&#8217;re doing.  There was a lot of interest in the sentiment, trend, and social graph analysis slides (9 and 10). I will write about those in a separate post, but wanted to get the presentation up for those who have been asking about it.</div>
<p><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" title="What's Different about Real Time and Social Search - HJL Slides For SES Chicago Dec 09" href="http://www.slideshare.net/hojohnlee/hjl-slides-for-ses-chicago-dec-09">What&#8217;s Different about Real Time and Social Search &#8211; HJL Slides For SES Chicago Dec 09</a><object style="margin:0px" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=hjlslidesforseschicagodec09-091214012842-phpapp01&amp;stripped_title=hjl-slides-for-ses-chicago-dec-09" /><param name="allowfullscreen" value="true" /><embed style="margin:0px" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=hjlslidesforseschicagodec09-091214012842-phpapp01&amp;stripped_title=hjl-slides-for-ses-chicago-dec-09" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<div style="width: 425px; text-align: left;">
<h2>What&#8217;s Different about Real Time and Social Search &#8211; HJL Slides For SES Chicago Dec 09 &#8211; Presentation Transcript</h2>
<ol>
<li>What’s different about real time and social search?<br />
Ho John Lee<br />
Principal Program Manager<br />
Bing Social Search<br />
Search Engine Strategies<br />
Chicago – December 7, 2009</li>
<li>What’s Real Time Search Good For, Anyway?</li>
<li>Twitter is Great for Watching Uninformed Panics Unfold Live<br />
…or finding balloons<br />
<a href="http://xkcd.com/574/">http://xkcd.com/574/</a></li>
<li>Some characteristics of Twitter / Social media<br />
Immediacy, Sentiment, Brevity<br />
Not always accurate<br />
Feelings, reactions, impressions<br />
Context is often essential to determine meaning<br />
Gestural &#8211; @user, #hashtag, RT, favorites, follows<br />
Self-organizing communities of attention and authority<br />
Content follows attention<br />
People talk about what others are talking about<br />
Observations and commentary from everywhere<br />
If there’s no content, you can ask for some<br />
Extreme head and tail coverage<br />
Low relevance “noise” can become “signal” in aggregate</li>
<li>Your product or brand could suddenly be at the center of a huge conversation<br />
Tiger Woods<br />
Balloon Boy<br />
Breaking Story<br />
Persistent Story<br />
Big Story<br />
Bigger Story</li>
<li>Some characteristics of Real time / Social Search
<ul>
<li>Real time and social search is qualitatively different from traditional web search</li>
<li>Differences in ranking, relevance, use model</li>
<li>Social graph, user behavior, location, event correlation and other input signals</li>
<li>Real time search is frequently about discovery, not search per se</li>
<li>“what is everyone talking about”, followed by “what are people saying about ”</li>
<li>Top real time and social search results will usually differ from top web search results</li>
</ul>
</li>
<li>Bing Twitter Search at a glance<br />
Top Tweets<br />
Top Shared Links<br />
Tweets/Sentiment per link<br />
Adult / Spam filter; Tweets/Links ranking &amp; relevance</li>
<li>Bing Fall 2009: Twitter vertical, News, MSN, Maps<br />
MSN Local Edition<br />
Page 2: Tweets or Links<br />
Page 1: Tweets &amp; Links<br />
Twitter Answer on News SERP<br />
MSN Hot Topics</li>
<li>Topic / sentiment range, volume, trend analysis<br />
What is the baseline rate of mentions / sentiment per unit time?<br />
Changes in attention flow around a subject, location, topic<br />
Watch for correlated signals from multiple sources<br />
Consider source relevance and authority as well</li>
<li>Graph analysis for relevance and ranking<br />
Spam marketing campaign<br />
Naturally connected community<br />
Spammy communities are highly visible – don’t be part of one!</li>
<li><a href="http://www.bing.com/maps/explore/#5872/style=auto&amp;lat=37.372002&amp;lon=-122.026001&amp;z=11&amp;pid=5874/5003/0.40326=s&amp;o=&amp;a=0">Bing Twitter Maps Demo</a></li>
<li>To rise above the noise, there is more to do as search gets more social<br />
Plus…</li>
<li>Thank You<br />
Ho John Lee<br />
hojohn . lee @ microsoft.com<br />
<a href="http://twitter.com/hjl">twitter.com/hjl</a></li>
</ol>
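<p>Slide 9 asks for the baseline rate of mentions per unit time and watches for changes in attention around a topic. One simple way to make that concrete is to compare each time bucket against a trailing mean and standard deviation. The sketch below is a minimal Python illustration. It assumes hourly mention counts for a single topic are already available. The window and threshold k are hypothetical parameters, and this is not a description of how Bing Twitter Search computes trends.</p>

```python
from statistics import mean, pstdev


def flag_spikes(counts, window=6, k=3.0):
    """Return indexes of buckets whose count exceeds baseline + k * spread.

    The baseline for bucket i is the mean of the `window` buckets before it,
    so each bucket is judged against recent history rather than all time.
    """
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        baseline = mean(history)
        spread = pstdev(history) or 1.0   # flat history: fall back to a spread of 1
        if counts[i] > baseline + k * spread:
            spikes.append(i)
    return spikes


hourly_mentions = [20, 22, 19, 21, 20, 23, 22, 95, 60, 24, 21]
print(flag_spikes(hourly_mentions))  # [7]
```

<p>In this sample, the jump to 95 mentions is flagged. The following hour (60) is not, because the spike itself inflates the trailing spread. A real system would likely want a more robust baseline.</p>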
<div style="width: 425px; text-align: left;"><a href="http://www.aimclearblog.com/2009/12/07/reatime-search-sifting-relevance-from-noise/">Here&#8217;s a writeup on the real-time session from aimClear</a>, which includes a <a href="http://www.aimclearblog.com/wp-content/uploads/2009/12/realtime-search.png">Batman-camera-angle photo of me</a>.</div>
<blockquote>
<div style="width: 425px; text-align: left;">The session was moderated by <a href="http://www.webmama.com/webmama-about/barbara-coll.htm">Barbara Coll</a>, CEO, WebMama.com Inc., with panelists <a href="http://twitter.com/williamfischer">Bill Fischer</a>, Co-Founder &amp; Director, Workdigital, Ltd., <a href="http://www.novarising.com/">Rob Walk</a>, Managing Partner, NovaRising, <a href="http://www.nathanstoll.com/blog/">Nathan Stoll</a>, Co-Founder, Aardvark, and  <a href="../">Ho John Lee</a>, Principal Program Manager, Social and Real Time Search, Microsoft Bing.</div>
</blockquote>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2009/12/14/slides-from-real-time-search-panel-at-ses-chicago/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A last look at Twitter userbase growth (through June 2009)</title>
		<link>http://www.hojohnlee.com/weblog/archives/2009/07/13/a-last-look-at-twitter-userbase-growth-through-june-2009/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2009/07/13/a-last-look-at-twitter-userbase-growth-through-june-2009/#comments</comments>
		<pubDate>Tue, 14 Jul 2009 06:03:54 +0000</pubDate>
		<dc:creator>Ho John Lee</dc:creator>
				<category><![CDATA[Front Page]]></category>
		<category><![CDATA[socialmedia]]></category>
		<category><![CDATA[socialnetworks]]></category>
		<category><![CDATA[trends]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1484</guid>
		<description><![CDATA[<p>A number of people have been asking about updates to the <a href="http://www.hojohnlee.com/weblog/archives/2009/06/18/twitters-amazing-user-growth/">earlier</a> <a href="http://www.hojohnlee.com/weblog/archives/2009/06/23/twitters-user-growth-per-day/">posts</a> on Twitter&#8217;s user profile population as well as some statistical analysis.  I&#8217;m <a href="http://www.hojohnlee.com/weblog/archives/2009/07/12/when-you-come-to-a-fork-in-the-road/">joining the Microsoft Bing search team</a> so I probably won&#8217;t be sharing&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>A number of people have been asking about updates to the <a href="http://www.hojohnlee.com/weblog/archives/2009/06/18/twitters-amazing-user-growth/">earlier</a> <a href="http://www.hojohnlee.com/weblog/archives/2009/06/23/twitters-user-growth-per-day/">posts</a> on Twitter&#8217;s user profile population as well as some statistical analysis.  I&#8217;m <a href="http://www.hojohnlee.com/weblog/archives/2009/07/12/when-you-come-to-a-fork-in-the-road/">joining the Microsoft Bing search team</a> so I probably won&#8217;t be sharing as much data in the future, but I wanted to get a couple of charts out first.</p>
<p>Here&#8217;s an updated look at Twitter&#8217;s user base growth, through June 2009. This survey has many spam accounts pruned out, so the actual number of user profiles at any point in time is probably higher than the graph plotted here. The main takeaway: up and to the right, heading past 13M. Also note that the majority of Twitter profiles have been created within the past few months. Compare with the <a href="http://www.hojohnlee.com/weblog/archives/2009/06/18/twitters-amazing-user-growth/">graph through May 2009</a>.</p>
<p><a href="http://www.hojohnlee.com/weblog/wp-content/uploads/2009/07/twitter-userbase-june09.png"><img class="aligncenter size-full wp-image-1485" title="twitter-userbase-june09" src="http://www.hojohnlee.com/weblog/wp-content/uploads/2009/07/twitter-userbase-june09.png" alt="twitter-userbase-june09" width="570" height="336" /></a></p>
<p>Here&#8217;s the corresponding estimate of new user accounts per day. That first big spike is the Oprah show featuring Twitter. I&#8217;m not sure exactly which media events go with the more recent spike; it is likely some combination of Ashton Kutcher vs. CNN and other celebrities campaigning to get more followers. As a reminder, the graphs don&#8217;t really drop off at the right edge; that&#8217;s just an artifact of new users not being discovered immediately.</p>
<p><a href="http://www.hojohnlee.com/weblog/wp-content/uploads/2009/07/twitter-userbase-rate-june09.png"><img class="aligncenter size-full wp-image-1486" title="twitter-userbase-rate-june09" src="http://www.hojohnlee.com/weblog/wp-content/uploads/2009/07/twitter-userbase-rate-june09.png" alt="twitter-userbase-rate-june09" width="574" height="335" /></a></p>
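<p>The per-day chart is essentially a histogram of account creation dates, and the cumulative chart above is its running total. The minimal Python sketch below shows that relationship. It assumes a hypothetical list of creation dates gathered from crawled profiles, since the actual SocialQuant pipeline isn&#8217;t described here. The names daily_signups and lag_days are illustrative. The most recent days are flagged as provisional because accounts created then may not have been discovered yet.</p>

```python
from collections import Counter
from datetime import date, timedelta
from itertools import accumulate


def daily_signups(created_dates, crawl_date, lag_days=7):
    """Return (day, new_accounts, running_total, provisional) per day.

    Days within lag_days of the crawl are marked provisional: accounts
    created recently are under-counted until the crawler discovers them,
    which is why the right edge of such charts appears to drop off.
    """
    per_day = Counter(created_dates)
    if not per_day:
        return []
    first = min(per_day)
    span = (crawl_date - first).days + 1
    days = [first + timedelta(days=n) for n in range(span)]
    counts = [per_day.get(d, 0) for d in days]
    totals = list(accumulate(counts))       # cumulative user base per day
    cutoff = crawl_date - timedelta(days=lag_days)
    return [(d, c, t, d > cutoff) for d, c, t in zip(days, counts, totals)]


sample = [date(2009, 6, 1)] * 3 + [date(2009, 6, 2)] * 5 + [date(2009, 6, 4)]
for row in daily_signups(sample, crawl_date=date(2009, 6, 4), lag_days=1):
    print(row)
```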
<p>Unfortunately I probably won&#8217;t be putting together any more stats visualizations here as I transition the SocialQuant work to Microsoft Bing. But I&#8217;m looking forward to helping bring some interesting applications for Twitter and other social media to the Bing platform, and hope you&#8217;ll be able to enjoy some results there in the near future.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2009/07/13/a-last-look-at-twitter-userbase-growth-through-june-2009/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>When you come to a fork in the road&#8230;</title>
		<link>http://www.hojohnlee.com/weblog/archives/2009/07/12/when-you-come-to-a-fork-in-the-road/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2009/07/12/when-you-come-to-a-fork-in-the-road/#comments</comments>
		<pubDate>Mon, 13 Jul 2009 06:01:39 +0000</pubDate>
		<dc:creator>Ho John Lee</dc:creator>
				<category><![CDATA[Blogging]]></category>
		<category><![CDATA[Front Page]]></category>
		<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[hjl]]></category>
		<category><![CDATA[me]]></category>
		<category><![CDATA[meta]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[realtime]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[socialnetworks]]></category>
		<category><![CDATA[socialsearch]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1457</guid>
		<description><![CDATA[<p>As some of you know, I have been exploring a variety of paths forward for SocialQuant, my real time social search and analytics project. My family, friends, and colleagues have given me much support, patience, and advice during this process,&#8230;</p>]]></description>
			<content:encoded><![CDATA[<div id="attachment_1461" class="wp-caption aligncenter" style="width: 250px"><a href="http://www.hojohnlee.com/weblog/wp-content/uploads/2009/07/crossroads-IMG_6123.JPG"><img class="size-medium wp-image-1461" title="crossroads-IMG_6123" src="http://www.hojohnlee.com/weblog/wp-content/uploads/2009/07/crossroads-IMG_6123-240x320.jpg" alt="Crossroads of the World at the Beach Bar, Waikiki" width="240" height="320" /></a><p class="wp-caption-text">Crossroads of the World at the Beach Bar, Waikiki</p></div>
<p>As some of you know, I have been exploring a variety of paths forward for SocialQuant, my real time social search and analytics project. My family, friends, and colleagues have given me much support, patience, and advice during this process, which has reached a crossroads, and as Yogi Berra says, &#8220;When you come to a fork in the road, take it!&#8221;</p>
<p>The rise of Twitter, Facebook, and other social media, combined with web-based applications, smartphones, and cloud computing, has <a href="http://www.techcrunch.com/crunchup/agenda.html">set the stage</a> for new applications and use models based on social discovery, collaboration, and communications, in addition to traditional search. What we&#8217;re all calling &#8220;real time search&#8221; lately isn&#8217;t exactly real time. Nor is it exactly search, in the sense of finding a definitive/authoritative answer. Much of the opportunity revolves around discovering people, discussions, and events that are relevant to you and bringing them to your attention in a timely, actionable fashion. Information streams from social media are transient, unreliable, and noisy. At the same time, the sheer volume of data can help provide the basis for building better filters. As an added bonus, you can ask questions of people in the social graph itself. There are numerous examples of communities of interest forming around current events such as Barack Obama&#8217;s inauguration, the Iran elections, or even Michael Jackson&#8217;s funeral. All of these help surface information content, opinion, and sentiment that were previously inaccessible online. One interesting aspect of real time social media is that it&#8217;s not just algorithmic; it&#8217;s based on human connections and emotions. So a message that &#8220;feels right&#8221; from people you trust can at times be more relevant than one that is &#8220;correct.&#8221;</p>
<p>The challenge then is filtering and ranking the massive flow of information so that the user&#8217;s limited (and non-expanding) time and attention go where they are most valuable. With today&#8217;s information technology, amazing things are possible with limited resources. I personally have more computing and storage resources than the facility we launched HP&#8217;s original photo site with (for millions of dollars), at a fraction of the cost. I routinely push around datasets of millions of rows on local development servers. Unfortunately, that&#8217;s just the ante to get started on the problem. Running ranking, clustering, and semantic analysis to filter the ever-growing stream of social media eventually requires web-scale computing, even with careful problem selection and data pruning. The bar is also <a href="http://www.hojohnlee.com/weblog/archives/2009/06/23/twitters-user-growth-per-day/">going up every day</a> as <a href="http://www.hojohnlee.com/weblog/archives/2009/06/18/twitters-amazing-user-growth/">the social media user base grows</a>, and as <a href="http://searchengineland.com/what-is-real-time-search-definitions-players-22172">well-funded teams make progress on their platforms</a> (including <a href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html">Google</a>). So very shortly, being competitive in real time social search and discovery is going to require access to lots of data. It will also require either <a href="http://www.morganclaypool.com/doi/pdf/10.2200/S00193ED1V01Y200905CAC006">getting a datacenter</a> or working with someone who has one.</p>
<p>In my case, I have recently chosen the latter path, and will be joining the <a href="http://www.bing.com/">Microsoft Bing search</a> team, focusing on real time and social search. Microsoft itself has been showing signs of a renaissance, with <a href="http://www.nytimes.com/2009/07/09/technology/personaltech/09pogue.html?hpw">search relaunching</a>, Windows 7 looking leaner, <a href="http://www.microsoft.com/azure/windowsazure.mspx">Azure becoming non-vaporous</a>, more <a href="http://www.bing.com/developers">web APIs</a> getting published, <a href="http://www.forbes.com/2009/07/11/microsoft-cloud-computing-intelligent-technology-microsoft.html">core online applications starting to turn up</a>, and a <a href="http://www.youtube.com/watch?v=VUawhjxLS2I">cool Office 2010 video</a>. Even <a href="http://minimsft.blogspot.com/2009/07/microsoft-has-turned-corner.html">Mini-Microsoft has turned positive</a> recently. And <a href="http://dashes.com/anil/2009/07/googles-microsoft-moment.html">Google is starting to have &#8220;bigness&#8221; issues</a>.</p>
<p>I look forward to working with Sean Suchter and the Microsoft Bing search team (and likely expanding their carbon footprint) in pursuit of new applications and services as the social media and online application space evolves.</p>
<p>You can <a href="http://www.hojohnlee.com/weblog/archives/2009/02/27/why-im-not-connected-to-you-on-facebook-or-linkedin-but-do-follow-on-twitter-and-friendfeed/">follow along</a> on Twitter (<a href="http://twitter.com/hjl">@hjl</a>). As always, any and all opinions here are solely mine and do not reflect the position of any past, present, or future employer, partner, or business associate.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2009/07/12/when-you-come-to-a-fork-in-the-road/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Follow suggested users, attract instant spamcloud</title>
		<link>http://www.hojohnlee.com/weblog/archives/2009/07/06/follow-suggested-users-attract-instant-spamcloud/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2009/07/06/follow-suggested-users-attract-instant-spamcloud/#comments</comments>
		<pubDate>Mon, 06 Jul 2009 20:01:13 +0000</pubDate>
		<dc:creator>Ho John Lee</dc:creator>
				<category><![CDATA[Blogging]]></category>
		<category><![CDATA[Front Page]]></category>
		<category><![CDATA[socialnetworks]]></category>
		<category><![CDATA[socialsoftware]]></category>
		<category><![CDATA[spam]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1442</guid>
		<description><![CDATA[<p>Despite <a href="http://www.hojohnlee.com/weblog/archives/2009/06/18/twitters-amazing-user-growth/">Twitter&#8217;s amazing growth rate</a>, there is general agreement that the <a href="http://friendfeed.com/scobleizer/42398a61/interesting-analysis-of-twitter-suggested">Suggested Users List and the new user experience has shortcomings</a>. As an experiment, I created a new Twitter account. I wanted to see what the experience might&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>Despite <a href="http://www.hojohnlee.com/weblog/archives/2009/06/18/twitters-amazing-user-growth/">Twitter&#8217;s amazing growth rate</a>, there is general agreement that the <a href="http://friendfeed.com/scobleizer/42398a61/interesting-analysis-of-twitter-suggested">Suggested Users List and the new user experience have shortcomings</a>. As an experiment, I created a new Twitter account. I wanted to see what the experience might look like for someone interested in, but otherwise completely unfamiliar with, the service. During the signup process, Twitter automatically picks some suggested users (apparently at random), about a dozen or so, and I selected all of them. Then it asked for my email credentials to check for other people I know on Twitter. I declined, since I generally don&#8217;t give web applications access to my email services. Then I went back to &#8220;Suggested Users&#8221; under the &#8220;Find People&#8221; section, and selected all of them. In total, the Suggested Users list got me up to 237 friends in my incoming stream.</p>
<p>Within a few minutes of completing this process, I already had 13 spam followers offering affiliate links for cameras, porn, and twitter followers. A day later I was up to 41 spam followers, plus 4 follow-backs from accounts I followed in addition to the Suggested Users List.</p>
<p><a href="http://www.hojohnlee.com/weblog/wp-content/uploads/2009/07/twitter-newuser-spam-090705.png"><img class="aligncenter size-full wp-image-1443" title="twitter-newuser-spam-090705" src="http://www.hojohnlee.com/weblog/wp-content/uploads/2009/07/twitter-newuser-spam-090705.png" alt="twitter-newuser-spam-090705" width="551" height="424" /></a></p>
<p>There are two different issues here: 1) finding a set of interesting / relevant people for new users to follow, and 2) limiting the impact of spam and affiliate marketers, who appear to be scanning the follower lists of the Suggested Users to identify new accounts to spam.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2009/07/06/follow-suggested-users-attract-instant-spamcloud/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Twitter&#8217;s user growth per day</title>
		<link>http://www.hojohnlee.com/weblog/archives/2009/06/23/twitters-user-growth-per-day/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2009/06/23/twitters-user-growth-per-day/#comments</comments>
		<pubDate>Tue, 23 Jun 2009 20:35:20 +0000</pubDate>
		<dc:creator>Ho John Lee</dc:creator>
				<category><![CDATA[Front Page]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[socialnetworks]]></category>
		<category><![CDATA[socialsoftware]]></category>
		<category><![CDATA[trends]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1440</guid>
		<description><![CDATA[<p>Here is a companion to the <a href="http://www.hojohnlee.com/weblog/archives/2009/06/18/twitters-amazing-user-growth/">Twitter user population growth chart</a> from last week. This  chart shows an estimate of the number of new users per day. The dashed blue bar is the 2009 US inauguration of Barack Obama,&#8230;</p>]]></description>
			<content:encoded><![CDATA[<div id="attachment_1434" class="wp-caption aligncenter" style="width: 637px"><a href="http://www.hojohnlee.com/weblog/wp-content/uploads/2009/06/twitter-userbase-growthrate-may09-annotated.png"><img class="size-full wp-image-1434" title="twitter-userbase-growthrate-may09-annotated" src="http://www.hojohnlee.com/weblog/wp-content/uploads/2009/06/twitter-userbase-growthrate-may09-annotated.png" alt="Twitter estimated new users per day through May 2009" width="627" height="380" /></a><p class="wp-caption-text">Twitter estimated new users per day through May 2009</p></div>
<p>Here is a companion to the <a href="http://www.hojohnlee.com/weblog/archives/2009/06/18/twitters-amazing-user-growth/">Twitter user population growth chart</a> from last week. This  chart shows an estimate of the number of new users per day. The dashed blue bar is the 2009 US inauguration of Barack Obama, and the extreme spike is the Oprah Winfrey show featuring Twitter.</p>
<p>The data used for this chart is less complete for the last week or so at the right-hand edge, so the apparent drop there is an artifact: the rate of new user signups hasn&#8217;t gone to zero. In fact it remains quite high. It is not 100K users per day, but it is well above the &#8220;pre-mainstream adoption&#8221; signup rates, in the range of 30&#8211;50K users/day. As of mid-June, more than 8M Twitter user accounts have been created.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2009/06/23/twitters-user-growth-per-day/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Twitter&#8217;s amazing user growth</title>
		<link>http://www.hojohnlee.com/weblog/archives/2009/06/18/twitters-amazing-user-growth/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2009/06/18/twitters-amazing-user-growth/#comments</comments>
		<pubDate>Thu, 18 Jun 2009 20:01:02 +0000</pubDate>
		<dc:creator>Ho John Lee</dc:creator>
				<category><![CDATA[Front Page]]></category>
		<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[socialnetworks]]></category>
		<category><![CDATA[socialsoftware]]></category>
		<category><![CDATA[trends]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1418</guid>
		<description><![CDATA[If you signed up before February 2009, you can consider yourself something of an early adopter on Twitter, and among the earliest 15% or so of the entire user population.]]></description>
			<content:encoded><![CDATA[<div id="attachment_1417" class="wp-caption aligncenter" style="width: 565px"><a href="http://www.hojohnlee.com/weblog/wp-content/uploads/2009/06/twitter-userbase-may09-annotated.png"><img class="size-full wp-image-1417" title="twitter-userbase-may09-annotated" src="http://www.hojohnlee.com/weblog/wp-content/uploads/2009/06/twitter-userbase-may09-annotated.png" alt="Twitter estimated userbase through May 2009" width="555" height="342" /></a><p class="wp-caption-text">Twitter estimated userbase through May 2009</p></div>
<p>The graph above shows an estimate of Twitter&#8217;s user population from its launch in March 2006 through May 2009, based on a sample of around 6 million observed user profiles. The dashed blue line marks the January 2009 US presidential inauguration of Barack Obama, roughly where the transition from early adopters to an early mass audience seems to have taken off.</p>
<p>The entire user population of Twitter appears to have reached 1 million sometime in January 2009, but today several accounts have over 1M followers <strong>each</strong>.</p>
<p>Stated another way, if you signed up before February 2009, you can consider yourself something of an early adopter on Twitter, and among the earliest 15% or so of the entire user population.</p>
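<p>The &quot;earliest 15%&quot; figure is a cumulative share: the fraction of all observed accounts created before a cutoff date. As a rough check, about 1M accounts by January against a sample of around 6-7M gives roughly 14-17%. A minimal Python sketch of the calculation, assuming the sample is a list of account creation dates (the function name and toy data are illustrative):</p>

```python
from bisect import bisect_left
from datetime import date

def share_before(created_dates, cutoff):
    """Fraction of sampled accounts created strictly before `cutoff`."""
    ordered = sorted(created_dates)
    if not ordered:
        raise ValueError("empty sample")
    # In a sorted list, bisect_left returns how many dates fall before cutoff.
    return bisect_left(ordered, cutoff) / len(ordered)

# Toy data, not real Twitter numbers.
toy = [date(2009, 1, 5), date(2009, 1, 20), date(2009, 3, 1), date(2009, 5, 1)]
print(share_before(toy, date(2009, 2, 1)))  # 0.5
```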
<p>The numbers in this survey are inexact but representative, taken from research I&#8217;ve been doing for SocialQuant and FailWatch.  There is some survivor bias built in, since I&#8217;m pruning spam and suspended accounts. Only Twitter knows the true state of the user base and the social graph, of course.</p>
<p>The initial Twitter users are more likely to know each other in real life, since much of the social network grew from friends of the founders, SXSW attendees, and the San Francisco / Silicon Valley tech community. The more recent (post-Obama) arrivals tend not to have connections to those networks, and often don&#8217;t know anyone else to follow. They arrive via mass media and celebrity campaigns, and end up following mass media outlets and celebrities, either from the suggested users list or because those are the only people they know of.</p>
<p>If you look carefully, you can see the rate of increase slow toward the end of the graph. There was a huge ramp in new user signups around the time of the Oprah show, which has since receded somewhat. This has prompted blog posts about Twitter&#8217;s impending demise, but there have been previous surges in the user base (typically around SXSW, etc.), each of which peaked and was followed by a drop in new user signups to an off-peak but higher-than-before average. The current surge is the largest so far, but it seems to be following the same pattern. In the absence of any new driver, user growth should continue at an off-peak but higher level until the next big jump, or until something better comes along.</p>
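<p>The surge-then-plateau pattern can be checked by smoothing daily signup counts and comparing the average level before a surge with the level after it recedes. A minimal Python sketch, assuming a list of daily signup counts and surge boundaries picked by eye; the function names and the toy numbers are hypothetical:</p>

```python
def rolling_mean(values, window=7):
    """Trailing moving average; the first window-1 days produce no output."""
    if window <= 0:
        raise ValueError("window must be positive")
    out, total = [], 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the day leaving the window
        if i >= window - 1:
            out.append(total / window)
    return out

def baseline_shift(daily, surge_start, surge_end):
    """Mean daily signups before the surge and after it has receded.

    `surge_start` and `surge_end` are list indices chosen by inspection.
    """
    before, after = daily[:surge_start], daily[surge_end:]
    if not before or not after:
        raise ValueError("need data on both sides of the surge")
    return sum(before) / len(before), sum(after) / len(after)

# Toy series: flat, surge, then a higher plateau (illustrative only).
daily = [10, 10, 10, 100, 200, 100, 30, 30]
print(rolling_mean(daily, window=2))
print(baseline_shift(daily, 3, 6))  # (10.0, 30.0)
```

<p>If the post-surge mean stays well above the pre-surge mean, that matches the &quot;off-peak but higher-than-before&quot; pattern rather than a return to the old baseline.</p>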
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2009/06/18/twitters-amazing-user-growth/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Bookmarks for June 11th through June 12th</title>
		<link>http://www.hojohnlee.com/weblog/archives/2009/06/12/bookmarks-for-june-11th-through-june-12th/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2009/06/12/bookmarks-for-june-11th-through-june-12th/#comments</comments>
		<pubDate>Fri, 12 Jun 2009 16:00:14 +0000</pubDate>
		<dc:creator>site admin</dc:creator>
				<category><![CDATA[Links]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[aws]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[datamining]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[influence]]></category>
		<category><![CDATA[opensource]]></category>
		<category><![CDATA[pagerank]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[screenscraping]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[seo]]></category>
		<category><![CDATA[socialnetworks]]></category>
		<category><![CDATA[socialsoftware]]></category>
		<category><![CDATA[trend]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1411</guid>
		<description><![CDATA[<p>These are my links for June 11th through June 12th:</p>
<ul>
<li><a href="http://www.justinparks.com/using-google-search-to-find-interesting-twitter-users/">Using Google Search to find interesting Twitter Users &#124; Social Media Stuff &#124; Justin Parks</a> &#8211; Ideas for queries on Google to find Twitter users / content.</li>
<li><a href="http://blog.davidziegler.net/post/122176962/a-python-script-to-automatically-extract-excerpts-from">David Ziegler&#8217;s</a></li></ul><p>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>These are my links for June 11th through June 12th:</p>
<ul>
<li><a href="http://www.justinparks.com/using-google-search-to-find-interesting-twitter-users/">Using Google Search to find interesting Twitter Users | Social Media Stuff | Justin Parks</a> &#8211; Ideas for queries on Google to find Twitter users / content.</li>
<li><a href="http://blog.davidziegler.net/post/122176962/a-python-script-to-automatically-extract-excerpts-from">David Ziegler&#8217;s Blog &#8211; A Python Script to Automatically Extract Excerpts From Articles</a> &#8211; Some notes on cleaning up web page content to extract a text excerpt of the intended content.</li>
<li><a href="http://www.scienceforseo.com/ranking-algorithms/papers-on-pagerank-you-should-read/">Papers on PageRank you should read | Science for SEO</a> &#8211; A list of interesting PageRank and search relevance ranking papers. June 2009.</li>
<li><a href="http://aicoder.blogspot.com/2009/02/tunkrank-scoring-improvement.html">aicoder: TunkRank Scoring Improvement</a> &#8211; On influence ranking models for Twitter.</li>
<li><a href="http://github.com/datawrangling/trendingtopics/tree/master">datawrangling&#8217;s trendingtopics at master &#8211; GitHub</a> &#8211; This repository contains the full source code for Trendingtopics.org, built by Data Wrangling to demonstrate how Hadoop &amp; EC2 can power a data-driven website. The trend statistics and time series data that run the site are updated periodically by launching a temporary EC2 cluster running the Cloudera Hadoop Distribution. Our initial seed data consists of the raw Wikipedia database content dump along with hourly traffic logs for all articles collected from the Wikipedia squid proxy (curated by Domas Mituzas). We made the first 7 months of this hourly data for all articles available as an Amazon Public Dataset. The current trend calculations are run with Hadoop Streaming and Hive. The output produced by these Hadoop jobs is loaded into MySQL and indexed to power the live site.</li>
</ul>
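<p>TunkRank, the Twitter influence model proposed by Daniel Tunkelang, scores a user X as the sum over each follower Y of (1 + p &#215; Influence(Y)) / |Following(Y)|, where p is a constant probability that Y passes a message on. The Python below is a minimal fixed-point iteration of that definition, assuming the follow graph is a dict of sets; the value of p, the fixed iteration count, and the toy graph are hypothetical, and the scoring improvement discussed in the aicoder post is not reproduced here.</p>

```python
def tunkrank(following, p=0.05, iterations=50):
    """Iterate TunkRank toward an approximate fixed point.

    following: dict mapping each user to the set of users they follow.
    p: assumed constant retweet probability (hypothetical value).
    Influence(X) = sum over followers Y of (1 + p * Influence(Y)) / |Following(Y)|
    """
    users = set(following)
    for followed in following.values():
        users |= set(followed)
    influence = {u: 0.0 for u in users}
    for _ in range(iterations):
        new = {u: 0.0 for u in users}
        for y, followed in following.items():
            if not followed:
                continue  # Y follows nobody, so it contributes to no one
            share = (1.0 + p * influence[y]) / len(followed)
            for x in followed:
                new[x] += share
        influence = new
    return influence

graph = {"a": {"c"}, "b": {"a", "c"}, "c": set()}
scores = tunkrank(graph)
print(sorted(scores, key=scores.get, reverse=True))  # ['c', 'a', 'b']
```

<p>Because each follower splits its weight evenly across everyone it follows, the update shrinks differences by a factor of at most p, so for p below 1 a fixed number of iterations is enough for a sketch; a larger implementation would test for convergence instead.</p>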
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2009/06/12/bookmarks-for-june-11th-through-june-12th/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bookmarks for June 9th through June 10th</title>
		<link>http://www.hojohnlee.com/weblog/archives/2009/06/10/bookmarks-for-june-9th-through-june-10th/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2009/06/10/bookmarks-for-june-9th-through-june-10th/#comments</comments>
		<pubDate>Thu, 11 Jun 2009 06:00:09 +0000</pubDate>
		<dc:creator>site admin</dc:creator>
				<category><![CDATA[Links]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[benchmarks]]></category>
		<category><![CDATA[collaboration]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[datamining]]></category>
		<category><![CDATA[datasets]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[opensource]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[reference]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[tuning]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1406</guid>
		<description><![CDATA[<p>These are my links for June 9th through June 10th:</p>
<ul>
<li><a href="http://developer.yahoo.net/blogs/hadoop/2009/06/yahoo_distribution_of_hadoop.html">Announcing the Yahoo! Distribution of Hadoop (Hadoop and Distributed Computing at Yahoo!)</a> &#8211; Yahoo releases its internal version of Hadoop, a source-only distribution of Apache Hadoop tested and  used</li></ul><p>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>These are my links for June 9th through June 10th:</p>
<ul>
<li><a href="http://developer.yahoo.net/blogs/hadoop/2009/06/yahoo_distribution_of_hadoop.html">Announcing the Yahoo! Distribution of Hadoop (Hadoop and Distributed Computing at Yahoo!)</a> &#8211; Yahoo releases its internal version of Hadoop, a source-only distribution of Apache Hadoop tested and used in production at Yahoo.</li>
<li><a href="http://tables.googlelabs.com/public/faq.html">Google Fusion Tables FAQ</a> &#8211; Sort of like extra-large Google Docs spreadsheets, up to 100MB per table, 250MB per user. One interesting wrinkle is that it doesn&#39;t actually delete your dataset when you &quot;delete&quot; it, so the data is still available for derived tables that other users have built.</li>
<li><a href="http://www.slideshare.net/markwkm/filesystem-performance-from-a-database-perspective">Filesystem Performance from a Database Perspective</a> &#8211; Presentation of performance benchmarks for Linux filesystems (ext2, ext3, reiserfs, xfs, etc.).</li>
<li><a href="http://www.slideshare.net/selenamarie/what-assumptions-make-filesystem-io-from-a-database-perspective">What Assumptions Make: Filesystem I/O from a database perspective</a> &#8211; Slide presentation comparing Linux filesystem performance across various formats (ext2, ext3, etc.), RAID configurations, and readahead buffer sizes.</li>
<li><a href="http://www.artfulsoftware.com/infotree/queries.php?&amp;bw=1133">MySQL &#8211; Common Queries Tree</a> &#8211; A collection of common queries implemented in MySQL.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2009/06/10/bookmarks-for-june-9th-through-june-10th/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bookmarks for June 6th through June 8th</title>
		<link>http://www.hojohnlee.com/weblog/archives/2009/06/08/bookmarks-for-june-6th-through-june-8th/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2009/06/08/bookmarks-for-june-6th-through-june-8th/#comments</comments>
		<pubDate>Mon, 08 Jun 2009 15:00:29 +0000</pubDate>
		<dc:creator>site admin</dc:creator>
				<category><![CDATA[Links]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[browser]]></category>
		<category><![CDATA[charting]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[flot]]></category>
		<category><![CDATA[generator]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[hi5]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[jquery]]></category>
		<category><![CDATA[json]]></category>
		<category><![CDATA[latin]]></category>
		<category><![CDATA[library]]></category>
		<category><![CDATA[mapping]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[msft]]></category>
		<category><![CDATA[myspace]]></category>
		<category><![CDATA[orkut]]></category>
		<category><![CDATA[phrases]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[socialmedia]]></category>
		<category><![CDATA[socialnetworks]]></category>
		<category><![CDATA[socialsoftware]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[text]]></category>
		<category><![CDATA[trends]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[webservices]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1402</guid>
		<description><![CDATA[<p>These are my links for June 6th through June 8th:</p>
<ul>
<li><a href="http://www.inrebus.com/latinmottogenerator.php">Latin motto generator: make your own catchy slogans!</a> &#8211; Create your own life mottos and slogans in Latin! (Learning Latin not required,  some vague idea for a desired motto</li></ul><p>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>These are my links for June 6th through June 8th:</p>
<ul>
<li><a href="http://www.inrebus.com/latinmottogenerator.php">Latin motto generator: make your own catchy slogans!</a> &#8211; Create your own life mottos and slogans in Latin! (Learning Latin not required; some vague idea for a desired motto a plus.)</li>
<li><a href="http://www.techcrunch.com/2009/06/07/a-map-of-social-network-dominance/">A Map Of Social (Network) Dominance</a> &#8211; Using Alexa and Google Trend data, Cosenza color-coded the map based on which social network is the most popular in each country. All of the light green countries belong to Facebook. But there are still pockets of resistance in Russia (where V Kontakte rules), China (QQ), Brazil and India (Orkut), Central America, Peru, Mongolia, and Thailand (hi5), South Korea (Cyworld), Japan (Mixi), the Middle East (Maktoob), and the Philippines (Friendster).</li>
<li><a href="http://blog.programmableweb.com/2009/06/08/microsoft-releases-bing-api-with-no-usage-quotas/">Microsoft Releases Bing API &#8211; With No Usage Quotas</a> &#8211; Updated search API, with no quotas and some improvements:
<ul>
<li>Developers can now request data in JSON and XML formats. The SOAP interface that the Live Search API required has also been retained.</li>
<li>Requested data can be narrowed to one of the following source types: web, news, images, phonebook, spell-checker, related queries, and Encarta instant answer.</li>
<li>It is now possible to send requests in OpenSearch-compliant RSS format for web, news, image and phonebook queries.</li>
<li>Client applications will be able to combine any number of different data source types into a single request with a single query string.</li>
</ul></li>
<li><a href="http://verwon.com/twitter-limits-getting-ridiculous/">Twitter Limits Getting Ridiculous! &laquo; Verwon&#8217;s Blog</a> &#8211; Anecdotal reports of Twitter users running into rate limits, either on the API or on the maximum number of posts/tweets/follows/direct messages.</li>
<li><a href="http://code.google.com/p/flot/">flot &#8211; Google Code</a> &#8211; Flot is a pure Javascript plotting library for jQuery. It produces graphical plots of arbitrary datasets on-the-fly client-side. The focus is on simple usage (all settings are optional), attractive looks and interactive features like zooming and mouse tracking. The plugin is known to work with Internet Explorer 6/7/8, Firefox 2.x+, Safari 3.0+, Opera 9.5+ and Konqueror 4.x+. If you find a problem, please report it. Drawing is done with the canvas tag introduced by Safari and now available on all major browsers, except Internet Explorer where the excanvas Javascript emulation helper is used.</li>
</ul>
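<p>Combining several source types into a single request with one query string amounts to building one URL. The Python below is a minimal sketch of that idea; the endpoint URLs and the parameter names <code>AppId</code>, <code>Query</code> and <code>Sources</code> are assumptions recalled from the Bing API 2.0 documentation and should be checked against it, and <code>bing_request_url</code> is a hypothetical helper. It only builds the URL and makes no network call.</p>

```python
from urllib.parse import urlencode

# Assumed endpoints and parameter names (AppId, Query, Sources), recalled from
# the Bing API 2.0 documentation; verify them before relying on this.
ENDPOINTS = {
    "json": "http://api.bing.net/json.aspx",
    "xml": "http://api.bing.net/xml.aspx",
}

def bing_request_url(app_id, query, sources, fmt="json"):
    """Build a single request URL that combines several source types."""
    if fmt not in ENDPOINTS:
        raise ValueError("unsupported format: " + fmt)
    params = {
        "AppId": app_id,
        "Query": query,
        # Space-separated source list; urlencode encodes the spaces as '+'.
        "Sources": " ".join(sources),
    }
    return ENDPOINTS[fmt] + "?" + urlencode(params)

print(bing_request_url("YOUR_APP_ID", "social graph", ["Web", "News"]))
# http://api.bing.net/json.aspx?AppId=YOUR_APP_ID&Query=social+graph&Sources=Web+News
```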
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2009/06/08/bookmarks-for-june-6th-through-june-8th/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bookmarks for June 3rd through June 4th</title>
		<link>http://www.hojohnlee.com/weblog/archives/2009/06/04/bookmarks-for-june-3rd-through-june-4th/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2009/06/04/bookmarks-for-june-3rd-through-june-4th/#comments</comments>
		<pubDate>Fri, 05 Jun 2009 07:00:16 +0000</pubDate>
		<dc:creator>site admin</dc:creator>
				<category><![CDATA[Links]]></category>
		<category><![CDATA[academic]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[authentication]]></category>
		<category><![CDATA[Business]]></category>
		<category><![CDATA[cloudcomputing]]></category>
		<category><![CDATA[course]]></category>
		<category><![CDATA[crypto]]></category>
		<category><![CDATA[culture]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[datamining]]></category>
		<category><![CDATA[distribution]]></category>
		<category><![CDATA[education]]></category>
		<category><![CDATA[encryption]]></category>
		<category><![CDATA[file]]></category>
		<category><![CDATA[geek]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[Humor]]></category>
		<category><![CDATA[industry]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[Management]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[market]]></category>
		<category><![CDATA[media]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[msft]]></category>
		<category><![CDATA[notes]]></category>
		<category><![CDATA[opensource]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[publishing]]></category>
		<category><![CDATA[reference]]></category>
		<category><![CDATA[risk]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[socialmedia]]></category>
		<category><![CDATA[socialsoftware]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[system]]></category>
		<category><![CDATA[television]]></category>
		<category><![CDATA[trends]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[web]]></category>
		<category><![CDATA[webservices]]></category>
		<category><![CDATA[windows]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1398</guid>
		<description><![CDATA[<p>These are my links for June 3rd through June 4th:</p>
<ul>
<li><a href="http://blog.marcua.net/post/117671929/mit-database-systems-6-830-ta-course-notes">MIT Database Systems (6.830) TA Course Notes &#8211; marcua&#8217;s blog</a> &#8211; Fall 2008 course notes on database systems, spanning history of databases and data structures up through mapreduce style</li></ul><p>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>These are my links for June 3rd through June 4th:</p>
<ul>
<li><a href="http://blog.marcua.net/post/117671929/mit-database-systems-6-830-ta-course-notes">MIT Database Systems (6.830) TA Course Notes &#8211; marcua&#8217;s blog</a> &#8211; Fall 2008 course notes on database systems, spanning history of databases and data structures up through mapreduce style systems.</li>
<li><a href="http://news.cnet.com/8301-13860_3-10257936-56.html">Ray Ozzie&#8217;s cloud hangs over the Valley | Beyond Binary &#8211; CNET News</a> &#8211; Notes from Ray Ozzie talk on Microsoft and the cloud at Churchill Club, June 2009. Would be interesting to hear more of his thoughts on Google Wave, given his past work on Notes and Groove.</li>
<li><a href="http://www.matasano.com/log/1749/typing-the-letters-a-e-s-into-your-code-youre-doing-it-wrong/">Matasano Chargen &raquo; Blog Archive &raquo; Typing The Letters A-E-S Into Your Code? You&rsquo;re Doing It Wrong!</a> &#8211; An epic discussion of shortcomings in various common encryption implementations for web applications, presented in the form of a somewhat random screenplay.</li>
<li><a href="http://www.avc.com/a_vc/2009/06/is-twitter-a-substitute-for-set-top-box-data.html">Is Twitter A Substitute For Set Top Box Data?</a> &#8211; The post describes a small experiment Simulmedia did to analyze TV channel surfers using only Twitter posts as the data source, along with Simulmedia&#8217;s explanation of why it turned to Twitter data for this work.</li>
<li><a href="http://www.linux-mag.com/cache/7345/1.html">NILFS: A File System to Make SSDs Scream | Linux Magazine</a> &#8211; A log structured (as opposed to journaling) file system with some advantages for write performance and shorter crash recovery times. Good for large working file systems?</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2009/06/04/bookmarks-for-june-3rd-through-june-4th/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bookmarks for June 1st through June 2nd</title>
		<link>http://www.hojohnlee.com/weblog/archives/2009/06/02/bookmarks-for-june-1st-through-june-2nd/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2009/06/02/bookmarks-for-june-1st-through-june-2nd/#comments</comments>
		<pubDate>Tue, 02 Jun 2009 20:00:15 +0000</pubDate>
		<dc:creator>site admin</dc:creator>
				<category><![CDATA[Links]]></category>
		<category><![CDATA[academic]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[chart]]></category>
		<category><![CDATA[community]]></category>
		<category><![CDATA[culture]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[datamining]]></category>
		<category><![CDATA[demographicsl]]></category>
		<category><![CDATA[documentation]]></category>
		<category><![CDATA[economics]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[jquery]]></category>
		<category><![CDATA[longtail]]></category>
		<category><![CDATA[mapping]]></category>
		<category><![CDATA[media]]></category>
		<category><![CDATA[metrics]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[opensource]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[publishing]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[social]]></category>
		<category><![CDATA[socialnetworks]]></category>
		<category><![CDATA[socialsoftware]]></category>
		<category><![CDATA[trends]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[webservices]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1393</guid>
		<description><![CDATA[<p>These are my links for June 1st through June 2nd:</p>
<ul>
<li><a href="http://www.jqplot.com/index.html">jqPlot &#8211; Pure Javascript Plotting</a> &#8211; jqPlot is a plotting plugin for the jQuery Javascript framework. jqPlot produces beautiful line and bar charts with many features including: Numerous chart</li></ul><p>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>These are my links for June 1st through June 2nd:</p>
<ul>
<li><a href="http://www.jqplot.com/index.html">jqPlot &#8211; Pure Javascript Plotting</a> &#8211; jqPlot is a plotting plugin for the jQuery Javascript framework. jqPlot produces beautiful line and bar charts with many features including: Numerous chart style options. Date axes with customizable formatting. Rotated axis text. Automatic trend line computation. Tooltips and data point highlighting. Sensible defaults for ease of use.</li>
<li><a href="http://blogs.harvardbusiness.org/cs/2009/06/new_twitter_research_men_follo.html">New Twitter Research: Men Follow Men and Nobody Tweets &#8211; Conversation Starter &#8211; HarvardBusiness.org</a> &#8211; &quot;Although men and women follow a similar number of Twitter users, men have 15% more followers than women. Men also have more reciprocated relationships, in which two users follow each other. This &quot;follower split&quot; suggests that women are driven less by followers than men, or have more stringent thresholds for reciprocating relationships. This is intriguing, especially given that females hold a slight majority on Twitter: we found that men comprise 45% of Twitter users, while women represent 55%.&quot;</li>
<li><a href="http://www.shirky.com/writings/powerlaw_weblog.html">Shirky: Power Laws, Weblogs, and Inequality</a> &#8211; 2003 article on popularity / traffic on blogs, which was then the latest emerging social media format. &quot;Once a power law distribution exists, it can take on a certain amount of homeostasis, the tendency of a system to retain its form even against external pressures. Is the weblog world such a system? Are there people who are as talented or deserving as the current stars, but who are not getting anything like the traffic? Doubtless. Will this problem get worse in the future? Yes. &quot;</li>
<li><a href="http://well-formed.eigenfactor.org/">well-formed.eigenfactor.org : Visualizing information flow in science</a> &#8211; Some nice visualization ideas using hierarchical clustering to explore patterns in citation networks.</li>
<li><a href="http://msdn.microsoft.com/en-us/library/dd251056.aspx">Bing API, Version 2.0</a> &#8211; Updated API documentation for Microsoft Bing (formerly Live Search) web services.</li>
</ul>
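<p>The Harvard Business study counts a relationship as reciprocated when two users follow each other; its own methodology is not described in the excerpt. The Python below is a minimal sketch of counting such relationships, assuming the follow graph is available as (follower, followee) pairs; the function name and sample data are hypothetical.</p>

```python
def reciprocity(follows):
    """Count mutual follow relationships in a directed follow graph.

    follows: iterable of (follower, followee) pairs; self-follows and
    duplicate pairs are ignored.
    Returns (mutual_pairs, share_of_edges_reciprocated).
    """
    edges = {(a, b) for a, b in follows if a != b}
    if not edges:
        return 0, 0.0
    reciprocated = sum(1 for a, b in edges if (b, a) in edges)
    # Each mutual relationship is counted once from each side.
    return reciprocated // 2, reciprocated / len(edges)

follows = [("ann", "bob"), ("bob", "ann"), ("ann", "cy"), ("cy", "dee")]
print(reciprocity(follows))  # (1, 0.5)
```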
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2009/06/02/bookmarks-for-june-1st-through-june-2nd/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bookmarks for May 30th through May 31st</title>
		<link>http://www.hojohnlee.com/weblog/archives/2009/05/31/bookmarks-for-may-30th-through-may-31st/</link>
		<comments>http://www.hojohnlee.com/weblog/archives/2009/05/31/bookmarks-for-may-30th-through-may-31st/#comments</comments>
		<pubDate>Mon, 01 Jun 2009 06:00:15 +0000</pubDate>
		<dc:creator>site admin</dc:creator>
				<category><![CDATA[Links]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[bandwidth]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[cdn]]></category>
		<category><![CDATA[creativity]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[datacenter]]></category>
		<category><![CDATA[digital]]></category>
		<category><![CDATA[discovery]]></category>
		<category><![CDATA[education]]></category>
		<category><![CDATA[engineering]]></category>
		<category><![CDATA[entrepreneurship]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[graphics]]></category>
		<category><![CDATA[images]]></category>
		<category><![CDATA[infrastructure]]></category>
		<category><![CDATA[invention]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[latency]]></category>
		<category><![CDATA[learning]]></category>
		<category><![CDATA[Management]]></category>
		<category><![CDATA[manipulation]]></category>
		<category><![CDATA[mapping]]></category>
		<category><![CDATA[media]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[network]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[Photos]]></category>
		<category><![CDATA[photoshop]]></category>
		<category><![CDATA[policy]]></category>
		<category><![CDATA[privacy]]></category>
		<category><![CDATA[psychology]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[ruby]]></category>
		<category><![CDATA[scala]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[science]]></category>
		<category><![CDATA[social]]></category>
		<category><![CDATA[socialnetworks]]></category>
		<category><![CDATA[socialsoftware]]></category>
		<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[webservices]]></category>

		<guid isPermaLink="false">http://www.hojohnlee.com/weblog/?p=1388</guid>
		<description><![CDATA[<p>These are my links for May 30th through May 31st:</p>
<ul>
<li><a href="http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster">Scaling Twitter: Making Twitter 10000 Percent Faster &#124; High Scalability</a> &#8211; Collection of links to presentations and interviews regarding Twitter&#39;s architecture, implementation plans, and performance issues, from spring 2009.</li></ul><p>&#8230;</p>]]></description>
			<content:encoded><![CDATA[<p>These are my links for May 30th through May 31st:</p>
<ul>
<li><a href="http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster">Scaling Twitter: Making Twitter 10000 Percent Faster | High Scalability</a> &#8211; Collection of links to presentations and interviews regarding Twitter&#39;s architecture, implementation plans, and performance issues, from spring 2009.</li>
<li><a href="http://thelastpsychiatrist.com/2009/05/the_difference_between_an_amat.html">The Last Psychiatrist: The Difference Between An Amateur, A Scientist, And A Genius</a> &#8211; An amateur is full of wonder and speculation, tinkering towards the truth but suffering from a lack of knowledge and idleness; he&#39;s not even sure if someone else has already made these discoveries. &quot;Is this a worthwhile pursuit?&quot;
<p>A scientist performs experiments to confirm or disprove a hypothesis, and in that way he grinds out the truth.</p>
<p>A genius has three abilities, which are actually the union of amateur and scientist: 1. To know the state of the art, what is known and what is not known. 2. To be able to think &quot;out of the box&quot;. 3. To be disciplined enough to concentrate on the tedium of a formal investigation of his wondrous speculations.</p></li>
<li><a href="http://www.cs.princeton.edu/gfx/pubs/Barnes_2009_PAR/index.php">PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing</a> &#8211; Research paper on a sort of &quot;super healing brush&quot; for manipulating digital images; it allows splicing together different sections of an image, automatically selecting similar textures to make the seam transitions work better.</li>
<li><a href="http://www.lightbluetouchpaper.org/2009/05/20/attack-of-the-zombie-photos/">Light Blue Touchpaper &raquo; Blog Archive &raquo; Attack of the Zombie Photos</a> &#8211; Social networking and sharing sites have challenges implementing and managing access control policies at large scale, and content delivery networks add another wrinkle.</li>
<li><a href="http://royal.pingdom.com/2008/04/11/map-of-all-google-data-center-locations/">Map of all Google data center locations | Royal Pingdom</a> &#8211; Where in the world is your search being served from? An attempt to assemble a list of known Google data centers worldwide.</li>
</ul>
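<p>PatchMatch&#8217;s matching step finds, for each patch in one image, an approximate nearest-neighbor patch in another, using random initialization, propagation of good matches from neighboring patches (alternating scan direction each iteration), and random search around the current best match at exponentially shrinking radii. The Python below is a simplified 1-D sketch of that loop, assuming numeric sequences instead of images and a sum-of-squared-differences patch distance; <code>patchmatch_1d</code> and its parameters are hypothetical, not the paper&#8217;s code.</p>

```python
import random

def patch_dist(a, b, i, j, k):
    """Sum of squared differences between a[i:i+k] and b[j:j+k]."""
    return sum((a[i + t] - b[j + t]) ** 2 for t in range(k))

def patchmatch_1d(a, b, k=3, iterations=4, seed=0):
    """Approximate nearest-neighbor field: for each patch start i in a,
    a patch start j in b with a small distance. Returns (nnf, dist)."""
    rng = random.Random(seed)
    na, nb = len(a) - k + 1, len(b) - k + 1
    if na <= 0 or nb <= 0:
        raise ValueError("sequences shorter than patch size")
    # 1. Random initialization of the nearest-neighbor field.
    nnf = [rng.randrange(nb) for _ in range(na)]
    dist = [patch_dist(a, b, i, nnf[i], k) for i in range(na)]

    def try_candidate(i, j):
        # Keep candidate j for patch i only if it is valid and closer.
        if 0 <= j < nb:
            d = patch_dist(a, b, i, j, k)
            if d < dist[i]:
                nnf[i], dist[i] = j, d

    for it in range(iterations):
        forward = it % 2 == 0  # alternate scan direction
        order = range(na) if forward else range(na - 1, -1, -1)
        step = -1 if forward else 1
        for i in order:
            # 2. Propagation: reuse the already-visited neighbor's match, shifted by one.
            n = i + step
            if 0 <= n < na:
                try_candidate(i, nnf[n] - step)
            # 3. Random search around the current best, halving the radius.
            radius = nb
            while radius >= 1:
                try_candidate(i, nnf[i] + rng.randint(-radius, radius))
                radius //= 2
    return nnf, dist

a = [0, 1, 2, 3, 4, 5]
b = [9, 9, 0, 1, 2, 3, 4, 5, 9]
print(patchmatch_1d(a, b, k=3))
```

<p>Propagation is what makes the method fast on real images: once one patch finds a good match, its neighbors usually inherit a good match too, so random search only has to get lucky occasionally.</p>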
]]></content:encoded>
			<wfw:commentRss>http://www.hojohnlee.com/weblog/archives/2009/05/31/bookmarks-for-may-30th-through-may-31st/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
