Ho John Lee | January 23rd, 2010 | Comments are closed
These are my links for January 20th through January 23rd:
- Data.gov – Featured Datasets: Open Government Directive Agency – Datasets required under the Open Government Directive through the end of the day, January 22, 2010. Freedom of Information Act request logs, Treasury TARP and derivative activity logs, crime, income, agriculture datasets.
- All Your Twitter Bot Needs Is Love – The bot’s name? Jason Thorton. He’s been humming along for months now, sending out over 1250 tweets to some 174 followers. His tweets, while not particularly creative, manage to be both believable and timely. And he’s powered by a single word: Love.
Thorton is the creation of developer Ryan Merket, who built him as a side project in around three hours. Merket has just posted the code that powers him, and has also divulged how he made Thorton seem somewhat realistic: the bot looks for tweets with the word “love” in them and tweets them as its own.
- Building a Twitter Bot – "Meet Jason Thorton. To people who know Jason, he is a successful entrepreneur in San Francisco who tweets 4-5 times a day. But Jason has a secret, he’s not really a human, he’s the product of my simple algorithm in PHP
Jason tweets A LOT about the word “love” – that’s because Jason actually steals tweets from the public timeline that contain the word “love” and posts them as his own
Jason also @replies to people who use the word “love” in their tweets, and asks them random questions or says something arbitrary
It took me about 3 hours to code Jason, imagine what a real engineer could do with real AI algorithms? Now realize that it’s already a reality. Sites like Twitter are full of side projects, company initiatives, spambots and AI robots. When the free flow of information becomes open, the amount of disinformation increases. There’s a real need for someone to vet the people we ‘meet’ on social sites – will be interesting to see how this market grows in the next year."
- Website monitoring status – Public API Status – Health monitor for 26 APIs from popular Web services, including Google Search, Google Maps, Bing, Facebook, Twitter, SalesForce, YouTube, Amazon, eBay and others
- PG&E Electrical System Outage Map – This map shows the current outages in our 70,000-square-mile service area. To see more details about an outage, including the cause and estimated time of restoration, click on the color-coded icon associated with that outage.
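The keyword trick behind the Jason Thorton bot in the links above is simple enough to sketch in a few lines. This is a rough Python illustration, not Merket's PHP code: the function name `pick_tweets` and the sample timeline are made up for the example, and nothing here talks to Twitter.

```python
import re


def pick_tweets(timeline, keyword="love", limit=5):
    """Return up to `limit` tweets that contain `keyword` as a whole word."""
    # Word boundaries keep "Gloves" from matching "love".
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [text for text in timeline if pattern.search(text)][:limit]


# Hypothetical stand-in for a slice of the public timeline.
sample_timeline = [
    "I love the new coffee place on Market St",
    "Stuck in traffic again",
    "Gloves are on sale today",
    "Love this weather!",
]

for tweet in pick_tweets(sample_timeline):
    print(tweet)
```

A real bot would still need to fetch the timeline, post the selected text, and pace itself to stay believable, none of which is shown here.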
site admin | June 8th, 2009 | Comments are closed
These are my links for June 6th through June 8th:
- Latin motto generator: make your own catchy slogans! – Create your own life mottos and slogans in Latin! (Learning Latin not required, some vague idea for a desired motto a plus)
- A Map Of Social (Network) Dominance – Using Alexa and Google Trend data, Cosenza color-coded the map based on which social network is the most popular in each country. All of the light green countries belong to Facebook. But there are still pockets of resistance in Russia (where V Kontakte rules), China (QQ), Brazil and India (Orkut), Central America, Peru, Mongolia, and Thailand (hi5), South Korea (Cyworld), Japan (Mixi), the Middle East (Maktoob), and the Philippines (Friendster).
- Microsoft Releases Bing API – With No Usage Quotas – Updated search API, with no quotas and some improvements.
* Developers can now request data in JSON and XML formats. The SOAP interface that the Live Search API required has also been retained.
* Requested data can be narrowed to one of the following source types: web, news, images, phonebook, spell-checker, related queries, and Encarta instant answer.
* It is now possible to send requests in OpenSearch-compliant RSS format for web, news, image and phonebook queries.
* Client applications will be able to combine any number of different data source types into a single request with a single query string.
- Twitter Limits Getting Ridiculous! « Verwon’s Blog – Anecdotal reports of Twitter users running into problems with rate limiting, either API or max posts/tweets/follows/directs.
- flot – Google Code – Flot is a pure Javascript plotting library for jQuery. It produces graphical plots of arbitrary datasets on-the-fly client-side. The focus is on simple usage (all settings are optional), attractive looks and interactive features like zooming and mouse tracking. The plugin is known to work with Internet Explorer 6/7/8, Firefox 2.x+, Safari 3.0+, Opera 9.5+ and Konqueror 4.x+. If you find a problem, please report it. Drawing is done with the canvas tag introduced by Safari and now available on all major browsers, except Internet Explorer where the excanvas Javascript emulation helper is used.
site admin | June 4th, 2009 | Comments are closed
These are my links for June 3rd through June 4th:
site admin | June 2nd, 2009 | Comments are closed
These are my links for June 1st through June 2nd:
- jqPlot – Pure Javascript Plotting – jqPlot is a plotting plugin for the jQuery Javascript framework. jqPlot produces beautiful line and bar charts with many features including: Numerous chart style options. Date axes with customizable formatting. Rotated axis text. Automatic trend line computation. Tooltips and data point highlighting. Sensible defaults for ease of use.
- New Twitter Research: Men Follow Men and Nobody Tweets – Conversation Starter – HarvardBusiness.org – "Although men and women follow a similar number of Twitter users, men have 15% more followers than women. Men also have more reciprocated relationships, in which two users follow each other. This "follower split" suggests that women are driven less by followers than men, or have more stringent thresholds for reciprocating relationships. This is intriguing, especially given that females hold a slight majority on Twitter: we found that men comprise 45% of Twitter users, while women represent 55%."
- Shirky: Power Laws, Weblogs, and Inequality – 2003 article on popularity / traffic on blogs, which was then the latest emerging social media format. "Once a power law distribution exists, it can take on a certain amount of homeostasis, the tendency of a system to retain its form even against external pressures. Is the weblog world such a system? Are there people who are as talented or deserving as the current stars, but who are not getting anything like the traffic? Doubtless. Will this problem get worse in the future? Yes. "
- well-formed.eigenfactor.org : Visualizing information flow in science – Some nice visualization ideas using hierarchical clustering to explore patterns in citation networks.
- Bing API, Version 2.0 – Updated API documentation for Microsoft Bing (formerly Live Search) web services.
site admin | May 31st, 2009 | Comments are closed
These are my links for May 30th through May 31st:
- Scaling Twitter: Making Twitter 10000 Percent Faster | High Scalability – Collection of links to presentations and interviews regarding Twitter's architecture, implementation plans, and performance issues, from spring 2009.
- The Last Psychiatrist: The Difference Between An Amateur, A Scientist, And A Genius – An amateur is full of wonder and speculation, tinkering towards the truth but suffering from a lack of knowledge and idleness; he's not even sure if someone else has already made these discoveries. "Is this a worthwhile pursuit?"
A scientist performs experiments to confirm or disprove a hypothesis, and in that way he grinds out the truth.
A genius has three abilities, which are actually the union of amateur and scientist: 1. to know the state of the art, what is known and what is not known. 2. To be able to think "out of the box". 3. To be disciplined enough to concentrate on the tedium of a formal investigation of his wondrous speculations.
- PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing – Research paper on sort of "super healing brush" for manipulating digital images, allows splicing together different sections of the image and automatically selecting similar textures to make the seam transitions work better.
- Light Blue Touchpaper » Blog Archive » Attack of the Zombie Photos – Social networking and sharing sites have challenges implementing and managing access control policies at large scale, and content delivery networks add another wrinkle.
- Map of all Google data center locations | Royal Pingdom – Where in the world is your search being served from? An attempt to assemble a list of known Google data centers worldwide.
site admin | May 5th, 2009 | Comments are closed
These are my links for May 4th through May 5th:
- Influential Nodes in a Diffusion Model for Social Networks (icalp05-inf.pdf) – Kempe, Kleinberg, Tardos. Greedy algorithm for approximating the most influential set of nodes in a social network, with a guarantee of about 63% (1 − 1/e) of optimal under various conditions.
- Maximizing the Spread of Influence through a Social Network (kdd03-inf.pdf) – Kempe, Kleinberg, Tardos. Choosing the most influential nodes to maximize propagation is NP-hard, but a greedy approximation reaches about 63% (1 − 1/e) of the optimal spread under various conditions.
- Notification Strategies for Social Networks – Discussion on approaches to maximizing use of a limited number of notifications within social networks e.g. Facebook
- James Smith • loopj.com » Blog Archive » jQuery Plugin: Tokenizing Autocomplete Text Entry – Looks handy – "This is a jQuery plugin to allow users to select multiple items from a predefined list, using autocompletion as they type to find each item. You may have seen a similar type of text entry when filling in the recipients field sending messages on facebook."
- Google Code FAQ – Using cURL to interact with Google data services – Step by step tutorial on using curl with Google data APIs.
- Behind The Business Plan Of Pirates Inc. : NPR – It takes around $250K to fund a Somali pirate operation. About 20 percent goes to pay off officials who look the other way. About 50 percent is for expenses and payroll. The leader of an attack makes $10,000 to $20,000 (the average Somali family lives on $500 a year). The initial investor — who put in $250,000 of seed capital — gets 30 percent, sometimes up to $500,000.
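The greedy approach from the two Kempe, Kleinberg, Tardos entries above can be sketched roughly as follows. This is my own minimal illustration, not code from the papers. It assumes an independent cascade model with one activation probability for every edge, estimates spread by Monte Carlo simulation, and uses made-up names (`simulate_cascade`, `expected_spread`, `greedy_seeds`) and a toy graph.

```python
import random


def simulate_cascade(graph, seeds, prob, rng):
    """Run one independent-cascade simulation; return how many nodes end up active."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                # A newly active node gets one chance to activate each inactive neighbor.
                if neighbor not in active and rng.random() < prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, prob, trials, rng):
    """Average the cascade size over `trials` simulations."""
    return sum(simulate_cascade(graph, seeds, prob, rng) for _ in range(trials)) / trials


def greedy_seeds(graph, k, prob=0.1, trials=200, seed=0):
    """Pick k seed nodes, each time adding the node with the best estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {n for nbrs in graph.values() for n in nbrs}
    chosen = []
    for _ in range(min(k, len(nodes))):
        best_node, best_spread = None, -1.0
        for node in sorted(nodes - set(chosen)):
            spread = expected_spread(graph, chosen + [node], prob, trials, rng)
            if spread > best_spread:
                best_node, best_spread = node, spread
        chosen.append(best_node)
    return chosen


# Toy directed graph: node -> list of nodes it can influence.
toy_graph = {"a": ["b", "c", "d"], "b": [], "e": ["f"]}
print(greedy_seeds(toy_graph, k=2, prob=1.0, trials=1))
```

With `prob=1.0` the simulation is deterministic, so the toy graph picks "a" first (it reaches four nodes) and then "e". With realistic probabilities the estimate gets noisy, and the number of trials becomes the main cost.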
site admin | April 29th, 2009 | Comments are closed
These are my links for April 28th through April 29th:
- With YQL Execute, the Internet becomes your database (Yahoo! Developer Network Blog) – Use Yahoo to query and assemble data from around the internet, manipulate resulting XML recordsets with server side Javascript.
- Glimmer: a jQuery Interactive Design Tool – Articles – MIX Online – "Makes jQuery accessible through a visual tool. The objective for Glimmer is pretty simple: to enable the power of jQuery through an interactive design surface. If jQuery is the "write less, do more” JavaScript library, then Glimmer is the “write none, do more” jQuery design tool. Glimmer has three core audiences: power users, designers and developers."
- Inside Facebook Reports: Why Hasn’t Facebook Grown More in China? – A look at Chinese consumer internet and social media usage, QQ, 51, Xiaonei, Kaixin, and some reasons why there are only around 300,000 Facebook users in China today.
- Facebook maps the swine flu hysteria | The Web Services Report – CNET News – Visualizing interest in swine flu by mapping percentages of mentions on Facebook wall pages, using data from Lexicon.
- Develop Twitter API application in django and deploy on Google App Engine — The Uswaretech Blog – Django Web Development – Walkthrough of a sample Twitter application on Google App Engine, using Django/Python.
site admin | April 17th, 2009 | Comments are closed
These are my links for April 15th through April 17th:
- Paul Buchheit: Make your site faster and cheaper to operate in one easy step – Compress text files with gzip to reduce file size and bandwidth; the incremental CPU cost is usually low relative to the performance gain from lower network cost. Friendfeed uses nginx in front of its main web servers for this.
- Jabbify – Free Comet web service and browser client for simple chat and streaming status applications.
- TinEye Image Search Engine – Idée Inc. – The Visual Search Company – Finds references to images online, starting with an original image. Attempts to use image analysis to be independent of scaling, cropping, and other common manipulations.
- All That Twitters Isn’t Gold: A Popular Web Application in Search of a Business Plan – Knowledge@Wharton – Business school take on Twitter and high growth, non-revenue consumer web startups.
- Almost Viral: A Hybrid Acquisition Strategy – "By being almost viral you can grow very cheaply, control your rate of growth and demographics, and get enough traffic to conduct meaningful experiments. Need to grow more slowly? Just decrease your daily ad spend. Need statistically significant results more quickly? Increase your daily ad spend. With a viral coefficient of 0.9 you’ve dealt with your acquisition risk. Rather than going fully viral and dealing with the operational difficulties, it might be worth your time to deal with other market risks: retention, engagement, and monetization. "
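A quick way to see why a viral coefficient of 0.9 is attractive, as the "Almost Viral" quote above argues: under a simple model (my assumption, not spelled out in the article) where every user brings in k new users on average, each paid signup eventually yields 1 + k + k² + … = 1 / (1 − k) users. The function names and the $2.00 ad cost below are hypothetical.

```python
def users_per_paid_signup(viral_coefficient, generations=None):
    """Total users gained per paid signup, counting the paid signup itself.

    Assumes each user brings in `viral_coefficient` new users on average,
    one generation at a time (a simplification made for this example).
    """
    if not 0 <= viral_coefficient < 1:
        raise ValueError("this sketch only covers coefficients in [0, 1)")
    if generations is None:
        # Closed form of the infinite geometric series 1 + k + k^2 + ...
        return 1.0 / (1.0 - viral_coefficient)
    # Partial sum: the paid signup plus `generations` rounds of referrals.
    return sum(viral_coefficient ** g for g in range(generations + 1))


def effective_cost_per_user(ad_cost_per_signup, viral_coefficient):
    """Spread the ad cost of one paid signup over all the users it leads to."""
    return ad_cost_per_signup / users_per_paid_signup(viral_coefficient)


print(round(users_per_paid_signup(0.9), 2))
print(round(effective_cost_per_user(2.00, 0.9), 2))
```

In this model, k = 0.9 turns each paid signup into roughly ten users, so a hypothetical $2.00 acquisition cost works out to about $0.20 per user. Growth still stops when the ad spend stops, which is the control knob the quote describes.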
site admin | March 8th, 2009 | Comments are closed
These are my links for March 6th through March 8th:
- Wolfram Blog : Wolfram|Alpha Is Coming! –
- Wolfram Alpha is Coming — and It Could be as Important as Google | Twine –
- Wolfram Alpha — it’s like plugging into an electronic brain » VentureBeat –
- If browsers were women – Sharenator.org – "[Chrome] Extremely skinny, but very cool and friendly. However, when it comes to the bedroom, she is very inexperienced and has little to offer. [IE] For most, she's the first woman they tried. She's really easy but can get you infected." etc etc
- Rough Type: Nicholas Carr’s Blog: The coming of the megacomputer – Nick Carr commentary on Rick Rashid's statement that 20% of servers were going to major cloud data centers. Also some interesting discussion in comments.
- FT.com | Tech Blog | How many computers does the world need? – According to Microsoft research chief Rick Rashid, around 20 per cent of all the servers sold around the world each year are now being bought by a small handful of internet companies – he named Microsoft, Google, Yahoo and Amazon.
- The New Hot Cuisine: Korean – WSJ.com – Korean food is slowly making its way into mainstream awareness, both high end (French Laundry, Le Bernardin) and everyday (CPK, Kogi BBQ).
- WriteOnIt – Fake pictures – Build fake magazine covers, newspapers, and photos.
site admin | February 28th, 2009 | Comments are closed
These are my links for February 27th through February 28th:
site admin | February 26th, 2009 | Comments are closed
These are my links for February 26th from 10:39 to 20:05:
site admin | February 26th, 2009 | Comments are closed
These are my links for February 25th through February 26th:
site admin | February 21st, 2009 | Comments are closed
These are my links for February 21st from 13:59 to 21:55:
- Non Sequitur — Gocomics.com – "Hi. My name is Bob, and I'm a Twitter addict…"
- A Tutorial on Support Vector Machines for Pattern Recognition – Christopher J.C. Burges (PDF) – Appeared in: Data Mining and Knowledge Discovery 2, 121-167, 1998. The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, there are several arguments which support the observed high accuracy of SVMs, which we review.
- Data Mining Research – dataminingblog.com: Data Miners on Twitter – A list of data mining people on twitter.
- YouTube – The Crisis of Credit Visualized – Part 1 – Nice animated video attempting to present a simplified explanation of the credit crisis and the relationship between home mortgage lending, bank leverage, and risk.
- “10 Obstacles to Cloud Computing” by UC Berkeley & How GoGrid Hurdles Them | GoGrid Blog – Another commentary on the recent UCB cloud computing overview paper
site admin | February 21st, 2009 | Comments are closed
These are my links for February 20th through February 21st:
- xkcd – A Webcomic – Online Communities – A map of online communities (circa 2007?)
- State of OpenSocial – weekend Apps Feb 20 2009 – Google Docs – Kevin Marks overview of OpenSocial as of February 2009.
- Massive Scrape of Twitter’s Friend Graph « blog.infochimps.org – Sample dataset for research on social graphs. "The infochimps have gathered a massive scrape of the Twitter friend graph. Right now it weighs in at about 2.7M users, 10M tweets, 58M edges."
- getting theinfo: data sets (theinfo) – Another list of publicly accessible data collections online
- Some Datasets Available on the Web » Data Wrangling Blog – List of many research datasets and resources related to data analysis available online, last updated February 2009.
- ICWSM 2009 – International AAAI Conference on Weblogs and Social Media – May 17 – 20, 2009, San Jose, California. This interdisciplinary conference brings together researchers and industry leaders interested in creating and analyzing social media. Past conferences have included technical papers from areas such as computer science, linguistics, psychology, statistics, sociology, multimedia and semantic web technologies.
Ho John Lee | July 26th, 2006 | 2 comments
This evening I’ve been getting slow responses or connection timeouts from Google for the past half hour or so (20:30 – 21:00 PDT). Usually this means that the local network is having problems, but other major sites (Yahoo, CNN) are running as quickly as ever, along with various SSH sessions around the world, so it seems to be specific to Google.
So far I get slow or no response from the main search page, Gmail, Adsense, Adwords, Analytics, and Finance.
Pages that do respond are coming back in 10+ seconds, and some pages are loading without graphics or with templates only and no content.
Anyone else seeing these problems? This is the first time I’ve seen Google unusable for more than a minute or two. (Unlike this site, which has been bouncing up and down due to problems at Dreamhost lately).
Ho John Lee | December 12th, 2005 | 4 comments
Wow! Although the basic idea is straightforward, crawling and indexing for a general purpose search engine requires huge resources. Web crawlers are effectively downloading copies of the entire internet over and over, then handing them off to indexing applications which scan the contents for structure and meaning.
The sheer scale of the task is a substantial barrier to entry for anyone wanting to develop a new indexing or retrieval application. Some projects have narrowed the problem domain, which can reduce the problem scope to a manageable level, but this announcement from Alexa looks like it may offer an exciting alternative for building new search applications.
John Battelle writes:
Alexa, an Amazon-owned search company started by Bruce Gilliat and Brewster Kahle (and the spider that fuels the Internet Archive), is going to offer its index up to anyone who wants it (details are not up yet, but soon). Alexa has about 5 billion documents in its index – about 100 terabytes of data.
…
Anyone can also use Alexa’s servers and processing power to mine its index to discover things – perhaps, to outsource the crawl needed to create a vertical search engine, for example. Or maybe to build new kinds of search engines entirely, or …well, whatever creative folks can dream up. And then, anyone can run that new service on Alexa’s (er…Amazon’s) platform, should they wish.
The service will be priced on a usage basis: $1 per CPU hour, $1 per GB stored or uploaded, $1 per 50GB data processed.
There’s no announcement posted on the Alexa or Amazon sites yet; it’s apparently due out overnight. (Updated 12-13-2005 00:25 – the site is up now)
Not every search and retrieval application will fit the way Alexa has built its crawler and indexing infrastructure, or any other search engine platform for that matter. But opening up access to more of the platform should make it possible to try out a lot of new ideas quickly, without having to build yet another crawler for each project. Up to this point, many search ideas couldn’t be evaluated without working at one of the major search engines. I suspect most development teams would prefer to get access to Google’s crawl and index data, but I’m certainly looking forward to seeing what’s available at Alexa when they get their documentation online in the morning.
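To get a feel for the quoted usage pricing, here is a back-of-the-envelope estimator. It is only a sketch: the three rates come from the quote above, while the function name `estimate_cost`, the job sizes, and the choice of 1 TB = 1000 GB are my own assumptions.

```python
# Rates as quoted: $1 per CPU hour, $1 per GB stored or uploaded,
# $1 per 50 GB of data processed.
DOLLARS_PER_CPU_HOUR = 1.0
DOLLARS_PER_GB_STORED = 1.0
DOLLARS_PER_GB_PROCESSED = 1.0 / 50.0


def estimate_cost(cpu_hours, gb_stored_or_uploaded, gb_processed):
    """Return the estimated charge in dollars for one job."""
    return (
        cpu_hours * DOLLARS_PER_CPU_HOUR
        + gb_stored_or_uploaded * DOLLARS_PER_GB_STORED
        + gb_processed * DOLLARS_PER_GB_PROCESSED
    )


# Hypothetical job: one pass over the whole ~100 TB index (taken as 100,000 GB),
# 500 CPU hours of processing, and 20 GB of results kept.
print(estimate_cost(cpu_hours=500, gb_stored_or_uploaded=20, gb_processed=100_000))
```

At these rates, a single pass over the full index costs about $2,000 in processing charges alone, before CPU time and storage. That is still far cheaper than building and running a crawler.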
More from Om Malik, TechCrunch, ReadWrite Web
Ho John Lee | November 3rd, 2005 | 1 comment

I came across a cryptic link to mturk.com on supr.c.ilio.us, asking “Isn’t that how the Matrix came to be?”
Amazon Mechanical Turk provides a web services API for computers to integrate “artificial, artificial intelligence” directly into their processing by making requests of humans. Developers use the Amazon Mechanical Turk web services API to submit tasks to the Amazon Mechanical Turk web site, approve completed tasks, and incorporate the answers into their software applications. To the application, the transaction looks very much like any remote procedure call: the application sends the request, and the service returns the results. In reality, a network of humans fuels this artificial, artificial intelligence by coming to the web site, searching for and completing tasks, and receiving payment for their work.
All software developers need to do is write normal code. The pseudo code below illustrates how simple this can be.
read (photo);
photoContainsHuman = callMechanicalTurk(photo);
if (photoContainsHuman == TRUE) {
    acceptPhoto;
}
else {
    rejectPhoto;
}
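The same flow as Amazon's pseudo code can be written as a small runnable Python sketch. None of this is the real Mechanical Turk API: `call_mechanical_turk`, `screen_photo`, and `fake_worker` are hypothetical stand-ins, and the "human" is a stub that reads a flag on the photo record.

```python
def call_mechanical_turk(photo, ask_human):
    """Stand-in for the web service call: pass the question to a human, return a bool."""
    return bool(ask_human(photo))


def screen_photo(photo, ask_human):
    """Accept the photo if the human says it contains a person, otherwise reject it."""
    if call_mechanical_turk(photo, ask_human):
        return "accept"
    return "reject"


def fake_worker(photo):
    # Placeholder for a real person's judgment; here the answer is stored on the record.
    return photo.get("has_person", False)


print(screen_photo({"name": "beach.jpg", "has_person": True}, fake_worker))
print(screen_photo({"name": "empty_room.jpg"}, fake_worker))
```

The point of the service is that the calling code stays this plain, while the slow and uncertain part (a person looking at the photo) sits behind the function call.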
Given the source of the link, I was a little skeptical at first read, but it appears to be a legitimate beta project that just launched yesterday at Amazon. At least, the documentation links point back into Amazon Web Services, and at least one person seems to know someone there.
This is an interesting idea that should find some useful applications. Spammers have supposedly been doing something like this to defeat the image-based Turing tests used to screen comment posting systems, offering access to porn in exchange for solving the puzzles, and there are other anecdotes of using low cost offshore labor for similar tasks. Having a simpler web service interface for finding a human key operator somewhere will probably allow smaller and more experimental applications to emerge.
Update 11-04-2005 08:09 PST – Slashdot, TechDirt, Google Blogoscoped on Mechanical Turk, pointer to BoingBoing on porn puzzles and spam, captcha.net