|
|
Ho John Lee | January 23rd, 2010 | Comments are closed
These are my links for January 20th through January 23rd:
- Data.gov – Featured Datasets: Open Government Directive Agency – Datasets required under the Open Government Directive through the end of the day, January 22, 2010. Freedom of Information Act request logs, Treasury TARP and derivative activity logs, crime, income, agriculture datasets.
- All Your Twitter Bot Needs Is Love – The bot’s name? Jason Thorton. He’s been humming along for months now, sending out over 1250 tweets to some 174 followers. His tweets, while not particularly creative, manage to be both believable and timely. And he’s powered by a single word: Love.
Thorton is the creation of developer Ryan Merket, who built him as a side project in around three hours. Merket has just posted the code that powers him, and has also divulged how he made Thorton seem somewhat realistic: the bot looks for tweets with the word “love” in them and tweets them as its own.
- Building a Twitter Bot – "Meet Jason Thorton. To people who know Jason, he is a successful entrepreneur in San Francisco who tweets 4-5 times a day. But Jason has a secret, he’s not really a human, he’s the product of my simple algorithm in PHP
Jason tweets A LOT about the word “love” – that’s because Jason actually steals tweets from the public timeline that contain the word “love” and posts them as his own
Jason also @replies to people who use the word “love” in their tweets, and asks them random questions or says something arbitrary
It took me about 3 hours to code Jason, imagine what a real engineer could do with real AI algorithms? Now realize that it’s already a reality. Sites like Twitter are full of side projects, company initiatives, spambots and AI robots. When the free flow of information becomes open, the amount of disinformation increases. Theres a real need for someone to vet the people we ‘meet’ on social sites – will be interesting to see how this market grows in the next year
- Website monitoring status – Public API Status – Health monitor for 26 APIs from popular Web services, including Google Search, Google Maps, Bing, Facebook, Twitter, SalesForce, YouTube, Amazon, eBay and others
- PG&E Electrical System Outage Map – This map shows the current outages in our 70,000-square-mile service area. To see more details about an outage, including the cause and estimated time of restoration, click on the color-coded icon associated with that outage.
Ho John Lee | January 20th, 2010 | Comments are closed
These are my links for January 17th through January 20th:
- PG&E Electrical System Outage Map – This map shows the current outages in our 70,000-square-mile service area. To see more details about an outage, including the cause and estimated time of restoration, click on the color-coded icon associated with that outage.
- Twitter.com vs The Twitter Ecosystem – Fred Wilson comments on some data from John Borthwick indicating Twitter ecosystem use = 3-5x Twitter.com directly.
"John's chart estimates that Twitter.com is about 20mm uvs a month in the US (comScore has it at 60mm uvs worldwide) and the Twitter ecosystem at about 60mm uvs in the US.
That says that across all web services, not just AVC, the Twitter ecosystem is about 3x Twitter.com. And on this blog, whose audience is certainly power users, that ratio is 5x."
- Chris Walshaw :: Research :: Partition Archive – Welcome to the University of Greenwich Graph Partitioning Archive. The archive consists of the best partitions found to date for a range of graphs and its aim is to provide a benchmark, against which partitioning algorithms can be tested, and a resource for experimentation.
The partition archive has been in operation since the year 2000 and includes results from most of the major graph partitioning software packages. Researchers developing experimental partitioning algorithms regularly submit new partitions for possible inclusion.
Most of the test graphs arise from typical partitioning applications, although the archive also includes results computed for a graph-colouring test suite [Wal04] contained in a separate annex.
The archive was originally set up as part of a research project into very high quality partitions and authors wishing to refer to the partitioning archive should cite the paper [SWC04].
- Twitter’s Crawl « The Product Guy – "A list of incidents that affected the Page Load Time of the Twitter product, distinguishing between total downtime, and partial downtime and information inaccessibility, based upon the public posts on Twitters blog.
http://status.twitter.com/archive
I did my best to not double count any problems, but it was difficult since many of the problems occur so frequently, and it is often difficult to distinguish, from these status blog posts alone, between a persisting problem being experienced or fixed, from that of a new emergence of a similar or same problem. Furthermore, I also excluded the impact on Page Load Time arising from scheduled maintenance/downtime – periods of time over which the user expectation would be most aligned with the product’s promise of Page Load Time. "
- Soundboard.com – Soundboard.com is the web's largest catalog of free sounds and soundboards – in over 20 categories, for mobile or PC. 252,858 free sounds on 17,171 soundboards from movies to sports, sound effects, television, celebrities, history and travel. Or build, customize, embed and manage your own
Ho John Lee | January 18th, 2010 | Comments are closed
These are my links for December 31st through January 17th:
- Khan Academy – The Khan Academy is a not-for-profit organization with the mission of providing a high quality education to anyone, anywhere.
We have 1000+ videos on YouTube covering everything from basic arithmetic and algebra to differential equations, physics, chemistry, biology and finance which have been recorded by Salman Khan.
- StarCraft AI Competition | Expressive Intelligence Studio – AI bot warfare competition using a hacked API to run StarCraft, will be held at AIIDE2010 in October 2010.
The competition will use StarCraft Brood War 1.16.1. Bots for StarCraft can be developed using the Broodwar API, which provides hooks into StarCraft and enables the development of custom AI for StarCraft. A C++ interface enables developers to query the current state of the game and issue orders to units. An introduction to the Broodwar API is available here. Instructions for building a bot that communicates with a remote process are available here. There is also a Forum. We encourage submission of bots that make use of advanced AI techniques. Some ideas are:
* Planning
* Data Mining
* Machine Learning
* Case-Based Reasoning
- Measuring Measures: Learning About Statistical Learning – A "quick start guide" for statistical and machine learning systems, good collection of references.
- Berkowitz et al : The use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems (2006) – Berkowitz, Steven D., Woodward, Lloyd H., & Woodward, Caitlin. (2006). Use of formal methods to map, analyze and interpret hawala and terrorist-related alternative remittance systems. Originally intended for publication in updating the 1988 volume, eds., Wellman and Berkowitz, Social Structures: A Network Approach (Cambridge University Press). Steve died in November, 2003. See Barry Wellman’s “Steve Berkowitz: A Network Pioneer has passed away,” in Connections 25(2), 2003. It has not been possible to add the updating of references or of the quality of graphics that might have been possible if Berkowitz were alive. An early version of the article appeared in the Proceedings of the Session on Combating Terrorist Networks: Current Research in Social Network Analysis for the New War Fighting Environment. 8th International Command and Control Research and Technology Symposium. National Defense University, Washington, D.C June 17-19, 2003
- SSH Tunneling through web filters | s-anand.net – Step by step tutorial on using Putty and an EC2 instance to set up a private web proxy on demand.
- PyDroid GUI automation toolkit – GitHub – What is Pydroid?
Pydroid is a simple toolkit for automating and scripting repetitive tasks, especially those involving a GUI, with Python. It includes functions for controlling the mouse and keyboard, finding colors and bitmaps on-screen, as well as displaying cross-platform alerts.
Why use Pydroid?
* Testing a GUI application for bugs and edge cases
o You might think your app is stable, but what happens if you press that button 5000 times?
* Automating games
o Writing a script to beat that crappy flash game can be so much more gratifying than spending hours playing it yourself.
* Freaking out friends and family
o Well maybe this isn't really a practical use, but…
- Time Series Data Library – More data sets – "This is a collection of about 800 time series drawn from many different fields.Agriculture Chemistry Crime Demography Ecology Finance Health Hydrology Industry Labour Market Macro-Economics Meteorology Micro-Economics Miscellaneous Physics Production Sales Simulated series Sport Transport & Tourism Tree-rings Utilities"
- How informative is Twitter? » SemanticHacker Blog – "We undertook a small study to characterize the different types of messages that can be found on Twitter. We downloaded a sample of tweets over a two-week period using the Twitter streaming API. This resulted in a corpus of 8.9 million messages (”tweets”) posted by 2.6 million unique users. About 2.7 million of these tweets, or 31%, were replies to a tweet posted by another user, while half a million (6%) were retweets. Almost 2 million (22%) of the messages contained a URL."
- Gremlin – a Turing-complete, graph-based programming language – GitHub – Gremlin is a Turing-complete, graph-based programming language developed in Java 1.6+ for key/value-pair multi-relational graphs known as property graphs. Gremlin makes extensive use of the XPath 1.0 language to support complex graph traversals. This language has applications in the areas of graph query, analysis, and manipulation. Connectors exist for the following data management systems:
* TinkerGraph in-memory graph
* Neo4j graph database
* Sesame 2.0 compliant RDF stores
* MongoDB document database
The documentation for Gremlin can be found at this location. Finally, please visit TinkerPop for other software products.
- The C Programming Language: 4.10 – by Kernighan & Ritchie & Lovecraft – void Rlyeh
(int mene[], int wgah, int nagl) {
int Ia, fhtagn;
if (wgah>=nagl) return;
swap (mene,wgah,(wgah+nagl)/2);
fhtagn = wgah;
for (Ia=wgah+1; Ia<=nagl; Ia++)
if (mene[Ia]<mene[wgah])
swap (mene,++fhtagn,Ia);
swap (mene,wgah,fhtagn);
Rlyeh (mene,wgah,fhtagn-1);
Rlyeh (mene,fhtagn+1,nagl);
} // PH'NGLUI MGLW'NAFH CTHULHU!
- How to convert email addresses into name, age, ethnicity, sexual orientation – This is so Meta – "Save your email list as a CSV file (just comma separate those email addresses). Upload this file to your facebook account as if you wanted to add them as friends. Voila, facebook will give you all the profiles of all those users (in my test, about 80% of my email lists have facebook profiles). Now, click through each profile, and because of the new default facebook settings, which makes all information public, about 95% of the user info is available for you to harvest."
- Microsoft Security Development Lifecycle (SDL): Tools Repository – A collection of previously internal-only security tools from Microsoft, including anti-xss, fuzz test, fxcop, threat modeling, binscope, now available for free download.
- Analytics X Prize – Home – Forecast the murder rate in Philadelphia – The Analytics X Prize is an ongoing contest to apply analytics, modeling, and statistics to solve the social problems that affect our cities. It combines the fields of statistics, mathematics, and social science to understand the root causes of dysfunction in our neighborhoods. Understanding these relationships and discovering the most highly correlated variables allows us to deploy our limited resources more effectively and target the variables that will have the greatest positive impact on improvement.
- PeteSearch: How to find user information from an email address – FindByEmail code released as open-source. You pass it an email address, and it queries 11 different public APIs to discover what information those services have on the user with that email address.
- Measuring Measures: Beyond PageRank: Learning with Content and Networks – Conclusion: learning based on content and network data is the current state of the art There is a great paper and talk about personalization in Google News they use content for this purpose, and then user click streams to provide personalization, i.e. recommend specific articles within each topical cluster. The issue is content filtering is typically (as we say in research) "way harder." Suppose you have a social graph, a bunch of documents, and you know that some users in the social graph like some documents, and you want to recommend other documents that you think they will like. Using approaches based on Networks, you might consider clustering users based on co-visitaion (they have co-liked some of the documents). This scales great, and it internationalizes great. If you start extracting features from the documents themselves, then what you build for English may not work as well for the Chinese market. In addition, there is far more data in the text than there is in the social graph
- mikemaccana’s python-docx at master – GitHub – MIT-licensed Python library to read/write Microsoft Word docx format files. "The docx module reads and writes Microsoft Office Word 2007 docx files. These are referred to as 'WordML', 'Office Open XML' and 'Open XML' by Microsoft. They can be opened in Microsoft Office 2007, Microsoft Mac Office 2008, OpenOffice.org 2.2, and Apple iWork 08. The module was created when I was looking for a Python support for MS Word .doc files, but could only find various hacks involving COM automation, calling .net or Java, or automating OpenOffice or MS Office."
site admin | June 2nd, 2009 | Comments are closed
These are my links for June 1st through June 2nd:
- jqPlot – Pure Javascript Plotting – jqPlot is a plotting plugin for the jQuery Javascript framework. jqPlot produces beautiful line and bar charts with many features including: Numerous chart style options. Date axes with customizable formatting. Rotated axis text. Automatic trend line computation. Tooltips and data point highlighting. Sensible defaults for ease of use.
- New Twitter Research: Men Follow Men and Nobody Tweets – Conversation Starter – HarvardBusiness.org – "Although men and women follow a similar number of Twitter users, men have 15% more followers than women. Men also have more reciprocated relationships, in which two users follow each other. This "follower split" suggests that women are driven less by followers than men, or have more stringent thresholds for reciprocating relationships. This is intriguing, especially given that females hold a slight majority on Twitter: we found that men comprise 45% of Twitter users, while women represent 55%."
- Shirky: Power Laws, Weblogs, and Inequality – 2003 article on popularity / traffic on blogs, which was then the latest emerging social media format. "Once a power law distribution exists, it can take on a certain amount of homeostasis, the tendency of a system to retain its form even against external pressures. Is the weblog world such a system? Are there people who are as talented or deserving as the current stars, but who are not getting anything like the traffic? Doubtless. Will this problem get worse in the future? Yes. "
- well-formed.eigenfactor.org : Visualizing information flow in science – Some nice visualization ideas using hierarchical clustering to explore patterns in citation networks.
- Bing API, Version 2.0 – Updated API documentation for Microsoft Bing (formerly Live Search) web services.
site admin | May 31st, 2009 | Comments are closed
These are my links for May 30th through May 31st:
- Scaling Twitter: Making Twitter 10000 Percent Faster | High Scalability – Collection of links to presentations and interviews regarding Twitter's architecture, implementation plans, and performance issues, from spring 2009.
- The Last Psychiatrist: The Difference Between An Amateur, A Scientist, And A Genius – An amateur is full of wonder and speculation, tinkering towards the truth but suffering from a lack of knowledge and idleness; he's not even sure if someone else has already made these discoveries. "Is this a worthwhile pursuit?"
A scientist performs experiments to confirm or disprove a hypothesis, and in that way he grinds out the truth.
A genius has three abilities, which are actually the union of amateur and scientist: 1. to know the state of the art, what is known and what is not known. 2. To be able to think "out of the box". 3. To be disciplined enough to concentrate on the tedium of a formal investigation of his wondrous speculations.
- PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing – Research paper on sort of "super healing brush" for manipulating digital images, allows splicing together different sections of the image and automatically selecting similar textures to make the seam transitions work better.
- Light Blue Touchpaper » Blog Archive » Attack of the Zombie Photos – Social networking and sharing sites have challenges implementing and managing access control policies at large scale, and content delivery networks add another wrinkle.
- Map of all Google data center locations | Royal Pingdom – Where in the world is your search being served from? An attempt to assemble a list of known Google data centers worldwide.
site admin | May 21st, 2009 | Comments are closed
These are my links for May 21st from 06:07 to 22:34:
site admin | May 12th, 2009 | Comments are closed
These are my links for May 8th through May 12th:
site admin | May 5th, 2009 | Comments are closed
These are my links for May 4th through May 5th:
- Influential Nodes in a Diffusion Model for Social Networks (icalp05-inf.pdf) – Kempe, Kleinberg, Tardos. Algorithm for greedy approximation of most influential nodes in social network (63% of optimal) under various conditions.
- Maximizing the Spread of Influence through a Social Network (kdd03-inf.pdf) – Kempe, Kleinberg, Tardos. Maximizing propagation by selecting most influential nodes is NP-hard, but a greedy approximation can work well (63% of optimal) under various conditions.
- Notification Strategies for Social Networks – Discussion on approaches to maximizing use of a limited number of notifications within social networks e.g. Facebook
- James Smith • loopj.com » Blog Archive » jQuery Plugin: Tokenizing Autocomplete Text Entry – Looks handy – "This is a jQuery plugin to allow users to select multiple items from a predefined list, using autocompletion as they type to find each item. You may have seen a similar type of text entry when filling in the recipients field sending messages on facebook."
- Google Code FAQ – Using cURL to interact with Google data services – Step by step tutorial on using curl with Google data APIs.
- Behind The Business Plan Of Pirates Inc. : NPR – It takes around $250K to fund a Somali pirate operation. About 20 percent goes to pay off officials who look the other way. About 50 percent is for expenses and payroll. The leader of an attack makes $10,000 to $20,000 (the average Somali family lives on $500 a year). The initial investor — who put in $250,000 of seed capital — gets 30 percent, sometimes up to $500,000.
site admin | April 30th, 2009 | Comments are closed
These are my links for April 30th from 05:57 to 07:10:
- SIGUSR2 > The Power That is GNU Emacs – "If you've never been convinced before that Emacs is the text editor in which dreams are made from, or that inside Emacs there are unicorns manipulating your text, don't expect me to convince you."
- How To Be A Successful Evil Overlord – 100 remedies for the fatal flaws exhibited by famous evil overlords of the past. Also some business executives, I think.
- Google Could Have Caught Swine Flu Early | Wired Science – Google’s search data may have been able to provide an early warning of the swine flu outbreak — if the company had been looking in the right place. Last week, at the request of the Centers for Disease Control, Google took a retroactive look at its search data from Mexico. And there the team found a pre-media bump in telltale flu-related search terms (you know, “influenza + phlegm + coughing”) that was inconsistent with standard, seasonal flu trends.
- What Twitter Looks Like For Twitter Employees (SCREENSHOTS) – Some screen shots of current admin tools at Twitter for managing user accounts, blocks, whitelisting, suspensions, and user stats such as # follow attempts, # updates, #directs, etc
- Twitter Aggregator Sawhorse Media Raises Seed Round, Launches Pets, Celeb Sites | paidContent.org – "Channelized" feeds from curated lists of twitter sources.
site admin | April 28th, 2009 | Comments are closed
These are my links for April 28th from 05:35 to 14:24:
- Official Google Blog: Adding search power to public data – Interesting. Wonder if the underlying public data sets will eventually become available on Google App Engine as well, sort of like the public data sets available for use with Amazon EC2 applications.
- MySQL And Search At Craigslist – Jeremy Zawodny's slides on MySQL, Sphinx, and free text search implementation at Craigslist, from last week's MySQL conference.
- Skew, The Frontend Engineer’s Misery @ Irrational Exuberance – For mashups and the like, the distinction between a FE engineer and web dev is rather small in terms of technical skills; they are both using the same skillset, they are both interacting with APIs, and so on. However, there are important distinctions between the two: 1. web developers tend to move in small groups or as individuals, whereas fe engineers work in larger groups, 2. web developers tend to design a product on top of an existing backend service (api, etc), while fe engineers are usually working in parallel with the backend being developed.
- Study: Twitter Audience Does Not Have A Return Policy – Over 60 percent of people who sign up to use the popular (and tremendously discussed) micro-blogging platform do not return to using it the following month, according to new data released by Nielsen Online. In other words, Twitter currently has just a 40 percent retention rate, up from just 30 percent in previous months–indicating an “I don’t get it factor” among new users that is reminiscent of the similarly-over hyped Second Life from a few years ago.
- Hey Americans, Appreciate Your Freedom Of Speech : NPR – Firoozeh Dumas on the underappreciated freedoms of speech and expression we have in the US vs journalists and bloggers in Iran.
site admin | April 17th, 2009 | Comments are closed
These are my links for April 15th through April 17th:
- Paul Buchheit: Make your site faster and cheaper to operate in one easy step – Compress text files with gzip to reduce file size/bandwidth, the incremental cpu cost is usually low relative to the performance gain from lower network cost. Friendfeed uses nginx in front of main web servers for this.
- Jabbify – Free Comet web service and browser client for simple chat and streaming status applications.
- TinEye Image Search Engine – Idée Inc. – The Visual Search Company – Finds references to images online, starting with an original image. Attempts to use image analysis to be independent of scaling, cropping, and other common manipulations.
- All That Twitters Isn’t Gold: A Popular Web Application in Search of a Business Plan – Knowledge@Wharton – Business school take on Twitter and high growth, non-revenue consumer web startups.
- Almost Viral: A Hybrid Acquisition Strategy – "By being almost viral you can grow very cheaply, control your rate of growth and demographics, and get enough traffic to conduct meaningful experiments. Need to grow more slowly? Just decrease your daily ad spend. Need statistically significant results more quickly? Increase your daily ad spend. With a viral coefficient of 0.9 you’ve dealt with your acquisition risk. Rather than going fully viral and dealing with the operational difficulties, it might be worth your time to deal with other market risks: retention, engagement, and monetization. "
site admin | April 15th, 2009 | Comments are closed
These are my links for April 13th through April 15th:
site admin | April 10th, 2009 | Comments are closed
These are my links for April 9th through April 10th:
site admin | February 28th, 2009 | Comments are closed
These are my links for February 27th through February 28th:
site admin | February 21st, 2009 | Comments are closed
These are my links for February 20th through February 21st:
- xkcd – A Webcomic – Online Communities – A map of online communities (circa 2007?)
- State of OpenSocial – weekend Apps Feb 20 2009 – Google Docs – Kevin Marks overview of OpenSocial as of February 2009.
- Massive Scrape of Twitter’s Friend Graph « blog.infochimps.org – Sample dataset for research on social graphs. "The infochimps have gathered a massive scrape of the Twitter friend graph. Right now it weighs in at about 2.7M users, 10M tweets, 58M edges."
- getting theinfo: data sets (theinfo) – Another list of publicly accessible data collections online
- Some Datasets Available on the Web » Data Wrangling Blog – List of many research datasets and resources related to data analysis available online, last updated February 2009.
- ICWSM 2009 – International AAAI Conference on Weblogs and Social Media – May 17 – 20, 2009, San Jose, California. This interdisciplinary conference brings together researchers and industry leaders interested in creating and analyzing social media. Past conferences have included technical papers from areas such as computer science, linguistics, psychology, statistics, sociology, multimedia and semantic web technologies.
site admin | February 17th, 2009 | Comments are closed
These are my links for February 16th through February 17th:
- Top 100 Network Security Tools – Many many security testing and hacking tools.
- FRONTLINE: inside the meltdown: watch the full program – "On Thursday, Sept. 18, 2008, the astonished leadership of the U.S. Congress was told in a private session by the chairman of the Federal Reserve that the American economy was in grave danger of a complete meltdown within a matter of days. "There was literally a pause in that room where the oxygen left," says Sen. Christopher Dodd"
- The Dark Matter of a Startup – "Every successful startup that I have seen has someone within their ranks that just kinda “does stuff.” No one really knows specifically what they do, but its vital to the success of the startup."
- Why I Hate Frameworks – "A hammer?" he asks. "Nobody really buys hammers anymore. They're kind of old fashioned…we started selling schematic diagrams for hammer factories, enabling our clients to build their own hammer factories, custom engineered to manufacture only the kinds of hammers that they would actually need."
- Mining The Thought Stream – Lots of comments around what is Twitter good for and how will it make money, revolving around real/near-time search, analytics, marketing, etc.
- Understanding Web Operations Culture – the Graph & Data Obsession … – Comparison of traffic at Flickr, Google, Twitter, last.fm during the Obama inauguration. "One of the most interesting parts of running a large website is watching the effects of unrelated events affecting user traffic in aggregate."
Ho John Lee | February 6th, 2006 | 6 comments

This evening I’ve been looking over Prosper (formerly known as CircleOne), a social lending site similar to Zopa, which provides an eBay-like marketplace for borrowers and lenders to transact loans.
Prosper manages credit scoring, loan servicing, and provides social and economic incentives for borrower groups to build their reputation as good lending risks. All loans are 36 months with no prepayment penalty, and Prosper charges a 1% origination fee from the borrower, and 0.5% loan servicing fee from the lender. Groups with good reputations can get some incentive payments for loan performance and lower rates over time.
In addition to using credit scoring, social lending and finance approaches can be effective and much less risky than the borrowers might otherwise appear. Informal lending clubs are common among many Asian and other immigrant communities, and something like this might provide an online venue for a more transparent and widely accessible model. It’s harder to bail out on a debt if everyone knows about it and is also a creditor.
Aside from the philosophical and community aspects, looking at this from the lender’s perspective, it looks like it would behave sort of like buying 36-month bonds with a call option. It’s not like you can do a lot of due diligence on the borrowers, and for the amounts involved it’s not going to be worth the effort, so some combination of reputation and diversification is needed. If there were lots of good-but-unrated credit risks in the borrowing pool, you could build a portfolio of sub-prime loans and possibly achieve something in the range of junk bond returns.
Returning to the philosophical, I like the idea of community and socially based lending, because it values good reputations and provides social incentives for people to perform. On the other hand, it looks like a lot of work for a prospective lender. If they’re looking purely at a financial investment, it’s a lot easier going after a portfolio of bonds or a bond fund, so I think you’d have to want to support the model to participate. The borrower’s case looks much more straightforward, since consumer credit tends to be readily available, but expensive. The listings posted on the site so far include a number of “pay off my credit card” loans, which seems quite sensible.
In a slightly different context, it would be interesting to see something similar which matched borrower groups in relatively poor and developing areas with lenders in relatively wealthy areas. Grameen Bank has done amazing things with microfinance in Bangaladesh, in the sense of helping the borrowing communities building new businesses and opportunities for themselves and making a positive economic return for the bank.
In another different context, it would be interesting to see some kind of market making approach for investing in and financing speculative early stage startups. This wouldn’t work for capital intensive projects, but perhaps there’s some standard terms that could be worked out for asset-light software startups (aka web 2.0). The levels of funding required are too small to justify the level of effort, and there is (or should be) a high mortality rate, which would argue for a lighterweight way to build portfolios of small investments.
(via TechCrunch)
See also:
Zopa – eBay for money?
Ho John Lee | December 3rd, 2005 | 1 comment

Spent a few hours this afternoon at Chris Heuer’s BrainJam event. Wasn’t able to make it to the morning sessions, but arrived in time for the end of lunch and the “youth user panel”, consisting of four college students. They all love Facebook. Not sure how representative they are of the general student demographic, since two of them are trying to put together a web startup. They all use free online music and movie access, mostly through sharing within the dorm networks.
During the Q&A I asked for the panel members’ thoughts on privacy and about having their college lives online in perpituity. They’re vaguely concerned, but I don’t think the topic is really raising red flags for them. I think the high school and college users have more confidence in Facebook, MySpace, Xanga and others keeping their data private and/or it not making any difference to them in the future as social norms change. Part of it is that people are simply making things up on their pages, for the sake of attracting attention, and part of it is them not caring or not understanding that their web pages, chat transcripts, and even VOIP are mostly staying online forever. I think there’s going to be a lot of interesting conflicts in the future as people start running into their past personae 5, 10, 15 years later in a societal context that hasn’t adjusted yet to perpetual transparency.
Afterwards the group broke out into smaller topical discussions. The first session I went to was on the 2-way RSS proposal from Microsoft (Simple Sharing Extensions, SSE). I’m starting to think of SSE as a way for MSFT to use an RSS container for solving the sync problem for applications like Windows Mobile syncing a device and a desktop, or Active Directory performing distributed synchronization of directory data. I’m not really seeing a federated publishing model based on this, an idea that was floated in the conversation. It really feels like it solves an application sync problem for structured data.
The session on “what to do with all the data?” quickly turned into a discussion on privacy, transparency, and DRM. I’m personally disinclined to depend on trusting anyone’s DRM system to manage my criticall personal data, or for allowing anyone to indexing my private data in a way that eventually gets exposed to the world. One point of view expressed in this discussion was that the world would be better off if everyone just got used to the idea that everything they did was recorded and visible to the world (the Global Panopticon), although I think the majority disargreed that this would actually make people behave better. Personally, I think that documenting everything would break a lot of the ambiguity in relationships and conversations that allow the formation of reasonable opinions, by forcing people into adhering to “statements” and “positions” that were nothing more than passing conversation or exploration of a topic. This was part of my thinking behind asking the college kids about privacy. In real life, there are normally various social transitions that call for stepping away or de-emphasizing some aspects of one’s life, in favor of new ones. It doesn’t make the past behaviors and activities go away, but the combination of search engines and infinite, cheap storage is likely to keep some aspects of these folks’ “past” life in their face for a long time, which may make it harder to move forward.
Someone mentioned the idea of “privacy parity”, i.e. you can ask for my data, but I can see that you’re asking for it, sort of like being able to find out when someone has requested your credit report. This is interesting, but there are substantial asymmetries in the value of that information to each party. A bit of parity that would be very interesting would be a feed of who’s seen my site URLs and excerpts in a search results page — not the clickthrough, which I can already see, but when it’s turned up on the page at all.
A few of us continued a sidebar discussion on search, social networks, trust, and attention networks, and eventually got kicked out into the lobby where we were free to speculate on Google’s plan for world domination next to a huge globe in the SRI lobby. I haven’t bumped into anyone yet doing work on integrating the attention, social, and trust data into search. Doing this on a Google/Yahoo/Microsoft scale looks hard, because of the sheer scale, but I’m getting the sense that doing a custom search engine biased by the social / attention data inputs for a limited subject domain (100-1000′sGB) and a relatively small social / atttention network (1000′s – people you know or have heard of) is becoming more reasonable because of cheaper / faster / better IT hardware and because more of the data is actually becoming available now. Still chewing on this. I just came across Danah Boyd’s post on attention networks vs social networks yesterday, which concisely explains the directed vs undirected graph property which underlies part of the ranking algorithms that would be needed.
Perhaps someone’s already done this for a research project.
If Google Desktop were open source, it might be a logical place to insert a modified ranking algorithm based on attention, tags and social networks and also to insert an SSE-style interface to allow peer-to-peer federation of local search queries and results. This would keep the search index data local to “me” and “my documents”, but allow sharing with other clients that I trust. Perhaps it’s just an age thing. The college kids didn’t seem to mind having all of their documents on public servers, are counting on robots.txt to keep them out the global search engines, and apparently think that access controls on sites like Facebook will keep their personal postings out the of the public realm. For me, I still think twice sometimes about posting to my del.icio.us bookmarks list and keep anything really critical on physical media in a safe deposit box in a vault. So while I’ve gone from being Ungoogleable to Google search stardom, there’s a good portion of my digital life which is “dark matter” to the search engines. I’d like to find a way to fix it for myself, and share information with people I trust, and refine my searches over the public internet, but without having to give Google or anyone else all of my personal data.
 |
 |
Took a few photos, photos from others will probably turn up tagged with “brainjams“
Update 12-04-2005 21:15 PST: Audio from the Youth Panel discussion on Chris’s blog
KRON-4 television piece on BrainJams. Looks like I missed the hula hoop part in the morning. I also seem to have mostly missed the non-profit community-oriented discussion, as you can see from my notes. Perhaps that’s what was going on when we got kicked out into the lobby for being too loud…
Ho John Lee | November 3rd, 2005 | 1 comment

I came across a cryptic link to mturk.com on supr.c.ilio.us, asking “Isn’t that how the Matrix came to be?”
Amazon Mechanical Turk provides a web services API for computers to integrate “artificial, artificial intelligence” directly into their processing by making requests of humans. Developers use the Amazon Mechanical Turk web services API to submit tasks to the Amazon Mechanical Turk web site, approve completed tasks, and incorporate the answers into their software applications. To the application, the transaction looks very much like any remote procedure call: the application sends the request, and the service returns the results. In reality, a network of humans fuels this artificial, artificial intelligence by coming to the web site, searching for and completing tasks, and receiving payment for their work.
All software developers need to do is write normal code. The pseudo code below illustrates how simple this can be.
read (photo);
photoContainsHuman = callMechanicalTurk(photo);
if (photoContainsHuman == TRUE) {
acceptPhoto;
}
else {
rejectPhoto;
}
Given the source of the link, I was a little skeptical at first read, but it appears to be a legitimate beta project that just launched yesterday at Amazon. At least, the documentation links point back into Amazon Web Services, and at least one person seems to know someone there.
This is an interesting idea that should find some useful applications. Spammers have supposedly been doing something like this to defeat the image-based Turing tests used to screen comment posting systems, offering access to porn in exchange for solving the puzzles, and there are other anecdotes of using low cost offshore labor for similar tasks. Having a simpler web service interface for finding a human key operator somewhere will probably allow smaller and more experimental applications to emerge.
Update 11-04-2005 08:09 PST – Slashdot, TechDirt, Google Blogoscoped on Mechanical Turk, pointer to BoingBoing on porn puzzles and spam, captcha.net
|
|