site admin | April 23rd, 2009 | Comments are closed
These are my links for April 20th through April 23rd:
- What I’ve Learned from Hacker News – Paul Graham on social dynamics and managing Hacker News: user-submitted comments and ranking (voting up/down), editorial intervention and moderators, and project goals.
- SEOmoz | Reddit, Stumbleupon, Del.icio.us and Hacker News Algorithms Exposed! – Looking at variations on algorithms for ranking items on social news aggregators
- NGINX + PHP-FPM + APC = Awesome – Walkthrough on setting up a cached PHP web server on nginx with APC.
- Particletree » PHP Quick Profiler – Lightweight tool for profiling PHP code.
- MySQL’s Full-Text Formulas – Database Journal –
- http://www.acapela-group.com/text-to-speech-interactive-demo.html – Online text-to-speech demo, with various male and female speakers, plus a few translations.
- Dealing with Duplicate Person Data – Proud to Use Perl – Classifying likely duplicate entries in name/address contact data using Levenshtein distance and tables of nickname synonym and assigned distance weights.
- Web Security Horror Stories: The Director’s Cut at <head> – Presentation slides from a talk by Simon Willison on cross site scripting, SQL injection, referer forgery, and clickjacking attacks on web applications.
site admin | April 9th, 2009 | Comments are closed
These are my links for April 9th from 08:07 to 17:53:
- IP address geolocation SQL database – IP address geolocation with MySQL by Marc-Andre Caron. He's done all the necessary legwork to solve this problem, putting together a free, monthly-updated MySQL dataset that will allow you to derive country, region, city, zip, latitude, and longitude from an IP address.
- Del.icio.us Finally Gets Some Respect from Yahoo – Probably Too Late – ReadWriteWeb –
- In the Event That You Have Accidentally Swallowed the Higgs Boson by Michael Rottman – The Morning News – "7. Do you feel protons decaying? Grand Unification may be occurring near your vital organs. "
- FT.com / Companies / UK companies – Dotcom veterans in Twitter ‘brains trust’ – "Mr Read has brought together a “brains trust” of advisers to Twitter Partners, including Brent Hoberman and Martha Lane Fox, founders of Lastminute.com; Saul Klein, a partner at Index Ventures, the London venture capitalists; and Toby Coppel, the former European vice-president at Yahoo."
- byteonic.com » What you cannot do using Java in Google App Engine – List of some restrictions on Java code running on GAE
The automatic nightly link posts from del.icio.us stopped working properly sometime last year. The links would get posted, but had extra “\n” inserted at every line break. Here’s an example. An unexpected side effect of having “ugly” link posts is that I mostly stopped posting links to del.icio.us for a while.
As part of the recent blog platform update, I’ve switched from the del.icio.us “experimental” nightly blog posting to Postalicious, which seems to be working nicely. You can see the new link post style (and the old ones too, unless I get around to cleaning them up) here.
The nice visualization of my del.icio.us tags reminds me:
If you normally read my blog in a browser, you might want to take a look at the “Recent Links” section.
That category is automatically generated from del.icio.us. I got tired of seeing post after post titled “Links for xxxx-xx-xx” on the home page, so they don’t show up there, and they don’t show up on the recent posts listing either.
I normally try to add a descriptive comment and tags, so the links section is somewhat readable. At the moment, del.icio.us has a maximum of something like 255 characters for the comment, which can lead to terse and/or truncated prose, though.
If you normally read this site through a feedreader such as Bloglines or Rojo, the link posts are included in the full feed, but excluded from the index-page-only feed.

Here is a visualization of my del.icio.us tags, by Kunal Anand, who’s been collecting del.icio.us tags and turning them into interesting pictures. Here’s the short explanation he sent along:
1. Each dot represents a tag (aka a node)
2. Each line represents an intersection between tags
3. The center of the visualization (denoted by a colored gradient), represents the heavy set of intersections
It appears that I have a fairly consistent set of regularly used tags, and a fairly even distribution of less used tags that intersect with the most common ones.
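The structure Kunal describes, tags as nodes and intersections as lines, can be approximated by counting how often pairs of tags appear on the same bookmark. The following is only a rough sketch of that idea, not his actual method; the function name `tag_graph` and the sample data are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def tag_graph(bookmarks):
    """Return (tag usage counts, tag-pair co-occurrence counts).

    `bookmarks` is an iterable of tag lists, one list per bookmark.
    Each pair of tags that appears on the same bookmark counts as
    one "intersection" between those two tags.
    """
    node_counts = Counter()
    edge_counts = Counter()
    for tags in bookmarks:
        unique = sorted(set(tags))  # ignore duplicate tags, fix pair order
        node_counts.update(unique)
        edge_counts.update(combinations(unique, 2))
    return node_counts, edge_counts


# Hypothetical sample data: each inner list is the tags on one bookmark.
sample = [
    ["python", "search", "tagging"],
    ["search", "tagging"],
    ["python", "mysql"],
]
nodes, edges = tag_graph(sample)
print(nodes.most_common())
print(edges.most_common())
```

Tags with high counts and many intersections would end up near the dense center of a picture like this one, while rarely used tags sit at the edges.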
For comparison, see visualizations of tags from Brad Feld, Tom Coates, and Pete Freitag.
Del.icio.us is testing out private bookmarks now.
I’ve been playing with a private instance of Scuttle ever since del.icio.us was purchased by Yahoo a few months back, but have continued using del.icio.us for posting public links anyway.
My del.icio.us links are automatically posted here (except when one end or the other is out of service for some reason), but I don’t know whether that would include the private ones. I also don’t know exactly where the private bookmarks might be visible, aside from in one’s own account. I’ll have to give it a try.
This evening’s SearchSIG featured a panel discussion on tagging and social bookmarking.
L-R: Joshua Schachter (del.icio.us), Kevin Rose (Digg), Michael Tanne (Wink), Manish Chandra (Kaboodle)
Charlene Li (from Forrester) moderated.
The room at Yahoo was full — standing room only. A quick show of hands indicated nearly everyone in the room had used tagging services before.
Some discussion about “how can we trust the tags”, tag spam (Charlene’s term was “spag”), discerning intent from user tagging and other actions, and the problems of tagging users and the range of social gestures built into the various systems.
Joshua used the example of receiving LinkedIn connection requests from someone whose name you don’t recognize. You don’t want to accept it, because you don’t know who it is. You don’t want to reject it, because it would be rude, and you might actually know them. So he has a huge backlog of random connection requests piling up in his inbox.
Someone in the audience commented that between keyworded search and tagging, people are starting to lose grammar, and come up with “restaurant san francisco cool” instead of complete sentences.
Participation rates: Wink assumed 5-8% of their users would tag, actual is 30-40% active (but they’re just launching and are picking up a lot of knowledgeable early adopters from word of mouth). Digg has around 20% of their traffic from registered users (they don’t exactly tag, just digg). Kevin says Digg has around 140K registered users, generating around 4M pageviews per day.
Charlene wrapped up the Q&A with some predictions for the upcoming year:
1. The rise of some sort of social link and social standing system to “rate” users
2. Some sort of social “disaster” will occur on one of the new services, despite best efforts to prevent social disease from creeping in.
3. Today’s companies are mostly small, smart, startups. In a year there will be a different cast of characters from mainstream media, search engines, bigger players.
Thanks to Jeff Clavier and Dave McClure for organizing another great session.
Ho John Lee | December 11th, 2005 | 2 comments
Last Friday’s announcement that Yahoo is buying del.icio.us has probably got more than a few people thinking about the future of the service and whether they want to keep using it. In any case, as with all of the interesting and useful web services out there, it’s good to take time now and then to back up your personal data, in case something goes sideways and the service becomes unavailable or unusable for whatever reason.
I’m personally planning on continuing to use del.icio.us, although there are a number of interesting tagged bookmarking alternatives out there, including running your own.
The first step is to get your personal bookmark data, which can be obtained through the del.icio.us API. You can retrieve all your saved bookmarks at del.icio.us/api/posts/all, which will return an XML file that can be saved to your local system and used as a backup or to import your bookmarks into another web application elsewhere.
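Once the XML file is saved locally, turning it into something another application can import takes only a few lines. The sketch below assumes the export has a `<posts>` root containing `<post>` elements with `href`, `description`, `tag` (space-separated), and `time` attributes; check this against your own file, since the format may differ. The sample data and the `parse_posts` helper are made up for illustration, and fetching the file (which requires logging in) is left out.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the assumed export format; real files may differ.
SAMPLE_XML = """<posts user="example">
  <post href="http://example.com/a" description="Example A"
        tag="search tagging" time="2005-12-09T10:00:00Z"/>
  <post href="http://example.com/b" description="Example B"
        tag="" time="2005-12-10T11:30:00Z"/>
</posts>"""


def parse_posts(xml_text):
    """Convert an exported bookmarks XML string into a list of dicts."""
    root = ET.fromstring(xml_text)
    bookmarks = []
    for post in root.iter("post"):
        bookmarks.append({
            "url": post.get("href"),
            "title": post.get("description", ""),
            # Tags are assumed to be space-separated in one attribute.
            "tags": post.get("tag", "").split(),
            "time": post.get("time", ""),
        })
    return bookmarks


bookmarks = parse_posts(SAMPLE_XML)
for bookmark in bookmarks:
    print(bookmark["url"], bookmark["tags"])
```

For a saved backup, read the file contents and pass them to `parse_posts` in place of `SAMPLE_XML`.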
The next step is to decide what you want to do with the data. Some alternative tagged bookmarking solutions include:
The following services are based on open source projects, so you can (or in some cases have to) run your own bookmarking system.
Yahoo already runs MyWeb2.0, which presumably will begin to merge with del.icio.us at some point. It has a lot of interesting features, but hasn’t had enough to get me to switch over up to this point. I’ve been wanting private bookmarks and tags on del.icio.us for a while, although I think I’ll be moving those off my desktop onto a roll-your-own server solution.
Any more suggestions? Reply in the comments and I’ll pull them up to the main post.
Here’s an extensive list of free bookmark managers at lights.com (via David Beisel)
Ho John Lee | December 9th, 2005 | 6 comments
Yahoo continues down the path of more tagging and more collaborative content. Having already purchased Flickr, this morning they’re acquiring del.icio.us (terms undisclosed):
From Joshua Schachter at the del.icio.us blog:
We’re proud to announce that del.icio.us has joined the Yahoo! family. Together we’ll continue to improve how people discover, remember and share on the Internet, with a big emphasis on the power of community. We’re excited to be working with the Yahoo! Search team – they definitely get social systems and their potential to change the web. (We’re also excited to be joining our fraternal twin Flickr!)
From Jeremy Zawodny at Yahoo Search Blog:
And just like we’ve done with Flickr, we plan to give del.icio.us the resources, support, and room it needs to continue growing the service and community. Finally, don’t be surprised if you see My Web and del.icio.us borrow a few ideas from each other in the future.
From Lisa McMillan, an enthusiastic user of all 3 services (comment on the del.icio.us blog):
Yahoo that’s delicious! I live here. I live in flickr. I live at yahoo. This is insane. You deserve this success dude. Just please g-d don’t let me lose my bookmarks I’m practically my own search engine. LOL
Tagged bookmarking sites such as del.icio.us can provide a rich source of input data for developing contextual and topical search. The early adopters that have used del.icio.us up to this point are unlikely to bookmark spam or very uninteresting pages, and the aggregate set of bookmarks and tags is likely to expose clustering of links and related tags which can be used to refine search results by improving estimates of user intent. Individuals are becoming their own search engine in a very personal, narrow way, which could be coupled to general purpose search engines such as Yahoo or Google.
I think Google needs to identify resources it can use to incorporate more user feedback into search results. Looking over the users’ shoulders via AdSense is interesting but inadequate on its own because there are a lot of sites that will never be AdSense publishers. Explicit input capturing the user’s intent, whether through tagging, voting, posting, publishing, is a strong indication of relevance and interest by that user. I think the basic Google philosophy of letting the algorithm do everything is much more scalable, but it looks like time to capture more human input into the algorithms.
In a recent post, I pointed out some work at Yahoo on computing conditional search ranking based on user intent. The range of topics on del.icio.us tends to be predictably biased, but for the areas that it covers well, I’d be looking for some opportunities to improve search results based on what humans thought was interesting. As far as I know, Google doesn’t have any assets in this space. Maybe Blogger or Orkut, but those are very noisy inputs.
This seems like a great move by Yahoo on multiple fronts, and I am very interested to see how this plays out.
See also:
Update 12-12-2005 12:30 PST: No hard numbers, but something like $10-15MM with earnouts looks plausible. More posts, analysis, and reader comments: Om Malik, John Battelle, Paul Kedrosky.
Ho John Lee | September 23rd, 2005 | 5 comments
This note captures some thoughts in progress, feel free to chip in with your comments…
Here’s a feature wish list for link tagging:
- Private-only links – only I can see them at all
- Group-only links – only members of the group can see them
- Group-only tags – only members of the group can see my application of a set of tags
- Unattributed links – link counts and tags are visible to the public, but not the contributor or comments
Tagged bookmarking services such as del.icio.us allow individuals to save and organize their own collection of web links, along with user-defined short descriptions and tags. This is already convenient for the individual user, but the interesting part comes from being able to search the entire universe of saved bookmarks by user-defined tags as an alternative or adjunct to conventional search engines.
Bits of collective wisdom embodied in a community can be captured through aggregating user actions representing their attention, i.e. the click streams, bookmarks, tags, and other incremental choices that are incidental to whatever they happened to be doing online. The results of a tag search are typically far fewer, but are often more focused or topically relevant than those from a search on Google or Yahoo.
It’s also interesting to browse the bookmarks of other people who have tagged or saved similar items. To some extent the bookmark and tag collection can be treated as a proxy for that person’s set of interests and attention.
In a similar fashion, clicking on a link (or actually purchasing an item) can be treated as an indication of interest. This is part of what makes Google Adsense, Yahoo Publisher Network, and Amazon’s Recommendations work. The individual decisions are incidental to any one person’s experience, and taken on their own have little value, but can be combined to form information sets which are mutually beneficial to the individual and the aggregator. Web 2.0 thrives on the sharing of “privately useless but socially valuable” information, the contribution of individuals toward a shared good.
In the case of bookmarking services, the exchange of value is: I get a convenient way to save my links, and del.icio.us gets my link and tag data to be shared with other users.
One problem I run into regularly is that everything is public on del.icio.us. For most links I add, I am happy to share them, along with the fact that I looked at them, cared to save them, and any comments and tags I might add. Del.icio.us starts out with the assumption that everyone who bookmarks something there wants to share it. As I use it more regularly, though, I sometimes find situations where I want to save something, but not necessarily in public. Typically I either
a) don’t want to make the URL visible to the public, or
b) don’t mind sharing the link, but don’t want to leave a detailed trail open to the public.
The first case, in which I’d like to save a link for my private use, is arguably just private information and shouldn’t actually be in a “social bookmarks” system to begin with. However, there is a social variant of the private link, which is when I’d like to share my link data with a group, but not all users. This might be people such as members of a project team, or family or friends. It’s analogous to the various photo sharing models, in which photos are typically shared to the public, or with varying systems of restrictions.
The second case, in which I’m willing to share my link data, but would like to do so without attribution, is interesting. In thinking about my link bookmarking, I find that I’m actually willing to share my link, and possibly my tag and comment data, but don’t want to have someone browse my bookmark list and find the aggregated collection there, as it probably introduces too much transparency into what I’m working on. At some point in time, it’s also likely that I would be happy to make the link data fully visible, tags, comments, and all, perhaps after some project or activity is completed and the presence of that information is no longer as sensitive.
The feature wish list above would address some of the not-quite-public link data problems, while continuing to accrete community contributed data. In the meantime, I’m still accumulating links back behind the firewall.
Another useful change to existing systems would be to aggregate tag or search results based on a selected set of users to improve relevance. This is along the lines of Memeorandum, which uses a selected set of more-authoritative blogs as a starting point to gauge relevance of blog posts. In the tagged search case, it would be interesting if I could select a number of people as “better” or “more relevant” at generating useful links, and return search results with ranking biased toward search nodes that were in the neighborhood of links that were tagged by my preferred community of taggers.
It’s possible to subscribe to specific tags or users on del.icio.us, but what I had in mind was more like being able to tag the users as “favorites” or by topic and then rank my search results based on their link and tag neighborhoods. I don’t actually want to look at all of their bookmarks all the time.
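To make the idea concrete, here is a minimal sketch of ranking tag search results with a bias toward a hand-picked set of taggers. It is only an illustration of the weighting, not how del.icio.us or Memeorandum works; the function name, the `boost` value, and the sample data are all assumptions.

```python
from collections import defaultdict


def rank_by_trusted_taggers(bookmarks, query_tag, trusted, boost=3.0):
    """Rank URLs carrying `query_tag`, favoring trusted taggers.

    `bookmarks` is an iterable of (user, url, tags) tuples. Each
    matching bookmark adds 1.0 to the URL's score, or `boost` if the
    user is in `trusted`. Ties are broken by URL for stable output.
    """
    scores = defaultdict(float)
    for user, url, tags in bookmarks:
        if query_tag in tags:
            scores[url] += boost if user in trusted else 1.0
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data.
sample = [
    ("alice", "http://example.com/a", {"search"}),
    ("bob", "http://example.com/b", {"search"}),
    ("carol", "http://example.com/b", {"search", "tagging"}),
    ("dave", "http://example.com/c", {"mysql"}),
]

# Plain popularity ranking versus ranking biased toward alice.
unbiased = rank_by_trusted_taggers(sample, "search", trusted=set())
ranked = rank_by_trusted_taggers(sample, "search", trusted={"alice"})
print(unbiased)
print(ranked)
```

With no trusted users the more frequently bookmarked link comes first; trusting alice moves her link to the top without my ever having to read through her full bookmark list.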
Something similar might also work with search result page clickthroughs. These sorts of approaches seem attractive, but also seem too messy to scale very well.
Unattributed links may be too vulnerable to spamming to be useful. One possible fix could be to filter unattributed links based on the authority of the source, without disclosing the source to the public.
I was at the TechCrunch meetup last night and didn’t have a chance to talk with the del.icio.us folks, who were apparently around somewhere, but Ofer Ben-Shachar from Raw Sugar did mention that they were looking at providing some sort of group-only access option for their tagging system.
A lot of this could be hacked onto the existing systems to solve the end user problem easily, but some of the initial approaches that come to mind start to break the social value creation, and I think those could be preserved while making better provisions for “private” or “group” restrictions by working on the platform side.
Ho John Lee | September 11th, 2005 | Leave a comment

Revealicious (via Social Software Weblog):
Revealicious is a set of graphic visualisations for your del.icio.us account that allow you to browse, search and select tags, as well as viewing posts matching them.
There are three different visualization modes, SpaceNav, TagsCloud, and Grouper, which depict the relationship of tag use and frequency among your del.icio.us bookmark collection.
The Revealicious page also has links to a previous project by one of the authors, DeliciousSoup, and a post elsewhere with an extensive list of del.icio.us tools.
Del.icio.us is interesting / frustrating in that it has almost no user interface, but exposes enough of an API for 3rd parties to try building their own applications on top of the data.
I don’t find these visualizations particularly useful on my own bookmarks, but they point toward interesting ways of exploring large sets of tags and other link relationships. Plus they look cool.
Ho John Lee | September 1st, 2005 | 1 comment
Rashmi just posted some thoughts about the Lazy Sheep bookmarklet.
From the Lazy Sheep page:
Using the tags and descriptions shared by other del.icio.us users, Lazy Sheep makes tagging a page a one-click operation. In order to best suit any user, Lazy Sheep also includes a comprehensive set of options that can be configured to your exact specifications.
Rashmi’s comments:
It makes some sense at the individual level – I can gain from the wisdom of the others, without doing any work. But even at the individual level, there are disadvantages. First, the auto-tags might not capture my idiosyncratic associations (reducing findability when I look for the article later on). Second, it replaces the self-knowledge with social knowledge. Instead of a moment of reflection on my current interests, I simply find out how others think about the topic. Social knowledge in the context of self-knowledge is a beautiful thing, mere social knowledge just encourages the sheep mentality (which is the point of the bookmarklet I guess).
At the social level (which is what worries me more), if enough people started doing this, the value of del.icio.us would be diluted. We would lose some of the richness of the longtail, and just reinforce what the majority is saying. The first few people who tagged the article would set the trend – others would merely follow.
I seem to be having a lot of conversations with people lately about tagging and group search. I think of the auto tagging embodied in Lazy Sheep as an amplifier for the biases of the first few taggers. A less problematic solution would be to only use your own tags as input to the Lazy Sheep, or perhaps to select some “similar-thinking” taggers as a starting point.
I’ve been thinking about something like the latter for building a better personal search and tagging system. I’d like to be able to bias the search results based on the attention choices of people I think might be relevant, not the entire world. On the other hand, I don’t want to give up my entire clickstream for public consumption.
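As a rough sketch of the difference, the function below suggests tags for a URL from other people's bookmarks, optionally restricted to a chosen set of “similar-thinking” taggers. This is not how Lazy Sheep is implemented; the function name, parameters, and sample data are made up for illustration.

```python
from collections import Counter


def suggest_tags(url, bookmarks, allowed_users=None, limit=3):
    """Suggest the most common tags other users applied to `url`.

    `bookmarks` is an iterable of (user, url, tags) tuples. If
    `allowed_users` is given, only those users' tags are counted.
    Ties are broken alphabetically for stable output.
    """
    counts = Counter()
    for user, bookmarked_url, tags in bookmarks:
        if bookmarked_url != url:
            continue
        if allowed_users is not None and user not in allowed_users:
            continue
        counts.update(set(tags))  # count each tag once per bookmark
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:limit]]


# Hypothetical sample data: two early taggers agree, one differs.
sample = [
    ("alice", "http://example.com/a", ["web2.0", "cool"]),
    ("bob", "http://example.com/a", ["web2.0", "cool"]),
    ("carol", "http://example.com/a", ["folksonomy", "research"]),
]

everyone = suggest_tags("http://example.com/a", sample)
similar = suggest_tags("http://example.com/a", sample, allowed_users={"carol"})
print(everyone)
print(similar)
```

Counting everyone's tags reproduces whatever the majority said first, which is the amplifier effect; limiting the input to selected taggers keeps the suggestions closer to my own vocabulary.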
An aside on the tagging bias issue: Hal Abelson mentioned to me the other day that “IRC” and “Mouse” are closely related in some tag relatedness searches, because “IRC” is associated with “Chat”, “chat” is French for “cat”, and “cat” is related to “Mouse”.
In my case, I consciously tend not to look at what tags have already been applied, because I’m hoping in the future to apply some sort of clustering or other relatedness filters on my own bookmarks to improve searches if I eventually accumulate enough data and motivation.
I think auto tagging can be very helpful, but it might be like using PowerPoint templates: after a while everything starts turning out the same way if you’re not careful.
Del.icio.us now has a search feature!
If you are logged in, hit “search” and then be sure you check off “entire site.”
It’s slow, but far better than nothing or the various hacks that have been floating around in the absence of a search function for del.icio.us.
Also added: Recommendations
If you have more than ten urls saved in a tag, you will be offered several urls as well as other people’s tags that the system thinks you will be interested in. The URL pages also now offer the chance to see other related URLs.