Yahoo goes after more tagging assets, buys del.icio.us
Yahoo continues down the path of more tagging and more collaborative content. Having already bought Flickr, Yahoo is acquiring del.icio.us this morning (terms undisclosed):
From Joshua Schachter at the del.icio.us blog:
We’re proud to announce that del.icio.us has joined the Yahoo! family. Together we’ll continue to improve how people discover, remember and share on the Internet, with a big emphasis on the power of community. We’re excited to be working with the Yahoo! Search team - they definitely get social systems and their potential to change the web. (We’re also excited to be joining our fraternal twin Flickr!)
From Jeremy Zawodny at Yahoo Search Blog:
And just like we’ve done with Flickr, we plan to give del.icio.us the resources, support, and room it needs to continue growing the service and community. Finally, don’t be surprised if you see My Web and del.icio.us borrow a few ideas from each other in the future.
From Lisa McMillan, an enthusiastic user of all 3 services (comment on the del.icio.us blog):
Yahoo that’s delicious! I live here. I live in flickr. I live at yahoo. This is insane. You deserve this success dude. Just please g-d don’t let me lose my bookmarks
I’m practically my own search engine. LOL
Tagged bookmarking sites such as del.icio.us can provide a rich source of input data for developing contextual and topical search. The early adopters that have used del.icio.us up to this point are unlikely to bookmark spam or very uninteresting pages, and the aggregate set of bookmarks and tags is likely to expose clustering of links and related tags which can be used to refine search results by improving estimates of user intent. Individuals are becoming their own search engine in a very personal, narrow way, which could be coupled to general purpose search engines such as Yahoo or Google.
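To make the clustering idea concrete, here is a toy sketch, not how del.icio.us or Yahoo actually worked. It assumes bookmarks are simple `(user, url, tags)` records; the sample data and the function names `tag_cooccurrence`, `related_tags`, and `rank_urls` are hypothetical. Tag co-occurrence exposes which tags cluster together. An ambiguous query like "python" can then be narrowed to one cluster (programming rather than pets), and URLs can be ranked by how many distinct people bookmarked them under those tags.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical sample bookmarks: (user, url, set of tags).
BOOKMARKS = [
    ("ann", "http://example.com/python-tutorial", {"python", "programming", "tutorial"}),
    ("bob", "http://example.com/python-tutorial", {"python", "tutorial"}),
    ("cat", "http://example.com/snake-care", {"python", "pets", "reptiles"}),
    ("dan", "http://example.com/django", {"python", "programming", "web"}),
    ("eve", "http://example.com/django", {"python", "web"}),
]


def tag_cooccurrence(bookmarks):
    """Count how often each unordered pair of tags appears on the same bookmark."""
    pairs = Counter()
    for _user, _url, tags in bookmarks:
        for a, b in combinations(sorted(tags), 2):
            pairs[(a, b)] += 1
    return pairs


def related_tags(pairs, tag, top_n=3):
    """Return the tags that most often co-occur with `tag`, with their counts."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == tag:
            scores[b] += n
        elif b == tag:
            scores[a] += n
    return scores.most_common(top_n)


def rank_urls(bookmarks, query_tags):
    """Rank URLs by the number of distinct users who tagged them with every query tag."""
    users_per_url = defaultdict(set)
    for user, url, tags in bookmarks:
        if query_tags <= tags:  # subset test: all query tags present
            users_per_url[url].add(user)
    # More distinct users first; break ties alphabetically by URL.
    return sorted(users_per_url, key=lambda u: (-len(users_per_url[u]), u))


pairs = tag_cooccurrence(BOOKMARKS)
print(related_tags(pairs, "python"))
print(rank_urls(BOOKMARKS, {"python", "programming"}))
```

Adding "programming" to the query drops the snake-care page, which is the kind of intent refinement described above. A real system would also need to handle tag synonyms, normalization, and spam.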
I think Google needs to identify resources it can use to incorporate more user feedback into search results. Looking over users’ shoulders via AdSense is interesting but inadequate on its own, because many sites will never be AdSense publishers. Explicit input capturing the user’s intent, whether through tagging, voting, posting, or publishing, is a strong signal of relevance and interest from that user. I think the basic Google philosophy of letting the algorithm do everything is much more scalable, but it looks like it’s time to feed more human input into the algorithms.
In a recent post, I pointed out some work at Yahoo on computing conditional search ranking based on user intent. The range of topics on del.icio.us tends to be predictably biased, but for the areas that it covers well, I’d be looking for some opportunities to improve search results based on what humans thought was interesting. As far as I know, Google doesn’t have any assets in this space. Maybe Blogger or Orkut, but those are very noisy inputs.
This seems like a great move by Yahoo on multiple fronts, and I am very interested to see how this plays out.
See also:
- Personalization, Intent, and modifying PageRank calculations
- Five principles of user generated content - Trust, Attention, Relevance, Authority, and Intent
Update 12-12-2005 12:30 PST: No hard numbers, but something like $10-15MM with earnouts looks plausible. More posts, analysis, and reader comments: Om Malik, John Battelle, Paul Kedrosky.
Tags: search, tagging, del.icio.us, yahoo, google, collaboration, socialsoftware, bookmarking, web2.0, business

December 9th, 2005 at 6:07 pm
Great article, I really enjoyed it.
Google has some pretty sophisticated clustering algorithms to try to do this computationally. Is there really enough user data to make a big difference? After all, isn’t spending millions and millions building a user community a riskier move than spending the same amount on researching algorithms and buying several supercomputers?
Google has started tagging in GMail and in Personalized Bookmarks. Not to mention Google Base… I think they are waiting to see what happens before jumping in with both feet.
Also, what happens when spammers can hire thousands of people in India and China to start spamming these networks? It happened to Usenet, what’s to prevent it from happening again?
You can read more of my thoughts on the acquisition in my response to the news.
December 9th, 2005 at 7:31 pm
Google does have a bookmarking feature with tagging, but it’s kind of hidden away in the personalized search stuff. Not as easy to access as del.icio.us (yet).
December 9th, 2005 at 11:41 pm
Yahoo! has chosen to grow via acquisitions, while Google opts to build its own tools. That aside, they seem to be focusing on two different areas.
Yahoo! is clearly looking to build and extend its lifestyle brand:
1. Flickr = photos
2. del.icio.us = links
3. Upcoming = events
I bet that they’ll go for Odeo next. Sure, they offer a podcasting engine now, but why not incorporate an existing tool?
I agree, it will be interesting to see how this turns out.
December 10th, 2005 at 7:19 am
Ho John Lee, you said, “The early adopters that have used del.icio.us up to this point are unlikely to bookmark spam or very uninteresting pages.”
I think it is a dangerous assumption that this can continue as tagging tries to enter the mainstream. As sites like del.icio.us become more popular, the reward from spamming them increases substantially.
For more on tagging and its potential impact on search, see my previous weblog post, “Questioning tags”:
http://glinden.blogspot.com/2005/04/questioning-tags.html
December 10th, 2005 at 9:08 am
Greg, I agree on the hazards of blindly taking tags at face value, especially as tagging moves from the early adopters to a broader, more visible audience. Jeff also commented on the problem of thousands of spammers coming on board, as happened to Usenet back when the AOL users landed. In the absence of filters, most newsgroups became unusable within weeks.
I think metrics around reputation and relationship to trusted sources can keep the signal to noise ratio higher, at least for using bookmarking and tagging as input to search, since spamming patterns can be identified and blocked over time.
In the limited volume of traffic I see here, there are several prominent clusters of spam commenter topics and network signatures which can be used to filter most of the noise automatically. I’m not sure what the del.icio.us team put in place a while back after spam started floating to the “Popular” list but it’s clearly helped reduce the noise. Plus, all posts are tied to a user name, providing another tool for screening.
Applications that rely on real-time or near-real-time input (buzz trackers, etc.) have a harder problem, since they generally have to make a decision on the spot.
Bookmarking and tagging by an honest user should be a useful indicator of relevance and should reveal some clues about topical intent as well. Up to this point, the average del.icio.us (or any tagging service) user is generally “honest” and somewhat well informed. A rapid expansion to millions of new users on Yahoo may dilute the average quality of the incoming bookmarks, and is likely to amplify the popularity of the popular bookmarks, since they are both “interesting” and visible. It might work out that all bookmarks with more than a few votes are worth paying attention to. This would be analogous to the Bloglines metrics around the number of feeds with more than N subscribers; hardly any feeds have more than 50 subscribers.
Perhaps there are natural size limits for a useful social tagging community within a given topic domain, below which there aren’t enough eyeballs to capture interesting content and provide enough votes, and above which the items, tags, and taggers need to be clustered down to manageable size.
Still thinking about this. Also continuing to look at how to manage personal bookmarking, tag and attention data, with an eye toward search. I’m sure it’s on a lot of people’s minds this weekend.
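As a rough illustration of combining the “more than a few votes” idea with the reputation metrics mentioned above, here is a minimal sketch. It assumes each user already has a trust score between 0 and 1. The scores, sample votes, threshold, and the functions `weighted_votes` and `worth_attention` are all hypothetical, and this is not a description of any filter del.icio.us actually used.

```python
from collections import defaultdict

# Hypothetical per-user trust scores in [0, 1], e.g. derived from account
# history or links to trusted sources. How these are computed is assumed.
TRUST = {"ann": 1.0, "bob": 0.9, "cat": 0.8, "spam1": 0.05, "spam2": 0.05, "spam3": 0.05}

# Hypothetical (user, url) bookmark "votes".
VOTES = [
    ("ann", "http://example.com/good"),
    ("bob", "http://example.com/good"),
    ("cat", "http://example.com/good"),
    ("spam1", "http://example.com/pills"),
    ("spam2", "http://example.com/pills"),
    ("spam3", "http://example.com/pills"),
    ("ann", "http://example.com/niche"),
]


def weighted_votes(votes, trust, default_trust=0.1):
    """Sum the trust of distinct users per URL; repeat votes by one user count once."""
    voters = defaultdict(set)
    for user, url in votes:
        voters[url].add(user)
    # Unknown users get a low default trust (an assumed policy).
    return {url: sum(trust.get(u, default_trust) for u in users) for url, users in voters.items()}


def worth_attention(votes, trust, threshold=2.0):
    """Return URLs whose trust-weighted vote total meets the threshold, sorted."""
    scores = weighted_votes(votes, trust)
    return sorted(url for url, score in scores.items() if score >= threshold)


print(worth_attention(VOTES, TRUST))
```

With a raw count, the spam URL ties the good one at three votes. Weighting by reputation separates them, and the single-vote niche bookmark falls below the threshold, which is the trade-off a fixed vote cutoff would make.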
December 11th, 2005 at 7:41 pm
How (and where) to download your del.icio.us bookmarks
Last Friday’s announcement that Yahoo is buying del.icio.us has probably got more than a few people thinking about the future of the service and whether they want to keep using it. In any case, as with all of the interesting and useful web serv…