The previous entry was about “powerlaws in the use of tags on del.icio.us”. Then, at http://del.icio.us/tag/powerlaw, I found Pietro Speroni’s great post Tagclouds and cultural changes, which (also) introduces cloudalicious, a one-night project by Terrell Russell. Cloudalicious shows how the tags used for any page on del.icio.us evolve over time. Very very cool!!!
I tried to find a URL showing a non-converging behaviour but I failed. (Pietro already provided some examples of sites presenting interesting trends in tag use.) Are you able to find at least one controversial URL? A site for which there was a great shift over time in the tags used for it.
For your information, I already tried sites tagged on del.icio.us under controversial tags (such as abortion, scientology, jew). I tried microsoft.com, as I was thinking many people would have tagged it as evil, but this is not the case: in general people tend to tag what they like and less what they don’t like, in order not to increase its visibility. So I tried “terri schiavo blog”, which was very visible for a short period of time; I suspected that the “tasteless” or “awful” tags would be much more frequent and growing over time, but this is not the case either.
The only one with a little bit of variance over time I was able to find is boingboing.net. See cloudalicious for http://boingboing.net. Del.icio.users seem to recognize it as a news site as time passes by. And it also seems that Del.icio.users are moving from “blogs” to “blog” as tag (common pattern or just for boingboing?).
There is some variance also with http://del.icio.us itself: see cloudalicious for http://del.icio.us
So I just repeat the small challenge: Can you find a URL that presents non-converging tag use?
Small suggestion for Terrell Russell (I write it here since I was not able to find his email address on his web site). [I’m sure he has probably already figured out this suggestion by himself, since he was so good as to put together a great tool in one night!]
At the moment the Cloudalicious interface asks for a del.icio.us URL: “These URLs can be found at del.icio.us – they’re the red ‘and X other people’ links” (for example, http://del.icio.us/url/ec08a8ddfda4f2f9cad3a142dc49e23b represents http://boingboing.net/).
ec08a8ddfda4f2f9cad3a142dc49e23b is the md5sum of http://boingboing.net/
There are 2 easy ways to obtain it automatically: (1) run md5sum on the server, (2) use http://del.icio.us/url?url=http://… (in which http://… can be replaced by the website we want to cloudalicious).
In this way, users could enter in the Cloudalicious interface the real URL they are interested in (http://boingboing.net) and not the harder to find one (http://del.icio.us/url/ec08a8ddfda4f2f9cad3a142dc49e23b).
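Option (1) can be sketched in a few lines of Python. This is only a minimal sketch, assuming (as described above) that the identifier in the del.icio.us page address is the plain MD5 hex digest of the bookmarked URL string; the function name `delicious_url_page` is made up for illustration and is not part of Cloudalicious or del.icio.us.

```python
import hashlib


def delicious_url_page(url):
    """Build the del.icio.us page address for a bookmarked URL.

    Assumption (from the post): the identifier is the MD5 hex digest
    of the URL string exactly as it was bookmarked.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return "http://del.icio.us/url/" + digest


# The digest depends on every character, so the trailing slash matters:
with_slash = delicious_url_page("http://boingboing.net/")
without_slash = delicious_url_page("http://boingboing.net")
print(with_slash)
print(without_slash)
```

Since "http://boingboing.net/" and "http://boingboing.net" hash to different values, an interface taking the real URL would probably have to decide how to normalize what the user types before hashing it.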
A bookmarklet and a greasemonkey extension (working on the site the user is browsing) are left as an easy exercise for the reader as well ;-)
Lastly, let me mention that one of the key points of Clay Shirky in Ontology is Overrated: Categories, Links, and Tags, which is also present in Pietro’s post, is that the correct way of categorizing something does not exist (the initial Yahoo! approach tried to force this and failed, and librarians still (must) try to adopt this simplifying but wrong assumption). Instead there are as many correct ways of categorizing a thing as there are users. This resonates with my study on controversial users on Epinions (pdf): the idea that there is a global value of trustworthiness/reputation for every user/peer in the system does not make sense, but still most of the papers in the reputation/trust literature start with this wrong and misleading assumption.
UPDATE: I just found it now, but Pietro in On Tag Clouds, Metric, Tag Sets and Power Laws was already mentioning that the paper by Clay Shirky “Power Laws, Weblogs, and Inequality” started to be tagged as longtail only after the article from Wired: The Long Tail came out. See cloudalicious for http://www.shirky.com/writings/powerlaw_weblog.html.
I read the wonderful Ontology is Overrated: Categories, Links, and Tags by Clay Shirky (highly recommended! Read it all!). Near the end, he speaks about “Tag Distributions on del.icio.us” and shows a graph that resembles a powerlaw (even if this is about only 2 hours of activity of 64 del.icio.users). After 2 weeks of powerlaws, I see powerlaws everywhere and I thought “let’s try to test the hypothesis on a bigger dataset from del.icio.us”. Well, a few googling-minutes told me that many people had already had this idea and had already performed tests on del.icio.us.
And of course many of them can be found looking at http://del.icio.us/tag/powerlaw (the del.icio.us page that shows all the URLs tagged under “powerlaw”) [this is kind of uber-cool-self-referentialism].
Among the many, I just cite http://www.cozy.org/d/ (from which the image shown here is taken), where 84 popular URLs are studied and shown to exhibit a powerlaw structure (in the tags used for them). I suspect the value of del.icio.us can be found in the long tail of tagging as well.
Each dot on the log-log charts represents a tag. The most used tag appears to the left while the least used appears to the right. All charts have the same x and y range, .5 to 1350; so the slope of these lines is about -1.
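To check the "slope is about -1" reading on your own tag counts, one rough option is a least-squares line through the (log rank, log count) points. The sketch below is only an illustration: the function name `loglog_slope` is hypothetical, the sample counts are synthetic (made up to follow count = 1350 / rank exactly) rather than real del.icio.us data, and a straight-line fit on a log-log plot is a crude estimator that can look convincing even when the data is not really a power law.

```python
import math


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank).

    counts: how many times each tag was used for one URL.
    Rank 1 is the most used tag, as in the charts described above.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two tags with a positive count")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for count in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic counts (NOT real data): an idealized count = 1350 / rank curve.
tag_counts = [1350.0 / rank for rank in range(1, 51)]
slope = loglog_slope(tag_counts)
print(round(slope, 3))
```

On real tag counts the slope will of course not be exactly -1, and the long tail of tags used only once tends to bend the line.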
Some weeks ago, Tantek introduced a new microformat, hReview.
We are pleased to announce the first public draft (v0.1) of hReview, jointly co-authored by representatives from America Online, CommerceNet Labs, Microsoft, Six Apart, Technorati, and Yahoo!. hReview is an open microformat standard for publishing and indexing distributed reviews on the Web. This standard enables users to contribute, identify, and aggregate review content on their own web sites and blogs as well as on community sites.
I didn’t have time yet to dig into it but it is good that they analyzed previous attempts (I was trying to use RVW by Alf Eaton and to keep my list on Allconsuming but I didn’t put too much effort into this) and that they ask for Feedback; almost all the links are to Wikipages so you can edit them directly there.
In general I really appreciate the work of Technorati (I also wrote a paper backing their proposal of VoteLinks, submitted to Web Intelligence 2005: “Page-reRank: using trusted links to re-rank authority” (pdf)).
Some other links I’ll try to digest later on: jluster on hreview, hreview on technorati, hreview on del.icio.us, organizedshopping on hreview; adriancuthbert suggested using this_is_an_hreview as a common tag (tagspace?).
It would be great to have this format widely adopted, so that the amount of decentralized published reviews will soon become huge and I will have a large amount of data for what I’m working on in my PhD: Trust-aware decentralized Recommender Systems. If interested, check my (a bit outdated) PhD proposal at my papers page.
I tend to be enthusiastic about folksonomies and forget to consider what they are good at and what they are not; basically I forget to keep asking myself questions instead of blatantly stating “Here we need a folksonomy! Yeahhey!!!”. Anyway, as a sort of balance, you might want to read a post by Gene Smith and one by danah, who are more critical than I am (unfortunately).
The Social Capital and Social Networks – Bridging Boundaries conference seems interesting. Moreover there is no registration fee, and junior scholars, graduate students and assistant professors are invited to apply to attend the conference and receive lodging, meals, and up to $400 in travel expenses. The application deadline was May 5, 2005 (oops). I cannot make it, but if you are in the US, it is worth checking out.
The interface of Rojo is totally unusable (at least to me); I don’t understand the interface metaphors. What attracted me was the ability to tag your friends. So a curiosity: how would you tag me?
Our vision is that the next generation of feed reading requires new forms of organization so we built in the ability to tag your world, your content, your feeds, and even your friends.
We used to organize our bookmarks in folders, then del.icio.us came and we now appreciate folksonomies (flat taxonomies, just a set of free keywords you can attach to URLs). We are used to operating systems that allow us to categorize files (knowledge) in folders; would it make sense to have an operating system that allows us to categorize files based only on a folksonomy (just add keywords to any file, all the files are in a flat pool)? I don’t know.
What I know is that the total lack of competition in the Operating Systems domain (actually just one global monopoly) is depriving all of us of new ideas, new paradigms, progress. If you compare it with the vibrant Web, where a new idea gets implemented and proposed almost daily, you can maybe see how far we would be if there were a free market for Operating Systems.
Anyway, what could we call it? What about FolkOS? FolkOS, the Folksonomy Operating System, I can already see the advertisements…. And, yes, I patented the idea, I got every possible TradeMark and not only on Earth. I patented FolkOS also on Venus and Alpha Centauri (venusians and alphacentaurians beware! Don’t use my patented ideas! I have the best lawyers of the galaxy!).
[I tend to overload my emails with smilies (for expressing when I’m joking) but I don’t like them in blog posts, so I’m not sure my 4 readers understand when I (try to) make a joke. So, just to be sure, this is a joke … I think patenting computational ideas is total nonsense (maybe a video can help in understanding why)].
I’ll be in Trieste at the Abdus Salam ICTP (Unesco funded school) during the next 2 weeks (16 – 28 May 2005) for the School and Workshop on Structure and Function of Complex Networks (I advertised it some time ago and I got accepted). I’m so excited. The list of speakers is simply great (see below) and there are participants from all over the world; in fact, “Although the main purpose of the Centre is to help research workers from developing countries, a limited number of students and post-doctoral scientists from developed countries are also welcome to attend.”
If you happen to be there and want to discuss a bit about blogosphere, trust, reputation, social software, social networks, languages, globalization, … just whatever, please contact me!
A paper of mine titled “Controversial Users demand Local Trust Metrics: an Experimental Study on Epinions.com Community” (pdf) got accepted for the Twentieth National Conference on Artificial Intelligence (AAAI-05)! Cool! The email I received this morning says “Your paper was one of 148 accepted to AAAI-05, out of 803 submissions. AAAI is a highly selective conference, and you are to be congratulated on your paper’s acceptance.” This means the acceptance rate is about 18% (148/803). Let me know if you like/dislike the paper or want to discuss its topic a bit. I think controversiality is an important theme and I think there are too many papers that assume that every user/agent has a global goodness value that is the same for everyone (there are some users that are bad for everyone and the goal of the technique is to spot them). This assumption is unrealistic: just think of Bush or Berlusconi … some people like them (yeah, I know it’s kinda incredible) and some others don’t. My paper hopefully provides some evidence about this intuitive phenomenon. You might also want to check other papers of mine.
Title: Controversial Users demand Local Trust Metrics: an Experimental Study on Epinions.com Community
Abstract: In today’s connected world it is possible and very common to interact with unknown people, whose reliability is unknown. Trust Metrics are a recently proposed technique for answering questions such as “Should I trust this user?”. However, most of the current research assumes that every user has a global quality score and that the goal of the technique is just to predict this correct value. We show, on data from a real and large user community, epinions.com, that such an assumption is not realistic because there is a significant portion of what we call controversial users, users who are trusted and distrusted by many. A global agreement about the trustworthiness value of these users cannot exist. We argue, using computational experiments, that the existence of controversial users (a normal phenomenon in societies) demands Local Trust Metrics, techniques able to predict the trustworthiness of a user in a personalized way, depending on the very personal view of the judging user.
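To make “trusted and distrusted by many” a bit more concrete, here is a small sketch of one simple way to count it: for each user, take the smaller of the number of trust and distrust statements received. This is only an illustration under my own assumptions; the function names, the `(judge, target, value)` format and the toy statements are invented for the example and are not taken from the paper or from the Epinions data.

```python
from collections import Counter


def received_statements(statements):
    """Count the trust (+1) and distrust (-1) statements each user received.

    statements: iterable of (judge, target, value) tuples (hypothetical format).
    """
    trust, distrust = Counter(), Counter()
    for _judge, target, value in statements:
        if value > 0:
            trust[target] += 1
        else:
            distrust[target] += 1
    return trust, distrust


def controversiality_level(trust_count, distrust_count):
    """Size of the minority opinion: 0 means everybody agrees."""
    return min(trust_count, distrust_count)


# Toy statements, invented for illustration only.
statements = [
    ("u1", "alice", +1), ("u2", "alice", +1), ("u3", "alice", +1),
    ("u4", "alice", -1), ("u5", "alice", -1),
    ("u1", "bob", +1), ("u2", "bob", +1), ("u3", "bob", +1),
]
trust, distrust = received_statements(statements)
levels = {
    user: controversiality_level(trust[user], distrust[user])
    for user in set(trust) | set(distrust)
}
print(levels)
```

In this toy data any single global score for alice is wrong for somebody: a high one contradicts u4 and u5, a low one contradicts u1, u2 and u3. That is the intuition behind asking for a personalized (local) prediction instead.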
While I think PageRank is a very clever (though simple) idea, I’m not very sure about HITS. What are these algorithms for? For predicting the quality of a page on the Web based on all the links between pages. PageRank assumes that a page linked by many pages and linked by pages of high quality (recursive!) has a good quality, i.e. it is an authority. HITS is based on the notions of hub and authority: a good hub is a page that points to several good authorities; a good authority is a page that is pointed at by several good hubs.
So, why do I appreciate PageRank and less HITS? Because the latter can be easily attacked. The PageRank of this page depends only on the pages linking to this page and I cannot easily force everyone on the web to link to this page. It depends on what other pages decide to link and I have no power over it.
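The recursive idea behind PageRank can be sketched with a few lines of power iteration. This is a simplified textbook-style sketch, not Google’s actual implementation: the damping factor 0.85 is just the conventional choice, pages without outgoing links are handled by spreading their score evenly (one common convention among several), and the tiny `web` graph and its page names are made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page with no outgoing links: spread its score over everybody.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Made-up toy web: three pages link to "hub", only one links to "lonely".
web = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "hub": ["popular"],
    "d": ["lonely"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Here "hub" ends up above "lonely" because more pages link to it, and "popular" benefits from being linked by a well-linked page: in both cases the score comes from links that other pages decided to create.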
Conversely, according to HITS, the hubness of this page depends on the pages this page links to, and I have total power over the pages I link to! Do I want this page to become a hub about cars? It is enough to link to (what I think are) car authorities: bmw, mercedes, ferrari, ford, renault, … (fiat is better not). Then do I want to exploit the hubness score this page got? I would simply link also to crappyCarsISell.com. HITS thinks this page is a hub and, since a hub by definition points to authorities, HITS thinks crappyCarsISell.com is a car authority.
What matters is the Direction of links! I have no control over links that come into my page but I have total control over links that go out of my page. Anyway I think the work by Kleinberg is simply great, but HITS does not take into account the fact that users will always try to game systems (especially, but not only, if they have an immediate benefit).
… I almost forgot the initial reason for this post: I was reminded of HITS while reading Lexical authorities in an encyclopedic corpus: a case study with Wikipedia by my friend Francesco, whose blog I just discovered today via a comment he left here. And this means one less friend without a blog! Welcome Francesco!
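The trick can be tried on a toy graph. The sketch below is a bare-bones version of the hub/authority iteration (repeated updates followed by normalization); it skips the query-dependent subgraph step of the real algorithm, and the page names "guide1", "guide2" and "me" are invented for the example. It only shows that hub and authority scores move when I change my own outgoing links.

```python
import math


def hits(links, iterations=50):
    """Return (hub, authority) scores; links maps a page to its outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority of a page: sum of the hub scores of the pages linking to it.
        auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub score of a page: sum of the authority scores of the pages it links to.
        hub = {page: sum(auth[t] for t in links.get(page, [])) for page in pages}
        # Normalize so that the scores do not grow without bound.
        auth_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        hub_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {page: v / auth_norm for page, v in auth.items()}
        hub = {page: v / hub_norm for page, v in hub.items()}
    return hub, auth


car_sites = ["bmw", "mercedes", "ferrari"]

# Before: two (made-up) car guides link to the car sites, my page links to nothing.
before = {"guide1": car_sites, "guide2": car_sites, "me": []}
hub_before, auth_before = hits(before)

# After: I link to the same car sites, plus the page I want to push.
after = {
    "guide1": car_sites,
    "guide2": car_sites,
    "me": car_sites + ["crappyCarsISell.com"],
}
hub_after, auth_after = hits(after)

print("my hub score:", hub_before["me"], "->", round(hub_after["me"], 3))
print("crappyCarsISell.com authority:", round(auth_after["crappyCarsISell.com"], 3))
```

Without a single incoming link changing, my page goes from a hub score of zero to the highest hub score in this toy graph, and crappyCarsISell.com gets an authority score that comes from nothing but my own link.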