The previous entry was about “powerlaws in the use of tags on del.icio.us”. Then, at http://del.icio.us/tag/powerlaw, I found Pietro Speroni’s great post Tagclouds and cultural changes, which (also) introduces cloudalicious, a one-night project by Terrell Russell. Cloudalicious shows how the tags used for any page on del.icio.us evolve over time. Very very cool!!!
I tried to find a URL that showed non-converging behaviour, but I failed. (Pietro already provides some examples of sites with interesting trends in tag use.) Are you able to find at least one controversial URL? That is, a site for which there was a great shift over time in the tags used for it.
For your information, I already tried sites tagged on del.icio.us under controversial tags (such as abortion, scientology, jew). I also tried microsoft.com, as I thought many people would have tagged it as evil, but this is not the case. [In general people tend to tag what they like, and less what they don’t like, so as not to increase its visibility. So I tried “terri schiavo blog”, which was very visible for a short period of time; I suspected that “tasteless” or “awful” tags would be much more numerous and growing over time, but this is not the case either.]
The only one with a little bit of variance over time that I was able to find is boingboing.net. See cloudalicious for http://boingboing.net. Del.icio.users seem to recognize it as a news site as time passes. And it also seems that Del.icio.users are moving from “blogs” to “blog” as a tag (a common pattern, or just for boingboing?).
There is some variance also with http://del.icio.us itself: see cloudalicious for http://del.icio.us
So I just repeat the small challenge: can you find a URL that presents non-converging tag use?
A small suggestion for Terrell Russell (I write it here since I was not able to find his email address on his web site). [I’m sure he has probably already figured out this suggestion by himself, since he was good enough to put together a great tool in one night!]
At the moment the Cloudalicious interface asks for a del.icio.us URL and explains: “These URLs can be found at del.icio.us – they’re the red ‘and X other people’ links” (for example, http://del.icio.us/url/ec08a8ddfda4f2f9cad3a142dc49e23b represents http://boingboing.net/).
ec08a8ddfda4f2f9cad3a142dc49e23b is the md5sum of http://boingboing.net/
There are two easy ways to obtain it automatically: (1) run md5sum on the server, or (2) use http://del.icio.us/url?url=http://… (in which http://… can be replaced by the website we want to cloudalicious).
In this way, users could enter in the Cloudalicious interface the real URL they are interested in (http://boingboing.net) and not the harder-to-find one (http://del.icio.us/url/ec08a8ddfda4f2f9cad3a142dc49e23b).
A bookmarklet and a greasemonkey extension (working on the site the user is browsing) are left as an easy exercise for the reader as well ;-)
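Option (1) could look roughly like the following Python sketch. It assumes the identifier is simply the MD5 of the URL string exactly as it was bookmarked (trailing slash included, as in the boingboing example above); I don’t know whether del.icio.us normalizes URLs in any other way. The function name `delicious_url_page` is made up for illustration and is not part of Cloudalicious.

```python
import hashlib

# Prefix of the del.icio.us "and X other people" pages, as in the example above.
DELICIOUS_URL_PREFIX = "http://del.icio.us/url/"


def delicious_url_page(page_url: str) -> str:
    """Return the del.icio.us URL-history page for page_url.

    Assumption: the identifier is the plain MD5 hex digest of the URL
    string, byte for byte. "http://boingboing.net" and
    "http://boingboing.net/" therefore give different results.
    """
    digest = hashlib.md5(page_url.encode("utf-8")).hexdigest()
    return DELICIOUS_URL_PREFIX + digest


if __name__ == "__main__":
    # The trailing slash matters, so try both forms of the same site.
    for url in ("http://boingboing.net/", "http://boingboing.net"):
        print(url, "->", delicious_url_page(url))
```

Since the two spellings hash differently, the interface would probably have to try both (or fall back on option (2)) when the user types a bare domain.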
Lastly, let me mention one of the key points Clay Shirky makes in Ontology is Overrated: Categories, Links, and Tags, which is also present in Pietro’s post: the correct way of categorizing something does not exist (the initial Yahoo! approach tried to force this and failed, and librarians still (must) try to adopt this simplifying but wrong assumption). Instead there are as many correct ways of categorizing a thing as there are users. This resonates with my study on controversial users on Epinions (pdf): the idea that there is a global value of trustworthiness/reputation for every user/peer in the system does not make sense, but still most of the papers in the reputation/trust literature start with this wrong and misleading assumption.
UPDATE: I only just found it, but Pietro, in On Tag Clouds, Metric, Tag Sets and Power Laws, had already mentioned that the paper by Clay Shirky “Power Laws, Weblogs, and Inequality” started to be tagged as longtail only after the Wired article The Long Tail came out. See cloudalicious for http://www.shirky.com/writings/powerlaw_weblog.html.
I don’t agree with Clay Shirky’s article on ontologies. In fact, what he finds to be shortcomings of ontologies are not only facts known since the foundation of the discipline itself (let’s say… Aristotle, circa 300 BCE), but in fact the driving motivations of the study of knowledge representation.
OF COURSE everyone’s view of the world is subject to biases and prejudices. Biases are what makes knowledge possible (see Kant). (Shannon’s) information is always “information about a difference”. You have a difference when you notice that something is different than you expected. But… if you “expect” something, you have a bias, and a person with different biases sees different things. Categorizations are meaningful ONLY BECAUSE they are biased. Successful “knowledge representations” (think of mathematical and formal models, theories, etc.) are always highly biased: they are abstractions that throw away all the stuff that is irrelevant for the intended purpose.
By the way, in my work as a developer of software tools for knowledge management, I have found that my customers are always highly biased, aware of their biases, and they want a highly biased information system, which is perfectly functional for them. Notice the different perspective: for Shirky biases are laughable defects (he makes fun of Soviet catalogs based on Marxism), whereas for me biases are where knowledge is. No bias, no knowledge (this is a very old idea in philosophy).
Here comes the interesting point. The goal of ontologies is to make biases (or “definitions”, or “intended meaning”) EXPLICIT. To formally define them, in order to enable “communication”. By the way, the problems outlined in the article are not restricted to ontologies: they are, more radically, related to what linguists call “the illusion of communication”. I’m writing this text “hoping” that we share the intended meaning of these words, and so you can understand what I’m saying; this is true only to a certain extent.
This perspective raises a lot of challenges, some of which are identified in the article, and they are part of the research programs dealing with formal ontologies.
By the way, IMHO Folksonomies are overrated. I’m not interested in having a thousand people tag a document as related to the topic X if they do not agree on what X is. Ok, of course this might be just fine for del.icio.us, but if you want to transfer your electronic clinical folder from Rome to Tokyo for surgery, you would probably want to be sure that the differences between the two information systems are well understood at the meta-level… which is in fact the ontological level.
Assuming “we share the intended meaning of these words, and so I can understand what you’re saying;”… I think I don’t agree with you. ;-) [well, let’s not start on the perceived meanings of these 3 small characters “;” “-” and “)” ]
Actually, I think Clay in part shares your views. Biases are the important part: the fact that different people view a thing in different ways. What he challenges is the idea that there could be a “central” entity that defines the correct way of categorizing.
If developers all agreed that in a society where there are 23 humans/agents, there are 23 different ontologies (ways to think about the world), I think Clay would quiet down (at least I would).
Of course the other point is “are taxonomies an artifact easy enough to be used?”. I think that flat labels are much easier (to understand and to use) and this should not be underestimated. Am I saying that the easiest technology is the one we should adopt? Not always, but Occam still has a lot to teach us…
You say “The goal of ontologies is to make biases (or “definitions”, or “intended meaning”) EXPLICIT.”
I think that if you look at my tags on del.icio.us (or my tags on flickr, or the categories I use on my blog) you get a clear picture of what my views of the world are (my interests, how I assign meanings to stuff in the world, …). This means that we can then start “negotiating” our meanings, possibly in an automatic way.