I read the wonderful Ontology is Overrated: Categories, Links, and Tags by Clay Shirky (highly recommended! Read it all!). Near the end, he talks about “Tag Distributions on del.icio.us” and shows a graph that resembles a powerlaw (even though it covers only 2 hours of activity by 64 del.icio.users). After 2 weeks of powerlaws, I see powerlaws everywhere, so I thought “let’s test the hypothesis on a bigger dataset from del.icio.us”. Well, a few minutes of googling told me that many people had already had this idea and had already run tests on del.icio.us.
And of course many of them can be found by looking at http://del.icio.us/tag/powerlaw (the del.icio.us page that shows all the URLs tagged under “powerlaw”) [this is kind of uber-cool-self-referentialism].
Among the many, I just cite http://www.cozy.org/d/ (from which the image shown here is taken), where 84 popular URLs are studied and shown to exhibit a powerlaw structure (in the tags used for them). I suspect the value of del.icio.us can be found in the long tail of tagging as well.
Each dot on the log-log charts represents a tag. The most used tag appears to the left while the least used appears to the right. All charts have the same x and y range, .5 to 1350; so the slope of these lines is about -1.
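If you want to check the "slope of about -1" claim on your own tag counts, here is a minimal sketch. It assumes you already have a list of how many times each tag was used; the function name loglog_slope and the synthetic counts are mine, not taken from the cozy.org analysis. It simply fits a straight line to log(count) against log(rank) by least squares, which is a rough check and not a rigorous powerlaw test.

```python
import math


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank).

    Tags are ranked from most used (rank 1) to least used.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive counts")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic (made-up) counts that follow count = 1350 / rank exactly,
# so the fitted slope should come out very close to -1.
counts = [1350 / rank for rank in range(1, 51)]
slope = loglog_slope(counts)
print(round(slope, 3))
```

On real del.icio.us data the points will not sit exactly on a line, so the number you get is only an indication of how powerlaw-like the distribution looks.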
Some weeks ago, Tantek introduced a new microformat, hReview.
We are pleased to announce the first public draft (v0.1) of hReview, jointly co-authored by representatives from America Online, CommerceNet Labs, Microsoft, Six Apart, Technorati, and Yahoo!. hReview is an open microformat standard for publishing and indexing distributed reviews on the Web. This standard enables users to contribute, identify, and aggregate review content on their own web sites and blogs as well as on community sites.
I haven’t had time yet to dig into it, but it is good that they analyzed previous attempts (I was trying to use RVW by Alf Eaton and to keep my list on Allconsuming, but I didn’t put too much effort into this) and that they ask for feedback; almost all the links point to wiki pages, so you can edit them directly there.
In general I really appreciate the work of Technorati (I also wrote a paper backing their proposal of VoteLinks, submitted to Web Intelligence 2005: “Page-reRank: using trusted links to re-rank authority” (pdf)).
Some other links I’ll try to digest later on: jluster on hreview, hreview on technorati, hreview on del.icio.us, organizedshopping on hreview; adriancuthbert suggested using this_is_an_hreview as a common tag (tagspace?).
It would be great to have this format widely adopted, so that the number of decentralized published reviews soon becomes huge and I have a large amount of data for what I’m working on in my PhD: Trust-aware decentralized Recommender Systems. If interested, check my (a bit outdated) PhD proposal at my papers page.
I tend to be enthusiastic about folksonomies and to forget to consider what they are good at and what they are not; basically I forget to keep asking myself questions instead of blatantly stating “Here we need a folksonomy! Yeahhey!!!”. Anyway, as a sort of balance, you might want to read a post by Gene Smith and one by danah, who are more critical than I am (unfortunately).
I forgot about another paper I wrote: Learning Contextualised Weblog Topics (pdf) will be presented at WWW 2005 2nd Annual Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics in Chiba, Japan, May 10th 2005. My boss was going to WWW2005 to present another paper, so we decided to submit our ongoing work to this workshop to get some feedback. We are still working on the system but we should be ready for prime time soon enough … stay tuned!
[I would have loved to meet Ethan Zuckerman, who is the invited speaker at this workshop and whose work on media attention is just delicious. (I even offered to help him code something for monitoring the Italian media world, but it’s too bad I’m so lazy)]
If you like, check the paper Learning Contextualised Weblog Topics (pdf)
Abstract: In this paper, we examine how a topic-centric view of the Blogosphere can be created. We characterise the problems in aligning similar concepts created by a set of distributed, autonomous users and describe current initiatives to solve the problem. We introduce the Tagsocratic project, a novel initiative to solve the concept alignment problem using techniques derived from research in language acquisition among distributed, autonomous agents.
The interface of Rojo is totally unusable (at least to me); I don’t understand the interface metaphors. What attracted me was the ability to tag your friends. So, out of curiosity: how would you tag me?
Our vision is that the next generation of feed reading requires new forms of organization so we built in the ability to tag your world, your content, your feeds, and even your friends.
We used to organize our bookmarks in folders, then del.icio.us came and we now appreciate folksonomies (flat taxonomies, just a set of free keywords you can attach to URLs). We are used to operating systems that let us categorize files (knowledge) in folders; would it make sense to have an operating system that lets us categorize files based only on a folksonomy (just add keywords to any file, all the files are in a flat pool)? I don’t know.
What I know is that the total lack of competition in the Operating Systems domain (actually just one global monopoly) is depriving all of us of new ideas, new paradigms, progress. If you compare it with the vibrant Web, where a new idea gets implemented and proposed almost daily, you can maybe see how far along we would be if there were a free market for Operating Systems.
Anyway, what could we call it? What about FolkOS? FolkOS, the Folksonomy Operating System, I can already see the advertisements…. And, yes, I patented the idea, I got every possible TradeMark and not only on Earth. I patented FolkOS also on Venus and Alpha Centauri (venusians and alphacentaurians beware! Don’t use my patented ideas! I have the best lawyers in the galaxy!).
[I tend to overload my emails with smilies (to show when I’m joking) but I don’t like them in blog posts, so I’m not sure my 4 readers understand when I (try to) make a joke. So, just to be sure, this is a joke … I think patenting computational ideas is total nonsense (maybe a video can help in understanding why)].
In Tagwebs, Flickr, and the Human Brain, Jakob argues “a neuron in your brain is a lot like a tag in a tagweb”. A tagweb is a network of tags whose edges are the “this tag is tagged with this tag” relationship; for example, he tags the tag “Victoria” with the tag “female”. He states that it is not possible to tag tags on flickr but there is a workaround. If you tag a page that “represents” a tag, you are implicitly tagging that tag, and you can do it with del.icio.us. I tagged some pages representing tags with the new tag “tag_the_tag” (metatag already has another meaning due to HTML). It can be a sort of wordnet, but bottom up. I’m skeptical about the rise of “tagging tags” but, if this happens, then tag spam will be an issue. Jakob ends with “I now understand how my brain works, and I can act in ways that embraces that knowledge.”, which really seems an enormous excess of “technology-driven optimism”.
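To make the “this tag is tagged with this tag” relationship a bit more concrete, here is a minimal sketch of a tagweb as a plain mapping from each tag to the set of tags attached to it. The class name TagWeb, its method names, and the extra example tags (“name”, “Alice”) are my own illustrative assumptions; only the “Victoria” tagged with “female” example comes from Jakob’s article, and this is not how flickr or del.icio.us store anything.

```python
from collections import defaultdict


class TagWeb:
    """A toy tagweb: each tag maps to the set of tags applied to it."""

    def __init__(self):
        self._meta = defaultdict(set)

    def tag_the_tag(self, tag, meta_tag):
        """Record the edge 'tag is tagged with meta_tag'."""
        self._meta[tag].add(meta_tag)

    def meta_tags(self, tag):
        """Return a copy of the tags attached to the given tag."""
        return set(self._meta.get(tag, set()))

    def tagged_with(self, meta_tag):
        """Return every tag that carries the given meta_tag."""
        return {t for t, metas in self._meta.items() if meta_tag in metas}


web = TagWeb()
web.tag_the_tag("Victoria", "female")  # the example from the article
web.tag_the_tag("Victoria", "name")    # hypothetical extra edge
web.tag_the_tag("Alice", "female")     # hypothetical extra edge

print(sorted(web.meta_tags("Victoria")))
print(sorted(web.tagged_with("female")))
```

Walking these edges from the bottom up is what would give the wordnet-like structure, and it also shows why spam would be a problem: anybody can add an edge.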
[New word you find in the text: metadadaism (search for metadadaism and write metadadaism in wikipedia).]
[Note to myself: an online article with colorful pictures is more likely to attract attention (at least for me) but .mov videos are bad since I have many problems watching them on my libre operating system]
[I’ll write something about my trip to Israel later on, as time permits]
I just found on HubLog an online service I was really waiting for: CiteULike (a prototype service to manage your personal library of academic papers). When you are logged in and visiting a page related to a paper, you can post that paper to your online library using a bookmarklet. In doing so, you can also specify tags, a list of keywords you’d like to associate with this article (a la del.icio.us and flickr) and optional notes. The service is very similar to del.icio.us (simple, tag-powered and social), but precisely tailored for academic papers. You can also see all the papers tagged under a certain tag (for example networks). Cool!
Some colleagues of mine are working on “how people can reach a shared common dictionary/language to denote concepts” (or at least understand each other while still using their own keywords). See Advertising games. We want to test these ideas using real data from the blogosphere. The idea is to detect when 2 bloggers are posting about the same concept/topic but use different names to tag it (the post’s category). For example, I use “trust and reputation”, someone else uses “reputation”, but we may be speaking about the same concept.
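Just to fix ideas, here is a very naive sketch of the detection step. It is not the advertising-games approach my colleagues work on: it only compares the words used in the posts of each category with a Jaccard overlap, and flags pairs of categories from two blogs whose vocabularies overlap a lot. The function names, the 0.3 threshold and the sample posts are all made up for illustration.

```python
import re


def vocabulary(posts):
    """Set of lowercase words appearing in a list of post texts."""
    words = set()
    for text in posts:
        words.update(re.findall(r"[a-z]+", text.lower()))
    return words


def jaccard(a, b):
    """Jaccard overlap of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_matches(blog_a, blog_b, threshold=0.3):
    """Pairs of categories whose post vocabularies overlap enough.

    Each blog is a dict: category name -> list of post texts.
    The threshold is an arbitrary illustrative value.
    """
    pairs = []
    for cat_a, posts_a in blog_a.items():
        vocab_a = vocabulary(posts_a)
        for cat_b, posts_b in blog_b.items():
            score = jaccard(vocab_a, vocabulary(posts_b))
            if score >= threshold:
                pairs.append((cat_a, cat_b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Made-up sample data, one post per category.
mine = {"trust and reputation": ["trust metrics and reputation systems on the web"]}
theirs = {
    "reputation": ["reputation systems and trust on the web"],
    "cooking": ["a recipe for pasta with tomato sauce"],
}
matches = candidate_matches(mine, theirs)
print(matches)
```

A real attempt would at least need to drop stop words and look at many posts per category, which is exactly why I am looking for a large collection of categorized posts.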
– Is there an aggregated repository of posts with categories?
– If not, do you have any idea how I could collect this information?
– Posts must have an associated category (LiveJournal and Blogger don’t allow this, while MovableType and WordPress do).
Some of our ongoing web searching on the topic can be found at this wiki page, and this one too. Thanks for your help!
The FOAF workshop in Galway was almost 20 days ago, so the following report is a little bit late. I hope it can be useful at least as a historical record.
It was fantastic to meet in the flesh many people I had only learnt to appreciate through their blogs. Many of the papers were very interesting. I especially liked the idea of “Semantic cookies” (you keep your profile [as a FOAF file] in a cookie and, with some trick, you give every site access to it, so sites can read it and give you a personalized experience) and “Bootstrapping the FOAF-Web: An Experiment in Social Network Mining” by Peter Mika (the idea is to use Google to infer social relationships among people). And there was also my paper, of course. The presentation was so-so; I think I tried to squeeze too many concepts into a 15-minute presentation. The only thing I liked was the subtitle I wrote at the last second on the first slide: “Moleskiing: Climbing the peaks of FOAF”.
Almost half of the workshop was devoted to very interesting Breakout sessions.