Comments to “A cognitive analysis of tagging” by Rashmi Sinha

I wrote this comment on the great post A cognitive analysis of tagging (or how the lower cognitive cost of tagging makes it popular), but it does not appear in the comments, so I am posting it here.

Wow, I really enjoyed your short-enough essay. Extremely clear!
Might I suggest 3 additional topics you might want to consider and include in your struggle for understanding? I would do it myself but I’ll never be able to write as clearly as you ;-)

1) Visit http://cloudalicio.us/tagcloud.php?url=http://boingboing.net/
The graph shows the evolution in time of the tags used to tag a specific URL (in this case http://boingboing.net). You may notice that in the beginning people were using more “blogs” and now people use more “blog”. This suggests people are moving from a category-like way of using del.icio.us (I put boingboing in the “blogs” folder that contains all the blogs) to a tag-like way of using del.icio.us (I name boingboing as a prototype of the class “blog”).
Someone was making this point (surely more clearly) on some blog but I could not find it again. Anyway this is true also for other blogs and this is real, thriving evidence.

2) At http://www.blumpy.org/tagwebs/ there is another “cognitive” approach to the tagweb (or tagspace or tagsphere).
I wrote about it at http://moloko.itc.it/paoloblog/archives/2005/02/04/tag_the_tag_tag_and_metadadaism.html
Jakob argues “a neuron in your brain is a lot like a tag in a tagweb”. A tagweb is a network of tags whose edges are the “this tag is tagged with this tag” relationship, for example he tags the tag “Victoria” with the tag “female”.
Will it be possible/useful to let users tag the tags themselves?

3) Of course it would be better to have people tagging stuff in a way that makes sense to them but, as soon as tags are public (everyone can see them), there is concern about tag spam (I tag something with a certain tag so that other people will be exposed to it). This is not a problem when tags are private, for example for the tags you use in your gmail account: no big deal in spamming yourself, no?
I wrote about it at http://moloko.itc.it/paoloblog/archives/2005/01/29/what_is_tag_spam_or_better_tag_spam_exists.html (from where you can find interesting links). Or check the image at http://www.micropersuasion.com/2005/07/yahoo_myweb_bec.html
In order to make better tag systems (I think this is one of your goals), we must take into account this issue as well. Of course one simple solution would be to give you the possibility to see only resources tagged by friends (flickr and Y!MyWeb2.0 let you do this) or friends of friends, i.e. users deemed trustworthy by a simple and customizable trust metric. What do you think?
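To make the "friends or friends of friends" idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the user names, the `friends` dictionary and the bookmark tuples are all hypothetical, and no real del.icio.us, flickr or Y!MyWeb API is involved. It simply keeps the bookmarks posted by users within a customizable number of friendship hops.

```python
from collections import deque


def trusted_users(friends, me, max_hops=2):
    """Return the users reachable from `me` within `max_hops` friendship hops."""
    hops = {me: 0}
    queue = deque([me])
    while queue:
        user = queue.popleft()
        if hops[user] == max_hops:
            continue  # do not expand beyond the allowed distance
        for friend in friends.get(user, ()):
            if friend not in hops:
                hops[friend] = hops[user] + 1
                queue.append(friend)
    return {user for user in hops if user != me}


def visible_bookmarks(bookmarks, friends, me, max_hops=2):
    """Keep only the (user, url, tag) bookmarks posted by trusted users."""
    trusted = trusted_users(friends, me, max_hops)
    return [b for b in bookmarks if b[0] in trusted]


# Hypothetical data, for illustration only.
friends = {"me": ["alice"], "alice": ["bob"], "bob": ["carol"]}
bookmarks = [
    ("alice", "http://boingboing.net/", "blog"),
    ("bob", "http://boingboing.net/", "blogs"),
    ("carol", "http://boingboing.net/", "news"),
    ("spammer", "http://example.com/", "blog"),
]

for user, url, tag in visible_bookmarks(bookmarks, friends, "me"):
    print(user, url, tag)
```

With `max_hops=2` only friends and friends of friends survive; setting `max_hops=1` would give the stricter "friends only" view.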

AAAI05: terrific talk by Marty Tenenbaum

AI Meets Web 2.0: Building The Web of Tomorrow Today by Dr. Jay M. Tenenbaum.
Terrific terrific talk, fascinating. I should have podcasted it because you really missed something (except I have nothing to record audio on, would you consider sending me your old mp3 recorder pen?). I was so excited during the talk that I ended up taking a photo of almost every slide. Actually there were 94 slides and I photographed 59 of them! Incredible to me as well.
Anyway, you might want to read the slides (pdf) or maybe you want to have a look at my pictures (possibly as a slideshow).
He introduced all the stuff I enjoy, such as Blogs, RSS, wiki (wikipedia), folksonomies, tags, flickr, Del.icio.us, microformats (aka Lower case semantic web), technorati, pubsub, greasemonkey (bookburro, greasemap) and much more; all tied together in a fascinating, convincing, making-sense manner!
After his presentation, we spoke about my research and he seemed interested. He invited me to visit commerce.net for one month or so and I have to say that I really like the idea. I also spoke with Rohit Khare, who is working with Tenenbaum, and he has a whole bunch of very clever, fascinating, realizable ideas that would really make an impact. They also underlined more than once that this kind of architecture/language-of-web2.0 project should be open source; I totally agree with them and like it.
Actually after the presentation, while I was speaking with Marty and Rohit, there was also Jesse Andrews, the creator of the mind-blowing book burro (actually he got most of the attention, totally deserved by the way). I guess it must be really cool to have someone present your hack at a conference and then go to meet that person and say “You know the Book Burro extension you presented? Well, I’m the creator of it!”. Cool! If you want to see what Jesse looks like, here is a picture of him, and expect some more great hacks from him in a few days.

Visualizing time trends in how a site is tagged on del.icio.us: cloudalicious

The previous entry was about “powerlaws in the use of tags on del.icio.us”. Then at http://del.icio.us/tag/powerlaw, I found Pietro Speroni’s great post Tagclouds and cultural changes that (also) introduces cloudalicious, a one-night project of Terrell Russell. Cloudalicious shows the evolution in time of the tags used to tag any page on del.icio.us. Very very cool!!!
I tried to find a URL that was showing a non-converging behaviour but I failed. (Pietro was already providing some examples of sites presenting interesting trends in tag use.) Are you able to find at least one controversial URL? A site for which there was a great shift over time in the tags used for it.
For your information, I already tried with sites tagged on del.icio.us under controversial tags (such as abortion, scientology, jew). I also tried with microsoft.com, as I was thinking many people would have tagged it as evil, but this is not the case. In general people tend to tag what they like and less what they don’t like, in order not to increase its visibility. So I tried with “terri schiavo blog”, which was very visible for a short period of time: I was suspecting the “tasteless” or “awful” tags would be much more frequent and growing over time, but this is not the case either.
The only one with a little bit of variance over time I was able to find is boingboing.net. See cloudalicious for http://boingboing.net
[Image: cloudgraph_boingboing.jpg]
Del.icio.users seem to recognize it as a news site as time passes. And it also seems that Del.icio.users are moving from “blogs” to “blog” as a tag (common pattern or just for boingboing?).
There is some variance also with http://del.icio.us itself: see cloudalicious for http://del.icio.us
So I just repeat the small challenge: Can you find a URL that presents non-converging tags use?

Small suggestion for Terrell Russell (I write it here since I was not able to find his email address on his web site). [I’m sure he has probably already figured out this suggestion by himself, since he was good enough to put together a great tool in one night!]
The Cloudalicious interface at the moment asks for del.icio.us URLs; these can be found at del.icio.us - they’re the red “and X other people” links (for example, http://del.icio.us/url/ec08a8ddfda4f2f9cad3a142dc49e23b represents http://boingboing.net/).
ec08a8ddfda4f2f9cad3a142dc49e23b is the md5sum of http://boingboing.net/
There are 2 easy ways to obtain it automatically: (1) run md5sum on the server, (2) use http://del.icio.us/url?url=http://… (in which http://… can be replaced by the website we want to cloudalicious).
In this way, users could enter in the Cloudalicious interface the real URL they are interested in (http://boingboing.net) and not the harder-to-find one (http://del.icio.us/url/ec08a8ddfda4f2f9cad3a142dc49e23b).
A bookmarklet and a greasemonkey extension (working on the site the user is browsing) are left as an easy exercise for the reader as well ;-)
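Option (1) could look more or less like the following Python sketch. The function name `delicious_url_page` is my own invention, and I am assuming (as described above) that the del.icio.us page for a site is simply http://del.icio.us/url/ followed by the md5 hex digest of the exact URL string. Note that the digest depends on every character, so http://boingboing.net and http://boingboing.net/ give different hashes.

```python
import hashlib


def delicious_url_page(url):
    """Build the del.icio.us page address for `url` from its md5 hex digest.

    Assumption: the page lives at http://del.icio.us/url/<md5 of the URL>,
    as in the boingboing example. The URL string is hashed exactly as given.
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return "http://del.icio.us/url/" + digest


# The user types the real URL; the server computes the hash page.
print(delicious_url_page("http://boingboing.net/"))
```

Cloudalicious could then accept the plain URL in its form and do this conversion on the server before fetching the tag history.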

Lastly, let me mention that one of Clay Shirky’s key points in Ontology is Overrated: Categories, Links, and Tags, which is also present in Pietro’s post, is that the correct way of categorizing something does not exist (the initial Yahoo! approach tried to force this and failed, and librarians still (must) adopt this simplifying but wrong assumption). Instead there are as many correct ways of categorizing a thing as there are users. This resonates with my study on controversial users on Epinions (pdf): the idea that there is a global value of trustworthiness/reputation for every user/peer in the system does not make sense, but still most of the papers in the reputation/trust literature start with this wrong and misleading assumption.

UPDATE: I just found it now, but Pietro, in On Tag Clouds, Metric, Tag Sets and Power Laws, was already mentioning that the paper by Clay Shirky “Power Laws, Weblogs, and Inequality” started to be tagged as longtail only after the Wired article The Long Tail came out. See cloudalicious for http://www.shirky.com/writings/powerlaw_weblog.html.

Use of Tags on del.icio.us follows a powerlaw

I read the wonderful Ontology is Overrated: Categories, Links, and Tags by Clay Shirky (highly recommended! Read it all!). Near the end, he speaks about “Tag Distributions on del.icio.us” and shows a graph that resembles a powerlaw (even if this is about only 2 hours of activity of 64 del.icio.users). After 2 weeks of powerlaws, I see powerlaws everywhere and I thought “let’s try to test the hypothesis on a bigger dataset from del.icio.us”. Well, a few minutes of googling told me that many people had already had this idea and had already performed tests on del.icio.us.
And of course many of them can be found looking at http://del.icio.us/tag/powerlaw (the del.icio.us page that shows all the URLs tagged under “powerlaw”) [this is kind of uber-cool-self-referentialism].
Among the many, I just cite http://www.cozy.org/d/ (from which the image shown here is taken), where 84 popular URLs are studied and shown to exhibit a powerlaw structure (in the tags used for them). I suspect the value of del.icio.us can be found in the long tail of tagging as well.
Each dot on the log-log charts represents a tag. The most used tag appears to the left while the least used appears to the right. All charts have the same x and y range, .5 to 1350; so the slope of these lines is about -1.
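A slope of about -1 on a log-log chart means that the count of the tag at rank r is roughly proportional to 1/r. As a rough sketch of how one could check this on one's own data, the Python below fits a straight line to log(count) against log(rank) with ordinary least squares. This is only a quick eyeball-style check and not a rigorous statistical test of a powerlaw; the function name and the synthetic counts are my own assumptions, not data from del.icio.us.

```python
import math


def loglog_slope(counts):
    """Least-squares slope of log(count) versus log(rank).

    `counts` must contain at least two positive numbers; they are ranked
    from most used (rank 1) to least used before fitting.
    """
    ranked = sorted(counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for count in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic tag counts where count = 1200 / rank (illustration only).
synthetic_counts = [1200 / rank for rank in range(1, 51)]
print(round(loglog_slope(synthetic_counts), 3))
```

On real tag counts the slope will of course not be exactly -1, and the noisy long tail can pull the fit around, so the number should be read together with the chart.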

Folksonomies criticism

I tend to be enthusiastic about folksonomies and forget to consider what they are good at and what they are not; basically I forget to keep asking myself questions instead of blatantly stating “Here we need a folksonomy! Yeahhey!!!”. Anyway, as a sort of balance, you might want to read a post by Gene Smith and one by danah that are more critical than I am (unfortunately).

New paper: Learning Contextualised Weblog Topics

I forgot about another paper I wrote: Learning Contextualised Weblog Topics (pdf) will be presented at WWW 2005 2nd Annual Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics in Chiba, Japan, May 10th 2005. My boss was going to WWW2005 to present another paper, so we decided to submit our ongoing work to this workshop to get some feedback. We are still working on the system but we should be ready for prime time soon enough … stay tuned!
[I would have loved to meet Ethan Zuckerman, who is the invited speaker at this workshop and whose work on media attention is just delicious. (I even offered to help him code something for monitoring the Italian media world, but it’s too bad I’m so lazy)]

If you like, check the paper Learning Contextualised Weblog Topics (pdf)
Abstract: In this paper, we examine how a topic-centric view of the Blogosphere can be created. We characterise the problems in aligning similar concepts created by a set of distributed, autonomous users and describe current initiatives to solve the problem. We introduce the Tagsocratic project, a novel initiative to solve the concept alignment problem using techniques derived from research in language acquisition among distributed, autonomous agents.

Tag your friends

The interface of Rojo is totally unusable (at least to me); I don’t understand the interface metaphors. What attracted me was the ability to tag your friends. So, a curiosity: how would you tag me?
Our vision is that the next generation of feed reading requires new forms of organization so we built in the ability to tag your world, your content, your feeds, and even your friends.

FolkOS: Folksonomy Operating System

We used to organize our bookmarks in folders, then del.icio.us came and we now appreciate folksonomies (flat taxonomies, just a set of free keywords you can attach to URLs). We are used to operating systems that allow us to categorize files (knowledge) in folders; would it make sense to have an operating system that allows us to categorize files based only on a folksonomy (just add keywords to any file, all the files are in a flat pool)? I don’t know.
What I know is that the total lack of competition in the Operating Systems domain (actually just one global monopoly) is depriving all of us of new ideas, new paradigms, progress. If you compare it with the vibrant Web, where a new idea gets implemented and proposed almost daily, you can maybe see how far we would be if there were a free market for Operating Systems.
Anyway, what could we call it? What about FolkOS? FolkOS, the Folksonomy Operating System, I can already see the advertisements…. And, yes, I patented the idea, I got every possible TradeMark and not only on Earth. I patented FolkOS also on Venus and Alpha Centauri (venusians and alphacentaurians beware! Don’t use my patented ideas! I have the best lawyers of the galaxy!).
[I tend to overload my emails with smileys (to show when I’m joking) but I don’t like them in blog posts, so I’m not sure my 4 readers understand when I (try to) make a joke. So, just to be sure, this is a joke … I think patenting computational ideas is total nonsense (maybe a video can help in understanding why)].

“Tag_the_tag” tag and metadadaism

In Tagwebs, Flickr, and the Human Brain, Jakob argues “a neuron in your brain is a lot like a tag in a tagweb”. A tagweb is a network of tags whose edges are the “this tag is tagged with this tag” relationship, for example he tags the tag “Victoria” with the tag “female”. He states that it is not possible to tag tags on flickr but there is a workaround. If you tag a page that “represents” a tag, you are implicitly tagging that tag, and you can do it with del.icio.us. I tagged some pages representing tags with the new tag “tag_the_tag” (metatag already has another meaning due to HTML). It can be a sort of wordnet, but bottom-up. I’m skeptical about the rise of “tagging tags” but, if this happens, then tag spam will be an issue. Jakob ends with “I now understand how my brain works, and I can act in ways that embraces that knowledge.”, which really seems an enormous excess of “technology-driven optimism”.
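To picture what a tagweb is as a data structure, here is a small Python sketch of a directed graph whose edges mean “this tag is tagged with this tag”. This is my own illustration, not Jakob’s implementation: the class and method names are invented, and apart from the Victoria/female pair the example tags are hypothetical.

```python
from collections import defaultdict


class TagWeb:
    """A network of tags whose edges mean 'this tag is tagged with this tag'."""

    def __init__(self):
        self.tagged_with = defaultdict(set)

    def tag_the_tag(self, tag, meta_tag):
        """Record that `tag` is tagged with `meta_tag`."""
        self.tagged_with[tag].add(meta_tag)

    def reachable(self, tag):
        """All tags reached by following 'is tagged with' edges from `tag`."""
        found = set()
        stack = [tag]
        while stack:
            current = stack.pop()
            for meta_tag in self.tagged_with.get(current, ()):
                if meta_tag not in found:  # also protects against cycles
                    found.add(meta_tag)
                    stack.append(meta_tag)
        return found


web = TagWeb()
web.tag_the_tag("Victoria", "female")  # the example from the post
web.tag_the_tag("female", "person")    # hypothetical extra edge
print(sorted(web.reachable("Victoria")))
```

Following the edges upwards is what would make it “a sort of wordnet, but bottom-up”: a search for “person” could also return things tagged “Victoria”.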
[New word you find in the text: metadadaism (search for metadadaism and write metadadaism in wikipedia).]
[Note to myself: an online article with colorful pictures is more likely to attract attention (at least for me) but .mov videos are bad since I have many problems watching them on my libre operating system]

What is “Tag Spam”? Or better, does Tag Spam exist?

Leigh asks So any signs that “tag spam” has started yet? (found because he uses “trust metrics”, a keyword to which I’m subscribed in a number of services). Here I ask the same question. It seems very unlikely that web spammers (they call themselves “search engine optimizers”) cannot see in seconds the value of getting the wanted URL (of the to-be-busted book, movie, …) or photo (of to-be-busted movie, product, …) in front of my eyes. After all, we are in the attention economy, aren’t we? Getting the attention of some humans (or aggregators and, as a consequence, of many humans) on your item is the first step towards getting reputation (and possibly money). [by the way, the same is true for this blog post].
However, if you look at it from a biodiversity point of view, spam is good because it forces you to evolve, to differentiate, to invent new solutions.

So, any signs of “tag spam”? If you find something, write it on the Wikipedia pages Spam or Spamdexing (there is nothing at the moment about this) or ask Britannica to insert it in the next version (hope you get the difference…).

But first, how to define “tag spam”? Is a bot always a spammer? If you genuinely think that microsoft.com could be tagged as crap, then is this not spam? But if you tag something just in order to capture the attention of other people, then is this spam? If I tag this post on del.icio.us as “folksonomy”, is this spam? If I tag my papers on CiteULike as “Cool”, is this tag spam?
Rebecca pointed out that someone tagged on flickr an antisemitic protest sign as “MLK” (Martin Luther King). Is this tag spam? And in fact, Rebecca is already starting to provide anti-spam techniques. She says:
“community standards” do not, indeed, can not defend against abuse of the system–only design can do that. Off the top of my head, there are several simple things Technorati could do to prevent this sort of thing from happening in the future:
* Technorati could design their system not to publish any photo Flickr users have tagged “Might be offensive”.
* Technorati could create their own tagging system, and not publish any photo Technorati users tagged “Might be offensive”.
* Technorati could provide an email address so that users could alert staff if a photo was offensive or inappropriate, and then the staff could go in and tag the inappropriate photo so that it would not appear on Technorati’s site–or hand-select an appropriate one.

And in fact David Weinberger is also (implicitly) suggesting the use of a trust metric when he says:
“Tags work because they’re so simple and because they are so connected to the human semantic context, but having billions of tags won’t work because they’re so simple and connected to the human semantic context. Will we be able to triangulate tags with other data - especially social data - so that we can get more out of them than we put in? It doesn’t seem impossible to me - simply knowing who created a tag lets you get more out of the tag than the person put in - but it’s not up to me to invent the stuff.”

Let me make a strong point here: “Tag Spam does not exist. What does exist are different ways of viewing stuff in the world (and I hope there always will be!). What does exist are also incentives to get the attention of other people”. How can we get the most out of decentralized tagging? I think that using trust metrics we can choose to consider only tags provided by sources we deem trustworthy and exclude all the rest. There is the risk of DailyMe here: that is, you will see only the world classifications of people you already agree with and you will never get exposed to different ways of thinking. I was speculating about it some time ago and leave this topic for next time.
Ok, I started with “trust metrics” and, having closed the circle, here I stop.

UPDATE: you can never stop. While I was writing, 2 very relevant posts appeared on Corante.
In “issues of culture in ethnoclassification/folksonomy” danah argues that tagging is culture dependent. The great example about the book “Women, Fire and Dangerous Things” tells us that if someone (of the culture described in the book) tags a picture of a woman under “danger”, this is not at all tag spam but simply a different point of view on the world, a different culture (not a better or worse one).
And in Folksonomy is better for cultural values Clay replies that the same problem applies to ontologies, only exacerbated, and that “The aggregate good of tags is not that they create consensus or accuracy; they observably don’t, and this is very observability is much of their value.” He also reports that “But the relativity can also be interesting when crossed-tabbed with the identity of the tagger; I don’t want ‘toread’ or ‘funny’ generally, but I do want Liz’s ‘toread’ tags, and Matt Webb’s ‘funny’ links.” In my jargon, he is here expressing a trust statement (I trust Liz as 1/1 in the context of the “toread” tag). What I propose is to use this information to automatically discover the identities trusted by Liz in the context of the “toread” tag and automatically suggest them to Clay. The balance between “I keep a small, direct and controllable social network of people I really know” and “I also use automated tools that can infer, based on the global social network, how much I could trust unknown users” should be a user option in my opinion. The first is more controllable; the second is more prone to serendipity and exposure to new things and new people, but is also less controllable and at risk of social attacks.
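A very small sketch of this proposal, in Python and under my own assumptions: trust statements are stored as hypothetical (truster, trustee, context) triples, and “Ann” and “Bob” are invented names. For a given user and tag context, the function suggests the people trusted, in that same context, by the people the user already trusts. A real trust metric would propagate further and weigh the statements; this one only goes one step.

```python
def suggest_sources(trust, user, context):
    """Suggest people trusted, in `context`, by those `user` trusts in `context`."""
    direct = {
        trustee
        for truster, trustee, ctx in trust
        if truster == user and ctx == context
    }
    suggestions = set()
    for truster, trustee, ctx in trust:
        if ctx != context or truster not in direct:
            continue
        if trustee != user and trustee not in direct:
            suggestions.add(trustee)
    return sorted(suggestions)


# Hypothetical trust statements: (who trusts, whom, for which tag).
trust = {
    ("Clay", "Liz", "toread"),
    ("Clay", "Matt Webb", "funny"),
    ("Liz", "Ann", "toread"),
    ("Liz", "Bob", "funny"),
}

print(suggest_sources(trust, "Clay", "toread"))
```

Here Clay would be offered Ann’s “toread” tags but not Bob’s, because Liz trusts Bob only for “funny” and Clay trusts Liz only for “toread”. Whether such automatic suggestions are switched on at all is exactly the user option mentioned above.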

Since I’m here, there are other interesting posts I found later on navigating some of the links. They are here below:
Cheap Eats at the Semantic Web Café
Folksonomy Notes: Considering the Downsides, Behavioral Trends, and Adaptation
The Politically Correct Police (PCP) are making lots of noise about how “This isn’t right and SOMETHING SHOULD BE DONE”.
Technorati Tags Set for Abuse, which is tagged as “Nude Celebrities” just to prove the concept
Shapes of knowledge, word for poodles
Making use of tags and tagsonomies
Controlled Vocabularies and Folksonomies: Why Change is Good.
Social consequences of social tagging
and I guess you will find all of them on del.icio.us’s “folksonomy” tag