Leigh asks: “So, any signs that ‘tag spam’ has started yet?” (I found it because he uses “trust metrics”, a keyword I’m subscribed to on a number of services.) Here I ask the same question. It seems very unlikely that web spammers (who call themselves “search engine optimizers”) will fail to see, within seconds, the value of getting the URL they want (of the book or movie to be boosted, …) or a photo (of the movie or product to be boosted, …) in front of my eyes. After all, we are in the attention economy, aren’t we? Getting the attention of some humans (or of aggregators and, as a consequence, of many humans) on your item is the first step towards gaining reputation (and possibly money). [By the way, the same is true for this blog post.]
However, if you look at it from a biodiversity point of view, spam is good because it forces you to evolve, to differentiate, to invent new solutions.
So, any signs of “tag spam”? If you find something, write about it on the Wikipedia pages Spam or Spamdexing (there is nothing about this at the moment) or ask Britannica to include it in the next edition (I hope you get the difference…).
But first, how do we define “tag spam”? Is a bot always a spammer? If you genuinely think that microsoft.com could be tagged as crap, is that not spam? But if you tag something just to capture the attention of other people, is that spam? If I tag this post on del.icio.us as “folksonomy”, is this spam? If I tag my papers on CiteULike as “Cool”, is this tag spam?
Rebecca pointed out that someone on Flickr tagged an antisemitic protest sign as “MLK” (Martin Luther King). Is this tag spam? She says “community standards” do not, indeed cannot, defend against abuse of the system; only design can do that. And in fact, Rebecca is already starting to provide anti-spam techniques: “Off the top of my head, there are several simple things Technorati could do to prevent this sort of thing from happening in the future:”
* Technorati could design their system not to publish any photo Flickr users have tagged “Might be offensive”.
* Technorati could create their own tagging system, and not publish any photo Technorati users tagged “Might be offensive”.
* Technorati could provide an email address so that users could alert staff if a photo was offensive or inappropriate, and then the staff could go in and tag the inappropriate photo so that it would not appear on Technorati’s site–or hand-select an appropriate one.
David Weinberger is also (implicitly) suggesting the use of a trust metric when he says:
“Tags work because they’re so simple and because they are so connected to the human semantic context, but having billions of tags won’t work because they’re so simple and connected to the human semantic context. Will we be able to triangulate tags with other data – especially social data – so that we can get more out of them than we put in? It doesn’t seem impossible to me – simply knowing who created a tag lets you get more out of the tag than the person put in – but it’s not up to me to invent the stuff.”
Let me make a strong point here: “Tag spam does not exist. What does exist are different ways of viewing things in the world (and I hope there always will be!). What also exist are incentives to get the attention of other people.” How can we make the most of decentralized tagging? I think that, using trust metrics, we can choose to consider only tags provided by sources we deem trustworthy and exclude all the rest. There is a risk of the DailyMe here: you would see only the world classifications of people you already agree with and never be exposed to different ways of thinking. I speculated about this some time ago, and I’ll leave the topic for next time.
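The trust-based filtering described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any tagging service’s real API: the (tagger, item, tag) triples, the reader’s `my_trust` table with values in [0, 1], and the function names are all hypothetical.

```python
from __future__ import annotations

from collections import Counter


def filter_by_trust(taggings: list[tuple[str, str, str]],
                    trust: dict[str, float],
                    threshold: float = 0.5) -> list[tuple[str, str, str]]:
    """Keep (tagger, item, tag) triples whose tagger is trusted >= threshold.

    Taggers missing from `trust` default to 0.0, so tags from anyone outside
    the reader's web of trust are dropped.
    """
    return [t for t in taggings if trust.get(t[0], 0.0) >= threshold]


def tag_counts(item: str, taggings: list[tuple[str, str, str]]) -> Counter:
    """Count how often each tag was applied to `item`."""
    return Counter(tag for _, it, tag in taggings if it == item)


# Hypothetical data: two trusted taggers and one unknown account.
taggings = [
    ("alice", "photo42", "mlk"),
    ("bob", "photo42", "mlk"),
    ("spambot", "photo42", "cheap-pills"),
]
my_trust = {"alice": 1.0, "bob": 0.8}

print(tag_counts("photo42", taggings))                             # includes the spambot's tag
print(tag_counts("photo42", filter_by_trust(taggings, my_trust)))  # Counter({'mlk': 2})
```

Raising `threshold` narrows the view to the people you already trust most, which is exactly where the DailyMe risk comes from; lowering it lets more unfamiliar voices through.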
Ok, I started with “trust metrics” and, having closed the circle, here I stop.
UPDATE: you can never stop. While I was writing, two very relevant posts appeared on Corante.
In “issues of culture in ethnoclassification/folksonomy”, danah argues that tagging is culture-dependent. The great example about the book “Women, Fire and Dangerous Things” tells us that if someone from the culture described in the book tags a picture of a woman under “danger”, this is not tag spam at all but simply a different point of view on the world, a different culture (not a better or worse one).
And in “Folksonomy is better for cultural values”, Clay replies that the same problems apply to ontologies, only exacerbated, and that “The aggregate good of tags is not that they create consensus or accuracy; they observably don’t, and this very observability is much of their value.” He also notes that “But the relativity can also be interesting when crossed-tabbed with the identity of the tagger; I don’t want ‘toread’ or ‘funny’ generally, but I do want Liz’s ‘toread’ tags, and Matt Webb’s ‘funny’ links.” In my jargon, he is expressing a trust statement here (I trust Liz as 1/1 in the context of the “toread” tag). What I propose is to use this information to automatically discover the identities trusted by Liz in the “toread” context and suggest them to Clay. In my opinion, the balance between “I keep a small, direct and controllable social network of people I really know” and “I also use automated tools that can infer, based on the global social network, how much I could trust unknown users” should be a user option. The first is more controllable; the second is more open to serendipity and to exposure to new things and new people, but it is also less controllable and at risk of social attacks.
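To make the proposal concrete, here is a minimal sketch of context-specific trust propagation in Python. Everything in it is an assumption for illustration: the `TRUST` table, the users other than Clay and Liz, and the functions `infer_trust` and `suggest` are hypothetical, and the propagation rule (multiply trust values along a path, keep the best path) is a deliberately simple heuristic, not a particular published trust metric. The `max_depth` parameter plays the role of the user option described above.

```python
from __future__ import annotations

# Hypothetical trust statements, one table per tag context:
# context -> truster -> {trustee: trust value in [0, 1]}.
TRUST = {
    "toread": {
        "clay": {"liz": 1.0},
        "liz": {"dana": 0.9, "marco": 0.6},
        "dana": {"eve": 0.5},
    },
}


def infer_trust(source: str, context: str, max_depth: int = 2) -> dict[str, float]:
    """Propagate trust from `source` along the trust edges of one context.

    Trust along a path is the product of its edge values; if several paths
    reach the same user, the best one wins. max_depth=1 keeps only the
    people `source` trusts directly.
    """
    edges = TRUST.get(context, {})
    best = {source: 1.0}
    frontier = {source: 1.0}
    for _ in range(max_depth):
        next_frontier: dict[str, float] = {}
        for user, value in frontier.items():
            for other, weight in edges.get(user, {}).items():
                score = value * weight
                if score > best.get(other, 0.0):
                    best[other] = score
                    next_frontier[other] = score
        frontier = next_frontier
    best.pop(source)
    return best


def suggest(source: str, context: str, max_depth: int = 2) -> list[tuple[str, float]]:
    """Users reachable through trusted people but not yet trusted directly."""
    direct = TRUST.get(context, {}).get(source, {})
    inferred = infer_trust(source, context, max_depth)
    candidates = {u: v for u, v in inferred.items() if u not in direct}
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)


print(suggest("clay", "toread"))               # [('dana', 0.9), ('marco', 0.6)]
print(suggest("clay", "toread", max_depth=1))  # [] : direct network only
```

With `max_depth=1` Clay sees only his direct network (controllable, no surprises); raising it trades control for serendipity and also widens the surface for social attacks.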
While I’m at it, here are some other interesting posts I found later by following some of the links:
Cheap Eats at the Semantic Web Café
Folksonomy Notes: Considering the Downsides, Behavioral Trends, and Adaptation
The Politically Correct Police (PCP) are making lots of noise about how “This isn’t right and SOMETHING SHOULD BE DONE”.
Technorati Tags Set for Abuse (tagged as “Nude Celebrities” just to prove the concept)
Shapes of knowledge, word for poodles
Making use of tags and tagsonomies
Controlled Vocabularies and Folksonomies: Why Change is Good.
Social consequences of social tagging
and I guess you will find all of them under del.icio.us’s “folksonomy” tag.
Interesting post.
I thought about this sort of spam for quite a long time while writing CiteULike. I did two things to try to prevent it:
1) Originally, I only permitted postings from a handful of supported sites. To get onto one of those in the first place, you’d have to have had your paper published in a peer-reviewed journal. So, for example, you couldn’t create a web page of your pet theory of how the universe was created in some variant of the Big Bang taking place inside some vast pan-dimensional pistachio nut (and, believe me, people have emailed me asking how they can give these sorts of theories more prominence on the site).
2) I then relented and let people post any web page they liked (they have to type in the citation details if I can’t extract them automatically). However, I don’t let these posted pages appear on the front page of CiteULike, and I don’t let them appear in tag searches (so if you post something as “cool”, it would at least have to be a refereed paper to be found by that tag). You can, of course, see them on the user’s own page (and if you’ve put that user on your watchlist, you’ll see all their articles that way), but there’s a limit to how prominently you can get any old article displayed on CiteULike.
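The visibility policy described above boils down to a simple rule: refereed papers are visible everywhere, arbitrary web pages only on their poster’s page and watchlists. The sketch below is a hypothetical reconstruction for illustration, not CiteULike’s actual code; the `Posting` record, the `peer_reviewed` flag and all function names are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    user: str
    title: str
    tags: frozenset
    peer_reviewed: bool  # assumed flag: True if imported from a supported journal site


def front_page(postings: list[Posting]) -> list[Posting]:
    """Only refereed papers are eligible for the front page."""
    return [p for p in postings if p.peer_reviewed]


def tag_search(postings: list[Posting], tag: str) -> list[Posting]:
    """Tag searches also skip arbitrary web pages."""
    return [p for p in postings if p.peer_reviewed and tag in p.tags]


def user_page(postings: list[Posting], user: str) -> list[Posting]:
    """A user's own page shows everything they posted."""
    return [p for p in postings if p.user == user]


def watchlist(postings: list[Posting], watched: set) -> list[Posting]:
    """Watching a user shows all of their postings, refereed or not."""
    return [p for p in postings if p.user in watched]


posts = [
    Posting("alice", "A refereed paper", frozenset({"cool"}), True),
    Posting("crank", "Pistachio-nut cosmology", frozenset({"cool"}), False),
]
print([p.title for p in tag_search(posts, "cool")])  # ['A refereed paper']
print([p.title for p in user_page(posts, "crank")])  # ['Pistachio-nut cosmology']
```

The design choice is that unvetted content is never deleted, only denied the shared, high-traffic surfaces where spam would pay off.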
On a more general spam note, I also add rel="nofollow" to links, to stop people trying to boost their PageRank.
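For readers who want to do the same, here is a minimal Python sketch of emitting a user-supplied link with rel="nofollow". The function name is hypothetical, and this is not a complete sanitizer: it escapes markup but does not, for instance, reject `javascript:` URLs.

```python
import html


def render_user_link(url: str, text: str) -> str:
    """Render a user-supplied link as HTML with rel="nofollow".

    Escaping keeps the URL and link text from injecting markup; rel="nofollow"
    asks search engines not to pass ranking credit through the link.
    Note: the URL scheme is NOT validated here.
    """
    return (f'<a href="{html.escape(url, quote=True)}" rel="nofollow">'
            f'{html.escape(text)}</a>')


print(render_user_link("http://example.com/?a=1&b=2", "My <b>pet</b> theory"))
# <a href="http://example.com/?a=1&amp;b=2" rel="nofollow">My &lt;b&gt;pet&lt;/b&gt; theory</a>
```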
So, that’s the plan. Seems to be working OK so far.
I’m not sure how my scheme can be applied to del.icio.us or flickr, but it works very well in my domain-specific case where there’s external editorial control (journal editors). If anyone does tag spam us, they’ll only be able to do so with decent quality academic articles.
I agree that you can “control” the items that get tagged (in CiteULike’s case, scientific papers). However, I don’t think you can “control” tags. If the CiteULike community grows to 1,000,000 users, it will be interesting to observe the emergence of tags expressing, for example, the suckiness of a paper.
Can you imagine being able to create a “very_anonymous” username and then tag your boss’ papers as “sucks” or “dontreadthis” or “thisfuckingsucksdontreadit”?
Convergence on this kind of highly semantic tag will be very interesting, I think.
And by the way, I can create 100 fake identities all certifying my paper as “ubercool”, “networks” and “best”. We will see, we will see…
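The 100-fake-identities attack is where trust metrics earn their keep. The Python sketch below, with a hypothetical trust graph and data, contrasts a naive tag count with one that only counts taggers reachable from the reader through trust edges. It assumes no honest user trusts a fake identity: a single such edge makes the whole fake cluster reachable, which is why attack-resistant metrics (Advogato’s is a well-known example) limit how much trust can flow through any one node.

```python
from collections import Counter, deque


def reachable(graph: dict, start: str) -> set:
    """Everyone reachable from `start` by following trust edges (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        for other in graph.get(user, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


# Hypothetical trust graph: honest users trust each other...
graph = {"me": ["ann", "ben"], "ann": ["ben"], "ben": ["cat"]}
# ...while 100 fake identities only vouch for one another.
fakes = [f"fake{i}" for i in range(100)]
for i, fake in enumerate(fakes):
    graph[fake] = [fakes[(i + 1) % len(fakes)]]

taggings = [(fake, "my_paper", "ubercool") for fake in fakes]
taggings.append(("ann", "my_paper", "networks"))

naive = Counter(tag for _, _, tag in taggings)
trusted = reachable(graph, "me")
filtered = Counter(tag for user, _, tag in taggings if user in trusted)

print(naive.most_common(1))  # [('ubercool', 100)]
print(filtered)              # Counter({'networks': 1})
```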
On a different topic, I think there is a European project on “objective quality metrics for scholars”: maybe you could apply for funding there for CiteULike…
Very true, but most people don’t add tags like “sucks” or “dontreadthis” to their watchlists, so it’s unlikely that this sort of negative spamming would be very widely noticed.
Nor, I suspect, do people keep an eye on things like “ubercool” in an academic setting. Most of the tags in the current watchlists represent the user’s speciality, so I think the worst your 100-fake-user attack could do would be to misclassify papers, and that ought to be relatively easy to spot and do something about.
Interesting post…
We can already observe the different culture issues; I sometimes tag Flickr images in Dutch (my native language), and I often wonder what will happen to tags when the massive Spanish-speaking internet community starts using tags en masse.
Theoretically, it would be possible for a spammer to write a script to register and control a massive number of, for instance, del.icio.us users to push links to spam. And yes, that would probably push the operator of that service to some counter-action (and so the dance begins anew).
But I find that so remote from Rebecca’s complaint about some antisemitic picture getting linked to a Martin Luther King tag that it’s in a different category altogether.
Whenever I read something about her now-famous post I can’t help but picture some hysterical housewife screaming “won’t somebody please think of the children?” and grit my teeth. Censorship sucks, in whatever disguise. MLK lived in a time when racial segregation and intolerance were issues, and in the end he was assassinated.
I value his accomplishments, share a great number of his values and mourn his loss, but at the same time I find it perfectly acceptable that the MLK tag gets linked to exactly the things he was against. How can you know light without knowing the dark?
P.S. My apologies if I inadvertently comment-spammed you: first your MT wouldn’t accept Rebecca’s last name (citing inappropriate language, which was funny in a way), and after I corrected it, it just hung forever, so I posted again.
Yes, I know. My machine is very slow and takes ages to accept a comment.
I am totally with you about the no-censorship campaign.
About languages: have you heard that Portuguese, thanks to Brazilian users, became the top language on Orkut? Maybe something similar will happen with tags (or maybe with Chinese). We must work on cross-language/cross-culture tools if we want to live in an open-minded world.
I think as tag systems mature, they will evolve simple features to combat spam tags.
Right now, tags are a binary thing: If something is tagged “candy”, then it could turn up in my search for “candy” things. If not, it won’t. That makes the system susceptible to tag spam, where someone tags a “drug” ad with “candy” to fool people.
The Tag2.0 approach will be to treat tags as points along a spectrum of consensus. Think of those tag clouds that show tags sized and colored according to their frequency. A search for “candy” tagged items would return results ordered or filtered by consensus (how frequently were they tagged “candy”). This would help tackle the spam problem but also weed out fringe tags (e.g. honest but obscure tags). If you wanted to widen your search to include fringe tags, you could.
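The consensus-ordered search suggested above can be sketched in Python by counting distinct taggers per item. This is an illustration under assumptions, not an existing service’s behavior: the data and the `consensus_search` name and `min_taggers` parameter are hypothetical, and counting distinct taggers only helps while each person has one identity (mass-produced fake accounts defeat it, which is where trust weighting comes in).

```python
from collections import defaultdict


def consensus_search(taggings, query, min_taggers=2):
    """Rank items by how many distinct users tagged them with `query`.

    `taggings` is an iterable of (tagger, item, tag) triples. Items tagged by
    fewer than `min_taggers` distinct users are treated as fringe and dropped;
    pass min_taggers=1 to widen the search and include them.
    """
    votes = defaultdict(set)
    for tagger, item, tag in taggings:
        if tag == query:
            votes[item].add(tagger)  # a set, so repeat tagging by one user counts once
    ranked = sorted(((item, len(users)) for item, users in votes.items()),
                    key=lambda pair: (-pair[1], pair[0]))
    return [(item, n) for item, n in ranked if n >= min_taggers]


taggings = [
    ("alice", "lollipop", "candy"), ("bob", "lollipop", "candy"),
    ("carol", "lollipop", "candy"), ("alice", "fudge", "candy"),
    ("bob", "fudge", "candy"), ("spammer", "drug_ad", "candy"),
    ("spammer", "drug_ad", "candy"),
]
print(consensus_search(taggings, "candy"))                 # [('lollipop', 3), ('fudge', 2)]
print(consensus_search(taggings, "candy", min_taggers=1))  # also includes ('drug_ad', 1)
```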
Another trick would be to allow “anti-tagging”, so that hordes of honest users could add a “not candy” tag to drown out the spammer’s dishonest tag.
In the end, the strength of “social tagging” is that there are more honest and rational people than spammers and weirdos. Likewise, the sheer number of relevant and obvious tags will keep the malicious and fringe tags in check.
The success of wikipedia (and all the wisdom of the crowds things we have seen) is not “that there are more honest and rational people than spammers and weirdos”.
We humans who create Wikipedia today are the same people who were polluting Usenet with noise.
Different tools are able to let wisdom come out of crowds or … noise.
But I agree about better tagging systems in the future: Flickr already does tag clusters, and you can flag offending things (tags too, if I remember correctly).
Paolo,
Wait a second… Wikipedia may be “social software” too, but its mechanism for collaboration is completely different from that of a tagging system (like del.icio.us, for example).
The bozos pollute Wikipedia precisely because changes are “destructive” (but reversible). That is, “the last bozo wins.”
In a tagging system, the bozos can’t take away a good tag; they can only add bad ones.
Now imagine a Wikipedia where the last change doesn’t win: the most popular changes are large and bold (for “consensus”), while the least popular changes are small and faded (for “controversial”).
The junk might still be there but it would be (like in tag systems) much harder to spam the page with nonsense.
The other reason “the wisdom of the crowd” works for tag systems is that users don’t try to change (or spam) some big, public, authoritative thing (like the Wikipedia page). Instead, users just change their own version of the thing. Those selfishly useful tags, taken together, become collectively useful.
Uhm, not sure I got your point.
Actually I think tag systems are more easily spammable than wikipedia. As you said in Wikipedia someone can reverse your spam.
In a tag system, in general, this is not done.
For example, if I start tagging obscene or stupid photos on Flickr with the tag “MLK”, I’m spamming (polluting) the tag. Imagine someone subscribed to the MLK tag for photos on Flickr. She will get my spam with no (simple) way to filter it out. And there is no way (for now, on Flickr) for the wisdom of the crowds to clean the spam out of the targeted tag.
Am I missing something?
No, my bad. You didn’t miss anything. I didn’t consider that while “bad tags” don’t hurt “good content”, it is possible to mis-tag “bad content” with “good tags” in order to spam feeds or searches for them. Good catch.
One remedy for that is a community “flagging” system much like Flickr allows users to flag content as offensive or email systems let you flag email as spam. This doesn’t have to be a single “spam” flag but could also be a generalized anti-tag. In your example, if enough people tagged the spam content “~mlk” (meaning “not mlk”) then your MLK feed reader could filter out spam content with that tag.
In this way, the “wisdom of the mob” would gang up on badly tagged content and run it out of their neighborhood.
Eventually, I believe tag systems (and feeds) will solve these problems by going from being binary (“has tag, doesn’t have tag”) to analog where you can filter by the ratio of good to bad tags. This is what Google’s Page Rank algorithm does now when it favors trustworthy keywords/links/domains vs. rotten keywords/links/domains.
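The “~mlk” anti-tag convention and the move from binary to ratio-based filtering can be sketched together in Python. The `~` prefix follows the comment above; the function names, the feed data and the 0.5 default are assumptions for illustration, and this is a plain vote ratio, not PageRank.

```python
from collections import Counter


def tag_ratio(tags, tag):
    """Fraction of votes on `tag` that are positive.

    `tags` lists the tags different users applied to one item; "~tag" counts
    as a vote against "tag". Returns None if nobody voted either way.
    """
    counts = Counter(tags)
    yes, no = counts[tag], counts["~" + tag]
    return None if yes + no == 0 else yes / (yes + no)


def filter_feed(items, tag, min_ratio=0.5):
    """Keep item names whose tag/anti-tag ratio is at least `min_ratio`."""
    kept = []
    for name, tags in items.items():
        ratio = tag_ratio(tags, tag)
        if ratio is not None and ratio >= min_ratio:
            kept.append(name)
    return kept


# Hypothetical feed: one honestly tagged photo, one spam photo anti-tagged by the crowd.
feed = {
    "speech_photo.jpg": ["mlk", "mlk", "history"],
    "spam.jpg": ["mlk", "~mlk", "~mlk", "~mlk"],
}
print(filter_feed(feed, "mlk"))  # ['speech_photo.jpg']
```

Lowering `min_ratio` lets more contested items through, which is the “analog” knob the comment describes.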
The idea of “negative tags” or anti-tags is great!!! And what the googlenet will become, of course, is highly personalized filtering: no longer the “wisdom of the mob” but the “wisdom of the people you trust and appreciate”… we will get there soon enough, I think, with everything under Google’s control. No more need to think, then. ;(