14 thoughts on “What is “Tag Spam”? Or better, Tag Spam exists?”

  1. Richard Cameron

    Interesting post.

    I did have quite a long think about this sort of spam while writing CiteULike. I did two things to try to prevent it:

    1) Originally, I only permitted postings from a handful of supported sites. To get on one of those in the first place, you’d have to have had your paper published in a peer-reviewed journal. So, for example, you couldn’t create a web page of your pet theories of how the universe was created in some variant Big Bang theory taking place in the contents of some vast pan-dimensional pistachio nut (and, believe me, people have emailed me asking how they can give these sorts of theories more prominence on the site).

    2) I then relented and let people post any web page they liked (they had to type in the citation details if I couldn’t extract them automatically). However, I don’t let these posted pages appear on the front page of CiteULike, and I don’t let them appear in tag searches (so if you post something as “cool”, it would at least have to be a refereed paper to be found by that tag). You can, of course, see them on the user’s own page (and if you’ve put that user on your watchlist then you’ll see all the articles that way), but there’s a limit to how prominently you can get any old article displayed on CiteULike.

    On a more general spam note, I also do the rel="nofollow" thing for links, to stop people trying to boost their PageRank.

    So, that’s the plan. Seems to be working OK so far.

    I’m not sure how my scheme can be applied to del.icio.us or flickr, but it works very well in my domain-specific case where there’s external editorial control (journal editors). If anyone does tag spam us, they’ll only be able to do so with decent quality academic articles.
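
The two rules described in this comment (restricted visibility for free-form posts, plus rel="nofollow" on links) can be sketched roughly as follows. This is a minimal illustration, not CiteULike's actual code: the `Post` class, the surface names, and the functions `is_visible` and `render_link` are all hypothetical.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Post:
    """Hypothetical record for one posted article."""
    user: str
    url: str
    title: str
    from_supported_site: bool  # True when imported from a supported (refereed) source


# Assumed names for the places a post could be displayed.
PUBLIC_SURFACES = {"front_page", "tag_search"}
PERSONAL_SURFACES = {"user_page", "watchlist"}


def is_visible(post: Post, surface: str) -> bool:
    """Free-form web pages stay on personal surfaces; refereed papers go everywhere."""
    if surface in PERSONAL_SURFACES:
        return True
    if surface in PUBLIC_SURFACES:
        return post.from_supported_site
    raise ValueError(f"unknown surface: {surface}")


def render_link(post: Post) -> str:
    """Render the link with rel="nofollow" so it passes no ranking credit."""
    href = escape(post.url, quote=True)
    return f'<a href="{href}" rel="nofollow">{escape(post.title)}</a>'
```

Under these assumptions, a spammer's page is still visible to the spammer and to anyone who deliberately watches that user, but it cannot reach the front page or a tag search.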

  2. paolo

    I agree that you can “control” the items that get tagged, in CiteULike’s case scientific papers. However, I don’t think you can “control” tags. If the CiteULike community grows to 1,000,000 users, it will be interesting to observe the emergence of tags expressing, for example, the suckiness of a paper.
    Can you imagine being able to create a “very_anonymous” username and then tag your boss’ papers as “sucks” or “dontreadthis” or “thisfuckingsucksdontreadit” …
    Convergence on this kind of highly semantic tag will be very interesting, I think.
    And by the way, I can create 100 fake identities all certifying my paper as “ubercool” and “networks” and “best”. We will see, we will see …

    On a different topic, I think there is a European project about “objective quality metrics for scholars”: maybe you can try to apply for funding there for CiteULike …

  4. Richard Cameron

    Very true, but most people don’t add tags like “sucks” or “dontreadthis” to their watchlists, so it’s unlikely that this sort of negative spamming would be very widely noticed.

    Nor, I suspect, do people keep an eye on things like “ubercool” in an academic setting. Most of the tags in the current watchlists represent the user’s speciality, so I think about the worst your 100-fake-user attack could do would be to misclassify papers, and that ought to be relatively easy to spot and do something about.

  5. Michiel

    Interesting post…

    We can already observe the different culture issues; I sometimes tag Flickr images in Dutch (my native language) and I often wonder what will happen to tags when the massive Spanish-speaking internet community starts using tags en masse.

    Theoretically, it would be possible for a spammer to write a script to register and control a massive number of, for instance, del.icio.us users to push links to spam. And yes, that would probably push the operator of that service to some counter-action (and so the dance begins anew).

    But I find that so remote from Rebecca’s complaint about some antisemitic picture getting linked to a Martin Luther King tag that it’s in a different category altogether.

    Whenever I read something about her now-famous post I can’t help but picture some hysterical housewife screaming “won’t somebody please think of the children?” and grit my teeth. Censorship sucks, in whatever disguise. MLK lived in a time when racial segregation was an issue, and intolerance, and in the end he was assassinated.
    I value his accomplishments, I share a great number of his values and mourn his loss, but at the same time I find it perfectly acceptable that the MLK tag gets linked to just those things he was against. How can you know light without knowing the dark?

  7. Michiel

    P.S. My apologies if I inadvertently comment-spammed you: first your MT wouldn’t accept Rebecca’s last name (citing inappropriate language, which was funny in a way), and upon correction it just hung forever, so I posted again.

  8. paolo

    Yes, I know. My machine is very slow and it takes ages to accept a comment.
    I am totally with you on the no-censorship campaign.
    About languages: have you heard of Brazilian Portuguese becoming the top written language on Orkut? Maybe something similar will happen for tags (or maybe with Chinese). We must work on some cross-language/cross-culture tools if we want to live in an open-minded world.

  9. SJones

    I think as tag systems mature, they will evolve simple features to combat spam tags.

    Right now, tags are a binary thing: If something is tagged “candy”, then it could turn up in my search for “candy” things. If not, it won’t. That makes the system susceptible to tag spam, where someone tags a “drug” ad with “candy” to fool people.

    The Tag2.0 approach will be to treat tags as points along a spectrum of consensus. Think of those tag clouds that show tags sized and colored according to their frequency. A search for “candy” tagged items would return results ordered or filtered by consensus (how frequently were they tagged “candy”). This would help tackle the spam problem but also weed out fringe tags (e.g. honest but obscure tags). If you wanted to widen your search to include fringe tags, you could.

    Another trick would be to allow “anti-tagging”, so that hordes of honest users could add a “not candy” tag to drown out the spammer’s dishonest tag.

    In the end, the strength of “social tagging” is that there are more honest and rational people than spammers and weirdos. Likewise, the sheer number of relevant and obvious tags will keep the malicious and fringe tags in check.
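
The “spectrum of consensus” idea in this comment could be sketched as below: instead of treating a tag as present or absent, a search ranks items by how many distinct users applied the tag, with an optional cutoff to hide fringe results. The function name `consensus_search`, the `(user, item, tag)` triple format, and the `min_votes` parameter are assumptions made for illustration; no real tagging service's API is implied.

```python
from collections import defaultdict


def consensus_search(taggings, tag, min_votes=1):
    """Rank items by how many distinct users applied `tag`.

    taggings  -- iterable of (user, item, tag) triples (assumed format)
    min_votes -- hide items with fewer distinct taggers than this;
                 lower it to widen the search to fringe tags
    """
    votes = defaultdict(set)  # item -> set of users who applied `tag`
    for user, item, applied in taggings:
        if applied == tag:
            votes[item].add(user)  # a set, so repeat tagging by one user counts once

    ranked = [
        (item, len(users))
        for item, users in votes.items()
        if len(users) >= min_votes
    ]
    # Highest consensus first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked
```

Counting distinct users stops a single account from inflating a tag, but it does nothing against the 100-fake-identities attack raised earlier in the thread; that would need some separate notion of account trust.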

  10. paolo Post author

    The success of Wikipedia (and all the wisdom-of-the-crowds things we have seen) is not due to the fact “that there are more honest and rational people than spammers and weirdos”.
    The people (we humans) who live now and create Wikipedia are the same people (we humans) who were polluting Usenet with noise.
    Different tools are able to let wisdom come out of crowds or … noise.
    But I agree about better systems for tagging in the future: in Flickr they already do tag clusters and you can flag offending things (tags too, if I remember correctly).

  11. SJones

    Paolo,

    Wait a second… Wikipedia may be “social software” too, but its mechanism for collaboration is completely different from a tagging system (like del.icio.us, for example).

    The bozos pollute Wikipedia precisely because changes are “destructive” (but reversible). That is, “the last bozo wins.”

    In a tagging system, the bozos can’t take away a good tag, only add bad ones.

    Now imagine a Wikipedia where the last change doesn’t win; instead, the most popular changes are large and bold (for “consensus”) while the least popular changes are small and faded (for “controversial”).

    The junk might still be there but it would be (like in tag systems) much harder to spam the page with nonsense.

    The other reason “the wisdom of the crowd” works for tag systems is that users don’t try to change (or spam) some big, public, authoritative thing (like the Wikipedia page). Instead, users just change their version of the thing. Taken together, those selfishly useful tags become collectively useful.

  12. paolo Post author

    Uhm, not sure I got your point.
    Actually I think tag systems are more easily spammable than Wikipedia. As you said, in Wikipedia someone can reverse your spam.
    In a tag system, in general, this is not done.
    For example, if I start tagging obscene or stupid photos with the tag “MLK” in Flickr, I’m spamming (polluting) the tag. Imagine someone subscribed to the MLK tag of photos on Flickr. She will get my spam and have no (simple) way to filter it out. And there is no way (for now, in Flickr) to let the wisdom of the crowds clean the spam out of the targeted tag.
    Am I missing something?

  13. SJones

    No, my bad. You didn’t miss anything. I didn’t consider that while “bad tags” don’t hurt “good content”, it is possible to mis-tag “bad content” with “good tags” in order to spam feeds or searches for them. Good catch.

    One remedy for that is a community “flagging” system, much as Flickr lets users flag content as offensive or email systems let you flag email as spam. This doesn’t have to be a single “spam” flag but could also be a generalized anti-tag. In your example, if enough people tagged the spam content “~mlk” (meaning “not mlk”), then your MLK feed reader could filter out spam content with that tag.

    In this way, the “wisdom of the mob” would gang up on badly tagged content and run it out of their neighborhood.

    Eventually, I believe tag systems (and feeds) will solve these problems by going from being binary (“has tag, doesn’t have tag”) to analog, where you can filter by the ratio of good to bad tags. This is what Google’s PageRank algorithm does now when it favors trustworthy keywords/links/domains vs. rotten keywords/links/domains.
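
The anti-tag filter proposed in this comment might look roughly like the sketch below. It assumes the “~mlk” convention from the example above and keeps an item in the feed only when the share of users supporting the tag meets a threshold. The names `tag_support` and `filter_feed`, the feed layout, and the default `min_ratio` of 0.5 are all hypothetical choices, not features of Flickr or del.icio.us.

```python
def tag_support(item_tags, tag, anti_prefix="~"):
    """Count distinct users for and against `tag` on one item.

    item_tags -- list of (user, tag) pairs for a single item (assumed format)
    Returns (supporters, objectors), where objectors applied anti_prefix + tag.
    """
    supporters = {user for user, applied in item_tags if applied == tag}
    objectors = {user for user, applied in item_tags if applied == anti_prefix + tag}
    return len(supporters), len(objectors)


def filter_feed(feed, tag, min_ratio=0.5):
    """Keep items whose support ratio for `tag` is at least `min_ratio`.

    feed -- dict mapping item -> list of (user, tag) pairs (assumed format)
    The ratio is supporters / (supporters + objectors); items nobody
    tagged with `tag` are never part of that tag's feed.
    """
    kept = []
    for item, item_tags in feed.items():
        supporters, objectors = tag_support(item_tags, tag)
        if supporters == 0:
            continue
        if supporters / (supporters + objectors) >= min_ratio:
            kept.append(item)
    return kept
```

Each subscriber could choose their own `min_ratio`, which is one way to read the move from binary to “analog” tags. Anti-tags can be abused in the same way as tags, so this shifts the spam problem rather than removing it.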

  14. paolo Post author

    The idea of “negative tags” or anti-tags is great!!! And what the googlenet will become, of course, is highly personalized filtering: no longer the “wisdom of the mob” but the “wisdom of the persons you trust and appreciate” … We will get there soon enough, I think, with everything under Google’s control. No more need to think, then. ;(
