31 Mar
Attacking HITS (and not PageRank)
While I think PageRank is a very clever (though simple) idea, I’m not so sure about HITS. What are these algorithms for? They predict the quality of a page on the Web based on the links between pages. PageRank assumes that a page linked to by many pages, and by pages of high quality (recursive!), is itself of good quality, i.e. it is an authority. HITS is based on the notions of hub and authority: a good hub is a page that points to several good authorities; a good authority is a page that is pointed at by several good hubs.
So, why do I appreciate PageRank more than HITS? Because the latter can be easily attacked. The PageRank of this page depends only on the pages linking to this page, and I cannot easily force everyone on the web to link to it. It depends on what other pages decide to link to, and I have no power over that.
Conversely, according to HITS, the hubness of this page depends on the pages this page links to, and I have total power over the pages I link to! Do I want this page to become a hub about cars? It is enough to link to (what I think are) car authorities: bmw, mercedes, ferrari, ford, renault, … (better not fiat). Then, do I want to exploit the hubness score this page got? I would simply also link to crappyCarsISell.com. HITS thinks this page is a hub and, since a hub by definition points to authorities, HITS concludes that crappyCarsISell.com is a car authority.
What matters is the direction of links! I have no control over the links that come into my page, but I have total control over the links that go out of my page. Anyway, I think the work by Kleinberg is simply great, but HITS does not take into account the fact that users will always try to game systems (especially, but not only, if they get an immediate benefit).
… I almost forgot the initial reason for this post: I was reminded of HITS while reading Lexical authorities in an encyclopedic corpus: a case study with Wikipedia by my friend Francesco, whose blog I just discovered today via a comment he left here. And this means one less friend without a blog! Welcome Francesco!
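To make the mutual definition of hubs and authorities concrete, here is a minimal sketch of the usual iterative computation in plain Python. The function name `hits`, the page names in `toy_links`, and the fixed iteration count are all assumptions made for illustration; this is not taken from Kleinberg's paper or from any particular library.

```python
import math

def hits(links, iterations=50):
    """Rough HITS sketch. `links` maps each page to the list of pages it links to."""
    pages = set(links) | {dst for targets in links.values() for dst in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of the pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # A page's hubness is the sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[dst] for dst in links.get(p, [])) for p in pages}
        # Rescale both score vectors to unit length so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical toy graph: two linking pages and two linked pages.
toy_links = {
    "directory": ["page-a", "page-b"],
    "blog": ["page-a"],
}
hub, auth = hits(toy_links)
print(sorted(auth, key=auth.get, reverse=True))
print(sorted(hub, key=hub.get, reverse=True))
```

In this toy graph "page-a" ends up with more authority than "page-b" because two pages point at it, and "directory" ends up a better hub than "blog" because it points at both.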
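The attack can be tried on a toy graph. The sketch below assumes a small hypothetical web with two honest hubs ("car-magazine" and "car-forum" are invented names), then adds this page with links to all the car sites plus crappyCarsISell.com. The `hits` function is a simplified illustration, not a reference implementation.

```python
import math

def hits(links, iterations=50):
    """Rough HITS sketch. `links` maps each page to the list of pages it links to."""
    pages = set(links) | {dst for targets in links.values() for dst in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hubness: sum of the authority scores of the pages linked to.
        hub = {p: sum(auth[dst] for dst in links.get(p, [])) for p in pages}
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

CAR_SITES = ["bmw", "mercedes", "ferrari", "ford", "renault"]

# Hypothetical honest web: two hubs that link to real car sites.
honest = {
    "car-magazine": ["bmw", "mercedes", "ferrari"],
    "car-forum": ["ferrari", "ford", "renault"],
}

# Same web, plus this page linking to every car site and to the junk site.
attacked = dict(honest)
attacked["this-page"] = CAR_SITES + ["crappyCarsISell.com"]

hub_before, auth_before = hits(honest)
hub_after, auth_after = hits(attacked)

print("junk site known before the attack:", "crappyCarsISell.com" in auth_before)
print("junk site authority after the attack:", auth_after["crappyCarsISell.com"])
print("hub scores after the attack:", hub_after)
```

Nobody else had to cooperate: the only edges added are out-links from this page, yet this page becomes the strongest hub in the toy graph and the junk site gets a non-zero authority score.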
Posted by Francesco on 31.03.05 at 4:27 pm
Thank you for your warm welcome,
regarding the main topic of your post, I think your analysis is interesting, but I somewhat disagree:
1) It seems to me (I’m no expert, however) that “hubness” is used only as a means to compute “authority”, which is often the only value used in search engine rankings: of course you can create some fake “hub” pages to promote the authority of other pages, but this is the same strategy used to fool PageRank
2) In the Wikipedia experiment, we considered only authority, since the hubness results weren’t really significant or interesting (or perhaps our analysis was not accurate!)
3) *More important:* links in web pages are “real content” (whereas, for example, a judgement about a product in a recommendation system is not a product). The consequence is that a “fake” hub… is still a good hub. To make a fake hub, you need to perform the time-consuming (and useful) task of choosing authoritative pages, and this kind of work is exactly what “hubness” rewards!
4) hubness and authority are orthogonal: you can “fake” a hub, but not an authoritative hub
Francesco
Posted by paolo on 31.03.05 at 4:27 pm
Not very warm actually, I almost forgot to do it! ;-)
anyway, I still believe that
- it is easy to become a hub (i can create a bot in a few minutes to do it: directories at dmoz are probably good hubs, so creating a local mirror makes your local mirror a good hub)
- then i add to my local mirror (with good hubness) the link to the page whose authority i want to increase.
easy and working.
i think HITS cannot be used on the web, maybe only on some intranet.
Posted by Francesco Bellomi on 31.03.05 at 4:27 pm
hmm, I see your point…
Francesco
Posted by Amir Michail on 31.03.05 at 4:27 pm
I have been working on a variation of hubs & authorities that encourages timely and helpful search engine rankings:
http://www.cse.unsw.edu.au/~amichail/collabrank
http://www.cse.unsw.edu.au/~amichail/collabrank/collabrank.pdf
Posted by Cai on 31.03.05 at 4:27 pm
Hmm, I do agree with Paolo and disagree with Francesco. In fact, I think that Paolo’s point is excellent :-)
@Francesco: PageRank is *not* as susceptible to attacks as HITS, as you have to create rank totally on your own: in other words, you have to “drill down” into the network and create enough peers to trust you (some sort of recursive back-stepping). On the other hand, for HITS, you just need two steps.
1) Create good hubs, which is easy, you just link to good authorities and they cannot do anything against that since you can link to anything you want :-)
2) create good authorities by having your good hubs link to your authorities.
With PageRank, it doesn’t work since you need to have the good authorities link to you - and they’re not going to do it if you’re a malicious rank grabber ;-) On the other hand, with HITS, you can short-circuit this security mechanism through step 1).
Have fun boys and I’m eagerly awaiting your comments!
BTW: Hey Paolo, what’s up with you, long time no hear, my Italian friend!
Posted by paolo on 31.03.05 at 4:27 pm
Hi cai!
writing a paper for web intelligence conference and (as always) late ;-(
but not a randomly generated paper (see next post)
talk to you more later hopefully
this and other attacks can be collected at
http://moloko.itc.it/trustmetricswiki/moin.cgi/TrustMetricsAttacks
in case you (both) wanna write a paper on this lemme know (wanna? lemme?!? …)
Posted by Zbigniew Lukasiak on 31.03.05 at 4:27 pm
One countermeasure would be to normalize the hubness - that is, divide the number of links to authoritative pages by the number of all links on the page. But this would still allow creating sites with big hubness that are meaningless - because a random list of authoritative pages is not very useful.
Personally I think the division into hubs and authorities is a bit artificial.
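The normalization suggested in the comment above can be sketched in a few lines. This is only one reading of the proposal: it assumes the set of authoritative pages is already known, and the function name `normalized_hubness` and the page lists are invented for illustration.

```python
def normalized_hubness(out_links, authorities):
    """Fraction of a page's out-links that point to known authoritative pages."""
    if not out_links:
        return 0.0
    good = sum(1 for target in out_links if target in authorities)
    return good / len(out_links)

# Assumed list of already-recognized authorities (hypothetical).
CAR_AUTHORITIES = {"bmw", "mercedes", "ferrari", "ford", "renault"}

# An attacker page: five genuine authorities plus one junk link.
attacker_links = ["bmw", "mercedes", "ferrari", "ford", "renault", "crappyCarsISell.com"]

# A page with mostly unrelated links scores low.
mixed_links = ["bmw", "my-holiday-photos", "my-cat", "my-cv"]

print(normalized_hubness(attacker_links, CAR_AUTHORITIES))
print(normalized_hubness(mixed_links, CAR_AUTHORITIES))
```

Under this reading, a page that copies five authorities and adds one junk link still has five sixths of its links pointing at authorities, which is consistent with the comment's caveat that normalization alone does not stop meaningless pages from scoring well.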