New paper: “Page-reRank: using trusted links to re-rank authority”

I uploaded another paper of mine in the papers section. This is still under review for the Web Intelligence 2005 conference and is titled “Page-reRank: using trusted links to re-rank authority” (pdf). Let me know what you think of it, if you like.

Abstract: The basis of much of the intelligence on the Web is the hyperlink structure, which represents an organising principle based on the human facility to discriminate between relevant and irrelevant material. Second-generation search engines like Google make use of this structure to infer the authority of particular web pages. However, the linking mechanism provided by HTML does not allow the author to express different types of links, such as positive or negative endorsements of page content. Consequently, algorithms like PageRank produce rankings that do not capture the different intentions of web authors. In this paper, we review some of the initiatives for adding simple semantic extensions to the link mechanism. Using a large real-world data set, we demonstrate the different page rankings produced by considering extra semantic information in page links. We conclude that Web intelligence would benefit from the adoption of languages that allow authors to easily encode simple semantic extensions to their hyperlinks.

Paper accepted at AAAI05: “Controversial Users demand Local Trust Metrics: an Experimental Study on Epinions.com Community”

A paper of mine titled “Controversial Users demand Local Trust Metrics: an Experimental Study on Epinions.com Community” (pdf) got accepted for the Twentieth National Conference on Artificial Intelligence (AAAI-05)! Cool! The email I received this morning says “Your paper was one of 148 accepted to AAAI-05, out of 803 submissions. AAAI is a highly selective conference, and you are to be congratulated on your paper’s acceptance.” This means the acceptance rate is about 18% (148/803). Let me know if you like/dislike the paper or want to discuss its topic a bit. I think controversiality is an important theme, and I think there are too many papers that assume that every user/agent has a global goodness value that is the same for everyone (there are some users that are bad for everyone and the goal of the technique is to spot them). This assumption is unrealistic: just think of Bush or Berlusconi … some people like them (yeah, I know it’s kinda incredible) and some others don’t. My paper hopefully provides some evidence about this intuitive phenomenon. You might also want to check other papers of mine.

Title: Controversial Users demand Local Trust Metrics: an Experimental Study on Epinions.com Community
Abstract: In today’s connected world it is possible and very common to interact with unknown people, whose reliability is unknown. Trust Metrics are a recently proposed technique for answering questions such as “Should I trust this user?”. However, most of the current research assumes that every user has a global quality score and that the goal of the technique is just to predict this correct value. We show, on data from a real and large user community, epinions.com, that such an assumption is not realistic because there is a significant portion of what we call controversial users, users who are trusted and distrusted by many. A global agreement about the trustworthiness value of these users cannot exist. We argue, using computational experiments, that the existence of controversial users (a normal phenomenon in societies) demands Local Trust Metrics, techniques able to predict the trustworthiness of a user in a personalized way, depending on the very personal view of the judging user.
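To make the idea of “trusted and distrusted by many” a bit more concrete, here is a minimal Python sketch. The measure used (the smaller of the trust and distrust counts a user receives) and the toy statements are my own assumptions for illustration, not necessarily the definition or the data used in the paper.

```python
from collections import defaultdict


def controversiality(statements):
    """Return {user: min(#trust received, #distrust received)}.

    `statements` is a list of (judge, target, value) tuples, where
    value is +1 for "I trust target" and -1 for "I distrust target".
    This scoring rule is an illustrative assumption.
    """
    trust = defaultdict(int)
    distrust = defaultdict(int)
    for judge, target, value in statements:
        if value > 0:
            trust[target] += 1
        else:
            distrust[target] += 1
    users = set(trust) | set(distrust)
    return {u: min(trust[u], distrust[u]) for u in users}


# Hypothetical toy data: bob splits opinions, carol is trusted by everyone.
statements = [
    ("alice", "bob", +1), ("dave", "bob", +1),
    ("erin", "bob", -1), ("frank", "bob", -1),
    ("alice", "carol", +1), ("dave", "carol", +1), ("erin", "carol", +1),
]
scores = controversiality(statements)
print(scores)
```

A single global score for bob would have to average two trust and two distrust statements into something that matches nobody's opinion, which is the intuition behind asking for local, per-judge predictions.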

Randomly-generated paper accepted for a conference!

Too funny, too sad. SCIgen is an Automatic Computer Science Paper Generator. The program (GPL-licenced and hence Free Software) generates random Computer Science research papers, including graphs, figures, and citations. I had been thinking about doing something like it for a long time, but wait … one of the random papers got accepted for a conference!!!
One useful purpose for such a program is to auto-generate submissions to “fake” conferences; that is, conferences with no quality standards, which exist only to make money. A prime example, which you may recognize from spam in your inbox, is SCI/IIIS and its dozens of co-located conferences (for example, check out the gibberish on the WMSCI 2005 website). Using SCIgen to generate submissions for conferences like this gives us pleasure to no end. In fact, one of our papers was accepted to SCI 2005! See Examples for more details.
The accepted paper is “Rooter: A Methodology for the Typical Unification of Access Points and Redundancy” by Jeremy Stribling, Daniel Aguayo and Maxwell Krohn, and the “authors” say: “We are currently working on the “camera-ready”, and received many donations to send us to the conference, so that we can give a randomly-generated talk.” Hey, researcher! You can cite it! After all, it is a published paper! Not the crappy stuff you find on blogs! Beware, never cite an online article, only articles printed on good old paper at one of the millions of crappy hyper-expensive conferences!
And, in case you want to cite a paper of mine, I just created “A Case for Randomized Algorithms” and “Comparing XML and Markov Models”, or you can just generate a new paper for me. Writing a paper is now easier than ever!!! I need to click 8 more times on this link and then I can just spend one year on holiday, since I will have already produced a good number of papers.
[I found the news on BoingBoing, a blog researchers should cite sometime…]

Attacking HITS (and not PageRank)

While I think PageRank is a very clever (though simple) idea, I’m not so sure about HITS. What are these algorithms for? For predicting the quality of a page on the Web based on all the links between pages. PageRank assumes that a page linked to by many pages, and by pages of high quality (recursive!), is itself of good quality, i.e. it is an authority. HITS is based on the notions of hub and authority: a good hub is a page that points to several good authorities; a good authority is a page that is pointed at by several good hubs.
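To illustrate the recursive part of the PageRank idea, here is a minimal sketch in Python. It is a simplified power iteration, not Google's implementation: the damping factor of 0.85, the uniform handling of pages without outgoing links, and the page names are all assumptions made for this example.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration over (source, target) links."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        # Every page passes its rank, in equal shares, to the pages it links to.
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        rank = new
    return rank


# Hypothetical toy web: A is linked to by three pages and links to B;
# C is linked to by a single page that nobody links to.
edges = [("p1", "A"), ("p2", "A"), ("p3", "A"), ("A", "B"), ("p4", "C")]
rank = pagerank(edges)
for page in ("A", "B", "C", "p4"):
    print(page, round(rank[page], 3))
```

B and C each have exactly one incoming link, but B's link comes from the well-linked page A, so B ends up ranked above C. That is the “linked by pages of high quality” part of the definition.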
So, why do I appreciate PageRank more than HITS? Because the latter can be easily attacked. The PageRank of this page depends only on the pages linking to this page, and I cannot easily force everyone on the web to link to this page. It depends on what other pages decide to link to, and I have no power over that.
Conversely, according to HITS, the hubness of this page depends on the pages this page links to, and I have total power over the pages I link to! Do I want this page to become a hub about cars? It is enough to link to (what I think are) car authorities: bmw, mercedes, ferrari, ford, renault, … (better not fiat). Then do I want to exploit the hubness score this page got? I would simply also link to crappyCarsISell.com. HITS thinks this page is a hub and, since a hub by definition points to authorities, HITS thinks crappyCarsISell.com is a car authority.
What matters is the direction of links! I have no control over the links that come into my page, but I have total control over the links that go out of my page. Anyway, I think the work by Kleinberg is simply great, but HITS does not take into account the fact that users will always try to game systems (especially, but not only, if they have an immediate benefit).
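The attack described above can be tried on a toy graph. The following Python sketch is a plain textbook-style HITS iteration written for this example (not Kleinberg's original code); the pages carfan1 and carfan2 are hypothetical honest hubs, and the graph is far too small to say anything about the real Web.

```python
from math import sqrt


def hits(edges, iterations=50):
    """Basic HITS: returns (hub, authority) score dicts for (source, target) links."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub score: sum of the authority scores of the pages linked to.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        # Normalise both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth


cars = ("bmw", "mercedes", "ferrari")
honest = [(h, a) for h in ("carfan1", "carfan2") for a in cars]
mine = [("thispage", a) for a in cars]             # links I fully control
attack = [("thispage", "crappyCarsISell.com")]      # one more link I control

hub_before, auth_before = hits(honest + mine)
hub_after, auth_after = hits(honest + mine + attack)

print("authority of crappyCarsISell.com:", auth_after["crappyCarsISell.com"])
print("authority of bmw:", auth_after["bmw"])
print("hub score of thispage:", hub_after["thispage"])
```

In this toy graph nobody else links to crappyCarsISell.com, yet one outgoing link from this page is enough to give it an authority score of roughly a third of bmw's, and this page even ends up with the highest hub score.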
… I almost forgot the initial reason for this post: I was reminded of HITS while reading “Lexical authorities in an encyclopedic corpus: a case study with Wikipedia” by my friend Francesco, whose blog I just discovered today via a comment he left here. And this means one less friend without a blog! Welcome Francesco!

GUESS the graph

GUESS: The Graph Exploration System by IBM seems a very interesting tool if you have fun managing and playing with graphs, but I haven’t had time to try it yet. They say “Source code available soon, if you have some desperate need for it in the meantime just email me” and “GUESS uses some great open source software including Piccolo, JUNG, HSQLDB, Jython, and RServe.” I use JUNG and it is a delicious piece of software. If GUESS is able to improve on it and to give something more, it is probably an astonishing piece of software (and it is open source).

“Serpica naro” is “San Precario”

[image: san_precario.jpg]
This is pure genius! News from Repubblica.it (in Italian).
Serpica Naro, a young Anglo-Japanese artist and fashion designer, was supposed to close the Milan fashion week (Settimana Della Moda) today. BUT (suspense …) Serpica Naro does not exist!
The organizers were fooled by the creative Italian collective Chainworkers. Serpica Naro is in fact an anagram of San Precario (depicted in the picture), the newest in a long list of saints, but this time one with a reason.

Will your friend ask “Are you buzzing me?”

Old but very interesting article, “The Hidden (in Plain Sight) Persuaders”, from the NYTimes (link via NYTimes link generator).
Some companies, such as BzzAgent, sell “viral social peer-to-peer marketing” as a service, that is, normal people telling you (and many other people) how cool a certain product is. The interesting point is that those people are normal people (maybe your friends) and not some superpaid supermodel, and also that those people volunteer (!) to spread good reviews about a certain product, for example, the “Al Fresco” sausages (?!?).
The article raised a lot of questions in me. For example, while I can understand why activists want to spread their ideas (for example, Greenpeace, Attac, EFF, FSF, Engineers Without Borders, just to mention some of them), why on earth would someone (without being paid!!) fight to advertise “Al Fresco” sausages to her friends? There are so many good causes you can embrace, why on earth would someone choose to embrace “Al Fresco” sausages?
I simply don’t get it, so I guess I should experiment with it directly: anyone interested in setting up such a company in Italy? If yes, comment on this post or send me an email.
Below are some excerpts from the article, but I suggest you read it all.

Trust competition testbed rules now available

Do you know RoboCup? In the software version, you can program your own football players and then have them compete against the players of someone else. You can use whatever technique you like, and the goal is to score more goals than the opponent. “It is an attempt to foster AI and intelligent robotics research by providing a standard problem where wide range of technologies can be integrated and examined.”
With a similar goal, some researchers are working on a trust competition testbed. The idea? You program your player in the “social game”, have it play against (or with?) the other players and at the end evaluate in some way her performance (how well she reasoned about trusting other players and information in order to reach her objectives). And we can also evaluate how the “society”, intended as the ecology of players, evolves (or not) based on the different, local behaviours. Anyway, if you are interested, check the Trust competition Rules (longer pdf version) and Trust competition FAQ. Want to play with the Java code? Unfortunately, that is not yet possible, but I guess you might obtain the code if you email them. Release of the testbed distribution is being withheld until July, 2005. At that time, the testbed will be publicly available for experimentation and competition practice.