Author Archives: paolo

Creative Commons power: you take a photo, someone else uses it for a video

1) Stallman was in Trento and I got the chance to be pictured with him.
2) Gavin Hill found the picture and emailed me asking to use it for a video he was making. I release everything on my blog under an Attribution-NonCommercial-ShareAlike 1.0 Creative Commons Licence, but he preferred a less restrictive one, so I re-released this photo to him under an Attribution Creative Commons Licence.
3) He created a video about “How Software Patents Actually Work” with the picture. [In his first email he wrote that he would release the video under a CC licence as well. At the moment I think he forgot to state it on the video page, so I emailed him about this] [UPDATE: he told me that “The video itself has details of the CC license at the end”]
4) I’m in the “thanks list” at the end of the video.
All of this thanks to Creative Commons licenses, the copyright for the 21st century!
Napo is the guy who appears in the picture (and the guy I ask when I have a problem with my GNU/Linux, and the guy I share the office with every single day); he wrote an entry about it, in Italian, on his blog on persone.softwarelibero.org.
You can also translate the video into your language, if you like.

India rejects software patents

From BoingBoing: Software Patents Stopped in India. Well, India acted much more wisely than Europe, where we are still fighting against software patents. Software patents are good only for very large companies, which use them to destroy small and medium companies through legal fights. Imagine being sued by, say, IBM for patent infringement. You will lose a lot of time and money defending your case, and you will eventually give up and go bankrupt. Software patents are a nightmare for the European IT market and for Europe. Help stop them.

Randomly-generated paper accepted for a conference!

Too funny, too sad. SCIgen is an Automatic Computer Science Paper Generator. The program (GPL-licensed and hence Free Software) generates random Computer Science research papers, including graphs, figures, and citations. I had been thinking about doing something like it for a long time, but wait … one of the random papers got accepted for a conference!!!
One useful purpose for such a program is to auto-generate submissions to “fake” conferences; that is, conferences with no quality standards, which exist only to make money. A prime example, which you may recognize from spam in your inbox, is SCI/IIIS and its dozens of co-located conferences (for example, check out the gibberish on the WMSCI 2005 website). Using SCIgen to generate submissions for conferences like this gives us pleasure to no end. In fact, one of our papers was accepted to SCI 2005! See Examples for more details.
The accepted paper is Rooter: A Methodology for the Typical Unification of Access Points and Redundancy by Jeremy Stribling, Daniel Aguayo and Maxwell Krohn, and the “authors” say: We are currently working on the “camera-ready”, and received many donations to send us to the conference, so that we can give a randomly-generated talk. Hey, researcher! You can cite it! After all, it is a published paper! Not the crappy stuff you find on blogs! Beware, never cite an online article, only articles published on good old paper at one of the millions of crappy, hyper-expensive conferences!
And, in case you want to cite a paper of mine, I just created “A Case for Randomized Algorithms” and “Comparing XML and Markov Models”, or you can just generate a new paper for me. Writing a paper is now easier than ever!!! I need to click 8 more times on this link and then I can spend a year on holiday, since I will already have produced a good number of papers.
[I found the news on BoingBoing, a blog researchers should cite sometimes…]

ICT4development

One of my interests is “How can information technology improve lives in the developing world?” (sentence from this post). If you are interested in this topic, you will enjoy Ethan Zuckerman’s ramblings on Africa, technology and media, and particularly the post titled Mike Best with evidence that ICT4D works….
[I don’t like the term “developing world”. In Italian I tend to use “paesi del Sud del Mondo” (countries of the South of the World), which is not 100% satisfactory either, since you could argue that Sud (South) can be read as less valuable than North. But I don’t agree: on many topics, the word South can carry more positive values than the word North]

Attacking HITS (and not PageRank)

While I think PageRank is a very clever (though simple) idea, I’m not so sure about HITS. What are these algorithms for? For predicting the quality of a page on the Web based on all the links between pages. PageRank assumes that a page linked to by many pages, and by pages of high quality (recursive!), is itself of good quality, i.e. it is an authority. HITS is based on the notions of hub and authority: a good hub is a page that points to several good authorities; a good authority is a page that is pointed at by several good hubs.
So, why do I appreciate PageRank more than HITS? Because the latter can be easily attacked. The PageRank of this page depends only on the pages linking to this page, and I cannot easily force everyone on the web to link to this page. It depends on what other pages decide to link to, and I have no power over that.
Conversely, according to HITS, the hubness of this page depends on the pages this page links to, and I have total power over the pages I link to! Do I want this page to become a hub about cars? It is enough to link to (what I think are) car authorities: bmw, mercedes, ferrari, ford, renault, … (better not fiat). Then do I want to exploit the hubness score this page got? I would simply also link to crappyCarsISell.com. HITS thinks this page is a hub and, since a hub by definition points to authorities, HITS thinks crappyCarsISell.com is a car authority.
What matters is the direction of links! I have no control over the links that come into my page, but I have total control over the links that go out of it. Anyway, I think the work by Kleinberg is simply great, but HITS does not take into account the fact that users will always try to game systems (especially, but not only, if they get an immediate benefit).
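To make the attack concrete, here is a minimal sketch of the textbook HITS iteration on a toy graph. The page names carfan1, carfan2, carfan3 and mypage are made up for illustration, and the iteration count and normalization are my own assumptions, not a reproduction of Kleinberg's exact setup.

```python
import math

def hits(links, iterations=50):
    """Plain HITS power iteration over {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of the authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical toy web: a few honest pages linking to car makers.
car_sites = ["bmw", "mercedes", "ferrari", "ford", "renault"]
honest = {"carfan1": car_sites, "carfan2": car_sites, "carfan3": car_sites[:3]}

# Before the attack my page links to nothing; afterwards it links to
# the car authorities and, on top of that, to the site I want to promote.
before = dict(honest, mypage=[])
after = dict(honest, mypage=car_sites + ["crappyCarsISell.com"])

hub_before, auth_before = hits(before)
hub_after, auth_after = hits(after)

print("my hub score before/after:", hub_before["mypage"], hub_after["mypage"])
print("authority of crappyCarsISell.com:", auth_after["crappyCarsISell.com"])
```

In this sketch my page becomes a hub using nothing but its own outgoing links, and crappyCarsISell.com picks up a nonzero authority score without a single honest page linking to it.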
… I almost forgot the initial reason for this post: I was reminded of HITS while reading Lexical authorities in an encyclopedic corpus: a case study with Wikipedia by my friend Francesco, whose blog I discovered just today via a comment he left here. And this means one less friend without a blog! Welcome, Francesco!

China releases “Human Rights Record of the United States in 2004”

The USA regularly releases a report on human rights for every country in the world. Every country but the USA. So China decided to fill the gap and presented The Human Rights Record of the United States in 2004, by the Information Office of the State Council of the People’s Republic of China (I read the commentary in Italian by Repubblica). Interesting reading, full of data, numbers and stats. This is a link to the Yahoo Cache version, just in case.
Of course nobody could argue that China is better than the USA on human rights. But it is interesting that China is explicitly attacking the USA on such a topic: can you imagine any other country releasing such a report? I can’t. With this report, China is saying “we are as powerful as you and we can judge you, as you judge all the world”. This is a scary situation for our future.

GUESS the graph

GUESS: The Graph Exploration System by IBM seems a very interesting tool if you have fun managing and playing with graphs, but I haven’t had time to try it yet. They say “Source code available soon, if you have some desperate need for it in the meantime just email me” and “GUESS uses some great open source software including Piccolo, JUNG, HSQLDB, Jython, and RServe”. I use JUNG and it is a delicious piece of software. If GUESS is able to improve on it and offer something more, it is probably an astonishing piece of software (and it is open source).

The Economist on Collaborative Filtering

There is an article over at The Economist, United we find, on collaborative filtering. It is interesting to note that it also speculates on attacks on recommender systems. An interesting idea (simple, as it should be) is the following:
Nolan Miller, of Harvard University’s Kennedy School of Government, and his colleagues (…) probabilistic techniques to determine whether a score is likely to be “honest”, by spotting unusual-looking patterns in scoring. Dozens of accounts created on the same day, all of which give high scores both to a bestseller and a new book, for example, might be an orchestrated attempt by a publisher to get fans of the former to buy the latter.
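The example in the quote can be turned into a very naive check. This is only a sketch of the pattern described (many accounts created on the same day, all scoring the same two items highly), not the probabilistic technique of Miller and his colleagues; the data layout, the function name suspicious_days and both thresholds are assumptions of mine.

```python
from collections import defaultdict
from datetime import date

def suspicious_days(accounts, bestseller, new_book, high=4, min_accounts=3):
    """Return {creation_date: [account names]} for days on which at least
    min_accounts new accounts rated BOTH items with a score >= high.

    accounts maps a name to (creation_date, {item: score}); this layout
    and the thresholds are illustrative assumptions.
    """
    by_day = defaultdict(list)
    for name, (created, ratings) in accounts.items():
        if ratings.get(bestseller, 0) >= high and ratings.get(new_book, 0) >= high:
            by_day[created].append(name)
    return {day: sorted(names) for day, names in by_day.items()
            if len(names) >= min_accounts}

# Made-up data: three accounts born on the same day push both books,
# while two ordinary accounts do not fit the pattern.
accounts = {
    "fan1": (date(2005, 4, 1), {"bestseller": 5, "new_book": 5}),
    "fan2": (date(2005, 4, 1), {"bestseller": 5, "new_book": 4}),
    "fan3": (date(2005, 4, 1), {"bestseller": 4, "new_book": 5}),
    "reader": (date(2005, 3, 12), {"bestseller": 5, "new_book": 2}),
    "critic": (date(2005, 4, 1), {"bestseller": 2}),
}

flagged = suspicious_days(accounts, "bestseller", "new_book")
print(flagged)
```

A real system would need something statistical rather than fixed thresholds, since an attacker can simply spread account creation over several days.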