I’ll be in Minneapolis for Recommender Systems 2007 from October 18th until October 22nd, presenting a paper titled Trust-aware Recommender Systems, which summarizes part of my PhD thesis. I’ll be hosted by Renee, couchsurfing as usual. If you are around and would like to discuss anything, let me know, k? See you soon, on the other side of the pond!
User reviews of products (like “I bought an iPod and it was not working” or “I went yesterday to XYZ Restaurant and it was fabulous” or “I saw ‘Paradise Now’ and it was great”) are the basic building blocks of Recommender Systems. And of course they can determine the success or failure of a product. Nowadays many people, before buying a product, check what the Internet is saying about it; usually this is exactly as far as their information gathering goes.
So, it should not be surprising that:
– There are authors on Amazon who write reviews of their own books under pseudonyms
at least one U.S. author was mistakenly outed on Amazon.com’s Canadian website as having written a review of his own work. The real names of thousands of people who had posted anonymous customer reviews under pseudonyms like “a reader from St. Louis” were revealed online for several days – a mistake that finally was corrected after reviewers, some of them authors themselves, complained.
– a restaurant is suing zSurvey.com, a company that collects restaurant reviews from common consumers and posts them online and in a book, for damaging its reputation. (…) seeking a public apology and 50,000 yuan (US$6,173) each in compensation. They are also demanding the Website delete all of the negative comments it has posted online and stop publishing a guide book with negative comments.
– and mainly that Amazon Gets Patents on Consumer Reviews
Review your local dry cleaner, pay $10 million?
User reviews are a hot new content area, being used by Google, Yahoo and MSN to sweeten their local search results. But as of Thursday, such consumer reviews could put search providers, as well as thousands of e-commerce sites, video rental or review sites and online booksellers, in the sights of Amazon.com’s lawyers.
The patents are simply absurd (you can read them in the article) and I’m not going to comment on them. I’m very happy that, at least for now, Europe voted against Software Patents.
About reviews, I think that creators should be free to publish their opinions (in the form of reviews in this case), they should own their reviews (hReview seems a great format for this task), reviews should be released under very liberal licences, and everyone should be allowed to aggregate the reviews and do whatever she prefers with this information: offer a Recommender System service, use them for her own decisions, …. Reviews are one of the cornerstones of the Information society and they should be usable by anyone who has an idea.
Some days ago I had to give a presentation for the 2K* symposium, a joint initiative of research groups from different IT institutions, based in Trento and in Genova. The 40-minute presentation was titled “Trust in Recommender Systems: an historical overview and recent developments” (check the source code!). It is heavily based on an old presentation; I just added some slides about microformats, a concept I wanted to convey to the audience.
You can find many presentations in S5 format in the microformats wiki; I also liked this presentation of Firefox, with style vulpes-flagrans or with style greenery. Yes, I know the style I used for my presentation is not that great; if someone with graphical skills would like to create a style for me, it would be very much appreciated of course.
To start playing with S5, I suggest the S5 primer (you need to download the HTML code and edit it) or S5present, an open-source web-based slideshow application (you just create an identity there and then use the site to create the presentation). Guess what? S5 Presents was written in under 10 hours and 500 lines of code using the fantastic Ruby on Rails framework.
AI Meets Web 2.0: Building The Web of Tomorrow Today by Dr. Jay M. Tenenbaum.
A terrific, terrific talk, fascinating. I should have podcasted it because you really missed something (except I have nothing to record audio on; would you consider sending me your old mp3 recorder pen?). I was so excited during the talk that I ended up taking a photo of almost every slide. There were 94 slides and I photographed 59 of them! Incredible to me as well.
Anyway, you might want to read the slides (pdf) or maybe you want to have a look at my pictures (possibly as a slideshow).
He introduced all the stuff I enjoy, such as blogs, RSS, wikis (Wikipedia), folksonomies, tags, Flickr, Del.icio.us, microformats (aka the lower-case semantic web), Technorati, PubSub, Greasemonkey (bookburro, greasemap) and much more, all tied together in a fascinating, convincing manner that made sense!
After his presentation, we spoke about my research and he seemed interested. He invited me to visit commerce.net for one month or so and I have to say that I really like the idea. I also spoke with Rohit Khare, who is working with Tenenbaum and has a whole bunch of very clever, fascinating, realizable ideas that would really make an impact. They also underlined more than once that this kind of architecture/language-of-web2.0 project should be open source, and I totally agree with them.
After the presentation, while I was speaking with Marty and Rohit, Jesse Andrews was there as well, the creator of the mind-blowing Book Burro (he actually got most of the attention, totally deserved by the way). I guess it must be very cool to have someone present your hack at a conference and then go up to that person and say “You know the Book Burro extension you presented? Well, I’m the creator of it!”. Cool! If you want to see what Jesse looks like, here is a picture of him, and expect some more great hacks from him in a few days.
I was invited by Stefano Mizzaro to give a lecture in his course on “Web Information Retrieval”. I spoke about “Trust in Recommender Systems: an historical overview and recent developments”. It was a lot of fun (at least for me). And I thought I could share the slides with you. They are in OpenOffice .sxi format (it is an open format, so if your program does not read a commonly used open format, you had probably better change it). They are released under an Attribution-ShareAlike Creative Commons licence. This means that if you want to use them you just have to give credit to me and re-share your slides under the same licence. If you don’t want to re-share your derivative work under the same Creative Commons licence, you are still free: free not to use them. Enjoy.
Google, do hire Stan before Yahoo! does. Stan is the author of “Outfoxed – Personalize your internet.” I haven’t played with the code yet (it seems a Linux version is not ready at the moment, but it is on the way). Yes, the code is open source (Mozilla Public Licence), sweet! Anyway, the detailed description is fantastic! It is a bit like what I want to do for my PhD thesis. The difference? Stan did it! Check the site: it has a lot of interesting pages such as The Outfoxed Idea (A collection of thoughts on the theoretical aspects of Outfoxed, and the whole idea of using social networks for metadata distribution). Or at least the page A Third Phase of Internet Search, in which Stan pictorially shows the 3 phases: Naive trust –> PageRank and inferred quality –> Social networks to determine subjective quality
I know the title is hard to parse. Let’s use some brackets: Read [the books [people [you dislike] dislike]].
That is, there are people you dislike, they dislike some books, and you will possibly like these books.
Pietro Speroni reports that a right-wing newspaper, Human Events Online, “asked a panel of 15 conservative scholars and public policy leaders to help us compile a list of the Ten Most Harmful Books of the 19th and 20th centuries” (here the list), and notes that “The list have it all, it’s the most complete list of texts I found that were really important to understand the world we are living in”. The rationale behind this is: if neocons believe these books are harmful, and since I think neocons are harmful, I should read these books. While this is OK in the real world, this reasoning does not work in Trust-aware Recommender Systems, the topic of my PhD. In online communities (in which it is easy to create fake identities) it is subject to a simple attack, and anyone could easily game the system. The idea: since I get recommended the items disliked by people I dislike, a user I dislike could pretend to “dislike” the item she wants me to be recommended. Ex: a neocon identity could pretend to dislike the book “why bush is right” (hopefully this does not exist and it is just an example) and I would get it recommended. For this reason, in the algorithms I designed, I decided that the opinions of people you dislike should not influence your recommendations at all; they are simply discarded, because otherwise these people would be able to influence your recommendations and hence game the system. Well, I’m not sure I’m good at explaining it (English is hard…). Maybe you want to check some papers of mine in which, hopefully, I was helped to write in a clearer way.
Since we are speaking of books, maybe you want to check the list of books I’ve read. (Actually it is not at all complete or updated: I was trying to keep it with allconsuming.net and also to publish it in a decentralized way in semantic web formats (RSS | XML), but in fact I created it once and never updated it … maybe in the near future there will be a tool that will allow me to keep a list of read books, with comments, and to automatically publish it on my blog; in that case I’ll probably try again to keep it updated. Or is such a tool already there? If so, please let me know.)
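To make the difference between the two policies concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not the actual algorithm from the papers: the function names, the trust values in [-1, 1], the 1 to 5 rating scale and the user names are all hypothetical.

```python
from typing import Dict, Optional


def predict_discarding_distrusted(trust: Dict[str, float],
                                  ratings: Dict[str, float]) -> Optional[float]:
    """Trust-weighted mean of ratings, using only users with positive trust.

    Distrusted (negative trust) and unknown users are ignored entirely,
    so they cannot push the prediction in either direction.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for user, rating in ratings.items():
        weight = trust.get(user, 0.0)
        if weight <= 0.0:
            continue  # discard opinions of disliked or unknown users
        weighted_sum += weight * rating
        total_weight += weight
    if total_weight == 0.0:
        return None  # nobody trusted has rated the item
    return weighted_sum / total_weight


def predict_inverting_distrusted(trust: Dict[str, float],
                                 ratings: Dict[str, float],
                                 scale_min: float = 1.0,
                                 scale_max: float = 5.0) -> Optional[float]:
    """Gameable variant: ratings of distrusted users are flipped on the scale."""
    weighted_sum = 0.0
    total_weight = 0.0
    for user, rating in ratings.items():
        weight = trust.get(user, 0.0)
        if weight == 0.0:
            continue  # unknown users are still ignored
        if weight < 0.0:
            rating = scale_min + scale_max - rating  # "dislike" becomes "like"
        weighted_sum += abs(weight) * rating
        total_weight += abs(weight)
    if total_weight == 0.0:
        return None
    return weighted_sum / total_weight


# Hypothetical scenario: a fake identity I distrust pretends to hate a book.
trust = {"friend": 1.0, "fake_neocon": -1.0}
ratings_for_book = {"friend": 2.0, "fake_neocon": 1.0}

safe = predict_discarding_distrusted(trust, ratings_for_book)
gamed = predict_inverting_distrusted(trust, ratings_for_book)
print(safe, gamed)
```

With the inverting policy, the fake identity raises the predicted rating just by pretending to dislike the book; with the discarding policy, its rating has no effect at all.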
The list of books that neocons think are harmful is
Some weeks ago, Tantek introduced a new microformat, hReview.
We are pleased to announce the first public draft (v0.1) of hReview, jointly co-authored by representatives from America Online, CommerceNet Labs, Microsoft, Six Apart, Technorati, and Yahoo!. hReview is an open microformat standard for publishing and indexing distributed reviews on the Web. This standard enables users to contribute, identify, and aggregate review content on their own web sites and blogs as well as on community sites.
I haven’t had time yet to dig into it, but it is good that they analyzed previous attempts (I was trying to use RVW by Alf Eaton and to keep my list on Allconsuming, but I didn’t put too much effort into this) and that they ask for feedback; almost all the links are to wiki pages, so you can edit them directly there.
In general I really appreciate the work of Technorati (I also wrote a paper backing their proposal of VoteLinks, submitted to Web Intelligence 2005: “Page-reRank: using trusted links to re-rank authority” (pdf)).
Some other links I’ll try to digest later on: jluster on hreview, hreview on technorati, hreview on del.icio.us, organizedshopping on hreview; adriancuthbert suggested using this_is_an_hreview as a common tag (tagspace?).
It would be great to have this format widely adopted, so that the amount of reviews published in a decentralized way soon becomes huge and I have a large amount of data for what I’m working on in my PhD: Trust-aware decentralized Recommender Systems. If interested, check my (a bit outdated) PhD proposal at my papers page.
I’m thinking about writing a book on Trust Metrics, or maybe about Trust Metrics and Recommender Systems. (I need to write my PhD thesis anyway, so if I can get it published, this is a plus.) Well, a search inside books on Amazon for “trust metric” reveals that this is not a heavily covered topic. Good. Do you have any suggestions? Publisher, topics, whatever. Anyway, being able to search inside (almost) every book in one second is astonishing; sometimes I forget how astonishing the Web is…
There is an article over at The Economist, “United we find”, on Collaborative Filtering. It is interesting to note that it also speculates on attacks on Recommender Systems. An interesting (and simple, as it should be) idea is the following:
Nolan Miller, of Harvard University’s Kennedy School of Government, and his colleagues (…) probabilistic techniques to determine whether a score is likely to be “honest”, by spotting unusual-looking patterns in scoring. Dozens of accounts created on the same day, all of which give high scores both to a bestseller and a new book, for example, might be an orchestrated attempt by a publisher to get fans of the former to buy the latter.
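The pattern described in the quote can be sketched with a very simple heuristic in Python. This is not the probabilistic technique of Miller and colleagues (the article does not give its details); it only groups accounts by creation day and flags days on which many accounts gave high scores to both items. The `Account` class, the 1 to 5 scale, and the `high` and `min_size` thresholds are assumptions made up for the illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Account:
    name: str
    created: date
    scores: Dict[str, int]  # item title -> score on an assumed 1..5 scale


def suspicious_groups(accounts: List[Account],
                      bestseller: str,
                      new_item: str,
                      high: int = 4,
                      min_size: int = 3) -> Dict[date, List[str]]:
    """Return creation days on which at least `min_size` accounts gave a
    high score to both the bestseller and the new item."""
    by_day: Dict[date, List[str]] = defaultdict(list)
    for acc in accounts:
        likes_both = (acc.scores.get(bestseller, 0) >= high
                      and acc.scores.get(new_item, 0) >= high)
        if likes_both:
            by_day[acc.created].append(acc.name)
    return {day: names for day, names in by_day.items()
            if len(names) >= min_size}


# Hypothetical data: three sock puppets created on the same day, one real fan.
day = date(2005, 3, 10)
accounts = [
    Account("puppet1", day, {"Bestseller": 5, "New Book": 5}),
    Account("puppet2", day, {"Bestseller": 5, "New Book": 4}),
    Account("puppet3", day, {"Bestseller": 4, "New Book": 5}),
    Account("real_fan", date(2004, 1, 2), {"Bestseller": 5, "New Book": 5}),
    Account("critic", day, {"Bestseller": 5, "New Book": 1}),
]

flagged = suspicious_groups(accounts, "Bestseller", "New Book")
print(flagged)
```

A real detector would have to estimate how unlikely such a cluster is by chance instead of using fixed thresholds, but the grouping step would look much the same.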