Tag Archives: Wikipedia

Manypedia presented at Wikisym

I uploaded to SlideShare the presentation I gave some time ago at Wikisym 2012. It is embedded below. It compares points of view across different wikis (such as ecured.cu, the Cuban government official wiki, Conservapedia, Anarchopedia, veganpedia, …) and, thanks to Manypedia, the same page across different language Wikipedias (such as “List of controversial issues” in the English, Chinese and Catalan Wikipedias, “Human rights in the United States” in English and Chinese, “Osama Bin Laden” in English and Arabic, “Vietnam War” in English and Vietnamese, “Northern Cyprus” in the Turkish, Greek and English Wikipedias, “Underwear” in English and Arabic).

Manypedia is online at http://www.manypedia.com.

The paper is at http://www.gnuband.org/papers/manypedia-comparing-language-points-of-view-of-wikipedia-communities/. If you like Manypedia and the paper, please cite it. Thanks!

Article about Manypedia in Italian newspaper Corriere

I was interviewed by the Italian newspaper Corriere della Sera about Manypedia (and WikiTrip). If you know Italian, you can read the resulting article, titled “Every Wikipedia represents its own culture: even the concept of controversiality is controversial”, at corriere.it. The journalist made a point of stressing that both Manypedia and WikiTrip are open source, which I think is a good thing.
Manypedia on corriere.it

Manypedia: Comparing Linguistic Points Of View (LPOV) of different language Wikipedias

As part of our investigation of the social side of Wikipedia in SoNet, Federico “fox” and I created Manypedia, a web mashup which I really like ;)
On Manypedia, you compare Linguistic Points Of View (LPOV) of different language Wikipedias. For example (but this is just one of the many possible comparisons), are you wondering whether the communities of editors of the English, Arabic and Hebrew Wikipedias are crystallizing different histories of the Gaza War? Now you can check the “Gaza War” page from the English and Arabic Wikipedias (both translated into English) or from the Hebrew Wikipedia (translated into English).
Manypedia, by using the Google Translate API, automatically translates the compared page from a language you don’t know into a language you know. And this is not limited to English as the first language. For example you can search for a page in the Italian Wikipedia (or in any of the 56 supported language Wikipedias) and compare it with the same page from the French Wikipedia, translated into Italian. In this way you can check how the page differs in another language Wikipedia even if you don’t know that language, sweet!
Well, the Gaza War is just one of the topics which might have very different LPOVs on different language Wikipedias, but there are many more. As a starting point, you can check the Wikipedia page “List of controversial issues”, which lists many controversial articles grouped around 15 main categories. Actually it is interesting to compare the controversial articles page on the English and Chinese Wikipedias (the English page is slightly more centered around topics important for US/Western culture, while the Chinese page lists pages such as “Anti-Japanese War”, “Nanjing Massacre”, “Taiwan”, “Human Rights in China”, “Falun Gong”, “Tiananmen Incident”, “Mao Zedong”, “List of sites blocked by China”) or on the Catalan Wikipedia (in which controversiality arises around what a country is, the Catalan countries and Valencia).
In the top header of Manypedia there are some featured comparisons handpicked by us (and a random one is loaded on the main page), but you can actually search in real time for any page that appears in any language Wikipedia. Currently we support 56 languages so that, for example, you can search for a page in the Arabic Wikipedia and compare it with the same page in the Hebrew Wikipedia, translated into Arabic. Or from Italian compared with French, or from Tagalog compared with Catalan, or from Hindi compared with Irish, or from Turkish compared with Yiddish, or from Persian compared with Swahili … well, you’ve got the idea ;)
Of course if you have any suggestions or feedback, we would love to hear them in order to make Manypedia better and more useful.
You can contact us via Twitter (Manypedia@Twitter) or via Facebook (Manypedia@Facebook).
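
For the curious, here is a minimal Python sketch of the first step any comparison like this needs: finding the title of the same page in another language Wikipedia. It is not Manypedia's actual code. It assumes the public MediaWiki API (action=query with prop=langlinks and the lllang filter); the function names and the sample response are made up for illustration, and the translation step is left out entirely.

```python
from urllib.parse import urlencode


def langlinks_url(source_lang, title, target_lang):
    """Build a MediaWiki API URL asking for the interlanguage link of a page."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllang": target_lang,  # only return the link for this language
        "format": "json",
    }
    return f"https://{source_lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def title_in_other_language(api_response, target_lang):
    """Pull the target-language title out of a parsed JSON response, or None."""
    pages = api_response.get("query", {}).get("pages", {})
    for page in pages.values():
        for link in page.get("langlinks", []):
            if link.get("lang") == target_lang:
                # The classic JSON format stores the title under "*";
                # formatversion=2 uses "title" instead.
                return link.get("*", link.get("title"))
    return None


# Hypothetical response, shaped like the classic JSON output of the API.
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "1": {
                "title": "Gaza War",
                "langlinks": [{"lang": "ar", "*": "TITLE_IN_ARABIC"}],
            }
        }
    }
}

print(langlinks_url("en", "Gaza War", "ar"))
print(title_in_other_language(SAMPLE_RESPONSE, "ar"))
```

With the two titles in hand, the remaining work is fetching both pages and translating one of them, which is where a translation service comes in.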

Researcher position available in SoNet at FBK, Trento, Italy about social side of Wikipedia

A position is available in the SoNet (Social Networking) research unit at Bruno Kessler Foundation, Center for Information Technology, Trento, Italy. The SoNet research unit focuses its research on the social side of Wikipedia and wikis in general.

The successful candidate will join our group working on a project whose goal is to mine, analyze and computationally model the individual and collective behaviour in communities and social networks of Wikipedia users.
The ideal candidate should have:
* Ph.D. level education in a relevant discipline
* A good record of relevant research published in peer-reviewed conferences or journals
* Strong empirical and analytical orientation, with experience in handling large amounts of data coming from user action logs and social networks
* Experience with statistics and with analysis of complex networks
* Knowledge of at least one of Python, Perl, C/C++, R, Java
* Proficiency in both written and spoken English
Additional requirements:
* Prior experience in using social media for disseminating personal research
* Interdisciplinary background
* Experience with GNU/Linux systems

Type of contract: co.co.pro (collaboration contract) for one year. Initial appointments are for one year and renewal is based on performance. Gross Salary offer will be approximately Euro 22.500,00 and can be increased based on experience and skills of the candidate.

Feel free to contact me if you have any questions. You can also find more information about how to apply.

Report of ACM Hypertext 2011 conference

Last week I attended the Hypertext 2011 conference in Eindhoven, where I presented the paper “Social networks of Wikipedia”. It discusses two different algorithms for extracting networks of conversations from User Talk pages in Wikipedia and evaluates them against the manual coding of all messages in User Talk pages of the Venetian Wikipedia. The main point was listing the many details of Wikipedia practices and formatting styles that you need to be aware of if you want to derive realistic results from your quantitative analysis. The code of the algorithms is available as open source, and so are some network datasets extracted from Wikipedia.

The conference was smaller than what I expected but interesting. There were some people working on Wikipedia and I had many interesting conversations with them.
The best talk was hands down the one by Noshir Contractor titled “From Disasters to WoW: Using Web Science to Understand and Enable 21st Century Multidimensional Networks”. He spoke about the many different great works he is doing in an entertaining and funny style. The main methodological take-away message I got is that he is looking at networks at the edge level, considering the “motivation” for each edge (positive/negative links, in fact) and seeing how well different established sociological theories such as homophily, social balance, winner takes it all, etc. are able to explain the topology of the network. For example, 4 networks extracted from 4 different kinds of interactions of the same users of a massively multiplayer online game (I think “who fights with whom”, “who is in a guild with whom”, “who exchanges messages with whom” and “who trades goods with whom”) exhibit different patterns, and the particular orientation of a certain network can be explained by the balance of the motivations described by the different theories. In particular the network of “who trades goods with whom” has special “motivations” that are influenced by the presence of so-called goldfarmers, people typically in China or other low-to-average-income countries who play online games doing repetitive tasks with the goal of acquiring in-game currency that is usually sold for real-world currency to other players. One of their papers about this, “Mapping Gold Farming Back to Offline Clandestine Organizations: Methodological, Theoretical, and Ethical Challenges”, won the award for Best Paper at the recent Game Behind the Game conference. What I was really surprised to hear is that he is working on Wikipedia as well!
In fact, in his keynote, Noshir presented some recent work he has been doing with one of his students, Brian Keegan, about Wikipedia’s coverage of breaking news articles such as the Japan earthquake. Interestingly Michela Ferron and I wrote a paper titled “Wikipedia as a Lens for Studying the Real-time Formation of Collective Memories of Revolutions” in which we highlight the richness of the phenomena of collective memory building on Wikipedia about the current north-African revolutions (all the Wikipedia pages get created a few minutes or days after the events and receive an incredible number of edits from many different users, which we interpret as a process of collective memory building) and we discuss research directions (more info about this in a future blog post). Our article was recently accepted in the “International Journal of Communication” and we are of course delighted by that. Actually the editor of IJoC is Manuel Castells, who will be giving a keynote at the upcoming ICWSM about … guess what? Social Media and Wiki-Revolutions: The New Frontier of Political Change. I guess it is really a hot topic nowadays, which is both comforting (we are doing cool stuff) and worrying (because these guys are really good and it is hard to do better … but we will try ;)
Actually in two weeks Noshir will come to Trento to give a one-week course on Social Network Analysis, which I’m really looking forward to attending, and I hope to gather further insights via discussions with him.
The other people who presented work about Wikipedia at the Hypertext conference were David Laniado and his colleagues from Barcelona Media, who presented “Co-authorship 2.0: Patterns of collaboration in Wikipedia”, an interesting analysis of networks of coediting on Wikipedia and its comparison with networks of scientific co-authoring. David was also there with a poster about “Automatically assigning Wikipedia articles to macro-categories”, joint work with Jacopo Farina.
There was also another very interesting work titled “Social Capital Increases Efficiency of Collaboration Among Wikipedia Editors”, presented by Keiichi Nemoto of Fuji Xerox, who was working with Peter Gloor and Robert Laubacher of the MIT Center for Collective Intelligence. They found that the more cohesive and more centralized the collaboration network of Wikipedia editors is, and the more network members were already collaborating before starting to work together on an article, the faster the article they work on is promoted to good or featured article.
Overall it was good to discover interesting projects and meet good people working on Wikipedia, whom I hope I’ll keep meeting at future conferences.

Trust in the algorithm or in the human social process? Google, Wikipedia and Points of View.

A very interesting interview with the director of Google News at NiemanLab.
Krishna Bharat reflects on POVs (Points of View).
“many perspectives coming together can be much more educational than singular points of view”. Ok, I agree.
“You really want the most articulate and passionate people arguing both sides of the equation.” Ok.
“Then, technology can step in to smooth out the edges and locate consensus.” Technology stepping in is where it starts to become less agreeable. To do what? To tell me the truth? To tell me what the most consensual representation of facts is?
“That is the opportunity that having an objective, algorithmic intermediary provides you”.
This is the point that I really don’t like. Shall we rely on algorithmic objectivity to form our visions of world facts? Interestingly this is how Google has been “casting” its algorithm for many years: “PageRank relies on the uniquely democratic nature of the web” or “be based on impartial and objective relevance criteria”.
The interview goes on with “If you trust the algorithm to do a fair job and really share these viewpoints, then you can allow these viewpoints to be quite biased if they want to be.” and “Trusting in the algorithm means trusting in the tacit completeness of the automation it offers to readers.”
Now, I think it is a bit scary that a corporation asks you to trust the objective, algorithmic intermediary they provide to you (with the goal of making money, which is of course totally acceptable per se).

Actually I agree with Ken Thompson, who in “Reflections on Trusting Trust” (Communications of the ACM, Vol. 27, No. 8, August 1984, pp. 761-763) claimed “You can’t trust code that you did not totally create yourself.” (It is also very pertinent that in the paper the very next sentence is “Especially code from companies that employ people like me”.)

As a last point, I would like to say that I prefer to trust the transparent social process that happens, for example, on Wikipedia. On pages such as “Climate Change” hundreds of different editors participate and, even if Wikipedia policy asks them to write from a Neutral Point of View, it is undeniable that many of them have strong POVs. This is very visible on controversial pages such as the one about the Israeli-Palestinian conflict, for example.
What I prefer about Wikipedia, over the objective, algorithmic intermediary provided by Google, is the fact that the process is carried out by humans (this is not completely true since there are many automatic bots on Wikipedia, but currently they perform mainly maintenance tasks) and, more importantly, the fact that you can analyze the complete history of edits (and who made them) that brought each article to its current state. Moreover, if you don’t agree with the current framing of a concept, you can get involved and contribute your POV by editing the page or discussing it on the related talk page.
Let me highlight also how the FAQ about Neutral Point of View on Wikipedia clearly states that “the NPOV policy says nothing about objectivity. In particular, the policy does not say that there is such a thing as objectivity in a philosophical sense—a “view from nowhere” (to use Thomas Nagel’s phrase), such that articles written from that viewpoint are consequently objectively true.”
Let me conclude with the Italian poet Giacomo Leopardi, who in “La ginestra” (Wild Broom) was lamenting “le magnifiche sorti e progressive” (the “magnificent and progressive fate”) of the human race. I think we should all do the same a bit more than we currently do, instead of embracing algorithmic objectivity.

Image: Giacomo Leopardi from Wikipedia (in the public domain)

Qwiki: awesome animations of Wikipedia pages

Some time ago I made a video of the evolution over time of the Wikipedia page about the 2005 London bombings.
Well, what you get from Qwiki, for almost every Wikipedia page, is in a completely different league! It is awesome! Below I embedded the qwiking of the page about the 2005 London bombings.

View 7 July 2005 London bombings and over 3,000,000 other topics on Qwiki.

Qwiki gets info from a Wikipedia page and automatically reads a text summary aloud (synchronised with the text on screen), adding images from different sources.
It is amazing! I can imagine students in schools pondering “instead of listening to this boring professor talk about the history of Europe, I’ll check the qwiking of it” (see below).

View History of Europe

Or do you want to quickly get an idea about the recent 2011 Egyptian revolution? Nothing better than qwiking it (see below).

View 2011 Egyptian revolution and over 3,000,000 other topics on Qwiki.

Well, you can compare these videos with the reports created by professional journalists of CNN or BBC and ponder how far we are from the automatic, real-time generation of news reports.
Currently most videos are short (even when the corresponding pages are very long) and this totally makes sense from Qwiki’s perspective, but I guess we are not far away from the automatic generation of school lessons about geography, history or literature (and more). For example check the qwiking of Trento, the city where I live and work.

View Trento and over 3,000,000 other topics on Qwiki.

And as an early commenter put it, I’m nearly in tears. This is so beautiful.

“Social networks of Wikipedia” paper accepted at HyperText 2011

The paper I wrote, “Social networks of Wikipedia”, got accepted for the 22nd ACM Conference on Hypertext and Hypermedia. If you are also going to be in Eindhoven on June 6-9, 2011, please let me know!
If you are interested, you can read the entire paper; the abstract is below. We also released the source code (Python) at sonetlab, along with some network datasets extracted from User Talk pages (in GraphML format, so you can easily import them into your tool of choice; we like Gephi).
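
If you just want to peek at a GraphML file without a graph tool, a few lines of Python are enough. This is only a sketch using the standard library: the sample document below is invented (the real files may carry additional attributes), and read_graphml_edges is a name made up for this example.

```python
import xml.etree.ElementTree as ET

# GraphML elements live in this XML namespace.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def read_graphml_edges(xml_text):
    """Return the (source, target) pairs of every edge in a GraphML document."""
    root = ET.fromstring(xml_text)
    return [
        (edge.get("source"), edge.get("target"))
        for edge in root.findall(".//g:edge", GRAPHML_NS)
    ]


# Invented sample: two messages left on user talk pages.
SAMPLE_GRAPHML = """
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="talk" edgedefault="directed">
    <node id="Alice"/>
    <node id="Bob"/>
    <node id="Carol"/>
    <edge source="Bob" target="Alice"/>
    <edge source="Carol" target="Bob"/>
  </graph>
</graphml>
"""

for source, target in read_graphml_edges(SAMPLE_GRAPHML):
    print(f"{source} wrote to {target}")
```

For anything beyond a quick look, a proper graph library or Gephi itself is the better choice.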

Network extracted from User Talk pages of Venetian Wikipedia visualized with Gephi.

Wikipedia, the free online encyclopedia anyone can edit, is a live social experiment: millions of individuals volunteer their knowledge and time to collectively create it. It is hence interesting trying to understand how they do it. While most of the attention has concentrated on article pages, a less known share of activities happens on user talk pages, Wikipedia pages where a message can be left for the specific user. These public conversations can be studied from a Social Network Analysis perspective in order to highlight the structure of the “talk” network. In this paper we focus on this preliminary extraction step by proposing different algorithms. We then empirically validate the differences in the networks they generate on the Venetian Wikipedia with the real network of conversations extracted manually by coding every message left on all user talk pages. The comparisons show that both the algorithms and the manual process contain inaccuracies that are intrinsic in the freedom and unpredictability of Wikipedia growth. Nevertheless, a precise description of the involved issues allows one to make informed decisions and to base empirical findings on reproducible evidence. Our goal is to lay the foundation for a solid computational sociology of wikis. For this reason we release the scripts encoding our algorithms as open source and also some datasets extracted out of Wikipedia conversations, in order to let other researchers replicate and improve our initial effort.
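
To give a flavour of what the extraction step involves, here is a deliberately naive Python sketch. It is not either of the algorithms from the paper: it simply assumes that every line of a user talk page containing a signature link such as [[User:Bob|Bob]] is a message from Bob to the owner of the page. The function names, the user_prefixes parameter and the sample pages are all made up for this illustration, and real talk pages break these assumptions in many ways (unsigned messages, customized signatures, localized namespace names, and so on).

```python
import re
from collections import Counter


def signature_pattern(user_prefixes=("User",)):
    """Regex matching [[User:Name...]] links and capturing Name.

    user_prefixes is an assumption of this sketch: other language
    editions use localized namespace names, which must be passed in.
    """
    prefixes = "|".join(re.escape(p) for p in user_prefixes)
    return re.compile(r"\[\[\s*(?:" + prefixes + r")\s*:\s*([^\]|/#]+)", re.IGNORECASE)


def talk_edges(talk_pages, user_prefixes=("User",)):
    """Count (signer, page owner) pairs, one per signed line.

    talk_pages maps the owner of each user talk page to its wikitext.
    """
    pattern = signature_pattern(user_prefixes)
    edges = Counter()
    for owner, wikitext in talk_pages.items():
        for line in wikitext.splitlines():
            signers = {name.strip().replace("_", " ") for name in pattern.findall(line)}
            for signer in signers:
                if signer != owner:  # replies on one's own page are skipped here
                    edges[(signer, owner)] += 1
    return edges


# Invented sample pages.
SAMPLE_PAGES = {
    "Alice": (
        "Welcome! [[User:Bob|Bob]] 10:00, 1 May 2010 (UTC)\n"
        ":Thanks. [[User:Alice|Alice]] 10:05, 1 May 2010 (UTC)\n"
        "One more thing. [[User:Bob|Bob]] ([[User talk:Bob|talk]]) 11:00, 1 May 2010 (UTC)"
    ),
    "Bob": "Hi Bob, [[User:Carol|Carol]] 09:00, 2 May 2010 (UTC)",
}

for (signer, owner), count in sorted(talk_edges(SAMPLE_PAGES).items()):
    print(signer, "->", owner, count)
```

Even on this toy input one design decision already matters: Alice's reply on her own page is dropped, although it is really a message to Bob. Choices like this are exactly the kind of detail that changes the resulting network.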

Studying Collective Memories in Wikipedia

I’m the supervisor of Michela Ferron, a PhD student at the Center for Mind/Brain Sciences of the University of Trento who works with me in the SoNet group of the Bruno Kessler Foundation.
Her project is on the formation of collective memories in Wikipedia and she just put up an interesting blog that I suggest you check out. You can find it at http://empiricalmemories.wordpress.com.

Below is a video showing some comments posted on the related Wikipedia talk pages during the fifth anniversary of the September 11 attacks and during the first anniversary of the Virginia Tech massacre (which occurred on 16 April 2007). But on the blog there is much more.

The state of Wikipedia


The transcript is below:

Wikipedia is one of the most important websites on the Internet today, but you might be surprised to learn it began as a side project of another online encyclopedia. That was called Nupedia, to be a traditional encyclopedia written by experts—free and online—but only one person had final publishing authority and it wasn’t quite taking off.
As the founder of Nupedia, I led the group to establish a farm team of sorts for future Nupedia articles. We used a new software platform to make collaboration easy—the wiki—Wikipedia.
It happened to be the perfect way to write many pages very quickly. Soon enough, Nupedia couldn’t keep up and Wikipedia took center stage. We were creating not just a free content encyclopedia but a “free encyclopedia that anyone can edit.” Other language editions appeared quickly—over 270 at last count—and it was soon followed by sister projects like Wikisource, Wikinews and Wiktionary.
In 2003, I created the Wikimedia Foundation to ensure that Wikipedia could keep up with its own growth. Wikipedia gets almost 400 million visitors every month, and the list of sites visited more often is very short and very famous. Wikipedia celebrates its tenth anniversary in January 2011 and in these ten years has become one of the most popular websites in the world. I still lead the community and the Wikimedia Foundation helps us to make Wikipedia what it is today.
Who does edit Wikipedia? Over time, as many as 1.2 million people have contributed to Wikipedia. As of 2010, there are more than 11 million monthly edits to all Wikipedias in all languages. According to one survey, we have about twice the proportion of Ph.Ds compared to the general public. On the English Wikipedia almost 50% have no religion and 14.6% of French editors claim to believe in Pastafarianism. It would be fair to say that most Wikipedians are not average.
One reason, maybe, is that editing a single page is easy, but getting heavily involved is harder. The community is defined by more than 200 combined policies, guidelines and essays, to say nothing of the discussions and reviews, committees and noticeboards, WikiProjects and more. All the site content is decided by Wikipedia’s volunteer contributors. The Wikimedia Foundation has no editorial role whatsoever.
The Foundation’s job is to keep the servers running and the lights on, but there’s more to it than that. The Foundation is also growing Wikipedia’s presence worldwide—more data centers to speed up Wikipedia worldwide and even bringing its first office outside of the United States to India.
Wikipedia is already very popular in the West and in the North. A new challenge is going to be making Wikipedia available to the developing world, as well. The Foundation is a charity and runs entirely on donations—some from corporations and institutions, but the vast majority from its millions of editors and readers.
It’s incredible what has been accomplished already, but Wikipedia is far from done. As any reader knows, some articles are very good, but some are not. Wikipedia still needs a lot of work. Yet, this is a new challenge. Not just building an encyclopedia from scratch, but making it better: more accurate, more citations. Not just broad, but deep.
There’s never been anything like Wikipedia before, and its future horizon is very, very long. As Wikipedia enters its second decade, it’s up to all of us to make sure it gets even better.