Studying Collective Memories in Wikipedia

I’m the supervisor of Michela Ferron, a PhD student at the Center for Mind/Brain Sciences of the University of Trento who works with me in the SoNet group at the Bruno Kessler Foundation.
Her project is on the formation of collective memories in Wikipedia, and she has just put up an interesting blog that I suggest you check out. You can find it at http://empiricalmemories.wordpress.com.

Below is a video showing some comments posted on the related Wikipedia talk pages during the fifth anniversary of the September 11 attacks and during the first anniversary of the Virginia Tech massacre (which occurred on 16 April 2007). But there is much more on the blog.

Wikipedia mentioned in books in 1975

UPDATE: Dami, in a comment to this post, says “if a word appears in a newer edition of an older work (e.g. in the introduction section of cheap reprints of public domain books) Google will count it as an appearance at the time the original work was published.” I checked and this is true, thanks Dami!

I was playing with the Google Books Ngram Viewer, which lets you check how frequently certain phrases occurred in books published between 1950 and 2008.
Curiously, the following graph reports that some books (only 0.0000011%, but greater than zero anyway!) already contained the word “wikipedia” (and “wiki”) in 1950 and in 1975. Maybe there is a small bug even in the mighty Google services?

The following graph instead shows the (expected) increase in mentions of “wikipedia” and “wiki” in books since 2003.

Percentage of men and women on different Wikipedias

A few days ago there was an interesting article in the NYTimes about the small percentage of women on Wikipedia.
Today there is a very interesting ongoing discussion on the gendergap mailing list at Wikipedia. Some preliminary statistics from the discussion are:

Wikipedia language   Number of users   Users who specified gender in preferences   Men      Women   Percentage of women
?                    13959842          2.01%                                       233312   46973   16.76%
?                    1167708           3.47%                                       35726    4800    11.84%
?                    998668            2.16%                                       18556    3054    14.13%
?                    78180             2.66%                                       1666     414     19.90%
Russian              620393            16.80%                                      80491    23750   22.78%
?                    414511            3.64%                                       12106    2999    19.85%
?                    368815            2.92%                                       8977     1781    16.56%
?                    1464442           2.26%                                       27980    5070    15.34%
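
As a quick sanity check, the last column of the table can be recomputed from the men and women counts. This is only a minimal sketch: the counts are copied from the rows above, and the helper name `percent_women` is my own, not something from the mailing list discussion.

```python
# (men, women) counts copied from the rows of the table above, in order.
rows = [
    (233312, 46973),
    (35726, 4800),
    (18556, 3054),
    (1666, 414),
    (80491, 23750),
    (12106, 2999),
    (8977, 1781),
    (27980, 5070),
]


def percent_women(men, women):
    """Share of women among the users who declared a gender, in percent."""
    return 100 * women / (men + women)


for men, women in rows:
    print(f"{men:>6} men, {women:>5} women: {percent_women(men, women):.2f}% women")
```

Note that the percentage is taken over the users who specified a gender, not over all users of that Wikipedia.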

It is interesting to note that on the Russian Wikipedia users tend to declare their gender much more often (16.80%!). Do you have any idea whether (1) this is a cultural trait specific to Russians, (2) it depends on the practices of the Russian Wikipedia, or (3) it depends on the user interface? For example, it might be that when you register you are redirected to an HTML page on which you can also specify your gender.
Also interesting is the fact that this Wikipedia has the highest percentage of women (22.78%). Probably the reason is that where gender is declared more often, it is more normal for women to declare it as well, whereas where gender is rarely declared, it is generally unwise for women to explicitly say “Hey, I’m female!”, since that may attract (additional) unwanted messages. Or, put in other terms, OMG Girlz Don’t Exist on teh Intarweb!!!!1.

Img by nojhan, under Creative Commons

Fact checking in the time of Web

In the time of the Web, news arrives much faster than it did years ago. Can a few journalists working under deadlines of hours really check factual assertions to determine whether they are true? I guess the question boils down to two different ones: (1) is timely fact-checking possible at all (i.e. can we know what is really happening, for instance, in the Niger Delta over oil)? and (2) if it is possible, can a few journalists check from their offices whether what is mentioned in a certain “report” really happened?

The issue (fact-checking) is not new, of course: it dates back to the first newspaper. What is new is that we now live in a global world and the Web lets news spread faster and faster.

I did a short investigation to see whether there is any user-generated, bottom-up, Web 2.0-esque attempt at fact-checking in the time of the Web. Of course there is. Here is a list of what I found:

1. FactCheck.org and FactCheckEd.org, two attempts by the Annenberg Public Policy Center at the University of Pennsylvania. FactCheck.org is a non-partisan, nonprofit website that describes itself as a “‘consumer advocate’ for voters that aims to reduce the level of deception and confusion in U.S. politics.”
2. WikiFactCheck.org, a proposal by Andrew Lih, an associate professor at the University of Southern California’s Annenberg School of Communication and Journalism and author of “The Wikipedia Revolution: How a bunch of nobodies created the world’s greatest encyclopedia”. In a blog post he explains why he believes a wiki is perfect for the task of decentralized fact-checking.
3. Truth-o-meter by PolitiFact.com. PolitiFact.com was awarded the Pulitzer Prize for National Reporting in 2009 for “its fact-checking initiative during the 2008 presidential campaign that used probing reporters and the power of the World Wide Web to examine more than 750 political claims, separating rhetoric from truth to enlighten voters.”
4. The fact checker project by the Washington Post. This is a more traditional attempt, but one made available in the wild on a public website. Over 15 months, Michael Dobbs checked some 200 claims and statements relating to the presidential campaign and received 18,000 comments, many of them vehemently disputing his verdicts. He used Pinocchios as markers of un-truthiness.
5. Fact and Reference Check project by Wikipedia. Wikipedia itself has a WikiProject about this important issue. The purpose of this project is to verify facts in Wikipedia against multiple independent sources. Basically there are templates that anyone can add to articles so that these articles end up in categories such as Category:Wikipedia articles needing factual verification, Category:Articles lacking sources, Category:Articles needing additional references, Category:Articles lacking reliable references or Category:Articles with unsourced statements.
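
If you want to see which articles currently sit in one of those categories, the MediaWiki API can list category members. The following is just a sketch that builds the query URL without sending any request; the function name `category_members_url` and the default limit are my own choices, and I am assuming the English Wikipedia endpoint.

```python
from urllib.parse import urlencode

# Assumption: we query the English Wikipedia; other languages use other hosts.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def category_members_url(category, limit=10):
    """Build a MediaWiki API URL listing pages in the given category."""
    params = {
        "action": "query",          # standard query module
        "list": "categorymembers",  # list the pages belonging to a category
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,           # how many pages to return
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(category_members_url("Articles lacking sources"))
```

Opening the printed URL in a browser should return a JSON list of pages in that category.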

UPDATE 2010/11/18: Thanks to a comment by Sergio Maistrello about factcheck.it, I came to know of the following ones. Thanks, Sergio!

Do you know of more attempts? You are very welcome to add them in the comments. Thanks!

Amazing visualizations of activity on Wikis

Warning: this webpage loads many processor-intensive animations. It might break your browser, and you will probably have to close the browser window (tab) after use.

The first visualization is made by Erik Zachte and available at stats.wikimedia.org.
The animation (embedded below) shows four aspects of the development of Wikipedias in different languages (en, it, fr, …). X-axis: age of a project. Y-axis: number of articles per project. Circle size: number of editors per project. Color: maturity of content (blue = mostly stubs, violet = mostly larger articles).

Interactive version, all projects (requires Firefox 3+, Safari 4+ or Chrome)

Static version, Wikipedia only (8 Mb Flash)

The other three visualisations were made by Matt Ryal with JavaScript (Processing.js and RaphaëlJs). They are about activity on the wiki and blogs of Atlassian’s Extranet.
I embed them here, but you can check Matt’s post for more details and a better visualization.

Activity: a rippling visualisation of comment activity on the wiki. Based loosely on the Apple Arabesque screensaver.

Comments: a falling bar-graph visualisation of comments by blog post. Based very much on a Flash visualisation by Digg, but reimplemented in JS (this one is about the blogs, not the wiki).

Contributors: a tree graph visualisation linking commenters and blog post authors (this one is also about the blogs, not the wiki).

Listening to Enrico Giovannini (OECD) speaking about “measuring progress of societies”

I’m in Luserna for three mind-blowing weeks of Webvalley.
Right now I’m listening to Enrico Giovannini of the Organisation for Economic Co-operation and Development speaking on “democracy and statistics”. You can watch what we are listening to on ustream.tv.
And I’m blogging on the Webvalley blog as well!

TrentoWiki.it, a wiki for the city of Trento

UPDATE: now also with videos of Trento and bloggers of Trento.
Some time ago I started TrentoWiki.it. I opened TrentoWiki because I needed a place to store information about the places, the events, and the many opportunities that this small, charming city and its surroundings offer. So far it hasn’t attracted thousands of contributors, but it is a useful service anyway, at least for me.

So, who might be interested in a wiki about Trento?
(1) People who are coming to Trento (for a conference, such as the upcoming conference about Free Software (May 16, 2008) or BlogFest in Riva del Garda, to work in a research centre, or just for tourism) and might find useful information in the wiki; in fact, among the most accessed pages are “Cercare casa a Trento” (finding a house in Trento) and Photos of Trento; and
(2) people who live in Trento and possibly don’t know about all the interesting stuff happening and available in the city.
So please share your local knowledge and insights and, please, be bold in editing TrentoWiki!

TrentoWiki is powered by MediaWiki, just like Wikipedia.
The license is Creative Commons Attribution-Share Alike 3.0, which means that the knowledge created on the wiki can legally be reused elsewhere as long as attribution is given and the license remains the same. So even if I decide to close the wiki, anyone can move all the content elsewhere.
TrentoWiki is open to anonymous editing, but you are certainly welcome to create an identity on the wiki.
For me, running a wiki is also a very useful experiment, for example with the challenge of being multilingual (there is a Category:English), which will also be an issue for my project of getting an internal wiki adopted at my research institute. It is also an experiment because I’m curious to see whether a wiki targeted at a small community can work even with a critical mass of just a few users. We’ll see. In the meantime, please do join TrentoWiki.it

Wikipedia trust network

I just discovered that there is (was?) a proposal for implementing a trust network in Wikipedia.
The proposal originated from a post by Jimbo Wales himself on a mailing list in February 2004.
Some excerpts from the Wikipedia article follow:

The proposed system has the three key ideas: (1) giving users a formal way of declaring their confidence in other users, (2) a way of seeing which users have declared their trust of a particular user, and (3) the resulting structure of trust-relationships formed between all users.
It provides an additional piece of information that may be useful when coming across another user for the first time. The Wikipedia user base is so large that two well-established and respected editors, concentrating on different areas of Wikipedia, may have no contact between each other for some time. Reading an editor’s user page, browsing through their contributions, and reading the threads in their talk are valuable but time-consuming methods of getting to know someone. Discovering that several reputable users, or users that you have particular regard for, have expressed their trust in an editor is a strong indicator of that editor’s value to Wikipedia. However, the sheer number of editors who trust a user should not be taken as a clear measurement of that user’s trustworthiness: the fact that a user is trusted by dozens of suspected sockpuppets would only harm their reputation.
There are a variety of reasons to express trust in another user: you may have worked together on a proposal or article, reviewed many of their edits in articles on your watchlist, or know them personally. Liking another user should not generally be enough; trusting somebody requires being confident that their contributions are civil, constructive and of generally high quality.

Of course distrust is a tough topic as usual.

Additionally, it would be wise to consider carefully any thoughts of writing explicit statements of distrust, bearing in mind the no personal attacks policy.

It is important to remember that the trust network is not a popularity contest, and so there is no need to actively seek out declarations of trust. The fact that another user has not made a declaration of trust in your favour is by no means a declaration of distrust.

The question of which trust metric is most suitable is tackled as well:

The network itself can be analysed using a trust metric to rate individual users. There are very many different ways to do this, which will produce quite different results, and it is important to note that no metric is endorsed by this proposal.
The simplest trust metric is to count the number of users who trust the rated user, but this system is vulnerable to attack (for instance, the use of sockpuppet accounts to trust oneself).
Another is to count how many links there are in the chain of trust between yourself and another user: if I trust A, who trusts B, who trusts C, and this is the shortest path from myself to C, then C is three links away from me. I might decide that I explicitly trust anybody one link away from me, and implicitly trust anybody up to three links away. This is very different to the previous case: the measurement is personal, not absolute, and will not be affected by sock puppetry.

Since “who trusts you?” is more important than “how many people trust you?” there is little point in creating sock puppets to declare trust in yourself.
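
To make the difference between the two metrics concrete, here is a minimal sketch of both. The trust declarations, the user names and the function names are all made up by me for illustration; the proposal itself does not endorse any implementation. The second metric is computed with a plain breadth-first search over "X trusts Y" links.

```python
from collections import deque

# Hypothetical trust declarations: each user maps to the users they trust.
trust = {
    "me": ["A"],
    "A": ["B"],
    "B": ["C"],
    "sock1": ["spammer"],
    "sock2": ["spammer"],
    "sock3": ["spammer"],
}


def count_trusters(trust, user):
    """Simplest metric: how many users have declared trust in `user`."""
    return sum(user in trusted for trusted in trust.values())


def trust_distance(trust, source, target):
    """Personal metric: links in the shortest chain of trust from
    `source` to `target`, or None if no chain exists."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        current, dist = queue.popleft()
        for nxt in trust.get(current, []):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


print(count_trusters(trust, "spammer"))        # inflated by the sock puppets
print(trust_distance(trust, "me", "C"))        # me -> A -> B -> C
print(trust_distance(trust, "me", "spammer"))  # no chain from me
```

In this toy network the sock puppets push the simple count for "spammer" up to 3, but since nobody I trust (directly or indirectly) trusts them, there is no chain from me to "spammer" at all, while C is three links away.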

Jimbo’s original post is precious as well.

But most would adopt a personal policy of giving mostly positives or abstaining, reserving negatives for worst case scenarios.
Newcomers would have no rating at all, obviously. Very prominent people would have lots of ratings, mostly positive I would have to assume. I would probably have 95% positive rating, but not perfect, since beloved though I am and obviously deserve to be (*wink*), I am a target.
We’d likely see perfect positive ratings for people like Michael Hardy, who keeps his nose to the grindstone editing topics that aren’t controversial, and who stays out of internal politics almost completely as far as I know.
Some sysops have taken enormous and weighty responsibilities on themselves to do important but controversial work like VfD or banning trolls or mediating disputes or editing articles about the Middle East. We’d naturally expect them to get mixed reviews, but we might be surprised… lots of people would give them positive ratings just for doing those jobs, acknowledging the difficulty and risk involved.

And then Jimbo lists advantages and disadvantages, very interesting!

Well, I’m phauly on Wikipedia, I think you should trust me.

Wikipedia shines also on Google Talk.

Yesterday I looked up the Wikipedia page for Google Talk. It said something like “… Google is rumored to be developing …” (see the historic version). Today I reloaded the page and found a complete article full of details. There is even an Easter Eggs section! Already! And it was released today!! Wow, Wikipedia is really collective knowledge at work!
Since it is only for Windows, I have no chance to try Google Talk, and anyway I don’t miss it at all.
In the meantime, my guess is that the next subdomain will be office.google.com, but for now the link leads somewhere else …