Which language edition of Wikipedia has more registered women?

A paper of mine published in 2014 started with this simple (but interesting, I think) question ;)
As you might know, Wikipedia is not available only in English: there are almost 300 Wikipedias written in other languages.
So what did we do? We computed the percentage of females and males among registered users on 289 language editions of Wikipedia.

The pdf of “Gender Gap In Wikipedia Editing: A Cross-Language Comparison” is available for you to read.
But I suggest you try to answer the following questions before reading the answers (which are in the paper), so that you can play a bit with your stereotypes and prejudices about culture and women around the world ;)

1) Which language edition of Wikipedia has the largest percentage of registered users setting their gender as female? What is this percentage? Is it more or less than 50%?
2) Which language edition of Wikipedia has the smallest percentage of women? How close to 0% might this be …?
3) Try to order the following language editions of Wikipedia from the largest percentage of female registered users to the smallest: Arabic, Bulgarian, Catalan, Chinese, French, German, Hindi, Japanese, Korean, Persian, Swedish, Thai. Where does the largest Wikipedia (the English one) rank?
4) Moreover, consider that setting the gender on Wikipedia is optional and few users actually do it (see details in the paper). What percentage of users set their gender on English Wikipedia? Which Wikipedia has the largest share of users setting their gender? What is this percentage?

Note that, as written in the paper, languages of course do not map directly to countries. For example, Spanish Wikipedia is heavily edited from Spain but also from Latin America, and a similar point can be made about Arabic Wikipedia. India has many official languages, such as Hindi, Bengali, Malayalam, Tamil and Marathi, as well as English. On the other hand, Italian Wikipedia or Catalan Wikipedia are much more “localized”.
Note also that in the paper we arbitrarily decided to consider only editions with at least 20,000 registered users. Since we computed percentages, a Wikipedia with only 2 users setting their gender would have had percentages of 0%, 50% or 100%, which is clearly not informative. This filtering step reduced our sample to 76 Wikipedias.
Note also that the data refer to March 16, 2013, but we released the Python script as open source, so you can re-run it if you are curious about the current situation. You can get the script on GitHub.
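
To make the filtering and the percentages a bit more concrete, here is a minimal Python sketch. It is not the script we released. The edition codes and counts are made up for illustration, and the helper names are hypothetical. It also assumes that the percentage of women is computed among the users who set a gender, which is how the 0%, 50% or 100% example above reads.

```python
# Illustrative sketch only: NOT the released script.
# Threshold taken from the post; everything else below is hypothetical.
MIN_REGISTERED_USERS = 20_000


def gender_percentages(females, males):
    """Return (pct_female, pct_male) among users who set a gender.

    Returns None when nobody set a gender, since the ratio is undefined.
    """
    total = females + males
    if total == 0:
        return None
    return 100.0 * females / total, 100.0 * males / total


def disclosure_rate(females, males, registered):
    """Percentage of registered users who set any gender at all."""
    return 100.0 * (females + males) / registered


def filter_editions(editions, min_users=MIN_REGISTERED_USERS):
    """Keep only editions with at least `min_users` registered users."""
    return {
        name: counts
        for name, counts in editions.items()
        if counts["registered"] >= min_users
    }


def rank_by_female_share(editions):
    """Sort editions from the largest to the smallest percentage of women."""
    rows = []
    for name, counts in editions.items():
        pcts = gender_percentages(counts["female"], counts["male"])
        if pcts is not None:
            rows.append((name, pcts[0]))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up counts for three fictional editions (not real Wikipedia data).
editions = {
    "aa": {"registered": 150_000, "female": 300, "male": 1_700},
    "bb": {"registered": 25_000, "female": 50, "male": 150},
    "cc": {"registered": 900, "female": 1, "male": 1},  # too small, dropped
}

kept = filter_editions(editions)
ranking = rank_by_female_share(kept)

for name, pct_female in ranking:
    counts = kept[name]
    rate = disclosure_rate(counts["female"], counts["male"], counts["registered"])
    print(f"{name}: {pct_female:.1f}% female, {rate:.2f}% of users set a gender")
```

The fictional edition "cc" shows why the threshold matters: with only 2 users setting their gender it would report 50% women, a number that says almost nothing.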

OK, now you can go read the paper “Gender Gap In Wikipedia Editing: A Cross-Language Comparison” to get the answers to the questions above and hopefully be amazed! Enjoy! ;)