Tag Archives: sonet

Wikipedia datasets released

I strongly believe in the replicability of science, and I tend to release all the datasets I work on for other people to use, improve, and test. This is what I did when I was working on trust metrics and recommender systems (see the datasets I released on Trustlet.org some time ago), and it is also what I do with the SoNet group now that we explore the social side of Wikipedia. See the datasets at http://sonetlab.fbk.eu/data/: they include social networks extracted from User talk pages, data about activity patterns on Wikipedia pages, and data about social capital (not on Wikipedia). Enjoy!

Review of “Taking up the mop: identifying future wikipedia administrators”

Paper by Moira Burke and Robert Kraut of Carnegie Mellon University, presented at CHI ’08, Conference on Human Factors in Computing Systems.

This paper presents a model of editors who have successfully passed the peer review process to become admins. The lightweight model is based on behavioral metadata and comments, and does not require any page text. It demonstrates that the Wikipedia community has shifted in the last two years to prioritizing policymaking and organization experience over simple article-level coordination, and mere edit count does not lead to adminship.

In short, the authors compute lots of stats for every single user and then run a regression against the binary variable “election successful”, i.e. “X became admin”. They treat Requests for Adminship before 2006 and after 2006 separately.
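To make the idea of regressing a binary outcome on per-user stats concrete, here is a minimal sketch. It uses a plain logistic regression fitted by gradient descent as a simpler stand-in for the paper's own regression, and the two features and all the numbers are hypothetical toy values, not data from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # derivative of the log-loss w.r.t. the linear score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, x):
    """Estimated probability that the candidate becomes admin."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical toy data: [policy edits (hundreds), article edits (thousands)]
candidates = [[0.1, 5.0], [0.2, 3.0], [0.3, 6.0],
              [2.0, 4.0], [2.5, 5.0], [3.0, 3.5]]
became_admin = [0, 0, 0, 1, 1, 1]  # 1 = election successful

weights, bias = fit_logistic(candidates, became_admin)
print("weights:", weights, "bias:", bias)
print("P(admin | many policy edits):", predict(weights, bias, [3.0, 4.0]))
```

In this toy setup the sign and size of each weight play the role of the per-stat effects reported in the paper: a positive weight means that more of that activity goes together with a higher chance of a successful election.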

The stats they compute are:
Strong edit history
* Article edits ‡
* Months since first edit
Varied experience
* Wikipedia (policy) edits ‡
* WikiProject edits ‡
* Diversity score
* User page edits ‡
User interaction
* Article talk edits ‡
* User talk edits ‡
* Wikipedia talk edits
* Arb/mediation/wikiquette edits
* Newcomer welcomes
* “Please” in comments
* “Thanks” in comments
Helping with chores
* “Revert” in comments ‡
* Vandal-fighting (AIV) edits
* Requests for protection
* “POV” in comments
* Admin attention/noticeboard edits
* X for deletion/review edits ‡
* Minor edits (%)
Observing consensus
* Other RfAs
* Village pump
* Votes
Edit summaries / comments
* Commented (%)
* Avg. comment length (log2 chars)
Merely performing a lot of production work is insufficient for “promotion” in Wikipedia. Candidates’ article edits were weak predictors of success. They also have to demonstrate more managerial behavior. Diverse experience and contributions to the development of policies and Wiki Projects were stronger predictors of RfA success. This is consistent with findings that Wikipedia is a bureaucracy [1] and that coordination work has increased substantially [8][13].

However, future work is needed to examine more closely what the admins are doing. Future admins also use article talk pages and comments for coordination and negotiation more often than unsuccessful nominees, and tend to escalate disputes less often.

Although this research has shown that judges pay attention to candidates’ job-relevant behavior and especially behavior that suggests the candidate will be a good manager and not just a good worker, it is silent about whether other factors identified in the organizational literature [9] (social networks, irrelevant attributes, or strategic self-presentation) also play a role.

Indeed, recent evidence that Wikipedia admins use a secret mailing list to coordinate their actions toward others suggests that sponsorship may also play a role in promotion.

Future research in Wikipedia using techniques like those in the current paper can be used to test theories in organizational behavior about criteria for promotion. An important limitation of the current model is that it does not take the quality of contribution into account. We plan to improve the model by examining measures of length, persistence, and pageviews of edits, which are already being used in more processor intensive models of existing admin behavior [7] and impact of edits [10].

Criteria for admins have changed modestly over time. Success rates were much higher (75.5%) prior to 2006, and collaboration via article talk pages helped more in the past (+15% for every 1000 article talk edits, compared to +6.3% today). The diversity score performs similarly prior to 2006 (+3.7% then, +2.8% now). However, participation in Wikipedia policy and WikiProjects was not predictive of adminship prior to 2006, suggesting the community as a whole is beginning to prioritize policymaking and organization experience over simple article-level coordination.

If you want to read the details, you can read the PDF of the paper.
Credit: Picture by inju released under Creative Commons.

Review of “Feedback Effects between Similarity and Social Influence in Online Communities”

Today I presented to the other SoNetters a wonderful paper titled “Feedback Effects between Similarity and Social Influence in Online Communities” by David Crandall, Dan Cosley, Daniel Huttenlocher, Jon Kleinberg, and Siddharth Suri of Cornell University, presented at KDD 2008, the conference on Knowledge Discovery and Data Mining. My review is just below the slides I used for the presentation.

Besides the points already presented in the slides, here I add a few points relevant to our research on Wikipedia.

Social influence: People become similar to those they interact with
Interaction → similarity
Selection: People seek out similar people to interact with
Similarity → interaction

They considered registered users of the English Wikipedia who have a user discussion page (~510,000 users as of April 2, 2007). These users are responsible for 61% of edits to the roughly 3.4 million articles. The authors ignore actions by users without discussion pages, who tend to have very few social connections.

User’s activity vector v(t): number of times that he or she has edited each article up to time t.
Similarity(u,v): similarity between the activity vectors of users u and v.
Time of first meeting for two users u and v = time at which one of them first makes a post on the user discussion page of the other.
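The three definitions above can be sketched in a few lines. This is only an illustration under assumptions: the review does not say which similarity measure is used, so cosine similarity is assumed here, and the function names and the shape of the edit and talk logs are hypothetical.

```python
import math
from collections import Counter

def activity_vector(edits, t):
    """Count how many times a user edited each article up to time t.

    edits: list of (timestamp, article) pairs for one user (assumed format).
    """
    return Counter(article for ts, article in edits if ts <= t)

def cosine_similarity(u, v):
    """Cosine similarity between two sparse activity vectors (assumed measure)."""
    dot = sum(count * v[article] for article, count in u.items() if article in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a user with no edits is similar to nobody
    return dot / (norm_u * norm_v)

def first_meeting(talk_posts, u, v):
    """Earliest time one of u, v posts on the other's user discussion page.

    talk_posts: list of (timestamp, author, page_owner) triples (assumed format).
    """
    times = [ts for ts, author, owner in talk_posts if {author, owner} == {u, v}]
    return min(times) if times else None

# Hypothetical toy logs.
alice = [(1, "A"), (2, "A"), (3, "B")]
bob = [(1, "A"), (4, "C")]
posts = [(5, "alice", "bob"), (7, "bob", "alice")]

print(cosine_similarity(activity_vector(alice, 4), activity_vector(bob, 4)))
print(first_meeting(posts, "alice", "bob"))
```

Computing the similarity at several times before and after the first meeting gives the kind of curve the paper studies.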

In principle, we could also try to infer social interactions based on posting to the same article’s discussion page. Moreover, we found that using simple heuristics to infer interaction based on posts to article discussion pages produced closely analogous results to what we obtain from analyzing user discussion pages.

They find that there is a sharp increase in the similarity between two editors just before they first interact (selection), with a continuing but slower increase that persists long after this first interaction (social influence).

They also create a model and estimate the unobservable parameters based on maximum-likelihood. The estimates are as follows:
* The probability of communicating versus editing was 0.058 (i.e. out of every 100 actions, about 6 are talk posts and 94 are page edits). We can cite it, and we can even verify it across different Wikipedias and at different time slots.
* When the action is an article edit, the article is chosen from one’s own interests with probability 0.35, from a neighbor’s interests with probability 0.081, from the overall interests of Wikipedia editors with probability 0.5, and by creating a totally new article with probability 0.069.
* When the action is a talk post, the user to communicate with is chosen randomly from the overall set of users with probability 0.71, and from those who have engaged in a common activity with the remaining probability 0.29.
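To see how these estimates fit together, here is a small sketch that draws one action at a time using the probabilities listed above. It only illustrates the branching structure as described in this review; the category names and function names are hypothetical, and the full model in the paper contains more detail than this.

```python
import random

# Probabilities as reported above.
P_TALK = 0.058
ARTICLE_SOURCES = [
    ("own_interests", 0.35),
    ("neighbor_interests", 0.081),
    ("overall_interests", 0.5),
    ("new_article", 0.069),
]
TALK_TARGETS = [
    ("random_user", 0.71),
    ("shared_activity_user", 0.29),
]

def pick(options, u):
    """Map a uniform draw u in [0, 1) to an option by cumulative probability."""
    cumulative = 0.0
    for name, p in options:
        cumulative += p
        if u < cumulative:
            return name
    return options[-1][0]  # guard against floating-point rounding

def sample_action(rng):
    """Draw one action: first talk vs. edit, then where it is directed."""
    if rng.random() < P_TALK:
        return ("talk", pick(TALK_TARGETS, rng.random()))
    return ("edit", pick(ARTICLE_SOURCES, rng.random()))

rng = random.Random(42)
actions = [sample_action(rng) for _ in range(10000)]
talk_share = sum(1 for kind, _ in actions if kind == "talk") / len(actions)
print("share of talk actions:", talk_share)
```

Over many draws the share of talk actions should come out close to 0.058, which is the kind of figure that could be compared across different Wikipedias.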

They also do some content analysis on 30 instances of two users meeting for the first time: “We examined the content of the initial communication and any reply, looking for references to specific articles or other artifacts in Wikipedia. We also compared the edit history of the two users.”
Of the 30 messages, 26 referenced a specific article, image, or topic. In 21 cases, the users had both recently worked on the artifact that was the subject of conversation.
The gap between co-activity and communication was usually short, often less than a day, though it stretched back three months in one case.
Informally, communications tended to fall into a few broad categories: offering thanks and praise, making requests for help, or trying to understand the editing behavior of the other person.
This sample of interactions suggests that people most often come to talk to each other in Wikipedia when they become aware of the other person through recent shared activity around an artifact. Awareness then leads to communication, and often coordination.

A really wonderful paper!

Scholarship for 1 year in the SoNet group

Come to work with our research group (SoNet – Social Networking)! Read more at http://sonet.fbk.eu/en/work_with_us

Scholarship available (~1300 Euros per month after taxes).
Deadline: 5 February 2010!

The research activity will be about identifying requirements for a social networking platform for Associazione Trentini nel Mondo Onlus (thousands of people from Trentino who are currently living abroad), proposing different platforms and adoption strategies, following their deployment (carried out by developers of the SoNet project), and evaluating them.
The scholarship is for one year and the activity will start on 15 March 2010.

Two talks by David Orban in Trento on April 8th: The Open Internet Of Things, and

The SoNet FBK research group is happy to invite you to two talks by David Orban on April 8th in Trento.
The first talk, “The Open Internet Of Things”, will be about OpenSpime. It should appeal to anyone interested in sensors, positioning devices and memory, social, Web 2.0-style services in the real world, green technology, tech applied to the environment, open hardware and software, communications protocols, and the future in general.
The second talk, “Preparing Humanity For The Impact Of Accelerating Technological Change”, will be about the Singularity University, a recent initiative funded by NASA, Google, and others.
I’ll be waiting for you on April 8th!

First talk: The Open Internet Of Things
8 April 2009 – at 10.00 – Conference Room – Fondazione Bruno Kessler – Povo (TN) (up in the hills, see the map)
If we want the forthcoming Internet of Things to flourish, the distributed smart sensor networks which take the current infrastructures for granted and base their necessarily autonomous activities on massive data collection, then we have to adopt an open architecture. Only an interoperable approach to the design of the next generation of hardware and software systems is going to be able to leverage the dramatic effects, and express the value to human civilization that the network of tens, or thousands of billions of new objects, the spime network, is going to shape. For more info see http://www.openspime.com

Second talk: Preparing Humanity For The Impact Of Accelerating Technological Change
8 April 2009 – at 15.00 – Conference Room – Fondazione Bruno Kessler – Trento (downtown, see the map)
The impact of advanced technologies on our societies is becoming more and more extreme, exposing new tensions in our models of human relationships, learning, and values in policies, politics, and business. While relinquishment has been recommended by some, it appears that the way ahead will be the use of more, not less technology, as billions of people aim to achieve a high quality of life for themselves, and their children. The Singularity University, recently formed on an open, international and interdisciplinary approach employs an advanced curriculum to analyze how the future leaders of enterprise, culture, and science can best prepare to face the serious challenges ahead.

About the speaker:
David Orban is an entrepreneur and visionary. In recognition of his lifetime contribution to exponentially advancing technologies, he has been honored with the position of Advisor and European Lead to the prestigious Singularity University.
He is a Founder and Chief Evangelist of WideTag, Inc., a high technology start-up company providing the infrastructure for an open Internet of Things. David cuts across the limits of deep specialization to contribute to the new renaissance. He explains, “My vision is at the crossroads of technology and society as defined by their co-evolution.” David Orban’s personal motto is, “What is the question I should be asking?” This concept is his vehicle to accelerating cycles of invention and innovation in order to build the new world ahead.


Insights into relationships on Facebook

Interesting blog post by Cameron Marlow, research scientist at Facebook over at overstated.net: Maintained Relationships on Facebook.

They start from a simple question: is Facebook increasing the size of people’s personal networks?

They looked at the communications of a random sample of users over the course of 30 days and defined networks in 4 different ways:

  • All Friends: the largest representation of a person’s network is the set of all people they have verified as friends. In research papers this number ranges between 300 and 3000. On Facebook, the average user has 120 friends.
  • Reciprocal Communication: as a measure of a sort of core network, we counted the number of people with whom a person had had reciprocal communications, or an active exchange of information between two parties. In research papers, this number ranges from 3, for individuals with whom I can discuss important matters (for Americans), to 10 or 20, for ongoing contacts at a university.
  • One-way Communication: the total set of people with whom a person has communicated.
  • Maintained Relationships: the set of people for whom a user had clicked on a News Feed story or visited their profile more than twice. This is a sort of over-the-shoulder relationship: I’m “following” (this is the relationship type) the target user without her necessarily knowing it. This is a new type of relationship (not really available, say, 50 years ago), similar to reading the flow of thoughts of someone via a blog or just looking at the pictures uploaded on Flickr.
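As a rough illustration of how the four definitions differ, here is a small sketch that derives them from toy logs. Everything about the data layout is an assumption: the function name, the (sender, recipient) and (viewer, target) pairs, and the reading of “one-way” as “people the user has written to” are hypothetical choices, not Facebook’s actual method.

```python
from collections import Counter

def personal_networks(user, friends, messages, views):
    """Return the four network definitions for one user.

    friends:  set of confirmed friends of the user
    messages: list of (sender, recipient) pairs in the observation window
    views:    list of (viewer, target) pairs, one per News Feed click
              or profile visit (assumed format)
    """
    sent_to = {r for s, r in messages if s == user}
    received_from = {s for s, r in messages if r == user}
    view_counts = Counter(target for viewer, target in views if viewer == user)
    return {
        "all_friends": set(friends),
        "one_way": sent_to,                     # assumed: people the user wrote to
        "reciprocal": sent_to & received_from,  # both directions observed
        "maintained": {t for t, n in view_counts.items() if n > 2},  # more than twice
    }

# Hypothetical toy data for user "a".
friends = {"b", "c", "d"}
messages = [("a", "b"), ("b", "a"), ("a", "c")]
views = [("a", "d")] * 3 + [("a", "c")] * 2

result = personal_networks("a", friends, messages, views)
for name, members in result.items():
    print(name, sorted(members))
```

In this toy example “d” never exchanges a message with “a” but still ends up in the maintained set, which is exactly the passive, over-the-shoulder kind of tie described above.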

An interesting observation: “as a function of the people a Facebook user actively communicate with, you are passively engaging with between 2 and 2.5 times more people in their network”.

And another one: The stark contrast between reciprocal and passive networks shows the effect of technologies such as News Feed. If these people were required to talk on the phone to each other, we might see something like the reciprocal network, where everyone is connected to a small number of individuals. Moving to an environment where everyone is passively engaged with each other, some event, such as a new baby or engagement can propagate very quickly through this highly connected network.


Tools for finding conferences and journals

There are some aggregators for Call for Papers. The ones I use for finding conferences and journals are:

Check them out. The number of conferences organized every month is really amazing!
Do you use other services? Maybe ones with a recommender system built in, so that they can directly suggest the conferences relevant to you?

Kickoff meeting and public presentation for LiveMemories project with Ricardo Baeza-Yates from Yahoo! Research

On Wednesday, October 22nd 2008, in Trento there will be the kickoff meeting for the LiveMemories project, Active Digital Memories of Collective Life (in which I’m involved). The public workshop is open to everybody (it will at least be translated into Italian).
UPDATE: Now with blog in Italian http://lamemoriaaltempodiinternet.wordpress.com.
Check the program of the workshop or read it below, copied and pasted. There will be Ricardo Baeza-Yates, Director of the Yahoo! Research labs at Barcelona, speaking about the Impact of Social Networks; Alessandro Cavalli, Professore di Sociologia, Università di Pavia, speaking about “La Costruzione Sociale della Memoria Collettiva”; Simon Delafond, Web producer at the BBC, UK, speaking about the “BBC Memoryshare initiative”; plus presentations from the project partners and a collective discussion about “Quale modello per la libera circolazione della Memoria?”

I’m really looking forward to the event! If you are interested or you are coming, please let me know! See you!


The consequences of opensourcing Facebook code

Some weeks ago Facebook released its source code as Free and Open Source Software.
I’m very curious about the consequences of this action. Initially I supposed this choice would be a tsunami in the world of social networking sites, but I haven’t found many mentions of it around. So I tried to look around and to answer the question “What were the consequences of Facebook making its code open source?”.
I don’t have a clear idea, but the consequences seem very small.
How many clones of Facebook popped up? Are they used? I haven’t found any Facebook clone worth mentioning.

How many people downloaded the code? How many code patches were provided to Facebook? I guess this was one of the biggest intended consequences: Facebook getting bug fixes, chunks of code, or suggestions on how to improve performance. Also, I think it is now easier for Facebook to hire new developers, because it can get to know them in advance from the commits and suggestions they write about Facebook code. But, for example, have there been any exploits from people reading the code and finding weaknesses? Probably not: if you discover a glitch, it is much more meaningful to send an email to Facebook explaining it, since there is a chance Facebook might want to hire you as a security expert.
Overall, is Facebook better off or worse off after the decision to release the code as Free Software? I was not able to get much information about this and I’m a bit surprised. Actually, I haven’t yet downloaded the code in order to test it. I was about to do it, but then for Webvalley we decided to use BuddyPress, so “check Facebook code” is still on the todo list.

Some interesting links which might be worth checking in more detail: open source projects on the Facebook wiki, the portal for developers on Facebook code (interesting!), Project Cassandra: Facebook’s Open Source Alternative to Google BigTable, and the fact that Google recently released its Protocol Buffers as open source, while Facebook did so much earlier with Thrift.

So, did I miss something? What do you think were the consequences of Facebook opensourcing its code?