Review of “Feedback Effects between Similarity and Social Influence in Online Communities”

Today I presented to the other SoNetters a wonderful paper titled “Feedback Effects between Similarity and Social Influence in Online Communities” by David Crandall, Dan Cosley, Daniel Huttenlocher, Jon Kleinberg, and Siddharth Suri of Cornell University, presented at the 2008 KDD conference on Knowledge Discovery and Data Mining. My review is just below the slides I used for the presentation.

Besides the points already presented in the slides, here I add a few points relevant to our research on Wikipedia.

Social influence: People become similar to those they interact with
Interaction → similarity
Selection: People seek out similar people to interact with
Similarity → interaction

They considered registered users of the English Wikipedia who have a user discussion page (~510,000 users as of April 2, 2007). These users are responsible for 61% of edits to the roughly 3.4 million articles. The authors ignore actions by users without discussion pages, who tend to have very few social connections.

User’s activity vector v(t): the number of times he or she has edited each article up to time t.
Similarity(u,v): similarity between the activity vectors of users u and v.
Time of first meeting for two users u and v: the time at which one of them first makes a post on the user discussion page of the other.
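
The three definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the review does not say which similarity measure is used, so cosine similarity is assumed here, and the function names and the input formats (lists of timestamped edits and talk posts) are hypothetical.

```python
from collections import Counter
from math import sqrt


def activity_vector(edits, t):
    """Count edits per article up to and including time t.

    `edits` is a list of (timestamp, article) pairs for one user
    (a hypothetical input format chosen for this sketch).
    """
    return Counter(article for ts, article in edits if ts <= t)


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors (assumed measure).

    Returns 0.0 if either vector is empty.
    """
    dot = sum(count * b[article] for article, count in a.items() if article in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_meeting(talk_posts, u, v):
    """Earliest time at which u posts on v's user discussion page, or v on u's.

    `talk_posts` is a list of (timestamp, author, page_owner) triples.
    Returns None if the two users never interact.
    """
    times = [ts for ts, author, owner in talk_posts if {author, owner} == {u, v}]
    return min(times) if times else None
```

With these pieces, the similarity of two users at time t would be `cosine_similarity(activity_vector(edits_u, t), activity_vector(edits_v, t))`, and plotting it against t minus `first_meeting(...)` gives the kind of curve discussed in the paper.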

The authors note that, in principle, one could also try to infer social interactions from posts to the same article’s discussion page. They report that simple heuristics for inferring interaction from posts to article discussion pages produced results closely analogous to those obtained from analyzing user discussion pages.

They find that there is a sharp increase in the similarity between two editors just before they first interact (selection), with a continuing but slower increase that persists long after this first interaction (social influence).

They also build a model and estimate its unobservable parameters by maximum likelihood. The estimates are as follows:
* The probability of communicating versus editing was 0.058 (i.e. out of every 100 actions, about 6 are talk posts and 94 are page edits). We can cite this figure, and we can even verify it across different Wikipedias and over different time periods.
* When the action is an article edit, the article is chosen from one’s own interests with probability 0.35, from a neighbor’s interests with probability 0.081, from the overall interests of Wikipedia editors with probability 0.5, and by creating a totally new article with probability 0.069.
* When the action is a talk post, the user to communicate with is chosen at random from the overall set of users with probability 0.71, and from among those who have engaged in a common activity with the remaining probability 0.29.
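
To make the estimates above concrete, here is a minimal Python sketch that draws one action according to those probabilities. It only reproduces the top-level choices as summarized in this review; the paper’s full model may contain details not captured here, and the names `sample_action`, `P_TALK`, `ARTICLE_SOURCES` and `P_TALK_RANDOM` are made up for this illustration.

```python
import random

# Maximum-likelihood estimates quoted in the list above.
P_TALK = 0.058            # probability that an action is a talk post
ARTICLE_SOURCES = {       # where an edited article is drawn from
    "own interests": 0.35,
    "neighbor's interests": 0.081,
    "overall interests": 0.5,
    "new article": 0.069,
}
P_TALK_RANDOM = 0.71      # talk partner drawn from all users


def sample_action(rng):
    """Draw one action: ("talk", partner_kind) or ("edit", article_source)."""
    if rng.random() < P_TALK:
        if rng.random() < P_TALK_RANDOM:
            return ("talk", "random user")
        return ("talk", "user with common activity")
    sources = list(ARTICLE_SOURCES)
    weights = list(ARTICLE_SOURCES.values())
    return ("edit", rng.choices(sources, weights=weights, k=1)[0])


if __name__ == "__main__":
    rng = random.Random(0)
    actions = [sample_action(rng) for _ in range(10_000)]
    talk_share = sum(kind == "talk" for kind, _ in actions) / len(actions)
    print(f"share of talk actions: {talk_share:.3f}")
```

Note that the four article-source probabilities sum to 1 (0.35 + 0.081 + 0.5 + 0.069), so they form a complete distribution over edit actions.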

They also do some content analysis on 30 instances of two users meeting for the first time. They examined the content of the initial communication and any reply, looking for references to specific articles or other artifacts in Wikipedia, and they compared the edit histories of the two users.
Of the 30 messages, 26 referenced a specific article, image, or topic. In 21 cases, the users had both recently worked on the artifact that was the subject of conversation.
The gap between co-activity and communication was usually short, often less than a day, though it stretched back three months in one case.
Informally, communications tended to fall into a few broad categories: offering thanks and praise, making requests for help, or trying to understand the editing behavior of the other person.
This sample of interactions suggests that people most often come to talk to each other in Wikipedia when they become aware of the other person through recent shared activity around an artifact. Awareness then leads to communication, and often coordination.

A really wonderful paper!