Trust in the algorithm or in the human social process? Google, Wikipedia and Points of View.

A very interesting interview with the director of Google News at NiemanLab.
Krishna Bharat ponders POVs (points of view).
“many perspectives coming together can be much more educational than singular points of view”. Ok, I agree.
“You really want the most articulate and passionate people arguing both sides of the equation.” Ok.
“Then, technology can step in to smooth out the edges and locate consensus.” Technology stepping in starts to sound less agreeable. To do what? To tell me the truth? To tell me what the most consensual representation of the facts is?
“That is the opportunity that having an objective, algorithmic intermediary provides you”.
This is the point that I really don’t like. Should we rely on algorithmic objectivity to form our view of world facts? Interestingly, this is how Google was “casting” its algorithm for many years: “PageRank relies on the uniquely democratic nature of the web” or “be based on impartial and objective relevance criteria.”
The interview goes on with “If you trust the algorithm to do a fair job and really share these viewpoints, then you can allow these viewpoints to be quite biased if they want to be.” and “Trusting in the algorithm means trusting in the tacit completeness of the automation it offers to readers.”
Now, I think it is a bit scary that a corporation asks you to trust the objective, algorithmic intermediary they provide to you (with the goal of making money, which is of course totally acceptable per se).

Actually I agree with Ken Thompson, who in “Reflections on Trusting Trust” (Communications of the ACM, Vol. 27, No. 8, August 1984, pp. 761-763) claimed “You can’t trust code that you did not totally create yourself.” (It is also very pertinent that in the paper the very next sentence is “Especially code from companies that employ people like me”).

As a last point, I would like to say that I prefer to trust the transparent social process that happens, for example, on Wikipedia. On pages such as “Climate Change” hundreds of different editors participate and, even if Wikipedia policy asks them to write from a Neutral Point of View, it is undeniable that many of them have strong POVs. This is very visible on controversial pages such as the one on the Israeli-Palestinian conflict.
What I prefer about Wikipedia, over the objective, algorithmic intermediary provided by Google, is the fact that the process is carried out by humans (this is not completely true, since there are many automatic bots on Wikipedia, but currently they mainly perform maintenance tasks) and, more importantly, the fact that you can analyze the complete history of edits (and who made them) that brought each article to its current state. Moreover, if you don’t agree with the current framing of a concept, you can get involved and contribute your POV by editing the page or discussing it on the related talk page.
Let me highlight also how the FAQ about Neutral Point of View on Wikipedia clearly states that “the NPOV policy says nothing about objectivity. In particular, the policy does not say that there is such a thing as objectivity in a philosophical sense—a “view from nowhere” (to use Thomas Nagel’s phrase), such that articles written from that viewpoint are consequently objectively true.”
Let me conclude with the Italian poet Giacomo Leopardi, who in “La ginestra” (Wild Broom) lamented “le magnifiche sorti e progressive” (the “magnificent and progressive fate”) of the human race. I think we should all do that a bit more than we currently do, instead of embracing algorithmic objectivity.

Image: Giacomo Leopardi from Wikipedia (in the public domain)

Wikipedia mentioned in books in 1975

UPDATE: Dami, in a comment to this post, says “if a word appears in a newer edition of an older work (e.g. in the introduction section of cheap reprints of public domain books) Google will count it as an appearance at the time the original work was published.” I checked and this is true, thanks Dami!

I was playing with the Google Books Ngram Viewer, which lets you check how frequently certain phrases occurred in books published from 1950 to 2008.
Curiously, the following graph reports that some books (only 0.0000011%, but greater than zero anyway!) already contained the word “wikipedia” (and “wiki”) in 1950 and in 1975. Maybe there is a small bug even in the mighty Google services?

The following graph instead shows the increase (as expected) in mentions of “wikipedia” and “wiki” in books since 2003.

Google helps Wikipedia helping the world … maybe.

In 2008, Google opened a project competing with Wikipedia: Knol. By January 2009 the project had grown to 100,000 articles, something that is hard to call a success.
Since then it seems the attitude of Google towards Wikipedia has changed a bit, more like “Ok, you (Wikipedia) can become the de facto monopolist in the user-generated creation of knowledge, we have other and more challenging competitors to defeat now, we will incorporate you later on down the way”.
Two examples of this new attitude (in my view, of course) are the Kiswahili Wikipedia Challenge and the Health Speaks Wikipedia pilot project.

The Kiswahili Wikipedia Challenge was launched by Google in November 2009. The task was to translate English Wikipedia articles into Kiswahili or to write Wikipedia articles from scratch. Participants received prizes such as laptops, mobile phones, prepaid internet access modems and Google T-shirts. Google’s stated goal: “We hope to make the online experience richer and more relevant for 100 million African users who speak Kiswahili.”

The results might not be that great. The Wikipedia Signpost of 2010-07-26 quotes from the blog post “What happened on the Google Challenge @ the Swahili Wikipedia”:

Nearly all of them are gone now and left a lot of articles which often are not really state of the art formally and also linguistically … they don’t care because they were there for laptops and other prizes (no need to be rude, but it hurts me pretty bad).

An article in the New York Times is similarly unenthusiastic. The last paragraphs of the article comment on Google-generated content in Wikipedias in the languages of India.

However, the surge in content created by Google’s project to improve these sites still needs work, according some local site administrators. For example the Wikipedia in Tamil – one of the underrepresented South Asian languages – the entries covered “too many American pop stars and Hindi movies, which Tamils may not need as a priority.” There was also sloppiness in language and coding.

Despite these concerns, Tamil Wikipedia plans on working with Google to continue the additions. The Bengali Wikipedia, however, took greater umbrage and simply deleted the Google-generated content. The Bengali Wikipedians explained that the material simply did not meet their standards.

The Health Speaks Wikipedia pilot project was announced yesterday and is focused on increasing the quantity and quality of online health information in languages spoken in developing countries. They started a pilot project to support community-based, crowd-sourced translation of health information from English Wikipedias into Arabic, Hindi and Swahili Wikipedias.
They have chosen hundreds of good quality English language health articles from Wikipedia that they hope will be translated with the assistance of Google Translator Toolkit, made locally relevant, reviewed and then published to the corresponding local language Wikipedia site. They have also funded the professional translation of a small subset of these articles. And they are additionally providing a donation incentive to encourage community translators to participate. For the first 60 days, they will donate 3 cents (US) for each English word translated to the Children’s Cancer Hospital Egypt 57357, the Public Health Foundation of India and the African Medical and Research Foundation (AMREF) for the pilots in Arabic, Hindi and Swahili, respectively, up to $50,000 each. This means that community translators will help their friends and neighbors access quality health information in a local language, while also supporting a local non-profit organization working in health or health education.

“Send your data, we will discover the hidden patterns” or Google Machine Learning Prediction API

Wow! The Google Prediction API enables access to Google’s machine learning algorithms to analyze your historic data and predict likely future outcomes. Upload your data to Google Storage for Developers, then use the Prediction API to make real-time decisions in your applications. The Prediction API implements supervised learning algorithms as a RESTful web service to let you leverage patterns in your data, providing more relevant information to your users. Run your predictions on Google’s infrastructure and scale effortlessly as your data grows in size and complexity.
* Language identification
* Customer sentiment analysis
* Product recommendations & upsell opportunities
* Message routing decisions
* Diagnostics
* Document and email classification
* Suspicious activity identification
* Churn analysis
* And many more…
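To make the idea of "supervised learning on your historic data" concrete, here is a minimal sketch of language identification, the first use in the list above. It is not the Prediction API and does not call it: it is a tiny character-bigram classifier using only the Python standard library. The training sentences, the function names and the choice of bigram features are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def features(text):
    """Split text into lowercase character bigrams (a deliberately simple feature set)."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


def train(examples):
    """Count bigrams per label from (text, label) pairs: the 'historic data'."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(features(text))
    return counts


def predict(counts, text):
    """Return the label whose smoothed bigram log-likelihood is highest.

    Class priors are ignored, and add-one smoothing (with one extra slot
    for bigrams never seen in training) avoids taking log(0).
    """
    vocabulary = {gram for c in counts.values() for gram in c}
    best_label, best_score = None, -math.inf
    for label, c in counts.items():
        total = sum(c.values()) + len(vocabulary) + 1
        score = sum(math.log((c[gram] + 1) / total) for gram in features(text))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for illustration only.
TRAINING = [
    ("the quick brown fox jumps over the lazy dog", "English"),
    ("this is a short sentence written in english", "English"),
    ("la volpe veloce salta sopra il cane pigro", "Italian"),
    ("questa frase breve scritta in italiano", "Italian"),
]

model = train(TRAINING)
print(predict(model, "where is the dog"))
print(predict(model, "dove sta il cane"))
```

A hosted service would presumably train on far more data and with stronger models, but the shape is the same: labelled examples go in, and a function that labels new inputs comes out.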

Percentage of pie charts which resemble Pac-Man (as a Google pie chart)

The URL for generating the following pie chart on the fly via Google charts, containing all the needed parameters (that’s why it is so long), is
http://chart.apis.google.com/chart?chxt=x,y&cht=p&chco=FAFAFA,FFFF00,FAFAFA&chs=600x300&chtt=Percentage%20of%20Google%20Chart%20Which%20Resembles%20Pac-man%20Chart%20title&chd=t:10,80,10&chl=Does%20not%20resemble%20Pac-man|Resembles%20Pac-man which produces

From mattcutts.
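Since the whole chart lives in the query string, a URL like the one above can also be assembled programmatically. The following is a rough sketch using only the Python standard library. The parameter names (cht, chs, chco, chtt, chd, chl) and the values are taken from the URL in this post, while the helper name pie_chart_url and its signature are made up for illustration. The sketch only builds the string; it assumes nothing about whether the chart endpoint still answers.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://chart.apis.google.com/chart"


def pie_chart_url(title, labels, values, colors, size=(600, 300)):
    """Build a pie chart URL; every chart setting travels as a query parameter."""
    params = {
        "cht": "p",                                       # chart type: pie
        "chs": f"{size[0]}x{size[1]}",                    # width x height in pixels
        "chco": ",".join(colors),                         # slice colors as hex RGB
        "chtt": title,                                    # chart title
        "chd": "t:" + ",".join(str(v) for v in values),   # data in text format
        "chl": "|".join(labels),                          # slice labels
    }
    # quote (not quote_plus) encodes spaces as %20; ',', ':' and '|' stay readable.
    return BASE_URL + "?" + urlencode(params, quote_via=quote, safe=",:|")


url = pie_chart_url(
    title="Percentage of Google Chart Which Resembles Pac-man",
    labels=["Does not resemble Pac-man", "Resembles Pac-man"],
    values=[10, 80, 10],
    colors=["FAFAFA", "FFFF00", "FAFAFA"],
)
print(url)
```

The three values with only two labels mirror the original URL: the two pale 10% slices form the mouth, and the yellow 80% slice is Pac-Man.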

Google on China: “we are no longer willing to continue censoring our results on Google.cn”

We have decided we are no longer willing to continue censoring our results on Google.cn, and so over the next few weeks we will be discussing with the Chinese government the basis on which we could operate an unfiltered search engine within the law, if at all. We recognize that this may well mean having to shut down Google.cn, and potentially our offices in China.

Read more on Google blog.
Everything seems to have originated from cyber attacks trying to access the Gmail accounts of Chinese human rights activists.
I cannot read the real message between the lines, or the real reason, but this seems like big news.

Google and Virgin to conquer Mars … open-sourcing it!

UPDATE: thanks to the comment by Vincenzo, I now know this was an April Fools’ joke! Thanks Vincenzo! The application form with its strange questions should have made me realize that! Example:
# I am a world-class expert in
medicine and first aid
Guitar Hero II

Project Virgle, the first permanent human colony on Mars, by Google and Virgin.
The vision is heavily based on Open Source and Crowdsourcing. A clever move, both from a PR perspective and from a practical perspective!

It comprises three equal partners: Google, Virgin and a diffuse network of talented individuals who want to participate in our mission.

Who do I see as the perfect leader for this project? Yochai Benkler, fabulous author of the book “The Wealth of Networks: How Social Production Transforms Markets and Freedom” and of “Sharing Nicely: On shareable goods and the emergence of sharing as a modality of economic production”, the most inspiring paper I have ever read.

More from http://www.google.com/virgle/opensource.html:

A post-post-industrial economy
What does “open source” mean in the context of a distant, planet-wide, century-long enterprise? Today’s industrialized (and post-industrialized) (and, one imagines, post-post- industrialized) economies are sustained not so much by physical wealth as by advanced systems of shared knowledge whose marginal productivity grows as more is accumulated. “Shared,” however, doesn’t mean valueless; we see Virgle as a decidedly for-profit venture that will develop most efficiently via decentralized models of effort, authority and reward. If the first economic revolution was agricultural, the second industrial and the third digital, the fourth will be Open Source — the birthing of a planetary civilization whose development is driven by the unbound human imagination.

We want to engage, one might say, the Long Tail of human creativity. Instead of 5,000 people working 12 hours a day six days a week in exchange for a full salary and benefits, imagine 5 million people working a few hours a week in exchange for contribution-based equity in the form of shares in Virgle Inc and ownership of the land of which the colony will ultimately take some form of possession.

You weren’t thinking real estate? Start. Virgle’s costs will be considerable — we’re planning on up-front investments of $10-15 billion in the first two decades –- but so too will the colony’s long-term earnings. Whatever one’s interpretation of the Outer Space Treaty, for instance, it seems clear that the initial explorers and developers will be able to claim ownership of some significant portion of 143 million square kilometers of Martian real estate, which, sold (or traded as open-source sweat equity) at an average value of $10 per acre, would be worth a cool $358 billion. Multiply that by 100x for its post-terraforming value and you get a figure of $36 trillion. Clearly, whatever model of real estate distribution our emerging society adopts, its worth will exceed the investments likely to be required to unlock that value.

Our civilization’s most valuable export, meanwhile, will be intellectual property. The problems our Pioneers solve in the course of their world-building enterprise will represent an engine of invention in dozens of lucrative areas, from biotechnology to geology, physics to agriculture. We see the community’s system of intellectual property development evolving from a community open source model to commercial open source (or perhaps we mean that the other way around?). We can imagine that commons-based peer production model — in which the creative energy of large numbers of people is coordinated into large, meaningful projects, mostly without traditional hierarchical organization or direct financial compensation — extending to almost every imaginable aspect of Martian life.
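Just for fun, the real-estate arithmetic in the quoted text can be checked in a few lines. This is a back-of-the-envelope sketch: the area (143 million square kilometers), the price ($10 per acre) and the 100x multiplier come from the quote, while the conversion factor of about 247.105 acres per square kilometer and the function name are mine. With these inputs the total comes out at roughly $353 billion, a little below the quoted $358 billion; the gap presumably comes from rounding in the quoted inputs.

```python
ACRES_PER_KM2 = 247.105  # 1 square kilometer is about 247.105 acres


def land_value_usd(area_km2, usd_per_acre):
    """Value of a land area given in square kilometers at a flat price per acre."""
    return area_km2 * ACRES_PER_KM2 * usd_per_acre


base = land_value_usd(143e6, 10)   # 143 million km^2 at $10 per acre
post_terraforming = base * 100     # the quoted 100x multiplier

print(f"${base / 1e9:.0f} billion")
print(f"${post_terraforming / 1e12:.1f} trillion")
```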

One identity (system) to rule them all

There is lots of competition and activity around becoming the de facto identity system for the future Web.

Facebook pushes Facebook connect

Google pushes Google Friend Connect

While MySpace Embraces DataPortability, Partners With Yahoo, Ebay And Twitter.

One Ring to rule them all, One Ring to find them, One Ring to bring them all and in the darkness bind them.

Google Opens School of Personal Growth

“Google wants to help Googlers grow as human beings on all levels. Emotional, mental, physical and ‘beyond the self’… (This) is why Google University instituted the School of Personal Growth, perhaps the first of its kind in a large corporation. We don’t just pamper Googlers, we want to help them fulfill their full human potential.”

With classes available entitled “The Neuroscience of Empathy” and “Search Inside Yourself,” Broecker said the end goal is to help Googlers be more creative by helping them be more relaxed and open to new ideas.

I found it while listening to one of the AudioDharma podcasts by Marc Lesser, who teaches a course named “Search Inside Yourself” at Google. Quite a title! And I’m dreaming of setting up something like the School of Personal Growth in the research institute where I work.

From webpronews.

Religious symbols in Unicode characters

I was testing a chat system we’re creating with Extjs (amazing Javascript framework!) and I wanted to test issues with “strange” characters. So I quickly found the page of special characters on Italian Wikipedia and I was very surprised to see that there are religious (and political) symbols among the standard Unicode characters. What you see in the following are normal chars you can copy and paste, just like a normal “a”. I think the classification under “religious As bru was saying in chat: “wow, lots of crosses” … ;)

☥ ☦ ☧ ☨ ☩ ☪ ☫ ☬ ☭ ☮ ☯ ♰ ♱ ✝ ✞ ✟
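Each of these symbols is an ordinary code point with an official name, which can be looked up with Python's standard unicodedata module. The sketch below lists the sixteen characters shown above by code point and name. The grouping into three ranges is my reading of the list, and the helper name describe is made up; escapes are used instead of literal symbols so the snippet does not depend on file encoding.

```python
import unicodedata

# Code points of the symbols listed above: U+2625..U+262F, U+2670..U+2671, U+271D..U+271F.
CODE_POINTS = list(range(0x2625, 0x2630)) + [0x2670, 0x2671] + [0x271D, 0x271E, 0x271F]
SYMBOLS = "".join(chr(cp) for cp in CODE_POINTS)


def describe(symbols):
    """Return (code point, official Unicode name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in symbols]


table = describe(SYMBOLS)
for code_point, name in table:
    print(code_point, name)

# How many of the names mention a cross, as bru noticed in chat.
crosses = [name for _, name in table if "CROSS" in name]
print(len(crosses), "of", len(table), "names contain CROSS")
```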

A search on Google for “☭” did not return any results. A search for the swastika symbol instead returns results, and it was actually once one of the hot terms in Google Trends. I wonder which Unicode characters Google includes and excludes.

By the way, what is the strangest Unicode character you are able to find (for some definition of “strange”)?