
Translating Wikipedia, With Some Help from Google

There are undeniably some good reasons to use machine translation, and to trust (or hope) that its output will be good enough to publish your content live. In the case of Wikipedia, I suppose you can accept the risk and the cost of suboptimal, non-professional translations in exchange for having the content available in your language.

That said, I must confess that I am still skeptical about the widespread use of machine translation for any product that must be not only translated superbly but also "localized", which is something only a person with a very deep knowledge of the language and of current trends in the target market can provide.


To help Wikipedia become more useful to speakers of smaller languages, the Google Translation team is working with volunteers, translators and Wikipedians across India, the Middle East and Africa to translate more than 16 million words for Wikipedia into Arabic, Gujarati, Hindi, Kannada, Swahili, Tamil and Telugu. We began these efforts in 2008, starting with translating Wikipedia articles into Hindi, a language spoken by tens of millions of Internet users. At that time the Hindi Wikipedia had only 3.4 million words across 21,000 articles, while the English Wikipedia, by contrast, had 1.3 billion words across 2.5 million articles.

The Google team selected the Wikipedia articles using several criteria. First, they used Google search data to determine the most popular English Wikipedia articles read in India. Next, using Google Trends, they identified the articles that were read consistently over time, not just temporarily popular. Finally, they used Translator Toolkit to translate those articles that either did not exist in Hindi Wikipedia or existed only as placeholder articles, or "stubs". In three months, using a combination of human and machine translation tools, they translated 600,000 words from more than 100 English Wikipedia articles, growing Hindi Wikipedia by almost 20 percent. They have since repeated this process for other languages, bringing the total number of words translated to 16 million.


Photo: zeni666 (Creative Commons Attribution-NonCommercial-ShareAlike; some rights reserved)

