Recently, news of Mozilla’s new translation programme Bergamot spread through international technology news portals. Few know that the high-level team also includes language technologists from the University of Tartu, who are helping to improve the flexibility and quality of machine translation. The head of the Tartu part of this work, Mark Fišel, professor of natural language processing at the Institute of Computer Science, describes the background of the collaboration.
English-language media report that the project (The Bergamot Project; see browser.mt) is building a machine translation programme for open-source web browsers such as Mozilla Firefox, and that its largest difference from, e.g., Google Translate is privacy. While most similar machine translation programmes are cloud-based, Bergamot must be downloaded to the computer, and no user data is collected during its use.
In addition to the University of Tartu and Mozilla, the Bergamot consortium also includes the University of Edinburgh, Charles University and the University of Sheffield.
Mark Fišel, what exactly is this project about?
It all began with language technologists from four universities wanting to do a European Commission-funded research project together on machine translation. One idea was to fit machine translation into a web browser. Thanks to a contact person at the University of Edinburgh, we asked Mozilla to be our partner, and in January 2019 the project kicked off. This is a research project, which means that most of our activity is exploratory: we are studying how we could change the best existing machine translation methods to make them even better.
What exactly is machine translation?
The principle of machine translation is easy to explain: a machine or a computer must automatically translate a text from one language to another. It is one of the oldest language processing tasks and has been actively worked on since the beginning of the 1950s. Despite this long history, ideal machine translation is yet to be developed. In practice, however, its quality is good enough to be useful. Machine-translated text is mainly used in post-editing, where the automatically translated text is manually corrected. For many topics, the average time needed for post-editing is less than what is required for translating from scratch.
What needs to be done to make the quality of machine translation better? What does your daily work entail?
Our main role in this project is to make machine translation engines flexible and adaptable to the content and style of the text. For example, in a text about nature, the machine should translate the Estonian word ‘aas’ as ‘meadow’, but in a text about knitting, it should translate ‘aas’ as ‘loop’. Likewise, when the English source text is formal, the Estonian translation should use the formal ‘teie’ rather than the informal ‘sina’. In the end, the programme should be able to make these decisions automatically.
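The ‘aas’ example can be illustrated with a deliberately small Python sketch. It is not the method used in Bergamot: the cue-word lists, the sense table and the function names are all hypothetical, and a real engine would learn such preferences from data rather than from a hand-written dictionary. The sketch only shows the idea of letting the surrounding text decide which translation of an ambiguous word to pick.

```python
from collections import Counter

# Hypothetical cue words per topic (Estonian base forms); illustration only.
DOMAIN_CUES = {
    "nature": {"mets", "lill", "rohi", "lind"},
    "knitting": {"lõng", "varras", "kuduma", "silmus"},
}

# Hypothetical sense table for the ambiguous Estonian word "aas".
SENSES = {"aas": {"nature": "meadow", "knitting": "loop"}}


def guess_domain(context, default="nature"):
    """Pick the topic whose cue words occur most often in the context."""
    words = [w.strip(".,;:!?").lower() for w in context.split()]
    scores = Counter()
    for domain, cues in DOMAIN_CUES.items():
        scores[domain] = sum(1 for w in words if w in cues)
    best, count = scores.most_common(1)[0]
    # Fall back to the default topic when no cue word was seen at all.
    return best if count > 0 else default


def translate_word(word, context):
    """Translate a word, using the context only if the word is ambiguous."""
    senses = SENSES.get(word)
    if senses is None:
        return word  # unknown word: left untranslated in this toy example
    return senses[guess_domain(context)]


print(translate_word("aas", "lõng ja varras"))
print(translate_word("aas", "mets, lind ja rohi"))
```

The same pattern would apply to the formal ‘teie’ versus informal ‘sina’ choice, with the style of the page playing the role of the topic.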
We are also participating in other stages of the project: for example, we are working on the automatic estimation of translation quality. Its purpose is to decide, after the translation has been generated, whether it was successful or not. This is necessary to warn the user of a low-quality translation.
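How such a warning might be wired up can be sketched in a few lines of Python. This is an assumption-laden illustration, not the project's implementation: it presumes that a quality estimation model has already produced a score between 0 and 1 for the translation, and the threshold of 0.5, the function names and the warning text are all hypothetical.

```python
def needs_warning(quality_score, threshold=0.5):
    """Return True when the estimated quality falls below the threshold."""
    return quality_score < threshold


def present(translation, quality_score, threshold=0.5):
    """Prefix the translation with a warning if its estimated quality is low."""
    if needs_warning(quality_score, threshold):
        return "[low-quality translation] " + translation
    return translation


# The scores here are made up; a real system would get them from a model.
print(present("The meadow is full of flowers.", 0.9))
print(present("The loop is full of flowers.", 0.2))
```

Where exactly to set the threshold is a design choice: a higher value warns more often and risks annoying the user, while a lower value lets more poor translations through unmarked.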
What is the final product if everything goes according to plan?
A large proportion of the project is research and experiments, but a working prototype will also be made. At the moment we plan to make the new technology available in the Firefox browser.
What is its main difference compared to Google’s current automatic translation?
The main difference from Google’s automatic translation and its machine translation plugin for Chrome is that Google Translate is cloud-based, which means that all text input is sent to Google’s servers for translation. Bergamot machine translation will run on the user’s own computer and not in the cloud, which ensures the privacy of the texts.
The second difference is that existing translation engines, including Google’s and the University of Tartu’s, translate single sentences without looking at the context. The contribution of the University of Tartu scientists should ensure that the translation engine adapts to the context and style of the entire web page and takes additional information into account to improve translation quality.
What is the ‘shift to client-side translation’ that has received a lot of attention in English media?
Our partners at Charles University in Prague are working on the so-called client-side translation. The idea is to make it possible to improve translation quality for users who are not fluent in the target language. In this case, the purpose of the machine translation system would be to identify parts of the input that are either too complicated or too ambiguous to be translated successfully, and to ask the user to rephrase them.
In conclusion, it may be said that the researchers at the University of Tartu Institute of Computer Science are working on applications that most readers of this article probably use regularly. It is important to note that all the results of this research, when finished, will be freely released under permissive licences. This project involves translations from English into Estonian, Polish, Czech, German, French and Spanish, and vice versa.
Written by Randel Kreitsberg, University of Tartu
The translation of this article from the Estonian Public Broadcasting science news portal Novaator was funded by the European Regional Development Fund through the Estonian Research Council.