The Institute of the Estonian Language and the Estonian Ministry of Education and Research have initiated the largest language technology development project to date. As a result, the University of Tartu Institute of Computer Science and the Baltic language technology and localisation company Tilde have started working on a reliable open-source machine translation platform for three language pairs: Estonian-English-Estonian, Estonian-Russian-Estonian and Estonian-German-Estonian.
In the course of the project, a high-quality machine translation platform will be developed for general web and news texts, crisis information, legal and military texts, to be used by public institutions, translation agencies, freelance translators as well as developers from companies in the fields of IT and artificial intelligence.
According to Mark Fišel, Professor of Natural Language Processing at the University of Tartu, the project aims to do more than just develop the existing machine translation engines further. Project partners will test new cutting-edge approaches to advance the state of machine translation in Estonia and make it practically useful to a wide variety of end-users.
“We will try different approaches to domain-specific machine translation, test brand new methods such as modular transformer-type neural networks, integrate neural machine translation with dictionaries and lexicons as well as develop solutions for both text and speech translation,” explained Fišel. He also emphasised that the training of the new models would not be possible without the High Performance Computing Centre of the University of Tartu, which supports the university as well as the language technologists of Tilde throughout the project.
Pekka Myllylä, Managing Director of Tilde Eesti, says that the quality of Estonian machine translation has made a breakthrough and that machine translation can be used for real-life translation tasks. “Machine translation has an important role in the Estonian digital development by removing language barriers and facilitating multilingual information exchange,” said Myllylä. According to him, the best results can be achieved by consistently teaching the terminology and style of a particular text domain to the translation engine.
“Representing the contracting authority, I am glad that the winning consortium comprises the two leading centres in the field of machine translation in Estonia – the University of Tartu and Tilde,” said Arvi Tavast, Director of the Institute of the Estonian Language, expressing hope that this provides assurance not only for this year’s project but also for the continuous development of Estonian machine translation in the future.
The project is funded via the Estonian public procurement “Development of public sector machine translation technology”. The total budget is 600,000 euros and the project ends in December 2021. The responsible funding authority is the Institute of the Estonian Language and the project has been initiated and supported by the Ministry of Education and Research.