Discourse-Oriented Statistical Machine Translation

A project funded by the Swedish Research Council (Vetenskapsrådet 2012-916).

Purpose and Aims

Automatic translation of human language (machine translation) has matured significantly in recent years. However, today’s technology still has severe and obvious problems, which originate from the inability of current models to adapt to specific topics and domains and to produce coherent text. The aim of this project is to tackle these issues by integrating discourse-wide information into modern translation engines.

One of the main problems of state-of-the-art machine translation (MT) is its focus on local information. MT systems typically disregard any information outside the individual sentence being translated. Such a system offers no way to enforce textual coherence, nor can it detect topic shifts or other discourse-related events. Sentences are translated one by one in isolation, without making use of cross-sentential information, so each sentence receives exactly the same translation regardless of the order in which the sentences appear. The objective of our research is to develop novel models in the framework of statistical machine translation that change this situation in a principled way.
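
The order-independence described above can be illustrated with a minimal sketch. This is not an actual statistical MT system: the word-for-word lexicon, the function names and the two example sentences are all hypothetical and serve only to show what it means for a translator to see one sentence at a time.

```python
# Hypothetical toy lexicon (English -> German); an assumption for illustration only.
TOY_LEXICON = {
    "the": "das", "house": "Haus", "is": "ist",
    "old": "alt", "it": "es", "small": "klein",
}

def translate_sentence(sentence):
    """Translate one sentence using only the words inside that sentence."""
    return " ".join(TOY_LEXICON.get(word, word) for word in sentence.split())

def translate_document(sentences):
    """Translate a document sentence by sentence, with no shared context."""
    return [translate_sentence(s) for s in sentences]

document = ["the house is old", "it is small"]
reordered = list(reversed(document))

# Map each source sentence to its translation for both orderings.
in_order = dict(zip(document, translate_document(document)))
after_reordering = dict(zip(reordered, translate_document(reordered)))

# Every sentence receives the same translation in both orderings.
print(in_order == after_reordering)
```

In this sketch the pronoun "it" is always rendered the same way, because the translator never sees the preceding sentence that contains its antecedent. A discourse-oriented model would need access to exactly that kind of cross-sentential information.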

Project Goals

In this project, we propose to develop novel models for discourse-oriented machine translation that lead to more natural translations. First of all, we would like to introduce models of textual cohesion in the target language, focusing on the fluency and coherence of the generated text. Secondly, we would like to include cross-sentential information from the source-language input to improve the adequacy of the translated texts. The latter will enable models to adjust dynamically to various topics and domains, creating topic-aware machine translation systems that derive appropriate information from discourse, much as human translators would.


Resources and Tools