Half of EU citizens are not able to hold a conversation in a
language other than their mother tongue, let alone conduct a
negotiation or interpret a law. In a time of wide availability of
communication technologies, language barriers are a serious bottleneck
to European integration and to economic and cultural exchanges in
general. More effective tools to overcome such barriers, in the form
of software for machine translation and other cross-lingual textual
information access tasks, are in strong demand.
Statistical methods are promising, in that they achieve performance
equivalent or superior to that of rule-based systems, at a fraction
of the development effort. These methods, however, have some known
shortcomings that prevent their broad adoption. As an
example, even though lexical choice is usually more accurate with
Statistical Machine Translation (SMT) systems than with their
rule-based counterparts, the text they produce tends to be less
grammatical. As a second example, SMT systems are trained in batch
mode and do not adapt by taking user feedback into account. Finally,
in Cross-Language Information Retrieval tasks, query words are most
often translated independently of one another, thus discarding possibly
relevant contextual clues.
In SMART, we strove to address these and other shortcomings with the
methods of modern Statistical Learning. The scientific focus was on
developing new and more effective statistical approaches while
ensuring that existing know-how was duly taken into account.
Over the three years of the project, the consortium produced
two innovative SMT systems (both distributed as open-source
projects), a major extension to an existing SMT system (PORTAGE
adaptive), new discriminative language and distortion models, new
cross-language lexicon adaptation methods, an extensive analysis of an
SMT system as a learning agent, and an effective method for
automatically predicting translation quality, accurate enough to be
deployed right at the end of the project.
Solutions were tested in thorough field evaluations in two user
scenarios. The first scenario focused on the work of professional
translators and aimed at validating new technologies by assessing
their impact on productivity. The second scenario consisted of
enabling users to access portions of the multilingual Wikipedia in
languages of which they have limited command. A demo related to this
scenario, integrating a number of technologies developed in the
project, is also available.
More than four and a half years separate the initial ideas around
SMART and its conclusion. As in most endeavours, not everything went as
planned, and the world around us evolved at a fast pace, requiring us
to adapt. Many amazing developments happened in our research
communities at the same time as this project was running, and we
learned a great deal from others. We like to believe that we delivered on
our promise of showing new valuable directions on fascinating,
relevant and challenging problems. We will certainly keep exploring
them in the future, and hope others will join.