Machine translation (MT) evaluation metrics based on n-gram co-occurrence statistics are cheap to compute, and their value in comparative research is well documented. However, their value as a standalone measure of MT output quality is questionable. Manual methods of MT evaluation, in contrast, are expensive. This paper presents early research being carried out within the Centre for Next Generation Localisation (CNGL) on a low-cost, operationalised means of acquiring MT evaluation data in a commercial post-edited MT (PEMT) context. We present an approach to MT evaluation that exposes translators to output from a set of candidate MT systems and reports which system requires the least post-editing. We hope that this approach, combined with instrumentation mechanisms for tracking the performance and behaviour of individual post-editors, will give insight into which MT system, if any, out of a set of candidate systems is most suitable for a particular large or ongoing technical translation project. In the longer term, we propose that post-editing data gathered in a commercial context may be valuable to MT researchers.
J. Moran, D. Lewis, Centre for Next Generation Localisation
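
The idea of reporting which candidate system requires the least post-editing can be illustrated with a minimal sketch. The abstract does not say how post-editing effort is measured, so the sketch assumes one plausible proxy: word-level edit distance between the raw MT output and its post-edited version, normalised by the length of the post-edited segment. The function names, the record layout, and the sample segments are all hypothetical, and the instrumentation of individual post-editors (for example timing data) is not modelled.

```python
from collections import defaultdict
from statistics import mean


def word_edit_distance(source_words, target_words):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(target_words) + 1))
    for i, src in enumerate(source_words, start=1):
        current = [i]
        for j, tgt in enumerate(target_words, start=1):
            cost = 0 if src == tgt else 1
            current.append(min(previous[j] + 1,          # delete a source word
                               current[j - 1] + 1,       # insert a target word
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def post_edit_effort(mt_output, post_edited):
    """Edits per post-edited word (assumed proxy for post-editing effort)."""
    mt_words = mt_output.split()
    pe_words = post_edited.split()
    if not pe_words:
        # Edge case: an empty post-edit means everything was deleted.
        return 0.0 if not mt_words else 1.0
    return word_edit_distance(mt_words, pe_words) / len(pe_words)


def rank_systems(records):
    """Rank systems by mean effort, lowest (least post-editing) first.

    `records` is an iterable of (system_name, mt_output, post_edited) tuples.
    Returns a list of (system_name, mean_effort) pairs.
    """
    efforts = defaultdict(list)
    for system, mt_output, post_edited in records:
        efforts[system].append(post_edit_effort(mt_output, post_edited))
    ranking = [(system, mean(values)) for system, values in efforts.items()]
    return sorted(ranking, key=lambda pair: pair[1])


# Toy, invented segments purely for illustration (not data from the paper).
records = [
    ("system_a", "the cat sat on mat", "the cat sat on the mat"),
    ("system_b", "cat the sit mat", "the cat sat on the mat"),
]

for system, effort in rank_systems(records):
    print(f"{system}: {effort:.3f} edits per post-edited word")
```

Normalising by segment length keeps long segments from dominating the average. A production setup would likely also weight by time spent or keystrokes, which this sketch leaves out.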