We describe MEANT 2.0, a new version of MEANT that participated in the metrics task of the Second Conference on Machine Translation (WMT 2017). MEANT 2.0 uses idf-weighted distributional n-gram accuracy to determine the phrasal similarity of semantic role fillers, and it yields better correlations with human judgments of translation quality than earlier versions. The improved phrasal similarity enables a variant of MEANT to accurately evaluate translation adequacy for any output language, even languages without an automatic semantic parser. Our results show that MEANT, which is a non-ensemble and untrained metric, consistently performs as well as the top participants in previous years (including ensemble and trained ones) across different output languages. We also present timing statistics for MEANT so that the evaluation cost can be estimated more accurately. MEANT 2.0 is open source and publicly available.
Lo, C. K. (2017). MEANT 2.0: Accurate semantic MT evaluation for any output language. In WMT 2017 - 2nd Conference on Machine Translation, Proceedings (pp. 589–597). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-4767