Abstract
Character n-gram F-score (CHRF) has been shown to correlate very well with human rankings of different machine translation outputs, especially for morphologically rich target languages. However, only two versions have been explored so far: CHRF1 (standard F-score, β = 1) and CHRF3 (β = 3), both with uniform n-gram weights. In this work, we investigated CHRF in more detail, namely β parameters ranging from 1/6 to 6, and found that CHRF2 is the most promising version. We then investigated different n-gram weights for CHRF2 and found that uniform weights are the best option. In addition, CHRF scores were systematically compared with WORDF scores, and a preliminary experiment carried out on a small amount of data with direct human scores indicates that the main advantage of CHRF is that it does not penalise acceptable variations in high-quality translations too harshly.
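To make the roles of β and the n-gram weights concrete, here is a minimal sketch of an n-gram F-score in Python. It is not the author's released tool, and the names `ngram_f_score`, `chrf` and `wordf` are hypothetical. It assumes that whitespace is dropped before extracting character n-grams, that character n-grams go up to order 6 and word n-grams up to order 4, and that per-order precision and recall are combined by a weighted arithmetic average (uniform by default) before being merged into F-β. With β > 1, recall counts more than precision, so β = 2 corresponds to CHRF2.

```python
from collections import Counter
from typing import Optional, Sequence


def ngram_counts(units: Sequence[str], n: int) -> Counter:
    """Count the n-grams of a sequence of units (characters or words)."""
    return Counter(tuple(units[i:i + n]) for i in range(len(units) - n + 1))


def ngram_f_score(hyp_units: Sequence[str], ref_units: Sequence[str],
                  beta: float = 2.0, max_n: int = 6,
                  weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average of per-order n-gram precision and recall, combined as F-beta."""
    if weights is None:
        weights = [1.0] * max_n  # uniform n-gram weights (normalised below)
    if len(weights) != max_n:
        raise ValueError("need one weight per n-gram order")
    total_w = sum(weights)
    precision = recall = 0.0
    for n, w in zip(range(1, max_n + 1), weights):
        hyp, ref = ngram_counts(hyp_units, n), ngram_counts(ref_units, n)
        matches = sum((hyp & ref).values())  # clipped n-gram matches
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        precision += w * (matches / hyp_total if hyp_total else 0.0)
        recall += w * (matches / ref_total if ref_total else 0.0)
    precision /= total_w
    recall /= total_w
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2  # beta > 1 gives recall more weight than precision
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def chrf(hypothesis: str, reference: str, beta: float = 2.0, max_n: int = 6,
         weights: Optional[Sequence[float]] = None) -> float:
    # Assumption: whitespace is removed before extracting character n-grams.
    hyp_chars = list("".join(hypothesis.split()))
    ref_chars = list("".join(reference.split()))
    return ngram_f_score(hyp_chars, ref_chars, beta, max_n, weights)


def wordf(hypothesis: str, reference: str, beta: float = 2.0, max_n: int = 4,
          weights: Optional[Sequence[float]] = None) -> float:
    # Same score over whitespace-separated words (max order 4 is an assumption).
    return ngram_f_score(hypothesis.split(), reference.split(), beta, max_n, weights)
```

For a hypothesis that is a correct but shortened version of the reference (high precision, lower recall), `chrf(hyp, ref, beta=1)` will be higher than `chrf(hyp, ref, beta=3)`, because a larger β pushes the score towards recall. Passing a non-uniform `weights` list is one way to experiment with the n-gram weighting discussed above.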
Citation
Popović, M. (2016). CHRF deconstructed: β parameters and n-gram weights. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 2, pp. 499–504). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-2341