CHRF ++: Words helping character n-grams

Maja Popovic

Conference Proceedings · Open Access

WMT 2017 - 2nd Conference on Machine Translation, Proceedings (2017) 612-618

DOI: 10.18653/v1/w17-4770

Abstract

Character n-gram F-score (CHRF) is shown to correlate very well with human relative rankings of different machine translation outputs, especially for morphologically rich target languages. However, its relation with direct human assessments is not yet clear. In this work, Pearson's correlation coefficients for direct assessments are investigated for two currently available target languages, English and Russian. First, different β parameters (in range from 1 to 3) are re-investigated with direct assessment, and it is confirmed that β = 2 is the optimal option. Then separate character and word n-grams are investigated, and the main finding is that, apart from character n-grams, word 1-grams and 2-grams also correlate rather well with direct assessments. Further experiments show that adding word unigrams and bigrams to the standard CHRF score improves the correlations with direct assessments, though it is still not clear which option is better, unigrams only (CHRF+) or unigrams and bigrams (CHRF++). This should be investigated in future work on more target languages.
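To make the scoring scheme in the abstract concrete, here is a minimal Python sketch of a CHRF-style score with optional word n-grams. It is an illustration under stated assumptions, not the author's reference implementation: the function names (`ngrams`, `chrf_sketch`), the character n-gram limit of 6, whitespace tokenisation, the removal of spaces before character n-gram extraction, and the uniform averaging of precision and recall over all n-gram orders are choices made for this example. The F-score combines averaged precision P and recall R as (1 + β²)·P·R / (β²·P + R), so β = 2 weights recall more heavily than precision.

```python
from collections import Counter


def ngrams(seq, n):
    """Count all n-grams of order n in a sequence (a string or a list of words)."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def chrf_sketch(hypothesis, reference, char_order=6, word_order=2, beta=2.0):
    """Illustrative CHRF-style score for one segment.

    word_order=0 corresponds to plain CHRF, 1 to CHRF+, 2 to CHRF++.
    Assumptions (not taken from the abstract): char_order=6, whitespace
    tokenisation, and uniform averaging over all n-gram orders.
    """
    # Assumption: spaces are dropped before extracting character n-grams.
    hyp_chars = hypothesis.replace(" ", "")
    ref_chars = reference.replace(" ", "")
    hyp_words = hypothesis.split()
    ref_words = reference.split()

    precisions, recalls = [], []
    streams = [
        (hyp_chars, ref_chars, char_order),
        (hyp_words, ref_words, word_order),
    ]
    for hyp, ref, max_n in streams:
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
            if not hyp_counts or not ref_counts:
                continue  # this order is longer than one of the segments
            # Clipped matches: each n-gram counts at most as often as in the reference.
            matches = sum((hyp_counts & ref_counts).values())
            precisions.append(matches / sum(hyp_counts.values()))
            recalls.append(matches / sum(ref_counts.values()))

    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    hyp = "the cat sat on a mat"
    ref = "the cat sat on the mat"
    for label, order in (("CHRF", 0), ("CHRF+", 1), ("CHRF++", 2)):
        print(label, round(chrf_sketch(hyp, ref, word_order=order), 4))
```

Setting `word_order` to 0, 1, or 2 switches between the three variants compared in the abstract. For real evaluations an established implementation is preferable to this sketch, since details such as whitespace handling and averaging can differ between tools.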

Cite

Popovic, M. (2017). CHRF ++: Words helping character n-grams. In WMT 2017 - 2nd Conference on Machine Translation, Proceedings (pp. 612–618). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-4770
