Character n-gram F-score (CHRF) has been shown to correlate very well with human relative rankings of different machine translation outputs, especially for morphologically rich target languages. However, its relation to direct human assessments is not yet clear. In this work, Pearson correlation coefficients with direct assessments are investigated for the two currently available target languages, English and Russian. First, different β parameters (in the range from 1 to 3) are re-investigated with direct assessment, confirming that β = 2 is the optimal option. Then character and word n-grams are investigated separately, and the main finding is that, apart from character n-grams, word 1-grams and 2-grams also correlate rather well with direct assessments. Further experiments show that adding word unigrams and bigrams to the standard CHRF score improves the correlations with direct assessments, although it is still not clear which option is better: unigrams only (CHRF+) or unigrams and bigrams (CHRF++). This should be investigated in future work on more target languages.
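The scores discussed above combine n-gram precision P and recall R into an F-score, F_β = (1 + β²)·P·R / (β²·P + R), so β = 2 weights recall more heavily than precision. The following is a minimal sentence-level sketch, not the paper's reference implementation. It assumes the following illustrative defaults and helper names, none of which are stated in the abstract: character n-grams up to order 6 with whitespace removed, words split on whitespace with no punctuation handling, a single reference, and precision and recall averaged arithmetically over all n-gram orders. Setting `word_order` to 0, 1 or 2 gives CHRF, CHRF+ or CHRF++ respectively. Established implementations (for example sacrebleu's CHRF) may differ in such details.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Character n-grams of `text` with whitespace removed (assumed convention)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def word_ngrams(text: str, n: int) -> Counter:
    """Word n-grams from a plain whitespace split (no punctuation handling)."""
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def precision_recall(hyp: Counter, ref: Counter) -> tuple[float, float]:
    """Clipped n-gram overlap turned into precision and recall for one order."""
    overlap = sum((hyp & ref).values())  # & keeps the minimum count per n-gram
    precision = overlap / sum(hyp.values()) if hyp else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


def chrf(hypothesis: str, reference: str, char_order: int = 6,
         word_order: int = 0, beta: float = 2.0) -> float:
    """Hypothetical sentence-level CHRF (word_order=0), CHRF+ (1) or CHRF++ (2), 0-100."""
    stats = [precision_recall(char_ngrams(hypothesis, n), char_ngrams(reference, n))
             for n in range(1, char_order + 1)]
    stats += [precision_recall(word_ngrams(hypothesis, n), word_ngrams(reference, n))
              for n in range(1, word_order + 1)]
    # Assumed averaging: arithmetic mean of P and R over all character and word orders.
    avg_p = sum(p for p, _ in stats) / len(stats)
    avg_r = sum(r for _, r in stats) / len(stats)
    if avg_p + avg_r == 0:
        return 0.0
    b2 = beta ** 2
    # F-beta score: larger beta gives recall more weight than precision.
    return 100 * (1 + b2) * avg_p * avg_r / (b2 * avg_p + avg_r)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    print(chrf(ref, ref, word_order=2))            # identical output scores 100.0
    print(chrf("the cat sat", ref, word_order=2))  # truncated output: high P, low R
```

With a truncated hypothesis, precision stays high while recall drops. Raising β from 1 to 2 therefore lowers the score, which is the recall emphasis that β = 2 encodes.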
Popović, M. (2017). CHRF++: Words helping character n-grams. In WMT 2017 - 2nd Conference on Machine Translation, Proceedings (pp. 612–618). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-4770