Comparison and adaptation of automatic evaluation metrics for quality assessment of re-speaking

Abstract

Re-speaking is a mechanism for obtaining high-quality subtitles for live broadcasts and other public events. Because it relies on humans to perform the re-speaking, estimating the quality of the results is nontrivial. Most organizations rely on human effort for this quality assessment, but purely automatic methods have been developed for similar problems, such as machine translation. This paper compares several of these methods: BLEU, EBLEU, NIST, METEOR, METEOR-PL, TER, and RIBES. Their scores are then matched against the human-derived NER metric commonly used in re-speaking. The purpose of this paper is to assess whether the above automatic metrics, normally used for MT system evaluation, can be used in lieu of the manual NER metric to evaluate re-speaking transcripts.
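
To make the comparison concrete, the following is a minimal, hypothetical sketch (not the paper's actual evaluation pipeline): it scores an invented re-spoken segment against a verbatim reference with sentence-level BLEU from NLTK, and computes the NER accuracy formula, NER = (N - E - R) / N * 100, using assumed, human-assessed error counts. The example sentences and error weights are illustrative only.

    # Minimal, hypothetical sketch (assumed example data, not the paper's
    # evaluation pipeline): score a re-spoken segment against a verbatim
    # reference with sentence-level BLEU, then compute the NER accuracy
    # used in re-speaking assessment.
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    reference = "the prime minister announced new funding for public transport".split()
    respoken = "prime minister announced funding for public transport".split()

    # Sentence-level BLEU with smoothing, since short segments with missing
    # higher-order n-gram matches would otherwise collapse to zero.
    bleu = sentence_bleu([reference], respoken,
                         smoothing_function=SmoothingFunction().method1)
    print(f"BLEU: {bleu:.3f}")

    # NER = (N - E - R) / N * 100, where N is the number of words in the
    # subtitles, E the weighted edition errors and R the weighted recognition
    # errors; E and R come from human assessment, so the values below are
    # purely illustrative.
    def ner(n_words: int, edition_errors: float, recognition_errors: float) -> float:
        return (n_words - edition_errors - recognition_errors) / n_words * 100.0

    print(f"NER: {ner(n_words=7, edition_errors=0.5, recognition_errors=0.25):.2f}%")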

Citation (APA)

Wołk, K., & Koržinek, D. (2017). Comparison and adaptation of automatic evaluation metrics for quality assessment of re-speaking. Computer Science, 18(2), 129–144. https://doi.org/10.7494/csci.2017.18.2.129
