Detecting content-heavy sentences: A cross-language case study

Junyi Jessy Li; Ani Nenkova

Conference Proceedings (Open Access)

Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (2015) 1271-1281

DOI: 10.18653/v1/d15-1148

9 Citations

84 Readers

Abstract

The information conveyed by some sentences would be more easily understood by a reader if it were expressed in multiple sentences. We call such sentences content-heavy: they may be grammatical, but they are cumbersome and difficult to comprehend. In this paper we introduce the task of detecting content-heavy sentences in a cross-lingual context. Specifically, we develop methods to identify sentences in Chinese for which English speakers would prefer translations consisting of more than one sentence. We base our analysis and definitions on evidence from multiple human translations and reader preferences on flow and understandability. We show that machine translation quality when translating content-heavy sentences is markedly worse than overall quality and that this type of sentence is fairly common in Chinese news. We demonstrate that sentence length and punctuation usage in Chinese are not sufficient clues for accurately detecting heavy sentences, and we present a richer classification model that accurately identifies these sentences.

Cite

Li, J. J., & Nenkova, A. (2015). Detecting content-heavy sentences: A cross-language case study. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 1271–1281). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d15-1148
