Ensembling and Score-Based Filtering in Sentence Alignment for Automatic Simplification of German Texts

Abstract

Among the well-known accessibility services for audiovisual media are subtitling for the deaf and hard-of-hearing, audio description, and sign language interpreting. More recently, automatic text simplification has emerged as a topic in the context of media accessibility, with research often approaching the task as a case of (sentence-based) monolingual machine translation. This approach relies on large amounts of high-quality parallel data, which is why monolingual sentence alignment has gained momentum. Alignment for text simplification is a complex task, with alignments often taking the form of n:m (in contrast to the standard case of 1:1 in machine translation). In this contribution, we evaluate the performance of different alignment methods against a human-created gold standard of standard German/simplified German sentence alignments created from a number of parallel corpora. Two of the corpora contain multiple levels of simplification. We employ a variety of alignment methods developed for monolingual tasks and bilingual sentence alignment. We explore strategies such as ensembling and score-based filtering to further improve the performance over these baselines. We show that combining multiple alignment methods with various hard voting strategies can outperform even the best individual methods and that we achieve similar results with score-based filtering of extracted alignments to find the most promising candidates. Our results motivate the notion that the overall task of sentence alignment for automatic simplification of German should be viewed as a two-step process that goes beyond the application of individual alignment methods.
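The two strategies named in the abstract, hard voting over several aligners and score-based filtering of candidate alignments, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The pair representation, the function names `hard_vote` and `filter_by_score`, the toy aligner outputs, the similarity scores, and the threshold are all assumptions made for illustration. An n:m alignment is represented here as several index pairs that share a sentence.

```python
from collections import Counter

# Hypothetical representation: (standard-sentence index, simplified-sentence index).
Pair = tuple[int, int]


def hard_vote(outputs: list[set[Pair]], min_votes: int) -> set[Pair]:
    """Keep every pair proposed by at least `min_votes` of the aligners."""
    votes = Counter(pair for pairs in outputs for pair in pairs)
    return {pair for pair, count in votes.items() if count >= min_votes}


def filter_by_score(scored: dict[Pair, float], threshold: float) -> set[Pair]:
    """Keep every candidate pair whose score reaches the threshold."""
    return {pair for pair, score in scored.items() if score >= threshold}


# Toy outputs of three hypothetical aligners (invented data).
aligner_a = {(0, 0), (1, 1), (1, 2), (3, 4)}
aligner_b = {(0, 0), (1, 1), (2, 3)}
aligner_c = {(0, 0), (1, 2), (2, 3)}
outputs = [aligner_a, aligner_b, aligner_c]

union = hard_vote(outputs, min_votes=1)      # any aligner suffices
majority = hard_vote(outputs, min_votes=2)   # at least two of three agree
unanimous = hard_vote(outputs, min_votes=3)  # all three agree

# Toy similarity scores for candidate pairs (invented data).
scored = {(0, 0): 0.91, (1, 1): 0.74, (1, 2): 0.42, (3, 4): 0.18}
kept = filter_by_score(scored, threshold=0.5)
```

Raising `min_votes` or `threshold` trades recall for precision: the union keeps all five toy pairs, the majority vote drops the pair only one aligner proposed, and the unanimous vote keeps only `(0, 0)`.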

Cite

Spring, N., Kostrzewa, M., Rios, A., & Ebling, S. (2022). Ensembling and Score-Based Filtering in Sentence Alignment for Automatic Simplification of German Texts. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13308 LNCS, pp. 137–149). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-05028-2_8
