Towards customized automatic segmentation of subtitles

Aitor Álvarez; Haritz Arzelus; Thierry Etchegoyhen

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8854 (2014), pp. 229–238

DOI: 10.1007/978-3-319-13623-3_24

Abstract

Automatic subtitling through speech recognition technology has become an important topic in recent years, where the effort has mostly centered on improving core speech technology to obtain better recognition results. However, subtitling quality also depends on other parameters aimed at favoring the readability and quick understanding of subtitles, like correct subtitle line segmentation. In this work, we present an approach to automate the segmentation of subtitles through machine learning techniques, allowing the creation of customized models adapted to the specific segmentation rules of subtitling companies. Support Vector Machines and Logistic Regression classifiers were trained over a reference corpus of subtitles manually created by professionals and used to segment the output of speech recognition engines. We describe the performance of both classifiers and discuss the merits of the approach for the automatic segmentation of subtitles.
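
To make the idea concrete, line segmentation can be framed as a binary decision at each gap between words: break here or not. The sketch below is only an illustration of that framing, not the authors' implementation. It trains a small logistic regression model by gradient ascent, using only the standard library. The feature set, the `BREAK_BEFORE` word list, the `MAX_CHARS` limit, and the three toy reference subtitles are all assumptions invented for this example; the paper's actual features and corpus are not given in the abstract.

```python
import math

# Hypothetical list of words that often open a new subtitle line.
BREAK_BEFORE = {"and", "but", "or", "that", "which", "of", "in", "on", "to", "for"}
MAX_CHARS = 37  # assumed per-line character limit; real guidelines vary


def features(tokens, i, chars_in_line):
    """Feature vector for the gap after tokens[i] (illustrative features only)."""
    nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
    return [
        1.0,                                        # bias term
        min(chars_in_line / MAX_CHARS, 2.0),        # how full the current line is
        1.0 if tokens[i][-1] in ",.;:?!" else 0.0,  # token ends with punctuation
        1.0 if nxt in BREAK_BEFORE else 0.0,        # next word tends to start a line
    ]


def score(w, x):
    """Logistic function of the weighted feature sum, clamped for stability."""
    z = sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


def examples_from_subtitle(lines):
    """Turn one reference subtitle (a list of lines) into (features, labels)."""
    tokens, labels = [], []
    for n, line in enumerate(lines):
        words = line.split()
        tokens.extend(words)
        # Label 1 marks a line break after the last word of a non-final line.
        labels.extend([0] * (len(words) - 1) + [1 if n < len(lines) - 1 else 0])
    X, y, chars = [], [], 0
    for i in range(len(tokens) - 1):
        chars += len(tokens[i]) + (1 if chars else 0)
        X.append(features(tokens, i, chars))
        y.append(labels[i])
        if labels[i]:
            chars = 0  # a new line starts after a reference break
    return X, y


def train(X, y, epochs=300, lr=0.5):
    """Fit logistic regression weights with per-example gradient ascent."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - score(w, xi)
            w = [wj + lr * err * xj for wj, xj in zip(w, xi)]
    return w


def segment(text, w, threshold=0.5):
    """Split recognizer output into lines wherever the model predicts a break."""
    tokens = text.split()
    lines, current, chars = [], [], 0
    for i, tok in enumerate(tokens):
        chars += len(tok) + (1 if current else 0)
        current.append(tok)
        if i < len(tokens) - 1 and score(w, features(tokens, i, chars)) >= threshold:
            lines.append(" ".join(current))
            current, chars = [], 0
    if current:
        lines.append(" ".join(current))
    return lines


# Toy reference corpus, made up for illustration; each subtitle is a list of lines.
reference = [
    ["We describe the performance", "of both classifiers"],
    ["Subtitles must be easy to read,", "and quick to understand"],
    ["The models were trained", "on professional subtitles"],
]
X, y = [], []
for subtitle in reference:
    Xi, yi = examples_from_subtitle(subtitle)
    X.extend(Xi)
    y.extend(yi)

weights = train(X, y)
print(segment("we present an approach to automate the segmentation of subtitles", weights))
```

Because the model is learned from reference subtitles rather than hard-coded, retraining on a different company's subtitles would in principle yield a model adapted to that company's segmentation rules, which is the customization the abstract refers to. A Support Vector Machine could be substituted for the logistic model over the same features.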

Álvarez, A., Arzelus, H., & Etchegoyhen, T. (2014). Towards customized automatic segmentation of subtitles. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8854, 229–238. https://doi.org/10.1007/978-3-319-13623-3_24
