Combining prosodic and lexical classifiers for two-pass punctuation detection in a Russian ASR system

Olga Khomitsevich; Pavel Chistikov; Tatiana Krivosheeva; Natalia Epimakhova; Irina Chernykh

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9319 161-169

DOI: 10.1007/978-3-319-23132-7_20

Abstract

We propose a system for automatic punctuation prediction in recognized speech using prosodic, word-based and grammatical features. An SVM classifier is trained on prosodic features, and a CRF classifier is trained on a large text dataset using word-based features. The probabilities from the two classifiers are then fused to produce a joint decision on comma and period placement, with a second classification pass for question mark detection. Training the two classifiers separately lets us avoid data sparseness for the lexical classifier and increases the overall robustness of the system. This approach works well for Russian and could be applied to other inflected languages. The system was tested on different speech styles. On manual transcripts, we achieved F-scores of 50–71% for periods, 46–66% for commas, 19–47% for question marks, and 77–87% for “mark/no mark” classification. The corresponding results on recognizer output are 46–66% for periods, 43–60% for commas, 10–38% for question marks, and 64–80% for “mark/no mark”.
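The abstract does not say how the two classifiers' probabilities are fused or how the second pass uses the first-pass output. The sketch below illustrates one plausible reading and should not be taken as the authors' method: it assumes log-linear interpolation with a single hypothetical `weight`, and a second pass that relabels predicted periods as question marks when a hypothetical question classifier score exceeds a `threshold`. All function names, class labels, and parameter values are illustrative assumptions.

```python
import math

# Assumed label set for the first pass (not specified in the abstract).
CLASSES = ("none", "comma", "period")


def fuse(prosodic, lexical, weight=0.5):
    """Fuse two probability distributions over CLASSES.

    Uses log-linear interpolation, which is an assumption; `weight` is the
    share given to the prosodic classifier.
    """
    eps = 1e-12  # guards against log(0)
    scores = {
        c: weight * math.log(prosodic[c] + eps)
        + (1.0 - weight) * math.log(lexical[c] + eps)
        for c in CLASSES
    }
    # Renormalise so the fused scores form a probability distribution.
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: v / total for c, v in exps.items()}


def first_pass(prosodic_probs, lexical_probs, weight=0.5):
    """Pick the most probable fused label at each word boundary."""
    labels = []
    for p, l in zip(prosodic_probs, lexical_probs):
        fused = fuse(p, l, weight)
        labels.append(max(fused, key=fused.get))
    return labels


def second_pass(labels, question_probs, threshold=0.5):
    """Relabel predicted periods as question marks (assumed design).

    `question_probs` holds one hypothetical question-classifier score per
    word boundary; only boundaries labelled "period" are reconsidered.
    """
    return [
        "question" if lab == "period" and q >= threshold else lab
        for lab, q in zip(labels, question_probs)
    ]


# Toy posteriors for three word boundaries (made-up numbers).
prosodic = [
    {"none": 0.7, "comma": 0.2, "period": 0.1},
    {"none": 0.2, "comma": 0.5, "period": 0.3},
    {"none": 0.1, "comma": 0.2, "period": 0.7},
]
lexical = [
    {"none": 0.8, "comma": 0.1, "period": 0.1},
    {"none": 0.3, "comma": 0.6, "period": 0.1},
    {"none": 0.1, "comma": 0.1, "period": 0.8},
]

labels = first_pass(prosodic, lexical)
final = second_pass(labels, question_probs=[0.9, 0.9, 0.8])
```

With these toy inputs the first pass yields one label per boundary and the second pass touches only the boundary labelled as a period. In practice the fusion weight and threshold would be tuned on held-out data.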

Cite

Khomitsevich, O., Chistikov, P., Krivosheeva, T., Epimakhova, N., & Chernykh, I. (2015). Combining prosodic and lexical classifiers for two-pass punctuation detection in a Russian ASR system. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9319, pp. 161–169). Springer Verlag. https://doi.org/10.1007/978-3-319-23132-7_20
