CORPRES: Corpus of Russian professionally read speech

Pavel Skrelin; Nina Volskaya; Daniil Kocharov; Karina Evgrafova; Olga Glotova; Vera Evdokimova

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6231 LNAI, pp. 392–399 (2010)

DOI: 10.1007/978-3-642-15760-8_50

Abstract

The paper introduces CORPRES (COrpus of Russian Professionally REad Speech), developed at the Department of Phonetics, Saint Petersburg State University, as the result of a three-year project. The corpus includes samples of different speaking styles produced by 4 male and 4 female speakers. Six levels of annotation cover all phonetic and prosodic information about the recorded speech data, including labels for pitch marks and phonetic events, as well as phonetic, orthographic, and prosodic transcription. The precise phonetic transcription of the data makes the corpus an especially valuable resource for both research and development purposes. The corpus contains 60 hours of speech in total. The paper describes the design and annotation principles of CORPRES and gives an overall description of the data. We also discuss possible uses of the corpus in phonetic research and speech technology, as well as some findings on the Russian sound system obtained from the corpus data. © 2010 Springer-Verlag Berlin Heidelberg.

Cite

Skrelin, P., Volskaya, N., Kocharov, D., Evgrafova, K., Glotova, O., & Evdokimova, V. (2010). CORPRES: Corpus of Russian professionally read speech. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6231 LNAI, pp. 392–399). https://doi.org/10.1007/978-3-642-15760-8_50
