The C-ORAL-BRASIL I: Reference corpus for informal spoken Brazilian Portuguese

Tommaso Raso; Heliana Mello

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012) 7243 LNAI 362-367

DOI: 10.1007/978-3-642-28885-2_40


Abstract

C-ORAL-BRASIL is a corpus of spontaneous Brazilian Portuguese speech, representative of the diatopy of the state of Minas Gerais (primarily the metropolitan area of the capital city, Belo Horizonte). The corpus was compiled following the same architecture and segmentation criteria as C-ORAL-ROM [1], and with the same alignment software, WinPitch [2]. It comprises 139 informal speech texts totaling 208,130 words and 21:08:52 (hh:mm:ss) of recording (6.1 GB of WAV files). The mean number of words per text is about 1,500. The recordings were made with high-resolution, non-invasive wireless equipment, generally with clip-on, monodirectional microphones and a mixer whenever there were more than two interactants; on a few occasions, omnidirectional microphones were used. The texts are transcribed in the CHAT format [3], implemented for prosodic annotation [4]. The main goal of the corpus architecture is to document diaphasic and diastratic variation in Brazilian Portuguese speech. © 2012 Springer-Verlag.

Citation (APA)

Raso, T., & Mello, H. (2012). The C-ORAL-BRASIL I: Reference corpus for informal spoken Brazilian Portuguese. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7243 LNAI, pp. 362–367). https://doi.org/10.1007/978-3-642-28885-2_40
