Crowdsourcing speech and language data for resource-poor languages

Hamdy Mubarak

Conference Proceedings

Advances in Intelligent Systems and Computing (2018) 639 440-447

DOI: 10.1007/978-3-319-64861-3_41

Abstract

In this paper, we present the benefits of using crowdsourcing to build speech and language resources for different annotation tasks, taking dialectal Arabic as an example of a resource-poor language. We give recommendations for job design and quality control that allow us to build high-quality data for a variety of tasks. Most of these recommendations are language-independent and can be applied to other languages as well. We summarize lessons learned from experiments in data acquisition tasks, such as image annotation (transcription of Arabic historical documents), machine translation (translation from English to Hindi), speech annotation (transcription of dialectal Arabic audio files), text annotation (conversion from dialectal Arabic to Modern Standard Arabic (MSA)), and text classification (annotation of offensive language on Arabic social media, and classification of questions on Arabic medical web forums).

Cite

Mubarak, H. (2018). Crowdsourcing speech and language data for resource-poor languages. In Advances in Intelligent Systems and Computing (Vol. 639, pp. 440–447). Springer Verlag. https://doi.org/10.1007/978-3-319-64861-3_41
