Crowdsourcing speech and language data for resource-poor languages

Abstract

In this paper, we present the benefits of using crowdsourcing to build speech and language resources for various annotation tasks, using dialectal Arabic as an example of a resource-poor language. We provide recommendations for job design and quality control that allow us to build high-quality data for a variety of tasks. Most of these recommendations are language-independent and can be applied to other languages as well. We summarize lessons learned from experiments in data acquisition tasks such as image annotation (transcription of Arabic historical documents), machine translation (translation from English to Hindi), speech annotation (transcription of dialectal Arabic audio files), text annotation (conversion from dialectal Arabic to Modern Standard Arabic (MSA)), and text classification (annotation of offensive language on Arabic social media, and classification of questions on Arabic medical web forums).

Citation (APA)

Mubarak, H. (2018). Crowdsourcing speech and language data for resource-poor languages. In Advances in Intelligent Systems and Computing (Vol. 639, pp. 440–447). Springer Verlag. https://doi.org/10.1007/978-3-319-64861-3_41
