Best practices for crowdsourcing Dialectal Arabic speech transcription

Abstract

In this paper, we investigate different approaches to crowdsourcing transcriptions of Dialectal Arabic speech, with automatic quality control to ensure good transcription at the source. Because Dialectal Arabic has no standard orthographic representation, quality control is very challenging. We propose a complete recipe for speech transcription quality control that includes using the output of an Automatic Speech Recognition system. We evaluated the quality of the transcribed speech; with this recipe, transcription error fell to 1.0% from a 13.2% baseline with no quality control for Egyptian data, and to 4% from 7.8% for the North African dialect.

Cite

Wray, S., Mubarak, H., & Ali, A. (2015). Best practices for crowdsourcing Dialectal Arabic speech transcription. In 2nd Workshop on Arabic Natural Language Processing, ANLP 2015 - held at 53rd Annual Meeting of the Association for Computational Linguistics, ACL 2015 - Proceedings (pp. 99–107). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w15-3211
