Abstract
We investigate approaches to crowdsourcing transcriptions of Dialectal Arabic speech, with automatic quality control to ensure good transcription at the source. Because Dialectal Arabic has no standard orthographic representation, quality control is very challenging. We propose a complete recipe for speech transcription quality control that includes using the output of an Automatic Speech Recognition system. We evaluated the quality of the transcribed speech: with this recipe, transcription error fell to 1.0%, compared with a 13.2% baseline with no quality control, for Egyptian data, and to 4%, compared with 7.8%, for the North African dialect.
Cite
Wray, S., Mubarak, H., & Ali, A. (2015). Best practices for crowdsourcing Dialectal Arabic speech transcription. In 2nd Workshop on Arabic Natural Language Processing, ANLP 2015 - held at 53rd Annual Meeting of the Association for Computational Linguistics, ACL 2015 - Proceedings (pp. 99–107). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w15-3211