Spoken term discovery for language documentation using translations

Abstract

Vast amounts of speech data collected for language documentation and research remain untranscribed and unsearchable, but often a small amount of speech may have text translations available. We present a method for partially labeling additional speech with translations in this scenario. We modify an unsupervised speech-to-translation alignment model and obtain prototype speech segments that match the translation words, which are in turn used to discover terms in the unlabelled data. We evaluate our method on a Spanish-English speech translation corpus and on two corpora of endangered languages, Arapaho and Ainu, demonstrating its appropriateness and applicability in an actual very-low-resource scenario.

Cite

Anastasopoulos, A., Bansal, S., Goldwater, S., Lopez, A., & Chiang, D. (2017). Spoken term discovery for language documentation using translations. In EMNLP 2017 - 1st Workshop on Speech-Centric Natural Language Processing, SCNLP 2017 - Proceedings of the Workshop (pp. 53–58). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-4607