Spoken term discovery for language documentation using translations

Antonios Anastasopoulos; Sameer Bansal; Sharon Goldwater; Adam Lopez; David Chiang

Conference Proceedings (Open Access)

EMNLP 2017 - 1st Workshop on Speech-Centric Natural Language Processing, SCNLP 2017 - Proceedings of the Workshop (2017) 53-58

DOI: 10.18653/v1/w17-4607

8 Citations

74 Readers

Abstract

Vast amounts of speech data collected for language documentation and research remain untranscribed and unsearchable, but often a small amount of speech may have text translations available. We present a method for partially labeling additional speech with translations in this scenario. We modify an unsupervised speech-to-translation alignment model and obtain prototype speech segments that match the translation words, which are in turn used to discover terms in the unlabelled data. We evaluate our method on a Spanish-English speech translation corpus and on two corpora of endangered languages, Arapaho and Ainu, demonstrating its appropriateness and applicability in an actual very-low-resource scenario.

APA

Anastasopoulos, A., Bansal, S., Goldwater, S., Lopez, A., & Chiang, D. (2017). Spoken term discovery for language documentation using translations. In EMNLP 2017 - 1st Workshop on Speech-Centric Natural Language Processing, SCNLP 2017 - Proceedings of the Workshop (pp. 53–58). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-4607
