Using Computational Approaches to Integrate Endangered Language Legacy Data into Documentation Corpora: Past Experiences and Challenges Ahead

Rogier Blokland; Niko Partanen; Michael Rießler; Joshua Wilbur

Journal Article (Open Access)

Proceedings of the Workshop on Computational Methods for Endangered Languages (2019) 2(1)

DOI: 10.33011/computel.v2i.451

Abstract

The systematic integration of pre-digital published transcriptions of legacy language materials offers many possibilities to enrich documentary corpora with data that is often closely comparable to contemporary collections and often originates from the same speech communities that researchers currently work with. Recent advances in text recognition technologies in particular make the reuse of old materials an attractive and accessible task. However, the output of text recognition needs to be connected to further parts of the pipeline, namely forced alignment and speech recognition. The workflows discussed here attempt to reach a maximally useful situation in which legacy data is transformed into a usable and comparable format, but not into a time-aligned corpus.

Cite (APA)

Blokland, R., Partanen, N., Rießler, M., & Wilbur, J. (2019). Using Computational Approaches to Integrate Endangered Language Legacy Data into Documentation Corpora: Past Experiences and Challenges Ahead. Proceedings of the Workshop on Computational Methods for Endangered Languages, 2(1). https://doi.org/10.33011/computel.v2i.451
