Generating Varied Training Corpora in Runyankore Using a Combined Semantic and Syntactic, Pattern-Grammar-based Approach

Joan Byamugisha

Conference Proceedings

INLG 2020 - 13th International Conference on Natural Language Generation, Proceedings (2020) 273-282

DOI: 10.18653/v1/2020.inlg-1.34

Abstract

Machine learning algorithms have achieved high accuracy in natural language processing tasks. However, these algorithms require large amounts of training data to perform well. Because most Bantu languages are computationally under-resourced and lack such training corpora, we investigated how to generate a large, varied training corpus in Runyankore, a Bantu language indigenous to Uganda. We found a combined semantic and syntactic, pattern- and grammar-based approach to be suitable for this purpose, and used it to generate one million sentences, both labelled and unlabelled, which can serve as training data for machine learning algorithms. The generated text was evaluated in two ways: (1) assessing the semantics encoded in word embeddings obtained from the generated text, which showed correct word similarity; and (2) applying the labelled data to tasks such as sentiment analysis, which achieved satisfactory levels of accuracy.

Cite

Byamugisha, J. (2020). Generating Varied Training Corpora in Runyankore Using a Combined Semantic and Syntactic, Pattern-Grammar-based Approach. In INLG 2020 - 13th International Conference on Natural Language Generation, Proceedings (pp. 273–282). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.inlg-1.34
