By virtue of its superiority in handling sequence data and its effectiveness in preserving long-distance information, the recurrent neural network language model (RNNLM) has prevailed in a range of tasks in recent years. However, a large quantity of data is required for language modelling to achieve good performance, which makes modeling low-resource languages difficult. To address this issue, Tibetan, one of the minority languages, is taken as a case study, and its radicals (the components of Tibetan characters) are explored for constructing a language model. Motivated by the inherent structure of Tibetan, a novel construction of Tibetan character embeddings is introduced into the RNNLM. The fusion of individual radical embeddings is performed in three ways: uniform weights (TRU), different weights (TRD) and radical combination (TRC). This structure, especially when combined with the radicals, extends the capability to capture long-term context dependencies and alleviates the low-resource problem to some extent. The experimental results suggest that the proposed structure outperforms the standard RNNLM, yielding 7.4%, 12.7% and 13.5% relative perplexity reductions with TRU, TRD and TRC respectively.
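The three fusion schemes named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the number of radicals, the embedding dimension, and the weights and projection matrix are all hypothetical placeholders (in the paper they would be learned during training).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: one Tibetan character composed of 3 radicals,
# each radical mapped to a d-dimensional embedding (values illustrative).
d = 4
radical_embs = rng.normal(size=(3, d))  # one row per radical

# TRU: uniform-weight fusion -- the plain mean of the radical embeddings.
char_emb_tru = radical_embs.mean(axis=0)

# TRD: different-weight fusion -- one weight per radical (fixed here
# for illustration; learned in practice), normalized to sum to 1.
w = np.array([0.5, 0.3, 0.2])
char_emb_trd = (w[:, None] * radical_embs).sum(axis=0)

# TRC: radical combination -- concatenate the radical embeddings and
# project back to d dimensions with a (hypothetical) learned matrix W.
W = rng.normal(size=(3 * d, d))
char_emb_trc = radical_embs.reshape(-1) @ W

# Each scheme yields a single d-dimensional character embedding,
# which would then feed the RNNLM input layer in place of (or alongside)
# a standard character embedding.
print(char_emb_tru.shape, char_emb_trd.shape, char_emb_trc.shape)
```

Under this reading, TRU and TRD keep the character embedding the same size as a radical embedding, while TRC lets the model learn how radical information combines before the projection.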
CITATION
Shen, T., Wang, L., Chen, X., Khysru, K., & Dang, J. (2017). Exploiting the Tibetan Radicals in Recurrent Neural Network for Low-Resource Language Models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10635 LNCS, pp. 266–275). Springer Verlag. https://doi.org/10.1007/978-3-319-70096-0_28