Sub-character neural language modelling in Japanese

Abstract

In East Asian languages such as Japanese and Chinese, the semantics of a character are (somewhat) reflected in its sub-character elements. This paper examines the effect of using subcharacters for language modeling in Japanese. This is achieved by decomposing characters according to a range of character decomposition datasets, and training a neural language model over variously decomposed character representations. Our results indicate that language modelling can be improved through the inclusion of subcharacters, though this result depends on a good choice of decomposition dataset and the appropriate granularity of decomposition.
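As a rough illustration of the decomposition step the abstract describes, the sketch below expands each character into sub-character elements before the sequence would be passed to a language model. It is not the authors' implementation: the table `DECOMP`, the functions `decompose` and `tokenize`, and the `depth` parameter (standing in for "granularity of decomposition") are hypothetical names chosen for this example, and the three table entries are a toy stand-in for a real decomposition dataset.

```python
# Toy decomposition table (hypothetical; a real decomposition dataset
# would cover thousands of characters and may disagree on the splits).
DECOMP = {
    "明": ["日", "月"],
    "語": ["言", "吾"],
    "吾": ["五", "口"],
}


def decompose(char, depth):
    """Expand `char` into sub-character elements, at most `depth` levels deep.

    depth=0 leaves the character intact; larger values give finer granularity.
    Characters missing from the table are returned unchanged.
    """
    if depth == 0 or char not in DECOMP:
        return [char]
    parts = []
    for part in DECOMP[char]:
        parts.extend(decompose(part, depth - 1))
    return parts


def tokenize(text, depth):
    """Turn a string into the token sequence a language model would be trained on."""
    tokens = []
    for ch in text:
        tokens.extend(decompose(ch, depth))
    return tokens


if __name__ == "__main__":
    for depth in (0, 1, 2):
        print(depth, tokenize("日本語", depth))
```

Changing `depth` alters both the sequence length and the vocabulary the model sees, which is one way to picture why the abstract reports that results depend on the granularity of decomposition.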

Cite

Nguyen, V., Brooke, J., & Baldwin, T. (2017). Sub-character neural language modelling in Japanese. In EMNLP 2017 - 1st Workshop on Subword and Character Level Models in NLP, SCLeM 2017 - Proceedings of the Workshop (pp. 148–153). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-4122
