Abstract
In East Asian languages such as Japanese and Chinese, the semantics of a character are (somewhat) reflected in its sub-character elements. This paper examines the effect of using subcharacters for language modelling in Japanese. We decompose characters according to a range of character decomposition datasets and train a neural language model over the variously decomposed character representations. Our results indicate that language modelling can be improved through the inclusion of subcharacters, though this result depends on a good choice of decomposition dataset and an appropriate granularity of decomposition.
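The decomposition step described in the abstract can be illustrated with a minimal sketch. The table below is a hypothetical toy mapping written for this example, not one of the decomposition datasets used in the paper, and the names `TOY_DECOMPOSITIONS`, `decompose`, and `to_tokens` are assumptions. The `depth` parameter stands in for decomposition granularity: 0 keeps whole characters, and larger values expand components further.

```python
# Toy decomposition table (an assumption for illustration only; it is not
# one of the decomposition datasets evaluated in the paper).
TOY_DECOMPOSITIONS = {
    "明": ["日", "月"],
    "林": ["木", "木"],
    "森": ["木", "林"],
}


def decompose(char, table, depth):
    """Expand one character into components, recursing at most `depth` levels."""
    if depth <= 0 or char not in table:
        # Characters without an entry (e.g. kana) are kept as they are.
        return [char]
    parts = []
    for component in table[char]:
        parts.extend(decompose(component, table, depth - 1))
    return parts


def to_tokens(text, table, depth):
    """Turn a string into the token sequence a language model would be trained on."""
    tokens = []
    for char in text:
        tokens.extend(decompose(char, table, depth))
    return tokens


if __name__ == "__main__":
    for depth in (0, 1, 2):
        print(depth, to_tokens("明るい森", TOY_DECOMPOSITIONS, depth))
```

Varying `depth` and swapping the table correspond, loosely, to the two factors the abstract says the results depend on: granularity of decomposition and choice of decomposition dataset.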
Cite
Nguyen, V., Brooke, J., & Baldwin, T. (2017). Sub-character neural language modelling in Japanese. In EMNLP 2017 - 1st Workshop on Subword and Character Level Models in NLP, SCLeM 2017 - Proceedings of the Workshop (pp. 148–153). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-4122