Sub-character neural language modelling in Japanese

Viet Nguyen; Julian Brooke; Timothy Baldwin

Conference Proceedings, Open Access

EMNLP 2017 - 1st Workshop on Subword and Character Level Models in NLP, SCLeM 2017 - Proceedings of the Workshop (2017) 148-153

DOI: 10.18653/v1/w17-4122

10 Citations

81 Readers

Abstract

In East Asian languages such as Japanese and Chinese, the semantics of a character are (somewhat) reflected in its sub-character elements. This paper examines the effect of using sub-characters for language modelling in Japanese. This is achieved by decomposing characters according to a range of character decomposition datasets, and training a neural language model over variously decomposed character representations. Our results indicate that language modelling can be improved through the inclusion of sub-characters, though this result depends on a good choice of decomposition dataset and the appropriate granularity of decomposition.
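
The decomposition step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the table `TOY_DECOMPOSITIONS`, the functions `decompose` and `decompose_text`, and the `depth` parameter are all hypothetical names chosen for illustration, and the three table entries are a hand-written toy example, not one of the decomposition datasets evaluated in the paper. The `depth` argument stands in for the "granularity of decomposition" mentioned above.

```python
# Toy decomposition table (an assumption for illustration only, not one of
# the datasets used in the paper): each character maps to its immediate parts.
TOY_DECOMPOSITIONS = {
    "明": ["日", "月"],
    "語": ["言", "吾"],
    "吾": ["五", "口"],
}


def decompose(char, depth):
    """Expand `char` into sub-character elements, at most `depth` levels deep.

    Characters missing from the table, or reached when depth is exhausted,
    are kept as they are.
    """
    parts = TOY_DECOMPOSITIONS.get(char)
    if depth <= 0 or parts is None:
        return [char]
    result = []
    for part in parts:
        result.extend(decompose(part, depth - 1))
    return result


def decompose_text(text, depth):
    """Turn a string into the token sequence a language model could be trained on."""
    tokens = []
    for char in text:
        tokens.extend(decompose(char, depth))
    return tokens


if __name__ == "__main__":
    for depth in (0, 1, 2):
        print(depth, decompose_text("日本語", depth))
```

With `depth=0` the text stays at character level, and larger values give progressively finer token sequences. Which table and which depth help a language model is exactly the empirical question the paper studies; the sketch takes no position on it.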

Cite

Nguyen, V., Brooke, J., & Baldwin, T. (2017). Sub-character neural language modelling in Japanese. In EMNLP 2017 - 1st Workshop on Subword and Character Level Models in NLP, SCLeM 2017 - Proceedings of the Workshop (pp. 148–153). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-4122
