Joint embeddings of Chinese words, characters, and fine-grained subcharacter components

Abstract

Word embeddings have attracted much attention recently. In contrast to alphabetic writing systems, Chinese characters are often composed of subcharacter components that are themselves semantically informative. In this work, we propose an approach to jointly embed Chinese words together with their characters and fine-grained subcharacter components. We use three likelihoods to evaluate whether the context words, characters, and components can predict the current target word. We also collected 13,253 subcharacter components to demonstrate that existing approaches to decomposing Chinese characters are insufficient. Evaluation on both word similarity and word analogy tasks demonstrates the superior performance of our model.
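The abstract names three likelihoods but does not give their form. The following is only a minimal sketch of how such an objective might be scored, assuming that each context (words, characters, components) is summarized by averaging its vectors and that each likelihood is a full softmax over the vocabulary. The function names, the averaging step, and the toy dimensions are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def average(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def log_softmax_prob(target_index, hidden, output_vectors):
    """log P(target | hidden) under a full softmax over the vocabulary."""
    scores = [sum(h * o for h, o in zip(hidden, out)) for out in output_vectors]
    max_score = max(scores)  # subtract the max for numerical stability
    log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return scores[target_index] - log_norm


def joint_log_likelihood(target_index, context_word_vecs, context_char_vecs,
                         context_component_vecs, output_vectors):
    """Sum of three log-likelihoods, one per granularity of context.

    Assumption: each context is reduced to the mean of its vectors before
    predicting the target word.
    """
    total = 0.0
    for vecs in (context_word_vecs, context_char_vecs, context_component_vecs):
        total += log_softmax_prob(target_index, average(vecs), output_vectors)
    return total


# Toy data: random vectors stand in for learned embeddings (hypothetical sizes).
random.seed(0)
DIM, VOCAB = 4, 5


def rand_vec():
    return [random.uniform(-0.5, 0.5) for _ in range(DIM)]


output_vectors = [rand_vec() for _ in range(VOCAB)]  # one per vocabulary word
words = [rand_vec() for _ in range(2)]               # context words
chars = [rand_vec() for _ in range(4)]               # characters of those words
components = [rand_vec() for _ in range(7)]          # their subcharacter components

score = joint_log_likelihood(3, words, chars, components, output_vectors)
print(score)
```

Training would adjust the vectors to increase this score over a corpus. A practical system would likely replace the full softmax with a cheaper approximation such as negative sampling, which is likewise an assumption here.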

Yu, J., Jian, X., Xin, H., & Song, Y. (2017). Joint embeddings of Chinese words, characters, and fine-grained subcharacter components. In EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 286–291). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d17-1027
