Which Embedding Level is Better for Semantic Representation? An Empirical Research on Chinese Phrases

Abstract

Word embeddings are popular features in a wide range of Natural Language Processing (NLP) tasks. To overcome the coverage problem of purely statistical approaches, compositional models have been proposed: they embed the basic units of a language and compose them into representations of higher-level structures such as idioms, phrases, and named entities. Choosing the right level of basic-unit embedding for representing the semantics of higher-level units is therefore crucial. This paper investigates the problem through a Chinese phrase representation task, in which both characters and words serve as basic units. We define phrase representation evaluation tasks built from Wikipedia, propose four intuitive methods for composing basic-unit embeddings into higher-level representations, and compare the performance of the two kinds of basic units. Empirical results show that, under all composing methods, word embeddings outperform character embeddings on both tasks, indicating that the word level is more suitable for composing semantic representations.
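The abstract does not name the paper's four composing methods, but the idea of composing a phrase vector from basic-unit embeddings is easy to illustrate. The minimal Python sketch below composes a phrase representation from per-unit vectors using four commonly used composition functions (sum, mean, max pooling, concatenation); the embedding values, the unit inventory, and the choice of functions are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

# Four simple composition functions over a list of basic-unit vectors.
# These are common choices (sum, average, max pooling, concatenation);
# the paper's own four methods are not spelled out in the abstract.

def compose_sum(vectors):
    """Element-wise sum of the unit embeddings."""
    return np.sum(vectors, axis=0)

def compose_mean(vectors):
    """Element-wise average; a length-normalized variant of the sum."""
    return np.mean(vectors, axis=0)

def compose_max(vectors):
    """Element-wise max pooling across the units."""
    return np.max(vectors, axis=0)

def compose_concat(vectors):
    """Concatenation; dimensionality grows with phrase length."""
    return np.concatenate(vectors)

# Hypothetical lookup table: each basic unit (a character or a word)
# maps to a dense vector. A real system would load pre-trained
# embeddings instead of random values.
rng = np.random.default_rng(0)
embeddings = {
    "机器": rng.random(50),  # word "machine"
    "学习": rng.random(50),  # word "learning"
}

phrase = ["机器", "学习"]              # the phrase "machine learning"
unit_vecs = [embeddings[u] for u in phrase]

phrase_vec = compose_mean(unit_vecs)  # one candidate phrase representation
print(phrase_vec.shape)               # (50,)
```

The same functions apply unchanged when the basic units are characters ("机", "器", "学", "习") rather than words, which is exactly the comparison the paper carries out.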

Cite

CITATION STYLE: APA

Pang, K., Tang, J., & Wang, T. (2018). Which Embedding Level is Better for Semantic Representation? An Empirical Research on Chinese Phrases. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11109 LNAI, pp. 54–66). Springer-Verlag. https://doi.org/10.1007/978-3-319-99501-4_5
