What is the basic semantic unit of Chinese language? A computational approach based on topic models


Abstract

Chinese has generally been regarded as a Subject-Verb-Object (SVO) language whose basic semantic unit is the word, which usually consists of two or more characters. However, the word-centered view of Chinese structure has been controversial in linguistics, and some recent research in Chinese computational linguistics suggests that character-based models outperform word-based models in certain applications, such as word segmentation. In this paper, word-based and character-based topic models are each tested for modeling Chinese. Our empirical studies demonstrate the effectiveness of using Chinese characters as the basic semantic units: the two models perform comparably in text classification, while the character-based model achieves better language-modeling quality with a much smaller vocabulary. On a bilingual corpus, three independent topic models, based on Chinese words, Chinese characters, and English words, are trained and compared with one another, and these experiments across Chinese and English verify the capability of topic models to capture semantics. Classification accuracy can also be improved by aggregating the classification results of the three independent topic models. © 2011 Springer-Verlag.
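The abstract reports that aggregating the outputs of the three independent topic models (Chinese words, Chinese characters, English words) improves classification accuracy, but it does not state the combination rule. The sketch below assumes the simplest option, hard majority voting over per-document labels; the function name `majority_vote`, the tie-breaking policy, and the example labels are hypothetical illustrations, not the authors' method, which may instead weight models or combine class probabilities.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(*model_predictions: Sequence[Hashable]) -> List[Hashable]:
    """Combine per-document labels from several classifiers by majority vote.

    Hypothetical sketch: each argument holds one model's predicted labels,
    aligned by document index. Ties between equally common labels are broken
    in favor of the tied label that appears earliest in model order, so when
    all three models disagree, the first model's label is kept.
    """
    if not model_predictions:
        raise ValueError("need at least one model's predictions")
    n_docs = len(model_predictions[0])
    if any(len(p) != n_docs for p in model_predictions):
        raise ValueError("all prediction lists must have the same length")

    combined = []
    for votes in zip(*model_predictions):
        counts = Counter(votes)
        top_count = max(counts.values())
        # Scan votes in model order so ties resolve to the earliest model.
        winner = next(label for label in votes if counts[label] == top_count)
        combined.append(winner)
    return combined


if __name__ == "__main__":
    # Hypothetical predictions for four documents from the three models.
    word_preds = ["sports", "finance", "tech", "tech"]
    char_preds = ["sports", "tech", "finance", "tech"]
    english_preds = ["finance", "tech", "sports", "tech"]
    print(majority_vote(word_preds, char_preds, english_preds))
```

Running the usage example prints `['sports', 'tech', 'tech', 'tech']`: the third document receives three different labels, so the tie-breaking rule falls back to the word-based model. Majority voting needs only hard labels, which keeps the three models fully independent; a probability-averaging scheme would be the natural alternative if each topic-model classifier exposes class scores.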


Zhao, Q., Qin, Z., & Wan, T. (2011). What is the basic semantic unit of Chinese language? A computational approach based on topic models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6878 LNAI, pp. 143–157). https://doi.org/10.1007/978-3-642-23211-4_9
