An empirical study on dimensionality optimization in text mining for linguistic knowledge acquisition


Abstract

In this paper, we empirically seek the optimal dimensionality for two data-driven models, the Latent Semantic Analysis (LSA) model and the Probabilistic Latent Semantic Analysis (PLSA) model. These models are used to build linguistic semantic knowledge, which can then be used to estimate contextual semantic similarity for target word selection in English-Korean machine translation. We also use the k-Nearest Neighbor (k-NN) learning algorithm. We diversify our experiments by analyzing the covariance between the value of k in k-NN learning and the selection accuracy, in addition to that between the dimensionality and the accuracy. Although we could not find a regular relationship between the dimensionality and the accuracy, we could find the optimal dimensionality, the one yielding the most sound distribution of data during the experiments.
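As a rough illustration of the kind of experiment the abstract describes, the sketch below sweeps the LSA dimensionality and the k-NN neighborhood size and reports selection accuracy for each pair. It is not the authors' implementation: it covers only LSA (via a plain SVD), not PLSA, and the toy context-by-term matrix, the labels, and the function names `lsa_embed`, `knn_predict`, and `loo_accuracy` are all assumptions made up for this example. For simplicity the SVD is fitted on all rows, including the held-out one, which a careful evaluation would avoid.

```python
import numpy as np
from collections import Counter

def lsa_embed(X, dim):
    """Map each row of X (contexts x terms) to its top-`dim` LSA coordinates."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :dim] * s[:dim]

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def knn_predict(train_vecs, train_labels, query, k):
    """Majority vote among the k training vectors most similar to `query`."""
    sims = [cosine(v, query) for v in train_vecs]
    top = np.argsort(sims)[::-1][:k]
    votes = Counter(train_labels[i] for i in top)
    return votes.most_common(1)[0][0]

def loo_accuracy(X, labels, dim, k):
    """Leave-one-out accuracy of k-NN in a `dim`-dimensional LSA space."""
    Z = lsa_embed(X, dim)  # simplification: SVD sees the held-out row too
    hits = 0
    for i in range(len(labels)):
        rest = [j for j in range(len(labels)) if j != i]
        pred = knn_predict(Z[rest], [labels[j] for j in rest], Z[i], k)
        hits += pred == labels[i]
    return hits / len(labels)

# Invented toy data: 6 contexts over 5 terms, two candidate target words.
X = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 0, 0, 0],
    [2, 2, 1, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 0, 1, 1, 2],
    [0, 0, 0, 2, 2],
], dtype=float)
labels = ["A", "A", "A", "B", "B", "B"]

# Sweep dimensionality and k, as in the two analyses the abstract mentions.
for dim in (1, 2, 3, 5):
    for k in (1, 3):
        print(f"dim={dim} k={k} accuracy={loo_accuracy(X, labels, dim, k):.2f}")
```

On real data the interesting output is how accuracy moves across the grid; the abstract reports that no regular trend between dimensionality and accuracy was found, so a sweep like this would be read as an empirical search rather than a confirmation of a monotone relationship.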

Cite

Kim, Y. S., Chang, J. H., & Zhang, B. T. (2003). An empirical study on dimensionality optimization in text mining for linguistic knowledge acquisition. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2637, pp. 111–116). Springer Verlag. https://doi.org/10.1007/3-540-36175-8_11
