An empirical study on dimensionality optimization in text mining for linguistic knowledge acquisition

Yu Seop Kim; Jeong Ho Chang; Byoung Tak Zhang

Conference Proceedings

Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (2003) 2637 111-116

DOI: 10.1007/3-540-36175-8_11

8 Citations

11 Readers

Abstract

In this paper, we empirically search for the optimal dimensionality of two data-driven models: the Latent Semantic Analysis (LSA) model and the Probabilistic Latent Semantic Analysis (PLSA) model. These models are used to build linguistic semantic knowledge, which can then be used to estimate contextual semantic similarity for target word selection in English-Korean machine translation. We also employ the k-Nearest Neighbor (k-NN) learning algorithm. We diversify our experiments by analyzing the covariance between the value of k in k-NN learning and the selection accuracy, in addition to that between the dimensionality and the accuracy. Although we found no regular relationship between the dimensionality and the accuracy, we could identify the optimal dimensionality, the one yielding the soundest distribution of data in our experiments.
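The kind of sweep the abstract describes, varying the latent dimensionality and the value of k and recording selection accuracy, can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' setup: the count matrix is synthetic, the labels stand in for target-word choices, LSA is approximated with a plain truncated SVD, PLSA is not covered, and the function names (`lsa_project`, `knn_predict`, `loo_accuracy`) are hypothetical.

```python
import numpy as np
from collections import Counter


def lsa_project(X, dim):
    """Project rows of X (contexts x terms) onto the top `dim` singular directions."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :dim] * s[:dim]


def knn_predict(train_vecs, train_labels, query, k):
    """Majority vote among the k training rows most cosine-similar to `query`."""
    norms = np.linalg.norm(train_vecs, axis=1) * np.linalg.norm(query)
    sims = train_vecs @ query / np.where(norms == 0, 1.0, norms)
    nearest = np.argsort(-sims)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(X, labels, dim, k):
    """Leave-one-out k-NN accuracy in a `dim`-dimensional latent space.

    Simplification: the SVD is fitted once on all rows, including the
    held-out one, which a careful evaluation would avoid.
    """
    Z = lsa_project(X, dim)
    hits = 0
    for i in range(len(labels)):
        mask = np.arange(len(labels)) != i
        rest = [lab for j, lab in enumerate(labels) if j != i]
        hits += knn_predict(Z[mask], rest, Z[i], k) == labels[i]
    return hits / len(labels)


# Synthetic stand-in data (assumption): 20 contexts, 12 terms, 2 candidate words.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(20, 12)).astype(float)
labels = ["a" if i % 2 == 0 else "b" for i in range(20)]

# Accuracy for every (dimensionality, k) pair in a small grid.
results = {
    (dim, k): loo_accuracy(X, labels, dim, k)
    for dim in (2, 4, 8)
    for k in (1, 3, 5)
}
```

Inspecting `results` across the grid is one way to look for a dimensionality whose accuracy stays stable as k changes. Random labels like these carry no signal, so any pattern on this toy data is not meaningful.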

Cite

Kim, Y. S., Chang, J. H., & Zhang, B. T. (2003). An empirical study on dimensionality optimization in text mining for linguistic knowledge acquisition. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2637, pp. 111–116). Springer Verlag. https://doi.org/10.1007/3-540-36175-8_11
