Document classification method based on latent semantic indexing

Abstract

Latent Semantic Indexing (LSI) and Non-negative Matrix Factorization (NMF) are algorithms that classify documents by meaning by converting each document into a vector. Both have two problems: their interpretation varies with the training documents, and they have difficulty analyzing multiple representations of the same term. WordNet, meanwhile, is a lexical dictionary that describes the relationships between words, grounded in the study of human cognition, and it is widely used for tasks such as query term expansion in search engines. However, it adapts poorly to neologisms, slang, and the shifts in word meaning that come with fast-changing usage. In this paper, we therefore address the problem of multiple word representations by partially applying WordNet's word relationships to Latent Semantic Indexing using genetic algorithms, so that documents are clustered more efficiently by drawing on the strengths and offsetting the weaknesses of both LSI and WordNet. With this approach we aim to improve precision and increase the efficiency of the overall clusters.
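
The multiple-representation problem described above can be illustrated with a minimal sketch. It covers only the idea of folding synonyms into one term before vectorizing, not the paper's LSI decomposition or its genetic algorithm. The `SYNONYMS` table is a hypothetical hand-written stand-in for WordNet relationships, and the function names and sample sentences are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical synonym table standing in for WordNet relationships
# (an assumption, not data from the paper): each surface form maps
# to one canonical term.
SYNONYMS = {"automobile": "car", "auto": "car", "physician": "doctor"}


def term_vector(text, synonyms=None):
    """Count terms in a document, optionally folding synonyms together."""
    tokens = text.lower().split()
    if synonyms:
        tokens = [synonyms.get(tok, tok) for tok in tokens]
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


doc_a = "the car needs a new engine"
doc_b = "the automobile needs a new engine"

# Without folding, "car" and "automobile" occupy different dimensions.
raw_similarity = cosine(term_vector(doc_a), term_vector(doc_b))

# With folding, both documents map to the same term vector.
folded_similarity = cosine(
    term_vector(doc_a, SYNONYMS), term_vector(doc_b, SYNONYMS)
)

print(f"raw: {raw_similarity:.3f}, folded: {folded_similarity:.3f}")
```

In this toy case the two sentences differ only in one synonym, so folding raises their similarity. A fixed table like this also shows the limitation the abstract attributes to WordNet: a neologism or slang term missing from the table is left unmerged.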

Cite

Kim, J. J., Lee, Y. S., Moon, J. Y., & Park, J. M. (2018). Document classification method based on latent semantic indexing. International Journal of Grid and Distributed Computing, 11(4), 97–112. https://doi.org/10.14257/ijgdc.2018.11.4.09
