A general method of mining chinese web documents based on GA&SA and position-factors

0Citations
Citations of this article
2Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Clustering and classification are two important techniques of mining Web information. In this paper, a new adaptive method of mining Chinese documents from the internet is proposed. First, we give an algorithm of clustering documents which combines Genetic Algorithm(GA) and Simulated Annealing(SA) based on Boolean Model. This Algorithm avoids the disadvantage of clustering documents by using pure GA which can not be utilized accurately since GA converges too early and bogs the local optimum. Then, considering that the effect of classification with traditional Vector Space Model(VSM) is not satisfying enough since it is not related to the grades of importance of words, we add the positionfactors of key words into VSM and set up a new classifier model to classify Chinese Web documents. Experimental results indicate that this adaptive method can make the process of clustering and classification more accurate and reasonable comparing to the methods which does not have the positions of words considered. © Springer-Verlag Berlin Heidelberg 2007.

Cite

CITATION STYLE

APA

Bai, X., Sun, J., Che, H., & Wang, J. (2007). A general method of mining chinese web documents based on GA&SA and position-factors. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4819 LNAI, pp. 410–420). https://doi.org/10.1007/978-3-540-77018-3_41

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free