A general method of mining Chinese web documents based on GA&SA and position-factors

Xi Bai; Jigui Sun; Haiyan Che; Jin Wang

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007) 4819 LNAI 410-420

DOI: 10.1007/978-3-540-77018-3_41

Abstract

Clustering and classification are two important techniques for mining Web information. In this paper, a new adaptive method for mining Chinese documents from the Internet is proposed. First, we give a document-clustering algorithm that combines a Genetic Algorithm (GA) with Simulated Annealing (SA), based on the Boolean Model. This algorithm avoids the main drawback of clustering documents with a pure GA, which converges prematurely and becomes trapped in a local optimum, so its results are not accurate. Then, because classification with the traditional Vector Space Model (VSM) is not satisfactory, since it takes no account of how important each word is, we add the position factors of keywords to the VSM and set up a new classifier model for Chinese Web documents. Experimental results indicate that this adaptive method makes clustering and classification more accurate and reasonable than methods that do not consider the positions of words. © Springer-Verlag Berlin Heidelberg 2007.
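
The idea of adding position factors to the VSM can be illustrated with a minimal sketch. It is not the authors' classifier: the abstract gives no formula, so the factor values, the position names (`title`, `heading`, `body`), and the helper names `weighted_vector` and `cosine` are all assumptions made for illustration. Documents are assumed to be already segmented into words.

```python
import math
from collections import Counter

# Hypothetical position factors: the abstract does not state any values.
POSITION_FACTORS = {"title": 3.0, "heading": 2.0, "body": 1.0}


def weighted_vector(segments):
    """Build a term vector in which each occurrence counts by its position factor.

    segments: list of (position, tokens) pairs, tokens already word-segmented.
    Unknown positions fall back to a factor of 1.0.
    """
    vec = Counter()
    for position, tokens in segments:
        factor = POSITION_FACTORS.get(position, 1.0)
        for tok in tokens:
            vec[tok] += factor
    return dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Usage: a word in the title contributes more than the same word in the body.
doc = weighted_vector([("title", ["体育"]), ("body", ["体育", "新闻"])])
sports_profile = {"体育": 1.0}
news_profile = {"新闻": 1.0}
print(cosine(doc, sports_profile) > cosine(doc, news_profile))
```

With plain term frequencies the two words in `body` would weigh the same; here the title occurrence pulls the document toward the profile that shares the title word.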

Cite

Bai, X., Sun, J., Che, H., & Wang, J. (2007). A general method of mining Chinese web documents based on GA&SA and position-factors. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4819 LNAI, pp. 410–420). https://doi.org/10.1007/978-3-540-77018-3_41
