Document classification using enhanced grid based clustering algorithm

Mohamed Ahmed Rashad; Hesham El-Deeb; Mohamed Waleed Fakhr

Journal Article

Lecture Notes in Electrical Engineering (2015), vol. 312, pp. 207-215

DOI: 10.1007/978-3-319-06764-3_27

Abstract

Automated document clustering is an important text mining task, especially given the rapid growth in the number of Arabic-language documents available online. Text clustering aims to automatically assign each text to a predefined cluster based on linguistic features. This research proposes an enhanced grid-based clustering algorithm whose main purpose is to divide the data space into clusters of arbitrary shape. These clusters are dense regions of points in the data space, separated by low-density regions that represent noise. The algorithm also handles data sets with multiple densities and assigns noise and outliers to the closest category. This reduces the time complexity. Unclassified documents are preprocessed by removing stop words and extracting word roots, which reduces the dimensionality of the documents' feature vectors. Each document is then represented as a vector of words and their frequencies. Accuracy is reported in terms of time consumption and the percentage of successfully clustered instances. Experiments carried out on an in-house collected Arabic text corpus demonstrate the effectiveness of the enhanced clustering algorithm, with an average accuracy of 89%.
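The abstract does not spell out the algorithm, so the following is only a minimal sketch of generic grid-based density clustering with noise reassignment, not the authors' enhanced method. It assumes each document has already been mapped to a few numeric dimensions (a grid over a raw vocabulary-sized frequency vector would have far too many cells), and it uses a single global density threshold, so it does not reproduce the paper's multi-density handling. The function name `grid_cluster` and its parameters `cell_size` and `min_points` are hypothetical, and assigning noise to the nearest cluster centroid is an assumed reading of "closest category".

```python
import itertools
import math
from collections import defaultdict, deque


def grid_cluster(points, cell_size, min_points):
    """Hypothetical sketch: grid-based density clustering; noise goes to the nearest cluster.

    Assumes low-dimensional numeric points and one global density threshold
    (no multi-density handling). Returns one integer label per point; -1 only
    if no dense cell exists at all.
    """
    # 1. Bin every point into an axis-aligned grid cell of side cell_size.
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(math.floor(x / cell_size) for x in p)].append(i)

    # 2. A cell is dense if it holds at least min_points points.
    dense = {c for c, idx in cells.items() if len(idx) >= min_points}

    # 3. Flood-fill adjacent dense cells (diagonal neighbours included) into clusters.
    cell_label = {}
    next_label = 0
    for start in sorted(dense):  # sorted so cluster numbering is deterministic
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for offset in itertools.product((-1, 0, 1), repeat=len(cell)):
                nb = tuple(c + o for c, o in zip(cell, offset))
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = next_label
                    queue.append(nb)
        next_label += 1

    # Points in dense cells inherit their cell's cluster; the rest stay -1 (noise).
    labels = [-1] * len(points)
    for cell, label in cell_label.items():
        for i in cells[cell]:
            labels[i] = label
    if next_label == 0:
        return labels

    # 4. Centroids are computed from dense-cell points only, before reassignment.
    dims = len(points[0])
    sums = [[0.0] * dims for _ in range(next_label)]
    counts = [0] * next_label
    for p, label in zip(points, labels):
        if label >= 0:
            counts[label] += 1
            for d in range(dims):
                sums[label][d] += p[d]
    centroids = [[s / counts[k] for s in sums[k]] for k in range(next_label)]

    # 5. Reassign noise/outlier points to the cluster with the nearest centroid.
    for i, p in enumerate(points):
        if labels[i] == -1:
            labels[i] = min(range(next_label), key=lambda k: math.dist(p, centroids[k]))
    return labels


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2),   # dense group near the origin
           (5.1, 5.2), (5.3, 5.4), (5.2, 5.1),   # dense group near (5, 5)
           (2.6, 2.4)]                           # isolated point in a sparse cell
    print(grid_cluster(pts, cell_size=1.0, min_points=2))  # [0, 0, 0, 1, 1, 1, 0]
```

With these toy points, the two dense cells become clusters 0 and 1, and the isolated point at (2.6, 2.4) lands in a sparse cell and is reassigned to cluster 0, whose centroid is closer. Reassigning noise instead of discarding it means every document ends up with a category, which matches the abstract's stated goal; the cost is that true outliers are forced into some cluster.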

Citation (APA style)

Rashad, M. A., El-Deeb, H., & Fakhr, M. W. (2015). Document classification using enhanced grid based clustering algorithm. Lecture Notes in Electrical Engineering, 312, 207–215. https://doi.org/10.1007/978-3-319-06764-3_27