Document classification using enhanced grid based clustering algorithm


Abstract

Automated document clustering is an important text mining task, especially given the rapid growth in the number of online documents in the Arabic language. Text clustering aims to automatically assign a text to a predefined cluster based on linguistic features. This research proposes an enhanced grid-based clustering algorithm. The main purpose of the algorithm is to divide the data space into clusters of arbitrary shape. These clusters are treated as dense regions of points in the data space, separated by low-density regions that represent noise. The algorithm also handles data sets with multiple densities and assigns noise and outliers to the closest category, which reduces the time complexity. Unclassified documents are preprocessed by removing stop words and extracting word roots, which reduces the dimensionality of the documents' feature vectors. Each document is then represented as a vector of words and their frequencies. Accuracy is reported in terms of time consumption and the percentage of successfully clustered instances. Experiments carried out on an in-house collection of Arabic text demonstrate the effectiveness of the enhanced clustering algorithm, with an average accuracy of 89%.
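The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is a generic grid-based density clustering, not the authors' enhanced algorithm, whose details the abstract does not give. The names `term_frequencies`, `grid_cluster`, `cell_size`, and `min_pts`, along with the tiny English stop-word list, are assumptions made for illustration. Root extraction is omitted, and the clustering step assumes the document vectors have already been reduced to a few dimensions.

```python
from collections import Counter, deque
from itertools import product
import math

# Hypothetical stop-word list; a real Arabic pipeline would use a full list
# plus a root extractor (stemmer), which is omitted in this sketch.
STOP_WORDS = {"the", "of", "and", "a", "in"}


def term_frequencies(text):
    """Represent a document as word -> frequency after dropping stop words."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return Counter(tokens)


def grid_cluster(points, cell_size, min_pts):
    """Generic grid-based density clustering over low-dimensional points.

    Returns one cluster label (0, 1, ...) per point. Points in sparse cells
    are treated as noise and assigned to the closest cluster centroid.
    """
    dim = len(points[0])

    # 1. Map every point to the grid cell that contains it.
    cells = {}
    for i, p in enumerate(points):
        key = tuple(math.floor(x / cell_size) for x in p)
        cells.setdefault(key, []).append(i)

    # 2. A cell is dense when it holds at least min_pts points.
    dense = {k for k, idx in cells.items() if len(idx) >= min_pts}

    # 3. Merge adjacent dense cells into clusters with a breadth-first search.
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]
    cell_label = {}
    n_clusters = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = n_clusters
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for o in offsets:
                nb = tuple(c + d for c, d in zip(cur, o))
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = n_clusters
                    queue.append(nb)
        n_clusters += 1

    labels = [None] * len(points)
    for key, idx in cells.items():
        if key in cell_label:
            for i in idx:
                labels[i] = cell_label[key]

    if n_clusters == 0:
        return labels  # no dense region found; every point stays unlabeled

    # 4. Assign noise points (those in sparse cells) to the closest centroid.
    centroids = []
    for c in range(n_clusters):
        members = [points[i] for i, lab in enumerate(labels) if lab == c]
        centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
    for i, lab in enumerate(labels):
        if lab is None:
            labels[i] = min(
                range(n_clusters),
                key=lambda c: math.dist(points[i], centroids[c]),
            )
    return labels
```

Because adjacent dense cells are merged regardless of direction, the resulting clusters can take arbitrary shapes. The neighbor search grows exponentially with the number of dimensions, so this form is only practical for low-dimensional vectors.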

Cite

Rashad, M. A., El-Deeb, H., & Fakhr, M. W. (2015). Document classification using enhanced grid based clustering algorithm. Lecture Notes in Electrical Engineering, 312, 207–215. https://doi.org/10.1007/978-3-319-06764-3_27
