Clustering of concept-drift categorical data implementation in JAVA

Abstract

Identifying useful clusters in large datasets has attracted considerable interest in clustering research. Clustering categorical data is harder than clustering numerical data, because traditional clustering algorithms generate clusters from distances between points, and such similarity measures are not appropriate for Boolean and categorical attributes. Data on the World Wide Web is growing exponentially, which affects clustering accuracy and decision making, and the concepts underlying the clusters change over time, a phenomenon called concept drift. Clustering concept-drifting categorical data is therefore a challenging problem. To detect differences in cluster distributions between the current data subset and the previous clustering result, an algorithm called Drifting Concept Detection (DCD), which uses a sliding window and node importance, is presented and implemented in Java on the "usenet" dataset, in which every data point is a message and every node is a word. In this paper, several concepts are implemented to produce appropriate clustering results while minimizing the clustering performed each time time-evolving data enters the sliding window, which reduces I/O costs. The number of concept drifts decreases as the sliding window size increases. © 2012 Springer-Verlag.
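The abstract gives no formulas or code, so the following is only a minimal sketch of the general idea (a sliding window of messages compared against word-importance summaries of earlier clusters), not the paper's implementation. The class `DriftSketch`, its method names, the simplified importance and resemblance measures, and both thresholds are assumptions made for illustration. It needs Java 9 or later for `List.of` and `Set.of`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: names, formulas and thresholds are illustrative assumptions.
class DriftSketch {

    // Importance of each word (node) in a cluster, simplified here to the
    // fraction of the cluster's messages that contain the word.
    static Map<String, Double> nodeImportance(List<Set<String>> cluster) {
        Map<String, Double> importance = new HashMap<>();
        for (Set<String> message : cluster) {
            for (String word : message) {
                importance.merge(word, 1.0, Double::sum);
            }
        }
        importance.replaceAll((word, count) -> count / cluster.size());
        return importance;
    }

    // Resemblance of a message to a cluster: mean importance of its words.
    static double resemblance(Set<String> message, Map<String, Double> importance) {
        if (message.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (String word : message) {
            sum += importance.getOrDefault(word, 0.0);
        }
        return sum / message.size();
    }

    // Fraction of messages in the window that resemble no previous cluster
    // at least as strongly as outlierThreshold.
    static double outlierRatio(List<Set<String>> window,
                               List<Map<String, Double>> clusters,
                               double outlierThreshold) {
        if (window.isEmpty()) {
            return 0.0;
        }
        int outliers = 0;
        for (Set<String> message : window) {
            double best = 0.0;
            for (Map<String, Double> cluster : clusters) {
                best = Math.max(best, resemblance(message, cluster));
            }
            if (best < outlierThreshold) {
                outliers++;
            }
        }
        return (double) outliers / window.size();
    }

    // Flags drift when too many messages in the window are outliers.
    static boolean driftDetected(List<Set<String>> window,
                                 List<Map<String, Double>> clusters,
                                 double outlierThreshold,
                                 double driftThreshold) {
        return outlierRatio(window, clusters, outlierThreshold) > driftThreshold;
    }

    public static void main(String[] args) {
        // Two clusters from an earlier (made-up) clustering result.
        List<Map<String, Double>> clusters = List.of(
            nodeImportance(List.of(Set.of("java", "code"), Set.of("java", "jvm"))),
            nodeImportance(List.of(Set.of("goal", "match"), Set.of("goal", "team"))));

        // A window that still fits the old clusters, and one that does not.
        List<Set<String>> stable = List.of(Set.of("java", "jvm"), Set.of("goal", "team"));
        List<Set<String>> shifted = List.of(
            Set.of("stock", "market"), Set.of("market", "price"), Set.of("java", "code"));

        System.out.println(driftDetected(stable, clusters, 0.5, 0.5));  // prints false
        System.out.println(driftDetected(shifted, clusters, 0.5, 0.5)); // prints true
    }
}
```

In this sketch a larger window dilutes the effect of a few unfamiliar messages, so fewer windows cross the drift threshold, which is consistent in spirit with the abstract's remark about window size.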

Cite

Reddy Madhavi, K., Vinaya Babu, A., & Viswanadha Raju, S. (2012). Clustering of concept-drift categorical data implementation in JAVA. In Communications in Computer and Information Science (Vol. 270 CCIS, pp. 639–654). https://doi.org/10.1007/978-3-642-29216-3_70
