Clustering categorical data based on the relational analysis approach and MapReduce

6Citations
Citations of this article
22Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

The traditional methods of clustering are unable to cope with the exploding volume of data that the world is currently facing. As a solution to this problem, the research is intensified in the direction of parallel clustering methods. Although there is a variety of parallel programming models, the MapReduce paradigm is considered as the most prominent model for problems of large scale data processing of which the clustering. This paper introduces a new parallel design of a recently appeared heuristic for hard clustering using the MapReduce programming model. In this heuristic, clustering is performed by efficiently partitioning categorical large data sets according to the relational analysis approach. The proposed design, called PMR-Transitive, is a single-scan and parameter-free heuristic which determines the number of clusters automatically. The experimental results on real-life and synthetic data sets demonstrate that PMR-Transitive produces good quality results.

Cite

CITATION STYLE

APA

Lamari, Y., & Slaoui, S. C. (2017). Clustering categorical data based on the relational analysis approach and MapReduce. Journal of Big Data, 4(1). https://doi.org/10.1186/s40537-017-0090-7

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free