Performing Hierarchical Clustering on Huge Volumes of Data Using Enhanced Mapreduce Technique

K. Maheswari; M. Ramakrishnan

Book Chapter

Springer (2020), pp. 315–324

DOI: 10.1007/978-981-15-3284-9_36

Citations: 0

Readers: 1

Abstract

Among the various clustering methods, hierarchical clustering is advantageous in many respects. Applying hierarchical clustering to large volumes of data is difficult, however, because such data are typically unstructured, heterogeneous, noisy, and volatile. The MapReduce framework is used to analyze huge volumes of data in a parallel and distributed fashion. The efficiency of the algorithm can be improved by two optimization techniques: co-occurrence-based feature selection and batch updating. This paper therefore presents a hierarchical clustering method that uses an enhanced version of the MapReduce framework for huge volumes of data. The experiments are conducted on a web access log file containing 512 GB of data. The results show that the proposed method outperforms traditional clustering methods in terms of execution time and number of clusters formed.
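The abstract names co-occurrence-based feature selection within a MapReduce workflow but does not define it. The sketch below is only one plausible reading, not the authors' algorithm: a map step emits a count for every pair of features that appear together in a record, a reduce step sums those counts, and features that never reach a chosen co-occurrence threshold are dropped before clustering. The function names, the `min_cooccurrence` threshold, and the toy session data are all hypothetical, and the code runs in a single process rather than on a cluster.

```python
from collections import Counter
from itertools import combinations


def map_pairs(record):
    """Map step: emit ((a, b), 1) for each unordered pair of distinct features in one record."""
    feats = sorted(set(record))
    return [((a, b), 1) for a, b in combinations(feats, 2)]


def reduce_counts(emitted):
    """Reduce step: sum the emitted counts per feature pair."""
    totals = Counter()
    for key, count in emitted:
        totals[key] += count
    return totals


def select_features(records, min_cooccurrence):
    """Keep features that take part in at least one pair seen min_cooccurrence times or more."""
    emitted = [kv for record in records for kv in map_pairs(record)]
    pair_counts = reduce_counts(emitted)
    kept = set()
    for (a, b), n in pair_counts.items():
        if n >= min_cooccurrence:
            kept.update((a, b))
    return kept, pair_counts


# Hypothetical toy data: each record is the list of URLs seen in one web session.
sessions = [
    ["/home", "/cart", "/pay"],
    ["/home", "/cart"],
    ["/home", "/help"],
    ["/cart", "/pay"],
]

kept, pair_counts = select_features(sessions, min_cooccurrence=2)
print(sorted(kept))
```

In a real MapReduce deployment the map and reduce functions would run on separate workers with a shuffle between them; the in-memory list here only stands in for that exchange. The reduced feature set would then feed the hierarchical clustering stage, which this sketch does not cover.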

Cite

Maheswari, K., & Ramakrishnan, M. (2020). Performing Hierarchical Clustering on Huge Volumes of Data Using Enhanced Mapreduce Technique. In Lecture Notes in Networks and Systems (Vol. 118, pp. 315–324). Springer. https://doi.org/10.1007/978-981-15-3284-9_36
