Among the various clustering methods, hierarchical clustering offers several advantages. Applying hierarchical clustering to large volumes of data is difficult, however, because such data are typically unstructured, heterogeneous, very large, volatile, and contaminated with various types of noise. The MapReduce framework is used to analyze huge volumes of data in a parallel and distributed fashion. The efficiency of the algorithm is improved with two optimization techniques: co-occurrence-based feature selection and batch updating. This paper therefore presents a hierarchical clustering method that uses an enhanced version of the MapReduce framework for huge volumes of data. The experiments are conducted on a web access log file containing 512 GB of data. The results show that the proposed method outperforms traditional clustering methods in terms of execution time and number of clusters formed.
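The abstract does not spell out how co-occurrence-based feature selection is carried out, so the following is only a minimal single-process sketch of the general idea in a map/reduce style, not the authors' implementation. It assumes each session from a web access log is already available as a list of requested pages; the function names, the sample sessions, and the `min_count` threshold are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(sessions):
    """Emit ((page_a, page_b), 1) for each unordered pair of pages seen in one session."""
    for pages in sessions:
        # Deduplicate and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(pages)), 2):
            yield (a, b), 1


def reduce_phase(pairs):
    """Sum the values emitted for each key, as a reducer would."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def select_features(counts, min_count):
    """Keep pages that take part in at least one pair seen min_count times or more."""
    selected = set()
    for (a, b), n in counts.items():
        if n >= min_count:
            selected.update((a, b))
    return selected


# Hypothetical sessions; a real run would parse them from the access log.
sessions = [
    ["/home", "/cart", "/pay"],
    ["/home", "/cart"],
    ["/home", "/help"],
]
counts = reduce_phase(map_phase(sessions))
features = select_features(counts, min_count=2)
```

In a real MapReduce job the map and reduce functions would run on separate workers with a shuffle step between them; here they are chained in one process only to show the data flow. The pages retained in `features` would then be the dimensions passed on to the clustering step.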
CITATION STYLE
Maheswari, K., & Ramakrishnan, M. (2020). Performing Hierarchical Clustering on Huge Volumes of Data Using Enhanced Mapreduce Technique. In Lecture Notes in Networks and Systems (Vol. 118, pp. 315–324). Springer. https://doi.org/10.1007/978-981-15-3284-9_36