Distributed Gaussian mixture model summarization using the MapReduce framework

Abstract

As the rate of data generation accelerates, sophisticated techniques are essential to meet scalability requirements. One promising avenue for handling large datasets is distributed storage and processing. Data summarization is another useful concept for managing large datasets: a subset of the data is used to provide an approximate yet useful representation of the whole. Combining these two tools allows a distributed implementation of data summarization. In this paper, we achieve this by proposing and implementing a distributed Gaussian Mixture Model Summarization using the MapReduce framework (MR-SGMM). In MR-SGMM, we partition the input data, cluster the data within each partition using the density-based clustering algorithm DBSCAN, and, for every cluster, discover the SGMM core points and their features. We test the implementation on synthetic and real datasets to demonstrate its validity and efficiency. This paves the way for a scalable implementation of Summarization using Gaussian Mixture Model (SGMM).
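The partition, cluster, and summarize flow described in the abstract can be illustrated with a rough, single-process sketch. This is not the authors' implementation. The function names (`dbscan`, `map_partition`, `reduce_summaries`) and the parameters `eps` and `min_pts` are hypothetical, no MapReduce runtime is involved, and because the abstract does not define the SGMM core points or their features, each cluster is summarized here by a per-dimension mean and variance as a stand-in.

```python
from collections import deque
from math import dist
from statistics import fmean, pvariance


def dbscan(points, eps, min_pts):
    """Minimal O(n^2) DBSCAN. Returns one label per point, -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, keep expanding
    return labels


def map_partition(partition, eps=1.0, min_pts=3):
    """'Map' step (assumed): cluster one partition, summarize each cluster."""
    labels = dbscan(partition, eps, min_pts)
    summaries = []
    for c in sorted(set(labels) - {-1}):
        members = [p for p, l in zip(partition, labels) if l == c]
        dims = list(zip(*members))  # one tuple of values per dimension
        summaries.append({
            "weight": len(members),
            # Stand-in summary, NOT the paper's SGMM core points.
            "mean": tuple(fmean(d) for d in dims),
            "var": tuple(pvariance(d) for d in dims),
        })
    return summaries


def reduce_summaries(per_partition):
    """'Reduce' step (assumed): pool components, normalize their weights."""
    merged = [s for part in per_partition for s in part]
    total = sum(s["weight"] for s in merged)
    return [{**s, "weight": s["weight"] / total} for s in merged]


# Two hand-made partitions of 2-D points; (9.0, 9.0) is an isolated outlier.
part_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (9.0, 9.0)]
part_b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
model = reduce_summaries(
    [map_partition(p, eps=0.5, min_pts=3) for p in (part_a, part_b)]
)
```

In this toy run each partition contributes one component, and the outlier is dropped as noise. A real deployment would also need to handle clusters that straddle partition boundaries, which this sketch ignores.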

Cite

Esmaeilpour, A., Bigdeli, E., Cheraghchi, F., Raahemi, B., & Far, B. H. (2016). Distributed Gaussian mixture model summarization using the MapReduce framework. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9673, pp. 323–335). Springer Verlag. https://doi.org/10.1007/978-3-319-34111-8_39
