Meta-MapReduce for scalable data mining

Xuan Liu; Xiaoguang Wang; Stan Matwin; Nathalie Japkowicz

Journal Article, Open Access

Journal of Big Data (2015) 2(1)

DOI: 10.1186/s40537-015-0021-4

17 Citations

46 Readers

Abstract

We have entered the big data age. Knowledge extraction from massive data is becoming more and more urgent. MapReduce provides a feasible framework for programming machine learning algorithms in Map and Reduce functions. The relatively simple programming interface has helped to solve machine learning algorithms’ scalability problems. However, this framework suffers from an obvious weakness: it does not support iterations. This makes it difficult for algorithms requiring iterations to fully explore the efficiency of MapReduce. In this paper, we propose to apply Meta-learning programmed with MapReduce to avoid parallelizing machine learning algorithms while also improving their scalability to big datasets. The experiments conducted on Hadoop’s fully distributed mode on Amazon EC2 demonstrate that our algorithm Meta-MapReduce (MMR) reduces the training computational complexity significantly when the number of computing nodes increases while obtaining smaller error rates than those on a single node. The comparison of MMR with the contemporary parallelized AdaBoost algorithm, AdaBoost.PL, shows that MMR obtains lower error rates.
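The general idea the abstract describes, training ordinary (non-parallelized) learners on separate data partitions in a Map step and combining them in a Reduce step, can be illustrated with a small single-process sketch. This is not the MMR algorithm from the paper, whose details the abstract does not give. The decision-stump base learner, the majority-vote combiner, and all function names (train_stump, map_phase, reduce_phase, meta_predict, split) are assumptions chosen for illustration, and plain Python map/reduce stands in for Hadoop.

```python
from collections import Counter
from functools import reduce


def train_stump(rows):
    """Hypothetical base learner: best single-feature threshold rule on one partition."""
    best = None
    n_features = len(rows[0][0])
    for f in range(n_features):
        for x, _ in rows:
            t = x[f]
            for sign in (1, -1):
                # Rule: predict `sign` when feature f exceeds t, else `-sign`.
                errors = sum(
                    1 for xi, yi in rows if (sign if xi[f] > t else -sign) != yi
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, sign)
    _, f, t, sign = best
    return (f, t, sign)


def predict_stump(stump, x):
    f, t, sign = stump
    return sign if x[f] > t else -sign


def map_phase(partition):
    """Map: train one base classifier on one partition, independently of the others."""
    return [train_stump(partition)]


def reduce_phase(left, right):
    """Reduce: gather the base classifiers into a single ensemble."""
    return left + right


def meta_predict(ensemble, x):
    """Combine base predictions by majority vote (an assumed combination rule)."""
    votes = Counter(predict_stump(s, x) for s in ensemble)
    return votes.most_common(1)[0][0]


def split(rows, k):
    """Round-robin split into k partitions, standing in for distributed input splits."""
    return [rows[i::k] for i in range(k)]


# Toy data: one feature, label +1 when the feature is at least 5, else -1.
data = [((float(i),), 1 if i >= 5 else -1) for i in range(10)]
partitions = split(data, 3)
ensemble = reduce(reduce_phase, map(map_phase, partitions))
```

Because each Map task trains on its own partition and the Reduce step runs once, the sketch needs no iteration across MapReduce jobs, which is the property the abstract highlights.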

Citation (APA)

Liu, X., Wang, X., Matwin, S., & Japkowicz, N. (2015). Meta-MapReduce for scalable data mining. Journal of Big Data, 2(1). https://doi.org/10.1186/s40537-015-0021-4
