MiSeRe-Hadoop: A large-scale robust sequential classification rules mining framework

Abstract

Sequence classification has become a fundamental problem in data mining and machine learning. Feature-based classification is one of the widely used techniques for sequence classification, and mining sequential classification rules plays an important role in it. Despite the abundant literature in this area, mining sequential classification rules remains a challenge: few of the available methods are sufficiently scalable to handle large-scale datasets. MapReduce is an ideal framework for distributed computing on large datasets across clusters of computers. In this paper, we propose a distributed version of the MiSeRe algorithm on MapReduce, called MiSeRe-Hadoop. MiSeRe-Hadoop retains the valuable properties of MiSeRe: (i) it is a robust, user-parameter-free, anytime algorithm, and (ii) it employs an instance-based randomized strategy to promote diversity in mining. We have applied our method to two large real-world datasets: a marketing dataset and a text dataset. Our results confirm that the method is scalable for large-scale sequential data analysis.

Cite

Egho, E., Gay, D., Trinquart, R., Boullé, M., Voisine, N., & Clérot, F. (2017). MiSeRe-Hadoop: A large-scale robust sequential classification rules mining framework. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10440 LNCS, pp. 105–119). Springer Verlag. https://doi.org/10.1007/978-3-319-64283-3_8
