Design and Implementation of Automatic Selection of the Most Efficient Itemset Algorithm Based on Spark

Abstract

Combining the Spark distributed platform with High-Utility Itemset Mining addresses the long running times of High-Utility Itemset Mining. Our experiments show that the Spark-based parallel D2HUP and EFIM algorithms run markedly faster than their serial counterparts. Existing research has shown that EFIM and D2HUP are the two best algorithms for High-Utility Itemset Mining. This paper assembles 118 datasets, both real and simulated, and records the running time of the two algorithms on each of them. Each dataset's length, degree of sparsity, and size serve as features, with running time as the prediction target, and a model is then established on this basis. The accuracy of the prediction was evaluated through experiments, and a set of rules based on decision trees was generated. According to these rules, the faster of EFIM and D2HUP on a given dataset can be predicted very well.
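To make the feature-and-rule idea concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that "length" means the average transaction length and that the degree of sparsity can be approximated by a density ratio (average length divided by the number of distinct items). The function names `dataset_features`, `faster_algorithm`, and `fit_stump` are hypothetical, and a single-threshold decision stump stands in for the full decision tree. The toy labels in the usage example are invented for illustration and are not results from the paper.

```python
from statistics import mean


def dataset_features(transactions):
    """Compute three illustrative features for a transaction dataset.

    Assumption: 'length' is the average transaction length and sparsity is
    approximated by density = avg_length / number of distinct items.
    """
    items = {item for t in transactions for item in t}
    avg_length = mean(len(t) for t in transactions)
    return {
        "avg_length": avg_length,
        "density": avg_length / len(items),
        "size": len(transactions),
    }


def faster_algorithm(runtime_efim, runtime_d2hup):
    """Label a dataset with whichever algorithm had the shorter measured run."""
    return "EFIM" if runtime_efim <= runtime_d2hup else "D2HUP"


def fit_stump(rows, feature):
    """Fit a depth-1 decision tree (stump) on one feature.

    rows: list of (features_dict, label) pairs.
    Returns (errors, threshold, label_if_below_or_equal, label_if_above)
    for the split with the fewest misclassified rows.
    """
    values = sorted({features[feature] for features, _ in rows})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        for below, above in (("EFIM", "D2HUP"), ("D2HUP", "EFIM")):
            errors = sum(
                (below if features[feature] <= threshold else above) != label
                for features, label in rows
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    return best


# Toy usage: densities and labels below are made up, not measured results.
toy_rows = [
    ({"density": 0.1}, "D2HUP"),
    ({"density": 0.2}, "D2HUP"),
    ({"density": 0.6}, "EFIM"),
    ({"density": 0.8}, "EFIM"),
]
errors, threshold, below, above = fit_stump(toy_rows, "density")
print(f"if density <= {threshold:.2f}: {below}, else: {above} ({errors} errors)")
```

In practice the labels would come from timing both Spark-based algorithms on each dataset and passing the two measurements to `faster_algorithm`. A full decision tree learner would replace the stump in order to produce multi-feature rules.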

Cite

Jie, H., & Dehua, W. (2022). Design and Implementation of Automatic Selection of the Most Efficient Itemset Algorithm Based on Spark. Scientific Programming, 2022. https://doi.org/10.1155/2022/3362203
