Design and Implementation of Automatic Selection of the Most Efficient Itemset Algorithm Based on Spark

Huang Jie; Wu Dehua

Journal Article, Open Access

Scientific Programming (2022) 2022

DOI: 10.1155/2022/3362203

Abstract

Combining the Spark distributed platform with High-Utility Itemset Mining (HUIM) can address the long running times of HUIM. Our experiments show that the Spark-based parallel D2HUP and EFIM algorithms achieve a clear improvement in running time over the serial algorithms. Existing research has shown that EFIM and D2HUP are the two best algorithms for HUIM. This paper builds 118 datasets by generating data and collecting the running times of the two algorithms on real and simulated datasets. Each dataset's length, sparsity, and size serve as features, with running time as the prediction target, and a model is then established. The accuracy of the prediction was evaluated through experiments, and a set of rules based on decision trees was generated. According to these rules, the faster of EFIM and D2HUP can be predicted very well.
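The rule-generation idea in the abstract can be illustrated with a minimal sketch. It fits a depth-1 decision tree (a single threshold rule) that predicts the faster algorithm from dataset features. This is not the authors' model: the feature names, the `Record` type, the helper functions, and every number in `TOY_RECORDS` are hypothetical and exist only to exercise the code. The toy values say nothing about how EFIM or D2HUP actually behave.

```python
from typing import NamedTuple, Sequence, Tuple


class Record(NamedTuple):
    """One measured dataset (hypothetical schema, not taken from the paper)."""
    avg_length: float     # average transaction length (assumed feature)
    sparsity: float       # assumed to lie in [0, 1]
    size: int             # number of transactions
    efim_seconds: float   # measured running time of EFIM
    d2hup_seconds: float  # measured running time of D2HUP


FEATURES = ("avg_length", "sparsity", "size")
Rule = Tuple[str, float, str, str]  # (feature, threshold, label_if_le, label_if_gt)


def faster(r: Record) -> str:
    """Label a record with the algorithm that ran faster on it."""
    return "EFIM" if r.efim_seconds <= r.d2hup_seconds else "D2HUP"


def fit_stump(records: Sequence[Record]) -> Rule:
    """Exhaustively pick the single feature/threshold split with fewest errors."""
    labels = [faster(r) for r in records]
    best = None  # (errors, feature, threshold, left_label, right_label)
    for feat in FEATURES:
        values = sorted({getattr(r, feat) for r in records})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            for left, right in (("EFIM", "D2HUP"), ("D2HUP", "EFIM")):
                errors = sum(
                    (left if getattr(r, feat) <= thr else right) != y
                    for r, y in zip(records, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, feat, thr, left, right)
    if best is None:
        raise ValueError("need at least two distinct values in some feature")
    return best[1:]


def predict(rule: Rule, avg_length: float, sparsity: float, size: int) -> str:
    """Apply the learned threshold rule to a new dataset's features."""
    feat, thr, left, right = rule
    value = {"avg_length": avg_length, "sparsity": sparsity, "size": size}[feat]
    return left if value <= thr else right


# Made-up measurements, purely illustrative.
TOY_RECORDS = [
    Record(5.0, 0.99, 10_000, 1.0, 2.0),
    Record(6.0, 0.98, 20_000, 1.5, 2.5),
    Record(30.0, 0.60, 15_000, 9.0, 4.0),
    Record(40.0, 0.50, 5_000, 8.0, 3.0),
]

if __name__ == "__main__":
    rule = fit_stump(TOY_RECORDS)
    print(rule)
    print(predict(rule, avg_length=4.0, sparsity=0.99, size=1_000))
```

A full decision tree would apply the same split search recursively to each side of the threshold. A single split is shown here only because it already yields a readable rule of the form "if feature ≤ threshold, choose algorithm A, otherwise B".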

Cite

Jie, H., & Dehua, W. (2022). Design and Implementation of Automatic Selection of the Most Efficient Itemset Algorithm Based on Spark. Scientific Programming, 2022. https://doi.org/10.1155/2022/3362203
