Abstract
Combining the Spark distributed platform with High-Utility Itemset Mining (HUIM) can address the long running times of HUIM. Experiments show that Spark-based parallel versions of the D2HUP and EFIM algorithms run considerably faster than the serial algorithms. Existing research has shown that EFIM and D2HUP are the two best-performing HUIM algorithms. This paper builds 118 data records by running both algorithms on real and simulated datasets and collecting their running times. Each record takes the dataset's length, sparse degree, and size as features and the running time as the prediction target, and a prediction model is then established from these records. Experiments evaluate the accuracy of the predictions, and a set of rules based on decision trees is generated. Using these rules, the faster of EFIM and D2HUP can be predicted well for a given dataset.
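The abstract names three dataset characteristics (length, sparse degree, size) and a decision-tree rule set for choosing between EFIM and D2HUP, but gives neither the exact feature definitions nor the learned rules. The following is a minimal Python sketch of that selection pipeline under stated assumptions: the feature formulas in `dataset_features` are hypothetical, and a one-split decision tree (a "stump", via the hypothetical `fit_stump`) stands in for the paper's full decision tree, trained on records labeled with whichever algorithm ran faster.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Tuple

# A transaction is a list of (item, utility) pairs, the usual HUIM input shape.
Transaction = Sequence[Tuple[str, int]]


def dataset_features(db: Sequence[Transaction]) -> Dict[str, float]:
    """Compute the three characteristics named in the abstract.

    Assumed definitions (hypothetical, not taken from the paper):
      size     = number of transactions
      length   = average number of items per transaction
      sparsity = 1 - length / number of distinct items
    """
    size = len(db)
    distinct: Set[str] = {item for t in db for item, _ in t}
    length = sum(len(t) for t in db) / size if size else 0.0
    sparsity = 1.0 - length / len(distinct) if distinct else 1.0
    return {"size": float(size), "length": length, "sparsity": sparsity}


@dataclass
class Stump:
    """A one-split decision tree: feature <= threshold -> left label, else right."""
    feature: str
    threshold: float
    left: str
    right: str

    def predict(self, x: Dict[str, float]) -> str:
        return self.left if x[self.feature] <= self.threshold else self.right


def _majority(labels: List[str]) -> Tuple[str, int]:
    """Return the most common label (ties go to the alphabetically first)
    and the number of samples that disagree with it."""
    best = max(sorted(set(labels)), key=labels.count)
    return best, len(labels) - labels.count(best)


def fit_stump(samples: List[Dict[str, float]], labels: List[str]) -> Stump:
    """Choose the feature and threshold with the fewest training errors.

    labels[i] is the name of the faster algorithm ("EFIM" or "D2HUP")
    measured on the dataset described by samples[i].
    """
    best = None
    for feature in samples[0]:
        for threshold in sorted({s[feature] for s in samples}):
            left = [y for s, y in zip(samples, labels) if s[feature] <= threshold]
            right = [y for s, y in zip(samples, labels) if s[feature] > threshold]
            if not left or not right:
                continue  # skip splits that put every sample on one side
            l_label, l_err = _majority(left)
            r_label, r_err = _majority(right)
            if best is None or l_err + r_err < best[0]:
                best = (l_err + r_err, Stump(feature, threshold, l_label, r_label))
    if best is None:
        raise ValueError("need at least two distinct feature values to split")
    return best[1]
```

In practice each training record would come from timing both algorithms on one dataset. A single split can only express one rule; reproducing multi-condition rules like those the paper derives would need a deeper tree, for example scikit-learn's `DecisionTreeClassifier`.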
Citation
Jie, H., & Dehua, W. (2022). Design and Implementation of Automatic Selection of the Most Efficient Itemset Algorithm Based on Spark. Scientific Programming, 2022. https://doi.org/10.1155/2022/3362203