A Parallel High-Utility Itemset Mining Algorithm Based on Hadoop

Zaihe Cheng; Wei Shen; Wei Fang; Jerry Chun Wei Lin

Journal Article (Open Access)

Complex System Modeling and Simulation (2023) 3(1) 47-58

DOI: 10.23919/CSMS.2022.0023

Abstract

High-utility itemset mining (HUIM), an essential task in data mining, considers not only the quantity of items but also their profit. However, most HUIM algorithms are developed for a single machine, which is inefficient for big data because memory and processing capacity are limited. To solve this problem, this paper proposes a parallel efficient high-utility itemset mining (P-EFIM) algorithm based on the Hadoop platform. In P-EFIM, the transaction-weighted utilization (TWU) values of the itemsets are calculated and ordered with the MapReduce framework. The ordered itemsets are then renumbered, and the low-utility itemsets are pruned to improve the dataset utility. In the Map phase, P-EFIM divides the task into multiple independent subtasks and uses the proposed S-style distribution strategy to distribute them evenly across all nodes, ensuring load balancing. Furthermore, in the Reduce phase, P-EFIM mines each subtask dataset with the EFIM algorithm to enhance performance. Experiments on eight datasets show that P-EFIM runs significantly faster than PHUI-Growth, another HUIM algorithm based on the Hadoop framework.
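The preprocessing described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' Hadoop implementation. The function names, the ascending TWU order, the representation of a transaction as an item-to-utility mapping, and the `min_util` threshold are all assumptions made for illustration. The abstract also does not define the S-style distribution strategy; the serpentine (back-and-forth) assignment shown here is only one plausible reading of the name.

```python
from collections import defaultdict


def compute_twu(transactions):
    """Transaction-weighted utilization (TWU) of each single item.

    Each transaction is assumed to be a dict mapping item -> utility of
    that item in the transaction. An item's TWU is the sum of the total
    utilities of all transactions that contain it.
    """
    twu = defaultdict(int)
    for tx in transactions:
        transaction_utility = sum(tx.values())
        for item in tx:
            twu[item] += transaction_utility
    return dict(twu)


def prune_and_renumber(transactions, min_util):
    """Drop items whose TWU is below min_util and renumber the rest.

    Items are ordered by ascending TWU (an assumption; the abstract only
    says "ordered"), with ties broken by item name, and renumbered 1..n.
    """
    twu = compute_twu(transactions)
    kept = sorted((i for i in twu if twu[i] >= min_util),
                  key=lambda i: (twu[i], i))
    new_id = {item: n for n, item in enumerate(kept, start=1)}
    pruned = []
    for tx in transactions:
        row = {new_id[i]: u for i, u in tx.items() if i in new_id}
        if row:  # skip transactions left empty after pruning
            pruned.append(row)
    return new_id, pruned


def serpentine_assign(task_ids, num_nodes):
    """Hypothetical reading of an "S-style" distribution.

    Tasks are dealt to nodes left to right, then right to left, and so
    on, so that no node always receives the first task of every round.
    """
    buckets = [[] for _ in range(num_nodes)]
    for pos, task in enumerate(task_ids):
        rnd, offset = divmod(pos, num_nodes)
        node = offset if rnd % 2 == 0 else num_nodes - 1 - offset
        buckets[node].append(task)
    return buckets


if __name__ == "__main__":
    # Toy data, invented for this sketch.
    toy = [{"a": 5, "b": 10, "c": 1}, {"a": 10, "c": 6}, {"b": 4, "d": 2}]
    ids, db = prune_and_renumber(toy, min_util=10)
    print(ids, db)
    print(serpentine_assign(list(ids.values()), num_nodes=2))
```

In a MapReduce setting the TWU pass would typically be split into a mapper that emits (item, transaction utility) pairs and a reducer that sums them per item; the sketch collapses both into one loop for readability.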

Citation (APA)

Cheng, Z., Shen, W., Fang, W., & Lin, J. C. W. (2023). A Parallel High-Utility Itemset Mining Algorithm Based on Hadoop. Complex System Modeling and Simulation, 3(1), 47–58. https://doi.org/10.23919/CSMS.2022.0023
