A Parallel High-Utility Itemset Mining Algorithm Based on Hadoop

Abstract

High-utility itemset mining (HUIM), an essential task in data mining, considers not only the quantity of items but also their profit. However, most HUIM algorithms are developed for a single machine, which is inefficient for big data because memory and processing capacity are limited. To solve this problem, this paper proposes a parallel efficient high-utility itemset mining (P-EFIM) algorithm based on the Hadoop platform. In P-EFIM, the transaction-weighted utilization (TWU) values of the itemsets are calculated and ordered with the MapReduce framework. The ordered itemsets are then renumbered, and low-utility itemsets are pruned to improve the dataset utility. In the Map phase, P-EFIM divides the task into multiple independent subtasks and uses the proposed S-style distribution strategy to distribute them evenly across all nodes, ensuring load balancing. In the Reduce phase, P-EFIM uses the EFIM algorithm to mine each subtask's dataset, which improves performance. Experiments on eight datasets show that P-EFIM runs significantly faster than PHUI-Growth, which is also a HUIM algorithm based on the Hadoop framework.
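The preprocessing steps described above can be illustrated with a small single-process sketch. It is not the authors' implementation: it imitates the map and reduce steps with plain Python functions rather than Hadoop, and the toy transactions, the ascending-TWU ordering, and every function name are assumptions made for illustration. In particular, the abstract does not define the S-style distribution strategy; `serpentine_assign` shows only one plausible reading, in which ordered items are dealt to nodes left to right and then right to left.

```python
from collections import defaultdict

# Toy data (made up): each transaction maps item -> utility of that item
# in the transaction, i.e. purchased quantity times unit profit.
TRANSACTIONS = [
    {"a": 5, "b": 10, "c": 1},
    {"a": 10, "c": 6, "e": 3},
    {"b": 4, "d": 8},
    {"a": 5, "b": 2, "e": 1},
]


def map_twu(transaction):
    """Map step: emit (item, transaction utility) for each item present."""
    transaction_utility = sum(transaction.values())
    return [(item, transaction_utility) for item in transaction]


def reduce_twu(pairs):
    """Reduce step: sum the emitted transaction utilities per item."""
    twu = defaultdict(int)
    for item, transaction_utility in pairs:
        twu[item] += transaction_utility
    return dict(twu)


def compute_twu(transactions):
    """Run the map and reduce steps sequentially over all transactions."""
    pairs = [pair for t in transactions for pair in map_twu(t)]
    return reduce_twu(pairs)


def prune_and_order(twu, min_util):
    """Drop items whose TWU is below min_util, then sort by ascending TWU.

    Ascending order is an assumption here; ties are broken by item name.
    """
    kept = [item for item, value in twu.items() if value >= min_util]
    return sorted(kept, key=lambda item: (twu[item], item))


def serpentine_assign(items, num_nodes):
    """Hypothetical S-style assignment: alternate direction on each pass."""
    buckets = [[] for _ in range(num_nodes)]
    for index, item in enumerate(items):
        pass_no, position = divmod(index, num_nodes)
        node = position if pass_no % 2 == 0 else num_nodes - 1 - position
        buckets[node].append(item)
    return buckets


if __name__ == "__main__":
    twu = compute_twu(TRANSACTIONS)
    ordered = prune_and_order(twu, min_util=20)
    print(twu)
    print(ordered)
    print(serpentine_assign(ordered, num_nodes=2))
```

With a threshold of 20, item `d` (TWU 12) is pruned because no itemset containing it can reach the threshold; this is the standard TWU upper-bound argument. The alternating direction in `serpentine_assign` pairs the cheapest remaining item with the most expensive one on each node, which is one simple way to even out per-node work.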

Cite

Cheng, Z., Shen, W., Fang, W., & Lin, J. C. W. (2023). A Parallel High-Utility Itemset Mining Algorithm Based on Hadoop. Complex System Modeling and Simulation, 3(1), 47–58. https://doi.org/10.23919/CSMS.2022.0023