Mining high-utility itemsets (HUIs) is a popular data mining task, which consists of discovering sets of items that yield a high profit in a transaction database. Although HUI mining has numerous applications, a key limitation is that a single minimum utility threshold (minutil) is used to assess the utility of all items. This simplifying assumption is unrealistic since in real-life all items do not have the same unit profit, and thus do not have an equal chance of generating a high profit. As a result, if the minutil threshold is set high, patterns containing items having a low unit profit are often missed, while if minutil is set low, the number of patterns becomes unmanageable. To address this issue, this paper presents an efficient tree-based algorithm named HIMU for mining HUIs using multiple minimum utility thresholds. A novel tree structure called multiple item utility Set-enumeration (MIU)-tree and the global and conditional downward closure (GDC and CDC) properties of HUIs in the MIU-tree are proposed. Moreover, a vertical compact utility-list structure is adopted to store the information required for discovering HUIs without performing additional database scans and generating candidates. An extensive experimental study on real-world and synthetic datasets show that this greatly improves the efficiency of the algorithm in terms of runtime and scalability.
CITATION STYLE
Gan, W., Lin, J. C. W., Viger, P. F., & Chao, H. C. (2016). More efficient algorithms for mining high-utility itemsets with multiple minimum utility thresholds. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9827 LNCS, pp. 71–87). Springer Verlag. https://doi.org/10.1007/978-3-319-44403-1_5
Mendeley helps you to discover research relevant for your work.