One of the fundamental tasks of data analysis in many disciplines is to identify the significant differences between classes or groups. Contrast sets have previously been proposed as a useful tool for describing these differences. A contrast set is a set of association rules for which the antecedents describe distinct groups, a common consequent is shared by all the rules, and support for the rules is significantly different between groups. The intuition is that comparing the support between groups may provide some insight into the fundamental differences between the groups. In this chapter, we compare two contrast set mining methodologies that rely on different statistical philosophies: the well-known STUCCO approach and CIGAR, our proposed alternative approach. Following a brief introduction to general issues and problems related to statistical hypothesis testing in data mining, we survey and discuss the statistical measures underlying the two methods using an informal tutorial approach. Experimental results show that both methodologies are statistically sound, representing valid alternative solutions to the problem of identifying potentially interesting contrast sets. © Springer-Verlag Berlin Heidelberg 2007.
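To make the definition concrete, here is a minimal Python sketch for the two-group case. It assumes transactions are represented as Python sets of items. Group membership plays the role of the rule antecedent, and the shared itemset plays the role of the consequent. The sketch measures the consequent's support within each group and applies a Pearson chi-square test to the resulting 2x2 contingency table. The helper names (`support`, `chi_square_2x2`, `is_contrast`) and the thresholds `min_diff` and `alpha` are illustrative assumptions. They are not taken from STUCCO or CIGAR, and both of those methods involve search over candidate sets and further statistical safeguards that this sketch omits.

```python
from math import erfc, sqrt


def support(transactions, itemset):
    """Fraction of transactions (sets of items) that contain every item in itemset."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def chi_square_2x2(group_a, group_b, itemset):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for the
    2x2 table: group (A, B) x itemset (present, absent)."""
    n = len(group_a) + len(group_b)
    if n == 0:
        return 0.0, 1.0
    a_yes = sum(1 for t in group_a if itemset <= t)
    b_yes = sum(1 for t in group_b if itemset <= t)
    table = [[a_yes, len(group_a) - a_yes],
             [b_yes, len(group_b) - b_yes]]
    row_totals = [sum(r) for r in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # an empty row or column contributes nothing
                chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


def is_contrast(group_a, group_b, itemset, min_diff=0.1, alpha=0.05):
    """Hypothetical criterion: the support difference is at least min_diff
    AND the chi-square test rejects independence at level alpha."""
    diff = abs(support(group_a, itemset) - support(group_b, itemset))
    _, p_value = chi_square_2x2(group_a, group_b, itemset)
    return diff >= min_diff and p_value < alpha


# Toy data: "x" appears in 60% of group A but only 30% of group B.
group_a = [{"x"}] * 60 + [{"y"}] * 40
group_b = [{"x"}] * 30 + [{"y"}] * 70
print(support(group_a, {"x"}), support(group_b, {"x"}))  # 0.6 0.3
print(is_contrast(group_a, group_b, {"x"}))              # True
```

Applying this test to many candidate sets at a fixed `alpha` raises the usual multiple-comparisons concern, so a practical miner would need some form of adjustment rather than the single fixed threshold used here.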
Hilderman, R. J., & Peckham, T. (2007). Statistical methodologies for mining potentially interesting contrast sets. Studies in Computational Intelligence, 43, 153–177. https://doi.org/10.1007/978-3-540-44918-8_7