Statistical methodologies for mining potentially interesting contrast sets

Robert J. Hilderman; Terry Peckham

Journal Article

Studies in Computational Intelligence, vol. 43 (2007), pp. 153-177

DOI: 10.1007/978-3-540-44918-8_7

Abstract

One of the fundamental tasks of data analysis in many disciplines is to identify the significant differences between classes or groups. Contrast sets have previously been proposed as a useful tool for describing these differences. A contrast set is a set of association rules for which the antecedents describe distinct groups, a common consequent is shared by all the rules, and support for the rules is significantly different between groups. The intuition is that comparing the support between groups may provide some insight into the fundamental differences between the groups. In this chapter, we compare two contrast set mining methodologies that rely on different statistical philosophies: the well-known STUCCO approach and CIGAR, our proposed alternative approach. Following a brief introduction to general issues and problems related to statistical hypothesis testing in data mining, we survey and discuss the statistical measures underlying the two methods using an informal tutorial approach. Experimental results show that both methodologies are statistically sound, representing valid alternative solutions to the problem of identifying potentially interesting contrast sets. © Springer-Verlag Berlin Heidelberg 2007.
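
To make the idea of "support that is significantly different between groups" concrete, here is a minimal sketch for the two-group case. It assumes a Pearson chi-square test of independence on a 2x2 table (rule holds / does not hold, by group). The abstract does not spell out the statistical measures used by STUCCO or CIGAR, so this illustrates the general idea only and is not either method's actual procedure. The function names, the example counts, and the 0.05 threshold are hypothetical.

```python
from math import erfc, sqrt


def support(count, group_size):
    """Fraction of records in a group for which the rule holds."""
    return count / group_size


def chi_square_2x2(count_a, size_a, count_b, size_b):
    """Pearson chi-square statistic and p-value (1 degree of freedom).

    Rows of the table: rule holds / rule does not hold.
    Columns: group A / group B.
    Assumes every expected cell count is non-zero.
    """
    observed = [
        [count_a, count_b],
        [size_a - count_a, size_b - count_b],
    ]
    total = size_a + size_b
    row_totals = [sum(row) for row in observed]
    col_totals = [size_a, size_b]

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: the rule holds for 60 of 100 records in group A
# and for 30 of 100 records in group B.
stat, p_value = chi_square_2x2(60, 100, 30, 100)
differs = p_value < 0.05  # illustrative significance level
print(support(60, 100), support(30, 100), round(stat, 3), differs)
```

In practice, many candidate contrast sets are tested at once, so a fixed per-test threshold like the one above would typically need a multiple-comparison adjustment; that step is omitted here.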

Cite

Hilderman, R. J., & Peckham, T. (2007). Statistical methodologies for mining potentially interesting contrast sets. Studies in Computational Intelligence, 43, 153–177. https://doi.org/10.1007/978-3-540-44918-8_7
