Statistical methodologies for mining potentially interesting contrast sets

Abstract

One of the fundamental tasks of data analysis in many disciplines is to identify the significant differences between classes or groups. Contrast sets have previously been proposed as a useful tool for describing these differences. A contrast set is a set of association rules for which the antecedents describe distinct groups, a common consequent is shared by all the rules, and support for the rules is significantly different between groups. The intuition is that comparing the support between groups may provide some insight into the fundamental differences between the groups. In this chapter, we compare two contrast set mining methodologies that rely on different statistical philosophies: the well-known STUCCO approach and CIGAR, our proposed alternative approach. Following a brief introduction to general issues and problems related to statistical hypothesis testing in data mining, we survey and discuss the statistical measures underlying the two methods using an informal tutorial approach. Experimental results show that both methodologies are statistically sound, representing valid alternative solutions to the problem of identifying potentially interesting contrast sets. © Springer-Verlag Berlin Heidelberg 2007.
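
The definition above, that support for a candidate differs significantly between groups, can be illustrated with a small sketch. The abstract does not say which statistics STUCCO or CIGAR apply, so the chi-square statistic here is an illustrative assumption rather than the authors' procedure. The function names, the toy data, and the 0.05 threshold are all hypothetical.

```python
from collections import Counter

def supports_by_group(rows, itemset):
    """Return {group: (rows containing itemset, group size)}."""
    hits, sizes = Counter(), Counter()
    for group, items in rows:
        sizes[group] += 1
        if itemset <= items:  # the itemset is a subset of this row's items
            hits[group] += 1
    return {g: (hits[g], sizes[g]) for g in sizes}

def chi_square(counts):
    """Pearson chi-square statistic for a 2 x G table (contains / does not contain, by group)."""
    total = sum(n for _, n in counts.values())
    total_hits = sum(h for h, _ in counts.values())
    stat = 0.0
    for h, n in counts.values():
        for observed, row_total in ((h, total_hits), (n - h, total - total_hits)):
            expected = row_total * n / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

# Toy data (made up): two groups of 10 rows; item "x" appears in 8 rows of A and 2 rows of B.
rows = (
    [("A", {"x"})] * 8 + [("A", {"y"})] * 2 +
    [("B", {"x"})] * 2 + [("B", {"y"})] * 8
)

counts = supports_by_group(rows, frozenset({"x"}))
for group, (h, n) in sorted(counts.items()):
    print(group, "support =", h / n)

stat = chi_square(counts)
# 3.841 is the chi-square critical value for 1 degree of freedom at the 0.05 level.
print("chi-square =", round(stat, 3), "differs at 0.05:", stat > 3.841)
```

On this toy data the supports are 0.8 and 0.2, and the statistic is about 7.2, which exceeds the critical value. A practical miner would also need to control for the many candidates it tests, which this sketch does not attempt.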

Hilderman, R. J., & Peckham, T. (2007). Statistical methodologies for mining potentially interesting contrast sets. Studies in Computational Intelligence, 43, 153–177. https://doi.org/10.1007/978-3-540-44918-8_7
