Abstract
A new model to evaluate dependencies in data mining problems is presented and discussed. The well-known concept of the association rule is replaced by the new definition of dependence value, which is a single real number uniquely associated with a given itemset. Knowledge of dependence values is sufficient to describe all the dependencies characterizing a given data mining problem. The dependence value of an itemset is the difference between the occurrence probability of the itemset and a corresponding "maximum independence estimate." This can be determined as a function of joint probabilities of the subsets of the itemset being considered by maximizing a suitable entropy function. So it is possible to separate in an itemset of cardinality k the dependence inherited from its subsets of cardinality (k - 1) and the specific inherent dependence of that itemset. The absolute value of the difference between the probability P(i) of the event i that indicates the presence of the itemset {a, b, ...} and its maximum independence estimate is constant for any combination of values of (a, b, ...). In addition, the Boolean function specifying the combinations of values for which the dependence is positive is a parity function. So the determination of such combinations is immediate. The model appears to be simple and powerful.
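As an illustration of the constant-magnitude and parity properties stated above, the following is a minimal sketch restricted to itemsets of cardinality 2. It assumes that, for a pair, the maximum independence estimate reduces to the product of the two marginal probabilities (the maximum-entropy joint distribution given fixed marginals); the general case for larger itemsets requires the entropy maximization described in the paper and is not reproduced here. The function name `pair_dependence` and the toy transactions are hypothetical and do not come from the paper.

```python
from itertools import product


def pair_dependence(transactions, a, b):
    """Return P(a=va, b=vb) - P(a=va) * P(b=vb) for every combination
    of presence (True) / absence (False) of items a and b.

    Assumption: for a pair, the maximum independence estimate is the
    product of the marginals. This is a sketch, not the paper's algorithm.
    """
    n = len(transactions)
    result = {}
    for va, vb in product((True, False), repeat=2):
        # Empirical joint probability of this presence/absence combination.
        p_joint = sum(
            1 for t in transactions if (a in t) == va and (b in t) == vb
        ) / n
        # Empirical marginals for the same presence/absence values.
        p_a = sum(1 for t in transactions if (a in t) == va) / n
        p_b = sum(1 for t in transactions if (b in t) == vb) / n
        result[(va, vb)] = p_joint - p_a * p_b
    return result


# Hypothetical toy data set of eight transactions.
transactions = [
    {"a", "b"}, {"a", "b"}, {"a"}, {"b"},
    set(), {"a", "b"}, set(), {"a"},
]

deps = pair_dependence(transactions, "a", "b")
for combo, value in deps.items():
    print(combo, value)
```

On this toy data the four differences share the same absolute value, and the sign is positive exactly when an even number of the two items is absent, which is the parity behaviour the abstract describes.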
Meo, R. (2000). Theory of dependence values. ACM Transactions on Database Systems, 25(3), 380–406. https://doi.org/10.1145/363951.363956