Recent pattern mining algorithms such as LAMP allow us to compute statistical significance of patterns with respect to an outcome variable. Their p-values are adjusted to control the family-wise error rate, which is the probability of at least one false discovery occurring. However, they are a poor fit for medical applications, due to their inability to handle potential confounding variables such as age or gender. We propose a novel pattern mining algorithm that evaluates statistical significance under confounding variables. Using a new testability bound based on the exact logistic regression model, the algorithm can exclude a large quantity of combination without testing them, limiting the amount of correction required for multiple testing. Using synthetic data, we showed that our method could remove the bias introduced by confounding variables while still detecting true patterns correlated with the class. In addition, we demonstrated application of data integration using a confounding variable.
CITATION STYLE
Terada, A., duVerle, D., & Tsuda, K. (2016). Significant pattern mining with confounding variables. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9651, pp. 277–289). Springer Verlag. https://doi.org/10.1007/978-3-319-31753-3_23
Mendeley helps you to discover research relevant for your work.