Abstract
This paper proposes a new mutual information estimator for discrete and continuous variables, and constructs a forest based on the Chow-Liu algorithm. The state-of-the-art method assumes Gaussian distributions for the continuous case and ANOVA for the mixed discrete/continuous case. Given data, the proposed algorithm constructs several pairs of quantizers for X and Y such that each interval on both axes contains an equal number of samples, and estimates the mutual information values from the resulting discrete histogram data. Among these mutual information values, we choose the maximum one, a choice that is validated in terms of the minimum description length principle. Although strong consistency is not proved mathematically, the proposed method does not distinguish between discrete and continuous values when dealing with data, and independence is detected correctly with probability one as the sample size grows. The resulting forest construction procedure is applied to genome differential analysis, in which a discrete variable (wild and mutant phenotypes) affects gene expression values.
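The estimation step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the power-of-two grid of bin counts, the BIC/MDL-style penalty `(m - 1)^2 * log(n) / (2n)`, the rank-based handling of ties, and all function names are assumptions introduced here for illustration. The sketch quantizes X and Y into equal-frequency bins at several resolutions, computes a penalized plug-in mutual information for each, and returns the maximum, with a value of zero read as "independent".

```python
import numpy as np

def equal_frequency_codes(values, n_bins):
    """Map samples to bin indices 0..n_bins-1 with (nearly) equal counts per bin.

    Assumption: ties are broken by sample order via ranks, which is a
    simplification and not necessarily how the paper treats discrete values.
    """
    values = np.asarray(values, dtype=float)
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    return (ranks * n_bins) // len(values)

def plugin_mi(cx, cy, a, b):
    """Plug-in mutual information (in nats) of two integer-coded samples."""
    joint = np.zeros((a, b))
    np.add.at(joint, (cx, cy), 1.0)      # joint histogram counts
    joint /= len(cx)
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0                     # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))

def penalized_mi(x, y, max_log2_bins=None):
    """Maximum over several quantizer pairs of the penalized plug-in MI.

    Assumptions (hypothetical choices, not from the source): bin counts are
    m = 2, 4, 8, ... and the penalty is (m - 1)^2 * log(n) / (2n).
    """
    n = len(x)
    if max_log2_bins is None:
        max_log2_bins = max(1, int(np.log2(n) / 2))
    best = 0.0                           # 0.0 is interpreted as independence
    for k in range(1, max_log2_bins + 1):
        m = 2 ** k
        cx = equal_frequency_codes(x, m)
        cy = equal_frequency_codes(y, m)
        score = plugin_mi(cx, cy, m, m) - (m - 1) ** 2 * np.log(n) / (2 * n)
        best = max(best, score)
    return best

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y_dep = x + 0.1 * rng.normal(size=2000)  # strongly dependent on x
y_ind = rng.normal(size=2000)            # independent of x
mi_dep = penalized_mi(x, y_dep)
mi_ind = penalized_mi(x, y_ind)
```

Under these assumptions, the penalty outweighs the plug-in estimate when the variables are independent, so `mi_ind` stays at or near zero, while `mi_dep` is clearly positive. In a Chow-Liu style forest, only pairs with a positive score would be candidates for edges.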
Cite
Suzuki, J. (2015). Forest learning based on the Chow-Liu algorithm and its application to genome differential analysis: A novel mutual information estimation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9505, pp. 234–249). Springer Verlag. https://doi.org/10.1007/978-3-319-28379-1_17