In machine learning, we are given a dataset of the form (Formula presented.), drawn as i.i.d. samples from an unknown probability distribution μ. The marginal distribution of the x_j is μ*, and the marginals of the kth class, (Formula presented.), may overlap. We address the problem of detecting, with a high degree of certainty, the points x at which (Formula presented.) for all i ≠ k. We propose that, rather than estimating these measures with a positive kernel such as the Gaussian, one should use a non-positive kernel that preserves a large number of moments of these measures, which yields an optimal approximation. We construct such a kernel from multivariate Hermite polynomials and prove optimal and local approximation results in the supremum norm in a probabilistic sense. Together with a permutation test developed with the same kernel, we prove that the kernel estimator serves as a “witness function” in classification problems: if the value of this estimator at a point x exceeds a certain threshold, then the point is reliably in a certain class. This approach can be used to modify pretrained algorithms, such as neural networks or nonlinear dimension reduction techniques, to identify in-class versus out-of-class regions for the purposes of generative models, classification uncertainty, or finding robust centroids. We demonstrate this on a number of real-world data sets, including MNIST, CIFAR10, Science News documents, and LaLonde data sets.
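The idea of a thresholded witness function can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation. It assumes a kernel of the form Φ_n(x, y) = Σ_k H(√k / n) ψ_k(x) ψ_k(y), where ψ_k are the orthonormal Hermite functions. The cosine-taper filter `low_pass`, the degree parameter `n`, the helper names, and the simple label-shuffling threshold are all illustrative assumptions.

```python
import numpy as np

def hermite_functions(x, degree):
    """Orthonormal Hermite functions psi_0..psi_degree at points x.
    Returns an array of shape (degree + 1, len(x)), built by the stable
    three-term recurrence."""
    x = np.atleast_1d(np.asarray(x, dtype=float)).ravel()
    psi = np.empty((degree + 1, x.size))
    psi[0] = np.pi ** -0.25 * np.exp(-0.5 * x**2)
    if degree >= 1:
        psi[1] = np.sqrt(2.0) * x * psi[0]
    for k in range(1, degree):
        psi[k + 1] = (np.sqrt(2.0 / (k + 1)) * x * psi[k]
                      - np.sqrt(k / (k + 1)) * psi[k - 1])
    return psi

def low_pass(t):
    """Assumed filter: 1 on [0, 1/2], cosine taper on [1/2, 1], 0 beyond."""
    t = np.asarray(t, dtype=float)
    taper = 0.5 * (1.0 + np.cos(np.pi * (2.0 * t - 1.0)))
    return np.where(t <= 0.5, 1.0, np.where(t >= 1.0, 0.0, taper))

def kernel_mean(x_eval, samples, n):
    """Average of Phi_n(x, s) over the samples, for each x in x_eval."""
    degree = n * n - 1  # the filter vanishes once sqrt(k) / n >= 1
    weights = low_pass(np.sqrt(np.arange(degree + 1)) / n)
    psi_eval = hermite_functions(x_eval, degree)
    psi_mean = hermite_functions(samples, degree).mean(axis=1)
    return (weights * psi_mean) @ psi_eval

def witness(x_eval, class_a, class_b, n):
    """Difference of the two kernel estimates; positive where class A dominates."""
    return kernel_mean(x_eval, class_a, n) - kernel_mean(x_eval, class_b, n)

def permutation_threshold(class_a, class_b, n, grid,
                          n_perm=200, alpha=0.05, seed=0):
    """(1 - alpha) quantile of sup |witness| over the grid under shuffled labels."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([class_a, class_b])
    sups = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(pooled)
        fake_a, fake_b = perm[:len(class_a)], perm[len(class_a):]
        sups[i] = np.max(np.abs(witness(grid, fake_a, fake_b, n)))
    return np.quantile(sups, 1.0 - alpha)

# Toy data: two overlapping one-dimensional classes (synthetic, for illustration).
rng = np.random.default_rng(0)
class_a = rng.normal(-1.0, 0.5, size=300)
class_b = rng.normal(1.0, 0.5, size=300)
grid = np.linspace(-3.0, 3.0, 121)

f = witness(grid, class_a, class_b, n=4)
tau = permutation_threshold(class_a, class_b, n=4, grid=grid)
confident_a = grid[f > tau]   # grid points assigned to class A
confident_b = grid[f < -tau]  # grid points assigned to class B
```

Grid points where the witness stays within ±`tau` are left unassigned, which is how the overlap region between the two classes would be flagged in this toy setting. Because the filtered Hermite kernel is not positive, the estimate can take negative values, unlike a Gaussian kernel density estimate.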
CITATION STYLE
Mhaskar, H. N., Cheng, X., & Cloninger, A. (2020). A Witness Function Based Construction of Discriminative Models Using Hermite Polynomials. Frontiers in Applied Mathematics and Statistics, 6. https://doi.org/10.3389/fams.2020.00031