Kernel Matrix Approximation on Class-Imbalanced Data with an Application to Scientific Simulation

Abstract

Low-rank approximations of the kernel matrices that arise in nonlinear machine learning techniques can significantly reduce memory and computational costs. A compelling approach is to find a concise set of exemplars, or landmarks, which reduces the number of similarity evaluations from quadratic to linear in the data size. A key challenge, however, is balancing the quality of the landmarks against resource consumption. Despite the volume of research in this area, little is known about how landmark selection techniques perform on class-imbalanced data sets, which are increasingly prevalent in many applications. This paper therefore provides a comprehensive empirical investigation on several real-world imbalanced data sets, including scientific data, evaluating the quality of the approximate low-rank decompositions and their influence on the accuracy of downstream tasks. We also present a new landmark selection technique, Distance-based Importance Sampling and Clustering (DISC), which computes relative importance scores to improve accuracy-efficiency tradeoffs over existing methods ranging from probabilistic sampling to clustering. The proposed method follows a coarse-to-fine strategy to capture the intrinsic structure of complex data sets, substantially reducing computational complexity and memory footprint with minimal loss in accuracy.
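To make the landmark idea concrete, the following is a minimal sketch of a Nyström-style approximation, a standard landmark-based scheme. The abstract does not spell out the formula, so the choice of Nyström, the Gaussian kernel, the synthetic data, and the names `rbf_kernel` and `nystrom_factors` are all assumptions made for illustration. Landmarks here are drawn uniformly at random; this is not the DISC method, whose details are not given in the abstract.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative distances

def nystrom_factors(X, landmark_idx, gamma=1.0):
    """Return C (n x m) and pinv(W) (m x m) so that K is approximated by C @ pinv(W) @ C.T.

    Only n*m + m*m kernel evaluations are needed, i.e. linear in n for fixed m.
    """
    Z = X[landmark_idx]            # the m landmark points
    C = rbf_kernel(X, Z, gamma)    # similarities between all points and landmarks
    W = rbf_kernel(Z, Z, gamma)    # similarities among the landmarks
    return C, np.linalg.pinv(W)

# Synthetic data (illustrative only, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
idx = rng.choice(500, size=50, replace=False)  # uniform landmark sampling (assumption)

C, W_pinv = nystrom_factors(X, idx, gamma=0.1)
K_approx = C @ W_pinv @ C.T

# The full kernel matrix is formed here only to measure the approximation error.
K = rbf_kernel(X, X, gamma=0.1)
rel_err = np.linalg.norm(K - K_approx, "fro") / np.linalg.norm(K, "fro")
print(f"relative Frobenius error: {rel_err:.3e}")
```

In practice only the factors `C` and `W_pinv` would be stored, and the quality of the approximation depends on which landmarks are chosen, which is the selection problem the paper studies.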

Cite

Hajibabaee, P., Pourkamali-Anaraki, F., & Hariri-Ardebili, M. A. (2021). Kernel Matrix Approximation on Class-Imbalanced Data with an Application to Scientific Simulation. IEEE Access, 9, 83579–83591. https://doi.org/10.1109/ACCESS.2021.3087730
