Tight Sensitivity Bounds for Smaller Coresets


Abstract

An ϵ-coreset to the dimensionality reduction problem for a (possibly very large) matrix A ∈ R^{n×d} is a small scaled subset of its n rows that approximates their sum of squared distances to every affine k-dimensional subspace of R^d, up to a factor of 1±ϵ. Such a coreset is useful for boosting the running time of computing a low-rank approximation (k-SVD/k-PCA) while using small memory. Coresets are also useful for handling streaming, dynamic and distributed data in parallel. With high probability, non-uniform sampling based on the so-called leverage score or sensitivity of each row in A yields a coreset. The size of the (sampled) coreset is then near-linear in the total sum of these sensitivity bounds. We provide algorithms that compute provably tight bounds for the sensitivity of each input row. Our approach is based on two ingredients: (i) an iterative algorithm that computes the exact sensitivity of each row, up to arbitrarily small precision, for (non-affine) k-subspaces, and (ii) a general reduction for computing a coreset for affine subspaces, given a coreset for (non-affine) subspaces in R^d. Experimental results on real-world datasets, including the English Wikipedia documents-term matrix, show that our bounds provide significantly smaller and data-dependent coresets also in practice. Full open source code is also provided.
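To make the sampling scheme the abstract refers to concrete, here is a minimal sketch of the standard leverage-score sampling baseline in Python/NumPy. This is not the paper's tight sensitivity bounds: the scores are the classic leverage scores from a thin SVD, rows are sampled in proportion to them, and each sampled row is rescaled by its inverse sampling probability so the resulting subset is an unbiased sketch of A. The function name and parameters are illustrative.

```python
import numpy as np

def leverage_score_coreset(A, m, seed=None):
    """Sample a weighted subset of m rows of A via leverage scores.

    Standard baseline (not the paper's tight bounds): the leverage
    score of row i is the squared norm of the i-th row of U, where
    A = U S V^T is a thin SVD. Sampling rows proportionally to these
    scores and rescaling by 1/sqrt(m * p_i) yields a sketch C whose
    Gram matrix C^T C approximates A^T A.
    """
    rng = np.random.default_rng(seed)
    U, _, _ = np.linalg.svd(A, full_matrices=False)  # thin SVD of A
    scores = (U ** 2).sum(axis=1)                    # leverage score per row
    p = scores / scores.sum()                        # sampling distribution
    idx = rng.choice(A.shape[0], size=m, replace=True, p=p)
    weights = 1.0 / (m * p[idx])                     # importance weights
    return A[idx] * np.sqrt(weights)[:, None]        # scaled coreset rows

# Usage: the sketch C approximates A's Gram matrix (and hence sums of
# squared distances to subspaces) increasingly well as m grows.
A = np.random.default_rng(0).standard_normal((1000, 10))
C = leverage_score_coreset(A, m=200, seed=1)
```

The paper's contribution, per the abstract, is computing provably tight per-row sensitivity bounds so that far fewer samples m suffice than with such generic upper bounds.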

Citation (APA)

Maalouf, A., Statman, A., & Feldman, D. (2020). Tight Sensitivity Bounds for Smaller Coresets. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 2051–2061). Association for Computing Machinery. https://doi.org/10.1145/3394486.3403256
