Abstract
Sparse component analysis (SCA), also known as complete dictionary learning, is the following problem: Given an input matrix M and an integer r, find a dictionary D with r columns and a matrix B with k-sparse columns (that is, each column of B has at most k nonzero entries) such that M = DB. A key issue in SCA is identifiability, that is, characterizing the conditions under which D and B are essentially unique (that is, unique up to permutation and scaling of the columns of D and rows of B). Although SCA has been extensively investigated over the last two decades, only a few works have tackled this issue in the deterministic scenario, and no work provides reasonable bounds on the minimum number of samples (that is, columns of M) that leads to identifiability. In this work, we provide new results in the deterministic scenario when the data has a low-rank structure, that is, when D is (under)complete. While previous bounds feature a combinatorial term of the order of (r choose k), we exhibit a sufficient condition involving O(r^3/(r-k)^2) samples that yields an essentially unique decomposition, as long as these data points are well spread among the subspaces spanned by r-1 columns of D. We also exhibit a necessary lower bound on the number of samples that contradicts previous results in the literature when k = r-1. Our bounds provide a drastic improvement compared to the state of the art, and imply, for example, that for a fixed proportion of zeros (constant and independent of r, e.g., 10% of zero entries in B), one only requires O(r) data points to guarantee identifiability.
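The model and the permutation/scaling ambiguity described above can be illustrated with a small synthetic example. The sketch below (names and dimensions are ours, not from the paper) builds M = DB with k-sparse columns of B, then shows that permuting and scaling the columns of D, with the inverse operation applied to the rows of B, leaves M unchanged; this is exactly the ambiguity under which the decomposition is called essentially unique.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper): ambient dimension m,
# dictionary size r, number of samples n, sparsity level k.
m, r, n, k = 8, 5, 20, 3

# Dictionary D with r columns, and B with k-sparse columns.
D = rng.standard_normal((m, r))
B = np.zeros((r, n))
for j in range(n):
    support = rng.choice(r, size=k, replace=False)  # at most k nonzeros
    B[support, j] = rng.standard_normal(k)

M = D @ B  # each data point lies in the span of at most k columns of D

# Every column of B is k-sparse.
assert all(np.count_nonzero(B[:, j]) <= k for j in range(n))

# Permutation/scaling ambiguity: (D P S, S^{-1} P^T B) gives the same M.
P = np.eye(r)[rng.permutation(r)]      # permutation matrix
s = rng.uniform(0.5, 2.0, size=r)      # positive scaling factors
D2 = D @ P @ np.diag(s)
B2 = np.diag(1.0 / s) @ P.T @ B
assert np.allclose(D2 @ B2, M)         # M is unchanged
```

Note that B2 still has k-sparse columns, since permuting and scaling its rows does not change the number of nonzeros per column; this is why such transformed pairs are considered the same solution.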
Cohen, J. E., & Gillis, N. (2019). Identifiability of Complete Dictionary Learning. SIAM Journal on Mathematics of Data Science, 1(3), 518–536. https://doi.org/10.1137/18M1233339