Tensor factorization methods have gained increased popularity in the data mining community. A key feature that renders tensors attractive is the essential uniqueness (identifiability) of their decomposition into latent factors: this is crucial for explanatory data analysis, since model uniqueness makes interpretations well grounded. In this work, we propose ParaSketch, a distributed tensor factorization algorithm that enables massive parallelism to deal with large tensors. The idea is to compress/sketch the large tensor into multiple small tensors, decompose each small tensor, and combine the results to reconstruct the desired latent factors. Prior art in this direction entails potentially very high complexity in the (Gaussian) compression and final combining stages. By using sketching matrices for compression, the proposed method greatly reduces compression complexity and admits a much simpler combining stage. Moreover, theoretical analysis shows that the compressed tensors inherit latent identifiability under mild conditions, establishing the correctness of the overall approach. Our approach to establishing identifiability for the sketched tensor is original and of interest in its own right.
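As a minimal illustration (not the authors' implementation), the NumPy sketch below shows the compression step this pipeline relies on: multiplying a low-rank tensor by a sparse, CountSketch-style matrix along each mode produces a small tensor whose CP factors are exactly the sketched versions of the original factors. All dimensions, the sketch construction, and the helper names here are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def cp_tensor(factors):
    """Build a 3-way tensor from CP factors A, B, C (column r of each gives rank-1 component r)."""
    A, B, C = factors
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def mode_product(X, M, mode):
    """Multiply tensor X by matrix M along the given mode (contract M's columns with that mode)."""
    return np.moveaxis(np.tensordot(M, X, axes=(1, mode)), 0, mode)

def count_sketch(m, n, rng):
    """Sparse sketching matrix: one random +/-1 entry per column (CountSketch-style)."""
    S = np.zeros((m, n))
    S[rng.integers(0, m, size=n), np.arange(n)] = rng.choice([-1.0, 1.0], size=n)
    return S

# Ground-truth rank-R factors of a large I x J x K tensor (sizes are illustrative).
I, J, K, R = 60, 50, 40, 5
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))
X = cp_tensor((A, B, C))

# Compress each mode with a cheap sparse sketch instead of a dense Gaussian matrix.
S1, S2, S3 = count_sketch(15, I, rng), count_sketch(15, J, rng), count_sketch(15, K, rng)
X_small = mode_product(mode_product(mode_product(X, S1, 0), S2, 1), S3, 2)

# The compressed tensor has CP factors (S1 A, S2 B, S3 C): sketching preserves CP structure.
X_check = cp_tensor((S1 @ A, S2 @ B, S3 @ C))
print(np.allclose(X_small, X_check))  # True
```

In the full pipeline, each such compressed tensor would be decomposed in parallel (e.g., by CP alternating least squares) and the original factors recovered from the sketched ones; the check above only verifies the structural property that makes that recovery possible.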
Yang, B., Zamzam, A., & Sidiropoulos, N. D. (2018). ParaSketch: Parallel tensor factorization via sketching. In SIAM International Conference on Data Mining, SDM 2018 (pp. 396–404). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611975321.45