Abstract
Distributed matrix computation is common in large-scale data processing and machine learning applications. Existing systems that support distributed matrix computation already explore incremental evaluation for iterative-convergent algorithms. However, they are oblivious to the fact that non-zero increments are scattered across different blocks in a distributed environment. Additionally, we observe that incremental evaluation does not always outperform full evaluation. To address these issues, we propose matrix reorganization to optimize the physical layout on top of state-of-the-art partitioning schemes, and thereby accelerate incremental evaluation. More importantly, we propose a hybrid evaluation to efficiently interleave full and incremental evaluation during the iterative process. In particular, it employs a cost model to compare the overhead costs of the two types of evaluation and a selective comparison mechanism to reduce the overhead incurred by the comparison itself. To demonstrate the efficiency of our techniques, we implement HyMAC, a hybrid matrix computation system based on SystemML. Our experiments show that HyMAC reduces execution time on large datasets by 23% on average in comparison to the state-of-the-art optimization technique and consequently outperforms SystemML, ScaLAPACK, and SciDB by an order of magnitude.
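The hybrid evaluation described above can be illustrated with a minimal sketch. All names and cost formulas here are hypothetical, for illustration only, and are not taken from the paper: a per-iteration cost model estimates the work of full versus incremental evaluation, and, mirroring the selective comparison mechanism, the comparison is run only every `compare_interval` iterations rather than on every iteration.

```python
# Hypothetical sketch of cost-model-driven hybrid evaluation.
# The cost formulas below are illustrative assumptions, not the paper's model.

def estimate_full_cost(n_blocks):
    # Assumption: full evaluation touches every block once.
    return n_blocks

def estimate_incremental_cost(nonzero_delta_blocks, reorg_overhead=0.1):
    # Assumption: incremental evaluation touches only blocks holding
    # non-zero increments, plus a small reorganization overhead.
    return nonzero_delta_blocks * (1 + reorg_overhead)

def hybrid_schedule(delta_blocks_per_iter, n_blocks, compare_interval=3):
    """Choose an evaluation mode ('full' or 'incremental') per iteration."""
    modes, mode = [], "full"  # begin with a full evaluation
    for i, delta in enumerate(delta_blocks_per_iter):
        if i % compare_interval == 0:  # selective comparison: compare only periodically
            incremental = estimate_incremental_cost(delta)
            full = estimate_full_cost(n_blocks)
            mode = "incremental" if incremental < full else "full"
        modes.append(mode)  # between comparisons, keep the last chosen mode
    return modes
```

As an iterative-convergent algorithm approaches convergence, the number of blocks with non-zero increments typically shrinks, so such a schedule tends to start with full evaluations and switch to incremental ones later.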
Chen, Z., Xu, C., Soto, J., Markl, V., Qian, W., & Zhou, A. (2021). Hybrid Evaluation for Distributed Iterative Matrix Computation. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 300–312). Association for Computing Machinery. https://doi.org/10.1145/3448016.3452843