Shared-nothing distributed databases scale transactional and analytical processing over large data volumes by spreading data across servers. However, with static sharding such systems cannot adapt to changing workloads in a timely manner and fit poorly with the cloud pay-as-you-go model. Migrating shards between nodes online is a key technique for reacting to dynamic workload changes and providing cloud elasticity. Existing approaches either severely degrade performance and interrupt service, resulting in SLA violations in the cloud, or are tailor-made for deterministic databases. In this paper, we propose Remus, a new live migration approach for shared-nothing distributed databases with snapshot isolation. Remus migrates shards between nodes with zero service interruption and minimal performance impact, which it achieves through an efficient unidirectional dual execution during migration. We implement Remus on a shared-nothing, distributed version of PolarDB-PG and evaluate it against state-of-the-art approaches using the standard OLTP workloads TPC-C and YCSB, as well as hybrid workloads consisting of long-lived and short transactions. The results demonstrate that Remus is the only approach that achieves zero transaction interruption, zero downtime, and marginal performance impact, paving the way for applying the shared-nothing architecture to cloud databases that must provide elasticity while guaranteeing strict SLAs.
Kang, J., Cai, L., Li, F., Zhou, X., Cao, W., Cai, S., & Shao, D. (2022). Remus: Efficient Live Migration for Distributed Databases with Snapshot Isolation. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 2232–2245). Association for Computing Machinery. https://doi.org/10.1145/3514221.3526047