Abstract
Despite their wide deployment, distributed systems remain notoriously hard to reason about. Unexpected interleavings of concurrent operations and failures may lead to undefined behaviors and cause serious consequences. We present Morpheus, the first concurrency testing tool for real-world distributed systems that leverages partial order sampling, a randomized testing method formally analyzed and empirically validated to provide strong probabilistic guarantees of error detection. Morpheus introduces conflict analysis to further improve randomized testing by predicting and focusing on operations that affect the testing result. Inspired by the recent shift in building distributed systems using higher-level languages and frameworks, Morpheus targets Erlang. Evaluation on four popular distributed systems in Erlang, including RabbitMQ, a message broker service, and Mnesia, a distributed database in the Erlang standard libraries, shows that Morpheus is effective: it found previously unknown errors in every system checked, 11 in total, all of which are flaws in their core protocols that may cause deadlocks, unexpected crashes, or inconsistent states.
Cite
Yuan, X., & Yang, J. (2020). Effective concurrency testing for distributed systems. In International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS (pp. 1141–1156). Association for Computing Machinery. https://doi.org/10.1145/3373376.3378484