Abstract
Data-intensive scalable computing (DISC) systems such as Google's MapReduce, Apache Hadoop, and Apache Spark are prevalent in many production services. Despite their popularity, the quality of DISC applications suffers due to a lack of exhaustive and automated testing. Current practices of testing DISC applications are limited to using a small random sample of the entire input dataset, which rarely exposes program faults. Unlike testing SQL queries, testing DISC applications poses new challenges because they compose dataflow and relational operators with user-defined functions (UDFs) that can be arbitrarily long and complex. To address this problem, we demonstrate a new white-box testing framework called BigTest that takes an Apache Spark program as input and automatically generates synthetic, concrete data for effective and efficient testing. BigTest combines the symbolic execution of UDFs with the logical specifications of dataflow and relational operators to explore all paths in a DISC application. Our experiments show that BigTest is capable of generating test data that can reveal up to 2X more faults than the entire data set with 194X less testing time. We implement BigTest as a Java-based command-line tool with a pre-compiled binary jar. It exposes a configuration file in which a user can edit preferences, including the path of a target program, the upper bound of loop exploration, and a choice of theorem solver. The demonstration video of BigTest is available at https://youtu.be/OeHhoKiDYso and BigTest is available at https://github.com/maligulzar/BigTest.
Cite
Gulzar, M. A., Musuvathi, M., & Kim, M. (2020). BigTest: A Symbolic Execution Based Systematic Test Generation Tool for Apache Spark. In Proceedings - 2020 ACM/IEEE 42nd International Conference on Software Engineering: Companion, ICSE-Companion 2020 (pp. 61–64). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3377812.3382145