Developing big data analytics often involves trial-and-error debugging, due to the unclean nature of datasets or wrong assumptions made about data. When errors (e.g., a program crash or outlier results) arise, developers are often interested in pinpointing their root cause. To address this problem, BigSift takes an Apache Spark program, a user-defined test oracle function, and a dataset as input, and outputs a minimum set of input records that reproduces the same test failure by combining the insights of delta debugging with data provenance. The technical contribution of BigSift is the design of systems optimizations that bring automated debugging closer to a reality for data-intensive scalable computing. BigSift exposes an interactive web interface where a user can monitor a big data analytics job running remotely on the cloud, write a user-defined test oracle function, and then trigger the automated debugging process. BigSift also provides a set of predefined test oracle functions that can be used to explain common types of anomalies in big data analytics, for example, finding the origin of an output value that is more than k standard deviations away from the median.
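To make the "k standard deviations from the median" example concrete, the sketch below shows what such a test oracle predicate could look like in Scala (Spark's native language). The object and method names, the signature, and the default k = 3 are assumptions for illustration; the abstract does not specify BigSift's actual oracle API.

```scala
// Illustrative sketch of a test oracle predicate: flag an output value that
// lies more than k standard deviations away from the median of all outputs.
// Names, signature, and the default k = 3.0 are hypothetical, not BigSift's API.
object OutlierOracle {
  // Returns true (i.e., the record fails the test) when `value` deviates
  // from the median of `allOutputs` by more than k standard deviations.
  def isFaulty(value: Double, allOutputs: Seq[Double], k: Double = 3.0): Boolean = {
    require(allOutputs.nonEmpty, "oracle needs at least one output value")
    val sorted = allOutputs.sorted
    val n = sorted.length
    val median =
      if (n % 2 == 1) sorted(n / 2)
      else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0
    val mean = allOutputs.sum / n
    val stdDev = math.sqrt(allOutputs.map(v => (v - mean) * (v - mean)).sum / n)
    math.abs(value - median) > k * stdDev
  }
}

// Usage example (hypothetical data): 95.0 is flagged as faulty relative to
// the remaining, mostly small, output values.
// OutlierOracle.isFaulty(95.0, Seq(1.0, 2.0, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 4.0, 95.0))
```

In a setup like this, the debugger would evaluate the predicate on each output record and treat the records for which it returns true as the failing outputs whose faulty input records are then minimized.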
CITATION STYLE
Gulzar, M. A., Wang, S., & Kim, M. (2018). BigSift: Automated debugging of big data analytics in data-intensive scalable computing. In ESEC/FSE 2018 - Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp. 863–866). Association for Computing Machinery, Inc. https://doi.org/10.1145/3236024.3264586