Applying combinatorial test data generation to big data applications

30Citations
Citations of this article
66Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Big data applications (e.g., Extract, Transform, and Load (ETL) applications) are designed to handle great volumes of data. However, processing such great volumes of data is time-consuming. There is a need to construct small yet effective test data sets during agile development of big data applications. In this paper, we apply a combinatorial test data generation approach to two real-world ETL applications at Medidata. In our approach, we first create Input Domain Models (IDMs) automatically by analyzing the original data source and incorporating constraints manually derived from requirements. Next, the IDMs are used to create test data sets that achieve t-way coverage, which has shown to be very effective in detecting software faults. The generated test data sets also satisfy all the constraints identified in the first step. To avoid creating IDMs from scratch when there is a change to the original data source or constraints, our approach extends the original IDMs with additional information. The new IDMs, which we refer to as Adaptive IDMs (AIDMs), are updated by comparing the changes against the additional information, and are then used to generate new test data sets. We implement our approach in a tool, called comBinatorial bIg daTa Test dAta Generator (BIT-TAG). Our experience shows that combinatorial testing can be effectively applied to big data applications. In particular, the test data sets created using our approach for the two ETL applications are only a small fraction of the original data source, but we were able to detect all the faults found with the original data source.

Cite

CITATION STYLE

APA

Li, N., Lei, Y., Khan, H. R., Liu, J., & Guo, Y. (2016). Applying combinatorial test data generation to big data applications. In ASE 2016 - Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (pp. 637–647). Association for Computing Machinery, Inc. https://doi.org/10.1145/2970276.2970325

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free