Abstract
Mutation testing research has often used the number of mutants as a surrogate measure for the true execution cost of generating and executing mutants. This poses a potential threat to the validity of the scientific findings reported in the literature. Out of 75 works surveyed in this paper, we found that 54 (72%) are vulnerable to this threat. To investigate the magnitude of the threat, we conducted an empirical evaluation using 10 real-world programs. The results reveal that: i) percentages of randomly sampled mutants differ from the true execution time, on average, by 44%, varying in difference from 19% to 91%; ii) errors arising from using the surrogate correlate with program size (ρ = 0.74) and number of mutants (ρ = 0.76), making the problem more pernicious for more realistic programs; iii) scientific findings concerning sampling strategies would have approximately 37% rank disagreement, indicating potentially dramatic impact on experiment validity. To investigate whether this threat matters in practice, we reproduced a seminal study on Selective Mutation (widely relied upon for more than two decades). The impact is stark: an inconclusive scientific finding using the surrogate is transformed to an unequivocal finding when using the true execution cost.
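To make the gap between the surrogate and the true cost concrete, here is a minimal sketch in Python. It is not the authors' measurement procedure. The function name `surrogate_vs_true_cost` and the per-mutant timings are hypothetical and invented for illustration. It compares the fraction of mutants in a random sample (the surrogate) with the fraction of total execution time that sample accounts for.

```python
import random


def surrogate_vs_true_cost(mutant_times, sample_fraction, seed=0):
    """Compare two cost measures for one random sample of mutants.

    mutant_times: per-mutant execution times (hypothetical values).
    sample_fraction: share of mutants to sample, between 0 and 1.
    Returns (surrogate_cost, true_cost), both as fractions of the full set.
    """
    rng = random.Random(seed)
    total = len(mutant_times)
    k = max(1, round(sample_fraction * total))
    sample = rng.sample(mutant_times, k)

    # Surrogate: cost approximated by the number of mutants kept.
    surrogate_cost = k / total
    # True cost: share of the total execution time the sample consumes.
    true_cost = sum(sample) / sum(mutant_times)
    return surrogate_cost, true_cost


if __name__ == "__main__":
    # Made-up, skewed timings: most mutants are cheap, a few are expensive.
    times = [0.1] * 90 + [30.0] * 10
    surrogate, true = surrogate_vs_true_cost(times, 0.10)
    print(f"surrogate cost: {surrogate:.2%}, true cost: {true:.2%}")
```

When every mutant takes the same time, the two measures coincide. The more skewed the per-mutant times, the further a sample's true cost can drift from the percentage of mutants it contains.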
Cite
Guizzo, G., Sarro, F., & Harman, M. (2020). Cost measures matter for mutation testing study validity. In ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp. 1127–1139). Association for Computing Machinery, Inc. https://doi.org/10.1145/3368089.3409742