Abstract
In this paper, we study automated correctness assessment of patches generated by program repair systems. We consider the human-written patch as the ground-truth oracle and randomly generate tests based on it, a technique proposed by Shamshiri et al. that we call Random testing with Ground Truth (RGT) in this paper. We build a curated dataset of 638 patches for Defects4J generated by 14 state-of-the-art repair systems, and we evaluate automated patch assessment on this dataset. The results of this study are novel and significant. First, we improve the state-of-the-art performance of automated patch assessment with RGT by 190% by improving the oracle. Second, we show that RGT is reliable enough to help scientists perform overfitting analysis when evaluating program repair systems. Third, we improve the external validity of program repair knowledge with the largest study to date.
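To make the RGT decision rule concrete: tests generated on the human-patched (ground-truth) version encode ground-truth behavior in their assertions, so a machine-generated patch that fails any such test behaves differently from the ground truth and is classified as overfitting. The sketch below illustrates this rule only; all type and method names (Program, GeneratedTest, passesOn, isOverfitting) are hypothetical and not taken from the paper's artifact or from Defects4J.

```java
import java.util.List;

// Minimal sketch of the RGT (Random testing with Ground Truth) decision
// procedure. All types and helpers here are hypothetical, introduced for
// illustration; they are not APIs from the paper's tooling.
public class RgtSketch {

    interface Program { /* a buildable, runnable program version */ }

    interface GeneratedTest {
        // Runs this test against the given program version and reports
        // whether all of its assertions hold.
        boolean passesOn(Program program);
    }

    // A machine-generated patch is labeled overfitting as soon as one test
    // generated on the ground-truth (human-patched) version fails on the
    // machine-patched version; otherwise no divergence is detected and the
    // patch is deemed correct by the assessment.
    static boolean isOverfitting(Program machinePatched,
                                 List<GeneratedTest> groundTruthTests) {
        for (GeneratedTest test : groundTruthTests) {
            if (!test.passesOn(machinePatched)) {
                return true; // behavior diverges from the ground truth
            }
        }
        return false; // no divergence detected by the generated tests
    }
}
```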
CITATION STYLE
Ye, H., Martinez, M., & Monperrus, M. (2021). Automated patch assessment for program repair at scale. Empirical Software Engineering, 26(2). https://doi.org/10.1007/s10664-020-09920-w