Test-based generate-and-validate automated program repair (APR) systems generate many patches that pass the test suite without fixing the bug. The generated patches must be manually inspected by the developers, a task that tends to be time-consuming, thereby diminishing the role of APR in reducing debugging costs. We present the design and implementation of a novel tool, named Shibboleth, for automatic assessment of the patches generated by test-based generate-and-validate APR systems. Shibboleth leverages lightweight static and dynamic heuristics from both test and production code to rank and classify the patches. Shibboleth is based on the idea that the buggy program is almost correct and the bugs are small mistakes that require small changes to fix and specifically the fix does not remove the code implementing correct functionality of the program. Thus, the tool measures the impact of patches on both production code (via syntactic and semantic similarity) and test code (via code coverage) to separate the patches that result in similar programs and that do not remove desired program elements. We have evaluated Shibboleth on 1,871 patches, generated by 29 Java-based APR systems for Defects4J programs. The technique outperforms state-of-the-art raking and classification techniques. Specifically, in our ranking data set, in 66% of the cases, Shibboleth ranks the correct patch in top-1 or top-2 positions and, in our classification data set, it achieves an accuracy and F1-score of 0.887 and 0.852, respectively, in classification mode. A demo video of the tool is available at https://bit.ly/3NvYJN8.
CITATION STYLE
Ghanbari, A., & Marcus, A. (2022). Shibboleth: Hybrid Patch Correctness Assessment in Automated Program Repair. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/3551349.3559519
Mendeley helps you to discover research relevant for your work.