Neural-Based Test Oracle Generation: A Large-Scale Evaluation and Lessons Learned

Abstract

Defining test oracles is crucial and central to test development, but manual construction of oracles is expensive. While recent neural-based automated test oracle generation techniques have shown promise, their effectiveness in real-world settings remains an open question. This paper investigates the effectiveness of TOGA, a recently developed neural-based method for automatic test oracle generation. TOGA uses EvoSuite-generated test inputs and generates both exception and assertion oracles. In a Defects4J study, TOGA outperformed specification-, search-, and neural-based techniques, detecting 57 bugs, including 30 unique bugs not detected by the other methods. To gain a deeper understanding of its applicability in real-world settings, we conducted a series of external, extended, and conceptual replication studies of TOGA. In a large-scale study involving 25 real-world Java systems, 223.5K test cases, and 51K injected faults, we evaluate TOGA's ability to improve fault-detection effectiveness relative to the state of the practice and the state of the art. We find that TOGA misclassifies the type of oracle needed 24% of the time, and that even when it classifies correctly, it is not confident enough to generate any assertion oracle around 62% of the time. When it does generate an assertion oracle, more than 47% of them are false positives, and the true-positive assertions increase fault detection by only 0.3% relative to prior work. These findings expose limitations of the state-of-the-art neural-based oracle generation technique, provide valuable insights for improvement, and offer lessons for evaluating future automated oracle generation methods.
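
For readers unfamiliar with the two oracle kinds mentioned above, the minimal JUnit 4 sketch below illustrates the distinction between an assertion oracle and an exception oracle. The class under test (java.util.ArrayDeque) and the test prefixes are illustrative choices made here, not output from TOGA or examples from the paper.

    import org.junit.Test;
    import static org.junit.Assert.*;

    public class OracleKindsExampleTest {

        // Assertion oracle: the test checks an expected value produced by the unit under test.
        @Test
        public void pushThenPeekReturnsPushedValue() {
            java.util.ArrayDeque<Integer> stack = new java.util.ArrayDeque<>();
            stack.push(42);                                   // test input (prefix)
            assertEquals(Integer.valueOf(42), stack.peek());  // assertion oracle
        }

        // Exception oracle: the test asserts that the prefix raises a particular exception.
        @Test(expected = java.util.NoSuchElementException.class)
        public void popOnEmptyDequeThrows() {
            new java.util.ArrayDeque<Integer>().pop();        // expected to throw
        }
    }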

Citation (APA)

Hossain, S. B., Filieri, A., Dwyer, M. B., Elbaum, S., & Visser, W. (2023). Neural-Based Test Oracle Generation: A Large-Scale Evaluation and Lessons Learned. In ESEC/FSE 2023 - Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp. 120–132). Association for Computing Machinery, Inc. https://doi.org/10.1145/3611643.3616265
