Benchmarking deep learning models for surface defect detection: a reproducible and statistically-rigorous approach

Abstract

Automated surface defect detection has been a key research topic for many years, and deep learning-based object detection is one of the most widely used approaches. However, comparing results across models remains difficult: studies use different dataset partitions, and the stochastic nature of training introduces run-to-run variability. This study shows that an improvement in a performance metric such as average precision (AP50) does not always reflect a model’s true effectiveness, because factors such as the data partition and training randomness may influence the result. To address this, a methodology designed specifically for small datasets is proposed that uses analysis of variance (ANOVA) and Tukey’s test to establish whether differences between models are statistically significant. The methodology provides a reliable and reproducible framework for comparing results across models. It is demonstrated using the latest object detection models and the Northeastern University surface defect dataset, and reveals that recent advancements do not always lead to statistically significant improvements. The source code has been made publicly available to promote reproducibility.
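The comparison the abstract describes can be illustrated with a minimal sketch. This is not the authors' released code. It assumes each model has been trained several times (different seeds or partitions) with one AP50 value recorded per run, and that every model has the same number of runs. The model names, the scores, and the helper functions below are hypothetical, and the critical values are approximate figures from standard statistical tables for a 0.05 significance level.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

# Hypothetical AP50 scores, one per training run (different seed or
# partition). These numbers are invented for illustration only.
scores = {
    "model_a": [0.70, 0.72, 0.71, 0.69, 0.73],
    "model_b": [0.71, 0.73, 0.72, 0.70, 0.74],
    "model_c": [0.78, 0.80, 0.79, 0.77, 0.81],
}


def within_mean_square(groups):
    """Return (MS_within, df_within), the pooled run-to-run variance."""
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df_within = sum(len(g) for g in groups.values()) - len(groups)
    return ss_within / df_within, df_within


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    grand_mean = mean([v for g in groups.values() for v in g])
    ss_between = sum(
        len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()
    )
    df_between = len(groups) - 1
    ms_within, df_within = within_mean_square(groups)
    return (ss_between / df_between) / ms_within, df_between, df_within


def tukey_q_statistics(groups):
    """Return the studentized range statistic q for every pair of models."""
    sizes = {len(g) for g in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes the same number of runs per model")
    n = sizes.pop()
    ms_within, _ = within_mean_square(groups)
    std_error = sqrt(ms_within / n)
    return {
        (a, b): abs(mean(groups[a]) - mean(groups[b])) / std_error
        for a, b in combinations(groups, 2)
    }


# Approximate tabulated critical values at alpha = 0.05 for 3 groups and
# 12 within-group degrees of freedom (assumed here, not computed).
F_CRIT = 3.89  # F distribution with (2, 12) degrees of freedom
Q_CRIT = 3.77  # studentized range with 3 groups, 12 degrees of freedom

f_stat, df_b, df_w = one_way_anova(scores)
print(f"ANOVA: F({df_b}, {df_w}) = {f_stat:.2f}, reject H0: {f_stat > F_CRIT}")

for (a, b), q in tukey_q_statistics(scores).items():
    verdict = "significant" if q > Q_CRIT else "not significant"
    print(f"{a} vs {b}: q = {q:.2f} ({verdict})")
```

With these invented numbers, model_b has a higher mean AP50 than model_a, yet the gap is small relative to the run-to-run variance, so the pair does not pass Tukey's test. Only the comparisons against model_c do. In practice the critical values would not be hard-coded: SciPy's `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd` compute p-values directly and also handle unequal group sizes.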

Citation (APA)

Lema, D. G., Sánchez-González, L., Usamentiaga, R., & delaCalle, F. J. (2025). Benchmarking deep learning models for surface defect detection: a reproducible and statistically-rigorous approach. Journal of Intelligent Manufacturing. https://doi.org/10.1007/s10845-025-02672-8
