Many studies have shown that the distribution of bugs in software systems follows the Pareto principle, and some have proposed the Pareto distribution (PD) to model bugs in software systems. However, several other probability distributions, such as the Weibull, Bounded Generalized Pareto, Double Pareto (DP), Log Normal and Yule-Simon distributions, have also been proposed, and each has been evaluated in different studies for how well it fits bug data. We investigate this problem further using information-theoretic (criterion-based) approaches to model selection, which elegantly handle several issues, such as overfitting, that are prevalent in previous work. Strengthening the model selection procedure and studying a large collection of fault data make the results more accurate and stable. We conduct experiments on fault data from 74 releases of various open source and proprietary software systems and observe that, for proprietary projects, the DP distribution outperforms all others with statistical significance. For open source software systems, the top three distributions are the DP, Bounded Generalized Pareto and Weibull distributions; they are significantly better than all others, though there is no significant difference among the three.
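As a rough illustration of criterion-based model selection, the sketch below ranks candidate distributions by the Akaike Information Criterion (AIC), which penalizes each extra parameter and so discourages overfitting. It is a minimal sketch under stated assumptions, not the procedure used in the study: the abstract does not name the criterion, so AIC is assumed here; only Pareto and Log Normal come from the list of distributions above, while the exponential is added as a hypothetical baseline; the per-module bug counts are made-up values, assumed to be strictly positive and not all equal.

```python
import math

def aic(log_likelihood, n_params):
    # AIC = 2k - 2 ln L; lower is better, extra parameters are penalized.
    return 2 * n_params - 2 * log_likelihood

def fit_lognormal(xs):
    # Closed-form maximum likelihood estimates for mu and sigma^2.
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n
    sigma = math.sqrt(var)
    ll = sum(
        -math.log(x * sigma * math.sqrt(2 * math.pi))
        - (math.log(x) - mu) ** 2 / (2 * var)
        for x in xs
    )
    return ll, 2

def fit_pareto(xs):
    # Pareto (Type I): x_min estimated as the sample minimum,
    # alpha by its closed-form maximum likelihood estimate.
    n = len(xs)
    x_min = min(xs)
    alpha = n / sum(math.log(x / x_min) for x in xs)
    ll = (
        n * math.log(alpha)
        + n * alpha * math.log(x_min)
        - (alpha + 1) * sum(math.log(x) for x in xs)
    )
    return ll, 2

def fit_exponential(xs):
    # Hypothetical one-parameter baseline, not named in the abstract.
    n = len(xs)
    rate = n / sum(xs)
    ll = n * math.log(rate) - rate * sum(xs)
    return ll, 1

def rank_by_aic(xs):
    # Return (name, AIC) pairs sorted from best (lowest AIC) to worst.
    fits = {
        "lognormal": fit_lognormal,
        "pareto": fit_pareto,
        "exponential": fit_exponential,
    }
    scores = {name: aic(*fit(xs)) for name, fit in fits.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])

if __name__ == "__main__":
    # Made-up bug counts per module, for illustration only.
    bug_counts = [1, 1, 2, 3, 5, 8, 13, 40, 120]
    for name, score in rank_by_aic(bug_counts):
        print(f"{name}: AIC = {score:.2f}")
```

A lower AIC indicates a better trade-off between fit and complexity on that data set; deciding whether a difference between two distributions is statistically significant across many releases would need a separate test, which this sketch does not attempt.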
Shriram, C. K., Muthukumaran, K., & Bhanu Murthy, N. L. (2018). Empirical Study on the Distribution of Bugs in Software Systems. International Journal of Software Engineering and Knowledge Engineering, 28(1), 97–122. https://doi.org/10.1142/S0218194018500055