Recognizing Faults in Software Related Difficult Data

Michał Choraś; Marek Pawlicki; Rafał Kozik

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019) 11538 LNCS 263-272

DOI: 10.1007/978-3-030-22744-9_20

2 Citations

4 Readers

Abstract

In this paper we investigate the use of numerous machine learning algorithms, with emphasis on multilayer artificial neural networks, in the domain of software source code fault prediction. The main contribution lies in enhancing the data pre-processing step as a partial solution for handling software-related difficult data. Before feeding the data into an artificial neural network, we apply PCA (Principal Component Analysis) and k-means clustering. The data-clustering step improves the quality of the whole dataset. Using the presented approach, we obtained a 10% increase in fault detection accuracy. To ensure the most reliable results, we use 10-fold cross-validation in our experiments. We also evaluated a wide range of hyperparameter setups for the network and compared the results to state-of-the-art, cost-sensitive approaches: Random Forest, AdaBoost, RepTrees and GBT.
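The pre-processing chain named in the abstract (PCA, then k-means, then a multilayer network, scored with 10-fold cross-validation) might be sketched as follows. This is a minimal illustration with scikit-learn, not the authors' implementation. The synthetic dataset, the layer sizes, the number of components and clusters, and the choice to feed the network the distances to the k-means centres are all assumptions made for the example; the abstract does not say how the clustering output is used.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(n_components=5, n_clusters=4, seed=0):
    """PCA -> k-means (cluster-distance features) -> multilayer network."""
    return make_pipeline(
        StandardScaler(),  # PCA is sensitive to feature scale
        PCA(n_components=n_components),
        # As a pipeline step, KMeans.transform() maps each sample to its
        # distances from the cluster centres (an assumed use of clustering).
        KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500,
                      random_state=seed),
    )


def cross_validate(X, y, seed=0):
    """Return the per-fold accuracies of a stratified 10-fold evaluation."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    return cross_val_score(build_pipeline(seed=seed), X, y,
                           cv=cv, scoring="accuracy")


# Synthetic, imbalanced stand-in for a software-metrics dataset (hypothetical).
X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)
scores = cross_validate(X, y)
print(f"mean accuracy over {len(scores)} folds: {scores.mean():.3f}")
```

Because every step sits inside one pipeline, the scaler, PCA and k-means are refitted on the training part of each fold, so no information from the held-out fold leaks into pre-processing.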

Cite

Choraś, M., Pawlicki, M., & Kozik, R. (2019). Recognizing Faults in Software Related Difficult Data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11538 LNCS, pp. 263–272). Springer Verlag. https://doi.org/10.1007/978-3-030-22744-9_20