Cross-company defect prediction (CCDP) is a practical approach that trains a prediction model on one or more projects from a source company and then applies the model to a target company. Unfortunately, a large amount of irrelevant cross-company (CC) data usually makes it difficult to build a high-performance CCDP model. To address this issue, this paper proposes a data filtering method based on agglomerative clustering (DFAC) for cross-company defect prediction. First, DFAC combines within-company (WC) instances and CC instances and uses an agglomerative clustering algorithm to group them. Second, DFAC selects the sub-clusters that contain at least one WC instance and collects the CC instances in the selected sub-clusters into a new CC dataset. Experimental results on 15 public PROMISE datasets show that, compared with existing data filtering methods, DFAC increases the PD value, reduces the PF value, and achieves higher G-measure and AUC values.
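The two filtering steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the linkage criterion, the distance measure, or how the number of sub-clusters is chosen, so average linkage, Euclidean distance, and a fixed `n_clusters` parameter are assumptions made here for illustration. The function names `agglomerative_clusters` and `dfac_filter` and the toy metric vectors are likewise hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def agglomerative_clusters(points, n_clusters):
    """Naive average-linkage agglomerative clustering (assumed linkage).

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (average distance, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                total = sum(dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        # Merge the closest pair of clusters.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def dfac_filter(wc_instances, cc_instances, n_clusters):
    """Keep CC instances that share a cluster with at least one WC instance."""
    # Step 1: combine WC and CC instances and cluster them together.
    combined = list(wc_instances) + list(cc_instances)
    n_wc = len(wc_instances)
    kept = []
    for cluster in agglomerative_clusters(combined, n_clusters):
        # Step 2: select clusters containing at least one WC instance
        # and collect the CC instances they contain.
        if any(i < n_wc for i in cluster):
            kept.extend(i - n_wc for i in cluster if i >= n_wc)
    return [cc_instances[i] for i in sorted(kept)]


# Toy metric vectors, made up for illustration only.
wc = [(1.0, 1.0), (1.2, 0.9)]
cc = [(1.1, 1.0), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9)]
filtered_cc = dfac_filter(wc, cc, n_clusters=2)
```

In this toy run the two CC instances near the WC data are retained, while the two distant CC instances form a cluster with no WC instance and are discarded. The naive pairwise search is only suitable for small examples; a library implementation of agglomerative clustering would be the natural choice on real defect datasets.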
Yu, X., Zhang, J., Zhou, P., & Liu, J. (2017). A data filtering method based on agglomerative clustering. In Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE (pp. 392–397). Knowledge Systems Institute Graduate School. https://doi.org/10.18293/SEKE2017-043