A data filtering method based on agglomerative clustering

Abstract

Cross-company defect prediction (CCDP) is a practical approach that trains a prediction model on one or more projects of a source company and then applies the model to a target company. Unfortunately, a large amount of irrelevant cross-company (CC) data usually makes it difficult to build a cross-company defect prediction model with high performance. To address this issue, this paper proposes a data filtering method based on agglomerative clustering (DFAC) for cross-company defect prediction. First, DFAC combines within-company (WC) instances and CC instances and uses an agglomerative clustering algorithm to group them. Second, DFAC selects the sub-clusters that contain at least one WC instance and collects the CC instances in the selected sub-clusters into a new CC dataset. Experimental results on 15 public PROMISE datasets show that, compared with existing data filtering methods, DFAC increases the PD value, reduces the PF value, and achieves higher G-measure and AUC values.
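
The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the linkage criterion, the distance measure, or how the number of sub-clusters is chosen, so the average linkage, the Euclidean distance, the `n_clusters` parameter, and the function names `agglomerative_clusters` and `dfac_filter` are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative_clusters(points, n_clusters):
    """Naive average-linkage agglomerative clustering (assumed linkage).

    Starts with one cluster per point and repeatedly merges the two
    closest clusters until n_clusters remain. Returns lists of indices.
    """
    target = max(1, n_clusters)
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > target:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(
                    euclidean(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def dfac_filter(wc, cc, n_clusters):
    """Return indices of CC instances that share a sub-cluster with a WC instance."""
    # Step 1: combine WC and CC instances and cluster them together.
    points = list(wc) + list(cc)
    n_wc = len(wc)
    kept = []
    for cluster in agglomerative_clusters(points, n_clusters):
        # Step 2: keep only sub-clusters containing at least one WC instance,
        # and collect the CC instances they contain.
        if any(idx < n_wc for idx in cluster):
            kept.extend(idx - n_wc for idx in cluster if idx >= n_wc)
    return sorted(kept)


# Toy data (hypothetical): two CC instances lie near the WC data, two far away.
wc = [(1.0, 1.0), (1.2, 0.9)]
cc = [(1.1, 1.0), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9)]
kept_indices = dfac_filter(wc, cc, n_clusters=2)
print(kept_indices)  # prints [0, 1]
```

In this toy run the two distant CC instances form a sub-cluster with no WC instance and are discarded, while the two CC instances close to the WC data are retained as the filtered CC dataset.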

Cite

Yu, X., Zhang, J., Zhou, P., & Liu, J. (2017). A data filtering method based on agglomerative clustering. In Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE (pp. 392–397). Knowledge Systems Institute Graduate School. https://doi.org/10.18293/SEKE2017-043
