A data filtering method based on agglomerative clustering

Xiao Yu; Jiansheng Zhang; Peipei Zhou; Jin Liu

Conference Proceedings (Open Access)

Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE (2017) 392-397

DOI: 10.18293/SEKE2017-043

16 Citations

10 Readers

Abstract

Cross-company defect prediction (CCDP) is a practical approach that trains a prediction model on one or more projects of a source company and then applies the model to a target company. Unfortunately, a large amount of irrelevant cross-company (CC) data usually makes it difficult to build a high-performance CCDP model. To address this issue, this paper proposes a data filtering method based on agglomerative clustering (DFAC) for cross-company defect prediction. First, DFAC combines within-company (WC) instances and CC instances and uses an agglomerative clustering algorithm to group them. Second, DFAC selects the sub-clusters that contain at least one WC instance and collects the CC instances in the selected sub-clusters into a new CC data set. Experimental results on 15 public PROMISE datasets show that, compared with existing data filtering methods, DFAC increases the PD value, reduces the PF value, and achieves higher G-measure and AUC values.
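The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the linkage criterion, the distance measure, or how the number of sub-clusters is chosen, so average linkage, Euclidean distance, and a fixed `n_clusters` parameter are assumptions made here, and the function names `agglomerate` and `dfac_filter` are hypothetical.

```python
from math import dist


def agglomerate(points, n_clusters):
    """Naive average-linkage agglomerative clustering; returns one label per point.

    Assumption: average linkage with Euclidean distance (not specified in the abstract).
    """
    n_clusters = max(1, n_clusters)
    clusters = [[i] for i in range(len(points))]  # start from singleton clusters
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two candidate clusters.
                d = sum(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair
        del clusters[b]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def dfac_filter(wc, cc, n_clusters):
    """Keep the CC instances that share a cluster with at least one WC instance."""
    # Step 1: pool WC and CC instances and cluster them together.
    labels = agglomerate(wc + cc, n_clusters)
    # Step 2: select clusters containing a WC instance, keep their CC members.
    wc_labels = set(labels[:len(wc)])
    return [x for x, lab in zip(cc, labels[len(wc):]) if lab in wc_labels]


# Toy feature vectors (illustrative values only, not taken from PROMISE).
wc = [(1.0, 1.0), (1.2, 0.9)]
cc = [(1.1, 1.0), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9)]
filtered_cc = dfac_filter(wc, cc, n_clusters=2)
print(filtered_cc)  # [(1.1, 1.0), (0.9, 1.1)]
```

In this toy example the two CC instances near (8, 8) form a cluster with no WC instance and are therefore discarded, while the two CC instances near the WC data are kept. The pairwise search is far too slow for real datasets; a library implementation of hierarchical clustering would be the practical choice.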

Cite (APA)

Yu, X., Zhang, J., Zhou, P., & Liu, J. (2017). A data filtering method based on agglomerative clustering. In Proceedings of the International Conference on Software Engineering and Knowledge Engineering, SEKE (pp. 392–397). Knowledge Systems Institute Graduate School. https://doi.org/10.18293/SEKE2017-043
