A Comparative Study of Data Collection Periods for Just-In-Time Defect Prediction Using the Automatic Machine Learning Method

Kosuke Ohara; Hirohisa Aman; Sousuke Amasaki; Tomoyuki Yokogawa; Minoru Kawahara

Journal Article (Open Access)

IEICE Transactions on Information and Systems (2023) E106D(2) 166-169

DOI: 10.1587/transinf.2022MPL0002


Abstract

This paper focuses on the "data collection period" for training a better Just-In-Time (JIT) defect prediction model (early commit data vs. recent commit data) and conducts a large-scale comparative study to explore an appropriate data collection period. Because many machine learning algorithms could be used to train defect prediction models, the choice of algorithm can itself become a threat to validity. Hence, this study adopts the automatic machine learning method to mitigate this selection bias in the comparative study. The empirical results on 122 open-source software projects show a trend that a dataset composed of recent commits tends to be a better training set for JIT defect prediction models.
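The comparison of "early" vs. "recent" training data can be pictured as choosing two windows from a project's chronologically ordered commit history, both of which precede a held-out test period. The sketch below illustrates that idea only; it is not the authors' procedure. The `Commit` record, the `split_by_period` function, and the `window` and `test_size` parameters are hypothetical, and the abstract does not state the actual window sizes, test setup, or labeling method used in the study.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Commit:
    # Hypothetical minimal record: an ID, a timestamp, and a label saying
    # whether the commit was later found to induce a defect. How such labels
    # are obtained is outside the scope of this sketch.
    commit_id: str
    timestamp: int
    is_defective: bool


def split_by_period(commits: List[Commit], window: int, test_size: int
                    ) -> Tuple[List[Commit], List[Commit], List[Commit]]:
    """Return (early_train, recent_train, test) in chronological order.

    The newest `test_size` commits form the test set. From the remaining
    history, the `window` oldest commits form the "early" training set and
    the `window` newest form the "recent" training set. Both training sets
    precede the test set in time, so no future commits leak into training.
    """
    if window <= 0 or test_size <= 0:
        raise ValueError("window and test_size must be positive")
    ordered = sorted(commits, key=lambda c: c.timestamp)
    if len(ordered) < window + test_size:
        raise ValueError("not enough commits for the requested windows")
    history, test = ordered[:-test_size], ordered[-test_size:]
    early = history[:window]      # oldest commits in the history
    recent = history[-window:]    # commits just before the test period
    return early, recent, test


# Usage with ten synthetic commits (timestamps 1000..1009).
commits = [Commit(f"c{i}", timestamp=1000 + i, is_defective=(i % 4 == 0))
           for i in range(10)]
early, recent, test = split_by_period(commits, window=3, test_size=2)
print([c.commit_id for c in early])   # ['c0', 'c1', 'c2']
print([c.commit_id for c in recent])  # ['c5', 'c6', 'c7']
print([c.commit_id for c in test])    # ['c8', 'c9']
```

In a study like the one described, a model would then be trained separately on `early` and on `recent` (here the abstract says an automatic machine learning method chooses the learner) and both models evaluated on the same `test` commits, so that the only difference between them is the data collection period.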

Citation (APA)

Ohara, K., Aman, H., Amasaki, S., Yokogawa, T., & Kawahara, M. (2023). A Comparative Study of Data Collection Periods for Just-In-Time Defect Prediction Using the Automatic Machine Learning Method. IEICE Transactions on Information and Systems, E106D(2), 166–169. https://doi.org/10.1587/transinf.2022MPL0002