Mining extremely small data sets with application to software reuse

Yuan Jiang; Ming Li; Zhi Hua Zhou

Journal Article, Open Access

Software - Practice and Experience (2009) 39(4) 423-440

DOI: 10.1002/spe.905

14 Citations

29 Readers

Abstract

A serious problem facing machine learning and data mining techniques in software engineering is the lack of sufficient data. For example, the largest current data set on software reuse contains only 24 examples. In this paper, a recently proposed machine learning algorithm is modified for mining extremely small data sets. The algorithm works in a twice-learning style: first, a random forest is trained on the original data set; then, virtual examples are generated from the random forest and used to train a single decision tree. In contrast to the numerous discrepancies between empirical data and expert opinions reported by previous research, our mining practice shows that the empirical data are actually consistent with expert opinions. Copyright © 2008 John Wiley & Sons, Ltd.
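The twice-learning idea described in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how virtual examples are generated or which tree learner is used, so the sketch assumes uniform sampling within each feature's observed range, Gini-based binary splits on numeric features, and a square-root-sized random feature subset per split. All function names and parameters (`build_tree`, `train_forest`, `twice_learning`, `n_virtual`, and so on) are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, rng=None, max_depth=4, depth=0):
    """Grow a binary tree. A leaf is a bare label; an internal node is
    (feature, threshold, left, right). Labels must not be tuples.
    If rng is given, each split sees only a random feature subset."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    n_feat = len(rows[0])
    feats = list(range(n_feat))
    if rng is not None:  # assumption: sqrt-sized subset, as is common for forests
        feats = rng.sample(feats, max(1, int(n_feat ** 0.5)))
    best = None  # (weighted impurity, feature, threshold)
    for f in feats:
        # Dropping the largest value keeps both sides of every split non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no feature offers a usable split
        return majority
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li], rng, max_depth, depth + 1),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri], rng, max_depth, depth + 1))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(rows, labels, n_trees=25, seed=0):
    """First learning stage: trees grown on bootstrap samples."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over the trees in the forest."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

def twice_learning(rows, labels, n_virtual=200, seed=0):
    """Second learning stage: label virtual examples with the forest,
    then fit one tree on the original plus virtual examples."""
    forest = train_forest(rows, labels, seed=seed)
    rng = random.Random(seed + 1)
    lo = [min(col) for col in zip(*rows)]
    hi = [max(col) for col in zip(*rows)]
    # Assumption: virtual inputs are drawn uniformly within each feature's range.
    virtual = [tuple(rng.uniform(a, b) for a, b in zip(lo, hi)) for _ in range(n_virtual)]
    v_labels = [predict_forest(forest, v) for v in virtual]
    return build_tree(list(rows) + virtual, list(labels) + v_labels)
```

Calling `twice_learning(rows, labels)` on a small list of numeric feature tuples returns a single tree that `predict_tree` can evaluate. The design intent, under these assumptions, is that the forest smooths over the variance of a tiny sample, while the final single tree stays small enough to inspect and compare against expert opinion.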

Cite

Jiang, Y., Li, M., & Zhou, Z. H. (2009). Mining extremely small data sets with application to software reuse. Software - Practice and Experience, 39(4), 423–440. https://doi.org/10.1002/spe.905
