A Data Extraction Algorithm from Open Source Software Project Repositories for Building Duration Estimation Models: Case Study of Github

  • K. Moulla D
  • Abran A
  • yang K

Abstract

Software project estimation is important for allocating resources and planning a reasonable work schedule. Estimation models are typically built using data from completed projects. While organizations maintain their own historical data repositories, it is difficult to obtain their cooperation in sharing them because of privacy and competitive concerns. To overcome this lack of public access to private data repositories, this study proposes an algorithm to extract sufficient data from GitHub repositories to build duration estimation models. More specifically, this study extracts and analyses historical data on WordPress projects to estimate OSS project duration. It uses commits as an independent variable, together with an improved classification of contributors based on the number of active days of each contributor within a release period. The results indicate that duration estimation models built with data from OSS repositories perform well and partially solve the problem of the lack of data encountered in empirical software engineering research.
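The abstract names three ingredients: commits as the independent variable, a classification of contributors by the number of active days within a release period, and a duration estimation model. The Python sketch below illustrates how these pieces could fit together. It is not the authors' extraction algorithm or their fitted model, and it rests on assumptions that do not come from the paper. Commit records are assumed to be available already as (author, date) pairs, for example parsed from `git log` output or retrieved from the GitHub REST API commits endpoint. The class thresholds `CORE_MIN_DAYS` and `REGULAR_MIN_DAYS` and the class names are hypothetical. The model is a plain single-variable least-squares line.

```python
from collections import defaultdict
from datetime import date

# Hypothetical thresholds, chosen only for illustration (not from the paper).
CORE_MIN_DAYS = 10     # active on at least 10 days in the release -> "core"
REGULAR_MIN_DAYS = 3   # active on at least 3 days -> "regular"


def active_days(commits, start, end):
    """Count distinct commit dates per contributor within [start, end].

    `commits` is an iterable of (author, commit_date) pairs, where
    commit_date is a datetime.date. Several commits on the same day
    count as one active day.
    """
    days = defaultdict(set)
    for author, day in commits:
        if start <= day <= end:
            days[author].add(day)
    return {author: len(seen) for author, seen in days.items()}


def classify(active):
    """Map each contributor to a class using the hypothetical thresholds."""
    classes = {}
    for author, n_days in active.items():
        if n_days >= CORE_MIN_DAYS:
            classes[author] = "core"
        elif n_days >= REGULAR_MIN_DAYS:
            classes[author] = "regular"
        else:
            classes[author] = "occasional"
    return classes


def fit_line(commit_counts, durations):
    """Ordinary least squares fit of duration = intercept + slope * commits."""
    n = len(commit_counts)
    if n < 2 or n != len(durations):
        raise ValueError("need at least two releases with matching data")
    mean_x = sum(commit_counts) / n
    mean_y = sum(durations) / n
    sxx = sum((x - mean_x) ** 2 for x in commit_counts)
    if sxx == 0:
        raise ValueError("commit counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(commit_counts, durations))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    # Toy commit log for one release window (illustrative data only).
    log = [("alice", date(2020, 1, d)) for d in range(1, 13)]
    log += [("bob", date(2020, 1, 2)), ("bob", date(2020, 1, 2)),
            ("bob", date(2020, 1, 5)), ("bob", date(2020, 1, 9)),
            ("carol", date(2020, 2, 1))]  # carol falls outside the window
    active = active_days(log, date(2020, 1, 1), date(2020, 1, 31))
    print(active, classify(active))

    # Toy history: commits per release vs. release duration in days.
    intercept, slope = fit_line([100, 200, 300], [30, 50, 70])
    print(f"duration ~ {intercept:.1f} + {slope:.2f} * commits")
```

With this toy input, alice is active on 12 distinct days and bob on 3, because his two commits on 2 January count once. Carol's commit lies outside the window and is ignored. The toy fit is duration ≈ 10 + 0.2 × commits. A real study would need to justify its thresholds and model form against its own release data.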

Cite (APA style)

K. Moulla, D., Abran, A., & yang, K. (2020). A Data Extraction Algorithm from Open Source Software Project Repositories for Building Duration Estimation Models: Case Study of Github. International Journal of Software Engineering & Applications, 11(6), 31–46. https://doi.org/10.5121/ijsea.2020.11603
