Creating Evolving Project Data Sets in Software Engineering


Abstract

While the amount of research in the area of software engineering is ever increasing, selecting a research data set remains a challenge. Quite a number of data sets have been proposed, but we still lack a systematic approach to creating ones that would evolve together with the industry. We aim to present a systematic method of selecting data sets of industry-relevant software projects for the purposes of software engineering research. We present a set of guidelines for filtering GitHub projects and implement those guidelines in the form of an R script. In particular, we mostly select projects from the biggest industrial open source contributors and remove from the data set any project that falls in the first quartile in any of several categories. We use the latest GitHub GraphQL API to select the desired set of repositories. We evaluate the technique on Java projects. The presented technique systematizes the creation and evolution of software development data sets. The proposed algorithm has reasonable precision, between 0.65 and 0.80, and can be used as a baseline for further refinements.
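
The first-quartile filter mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' R script: it is written in Python, and the metric names (stars, commits, contributors), the sample values, the choice of quartile method, and the decision to treat a value exactly at the cut point as falling in the first quartile are all assumptions made for illustration. The abstract does not name the categories actually used.

```python
from statistics import quantiles


def drop_first_quartile(projects, metrics):
    """Keep only projects strictly above the first quartile in every metric.

    Assumption: a project is removed if it is at or below the Q1 cut point
    in any one of the given metrics.
    """
    # Q1 cut point per metric, computed over the full candidate set.
    cutoffs = {
        m: quantiles([p[m] for p in projects], n=4, method="inclusive")[0]
        for m in metrics
    }
    return [p for p in projects if all(p[m] > cutoffs[m] for m in metrics)]


# Hypothetical candidates; names, metrics, and values are illustrative only.
candidates = [
    {"name": "a", "stars": 10, "commits": 500, "contributors": 3},
    {"name": "b", "stars": 200, "commits": 50, "contributors": 30},
    {"name": "c", "stars": 300, "commits": 600, "contributors": 2},
    {"name": "d", "stars": 400, "commits": 700, "contributors": 50},
    {"name": "e", "stars": 500, "commits": 800, "contributors": 60},
    {"name": "f", "stars": 600, "commits": 20, "contributors": 40},
]

kept = drop_first_quartile(candidates, ["stars", "commits", "contributors"])
print([p["name"] for p in kept])
```

Because a project is dropped when it is weak in any single category, the filter removes more than a quarter of the candidates whenever the low-scoring projects differ from one category to the next, as they do in this sample.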


Lewowski, T., & Madeyski, L. (2020). Creating Evolving Project Data Sets in Software Engineering. In Studies in Computational Intelligence (Vol. 851, pp. 1–14). Springer Verlag. https://doi.org/10.1007/978-3-030-26574-8_1
