Large-scale dataset of local Java software build results

Matúš Sulír; Michaela Bačíková; Matej Madeja; Sergej Chodarev; Ján Juhár

Journal Article, Open Access

Data (2020) 5(3) 1-11

DOI: 10.3390/data5030086

Abstract

When a person decides to inspect or modify a third-party software project, the first necessary step is its successful compilation from source code using a build system. However, such attempts often end in failure. In this data descriptor paper, we provide a dataset of build results of open-source Java software systems. We tried to automatically build a large number of Java projects from GitHub using their Maven, Gradle, and Ant build scripts in a Docker container simulating a standard programmer’s environment. The dataset consists of the output of two executions: 7264 build logs from a study executed in 2016 and 7233 logs from the 2020 execution. In addition to the logs, we collected exit codes, file counts, and various project metadata. The proportion of failed builds in our dataset is 38% in the 2016 execution and 59% in the 2020 execution. The published data can be helpful for multiple purposes, such as correlation analysis of factors affecting build success, build failure prediction, and research in the area of build breakage repair.

Citation (APA)

Sulír, M., Bačíková, M., Madeja, M., Chodarev, S., & Juhár, J. (2020). Large-scale dataset of local Java software build results. Data, 5(3), 1–11. https://doi.org/10.3390/data5030086
