A large-scale study on repetitiveness, containment, and composability of routines in open-source projects

12Citations
Citations of this article
35Readers
Mendeley users who have this article in their library.

Abstract

Source code in software systems has been shown to have a good degree of repetitiveness at the lexical, syntactical, and API usage levels. This paper presents a large-scale study on the repetitiveness, containment, and composability of source code at the semantic level. We collected a large dataset consisting of 9,224 Java projects with 2.79M class files, 17.54M methods with 187M SLOCs. For each method in a project, we build the program dependency graph (PDG) to represent a routine, and compare PDGs with one another as well as the subgraphs within them. We found that within a project, 12.1% of the routines are repeated, and most of them repeat from 2-7 times. As entirety, the routines are quite project-specific with only 3.3% of them exactly repeating in 1-4 other projects with at most 8 times. We also found that 26.1% and 7.27% of the routines are contained in other routine(s), i.e., implemented as part of other routine(s) elsewhere within a project and in other projects, respectively. Except for trivial routines, their repetitiveness and containment is independent of their complexity. Defining a subroutine via a per-variable slicing subgraph in a PDG, we found that 14.3% of all routines have all of their subroutines repeated. A high percentage of subroutines in a routine can be found/reused elsewhere. We collected 8,764,971 unique subroutines (with 323,564 unique JDK subroutines) as basic units for code searching/synthesis. We also provide practical implications of our findings to automated tools.

Cite

CITATION STYLE

APA

Nguyen, A. T., Nguyen, H. A., & Nguyen, T. N. (2016). A large-scale study on repetitiveness, containment, and composability of routines in open-source projects. In Proceedings - 13th Working Conference on Mining Software Repositories, MSR 2016 (pp. 362–373). Association for Computing Machinery, Inc. https://doi.org/10.1145/2901739.2901759

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free