Mining concepts from code with probabilistic topic models

Abstract

We develop and apply statistical topic models to software as a means of extracting concepts from source code. The effectiveness of the technique is demonstrated on 1,555 projects from SourceForge and Apache consisting of 113,000 files and 19 million lines of code. In addition to providing an automated, unsupervised solution to the problem of summarizing program functionality, the approach provides a probabilistic framework with which to analyze and visualize source file similarity. Finally, we introduce an information-theoretic approach for computing tangling and scattering of extracted concepts, and present preliminary results. Copyright 2007 ACM.
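
As a rough illustration of the information-theoretic idea mentioned in the abstract, the Python sketch below starts from a file-by-topic weight matrix such as a topic model might produce. The abstract gives no formulas, so everything here is an assumption made for illustration: tangling is taken to be the entropy of a file's distribution over topics, scattering the entropy of a topic's distribution over files, and file similarity is measured with the Jensen-Shannon divergence. The function names and the toy matrix are hypothetical and are not taken from the paper.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def entropy(dist):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def tangling(doc_topic):
    """Assumed definition: entropy of each file's distribution over topics."""
    return [entropy(normalize(row)) for row in doc_topic]


def scattering(doc_topic):
    """Assumed definition: entropy of each topic's distribution over files."""
    return [entropy(normalize(col)) for col in zip(*doc_topic)]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two topic distributions."""
    p, q = normalize(p), normalize(q)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; y > 0 wherever x > 0 because y is the mixture.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical toy data: 3 files (rows) by 2 topics (columns).
doc_topic = [
    [1.0, 0.0],  # file devoted entirely to topic 0
    [0.5, 0.5],  # file split evenly between both topics
    [0.0, 1.0],  # file devoted entirely to topic 1
]

file_tangling = tangling(doc_topic)
topic_scattering = scattering(doc_topic)
distance_first_last = js_divergence(doc_topic[0], doc_topic[2])
```

Under these assumed definitions, a file that mixes many topics gets high tangling, a topic spread across many files gets high scattering, and two files with disjoint topic mixtures sit at the maximum Jensen-Shannon divergence of 1 bit.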

Citation (APA)

Linstead, E., Rigor, P., Bajracharya, S., Lopes, C., & Baldi, P. (2007). Mining concepts from code with probabilistic topic models. In ASE’07 - 2007 ACM/IEEE International Conference on Automated Software Engineering (pp. 461–464). https://doi.org/10.1145/1321631.1321709
