Mining concepts from code with probabilistic topic models

  • Linstead E
  • Rigor P
  • Bajracharya S
 et al. 


We develop and apply statistical topic models to software as a means of extracting concepts from source code. The effectiveness of the technique is demonstrated on 1,555 projects from SourceForge and Apache consisting of 113,000 files and 19 million lines of code. In addition to providing an automated, unsupervised solution to the problem of summarizing program functionality, the approach provides a probabilistic framework with which to analyze and visualize source file similarity. Finally, we introduce an information-theoretic approach for computing tangling and scattering of extracted concepts and present preliminary results.
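
The abstract does not spell out the tangling and scattering formulas, so the following is only a rough sketch of one plausible entropy-based reading, not the authors' implementation. It assumes a file-by-topic weight matrix has already been produced by some topic model; the matrix values and the function names (normalized_entropy, scattering, tangling) are hypothetical and chosen for illustration.

```python
import math


def normalized_entropy(weights):
    """Shannon entropy of a non-negative weight vector, scaled to [0, 1]."""
    total = sum(weights)
    if total <= 0 or len(weights) < 2:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    # Divide by the maximum possible entropy (uniform distribution).
    return entropy / math.log(len(weights))


def scattering(doc_topic, topic):
    """Assumed measure: how evenly one topic is spread across files."""
    return normalized_entropy([row[topic] for row in doc_topic])


def tangling(doc_topic, doc):
    """Assumed measure: how evenly one file is spread across topics."""
    return normalized_entropy(doc_topic[doc])


# Hypothetical file-by-topic weights (rows: files, columns: topics).
doc_topic = [
    [0.9, 0.1],  # file 0: mostly topic 0
    [0.1, 0.9],  # file 1: mostly topic 1
    [0.5, 0.5],  # file 2: an even mix of both topics
]

for topic in range(2):
    print("scattering of topic", topic, "=", scattering(doc_topic, topic))
for doc in range(3):
    print("tangling of file", doc, "=", tangling(doc_topic, doc))
```

Under this reading, a value near 0 means a concept is concentrated in a single file (or a file implements a single concept), while a value near 1 means it is spread evenly.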

Author-supplied keywords

  • mining software
  • program understanding
  • topic models



  • Erik Linstead

  • Paul Rigor

  • Sushil Bajracharya

  • Cristina Lopes

  • Pierre Baldi
