Abstract
The large corpora of software-related documents now available online present an opportunity to use machine learning to improve integrated development environments, starting with the automatic collection of code examples and their associated descriptions. Digital libraries of computer science research and education conference and journal articles can be a rich source of code examples that are used to motivate or explain particular concepts or issues. Because they serve as examples in an article, these code examples are accompanied by natural language descriptions of their functionality, properties, or other associated information. Identifying code segments in these documents is relatively straightforward, so this paper tackles the problem of extracting the natural language text that is associated with each code segment in an article. We present and evaluate a set of heuristics that address the challenge that, unlike in developer communications such as online forums, this text is often not colocated with the code segment.
Cite
Chatterjee, P., Gause, B., Hedinger, H., & Pollock, L. (2017). Extracting code segments and their descriptions from research articles. In IEEE International Working Conference on Mining Software Repositories (Vol. 0, pp. 91–101). IEEE Computer Society. https://doi.org/10.1109/MSR.2017.10