Extracting code segments and their descriptions from research articles

Preetha Chatterjee; Benjamin Gause; Hunter Hedinger; Lori Pollock

Conference Proceedings


IEEE International Working Conference on Mining Software Repositories (2017), vol. 0, pp. 91-101

DOI: 10.1109/MSR.2017.10

10 Citations

40 Readers


Abstract

The availability of large corpora of online software-related documents today presents an opportunity to use machine learning to improve integrated development environments by first automatically collecting code examples along with associated descriptions. Digital libraries of computer science research and education conference and journal articles can be a rich source for code examples that are used to motivate or explain particular concepts or issues. Because they are used as examples in an article, these code examples are accompanied by descriptions of their functionality, properties, or other associated information expressed in natural language text. Identifying code segments in these documents is relatively straightforward; thus, this paper tackles the problem of extracting the natural language text that is associated with each code segment in an article. We present and evaluate a set of heuristics that address the challenge that, unlike in developer communications such as online forums, the text is often not colocated with the code segment.


Cite (APA)

Chatterjee, P., Gause, B., Hedinger, H., & Pollock, L. (2017). Extracting code segments and their descriptions from research articles. In IEEE International Working Conference on Mining Software Repositories (Vol. 0, pp. 91–101). IEEE Computer Society. https://doi.org/10.1109/MSR.2017.10
