Programming with “big code”

Eran Yahav

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9458 3-8

DOI: 10.1007/978-3-319-26529-2_1

Abstract

The vast amount of code available on the web is increasing daily. Open-source hosting sites such as GitHub contain billions of lines of code. Community question-answering sites provide millions of code snippets with corresponding text and metadata. The amount of code available in executable binaries is even greater. In this talk, I will cover recent research trends on leveraging such “big code” for program analysis, program synthesis, and reverse engineering. We will consider a range of semantic representations based on symbolic automata [11,15], tracelets [3], numerical abstractions [13,14], and textual descriptions [1,22], as well as different notions of code similarity based on these representations. To leverage these semantic representations, we will consider a number of prediction techniques, including statistical language models [19,20], variable-order Markov models [2], and other distance-based and model-based sequence classification techniques. Finally, I will show applications of these techniques, including semantic code search in both source code and stripped binaries, code completion, and reverse engineering.
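
As a rough illustration of how a statistical language model can drive code completion, the sketch below trains a bigram model over sequences of API calls and suggests the most frequent successors of the last call. It is a minimal sketch and not the method of the cited work: the function names (`train_bigram_model`, `suggest_next`) and the tiny corpus are hypothetical, and real systems use richer program representations and smoothing.

```python
from collections import Counter, defaultdict


def train_bigram_model(sequences):
    """Count how often each token directly follows each preceding token."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def suggest_next(counts, prev_token, k=2):
    """Return up to k of the most frequent successors of prev_token."""
    successors = counts.get(prev_token, Counter())
    return [tok for tok, _ in successors.most_common(k)]


# Hypothetical corpus: each list is a sequence of API calls seen in one program.
corpus = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
    ["open", "write", "close"],
]

model = train_bigram_model(corpus)
suggestions = suggest_next(model, "open")
print(suggestions)
```

A variable-order Markov model extends this idea by conditioning on contexts of differing lengths rather than on a single preceding token.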

Cite

Yahav, E. (2015). Programming with “big code.” In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9458, pp. 3–8). Springer Verlag. https://doi.org/10.1007/978-3-319-26529-2_1
