The amount of code available on the web is increasing on a daily basis. Open-source hosting sites such as GitHub contain billions of lines of code. Community question-answering sites provide millions of code snippets with corresponding text and metadata. The amount of code available in executable binaries is even greater. In this talk, I will cover recent research trends on leveraging such “big code” for program analysis, program synthesis and reverse engineering. We will consider a range of semantic representations based on symbolic automata [11,15], tracelets [3], numerical abstractions [13,14], and textual descriptions [1,22], as well as different notions of code similarity based on these representations. To leverage these semantic representations, we will consider a number of prediction techniques, including statistical language models [19,20], variable-order Markov models [2], and other distance-based and model-based sequence classification techniques. Finally, I will show applications of these techniques, including semantic code search in both source code and stripped binaries, code completion, and reverse engineering.
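As a rough illustration of how a statistical language model can drive code completion, the following is a minimal sketch, not the method of the cited work. It assumes programs have already been abstracted into sequences of API call names; the toy corpus and the helper names `train_bigram` and `complete` are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def complete(counts, prev, k=1):
    """Return up to k most frequent successors of `prev` (empty if unseen)."""
    successors = counts.get(prev, Counter())
    return [tok for tok, _ in successors.most_common(k)]


# Hypothetical corpus: each program is abstracted to a sequence of call names.
corpus = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
    ["open", "write", "close"],
]
model = train_bigram(corpus)
suggestion = complete(model, "open")  # most frequent call observed after "open"
```

A bigram model conditions on a single preceding token. A variable-order Markov model would instead condition on contexts of varying length, backing off to shorter contexts when a longer one has not been observed.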
Yahav, E. (2015). Programming with “big code.” In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9458, pp. 3–8). Springer Verlag. https://doi.org/10.1007/978-3-319-26529-2_1