Programming with “big code”

Abstract

The amount of code available on the web is vast and grows daily. Open-source hosting sites such as GitHub contain billions of lines of code. Community question-answering sites provide millions of code snippets with corresponding text and metadata. The amount of code available in executable binaries is even greater. In this talk, I will cover recent research trends in leveraging such “big code” for program analysis, program synthesis, and reverse engineering. We will consider a range of semantic representations based on symbolic automata [11,15], tracelets [3], numerical abstractions [13,14], and textual descriptions [1,22], as well as different notions of code similarity based on these representations. To leverage these semantic representations, we will consider a number of prediction techniques, including statistical language models [19,20], variable-order Markov models [2], and other distance-based and model-based sequence classification techniques. Finally, I will show applications of these techniques, including semantic code search in both source code and stripped binaries, code completion, and reverse engineering.
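
As a rough illustration of how a statistical language model can drive code completion, the sketch below trains a bigram model over sequences of API calls and ranks likely next calls. It is a minimal example under stated assumptions, not the method of the cited works: the function names `train_bigram` and `suggest` and the toy corpus of call sequences are hypothetical, and a fixed-order bigram model stands in for the richer language models and variable-order Markov models mentioned in the abstract.

```python
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count how often each token directly follows another token."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def suggest(counts, prev, k=3):
    """Return up to k most frequent successors of `prev` with relative frequencies."""
    followers = counts.get(prev)
    if not followers:
        return []  # token never seen as a predecessor
    total = sum(followers.values())
    return [(tok, n / total) for tok, n in followers.most_common(k)]


# Hypothetical corpus: each list is a sequence of API calls from one code fragment.
corpus = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "read", "close"],
]

model = train_bigram(corpus)
print(suggest(model, "open"))
```

In this toy corpus, `read` follows `open` in two of three sequences, so it is ranked first as a completion candidate. A variable-order model would condition on a longer, adaptively chosen history instead of a single preceding token.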

Yahav, E. (2015). Programming with “big code.” In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9458, pp. 3–8). Springer Verlag. https://doi.org/10.1007/978-3-319-26529-2_1
