An extended benchmark system of word embedding methods for vulnerability detection

1Citations
Citations of this article
9Readers
Mendeley users who have this article in their library.

Abstract

Security researchers have used Natural Language Processing (NLP) and Deep Learning techniques for programming code analysis tasks such as automated bug detection and vulnerability prediction or classification. These studies mainly generate the input vectors for the deep learning models based on the NLP embedding methods. Nevertheless, while there are many existing embedding methods, the structures of neural networks are diverse and usually heuristic. This makes it difficult to select effective combinations of neural models and the embedding techniques for training the code vulnerability detectors. To address this challenge, we extended a benchmark system to analyze the compatibility of four popular word embedding techniques with four different neural networks, including the standard Bidirectional Long Short-Term Memory (Bi-LSTM), the Bi-LSTM applied attention mechanism, the Convolutional Neural Network (CNN), and the classic Deep Neural Network (DNN). We trained and tested the models by using two types of vulnerable function datasets written in C code. Our results revealed that the Bi-LSTM model combined with the FastText embedding technique showed the most efficient detection rate on a real-world but not on an artificially constructed dataset. Further comparisons with the other combinations are also discussed in detail in our result.

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Cite

CITATION STYLE

APA

Ngoc, H. N., Viet, H. N., & Uehara, T. (2020). An extended benchmark system of word embedding methods for vulnerability detection. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/3440749.3442661

Readers' Seniority

Tooltip

PhD / Post grad / Masters / Doc 3

75%

Lecturer / Post doc 1

25%

Readers' Discipline

Tooltip

Computer Science 4

100%

Save time finding and organizing research with Mendeley

Sign up for free