An extended benchmark system of word embedding methods for vulnerability detection

Hai Nguyen Ngoc; Hoang Nguyen Viet; Tetsutaro Uehara

Conference Proceedings, Open Access

ACM International Conference Proceeding Series (2020)

DOI: 10.1145/3440749.3442661

Abstract

Security researchers have used Natural Language Processing (NLP) and Deep Learning techniques for programming code analysis tasks such as automated bug detection and vulnerability prediction or classification. These studies mainly generate the input vectors for the deep learning models using NLP embedding methods. However, many embedding methods exist, and neural network structures are diverse and usually chosen heuristically. This makes it difficult to select effective combinations of neural models and embedding techniques for training code vulnerability detectors. To address this challenge, we extended a benchmark system to analyze the compatibility of four popular word embedding techniques with four different neural networks: the standard Bidirectional Long Short-Term Memory (Bi-LSTM), the Bi-LSTM with an attention mechanism, the Convolutional Neural Network (CNN), and the classic Deep Neural Network (DNN). We trained and tested the models on two types of datasets of vulnerable functions written in C. Our results revealed that the Bi-LSTM model combined with the FastText embedding technique showed the most efficient detection rate on a real-world dataset, but not on an artificially constructed one. Further comparisons with the other combinations are discussed in detail in our results.
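The four-by-four comparison described in the abstract can be pictured as a grid of (embedding, network) pairs, each scored separately. The following is a minimal sketch of such a grid, not the authors' benchmark system. Only FastText is named in the abstract, so the other three embedding names are placeholders, and `run_grid`, `best_pair`, and the scoring callback are hypothetical helpers introduced for illustration.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

Pair = Tuple[str, str]

# The four network types named in the abstract.
MODELS = ["Bi-LSTM", "Bi-LSTM+attention", "CNN", "DNN"]

# Only FastText is named in the abstract. The other three entries are
# placeholders (assumptions), not the methods the authors evaluated.
EMBEDDINGS = ["FastText", "embedding_B", "embedding_C", "embedding_D"]


def run_grid(
    embeddings: Iterable[str],
    models: Iterable[str],
    evaluate: Callable[[str, str], float],
) -> Dict[Pair, float]:
    """Score every (embedding, model) pair with a caller-supplied function."""
    return {(e, m): evaluate(e, m) for e, m in product(embeddings, models)}


def best_pair(scores: Dict[Pair, float]) -> Pair:
    """Return the pair with the highest score (first one wins on ties)."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Placeholder scorer: a real one would train and test a detector
    # for the given pair and return its detection metric.
    scores = run_grid(EMBEDDINGS, MODELS, lambda e, m: 0.0)
    print(len(scores))  # 16 combinations
```

Passing the scorer as a callback keeps the grid independent of any particular training framework or dataset, so the same loop could be run once per dataset type.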

Citation (APA)

Ngoc, H. N., Viet, H. N., & Uehara, T. (2020). An extended benchmark system of word embedding methods for vulnerability detection. In ACM International Conference Proceeding Series. Association for Computing Machinery. https://doi.org/10.1145/3440749.3442661
