An extensive study of the effects of different deep learning models on code vulnerability detection in Python code

Abstract

Deep learning has achieved great progress in automated code vulnerability detection, and several detection approaches based on deep learning have been proposed. However, few studies have empirically examined the impact of different deep learning models on code vulnerability detection in Python. For this reason, we strive to cover a broader range of code representation learning models and classification models for vulnerability detection. We design and conduct an empirical study evaluating eighteen deep learning architectures in total, derived from combinations of three representation learning models, i.e., Word2Vec, fastText, and CodeBERT, and six classification models, i.e., random forest, XGBoost, Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU). Additionally, two machine learning strategies, i.e., the attention and bi-directional mechanisms, are also empirically compared, and the statistical significance and effect sizes between different models are analyzed. In terms of precision, recall, and F-score, Word2Vec outperforms CodeBERT and fastText. Likewise, LSTM and GRU are superior to the other classification models we studied. The bi-directional LSTM and GRU with attention, both using Word2Vec embeddings, are the two optimal models for code vulnerability detection in Python, and they show medium or large effect sizes relative to LSTM and GRU models that use only a single mechanism. Both the representation learning models and the classification models have an important influence on vulnerability detection in Python code, and the bi-directional and attention mechanisms likewise impact detection performance.
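
The sketch below illustrates, in broad strokes, the kind of architecture the abstract identifies as best-performing: token sequences embedded with Word2Vec and classified by a bi-directional LSTM with attention. It is not the authors' implementation; all class names, dimensions, and hyperparameters are illustrative assumptions.

```python
# A minimal sketch of a bi-directional LSTM with attention over
# pre-computed Word2Vec token embeddings (assumed 100-dimensional).
import torch
import torch.nn as nn

class BiLSTMAttentionClassifier(nn.Module):
    def __init__(self, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        # Bi-directional LSTM over the embedded code token sequence.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Learned attention scorer over the LSTM outputs at each time step.
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, embed_dim) Word2Vec embeddings of code tokens.
        outputs, _ = self.lstm(x)                        # (batch, seq_len, 2*hidden)
        scores = torch.softmax(self.attn(outputs), dim=1)  # attention weights
        context = (scores * outputs).sum(dim=1)          # weighted sum of states
        return self.classifier(context)                  # vulnerable / not-vulnerable logits

# Toy usage: a batch of 8 code snippets, each 200 tokens long.
model = BiLSTMAttentionClassifier()
logits = model(torch.randn(8, 200, 100))
print(logits.shape)  # torch.Size([8, 2])
```

Swapping `nn.LSTM` for `nn.GRU` yields the other top-performing configuration reported in the study; replacing the Word2Vec embeddings with fastText or CodeBERT vectors covers the remaining representation learning variants.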

Citation (APA)

Wang, R., Xu, S., Ji, X., Tian, Y., Gong, L., & Wang, K. (2024). An extensive study of the effects of different deep learning models on code vulnerability detection in Python code. Automated Software Engineering, 31(1). https://doi.org/10.1007/s10515-024-00413-4
