An extensive study of the effects of different deep learning models on code vulnerability detection in Python code

Abstract

Deep learning has achieved great progress in automated code vulnerability detection, and several detection approaches based on deep learning have been proposed. However, few studies have empirically examined the impact of different deep learning models on code vulnerability detection in Python. For this reason, we strive to cover a broader range of code representation learning models and classification models for vulnerability detection. We design and conduct an empirical study evaluating eighteen deep learning architectures in total, derived from combinations of three representation learning models, i.e., Word2Vec, fastText, and CodeBERT, and six classification models, i.e., random forest, XGBoost, Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU). Additionally, two machine learning strategies, i.e., the attention and bi-directional mechanisms, are also empirically compared, and the statistical significance and effect sizes between different models are analyzed. In terms of precision, recall, and F-score, Word2Vec outperforms CodeBERT and fastText. Likewise, LSTM and GRU are superior to the other classification models we studied. The bi-directional LSTM and GRU with attention, both using Word2Vec embeddings, are the two optimal models for code vulnerability detection in Python, and they show medium or large effect sizes relative to LSTM and GRU models that use only a single mechanism. Both the representation learning models and the classification models have an important influence on vulnerability detection in Python code, and the bi-directional and attention mechanisms likewise impact detection performance.
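
The sketch below illustrates, in broad strokes, the kind of architecture the abstract identifies as best-performing: token sequences embedded with Word2Vec and classified by a bi-directional LSTM with attention. It is not the authors' implementation; all class names, dimensions, and hyperparameters are illustrative assumptions.

```python
# A minimal sketch of a bi-directional LSTM with attention over
# pre-computed Word2Vec token embeddings (assumed 100-dimensional).
import torch
import torch.nn as nn

class BiLSTMAttentionClassifier(nn.Module):
    def __init__(self, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        # Bi-directional LSTM over the embedded code token sequence.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Learned attention scorer over the LSTM outputs at each time step.
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, embed_dim) Word2Vec embeddings of code tokens.
        outputs, _ = self.lstm(x)                        # (batch, seq_len, 2*hidden)
        scores = torch.softmax(self.attn(outputs), dim=1)  # attention weights
        context = (scores * outputs).sum(dim=1)          # weighted sum of states
        return self.classifier(context)                  # vulnerable / not-vulnerable logits

# Toy usage: a batch of 8 code snippets, each 200 tokens long.
model = BiLSTMAttentionClassifier()
logits = model(torch.randn(8, 200, 100))
print(logits.shape)  # torch.Size([8, 2])
```

Swapping `nn.LSTM` for `nn.GRU` yields the other top-performing configuration reported in the study; replacing the Word2Vec embeddings with fastText or CodeBERT vectors covers the remaining representation learning variants.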

Citation (APA)

Wang, R., Xu, S., Ji, X., Tian, Y., Gong, L., & Wang, K. (2024). An extensive study of the effects of different deep learning models on code vulnerability detection in Python code. Automated Software Engineering, 31(1). https://doi.org/10.1007/s10515-024-00413-4
