An attempt to read network traffic with doc2vec


Abstract

Detecting new malicious traffic is a challenging task. Many behavior-based detection methods extract features from malicious traffic, but many previous methods require prior knowledge of how to extract feature vectors. If attackers modify their attack techniques, these methods may need new feature representations to detect them. To address this problem, neural networks can be applied to learn features automatically. Doc2vec is one such model: it learns fixed-length feature representations from variable-length documents and has previously been applied to proxy logs. However, some attackers still use protocols other than HTTP or HTTPS. In this paper, we extend the previous method into a generic detection method that supports any protocol. The key idea of this research is to read network packets as natural language. In our method, a protocol analyzer reads network packets and summarizes the traffic, and Doc2vec then extracts a feature representation from that summary. We apply several classifiers to the automatically extracted feature representations and classify traffic as benign or malicious. In the fundamental experiment, the best F-measure reaches 0.98 in the timeline analysis and 0.97 in the cross-dataset validation. Furthermore, we generate imbalanced datasets that simulate actual network traffic. In the practical experiment, the best F-measure reaches 0.82 in the timeline analysis and 0.73 in the cross-dataset validation.
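
The step of reading packets "as natural language" can be pictured with a minimal sketch of document preparation: turning protocol-analyzer summary lines into word lists that a Doc2vec model could consume. The summary layout (`<src> -> <dst> <protocol> <info...>`), the sample lines, the choice to group words per source host, and the helper names `tokenize` and `build_documents` are all hypothetical assumptions for illustration, not the paper's actual format or code.

```python
import re
from collections import defaultdict

# Hypothetical protocol-analyzer summary lines: "<src> -> <dst> <protocol> <info...>".
# This layout is an assumption for illustration, not the paper's actual format.
SUMMARY = [
    "10.0.0.5 -> 93.184.216.34 TCP 49152 > 443 [SYN]",
    "10.0.0.5 -> 93.184.216.34 TLSv1.2 Client Hello",
    "10.0.0.9 -> 198.51.100.7 DNS Standard query A example.com",
]

# A "word" is a run of letters, digits, dots, or hyphens; symbols such as
# "[", "]" and ">" are discarded.
WORD_RE = re.compile(r"[A-Za-z0-9.\-]+")


def tokenize(info):
    """Lowercase the protocol and info fields and split them into word tokens."""
    return [w.lower() for w in WORD_RE.findall(info)]


def build_documents(lines):
    """Group tokens by source host so each host's traffic becomes one document.

    Grouping per source host is an assumption made for this sketch.
    """
    docs = defaultdict(list)
    for line in lines:
        parts = line.split(maxsplit=3)
        if len(parts) < 4 or parts[1] != "->":
            continue  # skip lines that do not match the assumed layout
        src, _arrow, _dst, rest = parts
        docs[src].extend(tokenize(rest))
    return dict(docs)


if __name__ == "__main__":
    for host, words in build_documents(SUMMARY).items():
        print(host, words)
```

Each word list could then be wrapped as a gensim `TaggedDocument(words, [host])` and used to train a `Doc2Vec` model, whose fixed-length vectors would feed the classifiers. How the paper actually tokenizes the summary and defines a document is not stated in the abstract.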

Mimura, M. (2019). An attempt to read network traffic with doc2vec. Journal of Information Processing, 27, 711–719. https://doi.org/10.2197/IPSJJIP.27.711