An attempt to read network traffic with doc2vec

Mamoru Mimura

Journal Article (Open Access)

Journal of Information Processing (2019) 27 711-719

DOI: 10.2197/IPSJJIP.27.711

Abstract

Detecting new malicious traffic is a challenging task. Many behavior-based detection methods extract features of malicious traffic, but many of these methods require prior knowledge of how to extract the feature vectors. If attackers modify their attack techniques, these methods may need a new feature representation to detect them. To address this problem, neural networks can be applied to perform feature learning. Doc2vec is one such model: it learns fixed-length feature representations from variable-length documents and has been applied to proxy logs. However, some attackers still use protocols other than HTTP or HTTPS. In this paper, we extend the previous method to a generic detection method that supports any protocol. The key idea of this research is to read network packets as natural language. In our method, a protocol analyzer reads network packets and summarizes the traffic. Our method then extracts the feature representation from the summary with Doc2vec. We apply several classifiers to the automatically extracted feature representation and classify traffic as benign or malicious. In the fundamental experiment, the best F-measure reaches 0.98 in the timeline analysis and 0.97 in the cross-dataset validation. Furthermore, we generate imbalanced datasets that simulate actual network traffic. In the practical experiment, the best F-measure reaches 0.82 in the timeline analysis and 0.73 in the cross-dataset validation.
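The abstract does not specify the summary format or how it is tokenized, so the following is only a minimal sketch of the "read packets as natural language" step under stated assumptions, not the author's implementation. It assumes hypothetical one-line-per-packet summaries (time, source, destination, protocol, info), treats each packet line as a sentence of words, and groups those sentences into one document per source host and 60-second window. SAMPLE_SUMMARY, tokenize_info, build_documents, the regular expression and the window size are all illustrative choices.

```python
import re
from collections import defaultdict

# Hypothetical protocol-analyzer summary lines (one per packet). The field
# layout "time src dst protocol info" is an assumption for illustration.
SAMPLE_SUMMARY = [
    "0.000 10.0.0.5 93.184.216.34 DNS Standard query A example.com",
    "0.021 93.184.216.34 10.0.0.5 DNS Standard query response A 93.184.216.34",
    "0.050 10.0.0.5 93.184.216.34 TCP 49152 > 80 [SYN]",
    "0.120 10.0.0.5 93.184.216.34 HTTP GET /index.html HTTP/1.1",
    "0.300 10.0.0.7 198.51.100.9 SMTP C: EHLO mail.local",
]

# A "word" starts with a letter; purely numeric fields (ports, IP
# addresses, version numbers) are dropped, which is an arbitrary choice.
TOKEN_RE = re.compile(r"[A-Za-z][A-Za-z0-9_.-]*")


def tokenize_info(protocol, info):
    """Turn one packet summary into words: the protocol name followed by
    the lower-cased tokens found in the info column."""
    words = [protocol.lower()]
    words.extend(token.lower() for token in TOKEN_RE.findall(info))
    return words


def build_documents(lines, window=60.0):
    """Group packet 'sentences' into one document per (source host,
    time-window index). Returns a dict mapping that key to a word list."""
    docs = defaultdict(list)
    for line in lines:
        time_s, src, _dst, protocol, info = line.split(" ", 4)
        key = (src, int(float(time_s) // window))
        docs[key].extend(tokenize_info(protocol, info))
    return dict(docs)


if __name__ == "__main__":
    for key, words in build_documents(SAMPLE_SUMMARY).items():
        print(key, words)
```

Each resulting word list could then be wrapped in gensim's `TaggedDocument` and used to train a `Doc2Vec` model, whose document vectors would be the input to an ordinary classifier. That step is left out here so the sketch stays dependency-free.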

Cite (APA)

Mimura, M. (2019). An attempt to read network traffic with doc2vec. Journal of Information Processing, 27, 711–719. https://doi.org/10.2197/IPSJJIP.27.711
