Detecting new malicious traffic is a challenging task. There are many behavior-based detection methods which extract the features of malicious traffic. However, many previous methods require knowledge of how to extract feature vectors. If attackers modify the attack techniques, these previous methods may have to extract new feature representation to detect them. To address this problem, neural networks can be applied to perform feature learning. Doc2vec is one of these models that learn fixed-length feature representation from variable-length documents and has been applied to proxy logs. However, some attackers still use protocols other than http or https. In this paper, we extend the previous method to a generic detection method which supports any protocol. The key idea of this research is reading network packets as a natural language. In our method, a protocol analyzer reads network packets, and summarizes the traffic. Our method extracts the feature representation from the summary with Doc2vec. We apply several classifiers to the automatically extracted feature representation, and classify traffic into benign and malicious traffic. In the fundamental experiment, the best F-measure achieves 0.98 in the timeline analysis and 0.97 in the cross-dataset validation. Furthermore, we generate imbalanced datasets which simulate actual network traffic. In the practical experiment, the best F-measure achieves 0.82 in the timeline analysis and 0.73 in the cross-dataset validation.
CITATION STYLE
Mimura, M. (2019). An attempt to read network traffic with doc2vec. Journal of Information Processing, 27, 711–719. https://doi.org/10.2197/IPSJJIP.27.711
Mendeley helps you to discover research relevant for your work.