Cyberattack techniques evolve continually, and detecting previously unseen drive-by-download (DbD) attacks or command-and-control (C&C) traffic remains a challenging task. Pattern-matching and blacklist-based techniques are no longer effective, because attackers can easily change their traffic patterns or infrastructure to avoid detection. Many behavior-based detection methods have therefore been proposed, which rely on immutable characteristics of the traffic. These previous methods, however, are tied to a specific attack technique: they can detect only the DbD attacks or C&C traffic that exhibit those immutable characteristics, and they require feature vectors to be devised. This paper proposes a generic detection method that is independent of the attack method and does not require devising feature vectors. Our method combines classifiers with Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length texts, and uses Paragraph Vector to capture the context in proxy server logs. We conducted cross-validation, timeline analysis, and cross-dataset validation with multiple datasets. The experimental results show that our method can detect unseen DbD attacks and C&C traffic in proxy server logs. The best F-measure was 0.97 in the timeline analysis and 0.96 in the cross-dataset validation.
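As a rough illustration of two ideas in the abstract, the sketch below shows how a proxy log line might be split into word-like tokens (the kind of variable-length "text" a Paragraph Vector model takes as input) and how an F-measure is computed from predicted labels. The log layout, the delimiter set, and the names `SAMPLE_LOG`, `tokenize_log_line`, and `f_measure` are all assumptions made for illustration; the abstract does not specify the tokenization, and this is not the authors' implementation.

```python
import re

# Hypothetical proxy log line; the field order and layout are assumptions.
SAMPLE_LOG = "1514764800.123 200 GET http://example.com/a/b.php?id=1&x=2 text/html"


def tokenize_log_line(line):
    """Split one log line into tokens (an assumed scheme, not the paper's)."""
    # Split on whitespace and common URL delimiters, dropping empty tokens.
    return [tok for tok in re.split(r"[\s/?=&:]+", line) if tok]


def f_measure(true_labels, predicted_labels, positive=1):
    """Return the F-measure (harmonic mean of precision and recall)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(tokenize_log_line(SAMPLE_LOG))
    # Toy labels: 1 = malicious, 0 = benign (made-up values, not paper data).
    print(f_measure([1, 1, 0, 0], [1, 0, 0, 1]))
```

In a fuller pipeline, each token list would be passed to a Paragraph Vector implementation to obtain a fixed-length vector, and those vectors would then be fed to a classifier; both steps are omitted here.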
Mimura, M., & Tanaka, H. (2018). Leaving all proxy server logs to paragraph vector. Journal of Information Processing, 26, 804–812. https://doi.org/10.2197/ipsjjip.26.804