Abstract
Modern malware imitates benign HTTP traffic to evade detection. To detect unseen malicious traffic, a linguistic detection method for proxy logs has been proposed. This method uses Paragraph Vector to extract features automatically. Generating a discriminative feature representation requires a balanced corpus. In actual proxy logs, however, benign traffic is dominant and overwhelms the feature representation of malicious traffic. Therefore, the previous method does not perform accurately in practical environments. This paper demonstrates that the previous method is not effective on actual proxy logs because of this imbalance. To mitigate the imbalance, our method extracts important words from proxy logs based on their TF-IDF (Term Frequency-Inverse Document Frequency) scores. The experimental results show that our method can detect unseen malicious traffic in actual proxy logs. The best F-measure reaches 0.94 in the timeline analysis.
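The word-extraction step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the choice to treat each traffic class as one document, the smoothed IDF formula, and the names `tokenize` and `top_tfidf_words` are all assumptions made for illustration, and the sample log lines are invented.

```python
import math
import re
from collections import Counter


def tokenize(log_line):
    # Assumed tokenization: lowercase, then split on whitespace and common URL delimiters.
    return [t for t in re.split(r"[\s/?&=:]+", log_line.lower()) if t]


def top_tfidf_words(documents, k):
    """For each document (a list of tokens), return its k highest-scoring words."""
    n_docs = len(documents)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    results = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        # Smoothed IDF, ln((1 + n) / (1 + df)) + 1, chosen here as an assumption.
        scores = {
            word: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
            for word, count in counts.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:k])
    return results


# Hypothetical proxy log lines, invented for this example.
benign_lines = [
    "GET http://example.com/index.html 200",
    "GET http://example.com/news.html 200",
]
malicious_lines = ["GET http://example.com/gate.php?id=1 200"]

# One document per class, so class-specific words get a higher IDF.
benign_doc = [t for line in benign_lines for t in tokenize(line)]
malicious_doc = [t for line in malicious_lines for t in tokenize(line)]

benign_top, malicious_top = top_tfidf_words([benign_doc, malicious_doc], k=3)
```

In this toy input, words that occur only in the malicious document receive a higher IDF than words shared with benign traffic, which is the intuition behind using TF-IDF to counter the dominance of benign logs. How the selected words are then fed to Paragraph Vector is not described in the abstract and is left out here.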
Cite
Mimura, M., & Tanaka, H. (2018). A linguistic approach towards intrusion detection in actual proxy logs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11149 LNCS, pp. 708–718). Springer Verlag. https://doi.org/10.1007/978-3-030-01950-1_42