Malicious code has long been a serious threat to network security. PDF (Portable Document Format) is a widely used file format and is often exploited as a vehicle for malicious behavior. In this paper, we use machine learning algorithms to detect malicious PDF documents and evaluate the approach on experimental data. The main contribution is an implementation of a malware detection method that combines static pre-processing with a machine learning classifier. Classification is based on the differences in structure and content between malicious and benign PDF files. In addition, we boost training of the PDF malware classifier via active learning based on mutual agreement analysis. The detector is retrained using the ground-truth labels of the uncertain samples, which not only reduces the detector's training time but also improves its detection performance.
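The active learning loop described above can be illustrated with a minimal Python sketch. The abstract does not specify the classifiers, the features, or the exact form of the mutual agreement analysis, so everything here is an assumption made for illustration: a two-member committee of nearest-centroid models, three hypothetical static features per file (count of JavaScript keywords, count of OpenAction keywords, page count), a sample treated as "uncertain" when the committee members disagree on its label, and an `oracle` function standing in for manual ground-truth labeling.

```python
from statistics import mean


class NearestCentroid:
    """Toy classifier (an assumption, not the paper's model): assigns the
    label whose class centroid is closest on a chosen subset of features."""

    def __init__(self, feature_idx):
        self.feature_idx = feature_idx
        self.centroids = {}

    def _project(self, x):
        # Keep only the features this committee member looks at.
        return [x[i] for i in self.feature_idx]

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [self._project(x) for x, lab in zip(X, y) if lab == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, x):
        p = self._project(x)

        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(p, c))

        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


def uncertain_indices(committee, pool):
    """Indices of pool samples on which the committee members disagree."""
    return [
        i for i, x in enumerate(pool)
        if len({m.predict(x) for m in committee}) > 1
    ]


def active_learning_round(committee, X, y, pool, oracle):
    """Query true labels only for uncertain samples, then retrain."""
    idx = uncertain_indices(committee, pool)
    X = X + [pool[i] for i in idx]
    y = y + [oracle(pool[i]) for i in idx]
    for m in committee:
        m.fit(X, y)
    return X, y, idx


# Hypothetical features: [js_keyword_count, open_action_count, page_count]
# Labels: 0 = benign, 1 = malicious
X_train = [[0, 0, 10], [0, 0, 12], [3, 1, 1], [4, 1, 2]]
y_train = [0, 0, 1, 1]

committee = [
    NearestCentroid([0, 1]).fit(X_train, y_train),  # keyword-based view
    NearestCentroid([2]).fit(X_train, y_train),     # page-count view
]

pool = [[0, 0, 11], [5, 1, 1], [3, 1, 9]]


def oracle(x):
    # Stand-in for manual analysis: purely illustrative labeling rule.
    return 1 if x[0] > 0 else 0


X_new, y_new, queried = active_learning_round(
    committee, X_train, y_train, pool, oracle
)
print("queried pool indices:", queried)
print("training set size after round:", len(X_new))
```

In this toy run, only the third pool sample is sent to the oracle, because it has many JavaScript keywords but also many pages, so the two views disagree. Samples on which the committee agrees are not labeled manually, which is where the saving in labeling and retraining effort would come from under these assumptions.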
CITATION STYLE
Wang, X., Li, Y., Zhang, Q., & Kuang, X. (2019). Boosting training for PDF malware classifier via active learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11983 LNCS, pp. 101–110). Springer. https://doi.org/10.1007/978-3-030-37352-8_9