To counter increasingly sophisticated Advanced Persistent Threat (APT) attacks, it is indispensable to express cybersecurity threat intelligence in structured or semi-structured data specifications. In this paper, we cast the extraction of indicators of compromise (IOC) as a named entity recognition sequence labeling task. We construct a dataset for named entity recognition in the threat intelligence domain and train domain-specific word vectors. We also propose a new loss function, TSFL, which combines a triplet loss function based on metric learning with a sorted focal loss function, to address the imbalanced distribution of data labels. Experiments show that the F1 score improves on both public-domain datasets and the threat intelligence dataset.
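As an illustration of the sequence labeling formulation, the following minimal sketch converts entity spans into per-token tags. It assumes the common BIO tagging scheme; the helper `bio_tags`, the label names `MALWARE` and `IP`, and the example sentence are hypothetical and are not taken from the dataset described here.

```python
from typing import List, Tuple

# (start index, end index exclusive, label) over a token list.
Span = Tuple[int, int, str]


def bio_tags(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert entity spans into BIO tags, one tag per token.

    Assumption: spans do not overlap and lie within the token list.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"      # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # continuation tokens
    return tags


# Hypothetical example sentence with two IOC-style entities.
tokens = ["The", "malware", "Agent", "Tesla", "contacts", "192.0.2.10", "."]
spans = [(2, 4, "MALWARE"), (5, 6, "IP")]
tags = bio_tags(tokens, spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

In such a scheme most tokens carry the `O` tag, which is one way the label imbalance mentioned above can arise.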
CITATION STYLE
Wang, X., Xiong, Z., Du, X., Jiang, J., Jiang, Z., & Xiong, M. (2020). NER in Threat Intelligence Domain with TSFL. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12430 LNAI, pp. 157–169). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60450-9_13