NER in Threat Intelligence Domain with TSFL

Xuren Wang; Zihan Xiong; Xiangyu Du; Jun Jiang; Zhengwei Jiang; Mengbo Xiong

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2020) 12430 LNAI 157-169

DOI: 10.1007/978-3-030-60450-9_13

Abstract

To counter increasingly sophisticated Advanced Persistent Threat (APT) attacks, cybersecurity threat intelligence must be converted into structured or semi-structured data specifications. In this paper, we recast the task of extracting indicators of compromise (IOC) as a named entity recognition (NER) sequence labeling task. We construct a dataset for named entity recognition in the threat intelligence domain and train domain-specific word vectors on threat intelligence text. We also propose a new loss function, TSFL, which combines a triplet loss based on metric learning with a sorted focal loss, to address the imbalanced distribution of data labels. Experiments show that this approach improves the NER F1 score on both public-domain datasets and the threat intelligence dataset.
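The abstract does not give the TSFL formula, so the following is only a rough illustration of the ingredients it names, under stated assumptions. It shows (a) what an IOC sentence might look like once framed as BIO sequence labeling, using hypothetical tag names, and why the non-entity "O" label dominates; (b) the standard focal loss, which down-weights tokens the model already classifies well; and (c) the standard triplet margin loss from metric learning. How TSFL "sorts" the focal term and how it combines it with the triplet term are not stated here, so the sketch keeps the two losses separate rather than guessing.

```python
import math
from collections import Counter

# Hypothetical BIO-tagged sentence: IOC extraction framed as sequence labeling.
# The tag names (MAL, IP, DOMAIN) are illustrative, not the paper's tag set.
tokens = ["The", "Emotet", "loader", "contacted", "192.0.2.10", "and", "evil.example", "."]
labels = ["O", "B-MAL", "O", "O", "B-IP", "O", "B-DOMAIN", "O"]


def label_distribution(tags):
    """Return the fraction of tokens carrying each label."""
    counts = Counter(tags)
    total = len(tags)
    return {tag: n / total for tag, n in counts.items()}


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Standard focal loss for a single token, given the predicted probability
    of its gold label: -alpha * (1 - p)^gamma * log(p).
    With gamma = 0 and alpha = 1 it reduces to ordinary cross-entropy."""
    p = min(max(p_true, 1e-12), 1.0)  # clamp to avoid log(0)
    return -alpha * (1.0 - p) ** gamma * math.log(p)


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss with Euclidean distance:
    max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


dist = label_distribution(labels)
print(dist["O"])  # 0.625: most tokens are non-entities
# An easy token (p = 0.9) contributes about 100x less than under cross-entropy.
print(focal_loss(0.9), focal_loss(0.9, gamma=0.0))
print(triplet_loss([0, 0], [1, 0], [3, 0]))  # 0.0: negative is already far enough away
```

In this toy sentence, five of eight tokens are "O", which is the kind of label imbalance the focal term is meant to counteract; the triplet term instead pulls token representations of the same class together and pushes different classes apart in embedding space.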

Citation (APA)

Wang, X., Xiong, Z., Du, X., Jiang, J., Jiang, Z., & Xiong, M. (2020). NER in Threat Intelligence Domain with TSFL. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12430 LNAI, pp. 157–169). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60450-9_13
