Knowledge Extraction: Automatic Classification of Matching Rules

Yunyi Tang; Le Wang; Xiaolong Chen; Zhaoquan Gu; Zhihong Tian

Book Chapter

Springer Science and Business Media Deutschland GmbH (2021), pp. 117-130

DOI: 10.1007/978-3-030-71590-8_7

Abstract

With the rapid development of information technologies, ever larger amounts of data are produced in cyberspace. Traditional web search methods cannot satisfy users' demands in a timely and accurate manner, so developing big search techniques for cyberspace is an urgent task. MDATA (Multi-dimensional Data Association and Intelligent Analysis) is a knowledge representation model with temporal and spatial characteristics. By expressing these temporal and spatial characteristics effectively, it supports efficient updating of dynamic knowledge. Pattern matching is often used to extract the knowledge needed to construct the MDATA from massive data. Pattern matching relies on matching rules to acquire the needed substrings from a string. In practical application scenarios, some matching rules can be divided into several categories. Matching rules in the same category have the same meaning but different expressions. Regular expressions can aggregate matching rules that have a consistent structure and strong regularity. However, in practical scenarios such as cyber security knowledge, such homogeneous matching rules are rare, and most rules are random and disordered. For random matching rules, manually designing regular expressions to aggregate them is time-consuming and laborious. To address this problem, we apply a word embedding algorithm to automatically classify matching rules. Word embedding is a class of representation learning algorithms commonly adopted in recommendation systems, relation mining, text similarity matching, and so on. It converts words into low-dimensional vectors based on neural network models. However, word embedding algorithms take into account the relationship between semantic information and context, which requires a large amount of data. When only the matching rules used in pattern matching are considered, the data is insufficient to reflect the context relationship, which leads to inaccurate results. In this chapter, we design an automatic classification method that needs only a small amount of data to meet practical requirements.
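
To make the abstract's point about regular expressions concrete, the sketch below shows one pattern aggregating matching rules that share a structure, while irregular rules with the same meaning fall outside it. The rule strings, the sample identifier, and the names `AGGREGATED`, `extract`, and `covered` are all hypothetical, chosen only for illustration; none of them come from the chapter, and this is not the authors' classification method.

```python
import re

# Hypothetical matching rules that all mean "a vulnerability identifier follows".
# These strings are invented for illustration and do not come from the chapter.
regular_rules = ["CVE ID: ", "CVE-ID: ", "CVE_ID: "]
irregular_rules = ["vulnerability number is ", "tracked as "]

# One regular expression aggregates the three structurally similar rules.
AGGREGATED = re.compile(r"CVE[ _-]ID: (CVE-\d{4}-\d{4,})")


def extract(text):
    """Return the identifier captured by the aggregated pattern, or None."""
    match = AGGREGATED.search(text)
    return match.group(1) if match else None


def covered(rule, sample="CVE-2021-0001"):
    """Check whether the aggregated pattern matches a rule followed by a sample value."""
    return AGGREGATED.search(rule + sample) is not None


if __name__ == "__main__":
    for rule in regular_rules + irregular_rules:
        print(repr(rule), "covered:", covered(rule))
```

The irregular rules are not covered by the pattern, so each would need its own hand-written expression. That manual effort is what the abstract describes as time-consuming.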

Citation (APA)

Tang, Y., Wang, L., Chen, X., Gu, Z., & Tian, Z. (2021). Knowledge Extraction: Automatic Classification of Matching Rules. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12647 LNCS, pp. 117–130). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-71590-8_7
