Dependency Tree Kernels for Relation Extraction from Natural Language Text

Frank Reichartz, Hannes Korte, and Gerhard Paass

Fraunhofer IAIS, Schloss Birlinghoven, 53754 St. Augustin, Germany

W. Buntine et al. (Eds.): ECML PKDD 2009, Part II, LNAI 5782, pp. 270–285, 2009. © Springer-Verlag Berlin Heidelberg 2009

Abstract. The automatic extraction of relations from unstructured natural text is challenging but offers practical solutions for many problems like automatic text understanding and semantic retrieval. Relation extraction can be formulated as a classification problem using support vector machines and kernels for structured data that may include parse trees to account for syntactic structure. In this paper we present new tree kernels over dependency parse trees automatically generated from natural language text. Experiments on a public benchmark data set show that our kernels with richer structural features significantly outperform all published approaches for kernel-based relation extraction from dependency trees. In addition we optimize kernel computations to improve the actual runtime compared to previous solutions.

1 Introduction

Current search engines are usually not effective for complex queries, e.g. "composers born in Berlin". The retrieved documents contain, among others, composers who stayed in Berlin for some time or who have the name "Berlin". Obviously the internal representation of text in a search index as a sequence of words is insufficient to recover semantics from unstructured text. An important step towards automatic knowledge discovery is to extract semantic relations between entities.

Information extraction tackles this goal in two steps. First, entity or phrase taggers detect objects of different types, such as persons, descriptions or pronouns, mentioned in the text. Some of these techniques have reached a sufficient performance level on many datasets [18]. They offer the basis for the next step: the extraction of relations that exist between the recognized entities, e.g. composer-born-in(John White, Berlin).

An early approach to relation extraction is based on patterns [6], usually expressed as regular expressions for words with wildcards. The underlying hypothesis assumes that terms sharing similar linguistic contexts are connected by similar semantic relations. Various authors follow this approach, e.g. [1] use frequent itemset mining to extract word patterns and [7] employ logic-based frequent structural patterns for relation extraction.

Syntactic parse trees provide extensive information on syntactic structure and can, for instance, represent the relation between subject, verb and object in a sentence. For feature-based methods only a limited number of structural details may be compared. On the other hand, kernel-based methods offer efficient solutions that allow exploring a much larger (often exponential or, in some cases, infinite) number of characteristics of trees in polynomial time, without the need to represent the features explicitly. [20] and [4] proposed kernels for dependency trees inspired by string kernels. [2] investigated a kernel that computes similarities between nodes on the shortest path of a dependency tree that connects the entities. All these kernels are used as input for a kernel classifier.

In this paper we extend current dependency tree kernels by including richer structural features. To tackle the different shortcomings of previous work we use the ordering properties as well as the labeling of nodes in dependency trees in a novel fashion to create kernels which consider most of the available information in dependency trees. To allow the usage of more substructure properties while maintaining an acceptable runtime we propose two new computation algorithms tailored for relation extraction tree kernels. Our new kernels are shown to outperform all previously published kernels in classification quality by a significant margin on a public benchmark. Our kernels reach F-measures of 77%–80% on selected relations, which is sufficient for some applications like information retrieval.

The remainder of the paper is organized as follows. In the next section we describe dependency parse trees used for relation classification. Subsequently we give a generic description of the current dependency parse trees in the literature. The following two sections outline our new kernels for relation extraction, the All-Pairs Dependency Tree Kernel as well as the Dependency Path Tree Kernel. Next we describe different versions of the algorithms optimized for efficiency. For the experiments we re-implemented existing kernels and compare them to our new kernels on a benchmark dataset. We close with a summary and conclusions.
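The introduction states that kernels can compare exponentially many substructures in polynomial time without representing the features explicitly. The following minimal sketch illustrates that idea on plain word sequences rather than trees. It uses the standard all-subsequences kernel as a stand-in, which is not one of the kernels proposed in this paper, and the function names `explicit_kernel` and `implicit_kernel` are hypothetical.

```python
from functools import lru_cache
from itertools import combinations


def explicit_kernel(s, t):
    """Count matching subsequence pairs by enumerating every index set.

    The number of index sets grows as 2^len(s) and 2^len(t).
    """
    def subsequences(seq):
        for m in range(len(seq) + 1):
            for idx in combinations(range(len(seq)), m):
                yield tuple(seq[i] for i in idx)

    counts = {}
    for sub in subsequences(s):
        counts[sub] = counts.get(sub, 0) + 1
    return sum(counts.get(sub, 0) for sub in subsequences(t))


def implicit_kernel(s, t):
    """Compute the same value by dynamic programming in polynomial time."""
    s, t = tuple(s), tuple(t)

    @lru_cache(maxsize=None)
    def k(i, j):
        # k(i, j) is the kernel between the prefixes s[:i] and t[:j].
        if i == 0 or j == 0:
            return 1  # only the empty subsequence matches
        # Subsequence pairs that do not use the word s[i-1].
        total = k(i - 1, j)
        # Subsequence pairs that end in s[i-1], matched to an equal word t[p].
        for p in range(j):
            if t[p] == s[i - 1]:
                total += k(i - 1, p)
        return total

    return k(len(s), len(t))


s = "the president of the USA".split()
t = "the CEO of Microsoft".split()
assert explicit_kernel(s, t) == implicit_kernel(s, t)
```

Both functions return the same similarity value, but only the second avoids enumerating the feature space, which is the property that tree kernels exploit for much richer substructures.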
2 Dependency Parse Trees

A dependency tree is a structured representation of the grammatical dependency between the words of a sentence by a labeled directed tree [11]. The structure is determined by the relation between a word (a head) and its dependents. A dependent can in turn be the head of other words, yielding a tree structure. Each node in the tree corresponds to a word in the sentence, with arrows pointing from the head to the dependents. Dependency trees may be typed, specializing the "dependent" relation into many subtypes, e.g. "auxiliary", "subject", "object", while in the untyped case there is only a "dependent" relation. In this paper we consider untyped dependency trees only, generated by the Stanford Parser [10] from the sentences of a text. As an example consider the two sentences a = "Recently Obama became the president of the USA" and b = "Ballmer is the CEO of Microsoft", whose tree representations are shown in figure 1.

[Figure 1, panels To(a) and To(b). Node labels (position : word) of To(a): a1 = 3 : became, a2 = 1 : Recently, a3 = 2 : Obama, a4 = 5 : president, a5 = 4 : the, a6 = 6 : of, a7 = 8 : USA, a8 = 7 : the. Node labels of To(b): b1 = 2 : is, b2 = 1 : Ballmer, b3 = 4 : CEO, b4 = 3 : the, b5 = 5 : of, b6 = 6 : Microsoft.]

Fig. 1. The dependency trees of the two example sentences. The nodes are labeled with the position of the word in the sentence and the word itself. A thick border marks an entity mention.

2.1 Notation

Let $w = w_1 w_2 \ldots w_n$ be a sequence of words, a natural language sentence with words $w_j \in W$. The parser will generate a representation of the sentence $w$ as a labeled rooted connected tree $T(w) = (V, E)$ with nodes $V = \{v_1, \ldots, v_n\}$ and edges $E \subseteq V \times V$. Each node $v_i$ is labeled with a corresponding word $w_{\pi(v_i)}$ in the sentence $w$, where $\pi$ is a bijective function mapping a node $v_i$ to the index $j$ of its corresponding word $w_j \in W$ in $w$. For example, in figure 1 we have $\pi(a_4) = 5$. For each node $v \in V$ the ordered sequence of its $m$ children $ch(v) = (u_1, \ldots, u_m)$ with $u_i \in V$ satisfies $\pi(u_i) < \pi(u_{i+1})$ for all $i \in \{1, \ldots, m-1\}$. The node $a_1$ = "became" in figure 1, for instance, has $ch(a_1) = (a_2, a_3, a_4)$ with $a_2$ = "Recently", $a_3$ = "Obama" and $a_4$ = "president" as its ordered sequence of child nodes. With the order induced by $\pi$ we get an ordered tree $T_o(w)$.

For a sequence $s = (s_j)_{j \in [1:k]}$ of length $k$ the set of all possible subsequence index sets can be denoted as $I_k$. Each ordered sequence $o(i) = \mathbf{i} = (i_1, \ldots, i_m)$ of a subset $i \subseteq I$ of the indices $I$ of a sequence $s = (s_j)_{j \in I}$ implies a subsequence $(s_{i_1}, \ldots, s_{i_m})$, which we denote by $s(\mathbf{i})$, in line with the notation of [17]. Therefore we can write the subsequence of children referenced by $\mathbf{i}$ from a node $v$ of an ordered tree as $ch(v, \mathbf{i})$. In figure 1, for example, $ch(a_1) = (a_2, a_3, a_4)$ has the index set $\{1, 2, 3\}$. The subset $\{1, 3\}$ in its representation as ordered sequence $(1, 3)$ defines the subsequence $ch(a_1, (1, 3)) = (a_2, a_4)$, i.e. "Recently" and "president", of child nodes of $a_1$.

We consider relations connecting entities or objects, which in this paper are collectively called entities.
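As a minimal sketch of the tree notation introduced in this section, the following code models a node with its word and its position π(v), and implements ch(v) and ch(v, i) for 1-based index sequences. The class `Node` and the function `ch` are hypothetical names chosen for illustration. Only the part of figure 1 that the text spells out (a1 and its three children) is built, since the remaining edges are not given in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str       # node identifier, e.g. "a1"
    word: str       # word label of the node
    position: int   # pi(v): 1-based index of the word in the sentence
    children: list = field(default_factory=list)


def ch(v, i=None):
    """Return the children of v ordered by pi.

    If i is given, return the subsequence selected by the 1-based
    index sequence i, i.e. ch(v, i) in the notation of this section.
    """
    ordered = sorted(v.children, key=lambda u: u.position)
    if i is None:
        return tuple(ordered)
    return tuple(ordered[k - 1] for k in i)


# Fragment of the tree for sentence a: node a1 and its children.
a2 = Node("a2", "Recently", 1)
a3 = Node("a3", "Obama", 2)
a4 = Node("a4", "president", 5)
# Children are passed in arbitrary order; ch() restores the order induced by pi.
a1 = Node("a1", "became", 3, children=[a4, a2, a3])

print([u.name for u in ch(a1)])          # ['a2', 'a3', 'a4']
print([u.word for u in ch(a1, (1, 3))])  # ['Recently', 'president']
```

Sorting on access keeps the sketch short; a real implementation would more likely store the children already ordered when the tree is built.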
We assume that these entities have been extracted in a prior step, e.g. by named entity recognition tools. They are treated as a single