In this paper, we investigate the q-gram distance for ordered unlabeled trees (trees, for short). First, we formulate a q-gram simply as a tree with $q$ nodes isomorphic to a line graph, and we define the q-gram distance between two trees analogously to the q-gram distance between two strings. Then, by using the depth sequence based on postorder, we design the algorithm EnumGram, which enumerates all q-grams in a tree $T$ with $n$ nodes in $O(n^2)$ time and $O(q)$ space. Furthermore, we improve it to the algorithm LinearEnumGram, which runs in $O(qn)$ time and $O(qd)$ space, where $d$ is the depth of $T$. Hence, we can evaluate the q-gram distance $D_q(T_1, T_2)$ between $T_1$ and $T_2$ in $O(q \max\{n_1, n_2\})$ time and $O(q \max\{d_1, d_2\})$ space, where $n_i$ and $d_i$ are the number of nodes in $T_i$ and the depth of $T_i$, respectively. Finally, we show the following relationship between the q-gram distance $D_q(T_1, T_2)$ and the edit distance $E(T_1, T_2)$: $D_q(T_1, T_2) \le (gl + 1)\,E(T_1, T_2)$, where $g = \max\{g_1, g_2\}$, $l = \max\{l_1, l_2\}$, $g_i$ is the degree of $T_i$, and $l_i$ is the number of leaves in $T_i$. In particular, for the top-down tree edit distance $F(T_1, T_2)$, this result implies that $D_q(T_1, T_2) \le \min\{g^{q-2}, l-1\}\,F(T_1, T_2)$. © Springer-Verlag Berlin Heidelberg 2005.
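The abstract does not spell out how occurrences are counted, so the following is a naive Python sketch under stated assumptions. It is not the paper's EnumGram or LinearEnumGram. We assume that (1) the postorder depth sequence lists each node's depth (root = 0) in postorder; (2) an occurrence of a q-gram is any path of q nodes in T, classified as an ordered rooted tree by its topmost node: either a single downward chain, or a root whose left branch has a nodes and right branch has b nodes, with a + b = q - 1; and (3) D_q is the L1 distance between the two q-gram profiles, as for strings. The function names and the shape keys are hypothetical.

```python
from collections import Counter
from typing import List


def children_from_postorder_depths(depths: List[int]) -> List[List[int]]:
    """Rebuild ordered child lists from a postorder depth sequence.

    Assumption: node i is the i-th node visited in postorder, depths[i] is its
    depth, and the root (depth 0) is the last entry.
    """
    children: List[List[int]] = [[] for _ in depths]
    pending: List[int] = []  # roots of finished subtrees still waiting for a parent
    for v, d in enumerate(depths):
        kids = []
        # In postorder, v's children are exactly the pending nodes of depth d + 1,
        # and they sit on top of the stack.
        while pending and depths[pending[-1]] == d + 1:
            kids.append(pending.pop())
        children[v] = kids[::-1]  # restore left-to-right sibling order
        pending.append(v)
    return children


def qgram_profile(depths: List[int], q: int) -> Counter:
    """Naively count q-node paths in T, keyed by their ordered shape.

    Hypothetical shape keys: ("chain",) when the path runs straight down from
    its top node (or q == 1); (a, b) when the path descends from its top node
    through an earlier child (a nodes) and a later child (b nodes).
    """
    if q < 1:
        raise ValueError("q must be at least 1")
    children = children_from_postorder_depths(depths)
    # below[v][k] = number of descendants of v exactly k levels below v (k < q);
    # postorder guarantees every child is processed before its parent.
    below: List[List[int]] = []
    for v in range(len(depths)):
        row = [1] + [0] * (q - 1)
        for c in children[v]:
            for k in range(1, q):
                row[k] += below[c][k - 1]
        below.append(row)

    profile: Counter = Counter()
    for v, kids in enumerate(children):
        # Chains: downward paths of q nodes whose top node is v.
        if below[v][q - 1]:
            profile[("chain",)] += below[v][q - 1]
        # Two-branch shapes: v is the top node, with a + b = q - 1 and a, b >= 1.
        for i in range(len(kids)):
            for j in range(i + 1, len(kids)):
                for a in range(1, q - 1):
                    b = q - 1 - a
                    count = below[kids[i]][a - 1] * below[kids[j]][b - 1]
                    if count:
                        profile[(a, b)] += count
    return profile


def qgram_distance(depths1: List[int], depths2: List[int], q: int) -> int:
    """L1 distance between the two q-gram profiles (assumed, as for strings)."""
    p1, p2 = qgram_profile(depths1, q), qgram_profile(depths2, q)
    return sum(abs(p1[s] - p2[s]) for s in set(p1) | set(p2))


# Usage: a root with two leaves vs. a path of three nodes.
print(qgram_distance([1, 1, 0], [2, 1, 0], 3))  # 2
print(qgram_distance([1, 1, 0], [2, 1, 0], 2))  # 0
```

With q = 3 the two trees share no q-gram shape (one is a two-branch shape, the other a chain), giving distance 2. With q = 2 both trees have exactly two edges, giving distance 0. Enumerating all child pairs makes this sketch quadratic in node degree, so it only serves to check small cases and does not reach the O(qn) bound stated for LinearEnumGram.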
Ohkura, N., Hirata, K., Kuboyama, T., & Harao, M. (2005). The q-gram distance for ordered unlabeled trees. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3735 LNAI, pp. 189–202). https://doi.org/10.1007/11563983_17