The theory of indexing texts is well researched, but the same does not hold for indexing other data structures, such as trees. This paper presents a simple method of indexing a tree for subsequences of its string paths by means of a finite automaton. The use of the index is demonstrated on indexing XML documents for queries inspired by the XPath descendant-or-self axis. Given a subject tree T with n nodes, the tree is preprocessed and an index, which is a directed acyclic subsequence graph for a set of strings, is constructed. The searching phase uses the index, reads an input string path subsequence Q of size m, inspired by a specific XPath query, and computes the list of positions of all occurrences of Q in the tree T. Searching takes O(m) time and does not depend on n. Although the number of distinct valid queries is O(2^n), the size of the index is O(h^k), where h is the height of the tree T and k is the number of its leaves. Moreover, we argue that when indexing a common XML document the size of the index is even smaller: O(h·2^k).
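The idea of preprocessing a tree so that a query is answered by following one transition per query label can be illustrated with a minimal Python sketch. It is a naive subset construction written for illustration only, not the authors' construction, and it makes no attempt to meet the size bounds stated above. The names `Node`, `build_index` and `search` are hypothetical, positions are assumed to be preorder node numbers, and a query such as `["a", "c"]` is assumed to mean "nodes labeled c that have an ancestor labeled a".

```python
from collections import deque


class Node:
    """A labeled tree node (hypothetical helper type for this sketch)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


def build_index(root):
    """Build a deterministic acyclic automaton over sets of tree nodes."""
    labels, end = [], []

    def number(node):
        # Preorder numbering: the proper descendants of node i are exactly
        # the ids i + 1 .. end[i] - 1.
        nid = len(labels)
        labels.append(node.label)
        end.append(0)
        for child in node.children:
            number(child)
        end[nid] = len(labels)

    number(root)
    n = len(labels)

    start = frozenset([-1])  # virtual node above the root
    transitions = {}
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        targets = {}
        for u in state:
            stop = n if u == -1 else end[u]
            for v in range(u + 1, stop):
                targets.setdefault(labels[v], set()).add(v)
        for label, nodes in targets.items():
            nxt = frozenset(nodes)
            transitions[(state, label)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return start, transitions


def search(index, query):
    """Follow one transition per query label; return matching preorder ids."""
    state, transitions = index
    if not query:
        return []
    for label in query:
        state = transitions.get((state, label))
        if state is None:
            return []
    return sorted(state)


# Example tree (preorder ids in brackets): a[0](b[1](c[2]), c[3](b[4]))
root = Node("a", [Node("b", [Node("c")]), Node("c", [Node("b")])])
index = build_index(root)
print(search(index, ["a", "c"]))  # [2, 3]
print(search(index, ["b", "c"]))  # [2]
```

Every transition moves strictly deeper in the tree, so the automaton in this sketch is acyclic, and a query never touches the tree itself once the index is built.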
Šestáková, E., & Janoušek, J. (2015). Tree string path subsequences automaton and its use for indexing XML documents. In Communications in Computer and Information Science (Vol. 563, pp. 171–181). Springer Verlag. https://doi.org/10.1007/978-3-319-27653-3_17