Much semistructured data, such as HTML and XML files, can be represented by a rooted tree in which the children of each internal vertex are ordered and every edge carries a label. Such data is called tree structured data. Analyzing large tree structured data is a time-consuming step in data mining, so reducing the size of the input without loss of information can speed it up. In this paper, we consider the problem of effectively compressing, without loss of information, an ordered rooted tree that represents given tree structured data. First, in order to define this problem within a grammar-based compression scheme, we present a variable replacement grammar (VRG for short) over ordered rooted trees. The grammar-based compression problem for an ordered rooted tree T is then the problem of finding a VRG of minimum size that generates only T. For this problem, we show that there is no polynomial-time algorithm with approximation ratio less than 8593/8592 unless P=NP. Second, in light of this theoretical result, we present an effective compression algorithm that finds a VRG which generates only a given ordered rooted tree and whose size is as small as possible. Finally, we report experimental results that evaluate the performance of our grammar-based compression algorithm.
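To give a rough feel for grammar-based compression of an ordered, edge-labeled tree, the sketch below shares identical subtrees by turning each distinct subtree into one rule. This is a deliberately simpler scheme than the VRG described in the abstract and is not the authors' algorithm; the tree encoding, the function names (`compress`, `expand`, `edge_count`, `grammar_size`), and the size measure (number of edges on rule right-hand sides) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed encoding: a tree is the ordered list of (edge_label, child_tree)
# pairs leaving its root; a leaf is the empty list.
Tree = List[Tuple[str, "Tree"]]
Rules = Dict[int, Tuple[Tuple[str, int], ...]]


def compress(tree: Tree) -> Tuple[Rules, int]:
    """Create one rule per distinct subtree; return the rules and the root rule id."""
    rules: Rules = {}
    ids: Dict[Tuple[Tuple[str, int], ...], int] = {}

    def visit(node: Tree) -> int:
        # Children are visited in order, so sibling order is preserved.
        key = tuple((label, visit(child)) for label, child in node)
        if key not in ids:
            ids[key] = len(ids)
            rules[ids[key]] = key
        return ids[key]

    return rules, visit(tree)


def expand(rules: Rules, rule_id: int) -> Tree:
    """Rebuild the original tree from the rules (the compression is lossless)."""
    return [(label, expand(rules, child)) for label, child in rules[rule_id]]


def edge_count(tree: Tree) -> int:
    """Number of edges in an uncompressed tree."""
    return sum(1 + edge_count(child) for _, child in tree)


def grammar_size(rules: Rules) -> int:
    """Assumed size measure: total number of edges on rule right-hand sides."""
    return sum(len(body) for body in rules.values())


# Hypothetical XML-like document: three identical <item> subtrees.
item: Tree = [("name", []), ("price", [])]
doc: Tree = [("item", item), ("item", item), ("item", item)]

rules, root = compress(doc)
```

Here `doc` has 9 edges, while the rules describe it with 5, and `expand(rules, root)` reproduces `doc` exactly. A VRG can go further than this sketch because variables may stand for tree patterns that differ in their substituted parts, not only for fully identical subtrees.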
Yamagata, K., Uchida, T., Shoudai, T., & Nakamura, Y. (2003). An effective grammar-based compression algorithm for tree structured data. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2835, pp. 383–400). Springer Verlag. https://doi.org/10.1007/978-3-540-39917-9_25