Inference of a concise regular expression considering interleaving from XML documents

Abstract

XML schemas are useful in many applications. However, many XML documents in practice lack a schema or a valid one, so it is essential to design efficient algorithms for schema learning. In an XML schema, the content model of each element is defined by a regular expression, which means that schema learning can be reduced to the inference of restricted regular expressions. In this paper, we focus on learning restricted regular expressions with interleaving from a set of XML documents, and we name the new subclass CHAin Regular Expression with Interleaving (ICHARE). Based on the single occurrence automaton (SOA) and the maximum independent set (MIS), we then introduce an inference algorithm, GenICHARE, and prove that it infers a descriptive ICHARE from a given set of samples. Finally, using a data set crawled from the Web, we compare the coverage of ICHARE with that of other existing subclasses. In addition, we analyze the conciseness of the regular expressions inferred by GenICHARE on DBLP. Experimental results show that ICHARE is more concise and more useful in practice, and that the inference algorithm is promising and effective.
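The abstract names the single occurrence automaton (SOA) as a starting point for GenICHARE. As a rough illustration only, the following sketch builds an SOA from the child-element sequences of XML documents using the standard construction in which every element name appears as exactly one state and an edge a→b is added whenever b directly follows a in some sample. The helper names `child_sequences` and `build_soa`, the `^`/`$` source and sink markers, and the sample documents are hypothetical. The MIS step and the derivation of an ICHARE from the automaton are not shown, and the paper's own construction may differ in detail.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical markers for the SOA's initial (source) and final (sink) states.
SOURCE, SINK = "^", "$"

def child_sequences(xml_docs, tag):
    """For every element named `tag`, collect the sequence of its direct child element names."""
    sequences = []
    for doc in xml_docs:
        root = ET.fromstring(doc)
        for elem in root.iter(tag):  # iter() also visits the root if it matches
            sequences.append([child.tag for child in elem])
    return sequences

def build_soa(sequences):
    """Build a single occurrence automaton: one state per symbol, plus an
    edge a -> b whenever b directly follows a in some sample (2T-INF style)."""
    edges = defaultdict(set)
    for seq in sequences:
        path = [SOURCE] + list(seq) + [SINK]
        for a, b in zip(path, path[1:]):
            edges[a].add(b)
    # Sorted successor lists make the result deterministic and easy to inspect.
    return {state: sorted(succ) for state, succ in edges.items()}

docs = [
    "<book><title/><author/><author/><year/></book>",
    "<book><author/><title/></book>",
]
soa = build_soa(child_sequences(docs, "book"))
print(soa)
# {'^': ['author', 'title'], 'title': ['$', 'author'], 'author': ['author', 'title', 'year'], 'year': ['$']}
```

The mutual edges between `title` and `author` reflect the two orders seen in the samples; an interleaving operator is one way to describe that kind of order independence compactly.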

Zhang, X., Li, Y., Cui, F., Dong, C., & Chen, H. (2018). Inference of a concise regular expression considering interleaving from XML documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10938 LNAI, pp. 389–401). Springer Verlag. https://doi.org/10.1007/978-3-319-93037-4_31