Inference of a concise regular expression considering interleaving from XML documents

Xiaolan Zhang; Yeting Li; Fanlin Cui; Chunmei Dong; Haiming Chen

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 10938 LNAI 389-401

DOI: 10.1007/978-3-319-93037-4_31


Abstract

XML schemas are useful in various applications. However, in practice many XML documents are not accompanied by a schema, or by a valid one. Therefore, it is essential to design efficient algorithms for schema learning. Each element in an XML schema has its content model defined by a regular expression, so schema learning can be reduced to the inference of restricted regular expressions. In this paper, we focus on learning restricted regular expressions with interleaving from a set of XML documents. We call the new subclass CHAin Regular Expression with Interleaving (ICHARE). Then, based on the single occurrence automaton (SOA) and maximum independent set (MIS), we introduce an inference algorithm, GenICHARE. The algorithm is proved to infer a descriptive ICHARE from a given set of samples. Finally, using a data set crawled from the Web, we compare the coverage proportion of ICHARE with that of other existing subclasses. In addition, we analyze the conciseness of the regular expressions inferred by GenICHARE on DBLP. Experimental results show that ICHARE is more concise and useful in practice, and that the inference algorithm is promising and effective.
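To make the SOA ingredient mentioned in the abstract concrete, the following is a minimal sketch of how a single occurrence automaton is commonly built from sample strings: one state per symbol, plus a source and a sink, with an edge for every pair of adjacent symbols seen in the sample. This is not GenICHARE itself, and the abstract does not specify the construction; the names `build_soa`, `accepts`, `SRC`, and `SNK` are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: build a single occurrence automaton (SOA) from samples.
# Each sample is a sequence of child element names of one parent element.
# Assumption: the SOA is represented only by its edge set; states are the
# symbols themselves plus an artificial source and sink.

SRC, SNK = "<src>", "<snk>"


def build_soa(sample):
    """Return the set of edges (u, v) observed in the sample strings."""
    edges = set()
    for word in sample:
        prev = SRC
        for sym in word:
            edges.add((prev, sym))  # adjacent pair seen in some sample
            prev = sym
        edges.add((prev, SNK))      # last symbol (or SRC for an empty word) may end
    return edges


def accepts(edges, word):
    """Check whether a word follows a source-to-sink path in the SOA."""
    prev = SRC
    for sym in word:
        if (prev, sym) not in edges:
            return False
        prev = sym
    return (prev, SNK) in edges


if __name__ == "__main__":
    # Two orderings of the same children, as interleaving would allow.
    sample = [["a", "b", "c"], ["b", "a", "c"]]
    soa = build_soa(sample)
    print(sorted(soa))
    print(accepts(soa, ["a", "b", "c"]))
    print(accepts(soa, ["a", "b", "a", "c"]))
    print(accepts(soa, ["c", "a"]))
```

In this sketch, seeing `a` and `b` in both orders produces the edges `a -> b` and `b -> a`, so the automaton also accepts `a b a c`, which neither sample contains. Such overgeneralization is one reason an inference method might treat symbols that occur in both orders as candidates for interleaving rather than reading an expression directly off the automaton.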


Cite (APA)

Zhang, X., Li, Y., Cui, F., Dong, C., & Chen, H. (2018). Inference of a concise regular expression considering interleaving from XML documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10938 LNAI, pp. 389–401). Springer Verlag. https://doi.org/10.1007/978-3-319-93037-4_31
