Schema inference has been an essential task in database management, and can be reduced to learning regular expressions from sets of positive finite-sample. In this paper, we extend the single-occurrence regular expressions (SOREs) to single-occurrence regular expressions with unorder (uSOREs), and give an inference algorithm for uSOREs. First, we present an unorder-countable finite automaton (uCFA). Then, we construct an uCFA for recognizing the given finite sample. Next, the uCFA runs on the given finite sample to count the number of occurrences of the subexpressions (connectable via unorder) for every possibly repeated matching. Finally we transform the uCFA to an uSORE according to the above results of counting. Experimental results demonstrate that, for larger samples, our algorithm can efficiently infer an uSORE with better generalization ability.
CITATION STYLE
Wang, X., & Chen, H. (2020). Inferring Deterministic Regular Expression with Unorder. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12011 LNCS, pp. 325–337). Springer. https://doi.org/10.1007/978-3-030-38919-2_27
Mendeley helps you to discover research relevant for your work.