Adding missing words to regular expressions

Thomas Rebele; Katerina Tzompanaki; Fabian M. Suchanek

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10938 LNAI, pp. 67–79 (2018)

DOI: 10.1007/978-3-319-93037-4_6

Abstract

Regular expressions (regexes) are patterns that are used in many applications to extract words or tokens from text. However, even hand-crafted regexes may fail to match all the intended words. In this paper, we propose a novel way to generalize a given regex so that it also matches a set of missing (previously unmatched) words. Our method finds an approximate match between the missing words and the regex, and adds disjunctions for the unmatched parts where needed. We show that this method can not only improve the precision and recall of the regex, but also generate much shorter regexes than baselines and competitors on various datasets.
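
To make the contrast in the abstract concrete, the following minimal sketch compares a naive baseline, which appends every missing word as a literal alternative, with a regex in which a disjunction is added only at the part that failed to match. It is not the authors' algorithm: the function name `add_words_naive`, the phone-number-like pattern, and the sample words are all hypothetical, and the "local" regex is written by hand to illustrate the kind of output the abstract describes.

```python
import re


def add_words_naive(regex, missing_words):
    """Hypothetical baseline: append each missing word as a literal alternative."""
    alternatives = [f"(?:{regex})"] + [re.escape(w) for w in missing_words]
    return "|".join(alternatives)


# Illustrative data (an assumption, not taken from the paper).
original = r"\d{3}-\d{4}"
missing = ["555 1234", "555.1234"]

# Baseline: the original regex plus one literal branch per missing word.
naive = add_words_naive(original, missing)

# Hand-written generalization: only the separator, the part of the missing
# words that the original regex could not match, becomes a disjunction.
local = r"\d{3}(?:-| |\.)\d{4}"

# Both regexes accept the originally matched word and the missing words.
for word in ["555-1234"] + missing:
    assert re.fullmatch(naive, word)
    assert re.fullmatch(local, word)

# The locally repaired regex is shorter than the baseline.
print(len(naive), len(local))
```

In this toy case the locally repaired regex is shorter and also accepts unseen words of the same shape, such as `999 0000`, which the baseline rejects. Whether that extra generality helps or hurts precision depends on the data.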

Cite

Rebele, T., Tzompanaki, K., & Suchanek, F. M. (2018). Adding missing words to regular expressions. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10938 LNAI, pp. 67–79). Springer Verlag. https://doi.org/10.1007/978-3-319-93037-4_6
