Regular expression learning for information extraction

Yunyao Li; Rajasekar Krishnamurthy; Sriram Raghavan; Shivakumar Vaithyanathan; H. V. Jagadish

Conference Proceedings (Open Access)

EMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL (2008) 21-30

DOI: 10.3115/1613715.1613719

133 Citations

214 Readers

Abstract

Regular expressions have served as the dominant workhorse of practical information extraction for several years. However, there has been little work on reducing the manual effort involved in building high-quality, complex regular expressions for information extraction tasks. In this paper, we propose ReLIE, a novel transformation-based algorithm for learning such complex regular expressions. We evaluate the performance of our algorithm on multiple datasets and compare it against the CRF algorithm. We show that ReLIE, in addition to being an order of magnitude faster, outperforms CRF under conditions of limited training data and cross-domain data. Finally, we show how the accuracy of CRF can be improved by using features extracted by ReLIE. © 2008 Association for Computational Linguistics.

Citation (APA)

Li, Y., Krishnamurthy, R., Raghavan, S., Vaithyanathan, S., & Jagadish, H. V. (2008). Regular expression learning for information extraction. In EMNLP 2008 - 2008 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference: A Meeting of SIGDAT, a Special Interest Group of the ACL (pp. 21–30). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1613715.1613719
