A reference architecture to devise web information extractors

Hassan A. Sleiman; Rafael Corchuelo

Conference Proceedings

Lecture Notes in Business Information Processing (2012) 112 LNBIP 235-248

DOI: 10.1007/978-3-642-31069-0_21

Abstract

The Web is the largest repository of human-friendly information. Unfortunately, web information is embedded in formatting tags and surrounded by irrelevant information. Researchers are working on information extractors that transform this information into structured data so that it can later be integrated into automated processes. Devising a new information extraction technique involves an array of tasks that are specific to that technique and many tasks that are common to all techniques. The literature lacks a reference architecture to guide software engineers in the design and implementation of information extractors, which results in little reuse and a focus that is often blurred by irrelevant details. In this paper, we present a reference architecture to design and implement rule learners for information extractors. We have implemented a software framework to support our architecture, and we have validated it by means of four case studies and a number of experiments that prove that our proposal helps reduce development costs significantly. © 2012 Springer-Verlag.

APA

Sleiman, H. A., & Corchuelo, R. (2012). A reference architecture to devise web information extractors. In Lecture Notes in Business Information Processing (Vol. 112 LNBIP, pp. 235–248). Springer-Verlag. https://doi.org/10.1007/978-3-642-31069-0_21
