A reference architecture to devise web information extractors

Abstract

The Web is the largest repository of human-friendly information. Unfortunately, web information is embedded in formatting tags and surrounded by irrelevant content. Researchers are working on information extractors that transform this information into structured data so that it can later be integrated into automated processes. Devising a new information extraction technique involves some tasks that are specific to that technique and many tasks that are common to all techniques. The literature lacks a reference architecture to guide software engineers in the design and implementation of information extractors; as a result, there is little reuse and the focus is often blurred by irrelevant details. In this paper, we present a reference architecture to design and implement rule learners for information extractors. We have implemented a software framework to support our architecture, and we have validated it by means of four case studies and a number of experiments that show that our proposal helps reduce development costs significantly. © 2012 Springer-Verlag.

Cite

Sleiman, H. A., & Corchuelo, R. (2012). A reference architecture to devise web information extractors. In Lecture Notes in Business Information Processing (Vol. 112, pp. 235–248). Springer-Verlag. https://doi.org/10.1007/978-3-642-31069-0_21