Resource-bounded information extraction: Acquiring missing feature values on demand

Abstract

We present a general framework for extracting specific information "on demand" from a large corpus such as the Web under resource constraints. Given a database with missing or uncertain information, the proposed system automatically formulates queries, issues them to a search interface, selects a subset of the returned documents, extracts the required information from them, and fills in the missing values in the original database. We also exploit inherent dependencies within the data to obtain useful information with fewer computational resources. We build such a system in the citation database domain, where it extracts missing publication years from the Web using limited resources. We discuss a probabilistic approach for this task and present first results. The main contribution of this paper is a general, comprehensive architecture for designing a system adaptable to different domains. © 2010 Springer-Verlag Berlin Heidelberg.
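
The pipeline described in the abstract (formulate a query, issue it, select a subset of documents, extract, fill) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the in-memory `FAKE_WEB` corpus stands in for a real search interface, the names `search` and `fill_missing_years` are hypothetical, the budget is counted in queries, and a simple majority vote over regex-matched years replaces the probabilistic approach the paper discusses. It is not the authors' implementation.

```python
import re
from collections import Counter

# Hypothetical stand-in for a Web search interface: query -> ranked documents.
FAKE_WEB = {
    '"Paper A"': [
        "Paper A, in Proc. of X, 2004.",
        "Paper A (2004) studies extraction.",
        "Paper A preprint 2003",
    ],
    '"Paper B"': ["Paper B appeared in 2007."],
}

# Four-digit years in the 1900s or 2000s.
YEAR = re.compile(r"\b(19\d{2}|20\d{2})\b")


def search(query):
    """Return the documents for a query (empty list if nothing matches)."""
    return FAKE_WEB.get(query, [])


def fill_missing_years(records, query_budget, docs_per_query=2):
    """Fill missing 'year' fields in place, issuing at most query_budget queries.

    Returns the number of queries actually issued.
    """
    used = 0
    for rec in records:
        if rec.get("year") is not None:
            continue  # nothing to acquire for this record
        if used >= query_budget:
            break  # resource bound reached; remaining records stay unfilled
        query = '"%s"' % rec["title"]            # formulate the query
        docs = search(query)[:docs_per_query]    # issue it, keep a subset
        used += 1
        # Extract candidate years and take the most frequent one
        # (a majority vote, assumed here in place of a probabilistic model).
        votes = Counter(int(y) for doc in docs for y in YEAR.findall(doc))
        if votes:
            rec["year"] = votes.most_common(1)[0][0]
    return used


records = [
    {"title": "Paper A", "year": None},
    {"title": "Paper B", "year": None},
    {"title": "Paper C", "year": 1999},
]
used = fill_missing_years(records, query_budget=1)
```

With a budget of one query, only the first incomplete record is filled and the second is left missing, which is the trade-off the resource bound imposes. A fuller system would also decide which record to query first, for example by expected benefit, rather than processing records in order.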

Kanani, P., McCallum, A., & Hu, S. (2010). Resource-bounded information extraction: Acquiring missing feature values on demand. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6118 LNAI, pp. 415–427). https://doi.org/10.1007/978-3-642-13657-3_45
