Learning to harvest information for the semantic web

Fabio Ciravegna; Sam Chapman; Alexiei Dingli; Yorick Wilks

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2004) 3053 312-326

DOI: 10.1007/978-3-540-25956-5_22

Abstract

In this paper we describe a methodology for harvesting information from large distributed repositories (e.g. large Web sites) with minimal user intervention. The methodology is based on a combination of information extraction, information integration and machine learning techniques. Learning is seeded by extracting information from structured sources (e.g. databases and digital libraries) or from a user-defined lexicon. The retrieved information is then used to partially annotate documents. The annotated documents are used to bootstrap learning for simple Information Extraction (IE) methodologies, which in turn produce further annotations; these annotate more documents, which are then used to train more complex IE engines, and so on. We describe the methodology and its implementation in the Armadillo system, compare it with the current state of the art, and describe the details of an implemented application. Finally, we draw some conclusions and highlight some challenges and future work. © Springer-Verlag Berlin Heidelberg 2004.
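The bootstrapping cycle outlined in the abstract (seed, annotate, learn, extract more, repeat) can be illustrated with a toy sketch. This is not the Armadillo implementation: the function names, the "preceding word" context rule, the `min_support` threshold and the example sentences are all assumptions invented here for illustration, and a real system would use far richer IE learners and information integration.

```python
import re

def annotate(docs, lexicon):
    """Return (doc_index, start, end, term) for every lexicon hit."""
    hits = []
    for i, doc in enumerate(docs):
        for term in lexicon:
            for m in re.finditer(re.escape(term), doc):
                hits.append((i, m.start(), m.end(), term))
    return hits

def learn_left_contexts(docs, hits, min_support=2):
    """Toy 'simple IE' learner: keep the word that precedes an
    annotation if it does so at least min_support times."""
    counts = {}
    for i, start, _end, _term in hits:
        left = docs[i][:start].split()
        if left:
            counts[left[-1]] = counts.get(left[-1], 0) + 1
    return {w for w, c in counts.items() if c >= min_support}

def extract(docs, contexts):
    """Extract capitalised tokens that follow a learned context word."""
    found = set()
    for doc in docs:
        tokens = doc.split()
        for prev, tok in zip(tokens, tokens[1:]):
            if prev in contexts and tok[:1].isupper():
                found.add(tok.strip(".,;"))
    return found

def bootstrap(docs, seed, rounds=3):
    """Grow the lexicon until no new terms appear or rounds run out."""
    lexicon = set(seed)
    for _ in range(rounds):
        hits = annotate(docs, lexicon)              # partial annotation
        contexts = learn_left_contexts(docs, hits)  # learn from annotations
        new = extract(docs, contexts) - lexicon     # produce more annotations
        if not new:
            break
        lexicon |= new
    return lexicon

# Hypothetical toy corpus and seed lexicon (invented for this sketch).
docs = [
    "Prof. Ada leads the group.",
    "Prof. Bob wrote the paper.",
    "Prof. Cy joined later.",
]
print(sorted(bootstrap(docs, {"Ada", "Bob"})))
```

Starting from two seed names, the sketch learns that "Prof." tends to precede a name and uses that context to pick up a third name that was not in the seed lexicon.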

Cite

Ciravegna, F., Chapman, S., Dingli, A., & Wilks, Y. (2004). Learning to harvest information for the semantic web. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3053, 312–326. https://doi.org/10.1007/978-3-540-25956-5_22
