DIADEM: Domains to databases

Tim Furche; Georg Gottlob; Christian Schallhart

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012), Vol. 7446 LNCS (Part 1), pp. 1–8

DOI: 10.1007/978-3-642-32600-4_1

Abstract

What if you could turn all websites of an entire domain into a single database? Imagine all real estate offers, all airline flights, or all your local restaurants' menus automatically collected from hundreds or thousands of real estate agencies, travel agencies, or restaurants, and presented as a single homogeneous dataset. Historically, this has required tremendous effort from both the data providers and whoever collects the data: vertical search engines aggregate offers through specific interfaces that provide suitably structured data. The semantic web vision replaces these specific interfaces with a single one, but still requires providers to publish structured data. Attempts to turn human-oriented HTML interfaces back into their underlying databases have largely failed due to the variability of web sources. In this paper, we demonstrate that this is about to change: the availability of comprehensive entity recognition, together with advances in ontology reasoning, has made possible a new generation of knowledge-driven, domain-specific data extraction approaches. To that end, we introduce DIADEM, the first data extraction system that can turn nearly any website of a domain into structured data fully automatically, and present some preliminary evaluation results. © 2012 Springer-Verlag.

Citation (APA)

Furche, T., Gottlob, G., & Schallhart, C. (2012). DIADEM: Domains to databases. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7446 LNCS, pp. 1–8). https://doi.org/10.1007/978-3-642-32600-4_1
