DIADEM: Domains to databases

0Citations
Citations of this article
8Readers
Mendeley users who have this article in their library.
Get full text

Abstract

What if you could turn all websites of an entire domain into a single database? Imagine all real estate offers, all airline flights, or all your local restaurants' menus automatically collected from hundreds or thousands of agencies, travel agencies, or restaurants, presented as a single homogeneous dataset. Historically, this has required tremendous effort by the data providers and whoever is collecting the data: Vertical search engines aggregate offers through specific interfaces which provide suitably structured data. The semantic web vision replaces the specific interfaces with a single one, but still requires providers to publish structured data. Attempts to turn human-oriented HTML interfaces back into their underlying databases have largely failed due to the variability of web sources. In this paper, we demonstrate that this is about to change: The availability of comprehensive entity recognition together with advances in ontology reasoning have made possible a new generation of knowledgedriven, domain-specific data extraction approaches. To that end, we introduce diadem, the first automated data extraction system that can turn nearly any website of a domain into structured data, working fully automatically, and present some preliminary evaluation results. © 2012 Springer-Verlag.

Cite

CITATION STYLE

APA

Furche, T., Gottlob, G., & Schallhart, C. (2012). DIADEM: Domains to databases. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7446 LNCS, pp. 1–8). https://doi.org/10.1007/978-3-642-32600-4_1

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free