Business specific online information extraction from German websites

0Citations
Citations of this article
8Readers
Mendeley users who have this article in their library.
Get full text

Abstract

This paper presents a system that uses the domain name of a German business website to locate its information pages (e.g. company profile, contact page, imprint) and then identifies business specific information. We therefore concentrate on the extraction of characteristic vocabulary like company names, addresses, contact details, CEOs, etc. Above all, we interpret the HTML structure of documents and analyze some contextual facts to transform the unstructured web pages into structured forms. Our approach is quite robust in variability of the DOM, upgradeable and keeps data up-to-date. The evaluation experiments show high efficiency of information access to the generated data. Hence, the developed technique is adaptive to non-German websites with slight language-specific modifications, and experimental results on reallife websites confirm the feasibility of the approach. © Springer-Verlag Berlin Heidelberg 2009.

Cite

CITATION STYLE

APA

Lee, Y. S., & Geierhos, M. (2009). Business specific online information extraction from German websites. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5449 LNCS, pp. 369–381). https://doi.org/10.1007/978-3-642-00382-0_30

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free