Entity extraction and consolidation for social web content preservation

ISSN: 1613-0073

Abstract

As Web content, particularly social media, evolves at an ever faster pace, preserving the Web and its evolution over time becomes an important challenge. Meaningful analysis of Web content lends itself to an entity-centric view, which organises Web resources according to the information objects they relate to. The crucial challenge is therefore to extract, detect and correlate entities across a vast number of heterogeneous Web resources whose nature and quality may vary widely. While a wealth of information extraction tools aid this process, we believe that the consolidation of automatically extracted data must be treated as an equally important step to ensure that the generated data is of high quality and unambiguous. In this paper we present an approach based on an iterative cycle that exploits Web data for (1) targeted archiving/crawling of Web objects, (2) entity extraction and detection, and (3) entity correlation. The long-term goal is to preserve Web content over time and to allow its navigation and analysis based on well-formed, structured RDF data about entities.
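The abstract does not say how consolidation is carried out, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that an extraction tool reports each mention as a (surface form, type, source URL) record, groups mentions whose accent- and case-insensitive label and type agree, and emits one entity per group as RDF in N-Triples syntax. The `Mention` schema, the functions `normalise`, `consolidate`, `literal` and `to_ntriples`, the `example.org` namespaces and the `mentionedIn` property are all hypothetical; only the `rdf:type` and `rdfs:label` IRIs are standard.

```python
import re
import unicodedata
from collections import Counter, defaultdict
from dataclasses import dataclass

ENTITY_NS = "http://example.org/entity/"  # hypothetical namespace for minted entity URIs
VOCAB_NS = "http://example.org/vocab#"    # hypothetical vocabulary for types and properties
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


@dataclass(frozen=True)
class Mention:
    """One entity mention as an extraction tool might report it (hypothetical schema)."""
    surface: str  # text exactly as it appeared in the Web resource
    etype: str    # coarse entity type, e.g. "Person" or "Location"
    source: str   # URL of the crawled Web resource


def normalise(label: str) -> str:
    """Strip accents, lowercase, and collapse punctuation/whitespace runs to single spaces."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(no_punct.split())


def consolidate(mentions):
    """Group mentions whose normalised label and type agree; each group is one entity."""
    groups = defaultdict(list)
    for m in mentions:
        groups[(normalise(m.surface), m.etype)].append(m)
    return dict(groups)


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(groups) -> list:
    """Emit type, label and provenance triples for each consolidated entity."""
    triples = []
    for key in sorted(groups):
        norm, etype = key
        members = groups[key]
        subject = f"<{ENTITY_NS}{etype}/{norm.replace(' ', '_')}>"
        # Preferred label: the most frequent surface form (the first one seen wins ties).
        label = Counter(m.surface for m in members).most_common(1)[0][0]
        triples.append(f"{subject} <{RDF_TYPE}> <{VOCAB_NS}{etype}> .")
        triples.append(f"{subject} <{RDFS_LABEL}> {literal(label)} .")
        # One provenance triple per distinct Web resource that mentions the entity.
        for src in sorted({m.source for m in members}):
            triples.append(f"{subject} <{VOCAB_NS}mentionedIn> <{src}> .")
    return triples


# Usage: four raw mentions from three (made-up) resources collapse into two entities.
mentions = [
    Mention("Angela Merkel", "Person", "http://a.example/1"),
    Mention("angela  merkel", "Person", "http://b.example/2"),
    Mention("Zürich", "Location", "http://a.example/1"),
    Mention("Zurich", "Location", "http://c.example/3"),
]
for line in to_ntriples(consolidate(mentions)):
    print(line)
```

Exact matching on normalised labels is deliberately crude: it merges "Zürich" and "Zurich", but a bare "Merkel" would still become a separate entity, and two different people sharing a name would be wrongly merged. Handling such cases needs context or links to an external knowledge base, which is where the harder correlation and disambiguation work lies.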

Citation (APA)

Dietze, S., Maynard, D., Demidova, E., Risse, T., Peters, W., Doka, K., & Stavrakas, Y. (2012). Entity extraction and consolidation for social web content preservation. In CEUR Workshop Proceedings (Vol. 912, pp. 18–29).