Entity extraction and consolidation for social web content preservation

Stefan Dietze; Diana Maynard; Elena Demidova; Thomas Risse; Wim Peters; Katerina Doka; Yannis Stavrakas

Conference Proceedings

CEUR Workshop Proceedings (2012), Vol. 912, pp. 18-29

ISSN: 1613-0073

7 Citations

13 Readers

Abstract

With the rapidly increasing pace at which Web content, particularly social media, is evolving, preserving the Web and its evolution over time becomes an important challenge. Meaningful analysis of Web content lends itself to an entity-centric view that organises Web resources according to the information objects related to them. The crucial challenge is therefore to extract, detect and correlate entities from a vast number of heterogeneous Web resources whose nature and quality may vary heavily. While a wealth of information extraction tools aid this process, we believe that the consolidation of automatically extracted data has to be treated as an equally important step in order to ensure high quality and non-ambiguity of the generated data. In this paper we present an approach based on an iterative cycle that exploits Web data for (1) targeted archiving/crawling of Web objects, (2) entity extraction and detection, and (3) entity correlation. The long-term goal is to preserve Web content over time and to allow its navigation and analysis based on well-formed, structured RDF data about entities.
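The iterative cycle named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the in-memory corpus, the gazetteer lookup, the alias table, the `example.org` namespace and every function name below are assumptions made purely for illustration. It only shows how crawling, extraction and consolidation might feed each other, with consolidated entities becoming seeds for the next crawl round.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the Web: document id -> text.
CORPUS = {
    "doc1": "Obama visited Athens.",
    "doc2": "Barack Obama gave a speech.",
    "doc3": "Athens hosted the summit.",
    "doc4": "Unrelated text about weather.",
}
# Hypothetical gazetteer (surface form -> type) and alias table (variant -> canonical).
GAZETTEER = {"barack obama": "Person", "obama": "Person", "athens": "Location"}
ALIASES = {"obama": "barack obama"}


def crawl(seeds, seen):
    """Step 1: fetch not-yet-seen documents that mention any seed term."""
    return {d: t for d, t in CORPUS.items()
            if d not in seen and any(s in t.lower() for s in seeds)}


def extract_entities(text):
    """Step 2: return (surface form, type) pairs found by gazetteer lookup."""
    lowered = text.lower()
    return [(label, etype) for label, etype in GAZETTEER.items()
            if re.search(r"\b" + re.escape(label) + r"\b", lowered)]


def consolidate(mentions):
    """Step 3: merge mentions of the same entity using the alias table."""
    merged = defaultdict(set)
    for doc_id, label, etype in mentions:
        merged[(ALIASES.get(label, label), etype)].add(doc_id)
    return merged


def run_cycle(initial_seeds, max_rounds=5):
    """Repeat crawl -> extract -> consolidate until no new documents appear."""
    seeds, seen, mentions = set(initial_seeds), set(), []
    for _ in range(max_rounds):
        batch = crawl(seeds, seen)
        if not batch:
            break
        for doc_id, text in batch.items():
            seen.add(doc_id)
            mentions.extend((doc_id, label, etype)
                            for label, etype in extract_entities(text))
        # Consolidated entity labels become seeds for the next crawl round.
        seeds |= {canonical for canonical, _ in consolidate(mentions)}
    return consolidate(mentions)


def to_ntriples(merged, base="http://example.org/"):
    """Serialise consolidated entities as N-Triples-style lines (assumed vocabulary)."""
    rdf_type = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
    lines = []
    for (label, etype), docs in sorted(merged.items()):
        subject = f"<{base}entity/{label.replace(' ', '_')}>"
        lines.append(f"{subject} <{rdf_type}> <{base}type/{etype}> .")
        for doc in sorted(docs):
            lines.append(f"{subject} <{base}mentionedIn> <{base}doc/{doc}> .")
    return lines


if __name__ == "__main__":
    for line in to_ntriples(run_cycle({"obama"})):
        print(line)
```

Starting from the single seed `obama`, the first round retrieves the two documents that mention it, consolidation folds the variant "obama" into "barack obama", and the newly found entity "athens" pulls in a third document in the next round. A real system would replace each toy step with a crawler, an information extraction pipeline and a proper entity correlation method.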

Cite

Dietze, S., Maynard, D., Demidova, E., Risse, T., Peters, W., Doka, K., & Stavrakas, Y. (2012). Entity extraction and consolidation for social web content preservation. In CEUR Workshop Proceedings (Vol. 912, pp. 18–29).