YAWN: A Semantically Annotated Wikipedia XML Corpus

Ralf Schenkel; Fabian Suchanek; Gjergji Kasneci

Conference Proceedings

Lecture Notes in Informatics (LNI), Proceedings - Series of the Gesellschaft für Informatik (GI), Vol. P-103 (2007), pp. 277-291

ISSN: 1617-5468

Abstract

The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters. We give examples of how such annotations can be exploited for high-precision queries.

Cite

Schenkel, R., Suchanek, F., & Kasneci, G. (2007). YAWN: A Semantically Annotated Wikipedia XML Corpus. In Lecture Notes in Informatics (LNI), Proceedings - Series of the Gesellschaft für Informatik (GI) (Vol. P-103, pp. 277–291). Gesellschaft für Informatik (GI).
