YAWN: A Semantically Annotated Wikipedia XML Corpus

ISSN: 1617-5468

Abstract

The paper presents YAWN, a system that converts the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits the categorical information in Wikipedia, a high-quality, manually assigned source of information; it also extracts additional information from lists and uses the invocations of templates with named parameters. We give examples of how such annotations can be exploited for high-precision queries.
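The abstract does not spell out the annotation algorithms. The Python sketch below only illustrates the general idea under stated assumptions: a category name such as "German physicists" is reduced to a head noun, the noun is looked up in a concept vocabulary, and the page is wrapped in XML tags named after the matching concepts. Named template parameters become child elements. `TOY_WORDNET`, the head-noun heuristic, the singularization rule, and the XML layout are all hypothetical stand-ins, not YAWN's actual method or output schema; a real system would consult WordNet itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature stand-in for WordNet: maps a singular head noun to a
# concept name. A real system would look the noun up in WordNet itself.
TOY_WORDNET = {"physicist": "physicist", "city": "city", "album": "album"}

# Prepositions assumed (for this sketch) to end the head-noun phrase of a category.
POST_MODIFIERS = {"of", "in", "from", "by"}


def category_head(category: str) -> str:
    """Return the assumed head noun of a category name, lower-cased: the last
    word before the first preposition, e.g. 'Cities in Germany' -> 'cities'."""
    words = category.split()
    for i, word in enumerate(words):
        if i > 0 and word.lower() in POST_MODIFIERS:
            words = words[:i]
            break
    return words[-1].lower()


def singularize(word: str) -> str:
    """Very rough English singularization, adequate only for these toy inputs."""
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def template_params(invocation: str) -> dict[str, str]:
    """Collect named parameters from a flat template call such as
    '{{Infobox scientist | birth_place = Ulm}}'.
    Nested templates and links containing '|' are not handled."""
    body = invocation.strip()
    if body.startswith("{{") and body.endswith("}}"):
        body = body[2:-2]
    params = {}
    for part in body.split("|")[1:]:  # part 0 is the template name
        if "=" in part:
            key, value = part.split("=", 1)
            # Spaces are not allowed in XML tag names, so replace them.
            params[key.strip().replace(" ", "_")] = value.strip()
    return params


def annotate_page(title: str, categories: list[str], infobox: str = "") -> ET.Element:
    """Build a hypothetical annotated page: one nested element per concept found
    via the categories, with the title and template parameters innermost."""
    concepts = []
    for category in categories:
        concept = TOY_WORDNET.get(singularize(category_head(category)))
        if concept is not None and concept not in concepts:
            concepts.append(concept)
    root = ET.Element("article")
    parent = root
    for concept in concepts:
        parent = ET.SubElement(parent, concept)
    ET.SubElement(parent, "title").text = title
    for key, value in template_params(infobox).items():
        ET.SubElement(parent, key).text = value
    return root


if __name__ == "__main__":
    page = annotate_page(
        "Albert Einstein",
        ["German physicists", "1879 births"],  # '1879 births' has no toy concept
        "{{Infobox scientist | birth_place = Ulm | field = Physics}}",
    )
    print(ET.tostring(page, encoding="unicode"))
    # A structural query that only matches pages annotated as physicists:
    print(page.findtext(".//physicist/birth_place"))  # Ulm
```

Running the script prints `<article><physicist><title>Albert Einstein</title><birth_place>Ulm</birth_place><field>Physics</field></physicist></article>`. The point of the concept-named tags is that a path such as `.//physicist/birth_place` can only match pages annotated with that concept, which is one way to read the abstract's claim about high-precision queries.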

Citation (APA)

Schenkel, R., Suchanek, F., & Kasneci, G. (2007). YAWN: A Semantically Annotated Wikipedia XML Corpus. In Lecture Notes in Informatics (LNI), Proceedings - Series of the Gesellschaft für Informatik (GI) (Vol. P-103, pp. 277–291). Gesellschaft für Informatik (GI).
