YAWN: A Semantically Annotated Wikipedia XML Corpus

  • Ralf Schenkel
  • Fabian M. Suchanek
  • Gjergji Kasneci

The paper presents YAWN, a system that converts the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits the category information in Wikipedia, which is a high-quality, manually assigned source of information; extracts additional information from lists; and uses the invocations of templates with named parameters. We give examples of how such annotations can be exploited for high-precision queries.
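
To make the idea of category-driven annotation concrete, the following is a minimal Python sketch. It is not the YAWN implementation: the lookup table `CATEGORY_TO_CONCEPT`, the functions `annotate_page` and `find_by_concept`, the nested-tag layout, and the sample page are all hypothetical, chosen only for illustration. The real system maps categories to WordNet concepts; here a hand-written dictionary stands in for that mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the category-to-WordNet-concept mapping.
CATEGORY_TO_CONCEPT = {
    "Physicists": "physicist",
    "Cities in Europe": "city",
}


def annotate_page(title, categories, text):
    """Wrap a page in concept tags derived from its categories (assumed layout)."""
    # Keep only categories for which the table knows a concept.
    concepts = [CATEGORY_TO_CONCEPT[c] for c in categories if c in CATEGORY_TO_CONCEPT]
    root = ET.Element("article")
    parent = root
    # Nest one element per concept so the tag itself carries the meaning.
    for concept in concepts:
        parent = ET.SubElement(parent, concept)
    ET.SubElement(parent, "title").text = title
    ET.SubElement(parent, "body").text = text
    return root


def find_by_concept(root, concept):
    """Return the titles of all pages annotated with the given concept tag."""
    return [element.findtext(".//title") for element in root.iter(concept)]


page = annotate_page("Example Person", ["Physicists", "Unmapped category"], "Sample text.")
print(ET.tostring(page, encoding="unicode"))
print(find_by_concept(page, "physicist"))
```

Because the concept appears as an element name, a query can select pages by tag rather than by keyword match, which is the kind of high-precision lookup the abstract alludes to.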

Find this document

  • PUI: 368337900
  • SGR: 84873920881
  • SCOPUS: 2-s2.0-84873920881
  • ISBN: 9783885791973

