The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters. We give examples of how such annotations can be exploited for high-precision queries.