Automatic linking of Wikipedia pages mostly follows one of two strategies: (i) a content-based strategy that relies on word similarities, or (ii) a structural strategy that exploits link characteristics. Our approach focuses on the content-based strategy: we find anchors using the titles of candidate Wikipedia pages and resolve matching links by taking the context of the link anchor, i.e. its surrounding text, into account. Best entry points are estimated from a combination of title-based and content-based similarity. Our goal was to evaluate syntactic title-matching properties and the influence of the context around anchors on disambiguation and best-entry-point detection. Results show that the whole Wikipedia page provides the best context for resolving links and that simple inverse-document-frequency-based scoring of anchor texts is also capable of achieving high accuracy.
Granitzer, M., Seifert, C., & Zechner, M. (2008). Context Resolution Strategies for Automatic Wikipedia Learning. In S. Geva, J. Kamps, & A. Trotman (Eds.), INEX 2008 pre-proceedings (pp. 292–304). Dagstuhl, Germany.
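
To make the title-matching and inverse-document-frequency scoring idea from the abstract concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not give the exact scoring formula, so the summed `log(N / df)` score, the exact token-sequence matching, and all function names (`tokenize`, `idf_table`, `score_anchors`) are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(documents):
    """Map each term to log(N / df), where df counts documents containing it."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}


def score_anchors(text, titles, idf):
    """Return (title, score) pairs for titles found verbatim in the text.

    Assumption: a title matches when its token sequence occurs contiguously
    in the text, and its score is the sum of the IDF values of its terms.
    Terms missing from the IDF table contribute 0.
    """
    tokens = tokenize(text)
    scored = {}
    for title in titles:
        title_tokens = tokenize(title)
        if not title_tokens:
            continue
        width = len(title_tokens)
        found = any(
            tokens[i:i + width] == title_tokens
            for i in range(len(tokens) - width + 1)
        )
        if found:
            scored[title] = sum(idf.get(term, 0.0) for term in title_tokens)
    # Highest-scoring candidate anchors first.
    return sorted(scored.items(), key=lambda pair: pair[1], reverse=True)


# Toy corpus (hypothetical data): page title -> page text.
pages = {
    "Inverse document frequency": "inverse document frequency weights rare terms in a document collection",
    "Document": "a document is a written record",
    "Link": "a link connects one document to another document",
}
idf = idf_table(list(pages.values()))
ranked = score_anchors(
    "We score each link by inverse document frequency.", pages.keys(), idf
)
```

In this toy corpus the term "document" appears on every page and therefore contributes nothing to a score, so a title made up only of common terms ranks below titles containing rarer terms. Context-based disambiguation, which the abstract reports works best with the whole page as context, is not covered by this sketch.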