In this chapter we discuss the evaluation of automatic word sense disambiguation (WSD) systems. Some issues, such as evaluation metrics and the basic methodology for hand-tagging evaluation data, are well agreed upon by the WSD community. However, other important issues remain to be resolved, including the question of which sense distinctions are important and relevant to the sense-tagging task, and how to evaluate WSD systems in real NLP applications. We give an overview of previous evaluation exercises and investigate the sources of human inter-annotator disagreement. These disagreements are at least partially reconciled by a more coarse-grained view of the senses, and we present the sense groupings that were used for quantitative coarse-grained evaluation. Well-defined sense groups can help improve sense-tagging consistency for both humans and machines.
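The abstract's point about inter-annotator disagreement and coarse-grained senses can be illustrated with a chance-corrected agreement measure such as Cohen's kappa. The sketch below is not from the chapter itself; the sense labels (`bank.1`, `bank.2`, `bank.3`) and the grouping into hypothetical coarse classes are invented for illustration. It shows how agreement that is low at the fine-grained level can become perfect once closely related senses are merged into a group:

```python
from collections import Counter

def cohens_kappa(tags_a, tags_b):
    """Chance-corrected agreement between two annotators' tag sequences."""
    assert len(tags_a) == len(tags_b)
    n = len(tags_a)
    # Observed agreement: fraction of items tagged identically.
    p_o = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Expected chance agreement, from each annotator's tag distribution.
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    p_e = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two annotators tagging six instances of "bank" (labels are hypothetical:
# bank.1 = financial institution, bank.2 = its building, bank.3 = river bank).
a = ["bank.1", "bank.2", "bank.3", "bank.1", "bank.3", "bank.1"]
b = ["bank.2", "bank.1", "bank.3", "bank.1", "bank.3", "bank.2"]

# Hypothetical coarse groups: the two financial senses form one group.
groups = {"bank.1": "bank.FIN", "bank.2": "bank.FIN", "bank.3": "bank.RIVER"}

fine_kappa = cohens_kappa(a, b)
coarse_kappa = cohens_kappa([groups[t] for t in a], [groups[t] for t in b])
print(fine_kappa, coarse_kappa)   # fine-grained 0.25, coarse-grained 1.0
```

Here the annotators disagree only within the financial sense pair, so mapping tags through the coarse groups raises kappa from 0.25 to 1.0, mirroring how well-defined sense groups can reconcile fine-grained disagreements.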
Palmer, M., Ng, H. T., & Dang, H. T. (2006). Evaluation of WSD Systems. In Word Sense Disambiguation (pp. 75–106). Kluwer Academic Publishers. https://doi.org/10.1007/1-4020-4809-2_4