A central problem of word sense disambiguation (WSD) is the lack of manually sense-tagged data required for supervised learning. In this paper, we evaluate an approach to automatically acquire sense-tagged training data from English-Chinese parallel corpora, which are then used for disambiguating the nouns in the SENSEVAL-2 English lexical sample task. Our investigation reveals that this method of acquiring sense-tagged data is promising. On a subset of the most difficult SENSEVAL-2 nouns, the accuracy difference between the two approaches is only 14.0%, and the difference could narrow further to 6.5% if we disregard the advantage that manually sense-tagged data have in their sense coverage. Our analysis also highlights the importance of the issue of domain dependence in evaluating WSD programs.
CITATION STYLE
Ng, H. T., Wang, B., & Chan, Y. S. (2003). Exploiting parallel texts for word sense disambiguation: An empirical study. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 2003-July). Association for Computational Linguistics (ACL).
Mendeley helps you to discover research relevant for your work.