Strictly corpus-based measures of semantic distance conflate co-occurrence information pertaining to the many possible senses of target words. We propose a corpus-thesaurus hybrid method that uses soft constraints to generate word-sense-aware distributional profiles (DPs) from coarser "concept DPs" (derived from a Roget-like thesaurus) and sense-unaware traditional word DPs (derived from raw text). Although it uses a knowledge source, the method is not vocabulary-limited: if the target word is not in the thesaurus, the method falls back gracefully on the word's co-occurrence information. This allows the method to access valuable information encoded in a lexical resource, such as a thesaurus, while still being able to effectively handle domain-specific terms and named entities. Experiments on word-pair ranking by semantic distance show the new hybrid method to be superior to others. © 2009 ACL and AFNLP.
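The fallback behaviour described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`sense_profiles`, `hybrid_distance`), the down-weighting factor `alpha`, the use of cosine similarity, and the toy data are all assumptions made for illustration. The soft constraint is modelled here by down-weighting, rather than discarding, the co-occurrence features of a word that do not appear in a concept's DP.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse profiles stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def sense_profiles(word, word_dps, thesaurus, concept_dps, alpha=0.1):
    """Return one profile per thesaurus concept of `word` (hypothetical helper).

    If the word is not in the thesaurus, fall back on its plain word DP.
    `alpha` is an assumed soft-constraint weight for features that the
    concept DP does not contain.
    """
    word_dp = word_dps.get(word, {})
    concepts = thesaurus.get(word)
    if not concepts:
        return [word_dp]  # fallback: sense-unaware co-occurrence profile
    profiles = []
    for concept in concepts:
        concept_dp = concept_dps.get(concept, {})
        profiles.append({
            feat: val * (1.0 if feat in concept_dp else alpha)
            for feat, val in word_dp.items()
        })
    return profiles

def hybrid_distance(w1, w2, word_dps, thesaurus, concept_dps, alpha=0.1):
    """Distance = 1 - similarity of the closest pair of sense profiles."""
    p1 = sense_profiles(w1, word_dps, thesaurus, concept_dps, alpha)
    p2 = sense_profiles(w2, word_dps, thesaurus, concept_dps, alpha)
    return 1.0 - max(cosine(a, b) for a in p1 for b in p2)

# Toy data, invented purely for illustration.
word_dps = {
    "bank": {"money": 4.0, "river": 3.0},
    "shore": {"river": 5.0, "sand": 2.0},
    "zyx": {"river": 1.0},  # stands in for a term missing from the thesaurus
}
thesaurus = {"bank": ["FINANCE", "LAND"], "shore": ["LAND"]}
concept_dps = {"FINANCE": {"money": 1.0}, "LAND": {"river": 1.0, "sand": 1.0}}

plain = 1.0 - cosine(word_dps["bank"], word_dps["shore"])
hybrid = hybrid_distance("bank", "shore", word_dps, thesaurus, concept_dps)
```

In this toy setting the sense-aware profile of "bank" under the LAND concept suppresses the "money" feature, so the hybrid distance to "shore" is smaller than the plain word-DP distance, while "zyx" is still handled through its raw co-occurrence profile.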
Marton, Y., Mohammad, S., & Resnik, P. (2009). Estimating semantic distance using soft semantic constraints in knowledge-source-corpus hybrid models. In EMNLP 2009 - Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: A Meeting of SIGDAT, a Special Interest Group of ACL, Held in Conjunction with ACL-IJCNLP 2009 (pp. 775–783). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1699571.1699614