Abstract
We present a new unsupervised mechanism that ranks word n-grams according to their multiwordness. It relies heavily on a new uniqueness measure that computes, based on a distributional thesaurus, how often an n-gram could be replaced in context by a single-word term. Combined with a downweighting mechanism for incomplete terms, this forms a new measure called DRUID. Results show large improvements over competitive baselines on two small test sets. We demonstrate that the method scales to large corpora and that the measure does not depend on shallow syntactic filtering.
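The uniqueness idea can be illustrated with a minimal sketch. It assumes the distributional thesaurus is available as a plain mapping from a term to its list of similar terms, and it scores an n-gram by the share of those similar terms that are single words. The function name `uniqueness`, the dictionary layout, and the toy entries are all hypothetical and chosen for illustration. The downweighting for incomplete terms is not modelled, so this is not the full DRUID measure.

```python
from typing import Dict, List


def uniqueness(term: str, thesaurus: Dict[str, List[str]]) -> float:
    """Share of a term's thesaurus neighbours that are single words.

    Illustrative assumption: an n-gram whose neighbours are mostly single
    words can often be replaced in context by a single-word term.
    """
    neighbours = thesaurus.get(term, [])
    if not neighbours:
        return 0.0  # no distributional evidence for this term
    single_word = sum(1 for n in neighbours if len(n.split()) == 1)
    return single_word / len(neighbours)


# Toy thesaurus with invented entries (not taken from the paper).
toy_thesaurus = {
    "hot dog": ["sausage", "hamburger", "sandwich", "fast food"],
    "red car": ["blue car", "red truck", "green car", "vehicle"],
}

# Rank candidate n-grams by descending uniqueness score.
ranked = sorted(
    toy_thesaurus,
    key=lambda t: uniqueness(t, toy_thesaurus),
    reverse=True,
)
```

In this toy setting, "hot dog" ranks above "red car" because most of its neighbours are single words, whereas the neighbours of "red car" are mostly other two-word phrases.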
Cite
Riedl, M., & Biemann, C. (2015). A single word is not enough: Ranking multiword expressions using distributional semantics. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 2430–2440). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d15-1290