Abstract
In this paper, we present a word decompounding method based on distributional semantics. Our method does not require any linguistic knowledge and is initialized using a large monolingual corpus. The core idea of our approach is that the parts of a compound (such as "candle" and "stick") are semantically similar to the entire compound, which helps to exclude spurious splits (such as "candles" and "tick"). We report results for German and Dutch. For German, our unsupervised method performs on par with a rule-based and a supervised method, and significantly outperforms two unsupervised baselines. For Dutch, our method performs only slightly below an optimized rule-based compound splitter.
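The core idea can be illustrated with a toy sketch: enumerate the two-part splits of a word whose parts are both known words, then prefer the split whose parts also appear among the compound's semantically similar words. This is a hypothetical illustration, not the authors' algorithm. The similar-word table `SIMILAR`, the vocabulary `VOCAB`, and the functions `binary_splits` and `semantic_split` are invented for this example, and simple membership in a similar-word list stands in for a real distributional similarity score computed from a corpus.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy "similar words" table. In practice such neighbor lists
# would be computed from a large monolingual corpus; these entries are made up.
SIMILAR: Dict[str, List[str]] = {
    "candlestick": ["candle", "stick", "lamp", "holder", "candelabra"],
}

# Hypothetical vocabulary of known words (e.g. frequent corpus tokens).
VOCAB = {"candle", "candles", "stick", "tick", "lamp", "holder"}


def binary_splits(word: str, min_len: int = 3) -> List[Tuple[str, str]]:
    """Return all two-part splits where both parts are known vocabulary words."""
    splits = []
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        if left in VOCAB and right in VOCAB:
            splits.append((left, right))
    return splits


def semantic_split(word: str) -> Optional[Tuple[str, str]]:
    """Pick the split whose parts occur most often among the word's similar words.

    Returns None if no candidate split has any part in the similar-word list.
    """
    similar = set(SIMILAR.get(word, []))
    best, best_score = None, 0
    for left, right in binary_splits(word):
        # Count how many of the two parts are semantically close to the whole.
        score = int(left in similar) + int(right in similar)
        if score > best_score:
            best, best_score = (left, right), score
    return best


if __name__ == "__main__":
    print(binary_splits("candlestick"))   # both splits are lexically valid
    print(semantic_split("candlestick"))  # the semantic check keeps only one
```

Both "candle"+"stick" and "candles"+"tick" are valid splits by vocabulary alone; only the first survives the semantic check, because "candle" and "stick" are listed as similar to "candlestick" while "candles" and "tick" are not.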
Cite
Riedl, M., & Biemann, C. (2016). Unsupervised compound splitting with distributional semantics rivals supervised methods. In 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference (pp. 617–622). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/n16-1075