Unsupervised compound splitting with distributional semantics rivals supervised methods

Abstract

We present a word decompounding method based on distributional semantics. The method requires no linguistic knowledge and is initialized using a large monolingual corpus. The core idea is that the parts of a compound (like "candle" and "stick") are semantically similar to the entire compound, which helps to exclude spurious splits (like "candles" and "tick"). We report results for German and Dutch. For German, our unsupervised method performs on par with a rule-based and a supervised method and significantly outperforms two unsupervised baselines. For Dutch, it performs only slightly below an optimized rule-based compound splitter.
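The core idea can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the method from the paper: the `SIMILAR` table is a hand-made stand-in for similarities that would be computed from a large monolingual corpus, and the names `candidate_splits` and `best_split`, the two-part restriction, and the minimum part length are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical, hand-made stand-in for corpus-derived similarities:
# each word maps to the set of words assumed to be semantically similar to it.
SIMILAR: Dict[str, Set[str]] = {
    "candlestick": {"candle", "stick", "lamp", "candleholder"},
}


def candidate_splits(word: str, min_len: int = 3) -> List[Tuple[str, str]]:
    """Enumerate every two-part split whose parts have at least min_len characters."""
    return [(word[:i], word[i:]) for i in range(min_len, len(word) - min_len + 1)]


def best_split(word: str, similar: Dict[str, Set[str]]) -> Optional[Tuple[str, str]]:
    """Pick the split with the most parts found among the compound's similar words.

    Returns None when no candidate has a part that is similar to the compound.
    """
    neighbours = similar.get(word, set())
    scored = [
        (sum(part in neighbours for part in split), split)
        for split in candidate_splits(word)
    ]
    best_score, best = max(scored, default=(0, None))
    return best if best_score > 0 else None


if __name__ == "__main__":
    print(best_split("candlestick", SIMILAR))
```

In this toy setting, the candidate "candles" + "tick" is generated but rejected, because neither part appears among the words assumed to be similar to "candlestick", whereas both "candle" and "stick" do.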

Cite

Riedl, M., & Biemann, C. (2016). Unsupervised compound splitting with distributional semantics rivals supervised methods. In 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference (pp. 617–622). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/n16-1075
