Unsupervised compound splitting with distributional semantics rivals supervised methods

Martin Riedl; Chris Biemann

Conference Proceedings, Open Access

2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference (2016) 617-622

DOI: 10.18653/v1/n16-1075

Abstract

In this paper, we present a word decompounding method based on distributional semantics. Our method does not require any linguistic knowledge and is initialized using a large monolingual corpus. The core idea of our approach is that the parts of a compound (like "candle" and "stick" in "candlestick") are semantically similar to the entire compound, which helps to exclude spurious splits (like "candles" and "tick"). We report results for German and Dutch. For German, our unsupervised method performs on par with a rule-based and a supervised method and significantly outperforms two unsupervised baselines. For Dutch, our method performs only slightly below an optimized rule-based compound splitter.
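To make the core idea concrete, here is a minimal Python sketch of similarity-based split selection. It is not the authors' implementation. The toy vectors, the use of cosine similarity, scoring a split by the minimum part-to-compound similarity, the 0.5 threshold, the minimum part length of 3, the restriction to two-part splits, and all function names (candidate_splits, split_score, best_split) are hypothetical assumptions chosen for illustration.

```python
import math
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]

# Hypothetical toy vectors; a real system would derive word similarities
# from a distributional model built on a large monolingual corpus.
VECTORS: Dict[str, Vector] = {
    "candlestick": (0.9, 0.8, 0.1),
    "candle": (0.9, 0.6, 0.0),
    "stick": (0.6, 0.9, 0.2),
    "candles": (0.8, 0.5, 0.1),
    "tick": (0.0, 0.1, 0.9),
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def candidate_splits(
    word: str, vocab: Dict[str, Vector], min_len: int = 3
) -> Iterator[List[str]]:
    """Yield every two-part split whose parts are known words of >= min_len chars."""
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        if head in vocab and tail in vocab:
            yield [head, tail]


def split_score(word: str, parts: List[str], vectors: Dict[str, Vector]) -> float:
    """Score a split by its weakest part: every part must resemble the compound."""
    return min(cosine(vectors[p], vectors[word]) for p in parts)


def best_split(
    word: str, vectors: Dict[str, Vector], threshold: float = 0.5
) -> Optional[List[str]]:
    """Return the highest-scoring split, or None if none reaches the threshold."""
    if word not in vectors:
        return None
    scored = [
        (split_score(word, parts, vectors), parts)
        for parts in candidate_splits(word, vectors)
    ]
    if not scored:
        return None
    score, parts = max(scored, key=lambda pair: pair[0])
    return parts if score >= threshold else None


print(best_split("candlestick", VECTORS))  # ['candle', 'stick']
```

With these toy vectors, "candles" is about as close to "candlestick" as "candle" is, but "tick" is not (cosine of roughly 0.16). The min-based score therefore rejects the spurious split "candles" + "tick" (about 0.16) in favor of "candle" + "stick" (about 0.96). Taking the minimum rather than the mean is only one possible design choice; with the mean, the spurious split would still lose here, but by a smaller margin.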

Cite (APA)

Riedl, M., & Biemann, C. (2016). Unsupervised compound splitting with distributional semantics rivals supervised methods. In 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference (pp. 617–622). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/n16-1075