Unsupervised Concept Representation Learning for Length-Varying Text Similarity

Abstract

Measuring document similarity plays an important role in natural language processing tasks. Most existing document similarity approaches suffer from the information gap caused by context and vocabulary mismatches when comparing texts of varying lengths. In this paper, we propose an unsupervised concept representation learning approach to address these issues. Specifically, we propose a novel Concept Generation Network (CGNet) to learn concept representations from the perspective of the entire text corpus. Moreover, we propose a concept-based document matching method that leverages both local phrase features and corpus-level concept features. Extensive experiments on real-world datasets demonstrate that the new method achieves a considerable improvement when comparing length-varying texts. In particular, our model achieves a 6.5% higher F1 score than the best baseline model on the concept-project benchmark dataset.
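To make the core idea concrete, the sketch below illustrates one way concept-based matching of length-varying texts could work: each document's phrase embeddings are soft-assigned to a shared set of corpus-level concept vectors, producing a fixed-length concept profile that is comparable regardless of document length. This is a minimal illustration, not the paper's actual method; the dimensions, the `concept_profile` and `concept_similarity` helpers, and the randomly sampled concept vectors (which CGNet would instead learn from the corpus) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical settings, chosen only for illustration.
NUM_CONCEPTS = 8   # corpus-level concept vectors (what a CGNet-like model would learn)
EMB_DIM = 16       # phrase embedding dimension

# Stand-in for learned concept representations; the paper learns these
# from the whole corpus rather than sampling them randomly.
concepts = rng.normal(size=(NUM_CONCEPTS, EMB_DIM))

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def concept_profile(phrase_embs: np.ndarray) -> np.ndarray:
    """Soft-assign each phrase embedding to the shared concept space and
    average the assignments into a fixed-length concept profile.

    A short text (few phrases) and a long text (many phrases) both map to
    a NUM_CONCEPTS-dimensional vector, sidestepping the length mismatch."""
    sims = phrase_embs @ concepts.T       # (num_phrases, NUM_CONCEPTS)
    assignments = softmax(sims, axis=-1)  # soft concept membership per phrase
    return assignments.mean(axis=0)       # length-independent profile

def concept_similarity(doc_a: np.ndarray, doc_b: np.ndarray) -> float:
    """Cosine similarity between the two documents' concept profiles."""
    pa, pb = concept_profile(doc_a), concept_profile(doc_b)
    return float(pa @ pb / (np.linalg.norm(pa) * np.linalg.norm(pb)))

# A short "concept" text (3 phrases) vs. a long "project" text (40 phrases).
short_doc = rng.normal(size=(3, EMB_DIM))
long_doc = rng.normal(size=(40, EMB_DIM))
print(f"similarity: {concept_similarity(short_doc, long_doc):.3f}")
```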

Cite

APA

Zhang, X., Zong, B., Cheng, W., Ni, J., Liu, Y., & Chen, H. (2021). Unsupervised Concept Representation Learning for Length-Varying Text Similarity. In NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 5611–5620). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.naacl-main.445