Faced with the vast amount of unlabeled short text data appearing on the Internet, it is necessary to categorize it with clustering, which divides text into clusters according to semantic similarity. Recently, combining clustering with contrastive learning has become a focus of clustering research. Owing to the excellent representation learning ability of contrastive learning, clustering achieves better results on short texts, which are high-dimensional and sparse. However, contrastive learning emphasizes general feature representation at the instance level and ignores the semantic-level correlation among data belonging to the same cluster. The inconsistent training objectives of contrastive learning and clustering lead to low-confidence clustering results and a sparse cluster space. To address this problem, we propose a clustering method based on Dynamic Adjustment for Contrastive Learning (DACL). The method smoothly shifts the model's loss weight from contrastive learning to clustering during training and filters negative samples in contrastive learning using the pseudo-labels generated by clustering. To demonstrate its effectiveness, DACL is compared with eight short text clustering models on eight datasets. The results show considerable performance improvements on most datasets over state-of-the-art short text clustering methods. In addition, the effectiveness of the smooth loss transition and negative filtering is verified by ablation experiments.
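The two mechanisms named in the abstract can be sketched in a few lines: a weight schedule that smoothly shifts the total loss from the contrastive term to the clustering term, and a mask that filters out negatives sharing a pseudo-label. This is a minimal illustration, not the paper's implementation; the cosine ramp schedule and the helper names `loss_weight` and `negative_mask` are assumptions for illustration.

```python
import numpy as np

def loss_weight(epoch, total_epochs):
    """Weight of the clustering loss at a given epoch.

    Ramps smoothly from 0 to 1 (cosine schedule -- an assumed
    transition; the paper only states the shift is smooth), so the
    total loss moves from contrastive to clustering:
        total = (1 - w) * L_contrastive + w * L_clustering
    """
    t = epoch / max(total_epochs - 1, 1)
    return 0.5 * (1.0 - np.cos(np.pi * t))

def negative_mask(pseudo_labels):
    """Boolean mask of valid negative pairs.

    Pairs whose pseudo-labels (cluster assignments) agree are
    filtered out, so same-cluster samples are never pushed apart
    by the contrastive loss.
    """
    labels = np.asarray(pseudo_labels)
    return labels[:, None] != labels[None, :]
```

For example, with pseudo-labels `[0, 0, 1]`, the pair (0, 1) is dropped as a negative while (0, 2) is kept, and the clustering-loss weight grows from 0 at the first epoch to 1 at the last.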
Li, R., & Wang, H. (2022). Clustering of Short Texts Based on Dynamic Adjustment for Contrastive Learning. IEEE Access, 10, 76069–76078. https://doi.org/10.1109/ACCESS.2022.3192442