The lack of labeled data is one of the main challenges in building a task-oriented dialogue system. Existing dialogue datasets usually rely on human labeling, which is expensive, limited in size, and low in coverage. In this paper, we instead propose a framework, Auto-dialabel, that automatically clusters dialogue intents and slots. In this framework, we collect a set of context features, use an autoencoder to assemble them into a single representation, and adapt a dynamic hierarchical clustering method for intent and slot labeling. Experimental results show that our framework can greatly reduce the cost of human labeling, achieve good intent clustering accuracy (84.1%), and provide reasonable and instructive slot labeling results.
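As a rough illustration of the clustering stage only, the sketch below runs average-linkage agglomerative clustering that stops at a distance threshold, so the number of clusters is not fixed in advance. It is not the authors' implementation: the feature extraction and autoencoder steps are skipped, the 2-D vectors stand in for assembled utterance features, and the names `cluster_until_threshold` and `max_distance` as well as the threshold value are assumptions made for this example.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, points):
    """Mean Euclidean distance over all cross-cluster pairs of points."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def cluster_until_threshold(points, max_distance):
    """Merge the two closest clusters repeatedly and stop once the closest
    pair is farther apart than max_distance (hypothetical stopping rule)."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average linkage.
        (x, y), gap = min(
            (((x, y), average_linkage(clusters[x], clusters[y], points))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if gap > max_distance:
            break
        # x < y always holds here, so deleting y leaves index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy stand-ins for assembled utterance features (assumed, not from the paper).
features = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0), (9.0, 0.0)]
clusters = cluster_until_threshold(features, max_distance=1.0)
print(clusters)
```

Each resulting cluster holds the indices of utterances that would share one candidate intent label. Raising `max_distance` yields fewer, broader clusters, and lowering it yields more, narrower ones.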
Shi, C., Chen, Q., Sha, L., Li, S., Sun, X., Wang, H., & Zhang, L. (2018). Auto-dialabel: Labeling dialogue data with unsupervised learning. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 (pp. 684–689). Association for Computational Linguistics. https://doi.org/10.18653/v1/d18-1072