Classifying text into a hierarchical taxonomy of topics is a common and practical problem. Traditional approaches simply use a bag-of-words representation and have achieved good results. However, when there are many labels with different topical granularities, a bag-of-words representation may not be enough. Deep learning models have proven effective at automatically learning different levels of representation for image data, so it is interesting to study how best to represent texts. In this paper, we propose a graph-CNN-based deep learning model that first converts texts to graphs-of-words and then applies graph convolution operations to the word graph. The graph-of-words representation of texts has the advantage of capturing non-consecutive and long-distance semantics. CNN models have the advantage of learning different levels of semantics. To further leverage the hierarchy of labels, we regularize the deep architecture with the dependency among labels. Our results on both the RCV1 and NYTimes datasets show that we can significantly improve large-scale hierarchical text classification over traditional hierarchical text classification and existing deep models.
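To make the graph-of-words idea concrete, the following is a minimal sketch of one common way to build such a graph: words become nodes, and two words are linked when they co-occur within a sliding window. The function name `graph_of_words`, the window size of 3, and the use of undirected weighted edges are illustrative assumptions; the abstract does not specify the paper's exact construction or preprocessing.

```python
from collections import defaultdict


def graph_of_words(tokens, window=3):
    """Build an undirected co-occurrence graph (hypothetical helper).

    Nodes are unique words. An edge links two different words that appear
    within `window` consecutive tokens; its weight counts how often that
    happens. The window size is an assumed parameter, not the paper's.
    """
    edges = defaultdict(int)
    for i, word in enumerate(tokens):
        # Pair the current token with the next (window - 1) tokens.
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                # Sort the pair so (a, b) and (b, a) share one edge.
                edges[tuple(sorted((word, other)))] += 1
    return dict(edges)


tokens = "graph convolution learns word graph semantics".split()
graph = graph_of_words(tokens, window=3)
```

Because both occurrences of "graph" collapse into a single node, that node ends up linked to words from distant positions in the text, which is the kind of non-consecutive relationship a purely sequential window over the raw token stream would miss.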
Peng, H., Li, J., He, Y., Liu, Y., Bao, M., Wang, L., … Yang, Q. (2018). Large-scale hierarchical text classification with recursively regularized deep graph-CNN. In The Web Conference 2018 - Proceedings of the World Wide Web Conference, WWW 2018 (pp. 1063–1072). Association for Computing Machinery, Inc. https://doi.org/10.1145/3178876.3186005