Abstract
Numerous studies have demonstrated that neural network models can achieve satisfactory performance on a variety of natural language processing (NLP) tasks. In recent years, document classification has been one of the NLP tasks that has gained considerable attention from researchers. Convolutional neural networks (CNNs), recurrent neural networks (RNNs), and attention mechanisms can all be used for NLP tasks. In this work, a document is assumed to be divisible into two levels: the word level and the sentence level. An effective and novel model called C-HAN (Convolutional Neural Network-based and Hierarchical Attention Network with RNN as basic units-based model) is proposed for document classification, combining the advantages of CNN, RNN, and attention models. The CNN extracts abstract relations between words, which are then fed into an attention-based bidirectional long short-term memory recurrent neural network (Bi-LSTM) to obtain a high-level abstract representation of each sentence. The representation of a document, which consists of sentences, is obtained by using another attention-based Bi-LSTM. Lastly, the classification ability of the proposed C-HAN model is evaluated on two datasets. The experimental results demonstrate that the C-HAN model outperforms previous deep learning methods and achieves state-of-the-art performance.
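The two-level structure described in the abstract (words pooled into sentence vectors, sentence vectors pooled into a document vector) can be illustrated with a minimal sketch. This is not the authors' implementation: the CNN and Bi-LSTM encoders are omitted entirely, the attention scoring is a simplified form (dot product of tanh-squashed vectors with a context vector, with no learned projection), and all names (`attention_pool`, `encode_document`, `word_context`, `sent_context`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Collapse a list of vectors into one attention-weighted sum.

    Simplified scoring (an assumption, not taken from the paper):
    score_i = dot(tanh(v_i), context), weights = softmax(scores).
    """
    scores = [
        sum(math.tanh(x) * c for x, c in zip(vec, context))
        for vec in vectors
    ]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, vectors))
        for i in range(dim)
    ]
    return pooled, weights


def encode_document(doc, word_context, sent_context):
    """Word-level pooling per sentence, then sentence-level pooling.

    `doc` is a list of sentences; each sentence is a list of word
    vectors. In the model described above, these vectors would first
    pass through a CNN and a Bi-LSTM; here they are used directly.
    """
    sentence_vectors = [
        attention_pool(sentence, word_context)[0] for sentence in doc
    ]
    return attention_pool(sentence_vectors, sent_context)


# Toy document: two sentences with 2-dimensional word vectors.
doc = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5]],
]
doc_vector, sentence_weights = encode_document(
    doc, word_context=[1.0, 0.0], sent_context=[0.0, 1.0]
)
```

In a full model the resulting document vector would be passed to a classifier layer; the sentence weights indicate how much each sentence contributed to that vector.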
Cheng, Y., Ye, Z., Wang, M., & Zhang, Q. (2019). Document classification based on convolutional neural network and hierarchical attention network. Neural Network World, 29(2), 83–98. https://doi.org/10.14311/NNW.2019.29.007