Long document classification from local word glimpses via recurrent attention learning

Abstract

Document classification requires extracting high-level features from low-level word vectors. Typically, deep neural networks extract features from all words in a document, which does not scale well to long documents. In this paper, we propose to tackle the long document classification task with a recurrent attention learning framework that produces discriminative features from significantly fewer words. Specifically, the core of the method is a recurrent neural network (RNN)-based controller trained to focus its attention on the discriminative parts of the document. A typical short-text convolutional neural network (CNN) then extracts a glimpse feature from each focused group of words. The controller places its attention according to context information, which consists of a coarse representation of the original document and the memorized glimpse features. After glimpsing a few groups, the document is classified by aggregating these glimpse features with the coarse representation. On our collected 11-class, 10,000-word arXiv paper data set, the proposed method outperforms two subsampled deep CNN baseline models by a large margin while observing far fewer words.
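
To make the pipeline concrete, below is a minimal PyTorch sketch of the glimpse loop described in the abstract. All module names, dimensions, the number and length of glimpses, and the sigmoid-based locator are illustrative assumptions rather than the authors' implementation; in particular, hard location selection is non-differentiable, so the actual controller would need something like policy-gradient training, while this sketch shows only the forward computation.

```python
# Illustrative sketch (not the authors' code): an RNN controller
# repeatedly picks a window of words, a short-text CNN encodes each
# window, and the memorized glimpse features are aggregated with a
# coarse document representation for classification.
import torch
import torch.nn as nn

class GlimpseClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256,
                 num_classes=11, glimpse_len=50, num_glimpses=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Short-text CNN that encodes one glimpsed group of words.
        self.glimpse_cnn = nn.Sequential(
            nn.Conv1d(embed_dim, hidden_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        # RNN controller: consumes glimpse features, memorizes context.
        self.controller = nn.GRUCell(hidden_dim, hidden_dim)
        # Emits a normalized location in [0, 1] for the next glimpse.
        self.locator = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)
        self.glimpse_len = glimpse_len
        self.num_glimpses = num_glimpses

    def encode_window(self, tokens, start):
        # Encode glimpse_len consecutive words starting at `start`.
        window = tokens[:, start:start + self.glimpse_len]
        x = self.embed(window).transpose(1, 2)   # (B, embed_dim, L)
        return self.glimpse_cnn(x).squeeze(-1)   # (B, hidden_dim)

    def forward(self, tokens):
        B, T = tokens.shape
        # Coarse representation: the same CNN over a strided subsample.
        coarse = self.encode_window(tokens[:, ::max(1, T // self.glimpse_len)], 0)
        h = coarse  # controller state seeded with the coarse context
        for _ in range(self.num_glimpses):
            # Controller picks the next location from its current memory.
            # NOTE: the .long() cast is non-differentiable; the paper's
            # controller would need e.g. policy-gradient training here.
            loc = torch.sigmoid(self.locator(h))
            start = (loc * max(1, T - self.glimpse_len)).long()
            feats = torch.stack([
                self.encode_window(tokens[i:i + 1], int(start[i]))[0]
                for i in range(B)
            ])
            h = self.controller(feats, h)  # memorize this glimpse
        # Classify from the aggregated glimpses and the coarse context.
        return self.classifier(torch.cat([h, coarse], dim=-1))
```

Under these assumptions, a forward pass on a batch of padded token IDs returns class logits after encoding only num_glimpses windows plus the coarse subsample, which is what lets the method avoid reading all 10,000 words of a document.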

Citation (APA)

He, J., Wang, L., Liu, L., Feng, J., & Wu, H. (2019). Long document classification from local word glimpses via recurrent attention learning. IEEE Access, 7, 40707–40718. https://doi.org/10.1109/ACCESS.2019.2907992
