Efficient Classification of Long Documents via State-Space Models

Abstract

Transformer-based models have achieved state-of-the-art performance on numerous NLP tasks. However, long documents, which are prevalent in real-world scenarios, cannot be processed efficiently by transformers with the vanilla self-attention module, owing to its quadratic computational complexity and limited length-extrapolation ability. Instead of addressing the cost of self-attention with sparse or hierarchical structures, in this paper we investigate the use of State-Space Models (SSMs) for long document classification. We conduct extensive experiments on six long document classification datasets, covering binary, multi-class, and multi-label classification, and compare SSMs (with and without pre-training) to self-attention-based models. We also introduce the SSM-pooler model and demonstrate that it achieves comparable performance while being on average 36% more efficient. Additionally, our method exhibits higher robustness to input noise, even in the extreme scenario of 40% noise.
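To make the core idea concrete, the sketch below shows what an SSM-based document classifier with pooling might look like in PyTorch. Everything here is an illustrative assumption rather than the paper's SSM-pooler: the class names are hypothetical, the state-space layer is a toy real-valued diagonal recurrence, and it is computed with a plain Python loop instead of the fast convolutional or scan-based kernels used in practice. Its only purpose is to show the structure: a per-channel linear recurrence over the token sequence (linear in sequence length, unlike quadratic self-attention) followed by pooling and a linear classification head.

```python
# Hypothetical sketch of an SSM classifier with mean pooling.
# The diagonal recurrence h_t = a * h_{t-1} + b * x_t, y_t = sum_s c * h_t
# is a simplification for illustration, not the authors' implementation.
import torch
import torch.nn as nn

class DiagonalSSM(nn.Module):
    """Toy real-valued diagonal SSM applied independently per channel."""
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        # Per-channel, per-state decay factors constrained to (0, 1) via exp(log_a).
        self.log_a = nn.Parameter(torch.log(torch.rand(d_model, d_state) * 0.5 + 0.5))
        self.b = nn.Parameter(torch.randn(d_model, d_state) * 0.1)  # input projection
        self.c = nn.Parameter(torch.randn(d_model, d_state) * 0.1)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        a = self.log_a.exp()
        h = x.new_zeros(x.size(0), *self.b.shape)  # (batch, d_model, d_state)
        ys = []
        for t in range(x.size(1)):  # linear in sequence length
            h = a * h + self.b * x[:, t].unsqueeze(-1)
            ys.append((self.c * h).sum(-1))
        return torch.stack(ys, dim=1)  # (batch, seq_len, d_model)

class SSMClassifier(nn.Module):
    """Embedding -> SSM layer -> mean pooling -> linear classification head."""
    def __init__(self, vocab_size: int, d_model: int = 128, num_classes: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.ssm = DiagonalSSM(d_model)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len); pool SSM outputs over the sequence.
        y = self.ssm(self.embed(token_ids))
        return self.head(y.mean(dim=1))

# Usage: logits = SSMClassifier(vocab_size=30522)(torch.randint(0, 30522, (2, 4096)))
```

Because the recurrence touches each token once with constant per-step cost, sequence length enters only linearly, which is what makes this family of models attractive for the long-document setting the paper studies.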

Citation (APA)

Lu, P., Wang, S., Rezagholizadeh, M., Liu, B., & Kobyzev, I. (2023). Efficient Classification of Long Documents via State-Space Models. In EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 6559–6565). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.404
