Bag-of-Words vs. Graph vs. Sequence in Text Classification: Questioning the Necessity of Text-Graphs and the Surprising Strength of a Wide MLP

28 citations · 85 Mendeley readers

Abstract

Graph neural networks have triggered a resurgence of graph-based text classification methods, defining today's state of the art. We show that a wide multi-layer perceptron (MLP) using a Bag-of-Words (BoW) outperforms the recent graph-based models TextGCN and HeteGCN in an inductive text classification setting and is comparable with HyperGAT. Moreover, we fine-tune a sequence-based BERT and a lightweight DistilBERT model, which both outperform all state-of-the-art models. These results question the importance of synthetic graphs used in modern text classifiers. In terms of efficiency, DistilBERT is still twice as large as our BoW-based wide MLP, while graph-based models like TextGCN require setting up an O(N²) graph, where N is the vocabulary plus corpus size. Finally, since Transformers need to compute O(L²) attention weights with sequence length L, the MLP models show higher training and inference speeds on datasets with long sequences.
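For concreteness, the following is a minimal sketch of the kind of BoW-based wide MLP the abstract describes: a single wide hidden layer applied to TF-IDF Bag-of-Words vectors. This is not the authors' released code; the hidden width (1024), dropout rate, and the use of PyTorch and scikit-learn here are illustrative assumptions.

import torch
import torch.nn as nn
from sklearn.feature_extraction.text import TfidfVectorizer

class WideMLP(nn.Module):
    """One wide hidden layer over Bag-of-Words input (illustrative configuration)."""
    def __init__(self, vocab_size: int, num_classes: int, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vocab_size, hidden),  # the single "wide" hidden layer
            nn.ReLU(),
            nn.Dropout(0.5),                # assumed regularization, not from the paper
            nn.Linear(hidden, num_classes),
        )

    def forward(self, bow: torch.Tensor) -> torch.Tensor:
        # bow: (batch, vocab_size) TF-IDF or raw-count Bag-of-Words vectors
        return self.net(bow)

# Example usage on a toy corpus:
docs = ["graphs are not needed", "a wide mlp suffices"]
vec = TfidfVectorizer()
X = torch.from_numpy(vec.fit_transform(docs).toarray()).float()
model = WideMLP(vocab_size=X.shape[1], num_classes=2)
logits = model(X)  # (2, 2) class scores

Note that the input cost here scales with the vocabulary size per document, with no O(N²) corpus-level graph construction (as in TextGCN) and no O(L²) attention over the token sequence (as in BERT), which is the efficiency argument the abstract makes.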

Citation (APA)
Galke, L., & Scherp, A. (2022). Bag-of-Words vs. Graph vs. Sequence in Text Classification: Questioning the Necessity of Text-Graphs and the Surprising Strength of a Wide MLP. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 4038–4051). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-long.279
