GVdoc: Graph-based Visual Document Classification

Abstract

The robustness of a model for real-world deployment is determined by how well it performs on unseen data and distinguishes between in-domain and out-of-domain samples. Visual document classifiers show impressive performance on in-distribution test sets; however, they often misclassify out-of-distribution examples and fail to flag them as such. Image-based classifiers lack the text component, whereas multimodal transformer-based models face the token serialization problem in visual documents due to their diverse layouts. They also require substantial computing power during inference, making them impractical for many real-world applications. We propose GVdoc, a graph-based document classification model that addresses both of these challenges. Our approach generates a document graph based on its layout, and then trains a graph neural network to learn node and graph embeddings. Through experiments, we show that our model, even with fewer parameters, outperforms state-of-the-art models on out-of-distribution data while retaining comparable performance on the in-distribution test set.
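
The abstract describes the approach only at a high level: build a graph over the document's layout (e.g., OCR'd words and their bounding boxes), then train a graph neural network to produce node and graph embeddings for classification. The sketch below is a rough illustration of that pipeline, not the authors' implementation: it connects each word to its k nearest neighbors by bounding-box center and classifies the resulting graph with a two-layer GCN in PyTorch Geometric. The function names, the k-NN edge rule, and the model sizes are all assumptions made for illustration.

```python
# Minimal sketch (not GVdoc's actual implementation): build a layout graph from
# OCR word boxes and classify the document with a small GNN. Assumes PyTorch
# Geometric; the k-NN edge rule and two-layer GCN are illustrative choices.
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv, global_mean_pool


def build_layout_graph(word_feats: torch.Tensor, boxes: torch.Tensor, k: int = 4) -> Data:
    """word_feats: [N, D] per-word features (e.g., text embeddings).
    boxes: [N, 4] bounding boxes as (x0, y0, x1, y1)."""
    centers = torch.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                           (boxes[:, 1] + boxes[:, 3]) / 2], dim=1)
    dist = torch.cdist(centers, centers)                      # pairwise center distances
    dist.fill_diagonal_(float("inf"))                         # exclude self-edges
    nbrs = dist.topk(k, largest=False).indices                # k nearest words per word
    src = torch.arange(len(boxes)).repeat_interleave(k)
    edge_index = torch.stack([src, nbrs.reshape(-1)], dim=0)  # [2, N*k] directed edges
    return Data(x=word_feats, edge_index=edge_index)


class DocGNN(torch.nn.Module):
    def __init__(self, in_dim: int, hidden: int, num_classes: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, num_classes)

    def forward(self, data: Data) -> torch.Tensor:
        x = F.relu(self.conv1(data.x, data.edge_index))       # node embeddings
        x = F.relu(self.conv2(x, data.edge_index))
        batch = getattr(data, "batch", None)
        if batch is None:                                     # single-document case
            batch = torch.zeros(x.size(0), dtype=torch.long)
        g = global_mean_pool(x, batch)                        # graph embedding
        return self.head(g)                                   # document-class logits


# Toy usage: 12 "words" with 64-dim features and random boxes, 16 document classes.
feats, boxes = torch.randn(12, 64), torch.rand(12, 4)
graph = build_layout_graph(feats, boxes)
logits = DocGNN(in_dim=64, hidden=128, num_classes=16)(graph)
```

In this sketch the graph embedding is a simple mean over node embeddings; the paper's actual graph construction and pooling may differ.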

Cite

APA

Mohbat, F., Zaki, M. J., Finegan-Dollak, C., & Verma, A. (2023). GVdoc: Graph-based Visual Document Classification. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 5342–5357). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-acl.329
