DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading

Abstract

The use of visually-rich documents (VRDs) in various fields has created a demand for Document AI models that can read and comprehend documents like humans, which requires overcoming technical, linguistic, and cognitive barriers. Unfortunately, the lack of appropriate datasets has significantly hindered advances in the field. To address this issue, we introduce DocTrack, a VRD dataset genuinely aligned with human eye-movement information collected using eye-tracking technology. This dataset can be used to investigate the challenges mentioned above. Additionally, we explore the impact of human reading order on document understanding tasks and examine what happens when a machine reads in the same order as a human. Our results suggest that although Document AI models have made significant progress, they still have a long way to go before they can read VRDs as accurately, continuously, and flexibly as humans do. These findings have potential implications for future research and development of Document AI models. The data is available at https://github.com/hint-lab/doctrack.

Citation (APA)

Wang, H., Wang, Q., Li, Y., Wang, C., Chu, C., & Wang, R. (2023). DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 5176–5189). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-emnlp.344
