PAWLS: PDF Annotation with Labels and Structure

Abstract

Adobe’s Portable Document Format (PDF) is a popular way of distributing view-only documents with rich visual markup. This presents a challenge to NLP practitioners who wish to use the information contained within PDF documents for training models or data analysis, because annotating these documents is difficult. In this paper, we present PDF Annotation with Labels and Structure (PAWLS), a new annotation tool designed specifically for the PDF document format. PAWLS is particularly suited for mixed-mode annotation and scenarios in which annotators require extended context to annotate accurately. PAWLS supports span-based textual annotation, N-ary relations, and freeform, non-textual bounding boxes, all of which can be exported in convenient formats for training multi-modal machine learning models. A PAWLS demo server is available at https://pawls.apps.allenai.org/ and the source code can be accessed at https://github.com/allenai/pawls.
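The three annotation types named in the abstract (text spans, N-ary relations, and freeform bounding boxes) can be pictured with a small data model. The sketch below is illustrative only: the class names, field names, and JSON layout are assumptions made for this example and are not the actual PAWLS schema or export format.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class BoundingBox:
    """Rectangle on a PDF page (hypothetical fields, page-relative units)."""
    page: int
    left: float
    top: float
    width: float
    height: float


@dataclass
class SpanAnnotation:
    """Labeled run of text; it may wrap across lines, hence several boxes."""
    id: str
    label: str
    text: str
    boxes: List[BoundingBox]


@dataclass
class RegionAnnotation:
    """Freeform, non-textual region such as a figure or table."""
    id: str
    label: str
    box: BoundingBox


@dataclass
class Relation:
    """N-ary relation: a label over any number of annotation ids."""
    label: str
    member_ids: List[str]


def export_annotations(spans, regions, relations) -> str:
    """Serialize all annotations to one JSON string (assumed layout)."""
    known_ids = {a.id for a in spans} | {a.id for a in regions}
    for rel in relations:
        missing = [m for m in rel.member_ids if m not in known_ids]
        if missing:
            raise ValueError(f"relation {rel.label!r} references unknown ids: {missing}")
    payload = {
        "spans": [asdict(s) for s in spans],
        "regions": [asdict(r) for r in regions],
        "relations": [asdict(r) for r in relations],
    }
    return json.dumps(payload, indent=2)
```

Keeping the bounding boxes alongside the text of each span is what would let a downstream multi-modal model consume both the tokens and their layout from a single record.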

Cite

Neumann, M., Shen, Z., & Skjonsberg, S. (2021). PAWLS: PDF Annotation with Labels and Structure. In ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the System Demonstrations (pp. 258–264). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.acl-demo.31
