FALTE: A Toolkit for Fine-grained Annotation for Long Text Evaluation

Abstract

A growing swath of NLP research tackles problems related to generating long text, including tasks such as open-ended story generation, summarization, and dialogue. However, we currently lack appropriate tools to evaluate the long outputs of these generation models: classic automatic metrics such as ROUGE have been shown to perform poorly, and newer learned metrics do not necessarily work well for all tasks and domains of text. Human rating and error analysis therefore remain crucial components of any evaluation of long text generation. In this paper, we introduce FALTE, a web-based annotation toolkit designed to streamline such evaluations. Our tool allows researchers to collect fine-grained judgments of text quality from crowdworkers using an error taxonomy specific to the downstream task. Using the task interface, annotators select text spans and assign them error labels in an incremental, paragraph-level annotation workflow. This workflow breaks the document-level task into smaller units and reduces the cognitive load on annotators. FALTE has previously been used to run a large-scale annotation study evaluating the coherence of long generated summaries, demonstrating its utility.
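
To make the annotation workflow concrete, the sketch below shows one way such span-level judgments could be represented in Python. It is a minimal illustration under assumptions introduced here, not FALTE's actual schema or API: the class names, fields, and example taxonomy are all hypothetical.

    from dataclasses import dataclass, field

    @dataclass
    class ErrorLabel:
        # One entry in the task-specific error taxonomy.
        name: str          # e.g. "coreference" or "repetition"
        description: str   # guidance shown to annotators

    @dataclass
    class SpanAnnotation:
        # A single judgment: an error label attached to a text span.
        paragraph_idx: int  # paragraph containing the span
        start: int          # character offsets within that paragraph
        end: int
        label: str          # must name an ErrorLabel in the taxonomy

    @dataclass
    class AnnotationTask:
        paragraphs: list[str]
        taxonomy: list[ErrorLabel]
        annotations: list[SpanAnnotation] = field(default_factory=list)

        def add_annotation(self, ann: SpanAnnotation) -> None:
            # Reject labels outside the taxonomy and spans outside the paragraph.
            if ann.label not in {lbl.name for lbl in self.taxonomy}:
                raise ValueError(f"unknown error label: {ann.label}")
            if not 0 <= ann.start < ann.end <= len(self.paragraphs[ann.paragraph_idx]):
                raise ValueError("span offsets out of range")
            self.annotations.append(ann)

    # The document is annotated one paragraph at a time, so each judgment
    # covers a small unit of text rather than the whole output.
    task = AnnotationTask(
        paragraphs=[
            "The summary opens by introducing the main character.",
            "He then travels to the city, although she was never mentioned before.",
        ],
        taxonomy=[
            ErrorLabel("coreference", "pronoun or entity with no clear antecedent"),
            ErrorLabel("repetition", "content restated without new information"),
        ],
    )
    span = task.paragraphs[1].index("she")
    task.add_annotation(
        SpanAnnotation(paragraph_idx=1, start=span, end=span + len("she"), label="coreference")
    )

Validating each annotation against the taxonomy at insertion time reflects the paper's premise that useful error labels are specific to the downstream task rather than universal.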

Cite

APA

Goyal, T., Li, J. J., & Durrett, G. (2022). FALTE: A Toolkit for Fine-grained Annotation for Long Text Evaluation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 351–358). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.emnlp-demos.35
