Multi-Dimensional Evaluation of Text Summarization with In-Context Learning


Abstract

Evaluation of natural language generation (NLG) is complex and multi-dimensional. Generated text can be evaluated for fluency, coherence, factuality, or any other dimension of interest. Most frameworks that perform such multi-dimensional evaluation require training on large manually or synthetically generated datasets. In this paper, we study the efficacy of large language models as multi-dimensional evaluators using in-context learning, obviating the need for large training datasets. Our experiments show that in-context learning-based evaluators are competitive with learned evaluation frameworks for the task of text summarization, establishing a new state of the art on dimensions such as relevance and factual consistency. We then analyze the effects of factors such as the selection and number of in-context examples on performance. Finally, we study the efficacy of in-context learning-based evaluators in evaluating zero-shot summaries written by large language models such as GPT-3. Our code is available at https://github.com/JainSameer06/ICE.
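The authors' exact prompts and scoring pipeline live in the linked repository. As a rough illustration of the idea only, the sketch below shows how an in-context evaluator for a single dimension can be assembled: a handful of human-scored (source, summary, score) examples are prepended to the test instance, and the model's completion is parsed as the score. The prompt wording, the example format, and the `call_llm` stub are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of an in-context-learning evaluator for one quality
# dimension (e.g. "consistency"). Prompt wording, data format, and
# call_llm are illustrative assumptions, not the paper's actual code.

def build_prompt(dimension, examples, source, summary):
    """Prepend human-scored in-context examples, then ask the model to
    score the test (source, summary) pair on the given dimension."""
    parts = [f"Score the {dimension} of each summary on a scale of 1 to 5.\n"]
    for ex in examples:
        parts.append(
            f"Source: {ex['source']}\n"
            f"Summary: {ex['summary']}\n"
            f"{dimension.capitalize()} score: {ex['score']}\n"
        )
    parts.append(
        f"Source: {source}\n"
        f"Summary: {summary}\n"
        f"{dimension.capitalize()} score:"
    )
    return "\n".join(parts)


def call_llm(prompt):
    """Placeholder for a completion call to a large language model
    (e.g. GPT-3); swap in your provider's client here."""
    raise NotImplementedError


def evaluate(dimension, examples, source, summary):
    """Return the model's score for one dimension as a float."""
    completion = call_llm(build_prompt(dimension, examples, source, summary))
    return float(completion.strip().split()[0])  # parse the leading number
```

Evaluating a new dimension then requires only a few scored examples rather than a trained metric, e.g. `evaluate("relevance", examples, source, summary)` with examples drawn from existing human annotations.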

Citation (APA)

Jain, S., Keshava, V., Sathyendra, S. M., Fernandes, P., Liu, P., Neubig, G., & Zhou, C. (2023). Multi-dimensional evaluation of text summarization with in-context learning. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 8487–8495). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.537
