GO FIGURE: A Meta Evaluation of Factuality in Summarization

Abstract

While neural language models can generate text with remarkable fluency and coherence, controlling for factual correctness in generation remains an open research question. This major discrepancy between the surface-level fluency and the content-level correctness of neural generation has motivated a new line of research that seeks automatic metrics for evaluating the factuality of machine text. In this paper, we introduce GO FIGURE, a meta-evaluation framework for evaluating factuality evaluation metrics. We propose five necessary conditions to evaluate factuality metrics on diagnostic factuality data across three different summarization tasks. Our benchmark analysis on ten factuality metrics reveals that our meta-evaluation framework provides a robust and efficient evaluation that is extensible to multiple types of factual consistency and standard generation metrics, including QA metrics. It also reveals that while QA metrics generally improve over standard metrics that measure factuality across domains, performance is highly dependent on the way in which questions are generated.
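
To make the meta-evaluation idea concrete, the sketch below shows one way a sensitivity check could be run on diagnostic factuality data: summaries with an increasing number of injected factual errors are scored by a candidate metric, and the correlation between error counts and metric scores is reported (a metric sensitive to factual errors should correlate negatively). This is only an illustrative sketch of the general idea, not the paper's implementation or its five conditions; the diagnostic_sensitivity function, the toy token_overlap metric, and the example data are all hypothetical.

    from typing import Callable, List
    from scipy.stats import spearmanr

    def diagnostic_sensitivity(
        metric: Callable[[str, str], float],
        source: str,
        diagnostic_summaries: List[str],
        error_counts: List[int],
    ) -> float:
        """Score each diagnostic summary against the source document and
        return the Spearman correlation between the number of injected
        factual errors and the metric scores. A strongly negative value
        suggests the metric is sensitive to factual errors."""
        scores = [metric(source, summary) for summary in diagnostic_summaries]
        correlation, _ = spearmanr(error_counts, scores)
        return correlation

    if __name__ == "__main__":
        # Toy stand-in for a real factuality or QA-based metric:
        # fraction of summary tokens that also appear in the source.
        def token_overlap(source: str, summary: str) -> float:
            src = set(source.lower().split())
            summ = set(summary.lower().split())
            return len(src & summ) / max(len(summ), 1)

        source = "The company reported record profits in 2020 after expanding overseas."
        diagnostic_summaries = [
            "The company reported record profits in 2020 after expanding overseas.",  # 0 injected errors
            "The company reported record losses in 2020 after expanding overseas.",   # 1 injected error
            "The firm reported record losses in 2019 after shrinking overseas.",      # 3 injected errors
        ]
        print(diagnostic_sensitivity(token_overlap, source, diagnostic_summaries, [0, 1, 3]))

In this toy run, the correlation is negative because each injected error replaces source tokens; a real meta-evaluation would apply the same kind of check to learned factuality and QA metrics over full diagnostic datasets.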

Citation (APA)

Gabriel, S., Celikyilmaz, A., Jha, R., Choi, Y., & Gao, J. (2021). GO FIGURE: A Meta Evaluation of Factuality in Summarization. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (pp. 478–487). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-acl.42
