In generating natural language descriptions for knowledge graph triples, prior works used either small-scale, human-annotated datasets or datasets with a limited variety of graph shapes, e.g., datasets dominated by star graphs. Graph-to-text models trained and evaluated on such datasets thus remain largely untested in more realistic large-scale, open-domain settings. We introduce a new dataset, GraphNarrative, to fill this gap. Fine-tuning transformer-based pretrained language models has achieved state-of-the-art performance among graph-to-text models. However, this method suffers from information hallucination: the generated text may contain fabricated facts not present in the input graph. We propose a novel approach that, given a graph-sentence pair in GraphNarrative, trims the sentence to eliminate portions that are not present in the corresponding graph, by utilizing the sentence's dependency parse tree. Our experimental results verify this approach, using models trained on both GraphNarrative and existing datasets. The dataset, source code, and trained models are released at https://github.com/idirlab/graphnarrator.
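The trimming idea can be pictured with a short sketch. Below is a minimal illustration using spaCy's dependency parser: modifier subtrees that mention no entity from the input graph are pruned from the target sentence. The prunable relation set, the substring-based entity matching, and the example triple are all illustrative assumptions, not the paper's exact procedure; the released source code contains the actual implementation.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Dependency relations treated as prunable when their subtree mentions no
# graph entity. This set is an illustrative assumption, not the paper's list.
PRUNABLE = {"relcl", "acl", "advcl", "appos", "prep"}

def trim_sentence(sentence, entity_names):
    """Drop modifier subtrees that ground no entity from the input graph."""
    doc = nlp(sentence)
    names = {e.lower() for e in entity_names}

    def grounds_entity(token):
        # token.subtree yields the token and all its descendants.
        span = " ".join(t.text for t in token.subtree).lower()
        return any(name in span for name in names)

    drop = set()
    for token in doc:
        if token.dep_ in PRUNABLE and not grounds_entity(token):
            drop.update(t.i for t in token.subtree)

    return "".join(t.text_with_ws for t in doc if t.i not in drop).strip()

# Hypothetical usage: the relative clause about the Thames has no
# counterpart triple, so it is pruned from the training sentence.
triples = [("Alan Turing", "birth place", "London")]
entities = {e for s, _, o in triples for e in (s, o)}
print(trim_sentence(
    "Alan Turing was born in London, which lies on the River Thames.",
    entities,
))
# Expected output is roughly "Alan Turing was born in London." (leftover
# punctuation near pruned clauses would need a small cleanup pass).
```

Training on sentences trimmed this way keeps graph-sentence pairs aligned, so the model is never rewarded for generating facts absent from the input graph.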
Citation: Shi, X., Zhu, Z., Zhang, Z., & Li, C. (2023). Hallucination Mitigation in Natural Language Generation from Large-Scale Open-Domain Knowledge Graphs. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023) (pp. 12506–12521). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.770