GenWiki: A Dataset of 1.3 Million Content-Sharing Text and Graphs for Unsupervised Graph-to-Text Generation

Abstract

Data collection for knowledge graph-to-text generation is expensive. As a result, research on unsupervised models has recently emerged as an active field. However, most unsupervised models have to use non-parallel versions of existing small supervised datasets, which largely constrains their potential. In this paper, we propose a large-scale, general-domain dataset, GenWiki. Our unsupervised dataset has 1.3M text examples and 1.3M graph examples. Together with a human-annotated test set, we provide this new benchmark dataset for future research on unsupervised text generation from knowledge graphs.

Cite

Jin, Z., Guo, Q., Qiu, X., & Zhang, Z. (2020). GenWiki: A Dataset of 1.3 Million Content-Sharing Text and Graphs for Unsupervised Graph-to-Text Generation. In COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference (pp. 2398–2409). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.coling-main.217
