GenWiki: A Dataset of 1.3 Million Content-Sharing Text and Graphs for Unsupervised Graph-to-Text Generation

Zhijing Jin; Qipeng Guo; Xipeng Qiu; Zheng Zhang

Conference Proceedings, Open Access

COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference (2020) 2398-2409

DOI: 10.18653/v1/2020.coling-main.217

35 Citations

81 Readers

Abstract

Data collection for knowledge graph-to-text generation is expensive. As a result, research on unsupervised models has recently emerged as an active field. However, most unsupervised models have to use non-parallel versions of existing small supervised datasets, which largely constrains their potential. In this paper, we propose a large-scale, general-domain dataset, GenWiki. Our unsupervised dataset has 1.3M text examples and 1.3M graph examples. With a human-annotated test set, we provide this new benchmark dataset for future research on unsupervised text generation from knowledge graphs.

Cite

Jin, Z., Guo, Q., Qiu, X., & Zhang, Z. (2020). GenWiki: A Dataset of 1.3 Million Content-Sharing Text and Graphs for Unsupervised Graph-to-Text Generation. In COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference (pp. 2398–2409). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.coling-main.217
