DART: Open-Domain Structured Data Record to Text Generation

Abstract

We present DART, an open-domain structured DAta-Record-to-Text generation dataset with over 82k instances (DARTs). Data-to-text annotation can be costly, especially for tables, which are the major source of structured data and contain nontrivial structures. To this end, we propose a procedure for extracting semantic triples from tables that encodes their structures by exploiting the semantic dependencies among table headers and the table title. Our dataset construction framework merges heterogeneous sources from open-domain semantic parsing and spoken dialogue systems using techniques including tree ontology annotation, question-answer pair to declarative sentence conversion, and predicate unification, all with minimal post-editing. We present a systematic evaluation on DART as well as new state-of-the-art results on WebNLG 2017 to show that DART (1) poses new challenges to existing data-to-text datasets and (2) facilitates out-of-domain generalization. Our data and code can be found at https://github.com/Yale-LILY/dart.
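
The triple-extraction idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `row_to_triples`, the `parent_of` mapping that stands in for a tree ontology over table headers, and the table values are all hypothetical, chosen only to show how header dependencies and the table title might determine the subject of each triple.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def row_to_triples(
    title: str,
    row: Dict[str, str],
    parent_of: Dict[str, Optional[str]],
) -> List[Triple]:
    """Turn one table row into (subject, predicate, object) triples.

    `parent_of` is an assumed stand-in for a header ontology: it maps each
    column header to its parent header, or to None when the header hangs
    directly off the table title.
    """
    triples: List[Triple] = []
    for header, value in row.items():
        parent = parent_of[header]
        # Subject is the parent column's cell, or the title for top-level headers.
        subject = title if parent is None else row[parent]
        triples.append((subject, header, value))
    return triples


# Made-up table content, for illustration only.
row = {"Team": "Team A", "Coach": "Coach B", "Stadium": "Stadium C"}
parent_of = {"Team": None, "Coach": "Team", "Stadium": "Team"}
triples = row_to_triples("Example League", row, parent_of)
for triple in triples:
    print(triple)
```

In this sketch the header tree is what lets "Coach" and "Stadium" attach to the team rather than to the table as a whole, which is the kind of structure a flat row-by-row encoding would lose.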

Cite

Nan, L., Radev, D., Zhang, R., Rau, A., Sivaprasad, A., Hsieh, C., … Rajani, N. F. (2021). DART: Open-Domain Structured Data Record to Text Generation. In NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 432–447). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.naacl-main.467
