Previous studies on Natural Language Generation (NLG) from structured data have primarily focused on surface-level descriptions of record sequences. However, for complex structured data, e.g., multi-row tables, it is often desirable for an NLG system to describe interesting facts derived from logical inferences across records. Given only the table, existing models struggle to produce controllable and high-fidelity logical generations. In this work, we formulate high-fidelity NLG as generation from logical forms in order to obtain controllable and faithful generations. We present a new large-scale dataset, LOGIC2TEXT, with 10,753 descriptions involving common logic types paired with the underlying logical forms. The logical forms exhibit diverse, free-schema graph structures, which pose a significant challenge to a model’s ability to understand their semantics. We experiment with (1) fully supervised training on the full dataset and (2) a few-shot setting with only hundreds of paired examples. We compare several popular generation models and analyze their performance. We hope our dataset can encourage research towards building an advanced NLG system capable of natural, faithful, and human-like generation. The dataset and code are available at https://github.com/czyssrs/Logic2Text.
CITATION STYLE
Chen, Z., Chen, W., Zha, H., Zhou, X., Zhang, Y., Sundaresan, S., & Wang, W. Y. (2020). Logic2Text: High-fidelity natural language generation from logical forms. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 2096–2111). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.findings-emnlp.190