Neural CRF model for sentence alignment in text simplification

Chao Jiang; Mounica Maddela; Wuwei Lan; Yang Zhong; Wei Xu

Conference Proceedings (Open Access)

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2020) 7943-7960

DOI: 10.18653/v1/2020.acl-main.709

122 Citations

161 Readers

Abstract

The success of a text simplification system heavily depends on the quality and quantity of complex-simple sentence pairs in the training corpus, which are extracted by aligning sentences between parallel articles. To evaluate and improve sentence alignment quality, we create two manually annotated sentence-aligned datasets from two commonly used text simplification corpora, Newsela and Wikipedia. We propose a novel neural CRF alignment model which not only leverages the sequential nature of sentences in parallel documents but also utilizes a neural sentence pair model to capture semantic similarity. Experiments demonstrate that our proposed approach outperforms all previous work on the monolingual sentence alignment task by more than 5 points in F1. We apply our CRF aligner to construct two new text simplification datasets, NEWSELA-AUTO and WIKI-AUTO, which are much larger and of better quality than the existing datasets. A Transformer-based seq2seq model trained on our datasets establishes a new state of the art for text simplification in both automatic and human evaluation.
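The alignment idea in the abstract (a per-pair similarity score combined with a preference for alignments that follow document order) can be illustrated with a small chain-structured decoder. This is only a sketch under stated assumptions, not the authors' implementation: the `jaccard` token-overlap function stands in for the neural sentence pair model, and `null_score` and `jump_penalty` are hand-set hypothetical parameters where the paper's model would use learned potentials. All function and parameter names are invented for illustration.

```python
from typing import List, Optional


def jaccard(a: str, b: str) -> float:
    """Toy similarity (token overlap); an assumed stand-in for a neural scorer."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def viterbi_align(
    simple: List[str],
    complex_sents: List[str],
    null_score: float = 0.2,    # assumed score for leaving a sentence unaligned
    jump_penalty: float = 0.1,  # assumed cost per position of jump between alignments
) -> List[Optional[int]]:
    """For each simple sentence, return the index of its aligned complex
    sentence, or None if it is left unaligned."""
    if not simple:
        return []
    labels: List[Optional[int]] = [None] + list(range(len(complex_sents)))

    def emit(i: int, lab: Optional[int]) -> float:
        # Semantic-similarity term for aligning simple[i] to label `lab`.
        return null_score if lab is None else jaccard(simple[i], complex_sents[lab])

    def trans(prev: Optional[int], cur: Optional[int]) -> float:
        # Sequential term: penalize distant jumps between consecutive alignments.
        if prev is None or cur is None:
            return 0.0
        return -jump_penalty * abs(cur - prev)

    # Standard Viterbi recursion over the chain of simple sentences.
    best = [emit(0, lab) for lab in labels]
    back: List[List[int]] = []
    for i in range(1, len(simple)):
        new_best, ptrs = [], []
        for cur in labels:
            k = max(range(len(labels)), key=lambda p: best[p] + trans(labels[p], cur))
            new_best.append(best[k] + trans(labels[k], cur) + emit(i, cur))
            ptrs.append(k)
        best = new_best
        back.append(ptrs)

    # Trace back the highest-scoring label sequence.
    k = max(range(len(labels)), key=lambda p: best[p])
    path = [k]
    for ptrs in reversed(back):
        k = ptrs[k]
        path.append(k)
    path.reverse()
    return [labels[k] for k in path]


if __name__ == "__main__":
    complex_doc = [
        "the cat sat on the mat",
        "it was raining heavily outside",
        "the dog barked loudly",
    ]
    simple_doc = ["the cat sat", "the dog barked"]
    print(viterbi_align(simple_doc, complex_doc))  # [0, 2]
```

In this toy setting the decoder trades off similarity against the jump penalty, so a simple sentence with no close match falls back to the unaligned label. A trained model would replace both hand-set terms with learned scores.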

Cite

Jiang, C., Maddela, M., Lan, W., Zhong, Y., & Xu, W. (2020). Neural CRF model for sentence alignment in text simplification. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 7943–7960). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.acl-main.709
