Split-and-Rephrase in a Cross-Lingual Manner: a Complete Pipeline

Abstract

Split-and-rephrase is a challenging task that transforms a complex input sentence into multiple shorter sentences that retain an equivalent meaning. The rewriting approach rests on the premise that shorter sentences benefit human readers and, when used as a preprocessing step, improve downstream NLP tasks. This work presents a complete pipeline capable of performing split-and-rephrase in a cross-lingual manner. We trained sequence-to-sequence neural models on English corpora and applied them, jointly with BERT's masked language modeling, to predict the transformations in English and Brazilian Portuguese sentences. Contrary to traditional approaches that train models with extensive vocabularies, we present a non-trivial way to construct symbolic vocabularies generalized solely by grammatical classes (POS tags) and their respective recurrences, reducing the amount of necessary training data. The pipeline showed competitive results, encouraging the expansion of the method to languages other than English.
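The symbolic vocabulary idea can be illustrated with a small sketch. This is not the authors' implementation: it assumes one plausible reading of "POS tags and their respective recurrences", namely that each token is replaced by its tag plus an occurrence counter (for example NOUN1, NOUN2). The function names, the hand-tagged sentence, and the hypothetical model output are all illustrative assumptions; a real pipeline would obtain tags from a POS tagger and the split from the trained sequence-to-sequence model.

```python
from collections import defaultdict

def to_symbols(tagged):
    """Replace each (token, pos) pair with a symbol such as NOUN1 or NOUN2.

    Assumption: the counter is the per-tag occurrence index in the sentence.
    Returns the symbol sequence and a table mapping symbols back to tokens.
    """
    counts = defaultdict(int)
    symbols, table = [], {}
    for token, pos in tagged:
        counts[pos] += 1
        symbol = f"{pos}{counts[pos]}"
        symbols.append(symbol)
        table[symbol] = token
    return symbols, table

def from_symbols(symbols, table):
    """Map symbols back to tokens; unknown symbols (e.g. '.') pass through."""
    return [table.get(symbol, symbol) for symbol in symbols]

# Hand-tagged toy input (hypothetical; a tagger would normally produce this).
tagged = [
    ("Maria", "PROPN"), ("wrote", "VERB"), ("the", "DET"), ("book", "NOUN"),
    ("and", "CCONJ"),
    ("sold", "VERB"), ("the", "DET"), ("rights", "NOUN"),
]
symbols, table = to_symbols(tagged)

# Hypothetical output of a model that operates on symbols only.
predicted = ["PROPN1", "VERB1", "DET1", "NOUN1", ".",
             "PROPN1", "VERB2", "DET2", "NOUN2", "."]
restored = from_symbols(predicted, table)
print(" ".join(symbols))
print(" ".join(restored))
```

Because the model in this sketch only ever sees tag symbols, the same learned transformation could in principle apply to any language whose sentences map to the same tag sequence, which is one way to read the cross-lingual claim in the abstract.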

Cite

Neto, P. B., & Ruiz, E. E. S. (2021). Split-and-Rephrase in a Cross-Lingual Manner: a Complete Pipeline. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 155–164). Incoma Ltd. https://doi.org/10.26615/978-954-452-072-4_019
