XAlign: Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages

Citations: 5 · Mendeley readers: 6

Abstract

Multiple critical scenarios require automated generation of descriptive text in low-resource (LR) languages given English fact triples: for example, generating Wikipedia text from English Infoboxes, or producing non-English product descriptions from English product attributes. Previous work on fact-to-text (F2T) generation has focused on English only. Building an effective cross-lingual F2T (XF2T) system requires alignment between English structured facts and LR-language sentences. Either such alignment data must be obtained manually at large scale, which is expensive, or automated models for cross-lingual alignment must be built. To the best of our knowledge, there has been no previous attempt at automated cross-lingual alignment or generation for LR languages. We propose two unsupervised methods for cross-lingual alignment. We contribute XAlign, an XF2T dataset with 0.45M pairs across 8 languages, of which 5402 pairs have been manually annotated. We also train strong baseline XF2T generation models on XAlign. We make our code and dataset publicly available, and hope that this will help advance further research in this critical area.


Citation (APA)

Abhishek, T., Sagare, S., Singh, B., Sharma, A., Gupta, M., & Varma, V. (2022). XAlign: Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages. In WWW 2022 - Companion Proceedings of the Web Conference 2022 (pp. 171–175). Association for Computing Machinery, Inc. https://doi.org/10.1145/3487553.3524265
