Translating Headers of Tabular Data: A Pilot Study of Schema Translation

Abstract

Schema translation is the task of automatically translating headers of tabular data from one language to another. High-quality schema translation plays an important role in crosslingual table searching, understanding and analysis. Despite its importance, schema translation is not well studied in the community, and state-of-the-art neural machine translation models cannot work well on this task because of two intrinsic differences between plain text and tabular data: morphological difference and context difference. To facilitate the research study, we construct the first parallel dataset for schema translation, which consists of 3,158 tables with 11,979 headers written in 6 different languages, including English, Chinese, French, German, Spanish, and Japanese. Also, we propose the first schema translation model called CAST, which is a header-to-header neural machine translation model augmented with schema context. Specifically, we model a target header and its context as a directed graph to represent their entity types and relations. Then CAST encodes the graph with a relational-aware transformer and uses another transformer to decode the header in the target language. Experiments on our dataset demonstrate that CAST significantly outperforms state-of-the-art neural machine translation models. Our dataset will be released at https://github.com/microsoft/ContextualSP.
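
To make the graph idea in the abstract concrete, the following is a minimal sketch of how a target header and its context might be laid out as a directed graph and flattened into a pairwise relation matrix, the kind of input a relation-aware encoder typically consumes. It is not the authors' implementation. The node types, the relation names, the use of sample cell values as context, and the helper names `SchemaGraph` and `build_graph` are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass, field

# Hypothetical relation vocabulary; the abstract does not name the relations.
RELATIONS = ["none", "self", "header-header", "header-value", "value-header"]
REL_ID = {name: i for i, name in enumerate(RELATIONS)}


@dataclass
class SchemaGraph:
    """Directed graph over a target header and its context (illustrative only)."""
    nodes: list = field(default_factory=list)  # (text, node_type) pairs
    edges: dict = field(default_factory=dict)  # (src, dst) -> relation name

    def add_node(self, text, node_type):
        self.nodes.append((text, node_type))
        return len(self.nodes) - 1

    def add_edge(self, src, dst, relation):
        self.edges[(src, dst)] = relation

    def relation_matrix(self):
        """Return an n x n matrix of relation ids, one per ordered node pair."""
        n = len(self.nodes)
        matrix = [[REL_ID["none"]] * n for _ in range(n)]
        for i in range(n):
            matrix[i][i] = REL_ID["self"]
        for (src, dst), relation in self.edges.items():
            matrix[src][dst] = REL_ID[relation]
        return matrix


def build_graph(target, context_headers, sample_values):
    """Link the target header to sibling headers and (assumed) sample cell values."""
    graph = SchemaGraph()
    t = graph.add_node(target, "target_header")
    for header in context_headers:
        c = graph.add_node(header, "context_header")
        graph.add_edge(t, c, "header-header")
        graph.add_edge(c, t, "header-header")
    for value in sample_values:
        v = graph.add_node(value, "cell_value")
        graph.add_edge(t, v, "header-value")
        graph.add_edge(v, t, "value-header")
    return graph


graph = build_graph("Name", ["Age"], ["Alice"])
matrix = graph.relation_matrix()
```

In a relation-aware transformer, each entry of such a matrix would usually index a learned relation embedding that biases attention between the corresponding pair of nodes. Whether CAST uses this exact layout is not stated in the abstract.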

Cite

Zhu, K., Gao, Y., Guo, J., & Lou, J. G. (2021). Translating Headers of Tabular Data: A Pilot Study of Schema Translation. In EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 56–66). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.emnlp-main.5