Abstract
With massively parallel corpora of hundreds or thousands of translations of the same text, it is possible to automatically perform typological studies of language structure using very large language samples. We investigate the domain of word order using multilingual word alignment and high-precision annotation transfer in a corpus with 1144 translations in 986 languages of the New Testament. Results are encouraging, with 86% to 96% agreement between our method and the manually created WALS database for a range of different word order features. Beyond reproducing the categorical data in WALS and extending it to hundreds of other languages, we also provide quantitative data for the relative frequencies of different word orders, and show the usefulness of this for language comparison. Our method has applications for basic research in linguistic typology, as well as for NLP tasks like transfer learning for dependency parsing, which has been shown to benefit from word order information.
Cite
Östling, R. (2015). Word order typology through multilingual word alignment. In ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (Vol. 2, pp. 205–211). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/p15-2034