JW300: A wide-coverage parallel corpus for low-resource languages

Željko Agić; Ivan Vulić

Conference Proceedings, Open Access

ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (2020) 3204-3210

DOI: 10.18653/v1/p19-1310

158 Citations

171 Readers

Abstract

Viable cross-lingual transfer critically depends on the availability of parallel texts. Shortage of such resources imposes a development and evaluation bottleneck in multilingual processing. We introduce JW300, a parallel corpus of over 300 languages with around 100 thousand parallel sentences per language pair on average. In this paper, we present the resource and showcase its utility in experiments with cross-lingual word embedding induction and multi-source part-of-speech projection.

Cite

Agić, Ž., & Vulić, I. (2020). JW300: A wide-coverage parallel corpus for low-resource languages. In ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (pp. 3204–3210). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p19-1310
