Bilingual word embeddings from parallel and non-parallel corpora for cross-language text classification

Citations: 70
Mendeley readers: 151

Abstract

In many languages, the sparse availability of resources causes numerous challenges for textual analysis tasks. Text classification is one such standard task that is hindered by the limited availability of label information in low-resource languages. Transferring knowledge (i.e. label information) from high-resource to low-resource languages might improve text classification compared to other approaches like machine translation. We introduce BRAVE (Bilingual paRAgraph VEctors), a model that learns bilingual distributed representations (i.e. embeddings) of words without word alignments, either from sentence-aligned parallel or label-aligned non-parallel document corpora, to support cross-language text classification. Empirical analysis shows that classification models trained with our bilingual embeddings outperform other state-of-the-art systems on three different cross-language text classification tasks.
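The transfer setting the abstract describes can be illustrated with a minimal sketch: once words from two languages live in a shared embedding space, a classifier trained on labeled documents in the high-resource language can be applied directly to documents in the low-resource language. The toy vectors, the nearest-centroid classifier, and all names below are illustrative assumptions, not the BRAVE model itself (BRAVE learns the shared space from sentence-aligned parallel or label-aligned non-parallel corpora).

```python
import math

# Hand-made 2-D "bilingual" embeddings (assumption: in BRAVE these would be
# learned, not hard-coded). English and German words with similar meanings
# are placed near each other in the shared space.
EMB = {
    "good": (1.0, 0.1),  "gut":      (0.9, 0.2),
    "bad":  (-1.0, 0.1), "schlecht": (-0.9, 0.2),
    "movie": (0.0, 1.0), "film":     (0.1, 1.0),
}

def doc_vector(tokens):
    """Represent a document as the average of its word embeddings."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

def train_centroids(docs, labels):
    """Nearest-centroid classifier fit on high-resource-language documents."""
    centroids = {}
    for lab in set(labels):
        vecs = [doc_vector(d) for d, l in zip(docs, labels) if l == lab]
        centroids[lab] = tuple(sum(dim) / len(vecs) for dim in zip(*vecs))
    return centroids

def predict(centroids, tokens):
    """Assign the label whose centroid is closest in the shared space."""
    v = doc_vector(tokens)
    return min(centroids, key=lambda lab: math.dist(v, centroids[lab]))

# Train on English (high-resource) labeled documents...
cents = train_centroids([["good", "movie"], ["bad", "movie"]], ["pos", "neg"])
# ...then classify German (low-resource) documents with no German labels.
print(predict(cents, ["gut", "film"]))       # -> pos
print(predict(cents, ["schlecht", "film"]))  # -> neg
```

The key design point is that no translation happens at classification time: because both languages share one embedding space, the decision boundary learned from English labels is directly reusable for German inputs.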

Citation (APA)

Mogadala, A., & Rettinger, A. (2016). Bilingual word embeddings from parallel and non-parallel corpora for cross-language text classification. In 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2016 - Proceedings of the Conference (pp. 692–702). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/n16-1083
