Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora


Abstract

We describe a machine learning approach, based on a Random Forest (RF) classifier, for automatically compiling bilingual dictionaries of technical terms from comparable corpora. We evaluate the RF classifier against a popular term alignment method, context vectors, and report improved translation accuracy. As an application, we combine the automatically extracted dictionary with a trained Statistical Machine Translation (SMT) system to translate unknown terms more accurately. The dictionary extraction method described in this paper is freely available.
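The context-vector baseline named in the abstract can be sketched as follows. This is a minimal, illustrative version of the standard context-vector approach, not the authors' implementation or their RF classifier. It assumes single-token terms, a fixed co-occurrence window, and a small seed dictionary for mapping source context words into the target language. All function names (`context_vector`, `translate_vector`, `cosine`, `rank_translations`) and the toy data are hypothetical.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, term, window=2):
    """Count words co-occurring with `term` within +/- `window` tokens (assumed window)."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vec[tokens[j]] += 1
    return vec


def translate_vector(vec, seed_dict):
    """Map source-language context words into the target language via a
    (hypothetical) seed dictionary; words not in the dictionary are dropped."""
    out = Counter()
    for word, count in vec.items():
        if word in seed_dict:
            out[seed_dict[word]] += count
    return out


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_translations(src_tokens, src_term, tgt_tokens, tgt_candidates,
                      seed_dict, window=2):
    """Rank target-language candidates for `src_term` by context similarity."""
    src_vec = translate_vector(
        context_vector(src_tokens, src_term, window), seed_dict)
    scored = [(cand, cosine(src_vec, context_vector(tgt_tokens, cand, window)))
              for cand in tgt_candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy English/French "comparable corpus" (hypothetical data).
src = "kinase phosphorylates substrate . gene encodes protein".split()
tgt = "kinase phosphoryle substrat . gène code protéine".split()
seed = {"phosphorylates": "phosphoryle", "substrate": "substrat",
        "gene": "gène", "encodes": "code", "protein": "protéine"}
ranked = rank_translations(src, "kinase", tgt, ["protéine", "kinase"], seed)
print(ranked[0][0])
```

Here `kinase` is ranked first because its translated English context (`phosphoryle`, `substrat`) matches its French context, while `protéine` shares no context words. Context words not covered by the seed dictionary are simply dropped, so the ranking depends on that dictionary's coverage.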

Citation (APA)

Kontonatsios, G., Korkontzelos, I., Tsujii, J., & Ananiadou, S. (2014). Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora. In EACL 2014 - 14th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference (pp. 111–116). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/e14-4022
