We describe a machine learning approach, a Random Forest (RF) classifier, that is used to automatically compile bilingual dictionaries of technical terms from comparable corpora. We evaluate the RF classifier against a popular term alignment method, namely context vectors, and we report an improvement of the translation accuracy. As an application, we use the automatically extracted dictionary in combination with a trained Statistical Machine Translation (SMT) system to more accurately translate unknown terms. The dictionary extraction method described in this paper is freely available.
CITATION STYLE
Kontonatsios, G., Korkontzelos, I., Tsujii, J., & Ananiadou, S. (2014). Using a Random Forest Classifier to Compile Bilingual Dictionaries of Technical Terms from Comparable Corpora. In EACL 2014 - 14th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference (pp. 111–116). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/e14-4022
Mendeley helps you to discover research relevant for your work.