Abstract
We present a language-pair independent terminology extraction module that is based on a sub-sentential alignment system that links linguistically motivated phrases in parallel texts. Statistical filters are applied on the bilingual list of candidate terms that is extracted from the alignment output. We compare the performance of both the alignment and terminology extraction module for three different language pairs (French-English, French-Italian and French-Dutch) and highlight language-pair specific problems (e.g. different compounding strategy in French and Dutch). Comparisons with standard terminology extraction programs show an improvement of up to 20% for bilingual terminology extraction and competitive results (85% to 90% accuracy) for monolingual terminology extraction, and reveal that the linguistically based alignment module is particularly well suited for the extraction of complex multiword terms. © 2009 Association for Computational Linguistics.
Cite
CITATION STYLE
Lefever, E., Macken, L., & Hoste, V. (2009). Language-independent bilingual terminology extraction from a multilingual parallel corpus. In EACL 2009 - 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings (pp. 496–504). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1609067.1609122
Register to see more suggestions
Mendeley helps you to discover research relevant for your work.