The multilingual nature of the world makes translation a crucial requirement today. Parallel dictionaries constructed by humans are a widely available resource, but they are limited and do not provide enough coverage for good-quality translation because of out-of-vocabulary words and neologisms. This motivates the use of statistical translation systems, which unfortunately depend on the quantity and quality of their training data. Such data has very limited availability, especially for some languages and very narrow text domains. In this research we present our improvements to Yalign’s mining methodology: reimplementing the comparison algorithm, introducing a tuning script, and improving performance using GPU computing acceleration. The experiments are conducted on various text domains, and bilingual data is extracted from Wikipedia dumps.
CITATION STYLE
Wołk, K., & Marasek, K. (2015). Tuned and GPU-accelerated parallel data mining from comparable corpora. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9302, pp. 32–40). Springer Verlag. https://doi.org/10.1007/978-3-319-24033-6_4