Tuned and GPU-accelerated parallel data mining from comparable corpora

Krzysztof Wołk; Krzysztof Marasek

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9302 32-40

DOI: 10.1007/978-3-319-24033-6_4

6 Citations

9 Readers

Abstract

The multilingual nature of the world makes translation a crucial requirement today. Parallel dictionaries constructed by humans are a widely available resource, but they are limited and, because of out-of-vocabulary words and neologisms, do not provide enough coverage for good-quality translation. This motivates the use of statistical translation systems, which unfortunately depend on the quantity and quality of their training data. Such data has very limited availability, especially for some languages and for very narrow text domains. In this research we present our improvements to Yalign’s mining methodology: we reimplement the comparison algorithm, introduce a tuning script, and improve performance using GPU computing acceleration. The experiments are conducted on various text domains, and bi-data is extracted from Wikipedia dumps.

Citation (APA)

Wołk, K., & Marasek, K. (2015). Tuned and GPU-accelerated parallel data mining from comparable corpora. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9302, pp. 32–40). Springer Verlag. https://doi.org/10.1007/978-3-319-24033-6_4
