Unsupervised construction of quasi-comparable corpora and probing for parallel textual data

Abstract

The multilingual nature of the world makes translation a crucial requirement today. Parallel dictionaries constructed by humans are a widely available resource, but they are limited and do not provide enough coverage for good-quality translation because of out-of-vocabulary words and neologisms. This motivates the use of statistical translation systems, which unfortunately depend on the quantity and quality of training data. Such systems have very limited availability, especially for some languages and very narrow text domains. In this research we present our improvements to current quasi-comparable corpora mining methodologies: re-implementing the comparison algorithms, introducing a tuning script, and improving performance using GPU acceleration. The experiments are conducted on the lecture text domain, and bilingual data is extracted from a web crawl. The modifications had a positive impact on the quality and quantity of the mined data and on translation quality, which we evaluated using the BLEU, NIST and TER metrics. By defining proper translation parameters for morphologically rich languages, we improve translation quality and draw conclusions.

Cite

Wołk, K., & Marasek, K. (2017). Unsupervised construction of quasi-comparable corpora and probing for parallel textual data. In Advances in Intelligent Systems and Computing (Vol. 506, pp. 307–320). Springer Verlag. https://doi.org/10.1007/978-3-319-43982-2_27
