BUCC2017: A hybrid approach for identifying parallel sentences in comparable corpora

Abstract

A Statistical Machine Translation (SMT) system is typically trained on a large parallel corpus to produce effective translations. Such corpora are scarce, and building them requires considerable manual labor and cost. A parallel corpus can instead be prepared from comparable corpora, where a pair of corpora in two different languages covers the same domain. In the present work, we build a parallel corpus for the French-English language pair from a given comparable corpus. The data and the problem set were provided as part of the shared task organized by BUCC 2017. We propose a system that first translates the sentences, relying heavily on Moses, and then groups them by sentence-length similarity. Finally, one-to-one sentence selection is performed using cosine similarity.
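The last two stages of the pipeline described in the abstract (length-based grouping followed by cosine-similarity selection) can be illustrated with a minimal sketch. It assumes the French sentences have already been translated into English (for example with Moses) and are available as plain strings. The abstract does not give the tokenization, vector representation, length criterion, or thresholds, so the whitespace tokenizer, bag-of-words counts, `max_ratio`, `threshold`, and all function names below are illustrative assumptions rather than the authors' settings.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercase whitespace tokenization; the paper may differ.
    return sentence.lower().split()


def similar_length(a, b, max_ratio=1.5):
    # Keep a candidate pair only if the token counts are close.
    # max_ratio is a hypothetical cutoff, not a value from the paper.
    la, lb = len(tokenize(a)), len(tokenize(b))
    if la == 0 or lb == 0:
        return False
    return max(la, lb) / min(la, lb) <= max_ratio


def cosine_similarity(a, b):
    # Cosine of the angle between bag-of-words count vectors.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_pairs(translated, candidates, max_ratio=1.5, threshold=0.5):
    # For each translated source sentence, pick the single best-scoring
    # candidate among those of similar length, if it clears the threshold.
    pairs = []
    for i, src in enumerate(translated):
        best_j, best_score = None, -1.0
        for j, tgt in enumerate(candidates):
            if not similar_length(src, tgt, max_ratio):
                continue
            score = cosine_similarity(src, tgt)
            if score >= threshold and score > best_score:
                best_j, best_score = j, score
        if best_j is not None:
            pairs.append((i, best_j, best_score))
    return pairs


if __name__ == "__main__":
    translated = ["the cat sits on the mat", "stock prices fell sharply today"]
    candidates = ["a dog barked", "the cat sat on the mat"]
    for i, j, score in select_pairs(translated, candidates):
        print(f"{translated[i]!r} <-> {candidates[j]!r} (cosine {score:.3f})")
```

The length filter runs first so that the more expensive similarity computation is only applied to plausible candidates. This sketch does not enforce that a target sentence is matched at most once; a stricter one-to-one selection would need an additional step.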

Cite

Mahata, S. K., Das, D., & Bandyopadhyay, S. (2017). BUCC2017: A hybrid approach for identifying parallel sentences in comparable corpora. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 56–59). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-2511
