A Statistical Machine Translation (SMT) system must be trained on a large parallel corpus to produce effective translations. Such corpora are scarce, and preparing them involves considerable manual labor and cost. A parallel corpus can instead be derived from comparable corpora, that is, a pair of corpora in two different languages covering the same domain. In the present work, we build a parallel corpus for the French-English language pair from a given comparable corpus. The data and the problem set were provided as part of the shared task organized by BUCC 2017. We propose a system that first translates the sentences, relying heavily on Moses, and then groups the sentences by sentence-length similarity. Finally, one-to-one sentence selection is performed using cosine similarity.
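The last two stages described above, length-based grouping followed by cosine-similarity selection, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the French sentences have already been translated into English (the Moses step is not shown), uses whitespace tokenization with raw term counts, and enforces one-to-one pairing greedily. The function names and the `max_diff` and `threshold` parameters are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words count vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def length_compatible(a, b, max_diff=2):
    """Assumed grouping rule: token counts differ by at most max_diff."""
    return abs(len(a.split()) - len(b.split())) <= max_diff


def select_pairs(translated, english, max_diff=2, threshold=0.5):
    """Return (translated_index, english_index, score) triples, one-to-one.

    Only length-compatible candidates are scored. Candidates are then
    accepted greedily in order of decreasing similarity, so each sentence
    on either side is used at most once.
    """
    candidates = []
    for i, t in enumerate(translated):
        for j, e in enumerate(english):
            if length_compatible(t, e, max_diff):
                score = cosine_similarity(t, e)
                if score >= threshold:
                    candidates.append((score, i, j))
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    used_i, used_j, pairs = set(), set(), []
    for score, i, j in candidates:
        if i not in used_i and j not in used_j:
            used_i.add(i)
            used_j.add(j)
            pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    # Hypothetical data: machine-translated French sentences and English candidates.
    translated = ["the cat sits on the mat", "he likes green apples"]
    english = [
        "she reads a long book every night",
        "he likes green apples very much",
        "the cat sat on the mat",
    ]
    for i, j, score in select_pairs(translated, english):
        print(i, j, round(score, 3))
```

Filtering by length first keeps the number of similarity computations down, since only sentences of comparable length are scored against each other. Both the length tolerance and the similarity threshold would need tuning on real data.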
CITATION STYLE
Mahata, S. K., Das, D., & Bandyopadhyay, S. (2017). BUCC2017: A hybrid approach for identifying parallel sentences in comparable corpora. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 56–59). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-2511