Building compact lexicons for cross-domain SMT by mining near-optimal pattern sets


Abstract

Statistical machine translation (SMT) models are known to benefit from the availability of a domain-specific bilingual lexicon. Such lexicons traditionally consist of multiword expressions, either extracted from parallel corpora or manually curated. We claim that “patterns”, composed of words and higher-order categories, generalize better at capturing the syntax and semantics of the domain. In this work, we present an approach to extract such patterns from a domain corpus and curate a high-quality bilingual lexicon. We discuss several features of these patterns that define the “consensus” between their underlying multiwords. We incorporate the bilingual lexicon into a baseline SMT model, and detailed experiments show that the resulting translation model performs much better than the baseline and other similar systems.
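
The abstract does not spell out the mining procedure, so the following is only a rough illustration of the idea of a pattern generalizing over several multiwords. It assumes that a pattern is a multiword expression with one word replaced by a higher-order category, and it keeps patterns that cover at least two distinct multiwords. The category map, the medical-domain examples, `generalize`, `mine_patterns` and `min_support` are all hypothetical. Plain support is used as a stand-in for the paper's consensus features, which are not described here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MWE = Tuple[str, ...]

# Hypothetical word -> higher-order category map (e.g. POS tags or domain
# classes). The paper's actual category inventory is not given in the abstract.
CATEGORIES: Dict[str, str] = {
    "blood": "BODY_FLUID",
    "urine": "BODY_FLUID",
    "sugar": "ANALYTE",
    "protein": "ANALYTE",
    "test": "PROCEDURE",
}


def generalize(mwe: MWE, slot: int) -> MWE:
    """Replace the word at position `slot` with its category, if it has one."""
    category = CATEGORIES.get(mwe[slot])
    if category is None:
        return mwe
    return mwe[:slot] + (f"<{category}>",) + mwe[slot + 1:]


def mine_patterns(mwes: Iterable[MWE], min_support: int = 2) -> Dict[MWE, List[MWE]]:
    """Group multiwords under one-slot patterns and keep those with enough support.

    Support (the number of distinct multiwords a pattern covers) is a simple
    stand-in for the paper's consensus features, not a reproduction of them.
    """
    covered: Dict[MWE, set] = defaultdict(set)
    for mwe in mwes:
        for slot in range(len(mwe)):
            pattern = generalize(mwe, slot)
            if pattern != mwe:  # only count slots that actually generalized
                covered[pattern].add(mwe)
    return {p: sorted(ms) for p, ms in covered.items() if len(ms) >= min_support}


if __name__ == "__main__":
    corpus_mwes = [
        ("blood", "sugar", "test"),
        ("urine", "sugar", "test"),
        ("blood", "protein", "test"),
    ]
    for pattern, members in mine_patterns(corpus_mwes).items():
        print(" ".join(pattern), "->", [" ".join(m) for m in members])
    # <BODY_FLUID> sugar test -> ['blood sugar test', 'urine sugar test']
    # blood <ANALYTE> test -> ['blood protein test', 'blood sugar test']
```

Two patterns survive because each covers two of the three multiwords, so two lexicon entries stand in for four multiword entries. A bilingual lexicon would additionally need a target-side translation for each pattern, which this sketch omits.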

Citation

Singh, P., Kulkarni, A., Ojha, H., Kumar, V., & Ramakrishnan, G. (2016). Building compact lexicons for cross-domain SMT by mining near-optimal pattern sets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9651, pp. 290–303). Springer Verlag. https://doi.org/10.1007/978-3-319-31753-3_24