Statistical machine translation (SMT) models are known to benefit from the availability of a domain-specific bilingual lexicon. Such lexicons have traditionally consisted of multiword expressions, either extracted from parallel corpora or manually curated. We claim that "patterns", composed of words and higher-order categories, generalize better in capturing the syntax and semantics of the domain. In this work, we present an approach that extracts such patterns from a domain corpus and curates a high-quality bilingual lexicon. We discuss several features of these patterns that define the "consensus" among their underlying multiwords. We incorporate the bilingual lexicon into a baseline SMT model, and detailed experiments show that the resulting translation model performs considerably better than the baseline and other comparable systems.
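To make the idea of generalization concrete, here is a minimal sketch (not the authors' implementation) of how a pattern mixing literal words with higher-order categories, illustrated here with POS tags, can cover several multiword expressions at once. The `POS` lexicon and the `matches` function are hypothetical illustrations.

```python
# Hypothetical POS lexicon; a real system would use a tagger.
POS = {"book": "NOUN", "ticket": "NOUN", "reserve": "VERB",
       "a": "DET", "flight": "NOUN"}

def matches(pattern, multiword):
    """A multiword matches a pattern if, position by position, each
    pattern element is either the literal word itself or that word's
    higher-order category (here, its POS tag)."""
    tokens = multiword.split()
    if len(pattern) != len(tokens):
        return False
    return all(p == t or p == POS.get(t) for p, t in zip(pattern, tokens))

# One pattern stands in for a family of multiwords:
pattern = ["reserve", "a", "NOUN"]
print(matches(pattern, "reserve a flight"))   # True
print(matches(pattern, "reserve a ticket"))   # True
print(matches(pattern, "reserve a reserve"))  # False ("reserve" is a VERB here)
```

A single such pattern can thus replace many literal multiword entries, which is what allows a pattern-based lexicon to stay compact while covering the domain.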
Citation:
Singh, P., Kulkarni, A., Ojha, H., Kumar, V., & Ramakrishnan, G. (2016). Building compact lexicons for cross-domain SMT by mining near-optimal pattern sets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9651, pp. 290–303). Springer Verlag. https://doi.org/10.1007/978-3-319-31753-3_24