We present a French-to-English translation system for Wikipedia biography articles. We use training data from out-of-domain corpora and adapt the system for biographies. We propose two forms of domain adaptation. The first biases the system towards words likely in biographies and encourages repetition of words across the document. Since biographies in Wikipedia follow a regular structure, our second model exploits this structure as a sequence of topic segments, where each segment discusses a narrower subtopic of the biography domain. In this structured model, the system is encouraged to use words likely in the current segment's topic rather than in biographies as a whole. We implement both systems using cache-based translation techniques. We show that, using our models, a system trained on Europarl and news can be adapted for biographies with an improvement of 0.5 BLEU points. Further, the structure-aware model outperforms the system that treats the entire document as a single segment. © 2014 Association for Computational Linguistics.
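The two cache variants described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class and function names, the exponential decay, the topic labels, and all word weights are hypothetical, chosen only to show how a document-level cache rewards repetition and how a per-segment word list replaces a single biography-wide one.

```python
class DecayingCache:
    """Hypothetical dynamic cache: words used recently in the document score higher."""

    def __init__(self, decay=0.5):
        self.decay = decay   # assumed decay rate per translated sentence
        self.age = {}        # word -> sentences elapsed since its last use

    def update(self, sentence_words):
        # Age every cached word by one sentence, then refresh the words just used.
        for word in self.age:
            self.age[word] += 1
        for word in sentence_words:
            self.age[word] = 0

    def score(self, word):
        # Unseen words get no bonus; seen words decay exponentially with age.
        if word not in self.age:
            return 0.0
        return self.decay ** self.age[word]


def cache_feature(word, dynamic_cache, static_weights):
    """Sum of the static (domain or topic) bonus and the dynamic repetition bonus."""
    return static_weights.get(word, 0.0) + dynamic_cache.score(word)


def pick_translation(candidates, dynamic_cache, static_weights):
    """Choose the candidate with the best base score plus cache bonus."""
    return max(
        candidates,
        key=lambda w: candidates[w] + cache_feature(w, dynamic_cache, static_weights),
    )


# Unstructured model: one word list for biographies as a whole (made-up weights).
biography_weights = {"born": 1.0, "elected": 1.0}

# Structured model: one word list per topic segment (made-up topics and weights).
topic_weights = {
    "early_life": {"born": 1.0},
    "career": {"elected": 1.0},
}

cache = DecayingCache(decay=0.5)
cache.update(["born", "in"])
cache.update(["career"])

# Candidate target words with hypothetical base model scores.
candidates = {"elected": 0.4, "chosen": 0.5}
best_in_career = pick_translation(candidates, cache, topic_weights["career"])
best_in_early_life = pick_translation(candidates, cache, topic_weights["early_life"])
```

In this toy setup the same candidates are ranked differently depending on which segment's word list is active, which is the effect the structured model relies on. A real system would integrate such scores as features in the decoder rather than rescoring single words.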
CITATION STYLE
Louis, A., & Webber, B. (2014). Structured and unstructured cache models for SMT domain adaptation. In 14th Conference of the European Chapter of the Association for Computational Linguistics 2014, EACL 2014 (pp. 155–163). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/e14-1017