We report experiments with multi-modal neural machine translation models that incorporate global visual features in different parts of the encoder and the decoder, using the VGG19 network to extract features for all images. We explore both different strategies for including global image features and how ensembling different models at inference time impacts translations. Our submissions ranked 3rd best for translating from English into French, always improving considerably over a neural machine translation baseline across all language pairs evaluated, e.g. an increase of 7.0-9.2 METEOR points.
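To make the described pipeline concrete, the sketch below shows one way to extract a single global VGG19 feature vector per image and project it into a translation model, here as the decoder's initial hidden state, which is one of the insertion points the abstract alludes to. This is a minimal illustration under assumptions not stated in the paper: the PyTorch/torchvision framework, the 512-unit hidden size, the tanh projection, and the class name ImageInitDecoderState are all hypothetical choices, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
from torchvision import models

# Pre-trained VGG19, truncated after the first fully-connected layer,
# yielding one 4096-dim global feature vector per image.
vgg19 = models.vgg19(pretrained=True)
feature_extractor = nn.Sequential(
    vgg19.features,                           # convolutional backbone
    nn.AdaptiveAvgPool2d((7, 7)),             # VGG19's standard pooling
    nn.Flatten(),                             # (N, 512, 7, 7) -> (N, 25088)
    *list(vgg19.classifier.children())[:2],   # first FC layer + ReLU -> 4096-dim
)
feature_extractor.eval()

class ImageInitDecoderState(nn.Module):
    """Hypothetical module: projects a global image feature into the
    decoder's initial hidden state (one possible insertion point for
    global visual features in an encoder-decoder NMT model)."""
    def __init__(self, img_dim: int = 4096, hidden_dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(img_dim, hidden_dim)

    def forward(self, img_feat: torch.Tensor) -> torch.Tensor:
        # tanh keeps the initial state in the usual range of RNN activations
        return torch.tanh(self.proj(img_feat))

# Usage: one preprocessed 224x224 RGB image -> initial decoder hidden state.
with torch.no_grad():
    image = torch.randn(1, 3, 224, 224)       # stand-in for a real image tensor
    global_feat = feature_extractor(image)    # shape: (1, 4096)
init_state = ImageInitDecoderState()(global_feat)  # shape: (1, 512)
```

Analogous projections could initialize the encoder instead, and ensembling at inference time would average the output distributions of several such models; both are variations on the same global-feature interface shown here.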
Calixto, I., Chowdhury, K. D., & Liu, Q. (2017). DCU system report on the WMT 2017 multi-modal machine translation task. In WMT 2017 - 2nd Conference on Machine Translation, Proceedings (pp. 440–444). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-4747