In this paper, we propose a Multi-task Learning Approach for Image Captioning (MLAIC), motivated by the fact that humans have no difficulty performing this task because they draw on capabilities from multiple domains. Specifically, MLAIC consists of three key components: (i) a multi-object classification model that learns rich category-aware image representations using a CNN image encoder; (ii) a syntax generation model that learns a better syntax-aware LSTM-based decoder; and (iii) an image captioning model that generates textual image descriptions, sharing its CNN encoder with the object classification task and its LSTM decoder with the syntax generation task. The image captioning model can thus benefit from the additional object categorization and syntax knowledge. Experimental results on the MS-COCO dataset demonstrate that our model achieves impressive results compared with other strong competitors.
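The parameter sharing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny dense layers stand in for the CNN encoder and the LSTM decoder, the classification head uses a single-label softmax rather than a multi-object formulation, and the names `Encoder`, `Decoder`, `Head`, `text_state`, and the task weights are all hypothetical choices made for illustration. The abstract does not state how the three objectives are combined; a weighted sum is assumed here.

```python
import math
import random


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init(rows, cols, rng):
    """Small random weight matrix with `rows` rows and `cols` columns."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class Encoder:
    """Toy stand-in for the shared CNN image encoder (assumption)."""
    def __init__(self, in_dim, hid_dim, rng):
        self.w = init(hid_dim, in_dim, rng)

    def __call__(self, image):
        return [math.tanh(v) for v in linear(self.w, image)]


class Decoder:
    """Toy stand-in for the shared LSTM decoder: one non-recurrent step."""
    def __init__(self, hid_dim, rng):
        self.w = init(hid_dim, hid_dim, rng)

    def __call__(self, state):
        return [math.tanh(v) for v in linear(self.w, state)]


class Head:
    """Task-specific output layer producing a probability distribution."""
    def __init__(self, hid_dim, n_out, rng):
        self.w = init(n_out, hid_dim, rng)

    def __call__(self, features):
        return softmax(linear(self.w, features))


def nll(probs, target):
    """Negative log-likelihood of the target index."""
    return -math.log(probs[target])


rng = random.Random(0)
IMG_DIM, HID, N_CLASSES, N_TAGS, VOCAB = 8, 4, 3, 5, 6  # illustrative sizes

encoder = Encoder(IMG_DIM, HID, rng)    # shared by classification and captioning
decoder = Decoder(HID, rng)             # shared by syntax generation and captioning
class_head = Head(HID, N_CLASSES, rng)
syntax_head = Head(HID, N_TAGS, rng)
word_head = Head(HID, VOCAB, rng)

image = [rng.random() for _ in range(IMG_DIM)]
text_state = [rng.random() for _ in range(HID)]  # hypothetical input for the syntax task

features = encoder(image)
class_probs = class_head(features)             # (i) object classification
tag_probs = syntax_head(decoder(text_state))   # (ii) syntax generation
word_probs = word_head(decoder(features))      # (iii) captioning uses both shared parts

# Assumed task weights and arbitrary target indices, for illustration only.
weights = {"caption": 1.0, "classify": 0.5, "syntax": 0.5}
losses = {
    "caption": nll(word_probs, 2),
    "classify": nll(class_probs, 1),
    "syntax": nll(tag_probs, 3),
}
joint_loss = sum(weights[k] * losses[k] for k in losses)
print(f"joint loss: {joint_loss:.4f}")
```

Because the captioning path calls the same `encoder` and `decoder` objects as the two auxiliary tasks, any gradient step on the joint loss would update the shared weights from all three objectives, which is the mechanism by which the captioner could pick up category and syntax knowledge.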
Zhao, W., Wang, B., Ye, J., Yang, M., Zhao, Z., Luo, R., & Qiao, Y. (2018). A multi-task learning approach for image captioning. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2018-July, pp. 1205–1211). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2018/168