Adaptive Text Denoising Network for Image Caption Editing

Abstract

Image caption editing, which aims to correct inaccurate descriptions of images, is an interdisciplinary task spanning computer vision and natural language processing. Because the task requires encoding an image together with its inaccurate caption and then decoding an accurate caption, the encoder-decoder framework is widely adopted. However, existing methods mostly focus on the decoder and ignore a major challenge on the encoder side: the semantic inconsistency between image and caption. To this end, we propose a novel Adaptive Text Denoising Network (ATD-Net) that filters out noise at the word level and improves the model's robustness at the sentence level. Specifically, at the word level, we design a cross-attention mechanism, the Textual Attention Mechanism (TAM), to distinguish the misdescriptive words. The TAM encodes the inaccurate caption word by word based on the content of both the image and the caption. At the sentence level, to minimize the influence of misdescriptive words on the semantics of the entire caption, we introduce a Bidirectional Encoder that extracts the correct semantic representation from the raw caption. The Bidirectional Encoder models the global semantics of the raw caption, which enhances the robustness of the framework. We extensively evaluate our proposals on the MS-COCO image captioning dataset and demonstrate the effectiveness of our method in comparison with state-of-the-art approaches.
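
The abstract does not specify how the TAM is built, so the following is only a generic illustration of word-level cross-attention, not the authors' implementation. It assumes caption words and image regions are already embedded in a shared space, omits the learned query/key/value projections a real model would have, and uses hypothetical names throughout (`cross_attend`, `word_support`). Each word attends over the image regions, and the cosine similarity between a word and its attended visual context serves as a rough, assumed proxy for how well the image supports that word.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross_attend(word_vecs, region_vecs):
    """Scaled dot-product cross-attention: words are queries, regions are keys and values.

    Assumption: no learned projections; embeddings are used directly.
    Returns (attended visual context per word, attention weights per word).
    """
    dim = len(region_vecs[0])
    scale = math.sqrt(dim)
    attended, weights = [], []
    for w in word_vecs:
        alpha = softmax([dot(w, r) / scale for r in region_vecs])
        context = [sum(a * r[i] for a, r in zip(alpha, region_vecs)) for i in range(dim)]
        weights.append(alpha)
        attended.append(context)
    return attended, weights


def word_support(word_vecs, region_vecs):
    """Hypothetical per-word score: cosine similarity between a word and its visual context."""
    attended, _ = cross_attend(word_vecs, region_vecs)
    scores = []
    for w, c in zip(word_vecs, attended):
        denom = math.sqrt(dot(w, w)) * math.sqrt(dot(c, c))
        scores.append(dot(w, c) / denom if denom else 0.0)
    return scores
```

In this toy setting, a word whose embedding aligns with some image region receives a higher support score than a word that matches no region. A trained model would learn the projections and the downstream use of such scores rather than fixing them by hand.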

Yuan, M., Bao, B. K., Tan, Z., & Xu, C. (2023). Adaptive Text Denoising Network for Image Caption Editing. ACM Transactions on Multimedia Computing, Communications and Applications, 19(1). https://doi.org/10.1145/3532627
