We study the task of image inpainting, where an image with missing region is recovered with plausible context. Recent approaches based on deep neural networks have exhibited potential for producing elegant detail and are able to take advantage of background information, which gives texture information about missing region in the image. These methods often perform pixel/patch level replacement on the deep feature maps of missing region and therefore enable the generated content to have similar texture as background region. However, this kind of replacement is a local strategy and often performs poorly when the background information is misleading. To this end, in this study, we propose to use a multi-scale image contextual attention learning (MUSICAL) strategy that helps to flexibly handle richer background information while avoid to misuse of it. However, such strategy may not promising in generating context of reasonable style. To address this issue, both of the style loss and the perceptual loss are introduced into the proposed method to achieve the style consistency of the generated image. Furthermore, we have also noticed that replacing some of the down sampling layers in the baseline network with the stride 1 dilated convolution layers is beneficial for producing sharper and fine-detailed results. Experiments on the Paris Street View, Places, and CelebA datasets indicate the superior performance of our approach compares to the state-of-the-arts.
CITATION STYLE
Wang, N., Li, J., Zhang, L., & Du, B. (2019). Musical: Multi-scale image contextual attention learning for inpainting. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2019-August, pp. 3748–3754). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2019/520
Mendeley helps you to discover research relevant for your work.