In semantic image segmentation, multi-scale contextual information is collected by probing the features with large dilated convolution filters or spatial pooling operations. Enlarging the receptive field in this way promotes a more stable and globally consistent segmentation prediction. Dilated convolution can be treated as the combination of a sampling process and an ordinary convolution. For example, a 3 × 3 convolution with a large dilation rate picks only 9 positions from a very large window. In this paper, we propose a more rational way to sample features from a very large receptive field. Specifically, Gaussian kernels are used to accumulate the features around each sampled position to produce a more stable representation. We also examine the difference between up-sampling the logits and down-sampling the ground truth, and provide a theoretical explanation. We evaluate Gaussian dilated convolution on the semantic image segmentation datasets Pascal VOC 2012, Cityscapes and ADE20k. Gaussian dilated convolution consistently outperforms dilated convolution throughout our experiments, which verifies the effectiveness of the method. Code will be released for reproduction.
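The view of dilated convolution as "sampling followed by an ordinary convolution" can be illustrated with a short NumPy sketch. It is not the authors' implementation: it assumes a single-channel feature map, zero padding, and that the Gaussian accumulation can be approximated by blurring the map before taking the dilated samples. The function names, the `sigma` parameter and the choice of kernel radius are all hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 2D Gaussian of size (2*radius+1) x (2*radius+1)."""
    ax = np.arange(-radius, radius + 1, dtype=np.float64)
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()

def correlate2d_same(x, k, dilation=1):
    """Zero-padded, same-size cross-correlation with a square odd-sized kernel.

    With dilation > 1 the kernel taps are spread `dilation` pixels apart,
    which is exactly the sparse sampling step of a dilated convolution.
    """
    r = (k.shape[0] // 2) * dilation
    xp = np.pad(x, r)  # constant (zero) padding on every side
    h, w = x.shape
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * xp[i * dilation:i * dilation + h,
                                j * dilation:j * dilation + w]
    return out

def dilated_conv(x, w, rate):
    """Plain dilated convolution: each tap reads one isolated position."""
    return correlate2d_same(x, w, dilation=rate)

def gaussian_dilated_conv(x, w, rate, sigma):
    """Assumed variant: each tap reads a Gaussian-weighted neighborhood.

    Blurring first and sampling afterwards is equivalent to accumulating
    features around every sampled position with a Gaussian kernel.
    """
    radius = max(1, int(round(2 * sigma)))  # illustrative truncation choice
    smoothed = correlate2d_same(x, gaussian_kernel(sigma, radius))
    return correlate2d_same(smoothed, w, dilation=rate)

# A single active feature one pixel away from the sampling grid.
x = np.zeros((33, 33))
x[17, 16] = 1.0
w = np.ones((3, 3))

plain = dilated_conv(x, w, rate=6)
gauss = gaussian_dilated_conv(x, w, rate=6, sigma=2.0)
print(plain[16, 16], gauss[16, 16])
```

In this toy case the plain dilated filter centered at (16, 16) misses the active feature entirely because it falls between the nine sampled positions, whereas the Gaussian-accumulated variant still receives a nonzero response from it.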
Shen, F., & Zeng, G. (2018). Gaussian dilated convolution for semantic image segmentation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11164 LNCS, pp. 324–334). Springer Verlag. https://doi.org/10.1007/978-3-030-00776-8_30