Iterative Few-shot Semantic Segmentation from Image Label Text

Abstract

Few-shot semantic segmentation aims to segment objects of unseen classes with the guidance of only a few support images. Most previous methods rely on pixel-level labels of the support images. In this paper, we focus on a more challenging setting in which only image-level labels are available. We propose a general framework that first generates coarse masks from image label text with the help of the powerful vision-language model CLIP, and then refines the mask predictions of support and query images iteratively and mutually. During the refinement, we design an Iterative Mutual Refinement (IMR) module to adapt to the varying quality of the coarse support mask. Extensive experiments on the PASCAL-5i and COCO-20i datasets demonstrate that our method not only outperforms state-of-the-art weakly supervised approaches by a significant margin, but also achieves results comparable to or better than those of recent supervised methods. Moreover, our method exhibits excellent generalization to in-the-wild images and uncommon classes. Code will be available at https://github.com/Whileherham/IMR-HSNet.
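The coarse-mask step described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: CLIP's image and text encoders are stood in for by random features of matching dimension, and the function name, shapes, and threshold are illustrative assumptions. The idea is that cosine similarity between the label-text embedding and per-patch image features yields a similarity map, which is normalized and thresholded into a coarse mask.

```python
import numpy as np

def coarse_mask_from_text(patch_feats, text_feat, threshold=0.5):
    """Sketch of text-driven coarse mask generation (hypothetical names/shapes).

    patch_feats: (H, W, D) per-patch image features (in the paper, these
                 would come from CLIP's image encoder).
    text_feat:   (D,) embedding of the class label text (in the paper,
                 from CLIP's text encoder).
    Returns a binary (H, W) coarse mask.
    """
    # L2-normalize both sides so the dot product is cosine similarity.
    p = patch_feats / np.linalg.norm(patch_feats, axis=-1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    sim = p @ t                                          # (H, W) similarity map
    # Min-max normalize to [0, 1], then threshold into a coarse mask.
    sim = (sim - sim.min()) / (sim.max() - sim.min() + 1e-8)
    return (sim > threshold).astype(np.float32)

rng = np.random.default_rng(0)
feats = rng.standard_normal((14, 14, 512))   # stand-in for CLIP patch features
text = rng.standard_normal(512)              # stand-in for a CLIP text embedding
mask = coarse_mask_from_text(feats, text)
print(mask.shape)  # (14, 14)
```

In the paper this coarse mask would then seed the IMR module, which alternately refines the support and query predictions.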

Cite

APA

Wang, H., Liu, L., Zhang, W., Zhang, J., Gan, Z., Wang, Y., … Wang, H. (2022). Iterative Few-shot Semantic Segmentation from Image Label Text. In IJCAI International Joint Conference on Artificial Intelligence (pp. 1385–1392). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2022/193
