High-Resolution Swin Transformer for Automatic Medical Image Segmentation

27 citations · 35 Mendeley readers

Abstract

The resolution of feature maps is a critical factor for accurate medical image segmentation. Most existing Transformer-based networks for medical image segmentation adopt a U-Net-like architecture, in which an encoder converts the high-resolution input image into low-resolution feature maps through a sequence of Transformer blocks and a decoder gradually recovers high-resolution representations from those low-resolution feature maps. However, recovering high-resolution representations from low-resolution ones may harm the spatial precision of the generated segmentation masks. Unlike previous studies, in this study we adopted the high-resolution network (HRNet) design, replacing its convolutional layers with Transformer blocks and continuously exchanging information among the feature maps of different resolutions produced by those blocks. The proposed Transformer-based network is named the high-resolution Swin Transformer network (HRSTNet). Extensive experiments demonstrated that HRSTNet achieves performance comparable with that of state-of-the-art Transformer-based U-Net-like architectures on the 2021 Brain Tumor Segmentation dataset, the Medical Segmentation Decathlon's liver dataset, and the BTCV multi-organ segmentation dataset.
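The abstract does not include an implementation, but the core HRNet-style idea it describes, keeping parallel branches at different resolutions and repeatedly exchanging information between them, can be sketched minimally. The snippet below is a simplified illustration using NumPy with nearest-neighbor resampling; the function name `exchange` and the resampling choices are assumptions for illustration, not the paper's actual learned fusion layers or Swin Transformer blocks.

```python
import numpy as np

def exchange(high, low):
    """One HRNet-style exchange step between two parallel branches.

    high: (2H, 2W) high-resolution feature map
    low:  (H, W)   low-resolution feature map

    Each branch receives a resampled copy of the other and fuses it
    by addition (the paper would use learned layers instead).
    """
    # Downsample the high-resolution map by striding (stand-in for strided conv).
    down = high[::2, ::2]
    # Upsample the low-resolution map by nearest-neighbor repetition.
    up = np.repeat(np.repeat(low, 2, axis=0), 2, axis=1)
    # Fuse: each branch keeps its own resolution but absorbs the other's information.
    return high + up, low + down
```

In the actual network, Transformer blocks would process each branch between exchange steps, and the resampling and fusion would be learned; this sketch only shows the multi-resolution information flow that distinguishes the design from a U-Net-style encoder-decoder.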

Citation (APA)

Wei, C., Ren, S., Guo, K., Hu, H., & Liang, J. (2023). High-Resolution Swin Transformer for Automatic Medical Image Segmentation. Sensors, 23(7). https://doi.org/10.3390/s23073420
