Transformers, the default model of choice in natural language processing, have drawn scant attention from the medical imaging community. Given their ability to exploit long-term dependencies, transformers are promising candidates to help convolutional neural networks (convnets) overcome the limitations imposed by their spatial inductive bias. However, most recently proposed transformer-based segmentation approaches simply treat transformers as assistive modules that encode global context into convolutional representations, without investigating how to optimally combine self-attention (i.e., the core of transformers) with convolution. To address this issue, in this paper we introduce nnFormer (i.e., not-another transFormer), a powerful segmentation model with an interleaved architecture based on an empirical combination of self-attention and convolution.
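To make the "interleaved" idea concrete, below is a minimal PyTorch sketch of a stage that alternates convolution (local structure) with multi-head self-attention (long-range dependencies) over 3D volumes. All module names and hyperparameters here are illustrative assumptions, not the paper's exact design, which uses a convolutional stem plus volumetric local and global self-attention; see the PDF linked below for the actual architecture.

```python
# Sketch of interleaving convolution with self-attention for volumetric
# inputs. Everything here (block names, depths, channel counts) is a
# simplified assumption, not nnFormer's exact architecture.
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """3D convolution block: captures local spatial structure."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.norm = nn.InstanceNorm3d(channels)
        self.act = nn.GELU()

    def forward(self, x):  # x: (B, C, D, H, W)
        return self.act(self.norm(self.conv(x)))


class SelfAttentionBlock(nn.Module):
    """Multi-head self-attention over flattened voxels: models the
    long-range dependencies that convolution alone misses."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x):  # x: (B, C, D, H, W)
        b, c, d, h, w = x.shape
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, D*H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)
        attended = attended.transpose(1, 2).reshape(b, c, d, h, w)
        return x + attended  # residual connection


class InterleavedStage(nn.Module):
    """One stage alternating convolution and self-attention blocks."""
    def __init__(self, channels, depth=2):
        super().__init__()
        self.blocks = nn.ModuleList()
        for _ in range(depth):
            self.blocks.append(ConvBlock(channels))
            self.blocks.append(SelfAttentionBlock(channels))

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x


if __name__ == "__main__":
    stage = InterleavedStage(channels=32)
    volume = torch.randn(1, 32, 8, 16, 16)  # toy (B, C, D, H, W) patch
    print(stage(volume).shape)              # torch.Size([1, 32, 8, 16, 16])
```

Note that attending over all voxels is quadratic in volume size; the paper's local (window-based) attention is one way to keep this tractable at realistic resolutions.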
Hong-Yu Zhou, Jiansen Guo, Yinghao Zhang, Lequan Yu, Liansheng Wang, Yizhou Yu (2021)
Ranked #1 on Medical Image Segmentation on ACDC
https://arxiv.org/pdf/2109.03201v3.pdf