In this episode we discuss Masked Image Modeling with Local Multi-Scale Reconstruction
by Haoqing Wang, Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhi-Hong Deng, Kai Han. The paper proposes a new self-supervised representation learning approach called Masked Image Modeling (MIM) that achieves outstanding success, but with a huge computational burden and slow learning process. To address this, the paper proposes a design that applies MIM to multiple local layers, including lower and upper layers, to explicitly guide them. The approach also facilitates multi-scale semantic understanding by reconstructing fine and coarse-scale supervision signals. This approach achieves comparable or better performance on classification, detection, and segmentation tasks than existing MIM models, with significantly less pre-training burden. Code is available with both MindSpore and PyTorch.
view more