In this episode, we discuss "Learning Customized Visual Models with Retrieval-Augmented Knowledge" by Haotian Liu, Kilho Son, Jianwei Yang, Ce Liu, Jianfeng Gao, Yong Jae Lee, and Chunyuan Li. The paper proposes REACT (REtrieval-Augmented CusTomization), a framework for building visual models customized to specific target domains. Rather than running another round of expensive pre-training, REACT retrieves relevant image-text pairs from a web-scale database to serve as external knowledge, then trains only newly added modularized blocks while keeping all of the original model's weights frozen. The framework proves effective across a range of tasks, including zero-shot classification, where it improves over CLIP by up to 5.4% on ImageNet and 3.7% on the ELEVATER benchmark.
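The description names two mechanisms: retrieving relevant image-text pairs as external knowledge, and training only new blocks on top of frozen weights. The sketch below illustrates both in plain Python under loudly stated assumptions; it is not the authors' code. It assumes embeddings have already been computed by some encoder (CLIP-like), uses a tiny hypothetical in-memory list instead of a web-scale index, and models a "modularized block" as a residual block with a zero-initialized tanh gate, a common way to add trainable capacity without disturbing a frozen network. Whether REACT's blocks take exactly this form is an assumption here. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query: Vector, database: List[Tuple[str, Vector]], k: int) -> List[str]:
    """Return captions of the k entries most similar to the query embedding.

    Hypothetical stand-in for a web-scale nearest-neighbour index: each entry
    is a (caption, precomputed embedding) pair, searched by brute force.
    """
    ranked = sorted(database, key=lambda item: cosine(query, item[1]), reverse=True)
    return [caption for caption, _ in ranked[:k]]


def gated_block(x: List[float], weights: List[List[float]], alpha: float) -> List[float]:
    """Hypothetical gated residual block applied to a frozen layer's output x.

    Computes x + tanh(alpha) * (weights @ x). Only `weights` and `alpha` would
    be trained; the frozen layer producing x is never modified. Because
    tanh(0) == 0, a block initialized with alpha = 0 is an exact identity, so
    training starts from the original model's behaviour.
    """
    fx = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    gate = math.tanh(alpha)
    return [xi + gate * fi for xi, fi in zip(x, fx)]


# Toy usage with made-up 3-d embeddings (purely illustrative data).
db = [
    ("a photo of a golden retriever", [0.9, 0.1, 0.0]),
    ("a red sports car", [0.0, 0.2, 0.9]),
    ("a puppy playing fetch", [0.8, 0.3, 0.1]),
]
print(retrieve_top_k([1.0, 0.2, 0.0], db, k=2))
# → ['a photo of a golden retriever', 'a puppy playing fetch']
print(gated_block([1.0, 2.0], [[0.5, 0.0], [0.0, 0.5]], alpha=0.0))
# → [1.0, 2.0]  (zero gate: frozen output passes through unchanged)
```

The zero-initialized gate is the design choice that makes "freeze the original weights, train only new blocks" safe: before any training the customized model reproduces the original one exactly, and the gate opens only as far as the retrieved data justifies.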