arXiv preprint - A Generative Approach for Wikipedia-Scale Visual Entity Recognition
In this episode, we discuss "A Generative Approach for Wikipedia-Scale Visual Entity Recognition" by Mathilde Caron, Ahmet Iscen, Alireza Fathi, and Cordelia Schmid. The paper introduces Generative Entity Recognition (GER), a framework for visual entity recognition, the task of associating an image with its corresponding Wikipedia entity. GER is presented as an alternative to the typical dual-encoder and captioning approaches: it decodes from the image a unique "code" linked to an entity, and that code identifies the entity. The authors' experiments show that GER outperforms existing methods on the OVEN benchmark, advancing web-scale image-based entity recognition.
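To make the idea of decoding an entity "code" more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. It assumes, purely for illustration, that each entity has a fixed-length code of integer tokens and that decoding is restricted to prefixes of known codes. The entity table, `build_trie`, `decode_entity`, and `toy_scores` (a stand-in for an image-conditioned model) are all hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Code = Tuple[int, ...]

# Hypothetical entity-to-code table; the codes are made up for illustration.
ENTITY_CODES: Dict[str, Code] = {
    "Eiffel Tower": (3, 1, 4),
    "Tokyo Tower": (3, 1, 5),
    "Golden Gate Bridge": (2, 7, 1),
}


def build_trie(codes: Dict[str, Code]) -> dict:
    """Build a prefix tree over all codes so decoding can stay on valid paths."""
    root: dict = {}
    for code in codes.values():
        node = root
        for token in code:
            node = node.setdefault(token, {})
    return root


def decode_entity(
    score_fn: Callable[[List[int]], Dict[int, float]],
    codes: Dict[str, Code],
) -> str:
    """Greedily decode one token at a time, allowing only tokens that extend
    the current prefix toward some known code (assumes equal-length codes)."""
    code_to_entity = {code: name for name, code in codes.items()}
    node = build_trie(codes)
    prefix: List[int] = []
    while node:  # an empty dict means a complete code has been reached
        scores = score_fn(prefix)
        best = max(node, key=lambda t: scores.get(t, float("-inf")))
        prefix.append(best)
        node = node[best]
    return code_to_entity[tuple(prefix)]


def toy_scores(prefix: List[int]) -> Dict[int, float]:
    """Stand-in for a model that scores the next token given an image;
    here the scores depend only on the decoding step."""
    table = {
        0: {3: 0.9, 2: 0.1},
        1: {1: 0.8, 7: 0.2},
        2: {4: 0.3, 5: 0.7, 1: 0.6},
    }
    return table[len(prefix)]


if __name__ == "__main__":
    print(decode_entity(toy_scores, ENTITY_CODES))
```

In this toy setup the decoder can only ever output a code that belongs to a known entity, so the result always maps back to an entry in the table. How codes are actually built and decoded in the paper is not covered by this description.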