In this episode we discuss "A Bag-of-Prototypes Representation for Dataset-Level Applications" by Weijie Tu, Weijian Deng, Tom Gedeon, and Liang Zheng (Australian National University and Curtin University). The paper proposes a bag-of-prototypes (BoP) dataset representation for measuring the relationship between datasets in two dataset-level tasks: assessing training set suitability and assessing test set difficulty. BoP starts from a codebook of K prototypes obtained by clustering a reference dataset. Each dataset to be encoded is then summarized as a K-dimensional histogram over those prototypes. Because it does not require dataset labels, the BoP representation can characterize a dataset's semantic distribution in detail, and it pairs well with Jensen-Shannon divergence for measuring dataset-to-dataset similarity. The authors report that BoP outperforms existing representations on multiple benchmarks.
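For listeners who want to see the idea in code, here is a minimal sketch of the pipeline described above: build a codebook, encode each dataset as a histogram, and compare histograms with Jensen-Shannon divergence. It is an illustration under stated assumptions, not the authors' implementation. It assumes each image has already been turned into a feature vector by some encoder, uses plain Lloyd's k-means for the clustering step, and all function names (`fit_codebook`, `bop_histogram`, `js_divergence`) are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, prototypes):
    """Index of the prototype closest to `point`."""
    return min(range(len(prototypes)), key=lambda k: sq_dist(point, prototypes[k]))


def fit_codebook(reference, k, iters=20, seed=0):
    """Cluster reference features into k prototypes.

    Assumption: plain Lloyd's k-means with random initialization.
    """
    rng = random.Random(seed)
    prototypes = [list(p) for p in rng.sample(reference, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in reference:
            clusters[nearest(p, prototypes)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old prototype if a cluster goes empty
                prototypes[j] = [sum(col) / len(members) for col in zip(*members)]
    return prototypes


def bop_histogram(features, prototypes):
    """Encode a dataset as a normalized K-bin histogram; no labels needed."""
    counts = [0] * len(prototypes)
    for p in features:
        counts[nearest(p, prototypes)] += 1
    return [c / len(features) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the value lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy usage: 2-D "features" forming two groups, standing in for real embeddings.
reference = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
codebook = fit_codebook(reference, k=2)
hist_a = bop_histogram(reference, codebook)                   # covers both groups
hist_b = bop_histogram([(5.0, 5.0), (5.05, 5.05)], codebook)  # covers one group
print(js_divergence(hist_a, hist_b))
```

A divergence of zero means the two datasets fill the prototype bins in identical proportions, and larger values mean their distributions over prototypes differ more.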