In this paper, we provide an in-depth analysis of how to tackle high-cardinality categorical features by encoding them with a quantile of the target variable. Our proposal outperforms state-of-the-art encoders, including the traditional statistical mean target encoder, in terms of Mean Absolute Error (MAE), especially in the presence of long-tailed or skewed distributions.
2021: Carlos Mougán, D. Masip, Jordi Nin, O. Pujol
https://arxiv.org/pdf/2105.13783v2.pdf
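To make the idea concrete, here is a minimal Python sketch of quantile target encoding. It assumes a regularized form in which each category's target quantile is shrunk toward the global target quantile with a smoothing weight `m`. The function names (`quantile`, `fit_quantile_encoder`, `transform`), the linear-interpolation quantile convention, and the default `m=1.0` are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def quantile(values, p):
    """Linear-interpolation quantile (same convention as numpy's default 'linear' method)."""
    if not values:
        raise ValueError("quantile of an empty sequence")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = int(pos)                      # floor, since pos >= 0
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def fit_quantile_encoder(categories, targets, p=0.5, m=1.0):
    """Map each category to its target p-quantile, shrunk toward the global p-quantile.

    Assumed regularization: (n_c * q_c + m * q_global) / (n_c + m),
    where n_c is the number of rows in category c.
    """
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")
    groups = defaultdict(list)
    for c, y in zip(categories, targets):
        groups[c].append(y)
    global_q = quantile(list(targets), p)
    mapping = {}
    for c, ys in groups.items():
        n = len(ys)
        mapping[c] = (n * quantile(ys, p) + m * global_q) / (n + m)
    return mapping, global_q


def transform(categories, mapping, default):
    """Encode categories; unseen categories fall back to `default` (e.g. the global quantile)."""
    return [mapping.get(c, default) for c in categories]


if __name__ == "__main__":
    cats = ["a", "a", "a", "b", "b", "c"]
    y = [1, 2, 100, 3, 4, 5]           # category "a" contains one large outlier
    mapping, global_q = fit_quantile_encoder(cats, y, p=0.5, m=1.0)
    print(global_q)                    # 3.5
    print(mapping)                     # {'a': 2.375, 'b': 3.5, 'c': 4.25}
    print(transform(["a", "d"], mapping, global_q))  # [2.375, 3.5]
```

With the median (`p=0.5`), category `a` is encoded as 2.375 even though the mean of its targets is about 34.3, which illustrates why a quantile-based encoding can be less sensitive to skewed or long-tailed targets than a mean target encoder. Other values of `p` can be chosen to target other quantiles.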