In this episode we discuss ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection by Jeeseung Park, Jin-Woo Park, and Jong-Seok Lee. The paper proposes a new method for improving human-object interaction (HOI) detectors, which are used in scene understanding. The method, ViPLO, combines a novel Vision Transformer based feature extraction method with a pose-conditioned self-loop graph structure that updates the human node encoding with local features of human joints. It achieves state-of-the-art results on two public benchmarks, with a significant performance gain on the HICO-DET dataset. The source code is publicly available.