In this episode we discuss "Visibility Aware Human-Object Interaction Tracking from Single RGB Camera" by Xianghui Xie, Bharat Lal Bhatnagar, and Gerard Pons-Moll. The paper proposes a method that tracks the 3D human, the 3D object, their contacts, and their relative translation across frames from a single RGB camera, while remaining robust to heavy occlusions. The authors improve on previous methods, which assumed a fixed object depth and suffered significant performance drops when the object was occluded. The proposed method combines a neural field reconstruction conditioned on per-frame SMPL body estimates with a transformer-based network that leverages neighboring frames to make predictions for occluded frames, achieving significantly better performance than prior state-of-the-art methods.
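The core idea of borrowing information from neighboring frames to fill in occluded ones can be illustrated with a toy sketch. The snippet below is not the paper's learned transformer; it is a minimal, hand-written temporal attention in NumPy, where each occluded frame's feature is a softmax-weighted average of visible frames' features, with weights decaying with temporal distance. All names (`infill_occluded`, `temperature`) are illustrative assumptions.

```python
import numpy as np

def infill_occluded(features, visible, temperature=1.0):
    """Toy temporal attention for occlusion infilling.

    features: (T, D) array of per-frame feature vectors.
    visible:  (T,) boolean array, True where the object is visible.

    Each occluded frame receives a softmax-weighted average of the
    visible frames' features; the attention logit is the negative
    temporal distance, so closer visible frames contribute more.
    (Illustrative only -- the paper uses a learned transformer,
    not this fixed distance kernel.)
    """
    out = features.copy()
    vis_idx = np.where(visible)[0]
    for t in range(len(features)):
        if visible[t]:
            continue  # visible frames are kept as-is
        logits = -np.abs(vis_idx - t) / temperature
        w = np.exp(logits - logits.max())
        w /= w.sum()
        out[t] = (w[:, None] * features[vis_idx]).sum(axis=0)
    return out

# Example: frame 1 is occluded between two visible frames,
# so it gets the equally weighted average of its neighbors.
feats = np.array([[1.0, 0.0], [0.0, 0.0], [3.0, 0.0]])
vis = np.array([True, False, True])
print(infill_occluded(feats, vis)[1])  # -> [2. 0.]
```

In the actual method the attention weights are learned end-to-end rather than fixed by temporal distance, which lets the network also exploit motion cues and human-object contact context when predicting the occluded frames.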