In this episode we discuss Egocentric Video Task Translation
by Zihui Xue, Yale Song, Kristen Grauman, and Lorenzo Torresani. The paper proposes a more unified approach to video understanding tasks, specifically in the context of wearable cameras. The authors argue that a person's egocentric perspective presents an interconnected web of tasks, such as object manipulation and navigation, which should be addressed jointly rather than in isolation. The proposed EgoTask Translation (EgoT2) model takes multiple task-specific models and learns to translate their outputs, improving performance on all tasks simultaneously. EgoT2 demonstrated superior results to existing transfer-learning paradigms on four benchmark challenges.