In this episode, we discuss "Query-Dependent Video Representation" by WonJun Moon, Sangeek Hyun, SangUk Park, Dongchan Park, and Jae-Pil Heo.
Affiliations:
- WonJun Moon, Sangeek Hyun, and Jae-Pil Heo: Sungkyunkwan University.
- SangUk Park and Dongchan Park: Pyler.

The paper presents Query-Dependent DETR (QD-DETR), a detection transformer tailored for video moment retrieval and highlight detection (MR/HD). The authors identify a key weakness of existing transformer-based models: they fail to fully exploit the information in the given query. To address this, QD-DETR introduces cross-attention layers that explicitly inject query context into the video representation. It also trains the model on negative (mismatched) video-query pairs to encourage precise alignment between each query and its video. QD-DETR outperforms state-of-the-art methods on several datasets.
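To make the two ideas above concrete, here is a minimal pure-Python sketch. It is not the authors' code. It assumes a single attention head with identity projections (no learned weights), a residual add, a toy relevance score, and a margin hinge over in-batch mismatched pairs. The names `cross_attend`, `pair_score`, and `negative_pair_loss` are hypothetical, and the paper's actual layers and loss differ in detail.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(xs: Vector) -> Vector:
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attend(video: Matrix, text: Matrix) -> Matrix:
    """Video clips act as attention queries; text tokens are keys and values.

    Assumption: identity projections and a single head; a real layer
    would apply learned W_q, W_k, W_v and multiple heads.
    """
    d = len(video[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for clip in video:
        weights = softmax([dot(clip, tok) * scale for tok in text])
        # Weighted sum of text tokens = query context for this clip.
        context = [sum(w * tok[j] for w, tok in zip(weights, text)) for j in range(d)]
        # Residual add injects the query context into the clip feature.
        fused.append([c + x for c, x in zip(clip, context)])
    return fused


def pair_score(video: Matrix, text: Matrix) -> float:
    """Hypothetical relevance score: mean similarity between the
    query-conditioned clips and the mean text token."""
    fused = cross_attend(video, text)
    d = len(text[0])
    text_mean = [sum(tok[j] for tok in text) / len(text) for j in range(d)]
    return sum(dot(clip, text_mean) for clip in fused) / len(fused)


def negative_pair_loss(videos: List[Matrix], queries: List[Matrix], margin: float = 0.2) -> float:
    """Hinge loss that pushes each matched pair's score above a mismatched
    pair's score, where video i is paired with query i+1 in the batch.

    Assumption: a margin-based stand-in, not the paper's exact objective.
    """
    b = len(videos)
    total = 0.0
    for i in range(b):
        pos = pair_score(videos[i], queries[i])
        neg = pair_score(videos[i], queries[(i + 1) % b])
        total += max(0.0, margin - (pos - neg))
    return total / b


# Usage: two one-clip videos, each with a clearly matching one-token query.
videos = [[[1.0, 0.0]], [[0.0, 1.0]]]
queries = [[[1.0, 0.0]], [[0.0, 1.0]]]
print(negative_pair_loss(videos, queries))  # 0.0
```

Pairing each video with another sample's query gives negatives without extra data. Because the residual keeps each clip's own feature while the added context depends on the query, a mismatched query produces a visibly lower score, which the hinge then enforces.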