In this episode we discuss "Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert" by Jiadong Wang, Xinyuan Qian, Malu Zhang, Robby T. Tan, and Haizhou Li. The paper addresses talking face generation, also known as speech-to-lip generation, which reconstructs lip-related facial motion from speech input. The authors propose using a lip-reading expert to improve the intelligibility of the generated lip regions by penalizing incorrect generation results. They also use contrastive learning to enhance lip-speech synchronization and a transformer for audio-video encoding. The authors report that the method outperforms other state-of-the-art methods in reading intelligibility and lip-speech synchronization.