In this episode we discuss "Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos".
Authors:
- Kun Su*
- Kaizhi Qian
- Eli Shlizerman
- Antonio Torralba
- Chuang Gan
Affiliations:
- Kun Su: University of Washington
- Kaizhi Qian: MIT-IBM Watson AI Lab
- Eli Shlizerman: University of Washington
- Antonio Torralba: MIT
- Chuang Gan: MIT-IBM Watson AI Lab, UMass Amherst

The paper proposes a physics-driven diffusion model that synthesizes high-fidelity impact sounds for silent video clips. The model incorporates physics priors: physics parameters estimated from noisy real-world impact sound examples, plus residual parameters, learned via neural networks, that account for the sound environment. The diffusion model combines these physics priors with visual information to synthesize impact sounds. Experiments show that the proposed model outperforms existing systems in generating realistic impact sounds while remaining interpretable and transparent for sound editing.
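To make the conditioning idea concrete, here is a minimal toy sketch (not the authors' code) of a single reverse-diffusion step where the noise predictor is conditioned on estimated physics parameters concatenated with visual features. All shapes, parameter values, and the stand-in linear "denoiser" are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_predictor(x_t, cond, W):
    """Stand-in for the learned denoiser: predicts noise from the noisy
    'spectrogram' x_t and the conditioning vector (physics + visual)."""
    # Toy conditioning: the conditioning vector enters as a scalar bias.
    return np.tanh(x_t @ W + cond.mean())

def reverse_step(x_t, t, betas, cond, W):
    """One DDPM-style reverse step: sample x_{t-1} from x_t."""
    beta = betas[t]
    alpha = 1.0 - beta
    alpha_bar = np.prod(1.0 - betas[: t + 1])
    eps_hat = noise_predictor(x_t, cond, W)
    mean = (x_t - beta / np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha)
    noise = rng.standard_normal(x_t.shape) if t > 0 else 0.0
    return mean + np.sqrt(beta) * noise

# Conditioning: hypothetical physics parameters (e.g. modal frequencies and
# dampings estimated from real impact sounds) concatenated with a visual
# feature vector extracted from the silent video clip.
physics_params = np.array([440.0, 0.01, 880.0, 0.02])  # illustrative values
visual_feats = rng.standard_normal(8)
cond = np.concatenate([physics_params, visual_feats])

T = 10
betas = np.linspace(1e-4, 0.02, T)
W = rng.standard_normal((16, 16)) * 0.1
x = rng.standard_normal((4, 16))  # toy "spectrogram" being denoised
for t in reversed(range(T)):
    x = reverse_step(x, t, betas, cond, W)
print(x.shape)
```

In the actual system the denoiser would be a trained neural network and the sample a full audio spectrogram; the sketch only shows where the physics priors and visual features plug into the reverse process.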