Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
Paper: arXiv:2411.16331
🍴 Fork notice: this is a downstream fork of smthemex/ComfyUI_Sonic. The default branch, fix/torchaudio-soundfile, replaces torchaudio.save with soundfile.write to sidestep the torchcodec ABI mismatch on torch 2.11+cu130. See IMPROVEMENTS.md for the full list of downstream changes and rationale.
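As a rough illustration of the kind of swap the fork notice describes, the sketch below writes a waveform with soundfile instead of torchaudio. It is not the fork's actual code: the helper names to_soundfile_layout and save_wav are hypothetical, and it assumes the waveform arrives in the (channels, samples) layout that torchaudio.save takes by default, while soundfile.write expects (samples, channels).

```python
import numpy as np


def to_soundfile_layout(waveform):
    """Convert a (channels, samples) waveform to (samples, channels).

    Hypothetical helper for illustration; accepts a torch tensor or any
    array-like object.
    """
    # Torch tensors expose detach()/cpu(); plain arrays skip this branch.
    if hasattr(waveform, "detach"):
        waveform = waveform.detach().cpu().numpy()
    data = np.asarray(waveform, dtype=np.float32)
    if data.ndim == 1:
        return data  # mono audio is already 1-D, nothing to transpose
    if data.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D audio, got {data.ndim}-D")
    # soundfile wants frames along the first axis, channels along the second.
    return np.ascontiguousarray(data.T)


def save_wav(path, waveform, sample_rate):
    """Write audio with soundfile.write in place of torchaudio.save."""
    # Imported lazily so the layout helper works even without soundfile.
    import soundfile as sf

    sf.write(path, to_soundfile_layout(waveform), sample_rate)
```

A call such as save_wav("out.wav", waveform, 16000) would then stand in for torchaudio.save("out.wav", waveform, 16000), without touching torchcodec.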
Sonic is a method for audio-driven portrait animation, introduced in the paper 'Sonic: Shifting Focus to Global Audio Perception in Portrait Animation'. You can use it in ComfyUI.
git clone https://github.com/smthemex/ComfyUI_Sonic.git
pip install -r requirements.txt
-- ComfyUI/models/sonic/
    |-- audio2bucket.pth
    |-- audio2token.pth
    |-- unet.pth
    |-- yoloface_v5m.pt
    |-- whisper-tiny/
        |-- config.json
        |-- model.safetensors
        |-- preprocessor_config.json
    |-- RIFE/
        |-- flownet.pkl
-- ComfyUI/models/checkpoints/
    |-- svd_xt.safetensors or svd_xt_1_1.safetensors
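To confirm that the model files are in place before loading a workflow, a small check along these lines may help. It is only a sketch: the function name missing_model_files is hypothetical, and it assumes the whisper-tiny and RIFE files sit inside their respective folders under ComfyUI/models/sonic/, as the layout above suggests.

```python
from pathlib import Path

# Files expected under ComfyUI/models/sonic/, taken from the layout above.
SONIC_FILES = [
    "audio2bucket.pth",
    "audio2token.pth",
    "unet.pth",
    "yoloface_v5m.pt",
    "whisper-tiny/config.json",
    "whisper-tiny/model.safetensors",
    "whisper-tiny/preprocessor_config.json",
    "RIFE/flownet.pkl",
]

# Either one of these SVD checkpoints is enough.
SVD_CHECKPOINTS = ["svd_xt.safetensors", "svd_xt_1_1.safetensors"]


def missing_model_files(comfyui_root):
    """Return the expected model paths that are absent under comfyui_root."""
    root = Path(comfyui_root)
    sonic_dir = root / "models" / "sonic"
    missing = [
        f"models/sonic/{rel}"
        for rel in SONIC_FILES
        if not (sonic_dir / rel).is_file()
    ]
    ckpt_dir = root / "models" / "checkpoints"
    if not any((ckpt_dir / name).is_file() for name in SVD_CHECKPOINTS):
        missing.append("models/checkpoints/" + " or ".join(SVD_CHECKPOINTS))
    return missing


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point it at your own install directory.
    for path in missing_model_files("ComfyUI"):
        print("missing:", path)
```

An empty result means every file in the layout was found; it does not verify that the files are complete or uncorrupted.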
@article{ji2024sonic,
title={Sonic: Shifting Focus to Global Audio Perception in Portrait Animation},
author={Ji, Xiaozhong and Hu, Xiaobin and Xu, Zhihong and Zhu, Junwei and Lin, Chuming and He, Qingdong and Zhang, Jiangning and Luo, Donghao and Chen, Yi and Lin, Qin and others},
journal={arXiv preprint arXiv:2411.16331},
year={2024}
}
@article{ji2024realtalk,
title={Realtalk: Real-time and realistic audio-driven face generation with 3d facial prior-guided identity alignment network},
author={Ji, Xiaozhong and Lin, Chuming and Ding, Zhonggan and Tai, Ying and Zhu, Junwei and Hu, Xiaobin and Luo, Donghao and Ge, Yanhao and Wang, Chengjie},
journal={arXiv preprint arXiv:2406.18284},
year={2024}
}