Official implementation of EmoTaG (CVPR 2026). HDTF (multi-identity pre-training): 70 videos (one identity each, 90–240 s) sampled from HDTF, used to learn the identity-agnostic audio-motion prior. MEAD ...
TL;DR: Text Prompt -> LLM as a Request Parser -> Intermediate Representation (such as an image layout) -> Stable Diffusion -> Image. [2023.8] Our repo has been substantially improved: now we have a repo ...
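
The TL;DR pipeline above can be sketched in a few lines. This is only an illustration of the data flow, not the repo's actual API: the `Box` type, the `label: x, y, w, h` reply format, and the `fake_llm` / `fake_generator` stand-ins are all assumptions made up for this example. In a real setup the two callables would wrap an LLM call and a layout-conditioned Stable Diffusion call.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Box:
    """One layout entry: an object label and its box (hypothetical format)."""
    label: str
    xywh: Tuple[int, int, int, int]  # x, y, width, height in pixels


def parse_layout(llm_reply: str) -> List[Box]:
    """Parse lines such as 'a cat: 10, 20, 100, 120' into Box objects.

    The line format is an assumption for this sketch.
    """
    boxes = []
    for line in llm_reply.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines in the reply
        label, _, coords = line.rpartition(":")
        x, y, w, h = (int(v) for v in coords.split(","))
        boxes.append(Box(label.strip(), (x, y, w, h)))
    return boxes


def text_to_image(
    prompt: str,
    llm: Callable[[str], str],
    generator: Callable[[str, List[Box]], Any],
) -> Any:
    """Text prompt -> LLM (request parser) -> layout -> generator -> image."""
    reply = llm(prompt)                # LLM turns the prompt into a layout description
    layout = parse_layout(reply)       # intermediate representation
    return generator(prompt, layout)   # layout-conditioned image generation


# Stand-ins so the sketch runs without any model; both are hypothetical.
def fake_llm(prompt: str) -> str:
    return "a cat: 10, 20, 100, 120\na dog: 200, 40, 150, 160"


def fake_generator(prompt: str, layout: List[Box]) -> dict:
    return {"prompt": prompt, "boxes": [b.label for b in layout]}


result = text_to_image("a cat and a dog", fake_llm, fake_generator)
```

Passing the LLM and the generator in as callables keeps the intermediate representation as the only contract between the two stages, so either side can be swapped independently.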