Comparison to Contemporary Baselines
Direct comparisons with state-of-the-art methods, demonstrating AvatarForcing's superior lip-synchronization quality and temporal consistency across a range of challenging scenarios.
Demo Gallery
Phoneme-Level Lip Sync
Tight close-ups highlighting frame-accurate mouth articulation and viseme timing aligned to audio across diverse speakers.
Scene Variety & Global Coherence
More varied scenes (street, studio, home) showing long-range identity stability and consistent co-speech motion.
Minute-Level Long Video Generation
Extended talking-avatar generation that maintains temporal consistency and visual quality over minute-long sequences using our sliding-window denoising approach.
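To illustrate why a sliding window can stream at roughly one denoiser call per output frame, here is a minimal sketch of a generic staggered (pipelined) sliding-window schedule. It is an assumption for illustration, not AvatarForcing's scheduler: each frame in a hypothetical window of size `window` sits at a different noise level, one batched call advances every frame by one level, and the oldest frame leaves the window once it is clean. The names `Frame`, `stream`, and `remaining` are hypothetical, and the denoiser itself is abstracted away to a counter.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    index: int      # position of the frame in the output stream
    remaining: int  # denoising levels still to apply (window = pure noise, 0 = clean)


def stream(num_frames: int, window: int) -> tuple[list[int], int, list[int]]:
    """Toy staggered sliding-window schedule (hypothetical, not the paper's code).

    Returns the emitted frame indices, the number of batched denoiser calls,
    and the tick at which each frame was emitted.
    """
    buf: deque = deque()
    emitted: list[int] = []
    emit_ticks: list[int] = []
    calls = 0
    tick = 0
    next_index = 0
    while len(emitted) < num_frames:
        if next_index < num_frames:
            buf.append(Frame(next_index, window))  # a new frame enters as pure noise
            next_index += 1
        # One batched denoiser call advances every frame in the window by one level.
        calls += 1
        for f in buf:
            f.remaining -= 1
        # The oldest frame is now clean: emit it and slide the window forward.
        while buf and buf[0].remaining == 0:
            emitted.append(buf.popleft().index)
            emit_ticks.append(tick)
        tick += 1
    return emitted, calls, emit_ticks


emitted, calls, ticks = stream(num_frames=6, window=4)
print(emitted, calls, ticks)  # [0, 1, 2, 3, 4, 5] 9 [3, 4, 5, 6, 7, 8]
```

In this toy schedule every frame still receives `window` denoising levels in total, but after a warm-up of `window - 1` ticks one frame is emitted per batched call, so the per-frame cost is amortized to a single pass.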
Ablation Studies
Ablations diagnose what drives stable long-form streaming and real-time efficiency in AvatarForcing, and how allocating compute between local-future look-ahead (L) and per-step refinement (N) shapes the quality–latency trade-off.
Qualitative Ablations (Long Rollouts)
We compare long-rollout stability across anchor and alignment ablations. Removing the style or temporal anchor increases drift and flicker (Tab. 4 / Fig. 6); removing anchor-audio zero padding causes mouth jitter and artifacts; and removing RoPE re-indexing leads to gradual appearance and color drift (Tab. 6 / Fig. 8). Against the one-step baselines Self-Forcing (1-step) and Causal ODE, AvatarForcing preserves sharper motion with less drift and blur (Tab. 5 / Fig. 7).
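One plausible reading of RoPE re-indexing, sketched here as an assumption rather than the paper's exact scheme, is that frames in the current window are assigned small local positions (placed right after the anchors) instead of ever-growing global frame indices. The sketch relies on a standard RoPE property: the attention score between a rotated query and key depends only on their position offset, so shifting the whole window by a constant keeps intra-window scores unchanged while keeping positions bounded. The helper `reindex` and the parameter `num_anchors` are hypothetical.

```python
import numpy as np


def rope(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embedding to a 1-D vector x (even length) at position pos."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)  # one rotation frequency per 2-D pair
    theta = pos * inv_freq
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty(x.shape, dtype=float)
    out[0::2] = x1 * cos - x2 * sin  # rotate each (x1, x2) pair by its angle
    out[1::2] = x1 * sin + x2 * cos
    return out


def reindex(global_idx: list[int], num_anchors: int) -> list[int]:
    """Hypothetical re-indexing: anchors occupy positions 0..num_anchors-1 and the
    current window starts right after them, so positions stay bounded no matter
    how long the rollout runs."""
    start = min(global_idx)
    return [num_anchors + (g - start) for g in global_idx]


rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)
g = [1000, 1001, 1002, 1003]        # global frame indices late in a long rollout
local = reindex(g, num_anchors=2)   # [2, 3, 4, 5]
far = rope(q, g[0]) @ rope(k, g[3])
near = rope(q, local[0]) @ rope(k, local[3])
print(local, np.isclose(far, near))  # [2, 3, 4, 5] True
```

Because only offsets matter within the window, re-indexing changes nothing there; what it does change is the distance between the window and the anchors, which stays fixed instead of growing with rollout length.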
L/N Decoupling Sweep
The sweep in Fig. 4 / Tab. 3 fixes the dual-anchor design and varies the window length L and the number of denoising steps N to track how latency trades off against stability. Larger L tends to improve long-horizon consistency more than simply stacking denoising passes, but an excessively large L can over-smooth motion; the merged comparison below captures this effect.
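The bookkeeping behind an L/N sweep can be sketched with a simple cost model. The model below is an assumption for illustration only: each emitted frame costs N denoiser calls, and each call costs a fixed overhead plus a per-frame cost proportional to L. The constants in the example are placeholders, not measurements from the paper, and the helpers `per_frame_latency_ms` and `feasible_configs` are hypothetical. The sketch only enumerates which (L, N) pairs fit a real-time budget; it says nothing about quality.

```python
from itertools import product


def per_frame_latency_ms(L: int, N: int, fixed_ms: float, per_frame_ms: float) -> float:
    """Hypothetical linear cost model: N denoiser calls per emitted frame,
    each call processing a window of L frames."""
    return N * (fixed_ms + per_frame_ms * L)


def feasible_configs(Ls, Ns, budget_ms: float, fixed_ms: float, per_frame_ms: float):
    """Return every (L, N) pair whose modeled per-frame latency fits the budget."""
    return [
        (L, N)
        for L, N in product(Ls, Ns)
        if per_frame_latency_ms(L, N, fixed_ms, per_frame_ms) <= budget_ms
    ]


# Placeholder numbers: a 40 ms budget corresponds to 25 fps output.
print(feasible_configs([2, 4, 6, 8], [1, 2, 3], budget_ms=40.0, fixed_ms=10.0, per_frame_ms=4.0))
# [(2, 1), (2, 2), (4, 1), (6, 1)]
```

Under such a model, a fixed budget can be spent either on a longer window with a single pass or on extra passes over a short window; the sweep is what measures which choice actually yields better long-horizon stability.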
BibTeX
@misc{cui2026avatarforcingonestepstreamingtalking,
title={AvatarForcing: One-Step Streaming Talking Avatars via Local-Future Sliding-Window Denoising},
author={Liyuan Cui and Wentao Hu and Wenyuan Zhang and Zesong Yang and Fan Shi and Xiaoqiang Liu},
year={2026},
eprint={2603.14331},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.14331},
}