🎬 Video Generation¶
💬 ACL2026 · 3 paper notes
- Accelerating Training of Autoregressive Video Generation Models via Local Optimization with Representation Continuity

  This paper proposes a Local Optimization + Representation Continuity (ReCo) training strategy that optimizes within local windows while constraining hidden states to transition smoothly across window boundaries, achieving a 2× training speedup for autoregressive video generation models without sacrificing generation quality.
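The core idea of a representation-continuity constraint can be sketched generically: split per-frame hidden states into local windows and penalize abrupt jumps at window boundaries. This is a hypothetical illustration of that constraint, not the paper's actual loss; the function name and windowing scheme are assumptions.

```python
import numpy as np

def continuity_penalty(hidden_states, window_size):
    """Hypothetical sketch of a representation-continuity term:
    partition a sequence of per-frame hidden states into consecutive
    local windows, then penalize the squared distance between the last
    state of each window and the first state of the next, encouraging
    smooth transitions when each window is optimized locally."""
    penalty = 0.0
    n_windows = len(hidden_states) // window_size
    for k in range(n_windows - 1):
        end_of_window = hidden_states[(k + 1) * window_size - 1]
        start_of_next = hidden_states[(k + 1) * window_size]
        penalty += np.sum((end_of_window - start_of_next) ** 2)
    return penalty

# A perfectly constant sequence incurs zero penalty; a jump between
# windows is penalized in proportion to its squared magnitude.
smooth = np.zeros((8, 4))
jumpy = np.concatenate([np.zeros((4, 4)), np.ones((4, 4))])
```

In a real training loop this term would be added, with some weight, to the per-window generation loss so that locally optimized windows still compose into a globally coherent sequence.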
- OSCBench: Benchmarking Object State Change in Text-to-Video Generation

  This paper proposes OSCBench, the first benchmark dedicated to evaluating object state change (OSC) capabilities in text-to-video (T2V) models. Built on cooking tasks with 1,120 prompts spanning conventional, novel, and compositional scenarios, it reveals that even the strongest T2V model achieves only 0.786 OSC accuracy.
- Self-Correcting Text-to-Video Generation with Misalignment Detection and Localized Refinement

  This paper proposes VideoRepair, the first training-free, model-agnostic self-correction framework for text-to-video generation. It uses an MLLM to detect fine-grained text-video misalignment, preserves regions that already match the prompt, and selectively repairs only the problematic ones, consistently improving alignment across four T2V backbone models on EvalCrafter and T2V-CompBench.
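The detect-then-repair control flow described above can be sketched abstractly. All names here (`video_repair`, `detect_misalignment`, `regenerate_region`) are hypothetical stand-ins for the paper's components, passed in as callables so the skeleton stays model-agnostic.

```python
def video_repair(video_regions, prompt, detect_misalignment, regenerate_region):
    """Hypothetical control flow of a localized self-correction loop:
    a detector (in the paper, an MLLM) flags regions misaligned with the
    prompt; only flagged regions are regenerated, while regions that
    already match the prompt are preserved unchanged."""
    repaired = {}
    for region_id, content in video_regions.items():
        if detect_misalignment(region_id, content, prompt):
            # Selectively repair only the problematic region.
            repaired[region_id] = regenerate_region(region_id, prompt)
        else:
            # Preserve regions that are already correct.
            repaired[region_id] = content
    return repaired

# Toy usage with stub detector/repair functions standing in for the
# MLLM detector and the T2V backbone.
regions = {"left": "cat", "right": "car"}
detect = lambda region_id, content, prompt: region_id == "right"
regen = lambda region_id, prompt: "dog"
result = video_repair(regions, "a cat and a dog", detect, regen)
```

Keeping the detector and regenerator as injected callables mirrors why such a framework can be training-free and model-agnostic: the loop itself never touches model weights.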