🎬 Video Generation

💬 ACL2026 · 3 paper notes

Accelerating Training of Autoregressive Video Generation Models via Local Optimization with Representation Continuity

This paper proposes a Local Optimization + Representation Continuity (ReCo) training strategy that optimizes within local windows while constraining hidden states to transition smoothly across window boundaries, achieving a 2× training speedup for autoregressive video generation models without sacrificing generation quality.
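The idea above can be sketched as a per-window objective with a continuity penalty at window boundaries. This is a minimal illustrative sketch, not the authors' code: the function names, the squared-difference penalty, and the weight `lam` are all my assumptions about what "local optimization with representation continuity" could look like.

```python
# Hypothetical sketch of a ReCo-style local objective (all names and the
# loss form are assumptions, not the paper's implementation).

def continuity_penalty(prev_hidden, curr_hidden):
    """Squared difference between hidden states at a window boundary,
    penalizing abrupt representation changes between adjacent windows."""
    return sum((a - b) ** 2 for a, b in zip(prev_hidden, curr_hidden))

def local_reco_loss(window_losses, boundary_hiddens, lam=0.1):
    """Total loss for one local window: the per-frame generation losses
    plus a representation-continuity term weighted by lam (assumed
    hyperparameter)."""
    recon = sum(window_losses)
    cont = sum(continuity_penalty(h_prev, h_curr)
               for h_prev, h_curr in boundary_hiddens)
    return recon + lam * cont

# Toy example: two per-frame losses, one boundary whose hidden states
# are nearly identical, so the continuity term is small.
loss = local_reco_loss([0.5, 0.3], [([1.0, 2.0], [1.0, 2.1])], lam=0.1)
```

Because each window's loss depends only on local frames and the boundary hidden states, windows can be optimized independently, which is where a training speedup would come from.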

OSCBench: Benchmarking Object State Change in Text-to-Video Generation

This paper proposes OSCBench — the first benchmark dedicated to evaluating object state change (OSC) capabilities in text-to-video (T2V) models. Built on cooking scenarios, with 1,120 prompts covering conventional, novel, and compositional cases, it reveals that even the strongest T2V model achieves only 0.786 OSC accuracy.
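A headline number like 0.786 is just a fraction of prompts judged correct, pooled across categories. A minimal sketch of that aggregation, assuming per-prompt binary correctness judgments (the category names mirror the summary; the scoring scheme is an assumption for illustration):

```python
# Hypothetical aggregation of per-category OSC judgments into one
# overall accuracy; not OSCBench's actual evaluation code.

def osc_accuracy(results):
    """results: {category: list of 0/1 per-prompt correctness flags}.
    Returns the overall fraction of prompts judged correct."""
    total = sum(len(flags) for flags in results.values())
    correct = sum(sum(flags) for flags in results.values())
    return correct / total

# Toy judgments over the three prompt categories.
scores = {
    "conventional": [1, 1, 0, 1],
    "novel": [1, 0, 0, 1],
    "compositional": [0, 1],
}
acc = osc_accuracy(scores)
```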

Self-Correcting Text-to-Video Generation with Misalignment Detection and Localized Refinement

This paper proposes VideoRepair, the first training-free, model-agnostic self-correction framework for text-to-video generation. It detects fine-grained text-video misalignment via an MLLM, preserves correctly generated regions, and selectively repairs the problematic ones, consistently improving alignment quality across four T2V backbone models on EvalCrafter and T2V-CompBench.
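The detect-preserve-repair loop can be sketched as follows. Everything here is hypothetical scaffolding: the region representation, the stub detector standing in for the MLLM, and the `regenerate` callback are my assumptions, not the paper's API.

```python
# Hedged sketch of a VideoRepair-style self-correction loop
# (function names and data shapes are hypothetical).

def detect_misaligned_regions(prompt, video):
    """Stub standing in for MLLM-based misalignment detection:
    returns ids of regions whose content is not covered by the prompt."""
    return [r for r, content in video.items() if content not in prompt]

def repair(prompt, video, regenerate):
    """Preserve aligned regions as-is; regenerate only flagged ones."""
    bad = detect_misaligned_regions(prompt, video)
    fixed = dict(video)  # correct regions are kept untouched
    for region in bad:
        fixed[region] = regenerate(prompt, region)
    return fixed

# Toy example: region "b" holds "dog" but the prompt asks for a cat,
# so only "b" is regenerated while "a" is preserved.
video = {"a": "tree", "b": "dog"}
fixed = repair("a tree and a cat", video, regenerate=lambda p, r: "cat")
```

The key property of the design is that regeneration is localized: regions the detector judges aligned never pass through the generator again, which is what makes the approach training-free and backbone-agnostic.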