🎬 Video Generation

💬 ACL2025 · 2 paper notes

🔥 Top topics: Video Generation ×2

Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval: Q2E is a zero-shot query-to-event decomposition method. It draws on the parametric world knowledge of LLMs and VLMs to decompose a simple query into prequel, current, and sequel events. These events are combined with dense video descriptions and speech transcriptions, and candidates are ranked by inverse-entropy fusion, yielding state-of-the-art multilingual text-to-video retrieval performance.
VidCapBench: A Comprehensive Benchmark of Video Captioning for Controllable Text-to-Video Generation: This work proposes VidCapBench, the first video captioning evaluation benchmark designed specifically for controllable text-to-video (T2V) generation. It evaluates caption quality along four dimensions: aesthetics, content, motion, and physical laws. The benchmark comprises 643 videos and 10,644 QA pairs, and experiments show that VidCapBench scores correlate strongly and positively with T2V generation quality.
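
The inverse-entropy fusion ranking mentioned in the Q2E note can be illustrated with a small sketch. The note does not give Q2E's exact formulation, so everything here is an assumption made for illustration: each evidence source (for example the raw query, the decomposed events, or the speech transcript) produces one similarity score per candidate video, the scores are turned into a distribution with a softmax, and each source is weighted by the reciprocal of that distribution's entropy so that confident (peaked) sources count more. The function names, the `eps` parameter, and the toy scores are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw similarity scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (natural log) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def inverse_entropy_fusion(score_lists, eps=1e-9):
    """Fuse per-source score lists, weighting each source by 1 / entropy.

    Assumed formulation, not taken from the paper: a source whose
    distribution over candidates is peaked (low entropy) gets a larger weight.
    Returns (ranking of candidate indices, normalized source weights).
    """
    dists = [softmax(scores) for scores in score_lists]
    raw = [1.0 / (entropy(d) + eps) for d in dists]
    total = sum(raw)
    weights = [w / total for w in raw]
    n = len(score_lists[0])
    fused = [sum(w * d[i] for w, d in zip(weights, dists)) for i in range(n)]
    ranking = sorted(range(n), key=lambda i: fused[i], reverse=True)
    return ranking, weights


if __name__ == "__main__":
    # Hypothetical scores for three candidate videos from two sources.
    query_scores = [0.2, 0.3, 0.25]  # nearly flat, so an uncertain source
    event_scores = [5.0, 0.1, 0.2]   # sharply peaked on candidate 0
    ranking, weights = inverse_entropy_fusion([query_scores, event_scores])
    print("ranking:", ranking)
    print("weights:", weights)
```

In this toy setup the peaked event-based source dominates the fused ranking, which is the behavior an inverse-entropy weighting is meant to produce; a real system would substitute actual retrieval scores for the hard-coded lists.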