💡 LLM Reasoning

📹 ICCV2025 · 3 paper notes

🔥 Top topics: Reasoning ×2

Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning: This paper proposes Corvid, which strengthens the chain-of-thought (CoT) reasoning of MLLMs through four components: a hybrid visual encoder, a GateMixer connector, a high-quality CoT dataset, and a test-time self-verification strategy. It surpasses open-source models of comparable parameter scale on mathematical reasoning and scientific problem solving.
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization: This paper proposes UV-CoT, a framework that enables image-level chain-of-thought (Visual CoT) reasoning without any manual bounding-box annotations. It automatically constructs preference data and trains with an improved loss, Score-DPO. UV-CoT surpasses the supervised Visual-CoT method on six benchmarks.
Video-T1: Test-Time Scaling for Video Generation: This paper transfers the test-time scaling (TTS) paradigm from LLMs to video generation by reformulating TTS as a search problem over trajectories from Gaussian noise space to the target video distribution. It proposes the Tree-of-Frames (ToF) search algorithm for efficient inference-time compute scaling, achieving consistent quality improvements across diverse video generation models on VBench.
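
The Video-T1 summary above frames test-time scaling as a search over generation trajectories. The general idea can be illustrated with a minimal toy sketch, assuming a beam-style search that grows partial trajectories frame by frame and prunes them with a scorer. This is not the paper's Tree-of-Frames algorithm: `beam_search_frames`, `toy_propose`, and `toy_score` are hypothetical names, and frames are plain floats standing in for real video frames or latents.

```python
import random
from typing import Callable, List, Sequence

Frame = float  # hypothetical stand-in for a real video frame or latent
Trajectory = List[Frame]


def beam_search_frames(
    propose: Callable[[Sequence[Frame], random.Random], Frame],
    score: Callable[[Sequence[Frame]], float],
    num_frames: int,
    beam_width: int,
    branches: int,
    seed: int = 0,
) -> Trajectory:
    """Grow trajectories frame by frame, keeping the top-scoring partial ones."""
    rng = random.Random(seed)
    beams: List[Trajectory] = [[]]
    for _ in range(num_frames):
        # Expand every surviving trajectory into several candidate continuations.
        candidates = [
            beam + [propose(beam, rng)]
            for beam in beams
            for _ in range(branches)
        ]
        # Prune: keep only the beam_width best partial trajectories.
        candidates.sort(key=score, reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


# Toy stand-ins (assumptions for illustration only): frames are noisy numbers,
# and the "verifier" prefers smooth trajectories whose values stay near 1.0.
def toy_propose(prefix: Sequence[Frame], rng: random.Random) -> Frame:
    return rng.gauss(1.0, 0.5)


def toy_score(traj: Sequence[Frame]) -> float:
    closeness = -sum(abs(f - 1.0) for f in traj)
    smoothness = -sum(abs(a - b) for a, b in zip(traj, traj[1:]))
    return closeness + smoothness


best = beam_search_frames(
    toy_propose, toy_score, num_frames=8, beam_width=3, branches=4
)
```

Raising `beam_width` or `branches` spends more inference-time compute on exploring alternatives, which is the knob that test-time scaling turns. With `beam_width=1` and `branches=1` the function degenerates to ordinary single-pass sampling.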