💡 LLM Reasoning

📹 ICCV2025 · 3 paper notes

Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning

This paper proposes Corvid, which strengthens the chain-of-thought reasoning of MLLMs through four components: a hybrid visual encoder, a GateMixer connector, a high-quality CoT dataset, and a test-time self-verification strategy. Corvid surpasses open-source models of comparable parameter scale on mathematical reasoning and scientific problem solving.

Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization

This paper proposes UV-CoT, a framework that enables image-level chain-of-thought (Visual CoT) reasoning without any manual bounding-box annotations by automatically constructing preference data and introducing an improved Score-DPO loss. UV-CoT surpasses the supervised Visual-CoT method on six benchmarks.

Video-T1: Test-Time Scaling for Video Generation

This paper transfers the test-time scaling (TTS) paradigm from LLMs to video generation by reformulating TTS as a search problem over trajectories from Gaussian noise space to the target video distribution. It proposes the Tree-of-Frames (ToF) search algorithm for efficient inference-time compute scaling, achieving consistent quality improvements across diverse video generation models on VBench.