⚡ VLM Efficiency

📷 CVPR2025 · 3 paper notes

📌 Same area in other venues: 📷 CVPR2026 (63) · 🔬 ICLR2026 (18) · 💬 ACL2026 (6) · 🧪 ICML2026 (4) · 🤖 AAAI2026 (5) · 🧠 NeurIPS2025 (8)

🔥 Top topics: Model Compression ×2

COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

MBQ: Modality-Balanced Quantization for Large Vision-Language Models

This work finds that vision tokens and language tokens in large VLMs differ in their sensitivity to quantization error by more than tenfold. It proposes MBQ, a post-training quantization (PTQ) method that applies a gradient-based modality-balancing factor during calibration. Under the W3A16 and W4A8 configurations (3-bit weights with 16-bit activations, and 4-bit weights with 8-bit activations), MBQ improves accuracy by up to 4.4% and 11.6%, respectively, while achieving a 1.4× end-to-end speedup.
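The idea of weighting calibration error by modality can be sketched as below. This is a minimal illustration, not the authors' implementation: the sensitivity proxy (mean absolute gradient per modality), the clipping-ratio search, and all function names (`modality_weights`, `balanced_error`, `quantize`, `search_clip`) are assumptions made for this example, and the gradients are synthetic.

```python
import numpy as np

def modality_weights(grad, is_vision):
    """Assumed sensitivity proxy: mean absolute output gradient per modality,
    normalized so the two balancing factors sum to 1."""
    g_v = np.abs(grad[is_vision]).mean()
    g_l = np.abs(grad[~is_vision]).mean()
    total = g_v + g_l
    return g_v / total, g_l / total

def balanced_error(y_fp, y_q, is_vision, w_v, w_l):
    """Reconstruction error with vision and language tokens weighted separately."""
    err = ((y_fp - y_q) ** 2).sum(axis=1)  # squared error per token
    return w_v * err[is_vision].sum() + w_l * err[~is_vision].sum()

def quantize(w, bits, clip):
    """Symmetric per-tensor fake quantization with a clipping ratio."""
    qmax = 2 ** (bits - 1) - 1
    scale = clip * np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def search_clip(x, w, grad, is_vision, bits=3):
    """Pick the clipping ratio that minimizes the modality-balanced error."""
    candidates = np.linspace(0.5, 1.0, 11)
    w_v, w_l = modality_weights(grad, is_vision)
    y_fp = x @ w
    losses = [
        balanced_error(y_fp, x @ quantize(w, bits, c), is_vision, w_v, w_l)
        for c in candidates
    ]
    return float(candidates[int(np.argmin(losses))]), (w_v, w_l)

# Toy calibration batch: 96 vision tokens followed by 32 language tokens.
rng = np.random.default_rng(0)
is_vision = np.arange(128) < 96
x = rng.normal(size=(128, 16))
w = rng.normal(size=(16, 8))
# Synthetic gradients: language tokens are made 10x more sensitive.
grad = rng.normal(size=(128, 8))
grad[~is_vision] *= 10.0

best_clip, (w_v, w_l) = search_clip(x, w, grad, is_vision)
```

With these synthetic gradients the language factor `w_l` dominates, so the search favors the clipping ratio that best preserves the language-token outputs rather than treating all tokens equally.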

Quantization without Tears

This paper proposes QwT (Quantization without Tears), which compensates for the information lost to quantization by adding a lightweight linear compensation layer after each block of the quantized network. The layer's parameters have a closed-form solution that can be computed in under 2 minutes, and the method significantly improves post-training quantization (PTQ) accuracy across vision, language, and multimodal tasks.
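A closed-form linear compensation of this kind can be sketched as a least-squares fit of the gap between the full-precision and quantized block outputs. The sketch below is a toy illustration under stated assumptions, not the paper's code: it assumes the compensation layer is a linear map of the block input, adds a small ridge term for numerical stability, and uses a single `tanh` layer as a stand-in block. The names `fit_linear_compensation` and `fake_quantize` are hypothetical.

```python
import numpy as np

def fit_linear_compensation(x, y_fp, y_q, ridge=1e-6):
    """Closed-form solution of min over (W, b) of
    ||(y_fp - y_q) - (x @ W + b)||^2 plus a small ridge penalty."""
    n = x.shape[0]
    x_aug = np.hstack([x, np.ones((n, 1))])  # extra column of ones for the bias
    residual = y_fp - y_q                    # what quantization lost
    gram = x_aug.T @ x_aug + ridge * np.eye(x_aug.shape[1])
    params = np.linalg.solve(gram, x_aug.T @ residual)
    return params[:-1], params[-1]           # W, b

def fake_quantize(w, bits=3):
    """Symmetric per-tensor fake quantization of a weight matrix."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale

# Toy "block": one linear layer followed by tanh.
rng = np.random.default_rng(0)
x = rng.normal(size=(512, 16))               # calibration inputs
w_fp = rng.normal(size=(16, 16)) * 0.3
y_fp = np.tanh(x @ w_fp)                     # full-precision block output
y_q = np.tanh(x @ fake_quantize(w_fp))       # quantized block output

W, b = fit_linear_compensation(x, y_fp, y_q)
y_comp = y_q + x @ W + b                     # compensated output

err_before = float(np.mean((y_fp - y_q) ** 2))
err_after = float(np.mean((y_fp - y_comp) ** 2))
```

Because zero compensation (`W = 0`, `b = 0`) is always a feasible solution, the fitted layer cannot increase the reconstruction error on the calibration data, and no gradient-based training is involved.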