

📷 CVPR2026 · 4 paper notes

GeoCodeBench: Benchmarking PhD-Level Coding in 3D Geometric Computer Vision

Introduces GeoCodeBench, the first PhD-level code generation benchmark for 3D geometric computer vision, comprising 100 function-completion tasks curated from top-venue 2025 papers and codebases, each paired with automatically generated, diverse unit tests. The strongest model, GPT-5, achieves only a 36.6% pass rate, revealing a significant gap in LLMs' ability to implement scientific-level 3D code.
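To make the task format concrete, here is a hypothetical function-completion task in the benchmark's spirit: the model must implement a 3D-geometry function (here, the essential matrix from a relative pose) that is then checked by an automatically generated unit test. The specific function and test are illustrative assumptions, not items from GeoCodeBench.

```python
import numpy as np

def skew(t):
    """Skew-symmetric cross-product matrix [t]_x of a 3-vector t."""
    return np.array([
        [0.0, -t[2], t[1]],
        [t[2], 0.0, -t[0]],
        [-t[1], t[0], 0.0],
    ])

def essential_matrix(R, t):
    """Essential matrix E = [t]_x R from a relative pose (R, t)."""
    return skew(t) @ R

# A unit test in the spirit of the benchmark's automated checks: the
# epipolar constraint x2^T E x1 = 0 must hold for any 3D point observed
# in both cameras, where x2 = R x1 + t in camera-2 coordinates.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q * np.sign(np.linalg.det(Q))   # proper rotation (det = +1)
t = rng.normal(size=3)
x1 = rng.normal(size=3)             # 3D point in camera-1 coordinates
x2 = R @ x1 + t                     # same point in camera-2 coordinates
E = essential_matrix(R, t)
assert abs(x2 @ E @ x1) < 1e-8      # epipolar constraint holds
```

A grader only needs the function signature and such randomized property checks, which is what makes large-scale automated evaluation of geometry code feasible.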

CHEEM: Continual Learning by Reuse, New, Adapt and Skip -- A Hierarchical Exploration-Exploitation Approach

Proposes CHEEM, a framework that uses hierarchical exploration-exploitation (HEE) NAS to automatically learn task-aware dynamic ViT backbones, selecting a Reuse, New, Adapt, or Skip operation at each layer. It significantly outperforms prompt-based methods on the MTIL and VDD continual learning benchmarks and approaches the full fine-tuning upper bound.
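The per-layer operation space can be sketched as follows. This is a minimal illustration of composing a task model from Reuse/New/Adapt/Skip decisions, not the paper's implementation; the toy layers, shapes, and the `build_task_model` helper are assumptions for illustration.

```python
import numpy as np

def make_layer(W):
    """Toy stand-in for a backbone layer with weights W."""
    return lambda x: np.tanh(x @ W)

rng = np.random.default_rng(1)
d = 4
old_layers = [make_layer(rng.normal(scale=0.5, size=(d, d))) for _ in range(3)]

def build_task_model(ops, old_layers, rng):
    """Compose a task-specific forward pass from per-layer NAS decisions."""
    layers = []
    for op, old in zip(ops, old_layers):
        if op == "reuse":      # share the previous task's layer as-is
            layers.append(old)
        elif op == "new":      # allocate fresh task-specific weights
            layers.append(make_layer(rng.normal(scale=0.5, size=(d, d))))
        elif op == "adapt":    # old layer plus a lightweight additive adapter
            A = rng.normal(scale=0.05, size=(d, d))
            layers.append(lambda x, old=old, A=A: old(x) + x @ A)
        elif op == "skip":     # identity: drop this layer for the task
            layers.append(lambda x: x)
    def forward(x):
        for f in layers:
            x = f(x)
        return x
    return forward

model = build_task_model(["reuse", "adapt", "skip"], old_layers, rng)
y = model(np.ones((1, d)))
assert y.shape == (1, d)
```

The search then scores such per-layer choice vectors per task; Reuse and Skip add no parameters, which is what keeps the learned backbones compact across tasks.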

SparVAR: Exploring Sparsity in Visual Autoregressive Modeling for Training-Free Acceleration

Systematically analyzes attention activation patterns in visual autoregressive (VAR) models, revealing three sparsity properties: attention sinks, cross-scale similarity, and spatial locality. Proposes SparVAR, a training-free acceleration framework with two plug-and-play modules, Cross-Scale Self-Similar Sparse Attention (CS⁴A) and Cross-Scale Local Sparse Attention (CSLA), achieving sub-second 1024×1024 generation for 8B models (1.57× speedup) with virtually no loss of high-frequency detail.
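Two of the observed properties, attention sinks and spatial locality, translate directly into a sparse attention mask: every query keeps a few global sink tokens plus a local 2D window around its own grid position. The sketch below shows that construction under assumed parameters; it is illustrative, not SparVAR's released code.

```python
import numpy as np

def local_sparse_mask(h, w, window=1, num_sinks=1):
    """Boolean (h*w, h*w) mask over a h×w token grid: True = attention kept.

    Keeps the first `num_sinks` tokens globally (attention-sink property)
    and a (2*window+1)^2 neighborhood per query (spatial-locality property).
    """
    n = h * w
    mask = np.zeros((n, n), dtype=bool)
    mask[:, :num_sinks] = True                      # sinks: attended by all queries
    ys, xs = np.divmod(np.arange(n), w)             # grid coordinates per token
    for q in range(n):
        near = (np.abs(ys - ys[q]) <= window) & (np.abs(xs - xs[q]) <= window)
        mask[q, near] = True                        # local window around the query
    return mask

mask = local_sparse_mask(8, 8, window=1, num_sinks=1)
density = mask.mean()        # fraction of attention entries actually computed
assert density < 0.25        # most entries are pruned away
```

The speedup comes from computing attention only where the mask is True; cross-scale similarity additionally lets masks estimated at coarse scales be reused at finer ones instead of being recomputed.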

StoryTailor: A Zero-Shot Pipeline for Action-Rich Multi-Subject Visual Narratives

Proposes StoryTailor, a zero-shot visual narrative generation pipeline with three components: Gaussian-Centered Attention (GCA) to mitigate subject overlap and background leakage, Action-Boost SVR (AB-SVR) to amplify action semantics, and a Selective Forgetting Cache (SFC) to maintain cross-frame background continuity. It achieves multi-subject, action-rich visual narrative generation on a single RTX 4090, with a 10–15% CLIP-T improvement over baselines.
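The core idea behind GCA can be sketched in a few lines: modulate a subject token's cross-attention map with a 2D Gaussian around that subject's assigned center, so its influence stays localized rather than leaking onto other subjects or the background. The function names, the uniform attention map, and all parameters below are assumptions for illustration, not StoryTailor's actual code.

```python
import numpy as np

def gaussian_weight(h, w, center, sigma):
    """Isotropic 2D Gaussian over a h×w grid, peaked at `center` (row, col)."""
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = center
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def gca_reweight(attn, center, sigma=4.0):
    """Reweight one subject token's (h, w) cross-attention map toward its center."""
    h, w = attn.shape
    weighted = attn * gaussian_weight(h, w, center, sigma)
    return weighted / (weighted.sum() + 1e-8)   # renormalize to a distribution

attn = np.full((16, 16), 1.0 / 256)             # uniform attention, for illustration
out = gca_reweight(attn, center=(4, 4))
# attention mass concentrates near the subject's assigned center
assert out[:8, :8].sum() > out[8:, 8:].sum()
```

With one such center per subject, two subjects' reweighted maps overlap far less than the raw maps, which is the mechanism the summary credits for reducing subject overlap.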