Speculative Verification: Exploiting Information Gain to Refine Speculative Decoding

Conference: ACL 2026
arXiv: 2509.24328
Code: N/A
Area: LLM Efficiency
Keywords: Speculative Decoding, Information Gain, Inference Acceleration, Companion Model, Dynamic Verification

TL;DR

Speculative Verification (SV) adds a companion model the same size as the draft model. For each draft token it computes two signals: the draft-companion distribution similarity \(S\) and the companion acceptance probability \(A\). From these it predicts the probability that the target model will accept the token, then dynamically chooses the verification length that maximizes goodput. In large-batch inference, SV is on average 1.4x and up to 1.9x faster than standard speculative decoding (SD).

Method

Key Designs

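Companion Model Information Gain Framework: let \(P_d\) and \(P_c\) be the drafter's and companion's next-token distributions, and let \(t_d\) be the token sampled by the drafter. SV computes two quantities:
- \(S = \sum_{i \in \text{vocab}} \min(P_d(t_i), P_c(t_i))\), the overlap of the two distributions (equivalently, one minus their total variation distance);
- \(A = \min(1, P_c(t_d)/P_d(t_d))\), the standard speculative-sampling acceptance probability with the companion in place of the target.

Observing \(S\) and \(A\) reduces uncertainty about the target's acceptance probability by 34% and improves the acceptance rate by 20%.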
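Goodput-Based Dynamic Verification Length Scheduling: SV selects the verification length \(\gamma\) that maximizes goodput, defined as the expected number of accepted tokens per unit of verification latency. Because goodput is concave in \(\gamma\), an incremental search finds the optimum: increase \(\gamma\) until goodput stops improving.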
Execution Optimization (Overlap + Data-Parallel): with these optimizations, the companion model adds only 1.3-5.3% compute overhead and 2.8-8.1% memory overhead.
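The two per-token signals can be sketched as follows. This assumes the drafter and companion each expose a full next-token probability vector over a shared vocabulary. The function names `overlap_similarity` and `companion_acceptance` are hypothetical, since the paper lists no code release.

```python
from typing import Sequence


def overlap_similarity(p_draft: Sequence[float], p_companion: Sequence[float]) -> float:
    # S = sum over the vocabulary of min(P_d(t_i), P_c(t_i)).
    # This equals 1 - total variation distance, so S lies in [0, 1].
    if len(p_draft) != len(p_companion):
        raise ValueError("distributions must share a vocabulary")
    return sum(min(d, c) for d, c in zip(p_draft, p_companion))


def companion_acceptance(
    p_draft: Sequence[float], p_companion: Sequence[float], draft_token: int
) -> float:
    # A = min(1, P_c(t_d) / P_d(t_d)): the speculative-sampling acceptance
    # ratio, with the companion standing in for the target model.
    # P_d(t_d) > 0 because t_d was sampled from the drafter.
    return min(1.0, p_companion[draft_token] / p_draft[draft_token])


# Toy three-token vocabulary (illustrative numbers only).
p_d = [0.6, 0.3, 0.1]
p_c = [0.5, 0.2, 0.3]
print(round(overlap_similarity(p_d, p_c), 4))       # 0.8
print(round(companion_acceptance(p_d, p_c, 0), 4))  # 0.8333
```

Computing both signals needs only the two small models' outputs. This is why they can inform the verification decision before the target model runs.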
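The goodput-based choice of \(\gamma\) can be illustrated with a small sketch built on several assumptions. The per-position acceptance probabilities are taken as already predicted from \(S\) and \(A\); the paper's predictor is not reproduced here. The positions are treated as independent. The linear latency model `toy_latency` is a hypothetical stand-in for measured verification cost. Expected accepted draft tokens follow the prefix rule of speculative decoding: token \(i\) counts only if tokens \(1..i\) are all accepted.

```python
from typing import Callable, Sequence, Tuple


def expected_accepted(accept_probs: Sequence[float], gamma: int) -> float:
    # E[accepted] = sum_{i=1}^{gamma} prod_{j<=i} p_j, since verification
    # stops at the first rejected draft token.
    expected, survive = 0.0, 1.0
    for p in accept_probs[:gamma]:
        survive *= p
        expected += survive
    return expected


def choose_gamma(
    accept_probs: Sequence[float], latency: Callable[[int], float]
) -> Tuple[int, float]:
    # Incremental search: extend gamma while goodput keeps rising.
    # Stopping at the first non-improvement is exact when goodput is
    # unimodal (e.g. concave) in gamma, as the summary states.
    best_gamma, best_goodput = 0, 0.0
    for gamma in range(1, len(accept_probs) + 1):
        goodput = expected_accepted(accept_probs, gamma) / latency(gamma)
        if goodput <= best_goodput:
            break
        best_gamma, best_goodput = gamma, goodput
    return best_gamma, best_goodput


def toy_latency(gamma: int) -> float:
    # Hypothetical cost model: fixed overhead plus a per-token cost.
    return 1.0 + 0.2 * gamma


probs = [0.9, 0.8, 0.5, 0.3]  # illustrative predicted acceptance probabilities
gamma, goodput = choose_gamma(probs, toy_latency)
print(gamma)  # 3
```

In this toy run, goodput rises from 0.75 to about 1.157 to about 1.238, then falls to 1.16 at \(\gamma = 4\). The search therefore stops after one extra evaluation instead of scanning every candidate length.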

Key Experimental Results

Average 1.4x speedup over standard SD across large-batch configurations
18-45% reduction in verification TFLOPs
Positive information gain observed across 90 public model combinations
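"Positive information gain" means that observing the companion signals lowers the entropy of the target's accept/reject outcome. The sketch below is a generic plug-in estimate of that gain from logged samples, not the paper's measurement procedure. It assumes \((S, A)\) has already been discretized into hashable bins; the binning scheme and the toy counts are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counts) -> float:
    # Shannon entropy in bits of an empirical distribution given as counts.
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(samples):
    # samples: (feature_bin, accepted) pairs, where feature_bin is a
    # discretized (S, A) observation and accepted is the target's decision.
    samples = list(samples)
    h_prior = entropy(Counter(acc for _, acc in samples).values())
    by_bin = defaultdict(Counter)
    for feature_bin, acc in samples:
        by_bin[feature_bin][acc] += 1
    n = len(samples)
    # H(accepted | bin) = sum over bins of P(bin) * H(accepted | that bin).
    h_cond = sum(sum(c.values()) / n * entropy(c.values()) for c in by_bin.values())
    return h_prior - h_cond, h_prior


# Toy log: a "high" similarity bin is mostly accepted, a "low" bin mostly rejected.
samples = ([("high", True)] * 8 + [("high", False)] * 2
           + [("low", True)] * 2 + [("low", False)] * 8)
ig, h = information_gain(samples)
print(f"{ig:.3f} bits of {h:.3f}")  # 0.278 bits of 1.000
```

Dividing the gain by the prior entropy gives a relative uncertainty reduction. Any strictly positive value satisfies the weak assumption the method relies on.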

Highlights & Insights

The information-theoretic framing is elegant: the method requires only positive information gain, a weak assumption, which makes it broadly applicable.
Particularly valuable for large-batch inference, where the benefit of standard SD shrinks the most.

Rating

Novelty: ⭐⭐⭐⭐⭐
Experimental Thoroughness: ⭐⭐⭐⭐⭐
Writing Quality: ⭐⭐⭐⭐
Value: ⭐⭐⭐⭐⭐