Multi-Drafter Speculative Decoding with Alignment Feedback

Conference: ACL 2026 | arXiv: 2604.05417 | Code: Available | Area: LLM Efficiency | Keywords: Speculative Decoding, Multi-Armed Bandit, Multi-Drafter, Alignment Feedback, Inference Acceleration

TL;DR

MetaSD is a unified framework that integrates multiple heterogeneous drafters into speculative decoding. It casts drafter selection as a multi-armed bandit problem and uses Block Divergence (BD) as the reward signal to dynamically select the drafter best aligned with the target model. It consistently outperforms single-drafter methods in both black-box and white-box configurations.

Method

Key Designs

  1. Block Divergence (BD) Reward: Averages the total variation (TV) distance between drafter and target distributions over all positions in a draft block, giving more informative alignment feedback than the traditional block efficiency (BE) signal. The paper proves that BD is the stronger feedback signal.

  2. Stopping-Time Regret Objective: Measures regret as the number of extra speculative decoding rounds needed to generate \(B\) tokens compared with the optimal strategy, and achieves an \(O(\ln B)\) regret bound.

  3. MetaSD-UCB Algorithm: Balances exploration and exploitation with upper confidence bounds (UCB), extends to the non-standard speculative decoding setting, and comes with a rigorous regret analysis.
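
The designs above can be illustrated with a minimal sketch. It is not the paper's implementation: the reward form (one minus the mean TV distance over the block), the standard UCB1 index with its exploration constant, and all names (`tv_distance`, `block_divergence_reward`, `UCBDrafterSelector`, `simulate`) are assumptions made for illustration, and the toy distributions are invented.

```python
import math


def tv_distance(p, q):
    """Total variation distance between two distributions over the same vocabulary."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def block_divergence_reward(target_dists, draft_dists):
    """Assumed BD-style reward: 1 minus the mean TV distance over all draft positions."""
    tvs = [tv_distance(p, q) for p, q in zip(target_dists, draft_dists)]
    return 1.0 - sum(tvs) / len(tvs)


class UCBDrafterSelector:
    """Plain UCB1 over drafters (illustrative; not the exact MetaSD-UCB index)."""

    def __init__(self, num_drafters, exploration=2.0):
        self.counts = [0] * num_drafters   # times each drafter was chosen
        self.means = [0.0] * num_drafters  # running mean reward per drafter
        self.exploration = exploration     # assumed exploration constant
        self.rounds = 0

    def select(self):
        # Try every drafter once before relying on confidence bounds.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm

        def index(arm):
            bonus = math.sqrt(
                self.exploration * math.log(self.rounds) / self.counts[arm]
            )
            return self.means[arm] + bonus

        return max(range(len(self.counts)), key=index)

    def update(self, arm, reward):
        self.rounds += 1
        self.counts[arm] += 1
        # Incremental update of the running mean.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(num_rounds=200, block_len=4):
    """Toy run: drafter 0 is close to the target, drafter 1 is far from it."""
    target = [0.7, 0.2, 0.1]
    drafters = [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]]
    selector = UCBDrafterSelector(num_drafters=len(drafters))
    for _ in range(num_rounds):
        arm = selector.select()
        reward = block_divergence_reward(
            [target] * block_len, [drafters[arm]] * block_len
        )
        selector.update(arm, reward)
    return selector.counts
```

Calling `simulate()` returns how often each toy drafter was chosen; the better-aligned drafter should receive most of the rounds. A real system would compute the reward from the target and drafter token distributions produced during verification, and would count rounds until \(B\) tokens are generated rather than running a fixed number of rounds.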

Key Experimental Results

MetaSD-UCB approaches the performance of the best expert drafter without knowing the task type, and significantly outperforms random selection and static ensembles. The framework also handles inter-query non-stationarity and requires no additional training.

Highlights & Insights

  • Combining speculative decoding with multi-armed bandits is natural: alignment feedback inherently provides the reward signal
  • The theoretical comparison of feedback signal strength between BD and block efficiency (BE) is deep and generalizable

Rating

  • Novelty: ⭐⭐⭐⭐
  • Experimental Thoroughness: ⭐⭐⭐⭐
  • Writing Quality: ⭐⭐⭐⭐⭐
  • Value: ⭐⭐⭐⭐