Multi-Drafter Speculative Decoding with Alignment Feedback
Conference: ACL 2026
arXiv: 2604.05417
Code: Available
Area: LLM Efficiency
Keywords: Speculative Decoding, Multi-Armed Bandit, Multi-Drafter, Alignment Feedback, Inference Acceleration
TL;DR
MetaSD is a unified framework that integrates multiple heterogeneous drafters into speculative decoding. It models drafter selection as a multi-armed bandit problem and uses Block Divergence (BD) as the reward signal to dynamically select the best-aligned drafter. It consistently outperforms single-drafter methods in both black-box and white-box configurations.
Method
Key Designs
- Block Divergence (BD) Reward: Provides more informative alignment feedback than traditional block efficiency by averaging the total variation (TV) distance over all positions in a draft block. The paper proves that BD gives a stronger feedback signal.
- Stopping-Time Regret Objective: Defines regret as the number of extra speculative decoding rounds needed to generate \(B\) tokens compared with the optimal strategy, and achieves an \(O(\ln B)\) regret bound.
- MetaSD-UCB Algorithm: Balances exploration and exploitation via upper confidence bounds (UCB), extending UCB to the non-standard speculative decoding setting with a rigorous regret analysis.
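To make the BD reward concrete, here is a minimal sketch of a block-level alignment score built from per-position TV distances. It rests on assumptions not stated in this summary: that the TV distance is taken between the target model's and the drafter's next-token distributions at each draft position, and that the reward is one minus the mean distance so that higher means better aligned. The function names `tv_distance` and `block_divergence_reward` are hypothetical, not the paper's API.

```python
def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same vocabulary size")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def block_divergence_reward(target_block, draft_block):
    """Assumed BD-style reward: 1 - mean TV distance over a draft block.

    Each argument is a list with one probability vector per draft
    position. Every position contributes, including positions after the
    first rejected token, which an accepted-token count would ignore.
    """
    if not target_block or len(target_block) != len(draft_block):
        raise ValueError("blocks must be non-empty and of equal length")
    distances = [tv_distance(p, q) for p, q in zip(target_block, draft_block)]
    return 1.0 - sum(distances) / len(distances)


# Toy 3-token vocabulary, draft block of length 2 (illustrative values only).
target = [[0.7, 0.2, 0.1], [0.5, 0.5, 0.0]]
aligned_drafter = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]]
misaligned_drafter = [[0.1, 0.1, 0.8], [0.0, 0.2, 0.8]]

print(block_divergence_reward(target, aligned_drafter))
print(block_divergence_reward(target, misaligned_drafter))
```

Because the score is a continuous average rather than an integer count of accepted tokens, two drafters that happen to get the same number of tokens accepted in a round can still receive different rewards.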
Key Experimental Results
Without knowing the task type, MetaSD-UCB approaches the performance of the best expert drafter and significantly outperforms random selection and static ensembles. The framework naturally handles inter-query non-stationarity and requires no additional training.
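The convergence behavior described above can be illustrated with a toy simulation. The sketch below uses the textbook UCB1 rule, not the paper's MetaSD-UCB: it treats each speculative decoding round as a single pull that returns one noisy reward in [0, 1], and it ignores the stopping-time structure in which rounds produce a variable number of tokens. The alignment values, noise level, and function names are all hypothetical.

```python
import math
import random


def ucb_select(counts, means, t):
    """Pick an arm with UCB1: empirical mean plus an exploration bonus."""
    for arm, n in enumerate(counts):
        if n == 0:  # try every drafter once before trusting the means
            return arm
    scores = [
        means[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def simulate(true_alignment, rounds=2000, noise=0.1, seed=0):
    """Run the bandit and return how often each drafter was selected."""
    rng = random.Random(seed)
    k = len(true_alignment)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, rounds + 1):
        arm = ucb_select(counts, means, t)
        # Noisy alignment feedback for the chosen drafter, clipped to [0, 1].
        reward = min(1.0, max(0.0, rng.gauss(true_alignment[arm], noise)))
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
    return counts


# Three hypothetical drafters; the second one is best aligned for this task.
selection_counts = simulate([0.3, 0.8, 0.5])
print(selection_counts)
```

In this toy setting the selection counts concentrate on the best-aligned drafter while the weaker ones are still sampled occasionally, which is the exploration cost that a regret bound quantifies.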
Highlights & Insights
- Combining speculative decoding with a multi-armed bandit is natural: alignment feedback inherently provides the reward signal.
- The theoretical comparison of feedback signal strength between BD and block efficiency (BE) is deep and generalizable.
Rating
- Novelty: ⭐⭐⭐⭐
- Experimental Thoroughness: ⭐⭐⭐⭐
- Writing Quality: ⭐⭐⭐⭐⭐
- Value: ⭐⭐⭐⭐