QKD: Quantum-Gated Task-interaction Knowledge Distillation for Class-Incremental Learning¶
Conference: CVPR 2026 arXiv: 2604.11112 Code: https://github.com/Frank-lilinjie/CVPR26-QKD Area: Computer Vision Keywords: Class-Incremental Learning, Quantum Computing, Knowledge Distillation, Pre-trained Models, Adapters
TL;DR¶
QKD introduces quantum gating into class-incremental learning (CIL): a parameterized quantum circuit models sample-task correlations in a high-dimensional Hilbert space, and the resulting correlation scores guide cross-task knowledge distillation during training and adaptive adapter fusion during inference, achieving state-of-the-art results on five benchmarks.
Background & Motivation¶
Background: Pre-trained model (PTM)-based CIL freezes the backbone and learns lightweight adapters per task. Prompt-based methods retrieve prompts via similarity search; adapter-based methods assign independent adapters to each task.
Limitations of Prior Work: Prompt-based methods produce noisy matches when task subspaces overlap due to local similarity retrieval. Adapter-based methods treat adapters as independent subspaces, ignoring cross-task correlations; heuristic routing/fusion at inference cannot handle entangled subspaces.
Key Challenge: Routing and fusion lack an explicit learned task-interaction mechanism — how to quantify the correlation between a current sample and each historical task, and leverage it for knowledge transfer during training and adapter selection during inference.
Goal: Design a unified learnable mechanism that dynamically quantifies sample-task correlations to jointly serve knowledge distillation during training and adaptive routing during inference.
Core Idea: Map sample features and task embeddings into a quantum Hilbert space, exploiting quantum superposition and interference to naturally encode complex multi-way task dependencies.
Method¶
Overall Architecture¶
Frozen ViT backbone + per-task lightweight adapters → construct task embeddings from each adapter (SVD dimensionality reduction) → quantum gating module computes sample-task correlation scores → during training: correlation-weighted feature distillation from old adapters to new adapters → during inference: the same correlation scores are used for adaptive adapter fusion.
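The task-embedding step of the pipeline can be illustrated concretely. Below is a minimal NumPy sketch, assuming each adapter's weight matrix is reduced by truncated SVD and its top-k scaled left singular vectors are averaged into a single embedding; the exact reduction used by the paper may differ, and `task_embedding` is a hypothetical helper name.

```python
import numpy as np

def task_embedding(adapter_weight: np.ndarray, k: int = 1) -> np.ndarray:
    """Hypothetical task embedding via truncated SVD: keep the top-k
    singular triplets of an adapter's weight matrix and average the
    singular-value-scaled left singular vectors into one vector."""
    U, S, _ = np.linalg.svd(adapter_weight, full_matrices=False)
    return (U[:, :k] * S[:k]).mean(axis=1)

# Example: a rank-1 adapter is summarized (up to sign) by its column direction.
rng = np.random.default_rng(0)
u = rng.normal(size=8)
W = np.outer(u, rng.normal(size=4))   # rank-1 weight matrix, shape (8, 4)
emb = task_embedding(W, k=1)          # shape (8,), parallel to u up to sign
```

Because SVD exposes the principal directions of the adapter's learned transformation, the embedding compactly characterizes the task subspace regardless of the adapter's parameter count.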
Key Designs¶
- Quantum-Gated Task Modulation (QGTM):
- Function: Computes geometric mutual information between a sample and each historical task.
- Mechanism: Extracts task embeddings from each adapter via truncated SVD; encodes normalized sample features and task embeddings through a parameterized quantum circuit (\(R_y\) rotations + learnable rotations + CNOT entanglement chains); measures the quantum state and computes correlation scores via a Projected Quantum Kernel (PQK), followed by softmax normalization.
- Design Motivation: The exponential dimensionality and interference effects of quantum Hilbert space compactly encode complex multi-way dependencies among overlapping task subspaces, which classical cosine similarity or MLPs fail to capture.
- Task-Interaction Knowledge Distillation (TIKD):
- Function: Uses quantum correlations to guide cross-task feature transfer.
- Mechanism: Computes feature outputs of the current sample through each old adapter; aggregates these features weighted by quantum-gated correlation scores; applies MSE distillation loss against the new adapter's features.
- Design Motivation: Highly correlated old tasks contribute more knowledge while low-correlation tasks are suppressed, avoiding interference from irrelevant tasks.
- Training-Inference Consistent Routing:
- Function: Reuses the correlation scores learned during training for adapter fusion at inference time.
- Mechanism: At inference, the same quantum gating module computes sample-task correlations across all tasks and fuses classification logits from each adapter via weighted aggregation.
- Design Motivation: Using the same routing mechanism at training and inference removes the train-test mismatch introduced by heuristic inference-time fusion.
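The quantum gating mechanism above can be sketched with a tiny classical state-vector simulation. This is an illustrative NumPy toy, not the paper's implementation: it assumes 4 qubits, π-scaled \(R_y\) angle encoding, one learnable \(R_y\) layer with a CNOT chain, and a projected quantum kernel realized as an RBF kernel over per-qubit Pauli-Z expectations; all function names are hypothetical.

```python
import numpy as np

def ry(theta):
    """Single-qubit R_y rotation matrix."""
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.array([[c, -s], [s, c]])

def apply_single(state, gate, qubit, n):
    """Apply a 2x2 gate to one qubit of an n-qubit state vector."""
    op = np.array([[1.0]])
    for q in range(n):
        op = np.kron(op, gate if q == qubit else np.eye(2))
    return op @ state

def apply_cnot(state, control, target, n):
    """CNOT entangling gate: flip `target` amplitudes where `control` is 1."""
    new = state.copy()
    tmask = 1 << (n - 1 - target)
    for i in range(2 ** n):
        if (i >> (n - 1 - control)) & 1:
            new[i] = state[i ^ tmask]
    return new

def circuit(x, theta, n):
    """Angle-encode x (R_y), apply a learnable R_y layer, then a CNOT chain."""
    state = np.zeros(2 ** n)
    state[0] = 1.0
    for q in range(n):
        state = apply_single(state, ry(np.pi * x[q]), q, n)  # data encoding
    for q in range(n):
        state = apply_single(state, ry(theta[q]), q, n)      # trainable layer
    for q in range(n - 1):
        state = apply_cnot(state, q, q + 1, n)               # entanglement
    return state

def z_projection(state, n):
    """Projected measurement: per-qubit Pauli-Z expectation values."""
    probs = state ** 2  # state is real here (R_y and CNOT are real gates)
    return np.array([
        sum(probs[i] * (1.0 if not (i >> (n - 1 - q)) & 1 else -1.0)
            for i in range(2 ** n))
        for q in range(n)
    ])

def correlation_scores(x_sample, task_embs, theta, n=4, gamma=1.0):
    """PQK score per task (RBF over Z-projections), softmax-normalized."""
    zs = z_projection(circuit(x_sample, theta, n), n)
    k = np.array([
        np.exp(-gamma * np.sum((zs - z_projection(circuit(t, theta, n), n)) ** 2))
        for t in task_embs
    ])
    e = np.exp(k - k.max())
    return e / e.sum()

# Example: a task embedding identical to the sample gets the highest score.
rng = np.random.default_rng(1)
theta = rng.normal(size=4)
x = rng.uniform(size=4)
tasks = [x.copy(), rng.uniform(size=4), rng.uniform(size=4)]
w = correlation_scores(x, tasks, theta)  # sums to 1; w[0] is the largest
```

In practice `theta` would be trained jointly with the adapters; the toy only shows how superposition plus entanglement yields a kernel that mixes all feature dimensions nonlinearly, unlike plain cosine similarity.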
Loss & Training¶
Cross-entropy classification loss + correlation-weighted feature distillation loss, optimized jointly over the quantum circuit parameters and the adapter parameters.
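The training objective and the inference-time fusion can be sketched together, since both consume the same correlation scores. A minimal NumPy sketch, assuming the distillation term is the MSE between the new adapter's features and the score-weighted aggregate of old-adapter features, and assuming a balancing weight `lam` (not specified in the source); all names are hypothetical.

```python
import numpy as np

def tikd_loss(new_feat, old_feats, weights):
    """TIKD: aggregate old-adapter features with the quantum-gated
    scores, then penalize the MSE to the new adapter's features."""
    target = weights @ old_feats           # (T,) @ (T, d) -> (d,)
    return float(np.mean((new_feat - target) ** 2))

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    z = logits - logits.max()
    return float(np.log(np.exp(z).sum()) - z[label])

def total_loss(logits, label, new_feat, old_feats, weights, lam=1.0):
    """Joint objective: classification + lam * distillation (lam assumed)."""
    return cross_entropy(logits, label) + lam * tikd_loss(new_feat, old_feats, weights)

def fuse_logits(logits_per_task, weights):
    """Inference-time routing: the same scores fuse per-adapter logits."""
    return weights @ logits_per_task       # (T,) @ (T, C) -> (C,)

# Example: if the new features already match the weighted aggregate,
# the distillation term vanishes and only cross-entropy remains.
rng = np.random.default_rng(2)
w = np.array([0.7, 0.2, 0.1])              # correlation scores over 3 old tasks
old = rng.normal(size=(3, 8))              # old-adapter features, d = 8
new = w @ old                              # perfectly aligned new features
loss = total_loss(np.array([2.0, 0.1]), 0, new, old, w)
fused = fuse_logits(rng.normal(size=(3, 5)), w)  # fused logits over 5 classes
```

Reusing `weights` in both `tikd_loss` and `fuse_logits` is exactly the training-inference consistency argued for above: the router optimized during distillation is the router deployed at test time.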
Key Experimental Results¶
Main Results¶
| Dataset | QKD Accuracy | Prev. SOTA | Gain |
|---|---|---|---|
| CIFAR-100 | SOTA | EASE | +gain |
| CUB-200 | SOTA | MoE-Adapters | +gain |
| ImageNet-R | SOTA | — | — |
Ablation Study¶
| Configuration | Accuracy | Note |
|---|---|---|
| Quantum gating | Best | Full model |
| Replace with cosine similarity | Drop | Insufficient expressiveness |
| Replace with MLP | Drop | Poor capture of complex dependencies |
| w/o TIKD | Drop | Missing cross-task knowledge transfer |
Key Findings¶
- Quantum gating consistently outperforms cosine similarity and MLP alternatives, validating the superior geometric expressiveness of quantum Hilbert space.
- TIKD yields greater gains as the number of tasks increases, indicating that selective knowledge transfer becomes increasingly important as subspace overlap grows.
- Training-inference routing consistency is critical; inconsistency leads to notable performance degradation.
Highlights & Insights¶
- Practical application of quantum computing: The use of quantum computation is not gratuitous — the geometric properties of quantum Hilbert space are genuinely suited to modeling multi-way task dependencies.
- Training-inference consistency: The same set of correlation scores serves both distillation and routing, yielding an elegant unified design.
Limitations & Future Work¶
- The quantum circuit is currently simulated on classical hardware; efficiency on actual quantum devices remains unknown.
- The SVD computation for task embeddings grows with the number of tasks.
- Future work may explore deeper quantum circuits or integration with real quantum hardware.
Related Work & Insights¶
- vs. EASE: EASE performs cross-task alignment via class-prototype similarity, which offers limited expressiveness.
- vs. MoE-Adapters: MoE fuses adapters via majority voting, lacking sample-level adaptivity.
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ First introduction of quantum computing into CIL, with well-motivated theoretical justification.
- Experimental Thoroughness: ⭐⭐⭐⭐ Five datasets; ablations demonstrate quantum gating's superiority over classical alternatives.
- Writing Quality: ⭐⭐⭐⭐ Quantum background is clearly introduced.
- Value: ⭐⭐⭐⭐ Provides a novel tool for CIL.