QKD: Quantum-Gated Task-interaction Knowledge Distillation for Class-Incremental Learning¶
Conference: CVPR 2026 arXiv: 2604.11112 Code: https://github.com/Frank-lilinjie/CVPR26-QKD Area: Computer Vision Keywords: Class-Incremental Learning, Quantum Computing, Knowledge Distillation, Pre-trained Models, Adapters
TL;DR¶
QKD introduces quantum gating into class-incremental learning (CIL): a parameterized quantum circuit models sample-task correlations in a high-dimensional Hilbert space, and the resulting correlation scores guide cross-task knowledge distillation during training and adaptive adapter fusion during inference, achieving state-of-the-art results on five benchmarks.
Background & Motivation¶
Background: Pre-trained model (PTM)-based CIL freezes the backbone and learns lightweight adapters per task. Prompt-based methods retrieve prompts via similarity search; adapter-based methods assign independent adapters to each task.
Limitations of Prior Work: Prompt-based methods produce noisy matches when task subspaces overlap due to local similarity retrieval. Adapter-based methods treat adapters as independent subspaces, ignoring cross-task correlations; heuristic routing/fusion at inference cannot handle entangled subspaces.
Key Challenge: Routing and fusion lack an explicit learned task-interaction mechanism — how to quantify the correlation between a current sample and each historical task, and leverage it for knowledge transfer during training and adapter selection during inference.
Goal: Design a unified learnable mechanism that dynamically quantifies sample-task correlations to jointly serve knowledge distillation during training and adaptive routing during inference.
Core Idea: Map sample features and task embeddings into a quantum Hilbert space, exploiting quantum superposition and interference to naturally encode complex multi-way task dependencies.
Method¶
Overall Architecture¶
Frozen ViT backbone + per-task lightweight adapters → construct task embeddings from each adapter (SVD dimensionality reduction) → quantum gating module computes sample-task correlation scores → during training: correlation-weighted feature distillation from old adapters to new adapters → during inference: the same correlation scores are used for adaptive adapter fusion.
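The task-embedding step of the pipeline can be illustrated concretely. Below is a minimal NumPy sketch, assuming each adapter's weight matrix is reduced by truncated SVD and its top-k scaled left singular vectors are averaged into a single embedding; the exact reduction used by the paper may differ, and `task_embedding` is a hypothetical helper name.

```python
import numpy as np

def task_embedding(adapter_weight: np.ndarray, k: int = 1) -> np.ndarray:
    """Hypothetical task embedding via truncated SVD: keep the top-k
    singular triplets of an adapter's weight matrix and average the
    singular-value-scaled left singular vectors into one vector."""
    U, S, _ = np.linalg.svd(adapter_weight, full_matrices=False)
    return (U[:, :k] * S[:k]).mean(axis=1)

# Example: a rank-1 adapter is summarized (up to sign) by its column direction.
rng = np.random.default_rng(0)
u = rng.normal(size=8)
W = np.outer(u, rng.normal(size=4))   # rank-1 weight matrix, shape (8, 4)
emb = task_embedding(W, k=1)          # shape (8,), parallel to u up to sign
```

Because SVD exposes the principal directions of the adapter's learned transformation, the embedding compactly characterizes the task subspace regardless of the adapter's parameter count.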
Key Designs¶
- Quantum-Gated Task Modulation (QGTM):
- Function: Computes geometric mutual information between a sample and each historical task.
- Mechanism: Extracts task embeddings from each adapter via truncated SVD; encodes normalized sample features and task embeddings through a parameterized quantum circuit (\(R_y\) rotations + learnable rotations + CNOT entanglement chains); measures the quantum state and computes correlation scores via a Projected Quantum Kernel (PQK), followed by softmax normalization.
- Design Motivation: The exponential dimensionality and interference effects of quantum Hilbert space compactly encode complex multi-way dependencies among overlapping task subspaces, which classical cosine similarity or MLPs fail to capture.
- Task-Interaction Knowledge Distillation (TIKD):
- Function: Uses quantum correlations to guide cross-task feature transfer.
- Mechanism: Computes feature outputs of the current sample through each old adapter; aggregates these features weighted by quantum-gated correlation scores; applies MSE distillation loss against the new adapter's features.
- Design Motivation: Highly correlated old tasks contribute more knowledge while low-correlation tasks are suppressed, avoiding interference from irrelevant tasks.
- Training-Inference Consistent Routing:
- Function: Reuses the correlation scores learned during training for adapter fusion at inference time.
- Mechanism: At inference, the same quantum gating module computes sample-task correlations across all tasks and fuses classification logits from each adapter via weighted aggregation.
- Design Motivation: Using the same routing mechanism at training and inference removes the train-test mismatch introduced by heuristic inference-time fusion.
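The quantum gating mechanism above can be sketched with a tiny classical state-vector simulation. This is an illustrative NumPy toy, not the paper's implementation: it assumes 4 qubits, π-scaled \(R_y\) angle encoding, one learnable \(R_y\) layer with a CNOT chain, and a projected quantum kernel realized as an RBF kernel over per-qubit Pauli-Z expectations; all function names are hypothetical.

```python
import numpy as np

def ry(theta):
    """Single-qubit R_y rotation matrix."""
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.array([[c, -s], [s, c]])

def apply_single(state, gate, qubit, n):
    """Apply a 2x2 gate to one qubit of an n-qubit state vector."""
    op = np.array([[1.0]])
    for q in range(n):
        op = np.kron(op, gate if q == qubit else np.eye(2))
    return op @ state

def apply_cnot(state, control, target, n):
    """CNOT entangling gate: flip `target` amplitudes where `control` is 1."""
    new = state.copy()
    tmask = 1 << (n - 1 - target)
    for i in range(2 ** n):
        if (i >> (n - 1 - control)) & 1:
            new[i] = state[i ^ tmask]
    return new

def circuit(x, theta, n):
    """Angle-encode x (R_y), apply a learnable R_y layer, then a CNOT chain."""
    state = np.zeros(2 ** n)
    state[0] = 1.0
    for q in range(n):
        state = apply_single(state, ry(np.pi * x[q]), q, n)  # data encoding
    for q in range(n):
        state = apply_single(state, ry(theta[q]), q, n)      # trainable layer
    for q in range(n - 1):
        state = apply_cnot(state, q, q + 1, n)               # entanglement
    return state

def z_projection(state, n):
    """Projected measurement: per-qubit Pauli-Z expectation values."""
    probs = state ** 2  # state is real here (R_y and CNOT are real gates)
    return np.array([
        sum(probs[i] * (1.0 if not (i >> (n - 1 - q)) & 1 else -1.0)
            for i in range(2 ** n))
        for q in range(n)
    ])

def correlation_scores(x_sample, task_embs, theta, n=4, gamma=1.0):
    """PQK score per task (RBF over Z-projections), softmax-normalized."""
    zs = z_projection(circuit(x_sample, theta, n), n)
    k = np.array([
        np.exp(-gamma * np.sum((zs - z_projection(circuit(t, theta, n), n)) ** 2))
        for t in task_embs
    ])
    e = np.exp(k - k.max())
    return e / e.sum()

# Example: a task embedding identical to the sample gets the highest score.
rng = np.random.default_rng(1)
theta = rng.normal(size=4)
x = rng.uniform(size=4)
tasks = [x.copy(), rng.uniform(size=4), rng.uniform(size=4)]
w = correlation_scores(x, tasks, theta)  # sums to 1; w[0] is the largest
```

In practice `theta` would be trained jointly with the adapters; the toy only shows how superposition plus entanglement yields a kernel that mixes all feature dimensions nonlinearly, unlike plain cosine similarity.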
Loss & Training¶
Cross-entropy classification loss + correlation-weighted feature distillation loss, optimized jointly over the quantum circuit parameters and the adapter parameters.
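The training objective and the inference-time fusion can be sketched together, since both consume the same correlation scores. A minimal NumPy sketch, assuming the distillation term is the MSE between the new adapter's features and the score-weighted aggregate of old-adapter features, and assuming a balancing weight `lam` (not specified in the source); all names are hypothetical.

```python
import numpy as np

def tikd_loss(new_feat, old_feats, weights):
    """TIKD: aggregate old-adapter features with the quantum-gated
    scores, then penalize the MSE to the new adapter's features."""
    target = weights @ old_feats           # (T,) @ (T, d) -> (d,)
    return float(np.mean((new_feat - target) ** 2))

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    z = logits - logits.max()
    return float(np.log(np.exp(z).sum()) - z[label])

def total_loss(logits, label, new_feat, old_feats, weights, lam=1.0):
    """Joint objective: classification + lam * distillation (lam assumed)."""
    return cross_entropy(logits, label) + lam * tikd_loss(new_feat, old_feats, weights)

def fuse_logits(logits_per_task, weights):
    """Inference-time routing: the same scores fuse per-adapter logits."""
    return weights @ logits_per_task       # (T,) @ (T, C) -> (C,)

# Example: if the new features already match the weighted aggregate,
# the distillation term vanishes and only cross-entropy remains.
rng = np.random.default_rng(2)
w = np.array([0.7, 0.2, 0.1])              # correlation scores over 3 old tasks
old = rng.normal(size=(3, 8))              # old-adapter features, d = 8
new = w @ old                              # perfectly aligned new features
loss = total_loss(np.array([2.0, 0.1]), 0, new, old, w)
fused = fuse_logits(rng.normal(size=(3, 5)), w)  # fused logits over 5 classes
```

Reusing `weights` in both `tikd_loss` and `fuse_logits` is exactly the training-inference consistency argued for above: the router optimized during distillation is the router deployed at test time.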
Key Experimental Results¶
Main Results¶
| Dataset | QKD Accuracy | Prev. SOTA | Gain |
|---|---|---|---|
| CIFAR-100 | SOTA | EASE | +gain |
| CUB-200 | SOTA | MoE-Adapters | +gain |
| ImageNet-R | SOTA | — | — |
Ablation Study¶
| Configuration | Accuracy | Note |
|---|---|---|
| Quantum gating | Best | Full model |
| Replace with cosine similarity | Drop | Insufficient expressiveness |
| Replace with MLP | Drop | Poor capture of complex dependencies |
| w/o TIKD | Drop | Missing cross-task knowledge transfer |
Key Findings¶
- Quantum gating consistently outperforms cosine similarity and MLP alternatives, validating the superior geometric expressiveness of quantum Hilbert space.
- TIKD yields greater gains as the number of tasks increases, indicating that selective knowledge transfer becomes increasingly important as subspace overlap grows.
- Training-inference routing consistency is critical; inconsistency leads to notable performance degradation.
Highlights & Insights¶
- Practical application of quantum computing: The use of quantum computation is not gratuitous — the geometric properties of quantum Hilbert space are genuinely suited to modeling multi-way task dependencies.
- Training-inference consistency: The same set of correlation scores serves both distillation and routing, yielding an elegant unified design.
Limitations & Future Work¶
- The quantum circuit is currently simulated on classical hardware; efficiency on actual quantum devices remains unknown.
- The SVD computation for task embeddings grows with the number of tasks.
- Future work may explore deeper quantum circuits or integration with real quantum hardware.
Related Work & Insights¶
- vs. EASE: EASE performs cross-task alignment via class-prototype similarity, which offers limited expressiveness.
- vs. MoE-Adapters: MoE fuses adapters via majority voting, lacking sample-level adaptivity.
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ First introduction of quantum computing into CIL, with well-motivated theoretical justification.
- Experimental Thoroughness: ⭐⭐⭐⭐ Five datasets; ablations demonstrate quantum gating's superiority over classical alternatives.
- Writing Quality: ⭐⭐⭐⭐ Quantum background is clearly introduced.
- Value: ⭐⭐⭐⭐ Provides a novel tool for CIL.