
QKD: Quantum-Gated Task-interaction Knowledge Distillation for Class-Incremental Learning

Conference: CVPR 2026 arXiv: 2604.11112 Code: https://github.com/Frank-lilinjie/CVPR26-QKD Area: Physics Keywords: Class-Incremental Learning, Quantum Computing, Knowledge Distillation, Pre-trained Models, Adapters

TL;DR

QKD introduces quantum gating into class-incremental learning (CIL): parameterized quantum circuits model sample-task correlations in a high-dimensional Hilbert space, and these correlations guide cross-task knowledge distillation during training and adaptive adapter fusion during inference. The method achieves state-of-the-art performance on five benchmarks.

Background & Motivation

Background: Pre-trained model (PTM)-based CIL freezes the backbone and learns lightweight adapters per task. Prompt-based methods retrieve prompts via similarity search; adapter-based methods assign independent adapters to each task.

Limitations of Prior Work: Because they rely on local similarity retrieval, prompt-based methods produce noisy matches when task subspaces overlap. Adapter-based methods treat adapters as independent subspaces, ignoring cross-task correlations, and their heuristic routing/fusion at inference cannot handle entangled subspaces.

Key Challenge: Routing and fusion lack an explicit, learned task-interaction mechanism: how can we quantify the correlation between a current sample and each historical task, and leverage it for knowledge transfer during training and adapter selection during inference?

Goal: Design a unified learnable mechanism that dynamically quantifies sample-task correlations to jointly serve knowledge distillation during training and adaptive routing during inference.

Core Idea: Map sample features and task embeddings into a quantum Hilbert space, exploiting quantum superposition and interference to naturally encode complex multi-way task dependencies.

Method

Overall Architecture

Frozen ViT backbone + per-task lightweight adapters → construct task embeddings from each adapter (SVD dimensionality reduction) → quantum gating module computes sample-task correlation scores → during training: correlation-weighted feature distillation from old adapters to new adapters → during inference: the same correlation scores are used for adaptive adapter fusion.
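The task-embedding step of this pipeline can be sketched as follows. This is a toy illustration under stated assumptions: each adapter is treated as a single weight matrix, and the truncated SVD is mean-pooled over the top-k right singular vectors (the paper's exact reduction may differ; `task_embedding` is an illustrative name, not from the released code).

```python
import numpy as np

def task_embedding(adapter_weight: np.ndarray, k: int = 4) -> np.ndarray:
    """Toy task embedding via truncated SVD.

    adapter_weight: (d_out, d_in) weight matrix of one task's adapter.
    Returns a (d_in,) vector summarizing the adapter's dominant subspace.
    Mean-pooling the top-k right singular vectors is an illustrative
    choice, not necessarily the paper's exact reduction.
    """
    _, _, vt = np.linalg.svd(adapter_weight, full_matrices=False)
    return vt[:k].mean(axis=0)

# one embedding per historical task (toy random adapters)
adapters = [np.random.default_rng(i).normal(size=(16, 16)) for i in range(3)]
task_embs = np.stack([task_embedding(w) for w in adapters])  # (T, d_in)
```

These per-task embeddings are what the quantum gating module compares against the current sample's features.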

Key Designs

  1. Quantum-Gated Task Modulation (QGTM):

    • Function: Computes geometric mutual information between a sample and each historical task.
    • Mechanism: Extracts task embeddings from each adapter via truncated SVD; encodes normalized sample features and task embeddings through a parameterized quantum circuit (\(R_y\) rotations + learnable rotations + CNOT entanglement chains); measures the quantum state and computes correlation scores via a Projected Quantum Kernel (PQK), followed by softmax normalization.
    • Design Motivation: The exponential dimensionality and interference effects of quantum Hilbert space compactly encode complex multi-way dependencies among overlapping task subspaces, which classical cosine similarity or MLPs fail to capture.
  2. Task-Interaction Knowledge Distillation (TIKD):

    • Function: Uses quantum correlations to guide cross-task feature transfer.
    • Mechanism: Computes feature outputs of the current sample through each old adapter; aggregates these features weighted by quantum-gated correlation scores; applies MSE distillation loss against the new adapter's features.
    • Design Motivation: Highly correlated old tasks contribute more knowledge while low-correlation tasks are suppressed, avoiding interference from irrelevant tasks.
  3. Training-Inference Consistent Routing:

    • Function: Reuses the correlation scores learned during training for adapter fusion at inference time.
    • Mechanism: At inference, the same quantum gating module computes sample-task correlations across all tasks and fuses classification logits from each adapter via weighted aggregation.
    • Design Motivation: Using the same routing mechanism at both training and inference eliminates inconsistency.
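The gating computation in design (1) can be sketched with a small classically simulated circuit. This is a minimal NumPy sketch under several assumptions: the circuit is simulated as a full statevector, the Projected Quantum Kernel is simplified to a Gaussian kernel on per-qubit ⟨Z⟩ expectation values, and the function names (`qgtm_scores`, `z_expectations`) are illustrative rather than taken from the paper's code.

```python
import numpy as np

def ry(theta):
    """Single-qubit R_y rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_1q(state, gate, q, n):
    """Apply a 1-qubit gate to qubit q of an n-qubit statevector."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [q]))
    return np.moveaxis(psi, 0, q).reshape(-1)

def apply_cnot(state, ctrl, tgt, n):
    """CNOT: swap the target's amplitudes on slices where control = 1."""
    psi = state.reshape([2] * n).copy()
    i0 = [slice(None)] * n; i0[ctrl], i0[tgt] = 1, 0
    i1 = [slice(None)] * n; i1[ctrl], i1[tgt] = 1, 1
    tmp = psi[tuple(i0)].copy()
    psi[tuple(i0)] = psi[tuple(i1)]
    psi[tuple(i1)] = tmp
    return psi.reshape(-1)

def z_expectations(angles, theta, n):
    """R_y angle encoding + learnable R_y layer + CNOT entanglement chain,
    then per-qubit <Z> expectations (the 'projected' measurement)."""
    state = np.zeros(2 ** n); state[0] = 1.0
    for q in range(n):
        state = apply_1q(state, ry(angles[q]), q, n)  # data encoding
        state = apply_1q(state, ry(theta[q]), q, n)   # trainable rotation
    for q in range(n - 1):
        state = apply_cnot(state, q, q + 1, n)        # entanglement chain
    p = (np.abs(state) ** 2).reshape([2] * n)
    return np.array([2 * p.take(0, axis=q).sum() - 1 for q in range(n)])

def qgtm_scores(sample, task_embs, theta, gamma=1.0):
    """Softmax-normalized sample-task correlation scores via a simplified
    projected quantum kernel (Gaussian kernel on <Z> vectors)."""
    n = len(sample)
    zx = z_expectations(sample, theta, n)
    k = np.array([np.exp(-gamma * np.sum((zx - z_expectations(t, theta, n)) ** 2))
                  for t in task_embs])
    e = np.exp(k - k.max())
    return e / e.sum()
```

In this sketch `theta` plays the role of the learnable circuit parameters, and the resulting scores are exactly what TIKD (training) and the routing step (inference) would consume.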

Loss & Training

Cross-entropy classification loss + correlation-weighted feature distillation loss. Quantum circuit parameters and adapter parameters are trained jointly.
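The combined objective can be sketched as below. This is a hedged toy version: features are plain NumPy vectors, `lam` is a hypothetical trade-off weight, and the paper may aggregate the distillation term differently.

```python
import numpy as np

def tikd_loss(new_feat, old_feats, scores, logits, label, lam=1.0):
    """Total loss = cross-entropy + correlation-weighted feature distillation.

    new_feat : (d,)   features of the sample through the current adapter
    old_feats: (T, d) features of the same sample through each old adapter
    scores   : (T,)   quantum-gated correlation scores (softmax-normalized)
    logits   : (C,)   classifier outputs; label: ground-truth class index
    lam      : trade-off weight (hypothetical name, not from the paper)
    """
    # numerically stable log-softmax for the cross-entropy term
    m = logits.max()
    log_probs = logits - m - np.log(np.sum(np.exp(logits - m)))
    ce = -log_probs[label]
    # correlation-weighted aggregation of old-adapter features,
    # then MSE distillation against the new adapter's features
    target = scores @ old_feats
    distill = np.mean((new_feat - target) ** 2)
    return ce + lam * distill
```

Because the correlation scores weight the aggregation, highly correlated old tasks dominate the distillation target while unrelated tasks contribute little, matching the TIKD motivation above.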

Key Experimental Results

Main Results

| Dataset | QKD Accuracy | Prev. SOTA | Gain |
| --- | --- | --- | --- |
| CIFAR-100 | SOTA | EASE | +gain |
| CUB-200 | SOTA | MoE-Adapters | +gain |
| ImageNet-R | SOTA | | |

Ablation Study

| Configuration | Accuracy | Note |
| --- | --- | --- |
| Quantum gating | Best | Full model |
| Replace with cosine similarity | Drop | Insufficient expressiveness |
| Replace with MLP | Drop | Poor capture of complex dependencies |
| w/o TIKD | Drop | Missing cross-task knowledge transfer |

Key Findings

  • Quantum gating consistently outperforms cosine similarity and MLP alternatives, validating the superior geometric expressiveness of quantum Hilbert space.
  • TIKD yields greater gains as the number of tasks increases, indicating that selective knowledge transfer becomes increasingly important as subspace overlap grows.
  • Training-inference routing consistency is critical; inconsistency leads to notable performance degradation.

Highlights & Insights

  • Practical application of quantum computing: The use of quantum computation is not gratuitous — the geometric properties of quantum Hilbert space are genuinely suited to modeling multi-way task dependencies.
  • Training-inference consistency: The same set of correlation scores serves both distillation and routing, yielding an elegant unified design.

Limitations & Future Work

  • The quantum circuit is currently simulated on classical hardware; efficiency on actual quantum devices remains unknown.
  • The SVD computation for task embeddings grows with the number of tasks.
  • Future work may explore deeper quantum circuits or integration with real quantum hardware.

Comparison with Related Methods

  • vs. EASE: EASE performs cross-task alignment via class-prototype similarity, which offers limited expressiveness.
  • vs. MoE-Adapters: MoE-Adapters fuses adapters via majority voting, lacking sample-level adaptivity.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ First introduction of quantum computing into CIL, with well-motivated theoretical justification.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Five datasets; ablations demonstrate quantum gating's superiority over classical alternatives.
  • Writing Quality: ⭐⭐⭐⭐ Quantum background is clearly introduced.
  • Value: ⭐⭐⭐⭐ Provides a novel tool for CIL.