
Adversarially Robust Distillation by Reducing the Student-Teacher Variance Gap

Conference: ECCV 2024
Code: None
Area: Model Compression
Keywords: Adversarial Robustness, Knowledge Distillation, Feature Variance, Covariance Alignment, Model Compression

TL;DR

This paper proposes an adversarially robust knowledge distillation method based on aligning feature distribution statistics. The student's feature statistics are aligned with the teacher's, which reduces the feature variance gap between adversarial and clean examples and thereby enhances the student's adversarial robustness. The authors find that robust accuracy exhibits a strong negative linear correlation with this variance gap.

Background & Motivation

Background: Adversarial robustness is a prerequisite for deploying deep learning models in safety-critical scenarios. Adversarial Training (AT) is currently the most effective defense method, but it heavily relies on large-scale model architectures and substantial computational resources. To achieve resource-efficient deployment, Adversarially Robust Knowledge Distillation (ARD) has emerged, which transfers the robustness of a large teacher model to a lightweight student model.

Limitations of Prior Work: Existing adversarially robust knowledge distillation methods primarily focus on sample-to-sample alignment—namely, aligning the prediction outputs or intermediate features of the teacher and student for each input sample individually. Although effective, this approach ignores a critical dimension: the alignment of statistical properties of the feature distributions between the teacher and student. Specifically, they do not account for the structural distribution differences of features from both models at the entire dataset level.

Key Challenge: Between the adversarially robust teacher and the student model to be distilled, there exist not only sample-level prediction discrepancies but also structural differences at the distribution level. In particular, adversarial and clean examples induce varying degrees of variance changes in the feature space; the teacher (being more robust) exhibits a smaller variance gap, whereas the student shows a much larger one. Existing methods only align "each individual point" rather than the "distribution shape of those points."

Goal: (1) How can adversarial robustness transfer in knowledge distillation be enhanced from the statistical perspective of feature distributions? (2) What is the quantitative relationship between the feature variance gap and adversarial robustness? (3) How can high accuracy on natural images be simultaneously maintained?

Key Insight: Through empirical study, the authors find that, for adversarially trained models (both student and teacher), robust accuracy at various attack radii exhibits a strong negative linear correlation with the feature variance gap (the feature variance of adversarial examples minus that of clean examples). This suggests that reducing the variance gap can systematically improve robustness.

Core Idea: Implicitly improve the student's adversarial robustness by aligning the student's feature covariance with the teacher's, which reduces the variance gap between adversarial and clean examples.

Method

Overall Architecture

The method is built upon the standard knowledge distillation framework but introduces alignment constraints at the feature distribution level. Given an adversarially trained teacher model \(T\) and a student model \(S\) to be trained, forward propagation is performed using both clean examples and their corresponding adversarial examples for each input batch. In addition to standard output-layer logits alignment, two new alignment objectives are introduced at the intermediate feature layers of the backbone: (1) Feature covariance matrix alignment, which shapes the student's feature distribution to mimic the teacher's; and (2) Gram matrix alignment, which captures second-order statistics of features from another perspective.
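
The adversarial examples paired with each clean batch are generated with PGD (see Loss & Training). As a minimal sketch of what that step computes, the snippet below runs an L-infinity PGD attack against a toy linear softmax classifier, chosen only so that the input gradient has a closed form in NumPy. The function names, the linear model, and the values `eps=8/255`, `step=2/255`, and `n_steps=10` are assumptions of this sketch; the summary does not state the attack settings used for training.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def input_grad_ce(x, y_onehot, W, b):
    """Gradient of the cross-entropy loss w.r.t. x for logits = x @ W + b."""
    p = softmax(x @ W + b)
    return (p - y_onehot) @ W.T


def pgd_linf(x, y_onehot, W, b, eps=8 / 255, step=2 / 255, n_steps=10, rng=None):
    """L-infinity PGD: random start, signed-gradient ascent, projection.

    Hypothetical helper for illustration; inputs are assumed to lie in [0, 1].
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(n_steps):
        g = input_grad_ce(x_adv, y_onehot, W, b)
        x_adv = x_adv + step * np.sign(g)          # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # keep a valid input range
    return x_adv
```

In the actual method the gradient would come from backpropagation through the student network; the projection and clipping steps are the same.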

Key Designs

  1. Variance Gap Analysis:

    • Function: Revealing the quantitative relationship between adversarial robustness and the feature variance gap to provide empirical motivation for the method design.
    • Mechanism: For an adversarially trained model, extract its feature set \(F_{clean}\) on test set clean examples and \(F_{adv}\) on adversarial examples, and compute the feature variances \(\text{Var}(F_{clean})\) and \(\text{Var}(F_{adv})\), respectively. The variance gap is defined as \(\Delta\text{Var} = \text{Var}(F_{adv}) - \text{Var}(F_{clean})\). Experiments demonstrate a strong negative linear trend between robust accuracy evaluated at different attack radii \(\epsilon\) and \(\Delta\text{Var}\), i.e., the smaller the variance gap, the higher the robust accuracy.
    • Design Motivation: This discovery provides a direct optimization target for the distillation method. Instead of directly optimizing robust accuracy (which requires continuous generation of adversarial examples for evaluation during training), robustness can be enhanced indirectly by optimizing the variance gap as a surrogate target.
  2. Feature Covariance Alignment:

    • Function: Aligning the student's feature covariance matrix with the teacher's, thereby narrowing statistical discrepancies between the two at the distribution level.
    • Mechanism: For a feature map output from a certain layer of the backbone, compute the feature covariance matrices \(C_T\) and \(C_S\) of the teacher and student on the current batch, respectively. The covariance alignment loss is defined as the squared Frobenius norm of the difference between the two covariance matrices: \(L_{cov} = \|C_S - C_T\|_F^2\). This loss is applied to both clean and adversarial examples. By aligning covariance matrices, the student not only mimics the teacher's output at the sample level but also inherits the geometric structure of the teacher's feature space at the distribution level.
    • Design Motivation: The teacher's robustness partially stems from the similar distribution shapes of adversarial and clean examples in its feature space (i.e., a small variance gap). By aligning covariance matrices, the student can inherit this property, thereby gaining better robustness without increasing model capacity.
  3. Gram Matrix Alignment:

    • Function: Providing supplementary statistical alignment signals from the perspective of feature channel correlations.
    • Mechanism: The Gram matrix \(G = F^T F\) captures the inner product relationship between different feature channels, reflecting the second-order statistics of the features. Aligning the Gram matrices of the student and teacher constrains the correlation structure between channels to remain consistent. The loss is defined as \(L_{gram} = \|G_S - G_T\|_F^2\). Experiments show that the student-teacher gap in Gram matrices is also negatively correlated with robust accuracy, so reducing it tends to improve robustness.
    • Design Motivation: Covariance matrices and Gram matrices describe second-order statistical properties of features from different perspectives; combining them provides a more comprehensive distribution alignment. The Gram matrix, which has been shown to effectively capture texture and style information in style transfer, captures "robustness style" in this scenario.
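
The three quantities defined in the Key Designs (the variance gap \(\Delta\text{Var}\), \(L_{cov}\), and \(L_{gram}\)) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: it assumes features are already flattened to shape `(n_samples, n_dims)`, that teacher and student features share the same dimensionality (different backbones would need a projection layer, which is not shown), and that "feature variance" is reduced to a scalar by summing per-dimension variances. All function names are hypothetical.

```python
import numpy as np


def feature_variance(feats):
    """Scalar variance of a feature set: sum of per-dimension variances.

    The reduction to a scalar is an assumption of this sketch.
    """
    return float(np.var(feats, axis=0).sum())


def variance_gap(feats_adv, feats_clean):
    """Delta Var = Var(F_adv) - Var(F_clean)."""
    return feature_variance(feats_adv) - feature_variance(feats_clean)


def covariance(feats):
    """Unbiased (n_dims, n_dims) covariance matrix of a batch of features."""
    centered = feats - feats.mean(axis=0, keepdims=True)
    return centered.T @ centered / (feats.shape[0] - 1)


def gram(feats):
    """Gram matrix G = F^T F over feature dimensions (no centering)."""
    return feats.T @ feats


def cov_alignment_loss(f_student, f_teacher):
    """L_cov = ||C_S - C_T||_F^2."""
    diff = covariance(f_student) - covariance(f_teacher)
    return float(np.sum(diff ** 2))


def gram_alignment_loss(f_student, f_teacher):
    """L_gram = ||G_S - G_T||_F^2."""
    diff = gram(f_student) - gram(f_teacher)
    return float(np.sum(diff ** 2))


# Usage on random stand-in features (not real model activations).
rng = np.random.default_rng(0)
f_teacher = rng.normal(size=(128, 16))
f_student = 1.5 * f_teacher  # a student whose features are over-dispersed
print(cov_alignment_loss(f_student, f_teacher) > 0)  # True
```

The covariance is centered and the Gram matrix is not, which is one concrete sense in which the two losses constrain different second-order statistics. In training, both losses would be computed with an autodiff framework so that gradients flow into the student.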

Loss & Training

The overall loss function is: \(L = L_{AT} + \alpha L_{KD} + \beta L_{cov} + \gamma L_{gram}\)

where \(L_{AT}\) is the standard adversarial training loss (cross-entropy on adversarial examples), \(L_{KD}\) is the traditional logits distillation loss, \(L_{cov}\) and \(L_{gram}\) are the covariance and Gram matrix alignment losses, respectively, and \(\alpha\), \(\beta\), and \(\gamma\) are weighting coefficients. Adversarial examples are generated with PGD attacks during training, and distillation and adversarial training proceed simultaneously.
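
As a rough illustration of how the four terms combine, the sketch below evaluates \(L\) for one batch in NumPy. Several choices are assumptions not stated in this summary: \(L_{KD}\) is written as a temperature-scaled KL divergence between teacher and student outputs, the weights default to 1.0, and \(L_{cov}\) and \(L_{gram}\) are passed in as precomputed scalars. The function names are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under the given logits."""
    logp = log_softmax(logits)
    return float(-logp[np.arange(len(labels)), labels].mean())


def kd_kl(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) on softened outputs (assumed form of L_KD)."""
    logp_t = log_softmax(teacher_logits / temperature)
    logp_s = log_softmax(student_logits / temperature)
    kl = (np.exp(logp_t) * (logp_t - logp_s)).sum(axis=1).mean()
    return float(kl * temperature ** 2)


def total_loss(student_adv_logits, teacher_logits, labels, l_cov, l_gram,
               alpha=1.0, beta=1.0, gamma=1.0):
    """L = L_AT + alpha * L_KD + beta * L_cov + gamma * L_gram."""
    l_at = cross_entropy(student_adv_logits, labels)  # CE on adversarial inputs
    l_kd = kd_kl(student_adv_logits, teacher_logits)
    return l_at + alpha * l_kd + beta * l_cov + gamma * l_gram
```

When the student's logits match the teacher's, the KD term vanishes and the loss reduces to the adversarial cross-entropy plus the weighted alignment terms.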

Key Experimental Results

Main Results

| Dataset | Teacher→Student | Metric | Ours | Prev. SOTA (ARD/IAD) | Gain |
|---|---|---|---|---|---|
| CIFAR-10 | WRN-34-10→WRN-16-2 | Robust Acc (PGD-20) | SOTA-level | Various ARD methods | Consistent gain |
| CIFAR-10 | WRN-34-10→ResNet-18 | Robust Acc (PGD-20) | SOTA-level | Various ARD methods | Consistent gain |
| CIFAR-100 | WRN-34-10→WRN-16-2 | Robust Acc (PGD-20) | SOTA-level | Various ARD methods | Consistent gain |
| CIFAR-100 | WRN-34-10→MobileNetV2 | Robust Acc (PGD-20) | SOTA-level | Various ARD methods | Consistent gain |

Ablation Study

| Configuration | Robust Acc | Natural Acc | Description |
|---|---|---|---|
| Standard ARD baseline | Baseline | Baseline | Logits alignment only |
| + \(L_{cov}\) (Covariance Alignment) | Improved | Maintained | Distribution alignment enhances robustness |
| + \(L_{gram}\) (Gram Matrix Alignment) | Improved | Maintained | Supplementary second-order statistics alignment |
| + \(L_{cov}\) + \(L_{gram}\) (Full) | Highest | Maintained/Slightly Improved | Joint alignment achieves optimal synergy |

Key Findings

  • A strong negative linear correlation (correlation coefficient magnitude > 0.9) exists between the feature variance gap and robust accuracy, providing a solid empirical foundation for the proposed method.
  • Covariance alignment consistently improves robust accuracy across multiple teacher-student pairs and datasets without compromising natural accuracy.
  • Gram matrix alignment offers improvements complementary to covariance alignment, yielding the best performance when combined.
  • The method demonstrates consistent robustness improvements under different attack methods (PGD, AutoAttack) and various attack radii.
  • The benefit of feature distribution alignment grows as student capacity shrinks: the smaller the student, the larger the robustness gain from distribution alignment.
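
The first finding can be checked for any collection of models by computing a Pearson correlation and a least-squares line between measured variance gaps and robust accuracies. The sketch below shows the computation only; the numbers in it are made up for illustration and are NOT taken from the paper, and the helper names are hypothetical.

```python
import numpy as np

# Made-up (variance gap, robust accuracy %) pairs, for illustration only.
# These are NOT results from the paper.
variance_gaps = np.array([0.10, 0.20, 0.35, 0.50, 0.80])
robust_acc = np.array([52.0, 50.5, 47.0, 45.5, 40.0])


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)


r = pearson_r(variance_gaps, robust_acc)
slope, intercept = linear_fit(variance_gaps, robust_acc)
print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

A strongly negative `r` together with a negative slope is what the reported trend would look like; a correlation alone does not establish that shrinking the gap causes the robustness gain.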

Highlights & Insights

  • The discovery of the negative correlation between variance gap and robust accuracy is a major insight in itself, offering a new perspective on understanding adversarial robustness: feature distributions of robust models remain more stable under adversarial perturbations.
  • Extending the concept from "sample-level alignment" to "distribution-level alignment" is natural and intuitive: distillation should transfer not only "how to predict each sample" but also the "overall feature space structure."
  • The dual second-order statistical alignment using both covariance and Gram matrices provides comprehensive distribution constraints.
  • The implementation is straightforward with minimal computational overhead (requiring only covariance and Gram matrix calculations), making it easy to integrate into existing distillation pipelines.

Limitations & Future Work

  • Code is not publicly available, limiting reproducibility.
  • Experiments are primarily conducted on small-scale datasets such as CIFAR-10/100; the method's performance and computational overhead at ImageNet scale remain to be validated.
  • Batch-level covariance estimation might be inaccurate when the batch size is small, necessitating analysis of sensitivity to batch sizes.
  • Only second-order statistics (variance, covariance) are considered; whether higher-order distribution features are useful warrants exploration.
  • The linear relationship between the variance gap and robustness is an empirical finding lacking theoretical proof—under what conditions this relationship holds remains unclear.
  • The method's effectiveness has not been explored for adversarially robust distillation in NLP or other modalities.

Related Work & Insights

  • Adversarially robust knowledge distillation (ARD, IAD, RSLAD) is an active research area; this work provides a new dimension of optimization from a statistical perspective.
  • The classic application of Gram matrices in style transfer is innovatively adapted to the adversarial robustness domain.
  • The idea of feature distribution alignment shares commonalities with distribution alignment (e.g., MMD, CORAL) in Domain Adaptation.
  • The concept of the variance gap may inspire new robustness evaluation metrics—allowing preliminary estimation of a model's robustness level solely by analyzing feature distributions without evaluating accuracy on adversarial examples.

Rating

  • Novelty: ⭐⭐⭐⭐ The discovery of the negative correlation between the variance gap and robustness is novel and insightful; the approach of distribution-level alignment is innovative.
  • Experimental Thoroughness: ⭐⭐⭐ Ablations are comprehensive, but the dataset scale is relatively small, lacking large-scale validation.
  • Writing Quality: ⭐⭐⭐⭐ The method is clearly motivated and derived from empirical findings, presenting a logical structure.
  • Value: ⭐⭐⭐ Provides a new perspective for adversarially robust distillation; the proposed method is simple and practical.