CIARD: Cyclic Iterative Adversarial Robustness Distillation

Conference: ICCV 2025 arXiv: 2509.12633 Code: https://github.com/eminentgu/CIARD Institution: NJUST, HKUST(GZ), INSAIT Sofia University, Peking University Area: Model Compression / Adversarial Robustness Distillation / Knowledge Distillation Keywords: adversarial robustness distillation, knowledge distillation, adversarial training, dual-teacher, contrastive push loss, model compression

TL;DR

This paper proposes CIARD, which resolves the optimization-objective conflict between the clean teacher and the robust teacher in dual-teacher ARD frameworks via a Contrastive Push Loss, and introduces Iterative Teacher Training (ITT) to continuously update the robust teacher and prevent its degradation. Across CIFAR-10/100 and Tiny-ImageNet, CIARD simultaneously improves the average adversarial defense rate by up to 3.53% and clean accuracy by up to 5.87%.

Background & Motivation

State of the Field

Background: Deploying efficient yet robust models on edge devices is a practical necessity. Knowledge distillation (KD) enables compression from teacher to student, but conventional KD does not account for robustness. Adversarial training (AT) enhances robustness but is less effective for small models. Adversarial robustness distillation (ARD) combines both approaches.

Two key challenges in dual-teacher ARD:

  1. Optimization objective conflict: The clean teacher focuses on clean accuracy while the robust teacher focuses on adversarial robustness, making it difficult for the student to reconcile both objectives.
  2. Robust teacher performance degradation: As training progresses, the adversarial examples generated by the student become increasingly strong, continuously eroding the robust teacher's performance.

Starting Point

Goal: How to simultaneously improve the student's adversarial robustness and clean accuracy within a dual-teacher ARD framework?

Method

Overall Architecture

CIARD consists of a fixed clean teacher, a continuously updated robust teacher, and a push loss mechanism.

Key Designs

  1. Contrastive Push Loss:

    • Core Insight: Rather than driving the student toward both teachers simultaneously, the student is encouraged to actively diverge from the clean teacher's incorrect predictions.
    • When the clean teacher makes incorrect predictions on adversarial examples, the push loss steers the student away from those erroneous directions.
    • This enables the student to more effectively absorb the specialized knowledge of the robust teacher.
    • It decouples the transmission pathways of clean knowledge and robust knowledge.
  2. Iterative Teacher Training (ITT):

    • Phase 1 (Warm-up): Both teacher networks are frozen, allowing the student to establish foundational knowledge.
    • Phase 2 (Iterative Update): The robust teacher is periodically retrained using adversarial examples generated by the current student.
    • Continuous adversarial retraining ensures the robust teacher can always effectively defend against the strongest current adversarial examples.
    • Analogous to the alternating discriminator–generator updates in GANs.
  3. Training Procedure:

    • Adversarial examples are generated against the student using attacks such as PGD.
    • Distillation loss = KL(student, robust_teacher on adv) + push_loss + CE(student, GT).
    • ITT is triggered periodically to update the robust teacher.
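
As a rough illustration of how the three loss terms above might combine, here is a minimal NumPy sketch. The function names, the error-masking for the push term, and weights such as `push_weight` are our assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_div(p, q, eps=1e-12):
    # KL(p || q), averaged over the batch
    return np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1))

def ciard_style_loss(student_logits, robust_t_logits, clean_t_logits,
                     labels, push_weight=0.5, ce_weight=1.0):
    """Hypothetical combination of the three terms: KL toward the robust
    teacher on adversarial inputs, a contrastive push away from the clean
    teacher where it errs, and cross-entropy to ground-truth labels."""
    s = softmax(student_logits)
    rt = softmax(robust_t_logits)
    ct = softmax(clean_t_logits)

    # 1) align the student with the robust teacher
    distill = kl_div(rt, s)

    # 2) push the student away from clean-teacher predictions that are wrong
    clean_pred = ct.argmax(axis=-1)
    wrong = clean_pred != labels             # mask of erroneous predictions
    if wrong.any():
        push = -kl_div(ct[wrong], s[wrong])  # maximize divergence from errors
    else:
        push = 0.0

    # 3) standard cross-entropy to the ground-truth labels
    ce = -np.mean(np.log(s[np.arange(len(labels)), labels] + 1e-12))

    return distill + push_weight * push + ce_weight * ce
```

In a full training loop, this loss would be computed on adversarial examples generated against the student, with the robust teacher periodically retrained (the ITT step) on those same examples.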

Key Experimental Results

MobileNet-V2 on CIFAR-10

| Framework Type | Method  | Clean (%) | Robust (%) |
| -------------- | ------- | --------- | ---------- |
| Single-Teacher | ARD     | 83.43     | 57.03      |
| Dual-Teacher   | MTARD   | 89.26     | 57.84      |
| Dual-Teacher   | B-MTARD | 89.09 ↓   | 58.79 ↑    |
| Dual-Teacher   | CIARD   | 89.51 ↑   | 59.10 ↑    |
  • CIARD is the only method that simultaneously improves both clean and robust accuracy.

Cross-Dataset Results

  • CIFAR-100: Average adversarial defense rate +3.53%
  • Tiny-ImageNet: Clean accuracy +5.87%
  • Consistent advantages across multiple attacks including PGD-20, AutoAttack, and C&W

Ablation Study

  • Removing push loss results in clean accuracy −1.2% and robust accuracy −0.8%.
  • Removing ITT leads to a notable decline in robust accuracy in the later stages of training.
  • The ITT update frequency requires balancing computational cost against performance gain.

Highlights & Insights

  • Breaking the accuracy–robustness trade-off: Simultaneously improving both metrics is highly significant for practical deployment.
  • Counter-intuitive design of Push Loss: Steering away from the clean teacher's incorrect predictions — a "negative sample" approach that is simple yet effective.
  • Identification and resolution of teacher degradation: This work is the first to explicitly identify the problem and directly address it via ITT.
  • Theoretical analysis + empirical validation: The teacher degradation phenomenon is clearly demonstrated.

Limitations & Future Work

  • ITT requires periodic retraining of the robust teacher, introducing additional computational overhead.
  • Validation is limited to classification tasks.
  • The weight coefficient for push loss requires manual tuning.
  • Comparisons with the latest adversarial training methods are not fully explored.

Comparison & Takeaways

  • vs. ARD/RSLAD: Single-teacher frameworks with limited robustness gains and low clean accuracy.
  • vs. MTARD/B-MTARD: Dual-teacher frameworks that do not address optimization conflicts or teacher degradation.
  • CIARD's advantages: Push loss for decoupling + ITT for dynamic updating.
  • The "steer away from errors" strategy of push loss is transferable to other multi-teacher distillation scenarios.
  • Dynamic teacher degradation may similarly arise in other online distillation and adversarial learning settings.
  • CIARD shares conceptual similarity with curriculum learning: ITT essentially adapts the teacher to the increasing "curriculum" difficulty posed by the student.

Technical Details

  • Clean teacher: A large model trained on clean data only (e.g., WRN-34-10).
  • Robust teacher: A large model of the same architecture trained with PGD-AT, possessing adversarial robustness.
  • Student: A lightweight model (e.g., MobileNet-V2 / ResNet-18).
  • ITT update interval: The robust teacher is typically retrained every 10 epochs.
  • Push loss weight is adaptively adjusted based on robustness metrics on the validation set.
  • Adversarial examples are generated using PGD-10 (\(\varepsilon = 8/255\), step size \(2/255\)).
  • The additional training overhead of CIARD stems primarily from ITT's teacher retraining, increasing total training time by approximately 15–20%.
  • However, compared to performing adversarial training from scratch, ARD+CIARD remains more efficient overall.
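
To make the PGD settings above concrete, here is a minimal NumPy sketch of \(\ell_\infty\) PGD against a toy linear classifier. The linear "model" and its closed-form cross-entropy gradient are stand-ins for illustration; a real implementation would differentiate through the student network with autograd:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def pgd_attack(x, y, W, steps=10, eps=8/255, alpha=2/255):
    """L-infinity PGD with the hyperparameters quoted above.
    For the toy classifier logits = x @ W, the cross-entropy gradient
    w.r.t. x has the closed form (p - onehot(y)) @ W.T."""
    x_adv = x + np.random.uniform(-eps, eps, x.shape)   # random start
    for _ in range(steps):
        p = softmax(x_adv @ W)
        p[np.arange(len(y)), y] -= 1.0        # dCE/dlogits = p - onehot(y)
        grad = p @ W.T                         # chain rule through x @ W
        x_adv = x_adv + alpha * np.sign(grad)  # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project into eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # keep valid pixel range
    return x_adv
```

The two clipping steps implement the projection: perturbations stay within the \(\varepsilon\)-ball around the clean input and within the valid pixel range.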

Rating

  • Novelty: ⭐⭐⭐⭐ Push loss and ITT are effective, though neither is highly novel when considered independently.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Multi-dataset, multi-attack, and ablation study included, but diversity of student architectures is limited.
  • Writing Quality: ⭐⭐⭐⭐ Problem formulation is clear; the framework comparison figure (Figure 1) is intuitive.
  • Value: ⭐⭐⭐⭐ Simultaneously improving accuracy and robustness carries significant practical importance for real-world deployment.