SORA: Free Second-Order Attacks in Fast Adversarial Training¶

Conference: ICML 2026
arXiv: 2606.00738
Code: https://github.com/SecondOrderAT/SORA
Area: AI Security / Adversarial Training
Keywords: Fast Adversarial Training, Catastrophic Overfitting, Second-Order Optimization, Adaptive Step Size, Robustness

TL;DR¶

This paper re-examines Catastrophic Overfitting (CO) in single-step adversarial training from a second-order perspective. It proposes PertAlign, a zero-cost curvature metric that gives early warning of CO, and derives SORA, an adaptive fast adversarial training algorithm that estimates Hessian information "for free" from the gradients backpropagated in the previous step and samples a step size per channel from \([0, \alpha^*]\), where \(\alpha^*\) is the resulting optimal step size. SORA consistently avoids CO across 6 datasets and 4 architectures using a single set of hyperparameters, achieving a new state-of-the-art trade-off between robustness and clean accuracy for single-step AT.

Background & Motivation¶

Background: Adversarial Training (AT) is the most effective defense against adversarial examples, but the inner maximization of multi-step PGD-AT requires multiple backpropagations, making it extremely costly for large datasets and deep networks. Wong et al. (2020) proposed FGSM-RS with random starts, compressing the inner loop to a single step, making fast adversarial training (FAT) a viable solution.

Limitations of Prior Work: Single-step AT generally suffers from Catastrophic Overfitting (CO)—after training for a certain number of batches, robust accuracy against PGD suddenly collapses to near 0%, while FGSM accuracy increases, sometimes even exceeding clean accuracy. Existing mitigation methods (GradAlign, NuAT, N-FGSM, ATAS, ELLE, etc.) either require expensive additional backpropagations/regularization terms or have hyperparameters tightly bound to specific datasets/architectures, often failing in challenging scenarios like PathMNIST and TissueMNIST, which contradicts the "low-cost" objective of FAT.

Key Challenge: FGSM repeatedly attacks the model using perturbations of fixed size and direction, causing the model to minimize loss only along a narrow path within the \(\epsilon\)-ball. The loss surface becomes highly non-linear along this line, a phenomenon the authors define as Epsilon Overfitting (EO)—the geometric root of CO. To break EO, diversity in perturbation magnitude is required; to make this diversity effective, knowledge of the local curvature of the loss surface is needed; however, explicit Hessian computation would destroy the cost advantage of single-step AT.

Goal: (1) Provide a geometric characterization of CO and prove that perturbation magnitude diversity is key to resolving EO; (2) Design a "free" curvature metric within single-step AT to both warn of CO and guide the attack; (3) Develop a fast AT method that is robust to datasets and architectures without extra backpropagations or parameter tuning.

Key Insight: The authors notice that when backpropagation computes derivatives with respect to the parameters, the chain rule can be carried one step further, down to the input, which yields \(\nabla_x \mathcal{L}(f_\theta(x'+\delta), y)\), the input gradient at the adversarial example. Its cosine similarity with \(\nabla_x \mathcal{L}(f_\theta(x'), y)\), the gradient at the randomly initialized point \(x' = x + \eta\) that was already computed while generating the adversarial example, gives a curvature proxy with zero extra overhead.

Core Idea: Treat inner maximization as a second-order problem. Use the gradients already computed in the previous batch to approximate the Hessian-vector product, deriving an analytical optimal step size \(\alpha^*\). Sample step sizes uniformly within \([0, \alpha^*]\) for each channel to simultaneously break EO and CO.

Method¶

Overall Architecture¶

SORA replaces the "FGSM + fixed step size \(\alpha\)" of single-step AT with "FGSM direction + adaptive random step size." The training workflow for each batch is: (1) Add a random start \(\eta \sim \mathcal{U}(-\epsilon, \epsilon)^d\); (2) Compute \(g = \nabla_x \mathcal{L}(f_\theta(x+\eta), y)\); (3) Independently sample a step size \(\alpha_i \sim \mathcal{U}(0, \alpha^*)\) for each pixel channel, construct the adversarial example \(x' = x + \eta + \alpha_i \odot \text{sign}(g)\), and clip it to \([0,1]\); (4) Backpropagate through the adversarial example to update \(\theta\), retaining the gradient \(g'\) that the same backward pass propagates to the input layer; (5) Update the EMA linearity coefficient \(v\) with the ratio \(p^T g'/\|g\|_1\), where \(p = \text{sign}(g)\), and derive a new \(\alpha^*\) for the next batch. Relative to FGSM-RS, this process requires no extra backpropagation and adds negligible time and memory overhead.
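
The five steps above can be traced on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a hypothetical linear classifier with a logistic loss so that gradients can be written analytically, and the names `grads`, `sora_step`, and `state` are illustrative. Because this toy loss is convex in the input it cannot exhibit CO; the sketch only shows the order of operations and that \(g'\) comes from the same pass that produces the weight update. The fallback to \(\alpha_{\max}\) when \(v \ge 1\) is an assumption added to keep the formula well defined.

```python
import math
import random

def sign(t):
    return (t > 0) - (t < 0)

def grads(w, x, y):
    """Logistic loss log(1 + exp(-y * w.x)): returns (input gradient, weight gradient).

    Stands in for one backward pass that reaches both the weights and the input.
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    scale = -y / (1.0 + math.exp(margin))  # equals -y * sigmoid(-margin)
    return [scale * wi for wi in w], [scale * xi for xi in x]

def sora_step(w, x, y, state, rng, eps=8 / 255, alpha0=0.02, beta=0.05, lr=0.1):
    alpha_max = 2 * eps
    # (1) random start inside the eps-box
    x_start = [xi + rng.uniform(-eps, eps) for xi in x]
    # (2) attack gradient at the random start
    g, _ = grads(w, x_start, y)
    p = [sign(gi) for gi in g]
    # (3) one random step size per entry, drawn from U(0, alpha*), then clip to [0, 1]
    x_adv = [min(1.0, max(0.0, xs + rng.uniform(0.0, state["alpha_star"]) * pi))
             for xs, pi in zip(x_start, p)]
    # (4) single pass at x_adv: weight gradient for the update, input gradient g' kept
    g_prime, dw = grads(w, x_adv, y)
    w = [wi - lr * di for wi, di in zip(w, dw)]
    # (5) EMA of the linearity ratio p^T g' / ||g||_1, then the next alpha*
    ratio = sum(pi * gpi for pi, gpi in zip(p, g_prime)) / sum(abs(gi) for gi in g)
    state["v"] = (1.0 - beta) * state["v"] + beta * ratio
    if state["v"] >= 1.0:  # assumed guard: denominator would be non-positive
        state["alpha_star"] = alpha_max
    else:
        state["alpha_star"] = min(alpha_max, alpha0 / (1.0 - state["v"]))
    return w, x_adv

rng = random.Random(0)
eps = 8 / 255
w = [0.5, -0.3, 0.8]            # toy weights (illustrative)
x, y = [0.2, 0.7, 0.4], 1       # one toy sample with label +1
state = {"v": 0.99, "alpha_star": 2 * eps}
for _ in range(5):
    w, x_adv = sora_step(w, x, y, state, rng)
```

In an autograd framework, step (4) would correspond to marking the adversarial input as requiring gradients before the usual backward call, so that \(g'\) is a by-product of the parameter update.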

Key Designs¶

PertAlign: A Zero-Cost Second-Order CO Warning Metric:
- Function: Uses a cosine similarity to measure the non-linearity of the loss surface along the attack direction, providing a "pre-CO" signal before PGD accuracy collapses.
- Mechanism: Define \(\text{PertAlign} = \cos\!\big(\nabla_x \mathcal{L}(f_\theta(x'), y),\ \nabla_x \mathcal{L}(f_\theta(x'+\delta), y)\big)\), where \(x' = x + \eta\) is the randomly initialized input (so \(x'+\delta\) is the adversarial example), \(\delta = \alpha v\), and \(v\) is the attack direction (not the EMA coefficient of the same name introduced later). The paper proves \(1 - \text{PertAlign} \approx \tfrac{\alpha^2}{2}\|h_{\perp g}\|^2\), where \(h = Hv/\|g\|\) and \(h_{\perp g}\) is the part of \(h\) orthogonal to \(g\). Thus, PertAlign directly captures the component of the Hessian-vector product orthogonal to the gradient. When CO occurs, this component explodes, and PertAlign drops sharply from near 1 toward 0. One gradient comes from generating the adversarial sample and the other from the standard backpropagation extended by one layer, so no additional forward or backward passes are required.
- Design Motivation: Existing metrics like GradAlign and ELLE require an extra backpropagation, violating FAT cost constraints. Furthermore, PertAlign triggers earlier than metrics like GradAlign / AAE share / TRADES-KL (Fig 3 shows PertAlign dropping at batch 3775, whereas FGSM/PGD accuracy diverges visibly only at batch 3825).
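
The relation \(1 - \text{PertAlign} \approx \tfrac{\alpha^2}{2}\|h_{\perp g}\|^2\) can be checked numerically. The sketch below assumes a toy quadratic loss, for which the gradient at \(x'+\delta\) is exactly \(g + H\delta\); the values of `g`, `H`, `v`, and `alpha` are arbitrary illustrative choices, not numbers from the paper.

```python
import math

def cosine(u, w):
    """Cosine similarity between two flat gradient vectors."""
    dot = sum(a * b for a, b in zip(u, w))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in w)))

def matvec(H, v):
    return [sum(h * x for h, x in zip(row, v)) for row in H]

# Toy quadratic loss around x': gradient at x' is g, gradient at x' + alpha*v is g + alpha*H v.
g = [3.0, 4.0]
H = [[2.0, 1.0], [1.0, 3.0]]
v = [1.0, -1.0]
alpha = 0.01

Hv = matvec(H, v)
g_shifted = [gi + alpha * hi for gi, hi in zip(g, Hv)]
pert_align = cosine(g, g_shifted)

# Second-order prediction: 1 - PertAlign ~ (alpha^2 / 2) * ||h_perp||^2 with h = H v / ||g||.
g_norm = math.sqrt(sum(gi * gi for gi in g))
h = [hi / g_norm for hi in Hv]
g_hat = [gi / g_norm for gi in g]
along = sum(hi * gi for hi, gi in zip(h, g_hat))
h_perp = [hi - along * gi for hi, gi in zip(h, g_hat)]
predicted_gap = 0.5 * alpha ** 2 * sum(t * t for t in h_perp)

measured_gap = 1.0 - pert_align
```

For small `alpha` the measured and predicted gaps agree closely; increasing `alpha` or the off-gradient curvature makes PertAlign fall away from 1, which is the behaviour the metric is meant to flag.
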
Second-Order Optimal Step Size \(\alpha^*\) + EMA Linearity Coefficient:
- Function: Provides a theoretically optimal attack step size for the next batch without explicitly calculating the Hessian.
- Mechanism: Take a second-order expansion of the loss along the perturbation \(\delta = \alpha p\), with \(p = \text{sign}(g)\): \(\mathcal{L}(x+\delta) \approx \mathcal{L}(x) + \delta^T g + \tfrac{1}{2}\delta^T H \delta\). Setting the derivative with respect to \(\alpha\) to zero yields the optimal step size \(\alpha^* = \min\!\big(\alpha_{\max},\ \alpha_0 / (1 - p^T g'/\|g\|_1)\big)\), where \(g' \approx g + \alpha H p\) is obtained for free from the input-layer gradient of the previous batch. SORA maintains a scalar EMA \(v \leftarrow (1-\beta) v + \beta \cdot p^T g'/\|g\|_1\), the linearity coefficient, and uses it in place of the raw ratio to stabilize the estimate of \(\alpha^*\). Reusing information across batches relies on the assumption that weights change only slightly per step and that batches come from the same distribution, which is what makes the second-order information "essentially for free."
- Design Motivation: Explicitly calculating \(H\) is unacceptable for large models. The method must be more robust than empirically tuned fixed \(\alpha\), hence the introduction of an analytical approximation that reuses existing gradients.
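
The step-size rule reduces to three small functions. This is a hedged sketch using the formulas and default hyperparameters quoted in this summary; the function names are hypothetical, the gradients are assumed to be non-zero flat lists, and the fallback to \(\alpha_{\max}\) when \(v \ge 1\) is an assumption added so the denominator never becomes non-positive.

```python
def linearity_ratio(g, g_prime):
    """p^T g' / ||g||_1 with p = sign(g); equals 1 when g' == g (locally linear loss)."""
    sign = lambda t: (t > 0) - (t < 0)
    l1 = sum(abs(a) for a in g)  # assumes g is not identically zero
    return sum(sign(a) * b for a, b in zip(g, g_prime)) / l1

def ema_update(v, ratio, beta=0.05):
    """v <- (1 - beta) * v + beta * ratio."""
    return (1.0 - beta) * v + beta * ratio

def optimal_step(v, alpha0=0.02, alpha_max=16 / 255):
    """alpha* = min(alpha_max, alpha0 / (1 - v)), with an assumed guard for v >= 1."""
    if v >= 1.0:
        return alpha_max
    return min(alpha_max, alpha0 / (1.0 - v))

g = [1.0, -2.0, 0.5]
g_half = [0.5, -1.0, 0.25]                        # gradient shrinks to half along p
ratio_linear = linearity_ratio(g, g)              # 1.0
ratio_curved = linearity_ratio(g, g_half)         # 0.5
v_next = ema_update(0.99, ratio_curved)           # 0.95 * 0.99 + 0.05 * 0.5
step_flat = optimal_step(0.99)                    # clipped to alpha_max
step_curved = optimal_step(0.5)                   # 0.02 / 0.5
```

The EMA means a single curved batch barely moves \(v\) (here from 0.99 to 0.9655), so \(\alpha^*\) only leaves the clipped regime when low ratios persist over many batches.
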
Channel-wise Randomized Step Size to Break Epsilon Overfitting:
- Function: Adds a layer of perturbation diversity on top of \(\alpha^*\) to prevent the model from overfitting to a specific \(\epsilon\) value.
- Mechanism: Instead of using \(\alpha^*\) directly, sample \(\alpha_i \sim \mathcal{U}(0, \alpha^*)\) independently for each channel of each pixel, then set \(x' = x + \eta + \alpha_i \odot \text{sign}(g)\). This ensures samples within the same batch see a wide distribution of perturbation magnitudes, covering the \(\ell_\infty\) ball more uniformly and removing the "EO signature" where FGSM accuracy spikes at certain \(\epsilon\) values.
- Design Motivation: Sec. 3 identifies EO as the root cause of CO—fixed large perturbations flatten the loss only along narrow paths. Randomized step sizes \(\mathcal{U}(0, \alpha^*)\) combined with adaptive \(\alpha^*\) address both the symptoms (CO) and the root cause (EO).
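
A minimal sketch of the channel-wise sampling, assuming the image, random start, and gradient are flattened to one entry per pixel channel. The helper name `sora_perturb` and the toy values are illustrative. It follows the construction as written in this summary: each entry gets its own \(\alpha_i \sim \mathcal{U}(0, \alpha^*)\) and the result is clipped to \([0,1]\).

```python
import random

def sign(t):
    return (t > 0) - (t < 0)

def sora_perturb(x, eta, g, alpha_star, rng):
    """Build x' = clip(x + eta + alpha_i * sign(g)) with an independent alpha_i per entry."""
    x_adv = []
    for xi, ei, gi in zip(x, eta, g):
        alpha_i = rng.uniform(0.0, alpha_star)  # fresh step size for this pixel channel
        x_adv.append(min(1.0, max(0.0, xi + ei + alpha_i * sign(gi))))
    return x_adv

rng = random.Random(0)
eps = 8 / 255
x = [0.2, 0.5, 0.9, 0.0]                       # toy pixel-channel values in [0, 1]
eta = [rng.uniform(-eps, eps) for _ in x]      # random start
g = [0.3, -1.2, 0.7, -0.1]                     # toy input gradient
x_adv = sora_perturb(x, eta, g, alpha_star=2 * eps, rng=rng)
```

With a fixed step every entry would move by the same amount along \(\text{sign}(g)\); drawing \(\alpha_i\) independently is what spreads the perturbation magnitudes within a single image.
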

Loss & Training¶

Standard Cross-Entropy + SGD (momentum 0.9, weight decay \(5\times 10^{-4}\)) is used throughout with a single set of hyperparameters: \(\alpha_0 = 0.02\), \(\beta = 0.05\), \(\alpha_{\max} = 2\epsilon\), \(\epsilon = 8/255\). The EMA coefficient \(v\) is initialized to 0.99. When the model approaches the CO threshold, PertAlign drops, \(v\) decreases, and \(\alpha^* = \alpha_0/(1-v)\) falls below \(\alpha_{\max}\): the step size automatically adapts to the increased curvature, so the attack lands closer to the true local maximum and training can recover robustness. Under normal conditions (\(v\) close to 1), \(\alpha_0/(1-v)\) exceeds \(\alpha_{\max}\) and \(\alpha^*\) is clipped to \(\alpha_{\max}\), maintaining costs similar to FGSM-RS.
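
As a quick check of how these hyperparameters interact, plugging them into \(\alpha^* = \min(\alpha_{\max}, \alpha_0/(1-v))\) gives the following (a worked calculation from the values stated above, not a result reported in the paper):

\[
\alpha_{\max} = 2\epsilon = \tfrac{16}{255} \approx 0.0627, \qquad
v = 0.99:\ \ \frac{\alpha_0}{1-v} = \frac{0.02}{0.01} = 2 \ \Rightarrow\ \alpha^* = \alpha_{\max}.
\]

\[
\frac{\alpha_0}{1-v} < \alpha_{\max}
\iff v < 1 - \frac{\alpha_0}{\alpha_{\max}} = 1 - \frac{0.02 \cdot 255}{16} \approx 0.681 .
\]

So at initialization the clip is active, and the adaptive term only takes over once the smoothed linearity coefficient falls below roughly 0.68; for example, \(v = 0.5\) gives \(\alpha^* = 0.04\).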

Key Experimental Results¶

Main Results¶

The paper compares 15 baselines across 6 datasets (CIFAR-10/100, TinyImageNet, ImageNet-100, PathMNIST, TissueMNIST) and 4 architectures (ResNet, PreActResNet, WideResNet, SENet, ViT). Representative results for PreActResNet-18, \(\epsilon = 8/255\), evaluated via AutoAttack:

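| Dataset | Metric (accuracy, %) | SORA | N-FGSM | AAER | FGSM |
|---|---|---|---|---|---|
| PathMNIST | Clean | 84.88 | 74.86 | 81.43 | 36.54 |
| PathMNIST | AutoAttack | 35.54 | 1.90 | 1.89 | 0.42 |
| ImageNet-100 | Clean | 57.26 | 49.38 | 48.26 | 15.98 |
| ImageNet-100 | AutoAttack | 18.56 | 15.52 | 17.18 | 0.00 |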

SORA is the only single-step method to successfully avoid CO and achieve the highest robust and clean accuracy across all 6 datasets and 4 architectures. On PathMNIST, it improves AutoAttack robustness from < 2% (all baselines) to 35.54%.

Ablation Study¶

| Configuration | Clean (%) | FGSM (%) | PGD-10 (%) | Description |
|---|---|---|---|---|
| SORA (full) | 84.69 | 57.51 | 45.56 | Complete method |
| – Without Random Sampling | 85.67 | 58.89 | 45.35 | Remove channel-level randomization |
| – Clamping Step-Size | 86.82 | 52.02 | 34.08 | Clamp adaptive \(\alpha^*\) to a fixed value |
| – Without Optimal Step-Size | 89.93 | 28.30 | 17.23 | Degenerates to fixed-step-size FGSM-RS |

Key Findings¶

Adaptive \(\alpha^*\) is the performance anchor: Removing it causes PGD-10 robustness to drop from 45.56% to 17.23%, while clean accuracy increases to 89.93%—a classic EO sign where clean accuracy appears fine but robustness is broken.
PertAlign provides earlier CO warning than GradAlign / AAE / TRADES-KL / ELLE: On CIFAR-10, it signals approximately 50 batches earlier, which gives SORA enough time to automatically adjust \(\alpha^*\) and stabilize training.
Hyperparameter stability across domains: All experiments use the same \((\alpha_0, \beta, \alpha_{\max})\). In contrast, for methods like GradAlign, N-FGSM, and ATAS, no configuration was found that avoids CO on PathMNIST without also degrading performance on CIFAR.
Negligible cost: PertAlign and \(\alpha^*\) reuse existing gradients. Training time for 30 epochs on an RTX 4090 is comparable to FGSM-RS, with almost no increase in memory usage.

Highlights & Insights¶

Solving inner maximization as a second-order problem at first-order cost: Reusing the "extended" input gradient from the chain rule as a proxy for the Hessian-vector product is an elegant combination of engineering and theory, applicable to any training workflow requiring curvature (e.g., sharpness-aware minimization, second-order meta-learning).
CO reinterpreted as a symptom of EO rather than the root cause: Diversity in perturbation magnitude is more critical than direction. This reshapes the traditional explanation of why random starts work—not just by stabilizing direction, but by indirectly breaking local overfitting at a fixed \(\epsilon\).
PertAlign as a universal diagnostic tool: Since it only relies on two gradients already present in single-step attacks, it can be added to any fast AT method for "early CO warning" without modifying the core algorithm.
"Delayed-by-one" second-order estimation: Using \(g'\) from the previous batch for the current \(\alpha^*\) is experimental proof that this "gradient time-shifting" is stable enough. This logic is valuable for large-batch optimization and distributed asynchronous training.
Per-channel vs. Per-sample step size: The choice of per-channel randomization increases \(\ell_\infty\) ball coverage by ensuring different channels of the same image have different perturbations. This "fine-grained diversity" is applicable to patch attacks, style transfer, and other tasks needing to break singular perturbation patterns.

Limitations & Future Work¶

The second-order derivation assumes that weights and batches change minimally between steps. This approximation might fail with large learning rates or very small batches; the EMA smoothing is intended to mitigate such cases.
Experiments focus exclusively on \(\ell_\infty\) threat models and image classification; transferability to \(\ell_2\), patch attacks, NLP, detection, or segmentation remains unverified.
PertAlign provides a "post-hoc" signal: although it fires earlier than other metrics, it is not a prediction in the strict sense. Stronger metrics that directly characterize the spectral properties of the Hessian may exist in principle.
A gap remains between SORA and multi-step PGD-AT robustness. The paper treats multi-step methods as an "upper bound baseline" rather than attempting to bridge the gap entirely.

vs GradAlign (Andriushchenko & Flammarion, 2020): Both measure local linearity, but GradAlign requires an extra double backpropagation because it is used as a regularization term, while PertAlign reuses gradients that are already computed. SORA uses the metric to dynamically adjust the step size rather than as a loss penalty.
vs N-FGSM (de Jorge et al., 2022): N-FGSM increases random noise and removes clipping to mitigate CO. SORA uses an analytical optimal step size + channel-level randomization, providing a clearer theoretical foundation and significantly higher robustness on hard datasets like PathMNIST.
vs ATAS (Huang et al., 2022) / Zhao et al. (2025): Both are adaptive step size methods, but ATAS uses empirical gradient norm scaling. SORA is the first to explicitly model step size selection as second-order optimization with a closed-form solution.
vs Curvature regularization (Moosavi-Dezfooli et al., 2018; Ma et al., 2021): These add curvature penalties to the loss, usually in multi-step AT. SORA integrates curvature awareness into the attack step size selection, preserving the FAT cost budget.
vs Free AT (Shafahi et al., 2019): Free AT reuses adversarial gradients for parameter updates. SORA shares this "gradient reuse" philosophy but extends it to curvature estimation rather than just parameter updates.
vs ELLE (Rocamora et al., 2024): ELLE encourages local linearity via regularization. SORA does not force linearity but instead tracks current curvature, remaining robust even on architectures like ViT without prior landscape bias.
vs AAER (Lin et al., 2024): AAER identifies and suppresses "abnormal adversarial examples" (AAEs) via regularization. SORA uses the adaptive \(\alpha^*\) so that the attack reaches a true local maximum and produces normal adversarial examples (NAEs), preventing AAEs at the source without extra regularization.