Enhancing Adversarial Transferability by Balancing Exploration and Exploitation with Gradient-Guided Sampling

Conference: ICCV 2025 · arXiv: 2511.00411 · Code: https://github.com/anuin-cat/GGS · Area: AI Security / Adversarial Attacks · Keywords: adversarial transferability, gradient-guided sampling, exploration-exploitation balance, flat maxima, black-box attack

TL;DR

This paper proposes Gradient-Guided Sampling (GGS), an inner-iteration sampling strategy that uses the previous inner iteration's gradient to set the current sampling direction. By balancing Exploitation (attack strength, i.e., high loss maxima) and Exploration (cross-model generalization, i.e., a flat loss landscape), GGS significantly outperforms existing transfer attack methods across diverse target models, including CNNs, ViTs, and MLLMs.

Background & Motivation

  • Background: Adversarial transfer attacks are critical in black-box settings, where attackers have access only to surrogate models and aim to craft adversarial examples that fool unknown target models. Gradient-based methods, input transformation methods, and flat-maxima methods have been actively developed in recent years.
  • Limitations of Prior Work:
    • Traditional momentum methods (MI-FGSM) overly prioritize Exploitation—seeking higher loss maxima for stronger attacks—but the resulting sharp loss landscape leads to poor generalization.
    • Recent inner-iteration sampling methods (PGN, GRA) overly prioritize Exploration—obtaining flat loss landscapes through neighborhood sampling to enhance generalization—but sacrifice the height of loss maxima, weakening attack strength.
  • Key Challenge: A fundamental trade-off exists between Exploration and Exploitation: flat regions are not necessarily high-value regions, and high-value regions are not necessarily flat.
  • Key Insight: Fully random sampling in inner iterations produces unstable gradient directions that cannot consistently point toward regions that are simultaneously flat and high-value.
  • Core Idea: The gradient direction from the previous inner iteration is used to guide the current sampling direction (while the magnitude remains random), thereby maintaining stability toward the gradient ascent direction (Exploitation) while preserving sampling randomness to explore flat regions (Exploration).

Method

Overall Architecture

GGS builds upon the outer-iteration framework of MI-FGSM and inserts \(N\) inner-iteration steps before each adversarial example update. In each inner iteration, a lookahead sample is drawn by taking a random-magnitude step along the previous gradient direction, and the gradient is then computed at the sampled point. The gradients from all inner iterations are averaged to update the momentum, which is subsequently used to update the adversarial example.
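
A minimal PyTorch sketch of this procedure, reconstructed from the description above (the function and variable names, and the momentum normalization, are my assumptions, not the official implementation in the linked repo):

```python
import torch

def ggs_attack(model, x, y, eps=16/255, T=10, N=20, zeta=None, gamma=1.0):
    """GGS on top of the MI-FGSM outer loop (sketch, not the official code)."""
    alpha = eps / T                          # step size, as in the paper's setup
    zeta = 2.0 * eps if zeta is None else zeta
    loss_fn = torch.nn.CrossEntropyLoss()
    x_adv, m = x.clone(), torch.zeros_like(x)
    for _ in range(T):                       # outer iterations
        # random initial "gradient" g_0 ~ Uniform(-zeta, zeta)
        g_prev = torch.empty_like(x).uniform_(-zeta, zeta)
        g_sum = torch.zeros_like(x)
        for _ in range(N):                   # inner iterations
            p = torch.empty_like(x).uniform_(-zeta, zeta)
            # lookahead sample: random magnitude, previous-gradient direction
            x_hat = (x_adv + p.abs() * g_prev.sign()).detach().requires_grad_(True)
            loss = loss_fn(model(x_hat), y)
            g_prev = torch.autograd.grad(loss, x_hat)[0]
            g_sum += g_prev
        g_avg = g_sum / N                    # average the inner-iteration gradients
        m = gamma * m + g_avg / g_avg.abs().mean()   # MI-FGSM-style momentum update
        x_adv = (x_adv + alpha * m.sign()).clamp(x - eps, x + eps).clamp(0, 1)
    return x_adv.detach()
```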

Key Designs

  1. Random Sampling (RS) Baseline Analysis:

    • Function: Performs fully random neighborhood sampling in inner iterations.
    • Mechanism: Sampled point \(\tilde{x}_i = x_{t-1}^{adv} + \tilde{p}\), where \(\tilde{p} \sim \text{Uniform}(-\zeta, \zeta)\).
    • Limitation: Fully random sampling yields unstable gradient directions; although the averaged gradient roughly points toward flat regions, it fails to consistently align with the center of regions that are both flat and high-value.
  2. Momentum-Guided Sampling (MGS) Intermediate Design:

    • Function: Uses the accumulated momentum direction to guide sampling.
    • Mechanism: \(\bar{x}_i = x_{t-1}^{adv} + |\tilde{p}| \cdot \text{sign}(m_{i-1})\), where \(m_i = \sum_{k=1}^i \tilde{g}_k\).
    • Limitation: Improves upon RS via Nesterov-style lookahead that ensures a stable gradient ascent direction. However, momentum accumulation introduces long-chain dependency: early unstable samples excessively constrain later directions, severely impairing the ability to explore flat regions.
    • Empirical Evidence: MGS yields only a 0.2-percentage-point improvement in white-box ASR on the surrogate model, but roughly a 5-point drop in transfer ASR.
  3. Gradient-Guided Sampling (GGS):

    • Function: Uses the gradient from the previous inner iteration (rather than accumulated momentum) to guide the sampling direction.
    • Mechanism: \(\hat{x}_i = x_{t-1}^{adv} + |\tilde{p}| \cdot \text{sign}(\tilde{g}_{i-1})\)
      • Direction is determined by the previous-step gradient (maintaining gradient ascent stability → Exploitation).
      • Magnitude is determined by a random distribution (preserving sampling randomness → Exploration).
      • Depends only on the single preceding gradient (avoiding long-chain dependency → preserving exploration capability).
    • Design Motivation: Compared with MGS, replacing \(m_{i-1}\) with \(\tilde{g}_{i-1}\) substantially alleviates the long-chain dependency problem. After a brief initial oscillation, GGS stably converges to the center of regions that are both flat and high-value (see the code sketch after this list).
    • Key Property: Loss landscape visualizations show that GGS's high-loss region (in red) nearly completely encompasses the loss landscapes of all other methods.
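
The three variants differ only in this one sampling step. A minimal helper expressing all three rules, reconstructed from the formulas above (the function name and signature are my own, not the official repo's API):

```python
import torch

def sample_lookahead(x_adv, zeta, g_prev=None, m_prev=None):
    """Draw one inner-iteration lookahead point (RS / MGS / GGS).
    Sketch reconstructed from the paper's formulas, not the official code."""
    p = torch.empty_like(x_adv).uniform_(-zeta, zeta)  # random perturbation
    if g_prev is not None:                 # GGS: previous-gradient direction
        return x_adv + p.abs() * g_prev.sign()
    if m_prev is not None:                 # MGS: accumulated-momentum direction
        return x_adv + p.abs() * m_prev.sign()
    return x_adv + p                       # RS: fully random direction and magnitude
```

Passing `g_prev` selects GGS: the direction comes from the previous gradient while the step magnitude stays random, which is the entire method.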

Loss & Training

  • Hyperparameter settings: maximum perturbation \(\epsilon=16/255\), outer iterations \(T=10\), step size \(\alpha=\epsilon/T\), inner iterations \(N=20\), sampling radius \(\zeta=2.0\times\epsilon\).
  • Momentum decay \(\gamma\) is consistent with MI-FGSM.
  • Initial gradient \(\tilde{g}_0 \sim \text{Uniform}(-\zeta, \zeta)\) (random initialization).
  • 1,000 ImageNet-compatible images (\(299\times299\times3\)) are used for evaluation.
  • The complete attack procedure is given in Algorithm 1: inner-iteration sampling + gradient computation → outer-iteration momentum update + sample projection.
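
For completeness, the reported settings plug into the hypothetical `ggs_attack` sketch from the Method section as follows:

```python
# Using the paper's hyperparameters with the sketch above (names are mine).
eps = 16 / 255
x_adv = ggs_attack(model, x, y, eps=eps, T=10, N=20, zeta=2.0 * eps)
```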

Key Experimental Results

Main Results (Untargeted/Targeted ASR%, Single-Model Generation)

Generated from ResNet50:

| Method | Dense121 | Inc-v3 | ViT-B | Inc-v3ens3 | Avg (9 models) |
| --- | --- | --- | --- | --- | --- |
| MI (CVPR'18) | 54.9 / 0.2 | 44.2 / 0.0 | 11.8 / 0.0 | 22.8 / 0.0 | 36.53 / 10.92 |
| PGN (NeurIPS'23) | 91.3 / 4.7 | 85.0 / 1.4 | 49.7 / 0.4 | 74.9 / 0.6 | 76.53 / 6.84 |
| GGS | 95.9 / 28.6 | 89.6 / 7.2 | 60.2 / 3.4 | 77.5 / 3.7 | 82.08 / 17.67 |

Generated from ViT-B:

| Method | Res50 | Dense121 | Inc-v3 | PiT-B | Avg (9 models) |
| --- | --- | --- | --- | --- | --- |
| PGN | 69.3 / 0.4 | 81.1 / 0.7 | 78.6 / 0.4 | 84.8 / 4.4 | 75.29 / 9.91 |
| GGS | 80.8 / 6.0 | 89.9 / 6.5 | 87.4 / 4.6 | 92.7 / 27.3 | 83.33 / 17.22 |

MLLM Attack Results (Ensemble Setting, CSR% ↓; lower indicates a stronger attack)

| Method | GPT-4o | Gemini Pro | Claude Sonnet | Avg ↓ |
| --- | --- | --- | --- | --- |
| Clean | 77.1 | 85.6 | 68.3 | 79.18 |
| PGN | 56.2 | 69.6 | 45.1 | 56.80 |
| GGS | 43.1 | 61.1 | 40.0 | 47.54 |

GGS lowers the average MLLM CSR by more than 9 percentage points compared to the strongest baseline (PGN, 56.80 → 47.54).

Ablation Study

| Sampling Guidance | ResNet50 (white-box) | Other 8 models (black-box) |
| --- | --- | --- |
| Random Sampling (RS) | 97.3 | 63.74 |
| Momentum-Guided (MGS) | 97.5 | 58.79 |
| Gradient-Guided (GGS) | 99.3 | 79.93 |

Compatibility with Other Methods (Generated from ResNet50):

| Method | Untargeted ASR | Targeted ASR |
| --- | --- | --- |
| GRA / +GGS | 73.41 / 78.69 | 8.61 / 13.64 |
| PGN / +GGS | 76.53 / 83.23 | 6.84 / 12.79 |
| DIM / +GGS | 51.30 / 90.13 | 8.63 / 19.37 |
| SIM / +GGS | 46.80 / 90.12 | 11.14 / 28.44 |
| Admix / +GGS | 54.78 / 85.50 | 11.28 / 30.50 |

Key Findings

  • GGS achieves the highest average untargeted ASR across all surrogate models (Res50/Inc-v3/ViT-B: 82.08/69.27/83.33%).
  • MGS, due to long-chain dependency, actually reduces transfer ASR by about 5 percentage points relative to RS (Table 4), validating the analysis that momentum guidance is unsuitable for inner-iteration sampling.
  • Combining GGS with input transformation methods yields substantial gains: DIM+GGS improves untargeted ASR by 38.83 percentage points, and SIM+GGS by 43.32.
  • Loss landscape visualizations (Fig. 4) show that GGS's high-loss region nearly completely covers those of other methods while maintaining higher local maxima.
  • Inner-iteration gradient similarity analysis (Fig. 5d): the lower gradient similarity of GGS reflects enhanced exploration capability, whereas the high similarity of MGS indicates constrained exploration.
  • GGS reaches a stable sampling direction after a brief initial oscillation (Fig. 2c), regardless of initial sampling quality.

Highlights & Insights

  • The core insight is sharp: the Exploration-Exploitation dilemma is reframed as the tension between "flatness" and "high value" in the loss landscape, capturing the fundamental tension in transfer attacks.
  • The method is remarkably concise: simply replacing \(\tilde{p}\) in RS with \(|\tilde{p}| \cdot \text{sign}(\tilde{g}_{i-1})\) (a single-line code change) yields a transfer ASR improvement of over 16 percentage points.
  • The progressive analysis from RS → MGS → GGS is logically clear, with each step addressing one specific issue (stability → long-chain dependency → balance).
  • Evaluation against MLLMs (GPT-4o / Gemini / Claude) demonstrates the practical significance of the method for real-world AI security research.
  • Broad compatibility with five input transformation methods and two RS-based methods is thoroughly validated.

Limitations & Future Work

  • The method is currently applicable only to inner-iteration approaches based on gradient averaging; compatibility with non-gradient-averaging methods such as VMI-FGSM and RAP remains to be explored (acknowledged by the authors in the conclusion).
  • The inner iteration count \(N=20\) incurs relatively high computational cost; investigating how to maintain performance with fewer inner iterations is worthwhile.
  • The sampling radius \(\zeta=2\epsilon\) is fixed; adaptive adjustment may further optimize flat-region search.
  • Evaluation is limited to classification tasks; transferability to downstream tasks such as object detection and semantic segmentation has not been verified.
  • A deeper analysis of how the randomness in the initial gradient \(\tilde{g}_0\) affects final convergence is warranted.

Related Work & Connections

  • The momentum concept from MI-FGSM and the Nesterov lookahead from NI-FGSM serve as core foundations.
  • PGN's gradient norm penalty and RAP's sharpness-aware minimization represent two distinct approaches to pursuing flat maxima.
  • The analogy of transferring SAM (Sharpness-Aware Minimization) from model training generalization to adversarial example generalization is highly instructive.
  • The GGS principle of replacing accumulated momentum dependency with single-step gradient dependency may offer insights for other optimization problems requiring a balance between exploration and exploitation.

Rating

  • Novelty: ⭐⭐⭐⭐ Deep insight (the Exploration-Exploitation dilemma resolved via single-step gradient guidance), elegant and concise methodology.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ Comprehensive coverage of CNNs, ViTs, adversarially trained models, MLLMs, and commercial APIs; detailed ablation and compatibility analysis.
  • Writing Quality: ⭐⭐⭐⭐⭐ The progressive RS→MGS→GGS logic and visualizations in Figs. 1–5 are exceptionally clear.
  • Value: ⭐⭐⭐⭐ The method is concise, practical, and broadly compatible, with important implications for both adversarial attack research and AI security defense.