Exploring Semantic-constrained Adversarial Example with Instruction Uncertainty Reduction

Conference: NeurIPS 2025 arXiv: 2510.22981 Code: Not released Area: Image Generation Keywords: adversarial examples, semantic constraint, diffusion models, 3D adversarial, transfer attack

TL;DR

This paper proposes InSUR, a multi-dimensional instruction uncertainty reduction framework that stabilizes adversarial optimization via a ResAdv-DDIM sampler, constrains attack scenarios through context-aware encoding, and evaluates semantic fidelity using WordNet-based semantic abstraction. InSUR is the first method to generate 2D/3D semantic-constrained adversarial examples (SemanticAE) from natural language instructions.

Background & Motivation

Traditional adversarial example research focuses on finding small perturbations around existing data. Generating adversarial examples directly from natural language instructions (SemanticAE) is an emerging yet underexplored direction: given a semantic description, the goal is to generate data that is semantically correct but cannot be correctly recognized by deep learning models. Existing methods (AdvDiff, SD-NAE, VENOM, etc.) suffer from three limitations:

  1. Referential diversity: inconsistent language guidance across multi-step diffusion models destabilizes adversarial optimization.
  2. Descriptive incompleteness: poor adaptability to attack scenarios.
  3. Ambiguous semantic boundaries: difficulty in evaluating SemanticAE generators.

Core Problem

How to generate transferable, adaptive, and effective semantic-constrained adversarial examples from uncertain human natural language instructions?

Method

Problem Formulation

\[\text{find } x_{\text{adv}} \in \mathcal{S}(\text{Text}) \quad \text{s.t.} \quad \mathcal{M}(x_{\text{adv}}) \in A_{\text{Text}}\]

where \(\mathcal{S}(\text{Text})\) is the set of data satisfying the semantic constraint, \(\mathcal{M}\) is the target (black-box) model, and \(A_{\text{Text}}\) is the set of erroneous outputs semantically inconsistent with the instruction.

Module 1: ResAdv-DDIM Sampler (Addressing Referential Diversity)

Core Idea: Instead of approximating \(\nabla_{x_t}\mathcal{L}_{\text{ATK}}\) directly by \(\nabla_{x_0}\mathcal{L}_{\text{ATK}}\), each denoising step first obtains a coarse estimate \(g_\theta(x_t)\) of \(x_0\) via DDIM residual shortcut prediction and differentiates the attack loss through it.

\[g_\theta(x_t) = \underbrace{f_{\theta,\Delta T_1} \circ f_{\theta,\Delta T_2} \circ \cdots \circ f_{\theta,\Delta T_k}}_{k \text{ steps, } k \ll T/\Delta T}(x_t)\]
\[x_{t-\Delta T} = f_{\theta,\Delta T}\left(\arg\max_{x_t} \mathcal{L}_{\text{ATK}}(\mathcal{M}(g_\theta(x_t)))\right)\]

Semantic constraints are enforced via an upper bound on trajectory deviation:

\[\|\text{Denoise}_{\text{DDIM}}(x_{t_s-\Delta T}) - \text{Denoise}_{\text{Adv}}(x_{t_s-\Delta T})\|_2 < \epsilon\]

Adaptive attack optimization employs an early stopping mechanism: optimization terminates when the estimated attack failure probability falls below threshold \(\xi_1 = 0.1\) or \(\xi_2 = 0.01\).
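The control flow of Module 1 can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `f_step` stands in for the real denoiser \(f_{\theta,\Delta T}\) as a linear contraction, the attack loss is a surrogate logit, and the failure-probability estimate is a plain sigmoid; all names and constants here are hypothetical.

```python
import numpy as np

K = 3          # number of composed shortcut steps (the paper sweeps k in {1,2,3,4})
DECAY = 0.9    # toy denoising dynamics, a stand-in for the real f_{theta,dt}

def f_step(x):
    """One toy DDIM step f_{theta,dt}: here just a linear contraction."""
    return DECAY * x

def g_shortcut(x_t, k=K):
    """Residual shortcut g_theta: k coarse composed steps approximating x_0."""
    for _ in range(k):
        x_t = f_step(x_t)
    return x_t

def atk_loss(x0, w):
    """Toy attack loss L_ATK: a surrogate logit for an incorrect class."""
    return float(w @ x0)

def resadv_ddim_step(x_t, w, lr=0.5, xi=0.1, n_iters=50):
    """Maximize L_ATK(M(g_theta(x_t))) over x_t, then take the actual DDIM step.
    Early-stops once the estimated attack-failure probability drops below xi."""
    for _ in range(n_iters):
        grad = (DECAY ** K) * w  # d/dx_t of w @ g_shortcut(x_t); exact since g is linear here
        x_t = x_t + lr * grad
        p_fail = 1.0 / (1.0 + np.exp(atk_loss(g_shortcut(x_t), w)))  # sigmoid failure estimate
        if p_fail < xi:
            break
    return f_step(x_t)  # the real denoising update x_{t - dt}
```

The key structural point survives the simplification: the gradient is taken through the cheap \(k\)-step shortcut \(g_\theta\) rather than the full remaining denoising trajectory, and the optimized \(x_t\) is then advanced by the ordinary DDIM step.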

Module 2: Context-Encoded Attack Scenario Constraint

2D Generation: Conditional and unconditional guidance is redistributed via guidance masking:

\[\epsilon_\theta(x_t, t) = (1-M) \cdot \epsilon_{\theta,\text{Unconditional}}(x_t, t) + M \cdot \epsilon_{\theta,\text{Conditional}}(x_t, t, \text{Text})\]
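The masking equation above is a per-pixel convex blend of the two noise predictions. A minimal sketch, with hypothetical toy arrays in place of real U-Net outputs:

```python
import numpy as np

def masked_guidance(eps_uncond, eps_cond, mask):
    """Per-pixel redistribution of guidance:
    eps = (1 - M) * eps_uncond + M * eps_cond, with M in [0, 1]."""
    return (1.0 - mask) * eps_uncond + mask * eps_cond

# Hypothetical 4x4 example: the text condition drives only a foreground patch,
# while the background follows the unconditional prediction.
eps_u = np.zeros((4, 4))   # stand-in for the unconditional noise prediction
eps_c = np.ones((4, 4))    # stand-in for the text-conditional noise prediction
M = np.zeros((4, 4))
M[1:3, 1:3] = 1.0          # foreground region follows the text prompt
eps = masked_guidance(eps_u, eps_c, M)
```

With this toy input, `eps` equals the mask itself: conditional guidance inside the masked region, unconditional guidance elsewhere.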

3D Generation (first implementation): ResAdv-DDIM is integrated with a Gaussian Splatting renderer:

\[g_\theta(z_t, \mathbf{pos}, \text{Camera}) = \text{Renderer}_{\text{GS}}(\mathcal{D}_{\text{GS}}(f_{\theta,\Delta T_1} \circ \cdots \circ f_{\theta,\Delta T_k}(z_t, \mathbf{pos}), \mathbf{pos}), \text{Camera})\]

Gradient accumulation over unknown camera poses is performed via the Expectation over Transformation (EoT) method.
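EoT itself is simple: average the attack gradient over sampled transformations (here, camera poses), so the optimized object remains adversarial from viewpoints unknown at test time. A toy sketch with a hypothetical `toy_grad` in place of the real render-and-backprop pipeline:

```python
import numpy as np

def eot_gradient(grad_fn, poses):
    """Expectation over Transformation: average the per-pose attack gradients."""
    grads = [grad_fn(pose) for pose in poses]
    return np.mean(grads, axis=0)

# Toy stand-in: each camera pose scales the gradient differently.
def toy_grad(pose):
    return pose * np.array([1.0, -1.0, 0.5])

poses = [np.full(3, s) for s in (0.5, 1.0, 1.5)]
g = eot_gradient(toy_grad, poses)  # averaged gradient used for the 3D update
```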

Module 3: Semantic Abstraction Evaluation Enhancement

A hierarchical label taxonomy is constructed based on WordNet, defining escape attack tasks at the abstraction level:

\[\text{Text} = \text{"Realistic image of [AbstractedLabel], specifically, [label]"}\]
\[A_{\text{Text}} = \{\text{label}_{\text{Adv}} \mid \text{AbstractedLabel} \notin \mathbf{Ancestors}(\text{label}_{\text{Adv}})\}\]

The paper proposes a relative attack success rate \(ASR_{\text{Relative}}\) and a paired semantic divergence metric \(\text{SemanticDiff}_\mathcal{S}\), verifying semantic consistency by simultaneously generating a non-adversarial exemplar \(x_{\text{exemplar}}\).
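The escape criterion \(A_{\text{Text}}\) reduces to an ancestor check in the label hierarchy. A self-contained sketch using a tiny hypothetical taxonomy in place of WordNet (the dictionary and label names below are invented for illustration):

```python
TAXONOMY = {
    # Hypothetical mini-hierarchy standing in for WordNet hypernym links.
    "tabby": "cat", "cat": "feline", "feline": "animal",
    "golden_retriever": "dog", "dog": "canine", "canine": "animal",
}

def ancestors(label):
    """All hypernym ancestors of a label under the toy taxonomy."""
    out = set()
    while label in TAXONOMY:
        label = TAXONOMY[label]
        out.add(label)
    return out

def escape_success(pred_label, abstracted_label):
    """Escape attack succeeds iff the abstracted label is neither the
    prediction itself nor among its ancestors."""
    return (pred_label != abstracted_label
            and abstracted_label not in ancestors(pred_label))

# Instruction: "Realistic image of feline, specifically, tabby"
print(escape_success("golden_retriever", "feline"))  # True:  escaped the feline subtree
print(escape_success("cat", "feline"))               # False: prediction is still a feline
```

Predicting any label inside the abstracted subtree (here, anything under "feline") counts as an attack failure, which is exactly what the \(\mathbf{Ancestors}\) condition encodes.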

Key Experimental Results

2D SemanticAE (\(\epsilon = 2.5\), average ASR across target models)

| Surrogate | Method | Acc. ↓ | ASR ↑ | CLIP-Q ↑ | LPIPS ↓ |
|---|---|---|---|---|---|
| ResNet50 | MI-FGSM | 33.4% | 41.5% | 0.548 | 0.201 |
| ResNet50 | SD-NAE | 37.1% | 47.4% | 0.841 | 0.457 |
| ResNet50 | VENOM | 34.5% | 34.4% | 0.795 | 0.023 |
| ResNet50 | InSUR | 15.1% | 62.0% | 0.815 | 0.031 |
| ViT-B | VENOM | 30.5% | 40.6% | 0.796 | 0.021 |
| ViT-B | InSUR | 10.9% | 69.7% | 0.815 | 0.038 |
  • Across all surrogate-and-task configurations, InSUR achieves at least a 1.19× improvement in average ASR and at least 1.08× in minimum ASR.
  • On the ViT-B surrogate, ASR reaches 69.7%, substantially outperforming VENOM at 40.6%.

Abstract Label Escape Task

| Surrogate | Method | Acc. ↓ | ASR ↑ | CLIP-Q ↑ |
|---|---|---|---|---|
| ResNet50 | VENOM | 51.0% | 34.9% | 0.779 |
| ResNet50 | InSUR | 35.2% | 47.9% | 0.808 |
| ViT-B | VENOM | 46.3% | 40.3% | 0.780 |
| ViT-B | InSUR | 28.7% | 55.4% | 0.814 |

Highlights & Insights

  • ⭐ First method to achieve reference-free 3D semantic adversarial example generation from natural language instructions.
  • ⭐ ResAdv-DDIM resolves adversarial direction inconsistency in multi-step diffusion models via residual shortcut prediction.
  • ⭐ The WordNet-based evaluation framework provides a principled definition of semantic boundaries for SemanticAE.
  • The paper systematically decomposes instruction uncertainty into three dimensions and addresses each independently.
  • InSUR achieves an excellent Pareto frontier balancing semantic preservation (low LPIPS) and attack effectiveness (high ASR).

Limitations & Future Work

  • The \(\epsilon\) parameter requires manual tuning and may vary across scenarios.
  • 3D generation relies on the Trellis framework; generalization to other 3D generative models remains unverified.
  • Evaluation metrics avoid FID/IS (due to concerns about adversarial interference), but no alternative generation quality metrics are provided.
  • The selection strategy for the number of residual steps \(k \in \{1,2,3,4\}\) in ResAdv-DDIM is not thoroughly discussed.
  • Generation time (7.26s) is slower than VENOM (3.09s), though still faster than SD-NAE (24.43s).

Method Comparison

| Method | Generation Form | Transfer Attack | 3D Support | Semantic Constraint |
|---|---|---|---|---|
| AdvDiff | Perturbation-based | Weak | No | Implicit |
| SD-NAE | Generative | Moderate | No | End-to-end optimization |
| VENOM | Generative | Moderate | No | Sampling process modification |
| InSUR | Generative | Strong | Yes | Multi-dimensional uncertainty reduction |

  • The residual prediction concept in ResAdv-DDIM can be adapted to other diffusion model control tasks.
  • The semantic abstraction evaluation methodology can be generalized as a universal semantic consistency assessment framework.
  • The guidance masking partition strategy is applicable to controllable image editing (e.g., foreground/background guidance separation).
  • 3D adversarial example generation has direct application value for autonomous driving safety evaluation.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ (SemanticAE conceptualization + first 3D implementation)
  • Experimental Thoroughness: ⭐⭐⭐⭐ (multi-surrogate, multi-task, comprehensive 2D/3D experiments)
  • Writing Quality: ⭐⭐⭐ (dense content, complex notation system, moderate readability)
  • Value: ⭐⭐⭐⭐ (provides new tools for AI safety evaluation)