# TreeTeaming: Autonomous Red-Teaming of Vision-Language Models via Hierarchical Strategy Exploration
- Conference: CVPR 2026
- arXiv: 2603.22882
- Code: https://github.com/ChunXiaostudy/TreeTeaming
- Area: Multimodal VLM
- Keywords: red-teaming, vision-language model safety, strategy tree exploration, automated vulnerability discovery, jailbreak attack
## TL;DR
This paper proposes TreeTeaming, an autonomous red-teaming framework that transforms strategy exploration from static testing into a dynamic evolutionary process. An LLM orchestrator autonomously constructs and expands a hierarchical strategy tree, while a multimodal executor carries out concrete attacks. TreeTeaming achieves state-of-the-art attack success rates on 11 out of 12 evaluated VLMs, reaching 87.60% on GPT-4o.
## Background & Motivation
- Background: As VLMs grow more capable, their safety has attracted increasing attention. Red-teaming is a critical methodology for systematically identifying model vulnerabilities.
- Limitations of Prior Work: Existing methods are constrained by predefined strategies (fixed prompt templates, typographic obfuscation, or static image patterns) and can only optimize within a known strategy space; they fail to discover novel attack vectors.
- Key Challenge: Even methods with feedback mechanisms (e.g., TRUST-VLM) can only refine test cases within a predefined framework; the strategies themselves still require manual design.
- Goal: To automate the discovery of attack strategies themselves, rather than merely automating the execution of known strategies.
- Key Insight: Starting from a single seed example, a complete strategy system is grown through hierarchical exploration over a tree structure.
- Core Idea: A strategy orchestrator autonomously decides whether to "deepen promising attack paths" or "explore new strategy branches," thereby constructing a strategy tree.
## Method

### Overall Architecture
Two core components: (1) a Strategy Orchestrator (an LLM) that maintains a hierarchical strategy tree and autonomously determines the direction of exploration (deepening vs. branching); and (2) a Multimodal Executor that carries out concrete attack strategies using a pluggable tool suite, with a built-in consistency checker verifying that final samples align with the intended attacks.
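How these two components interact can be sketched as a simple loop. This is a minimal sketch under assumed interfaces; every class and method name here (`propose`, `realize`, `is_consistent`, `query`, `judge`, `record`) is illustrative and not taken from the paper's released code:

```python
# Illustrative orchestrator/executor loop. All interfaces are assumptions,
# not the paper's released API.

def red_team_loop(orchestrator, executor, checker, tree, target_vlm, budget: int):
    """Grow the strategy tree by alternating proposal, execution, and verification."""
    for _ in range(budget):
        # 1) The LLM orchestrator inspects the tree and proposes where to expand:
        #    deepen a promising path or branch into a new strategy family.
        node = orchestrator.propose(tree)
        # 2) The multimodal executor turns the abstract strategy into a concrete
        #    attack sample (e.g., an image-text pair) via its pluggable tools.
        sample = executor.realize(node.strategy)
        # 3) The consistency checker filters samples that drifted from the
        #    intended strategy before they are ever sent to the target.
        if not checker.is_consistent(sample, node.strategy):
            continue
        # 4) Query the target VLM and record the outcome on the tree node,
        #    which informs the orchestrator's next deepen/explore decision.
        response = target_vlm.query(sample)
        node.record(success=orchestrator.judge(response))
    return tree
```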
### Key Designs
- Hierarchical Strategy Tree: Abstract concepts serve as parent nodes, and concrete attack strategies serve as leaf nodes; the orchestrator expands the tree autonomously.
- Deepen vs. Explore Decision: Based on attack success rates and strategy diversity, the LLM orchestrator autonomously decides whether to go deeper (refining existing strategies) or sideways (exploring new strategy branches); one way to make this trade-off concrete is sketched after this list.
- Consistency Checker: Verifies that execution results align with the intended strategy, addressing the problem of strategy drift.
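The paper describes the deepen-vs.-explore decision only qualitatively: the orchestrator weighs observed success rates against strategy diversity. A minimal sketch of a tree node plus a UCB-style score is one plausible instantiation of that trade-off; the fields and the scoring formula below are assumptions, not the paper's actual mechanism (which is carried out by LLM reasoning):

```python
from dataclasses import dataclass, field
import math

@dataclass
class StrategyNode:
    """A node in the strategy tree: abstract concepts are inner nodes,
    concrete attack strategies are leaves. Fields beyond the tree shape
    are illustrative assumptions."""
    strategy: str
    children: list["StrategyNode"] = field(default_factory=list)
    attempts: int = 0
    successes: int = 0

    def record(self, success: bool) -> None:
        self.attempts += 1
        self.successes += int(success)

def deepen_vs_explore_score(node: StrategyNode, total_attempts: int, c: float = 1.4) -> float:
    """UCB-style heuristic: exploit branches with high observed success,
    but keep exploring rarely visited ones."""
    if node.attempts == 0:
        return float("inf")  # untested branches get tried at least once
    exploit = node.successes / node.attempts  # observed attack success rate
    explore = c * math.sqrt(math.log(total_attempts + 1) / node.attempts)
    return exploit + explore
```

Under such a score, high-ASR branches keep getting deepened while low-visit siblings are still sampled, mirroring the deepen-vs.-explore behavior the paper attributes to the orchestrator.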
## Loss & Training

No training is required: an LLM serves as the orchestrator, and multimodal tools execute the attacks.
## Key Experimental Results

### Main Results
| Model | TreeTeaming ASR | Best Baseline ASR |
|---|---|---|
| GPT-4o | 87.60% | ~70% |
| Claude-3 | ~80% | ~65% |
| LLaVA | ~90% | ~75% |

Across the full evaluation, TreeTeaming achieves state-of-the-art ASR on 11 of the 12 tested VLMs.
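For reference, ASR is conventionally the fraction of attack attempts that an evaluator judges successful. The paper's exact judging protocol is not reproduced here, and the sample counts below are made up purely for illustration:

```python
def attack_success_rate(judgments: list[bool]) -> float:
    """ASR in percent: share of attack attempts judged successful."""
    return 100.0 * sum(judgments) / max(len(judgments), 1)

# Illustrative only: 219 successes in 250 attempts would yield 87.6% ASR.
print(attack_success_rate([True] * 219 + [False] * 31))  # 87.6
```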
### Key Findings
- The set of discovered strategies is more diverse than the union of all publicly known attack strategies.
- Generated attacks exhibit an average toxicity reduction of 23.09% — making them more covert and harder to detect by safety filters.
- A rich strategy tree can be grown from a single seed example.
### Newly Discovered Attack Strategy Categories

| Strategy Category | Example | Status |
|---|---|---|
| Typographic obfuscation | Embedding textual instructions in images | Known |
| Role-play induction | Constructing fictional scenarios to bypass moderation | Known |
| Semantic camouflage | Framing content within academic/educational contexts | Newly discovered |
| Multi-turn progressive elicitation | Gradually guiding toward target content | Newly discovered |
| Visual-semantic conflict | Misleading through image-text semantic contradiction | Newly discovered |
### Attack Toxicity Comparison

- As noted in the key findings, attacks generated by TreeTeaming show an average toxicity reduction of 23.09%, making them more covert.
- Their attack success rates are simultaneously higher, indicating that reduced toxicity actually enhances the ability to bypass safety filters.
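One plausible way to arrive at a figure like the 23.09% above is a relative drop in mean toxicity score between baseline and TreeTeaming attacks. The scorer, the [0, 1] score range, and the numbers below are assumptions for illustration, not the paper's protocol:

```python
def mean_relative_toxicity_reduction(baseline: list[float], ours: list[float]) -> float:
    """Percent drop in mean toxicity, with scores assumed in [0, 1]
    from some external toxicity classifier."""
    base_mean = sum(baseline) / len(baseline)
    ours_mean = sum(ours) / len(ours)
    return 100.0 * (base_mean - ours_mean) / base_mean

# Illustrative numbers only: mean toxicity 0.52 -> 0.40 is a ~23.1% reduction.
print(round(mean_relative_toxicity_reduction([0.52], [0.40]), 2))  # 23.08
```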
## Highlights & Insights
- "Automatically discovering strategies" rather than "automatically executing strategies" represents a paradigm-level innovation.
- The strategy tree structure makes discovered attacks interpretable and traceable.
- The framework is equally valuable for defense research — newly discovered strategies can be directly used to improve safety alignment.
## Limitations & Future Work
- The framework relies on a powerful LLM as the orchestrator, which incurs high cost; weaker models may be unable to effectively explore the strategy space.
- The ethical boundaries of automated attack generation require careful consideration, as newly discovered attack vectors could be maliciously exploited.
- Attacking closed-source models (e.g., GPT-4o) depends on API access, which is subject to rate limits and cost constraints.
- The growth rate of the strategy tree may slow with increasing depth, making deeper strategies progressively harder to discover.
- The accuracy of the consistency checker may affect the quality of final attack samples.
- The automatic generation of defensive strategies following attack discovery remains unexplored.
- The definition of attack success rate may be insufficient — some "successful" cases may represent only marginal violations.
## Related Work & Insights
- vs. TRUST-VLM: TRUST-VLM automates test case generation within a fixed strategy framework; TreeTeaming automates the discovery of strategies themselves.
- vs. FigStep/MM-SafetyBench: These methods each represent a single manually designed attack strategy; TreeTeaming can automatically discover these strategies and more.
## Additional Discussion
- The core innovation lies in reframing the problem from optimizing within a single fixed strategy space to exploring the strategy space itself, which provides a more comprehensive view of model vulnerabilities.
- The experimental design covers diverse scenarios and baseline comparisons, with statistically significant results.
- The modular design of the method facilitates extension to related tasks and new datasets.
- Open-sourcing the code and data is of significant value for community reproduction and follow-up research.
- Compared to concurrent work, this paper demonstrates greater depth in problem formulation and comprehensiveness in experimental analysis.
- The paper's logical structure is clear, forming a complete loop from problem definition to method design to experimental validation.
- The computational overhead of the method is reasonable, making it deployable in practical applications.
- Future work could consider integration with additional modalities (e.g., audio, 3D point clouds).
- Validating the scalability of the method on larger-scale data and models is an important direction for future research.
## Rating
- Novelty: ⭐⭐⭐⭐⭐ A paradigm leap from automated execution to automated strategy discovery
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Comprehensive evaluation across 12 models with strategy diversity analysis
- Writing Quality: ⭐⭐⭐⭐ Clear framework presentation and intuitive comparisons
- Value: ⭐⭐⭐⭐⭐ Significant contribution to AI safety research