Uncovering Graph Reasoning in Decoder-only Transformers with Circuit Tracing¶
Conference: NeurIPS 2025 (Workshop on Efficient Reasoning)
arXiv: 2509.20336
Code: None
Area: Interpretability / Mechanistic Interpretability
Keywords: Graph Reasoning, Transformer, Circuit Tracing, Token Merging, Structural Memorization
TL;DR¶
This paper applies a circuit tracing framework to analyze the internal mechanisms of decoder-only Transformers on graph reasoning tasks, uncovering two core reasoning mechanisms: token merging and structural memorization.
Background & Motivation¶
Transformer-based LLMs have demonstrated strong performance on graph reasoning tasks (e.g., path finding, subgraph extraction), yet their internal reasoning mechanisms remain a black box. Existing interpretability work has several limitations:
Lack of a unified perspective: Mechanistic analyses of different graph reasoning tasks lack coherent connections.
Methodological limitations: Traditional approaches such as attention visualization are insufficient to reveal deeper mechanisms.
Model scope: Most analyses target encoder or encoder-decoder architectures, leaving decoder-only architectures understudied.
This paper employs the circuit tracing framework to conduct a unified analysis of the internal mechanisms underlying graph reasoning in a basic decoder-only Transformer.
Method¶
Overall Architecture¶
- Train small-scale decoder-only Transformers on graph reasoning data (see the data-construction sketch after this list).
- Apply circuit tracing techniques to track information flow.
- Visualize reasoning trajectories to identify core computational patterns.
- Quantitatively analyze the relationship between these patterns and task performance.
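As a concrete illustration of the data side of this pipeline, here is a minimal sketch assuming an edge-list textualization and a path-existence query; the serialization format and the `networkx` helpers are illustrative assumptions, not the paper's exact setup:

```python
# Illustrative only: one way to textualize a graph and build a training string.
import networkx as nx

def textualize_graph(g: nx.Graph) -> str:
    """Serialize the graph as a space-separated edge list, e.g. "0-1 1-2 2-3"."""
    return " ".join(f"{u}-{v}" for u, v in g.edges())

def make_path_example(g: nx.Graph, src: int, dst: int) -> str:
    """Build one autoregressive training string: graph, query, then answer."""
    answer = "yes" if nx.has_path(g, src, dst) else "no"
    return f"graph: {textualize_graph(g)} query: path {src} {dst} ? answer: {answer}"

g = nx.gnp_random_graph(n=8, p=0.3, seed=0)  # random graph with edge probability 0.3
print(make_path_example(g, 0, 7))
```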
Key Designs¶
- Circuit Tracing Framework:
  - Based on the circuit discovery methodology proposed by Anthropic.
  - Traces causal information flow from input to output.
  - Identifies critical attention heads and MLP neurons.
- Two Core Mechanisms Discovered:
  - Token Merging:
    - Specific attention heads aggregate graph-structural information (nodes and edges) into a single token position.
    - Functions as an "information aggregation point," concentrating distributed graph-structural encodings.
    - Particularly critical in path reasoning tasks.
  - Structural Memorization:
    - MLP layers store "templates" of common graph structural patterns.
    - During inference, relevant structures are retrieved via pattern matching.
    - Plays a dominant role in subgraph extraction tasks.
- Analysis Dimensions:
  - Effect of graph density on the mechanisms.
  - Effect of model scale on the mechanisms.
  - Contribution ratio of the two mechanisms across different task types (see the ablation sketch after this list).
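The contribution quantification above can be pictured with a head-ablation sketch in the spirit of activation patching; everything model-specific here (`model.blocks[layer].attn`, a concatenated head layout, the `.logits` attribute, exact-match scoring on the final token) is an assumption for illustration, not the paper's implementation:

```python
import torch

def ablate_head(attn_module: torch.nn.Module, head: int, head_dim: int):
    """Zero out one attention head's slice of the attention output via a forward hook.

    Assumes head outputs are concatenated along the last dimension.
    """
    def hook(module, inputs, output):
        out = output[0] if isinstance(output, tuple) else output
        out = out.clone()
        out[..., head * head_dim:(head + 1) * head_dim] = 0.0
        return (out,) + tuple(output[1:]) if isinstance(output, tuple) else out
    return attn_module.register_forward_hook(hook)

@torch.no_grad()
def head_contribution(model, batch, layer: int, head: int, head_dim: int) -> float:
    """Proxy for a head's contribution: accuracy drop when the head is ablated."""
    def answer_accuracy() -> float:
        logits = model(batch["input_ids"]).logits   # (batch, seq, vocab), assumed API
        preds = logits[:, -1].argmax(dim=-1)         # predict the final answer token
        return (preds == batch["answer_ids"]).float().mean().item()

    baseline = answer_accuracy()
    handle = ablate_head(model.blocks[layer].attn, head, head_dim)  # assumed module layout
    ablated = answer_accuracy()
    handle.remove()
    return baseline - ablated
```

Summing such accuracy drops over the heads and MLPs attributed to each mechanism, then normalizing per task, is one plausible way to arrive at contribution percentages of the kind reported below.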
Loss & Training¶
Training uses the standard autoregressive language modeling loss: \(\mathcal{L} = -\sum_t \log P(y_t \mid y_{<t}, G)\),
where \(G\) denotes the textualized representation of the input graph.
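A minimal PyTorch sketch of this objective, under the assumption that the loss is computed only on tokens after the textualized graph/query prefix (whether the paper also trains on prefix tokens is not stated); the function name and shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def graph_lm_loss(logits: torch.Tensor, tokens: torch.Tensor, prefix_len: int) -> torch.Tensor:
    """Autoregressive loss -sum_t log P(y_t | y_<t, G).

    logits: (batch, seq, vocab) from the decoder-only Transformer
    tokens: (batch, seq) token ids for "textualized graph + query + answer"
    prefix_len: number of graph/query prefix tokens excluded from the loss
    """
    # Shift so that position t predicts token t+1.
    pred = logits[:, :-1].reshape(-1, logits.size(-1))
    target = tokens[:, 1:].reshape(-1)
    per_token = F.cross_entropy(pred, target, reduction="none").view(tokens.size(0), -1)
    # Mask out predictions that fall inside the graph/query prefix.
    keep = (torch.arange(tokens.size(1) - 1, device=tokens.device) >= prefix_len - 1).float()
    keep = keep.expand_as(per_token)
    return (per_token * keep).sum() / keep.sum().clamp(min=1.0)
```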
Key Experimental Results¶
Main Results (Graph Reasoning Task Accuracy, %)¶
| Model Scale | Path Detection ↑ | Shortest Path ↑ | Cycle Detection ↑ | Subgraph Matching ↑ | Connected Components ↑ |
|---|---|---|---|---|---|
| 2L-4H | 72.3 | 58.2 | 68.5 | 65.1 | 71.8 |
| 4L-8H | 89.5 | 75.8 | 84.2 | 82.6 | 87.3 |
| 6L-8H | 95.2 | 86.5 | 91.8 | 90.3 | 93.5 |
| 8L-16H | 97.8 | 92.1 | 95.5 | 94.7 | 96.2 |
Mechanism Contribution Quantification¶
| Task Type | Token Merging (%) | Structural Memorization (%) | Other (%) |
|---|---|---|---|
| Path Detection | 62.5 | 25.3 | 12.2 |
| Shortest Path | 58.8 | 28.5 | 12.7 |
| Cycle Detection | 45.2 | 42.8 | 12.0 |
| Subgraph Matching | 32.1 | 55.6 | 12.3 |
| Connected Components | 55.3 | 32.5 | 12.2 |
Graph Density Impact Analysis¶
| Graph Density | Token Merging Effectiveness ↑ | Structural Memorization Effectiveness ↑ | Overall Accuracy (%) ↑ |
|---|---|---|---|
| Sparse (d=0.1) | 0.92 | 0.85 | 96.5 |
| Medium (d=0.3) | 0.85 | 0.78 | 91.2 |
| Dense (d=0.5) | 0.72 | 0.65 | 82.8 |
| Very Dense (d=0.7) | 0.58 | 0.52 | 71.3 |
Key Findings¶
- Token merging dominates path-type tasks: Tasks requiring information propagation along paths rely more heavily on token merging.
- Structural memorization dominates matching-type tasks: Tasks requiring the identification of specific patterns rely more heavily on structural memorization.
- Graph density has a significant impact: The efficiency of both mechanisms degrades in dense graphs due to information overload.
- Model scale positively correlates with mechanism complexity: Larger models learn more refined token merging strategies.
Highlights & Insights¶
- Unified explanatory framework: This is the first work to reveal the graph reasoning mechanisms of decoder-only Transformers from a unified perspective.
- Dual-mechanism discovery: Token merging and structural memorization together provide an intuitive understanding of the reasoning process.
- Actionable insights: Understanding these mechanisms can inform the design of more efficient graph reasoning models.
Limitations & Future Work¶
- Experiments are conducted only on small-scale Transformers; mechanisms in large models (e.g., GPT-4) may differ.
- The textualization scheme for graphs may influence the conclusions drawn.
- As a workshop paper, it has limited experimental depth, which warrants further extension.
- The reliability of the circuit tracing methodology itself remains a subject of debate.
Related Work & Insights¶
- Anthropic Circuit Tracing: The methodological foundation of this work.
- Mechanistic Interpretability: The series of works by Elhage et al.
- GraphQA: A graph reasoning benchmark.
- Theoretical Analysis of Transformers: Yun et al.'s analysis of Transformer expressivity.
Rating¶
| Dimension | Score (1–5) |
|---|---|
| Novelty | 4 |
| Theoretical Depth | 3 |
| Experimental Thoroughness | 3 |
| Writing Quality | 4 |
| Value | 3 |
| Overall Recommendation | 3.5 |