
Granular Concept Circuits: Toward a Fine-Grained Circuit Discovery for Concept Representations

Conference: ICCV 2025 arXiv: 2508.01728 Code: https://github.com/daheekwon/GCC Area: Interpretability Keywords: Interpretability, Visual Circuit Discovery, Concept Representations, Neuron Connectivity, Mechanistic Interpretability

TL;DR

This paper proposes Granular Concept Circuit (GCC), a method that automatically discovers fine-grained visual circuits encoding specific concepts in deep visual models by iteratively evaluating inter-neuron functional dependency (Neuron Sensitivity Score) and semantic consistency (Semantic Flow Score). GCC is the first method capable of discovering multiple concept-level circuits within a single query.

Background & Motivation

Deep visual models form concept representations through hierarchical architectures—from low-level edges and textures to high-level objects and scenes. Understanding how these concepts are encoded within models is a core problem in explainable AI.

Limitations of existing methods:

Single-neuron analysis (NetDissect, CLIP-Dissect, etc.): Associates concepts with individual neurons, ignoring the distributed nature of representations—concepts are encoded collaboratively across multiple neurons and layers.

VCC (Visual Concept Connectome): Analyzes inter-layer connections using concept activation vectors (CAVs), but is misaligned with network structure and cannot precisely localize where concepts emerge.

ADVC (Rajaram et al.): Iteratively discovers circuits via gradient × activation cross-layer attribution, but constructs only a single unified circuit tied to class labels, lacking concept-level granularity.

In short, none of these methods decomposes a model's response into multiple concept-level circuits: distinct concepts (e.g., sky, flag, clock) are conflated within a single circuit.

Method

Overall Architecture

GCC aims to discover multiple fine-grained concept circuits for a given query, each corresponding to a specific concept related to that query. The pipeline proceeds as follows: (1) extract root nodes → (2) evaluate cross-layer connections → (3) iteratively trace until no further connections are found → (4) repeat for all root nodes to obtain a complete set of circuits.

Key Designs

  1. Neuron Sensitivity Score (\(S_{NS}\)): An intervention-based measure of functional dependency.

Connection strength is quantified by muting a source neuron and observing the resulting change in the target neuron's activation:

\(\tilde{S}_{NS,c} = \max\!\big(0,\; f^{l+1}(a^l) - f^{l+1}(\hat{a}_c^l)\big), \qquad S_{NS,c} = \frac{\tilde{S}_{NS,c}}{\sum_{c'} \tilde{S}_{NS,c'}}\)

where \(\hat{a}_c^l\) denotes the layer-\(l\) activation after zeroing out the \(c\)-th neuron. A high \(S_{NS}\) indicates that the target neuron strongly depends on the source neuron. Positive clipping is applied to focus exclusively on positive correlations.

  • Design Motivation: Causal intervention captures true dependency more accurately than gradients.
  • A first-order approximation (per-neuron intervention) is used to avoid the \(O(2^{|N|})\) cost of exhaustive combinatorial search.
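The per-neuron intervention behind \(S_{NS}\) can be sketched in a few lines. This is a toy illustration with a made-up one-layer "network" (the weight matrix `W` and the function `f_next` are our assumptions, not the authors' implementation); it only shows the mute-and-measure logic: zero one source neuron, record the positive drop in the target neuron, then normalise.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 8))           # hypothetical layer-(l+1) weights

def f_next(a):
    """Toy stand-in for f^{l+1}: maps layer-l activations to layer-(l+1)."""
    return np.maximum(0.0, W @ a)     # ReLU, as in typical CNN blocks

def neuron_sensitivity(a_l, target):
    """S_NS of every source neuron c w.r.t. one target neuron.

    Mutes (zeroes) each source neuron in turn, measures the drop in the
    target neuron's activation, clips negatives (positive dependencies
    only), and normalises over all source neurons.
    """
    base = f_next(a_l)[target]
    raw = np.zeros(len(a_l))
    for c in range(len(a_l)):
        a_hat = a_l.copy()
        a_hat[c] = 0.0                # the intervention: zero out neuron c
        raw[c] = max(0.0, base - f_next(a_hat)[target])
    total = raw.sum()
    return raw / total if total > 0 else raw

a_l = np.maximum(0.0, rng.normal(size=8))   # a non-negative layer-l activation
scores = neuron_sensitivity(a_l, target=2)
print(scores)
```

In a real model the same loop would run over channels of a convolutional layer, with `f_next` replaced by a forward pass through the next block; the exhaustive per-neuron loop is exactly the first-order approximation the paper describes.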

  2. Semantic Flow Score (\(S_{SF}\)): A semantic consistency constraint.

\(S_{SF} = \frac{|\mathcal{S}_{src} \cap \mathcal{S}_{tgt}|}{|\mathcal{S}_{src}|}\)

where \(\mathcal{S}_{src}\) and \(\mathcal{S}_{tgt}\) are the top-\(k\) highly activated sample sets for the source and target neurons, respectively. High overlap indicates that both neurons encode similar semantic information.

  • Design Motivation: A high \(S_{NS}\) alone is insufficient—nonlinearities may produce spurious connections (functionally dependent but semantically unrelated); \(S_{SF}\) filters out such false connections.
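The overlap computation for \(S_{SF}\) is straightforward set arithmetic over a shared reference pool. A minimal sketch, assuming each neuron's per-sample activations over the pool are already available as a vector (function and variable names are ours):

```python
import numpy as np

def semantic_flow(act_src, act_tgt, k=5):
    """S_SF: fraction of the source neuron's top-k activating samples
    that also appear in the target neuron's top-k set.

    act_src / act_tgt hold one activation value per reference sample
    (e.g. over the ImageNet1K validation pool).
    """
    top_src = set(np.argsort(act_src)[-k:])   # indices of the k largest
    top_tgt = set(np.argsort(act_tgt)[-k:])
    return len(top_src & top_tgt) / len(top_src)

rng = np.random.default_rng(1)
shared = rng.random(100)
# Two neurons driven by the same underlying signal overlap heavily...
s_same = semantic_flow(shared + 0.01 * rng.random(100),
                       shared + 0.01 * rng.random(100))
# ...while unrelated neurons mostly do not.
s_diff = semantic_flow(rng.random(100), rng.random(100))
print(s_same, s_diff)
```

A neuron pair with high \(S_{NS}\) but near-zero \(S_{SF}\) is exactly the spurious-connection case the filter is designed to reject.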

  3. Circuit Construction Algorithm:

  • Root node extraction: Selects neurons whose activations rank in the top 1% across all samples.

  • Connection criterion: A connection is accepted when both \(S_{NS} > \tau_{NS}\) and \(S_{SF} > \tau_{SF}\); \(\tau_{NS}\) is determined automatically via the Peak-over-Threshold (POT) method from extreme value theory; \(\tau_{SF}\) uses the mean score across all nodes.
  • Iterative expansion: Newly added nodes serve as new starting points, extending the search to the next layer until no further qualifying connections are found.
  • Efficient computation: Previously computed source-node connections are reused, and recursive techniques avoid redundant computation.
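The iterative expansion above amounts to a frontier search across layers: accept an edge only when both scores pass their thresholds, seed the next layer with newly accepted neurons, and stop when no connection qualifies. The sketch below stubs the two score tables with random matrices and uses fixed thresholds in place of the POT-derived \(\tau_{NS}\) and mean-based \(\tau_{SF}\); everything here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
LAYERS, WIDTH = 4, 6
# s_ns[l][i, j]: sensitivity of layer-(l+1) neuron j to layer-l neuron i
s_ns = [rng.random((WIDTH, WIDTH)) for _ in range(LAYERS - 1)]
s_sf = [rng.random((WIDTH, WIDTH)) for _ in range(LAYERS - 1)]

def build_circuit(root, tau_ns=0.8, tau_sf=0.5):
    """Expand one circuit from a root neuron (layer, index).

    A connection (l, i) -> (l+1, j) is kept only when BOTH S_NS and
    S_SF exceed their thresholds; accepted neurons become the next
    frontier, and expansion stops when no connection qualifies.
    """
    circuit, frontier = {root}, {root}
    edges = []
    while frontier:
        next_frontier = set()
        for (l, i) in frontier:
            if l >= LAYERS - 1:          # last layer: nothing to trace into
                continue
            for j in range(WIDTH):
                if s_ns[l][i, j] > tau_ns and s_sf[l][i, j] > tau_sf:
                    edges.append(((l, i), (l + 1, j)))
                    if (l + 1, j) not in circuit:
                        next_frontier.add((l + 1, j))
        circuit |= next_frontier
        frontier = next_frontier
    return circuit, edges

circuit, edges = build_circuit(root=(0, 0))
print(len(circuit), len(edges))
```

Repeating `build_circuit` over every root neuron (the paper's top-1% activations) yields the full set of concept circuits for a query; caching score rows per source node gives the reuse the paper mentions.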

Loss & Training

GCC is a post-hoc analysis method that does not involve any training. It operates directly on pre-trained models (VGG19, ResNet50/101, MobileNetV3, ViT, etc.) using the ImageNet1K validation set as the reference sample pool.

Key Experimental Results

Main Results

Faithfulness and Completeness Evaluation (tested on 100 random ImageNet1K queries; logit changes are observed after ablating neurons inside/outside the circuit):

| Ablation condition | ResNet50 | ResNet101 | VGG19 | MobileNetV3 | Mean drop |
|---|---|---|---|---|---|
| Original (no ablation) | 17.17 | 17.46 | 20.94 | 17.34 | |
| Random neuron ablation | 15.66 | 13.80 | 19.03 | 15.01 | ▼2.35 |
| Ablate neurons inside GCC | 6.41 | 6.18 | 12.93 | 12.95 | ▼8.60 |
| Ablate neurons outside GCC | 16.12 | 14.58 | 19.93 | 15.88 | ▼1.74 |

Ablating neurons inside GCC causes a substantial drop in logits (8.60), far exceeding random ablation (2.35); ablating neurons outside GCC has minimal impact (1.74), demonstrating that the discovered circuits are both faithful and complete.
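The evaluation protocol can be sketched with a toy readout: compare the class logit before and after zeroing different neuron subsets. The single linear "model" and the hand-picked circuit mask below are our assumptions, used only to show the three ablation conditions.

```python
import numpy as np

rng = np.random.default_rng(3)
hidden = rng.random(32)                 # hidden activations for one query
circuit = np.arange(8)                  # pretend GCC found these 8 neurons
readout = 0.05 * rng.random(32)         # diffuse background signal...
readout[circuit] += 1.0                 # ...plus strong in-circuit weights

def logit(acts, ablate=()):
    """Class logit after muting the chosen neurons."""
    a = acts.copy()
    a[list(ablate)] = 0.0
    return float(readout @ a)

base = logit(hidden)
drop_inside = base - logit(hidden, ablate=circuit)          # ablate circuit
drop_outside = base - logit(hidden, ablate=np.arange(8, 32))  # ablate rest
print(drop_inside, drop_outside)
```

Faithfulness corresponds to a large `drop_inside`; completeness corresponds to a small `drop_outside`, mirroring the 8.60 vs. 1.74 pattern in the table.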

User Study & Edge Ablation

33 participants rated GCC on five dimensions (5-point scale):

| Evaluation dimension | Result |
|---|---|
| Query relevance: whether the circuit relates to the query | 3.65 / 5 |
| Diversity: whether the circuits capture a variety of concepts | 4.00 / 5 |
| Prototypicality: whether circuits represent commonalities across multiple queries | 4.45 / 5 |
| Connection plausibility: whether node–node and query–node connections are reasonable | >90% rated plausible |
| Comparison with VCC: which method captures more meaningful concepts | 70% preferred GCC |

Edge ablation/insertion experiments: edges are removed or added in order of \(S_{NS}\) rank; removing high-ranked edges leads to rapid performance degradation, while adding them yields significant gains.
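The ordering used in that sweep is simply a rank over edge scores: ablate (or insert) edges from highest \(S_{NS}\) down. A trivial sketch with made-up edge names and scores:

```python
# Hypothetical (edge, S_NS) pairs; real edges come from the discovered circuit.
edges = [("n1->n4", 0.41), ("n2->n4", 0.07), ("n3->n5", 0.28)]

# Ablation sweep order: highest-S_NS edge is removed first, so the
# performance curve should fall fastest at the start of the sweep.
removal_order = [name for name, _ in
                 sorted(edges, key=lambda e: e[1], reverse=True)]
print(removal_order)
```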

Key Findings

  • First concept-level decomposition of visual circuits: A single "scoreboard" image can be decomposed into 17 GCCs, each corresponding to a distinct concept such as sky background, flag, and clock.
  • Generalization across models and datasets: Validated on both CNNs (VGG19, ResNet50/101, MobileNetV3) and Transformers (ViT).
  • Hierarchical concept evolution: The "blue texture" concept in a peacock image progressively refines from "broad blue tones" in the first layer to "blue scales" → "structured blue patterns" → "decorative blue patterns".
  • Cross-category shared concept discovery: The method identifies shared "radial" concepts across different categories (daisy and peacock) and "wheel" concepts across tanks, minibuses, and trucks.
  • Misclassification auditing: The specific concept circuits responsible for misclassification can be localized and verified through stimulation/suppression.

Highlights & Insights

  • Complementary dual-score design: \(S_{NS}\) captures functional dependency while \(S_{SF}\) ensures semantic consistency—both are indispensable, as relying solely on \(S_{NS}\) yields spurious connections.
  • Automatic thresholding via POT: Eliminates manual hyperparameter tuning, enhancing practical usability.
  • Paradigm shift from "one circuit" to "multiple circuits": Prior methods (VCC, ADVC) construct a single unified circuit per query; GCC is the first to decompose it into multiple concept-specific circuits.
  • Balance between interpretability and practicality: Application scenarios such as misclassification auditing and cross-category concept discovery offer tangible practical value.
  • Sankey diagram visualization: Connection strength is conveyed via link width, and neuron semantics are illustrated through high-activation sample crops, yielding an intuitive and informative visualization scheme.

Limitations & Future Work

  • Connection evaluation relies on a first-order approximation via per-neuron intervention, which may miss higher-order interactions.
  • Under strict thresholds, a single concept may be distributed across multiple circuit paths.
  • Some high-\(S_{NS}\) connections remain difficult to interpret in human-understandable language.
  • Validation is limited to classification models; circuit discovery in generative models (GAN/Diffusion) merits future exploration.
  • The reference dataset for semantic coverage (ImageNet1K validation set) may introduce bias, as activation samples for certain concepts may be insufficiently abundant.
  • The approach could be extended to video models and larger-scale Transformers (e.g., ViT-L/H).
Relation to Prior Work

  • Paralleling mechanistic interpretability in NLP (Conmy et al.), GCC is the first to achieve fine-grained circuit discovery in the visual domain.
  • CRP computes conditional relevance but is limited to pairwise layers; CRAFT back-propagates from the classifier layer and can only extract classification-relevant concepts. GCC propagates forward and discovers a broader range of concepts.
  • The design is inspired by Hebbian learning and synaptic plasticity in neuroscience: functional connectivity combined with information-flow preservation.
  • GCC's forward circuit discovery can identify abstract features shared across categories, which is beyond the reach of methods that back-propagate from class logits (e.g., ADVC).

Rating

  • Novelty: ⭐⭐⭐⭐⭐ First to achieve concept-level fine-grained circuit discovery in visual models; the dual-score design is original and theoretically grounded.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Covers quantitative faithfulness/completeness validation, user study, edge ablation, and evaluation across multiple models and datasets.
  • Writing Quality: ⭐⭐⭐⭐ Concepts are articulated clearly with rich visualizations; algorithm pseudocode is well-structured.
  • Value: ⭐⭐⭐⭐ Provides a new tool for visual model interpretability; application scenarios such as misclassification auditing carry practical significance.