Partially Shared Concept Bottleneck Models¶
Conference: AAAI 2026 | arXiv: 2511.22170 | Code: github.com/7494zdl/PS-CBM | Area: Interpretability | Keywords: Concept Bottleneck Models, Interpretability, Vision-Language Models, Concept Efficiency, Image Classification
TL;DR¶
This paper proposes PS-CBM, a framework that integrates multimodal concept generation (combining LLM semantics with visual cues from exemplar images), a partially shared concept strategy (merging concepts based on activation patterns), and a Concept-Efficient Accuracy (CEA) evaluation metric. PS-CBM achieves higher classification accuracy and interpretability with fewer concepts across 11 datasets.
Background & Motivation¶
Problem Definition¶
Concept Bottleneck Models (CBMs) insert a layer of human-interpretable concepts between inputs and predictions to enhance model interpretability. While automated concept generation using LLMs and VLMs has reduced the burden of manual annotation, three fundamental challenges remain.
Three Core Challenges¶
1. Poor Visual Grounding¶
- Concepts generated by LLMs are semantically rich but often misaligned with actual visual content.
- VLM-based methods improve visual fidelity but sacrifice class-level semantic consistency at high computational cost.
- A persistent semantic–visual gap undermines both accuracy and interpretability.
2. Concept Redundancy¶
- Independent pool strategy: Concepts are generated independently per class → semantic duplication, with similar concepts redundantly assigned to multiple classes.
- Globally shared strategy: Unified deduplication → reduces redundancy but forces unrelated classes to share a fixed concept pool, harming class discriminability.
- Both strategies compromise model clarity and training stability.
3. Inadequate Metrics¶
- Most CBMs are evaluated solely on classification accuracy, ignoring the interpretability cost of large, redundant concept sets.
- No principled metric exists to capture the trade-off between accuracy and concept efficiency.
- Performance gains may come at the expense of usability.
Paper Goals¶
To design a unified framework that simultaneously addresses all three challenges — introducing a Partially Shared concept strategy that finds an optimal balance between the independent and globally shared extremes.
Method¶
Overall Architecture¶
PS-CBM consists of three stages:
1. Multimodal Concept Generation: generate a concept set by combining LLM semantics with exemplar images.
2. Partially Shared Concept Strategy: merge concepts based on activation patterns and assign them across classes.
3. CBM Training: learn a transparent prediction model via concept supervision.
Key Designs¶
1. Multimodal Concept Generation¶
Bridging the gap between LLM semantics and visual grounding:
- **Few-shot image selection**: For each class \(i\), a diverse few-shot exemplar set \(\bm{X}_i \subset \mathcal{X}_i\) is constructed using CLIP embeddings.
  - Initialized from random images; new samples are iteratively selected to maximize cosine distance from those already selected (see the sketch after this list).
  - Plain random sampling is used instead for noisy datasets (e.g., Food101).
- **Concept generation**: For each class, a prompt combining textual descriptions with the selected exemplars is used to query GPT-4o twice to reduce randomness.
  - After deduplication, a candidate concept set \(\mathcal{S} = \bigcup_{i=1}^{l} \mathcal{S}_i\) is obtained.
  - Each concept \(\bm{c}_j\) is associated with a class set \(\mathcal{C}_j\).
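A minimal sketch of the diversity-driven exemplar selection, assuming precomputed, L2-normalized CLIP image embeddings for a single class; the single-random-image initialization and all names are assumptions about details the summary leaves open:

```python
import numpy as np

def select_exemplars(embeddings: np.ndarray, num_shots: int, seed: int = 0) -> list[int]:
    """Greedy farthest-point selection on L2-normalized CLIP embeddings.

    Starts from a random image and repeatedly adds the image whose maximum
    cosine similarity to the already-selected set is smallest (i.e., whose
    cosine distance is largest), encouraging a visually diverse exemplar set.
    """
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    selected = [int(rng.integers(n))]
    while len(selected) < min(num_shots, n):
        sims = embeddings @ embeddings[selected].T   # (n, |selected|) cosine sims
        nearest = sims.max(axis=1)                   # closeness to the selected set
        nearest[selected] = np.inf                   # never re-pick a selected image
        selected.append(int(nearest.argmin()))       # farthest remaining point
    return selected
```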
2. Partially Shared Concept Strategy (Three-Step Refinement)¶
The core contribution, progressively refining the concept set in three steps:
**Step 1: Concept Filtering**
- Compute an image–concept affinity matrix \(\bm{A}_{i,j} = \cos(\Phi(\bm{x}_i), \Psi(\bm{c}_j))\), where \(\Phi\) is the image encoder and \(\Psi\) is the text encoder.
- A concept \(\bm{c}_j\) is retained if the average of its top-4 affinities with images of its associated classes exceeds a confidence threshold \(\tau_{\text{conf}}\) (see the sketch below).
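A sketch of the filtering rule, assuming a precomputed affinity matrix `A` of shape (images, concepts), per-image labels `y`, and each concept's associated class set; names and minor details are illustrative:

```python
import numpy as np

def filter_concepts(A, y, concept_classes, tau_conf, top=4):
    """Keep concept j if the mean of its `top` highest affinities over
    images belonging to its associated classes exceeds tau_conf."""
    keep = []
    for j, classes in enumerate(concept_classes):
        mask = np.isin(y, list(classes))        # images of concept j's classes
        scores = np.sort(A[mask, j])[-top:]     # top-4 affinities (ascending sort)
        if scores.size and scores.mean() > tau_conf:
            keep.append(j)
    return keep
```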
**Step 2: Concept Merging**
- Compute a correlation matrix \(\bm{Q}\) over the filtered concepts: \(\bm{Q}_{i,j} = \frac{\bm{A}_{:,i}^\top \bm{A}_{:,j}}{\|\bm{A}_{:,i}\| \, \|\bm{A}_{:,j}\|}\).
- Greedy merging: select the concept with the largest number of mergeable candidates as the representative, and merge into it all concepts whose correlation exceeds the threshold \(\tau_{\text{merge}}\) (see the sketch below).
- A merged concept inherits the union of the original concepts' class sets.
- Each class retains at most its top-\(K\) exclusive concepts (concepts associated with only one class).
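A sketch of the greedy merge on activation patterns; tie-breaking and the per-class top-\(K\) exclusive-concept cap are omitted, and the loop structure is one assumed reading of "largest number of mergeable candidates":

```python
import numpy as np

def merge_concepts(A, concept_classes, tau_merge):
    """Greedily merge concepts whose activation-pattern correlation exceeds tau_merge."""
    # Cosine similarity between concept activation columns (the matrix Q).
    norm = A / (np.linalg.norm(A, axis=0, keepdims=True) + 1e-8)
    Q = norm.T @ norm
    mergeable = Q > tau_merge
    np.fill_diagonal(mergeable, False)
    alive = set(range(A.shape[1]))
    merged = []
    while alive:
        # Representative: the live concept with the most live mergeable partners.
        rep = max(alive, key=lambda j: mergeable[j, list(alive)].sum())
        group = {rep} | {j for j in alive if mergeable[rep, j]}
        # The merged concept inherits the union of its members' class sets.
        classes = set().union(*(concept_classes[j] for j in group))
        merged.append((rep, classes))
        alive -= group
    return merged
```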
**Step 3: Concept Labeling**
- Concept labels are binary (multi-hot rather than one-hot): \(s_{i,j} = 1\) if and only if \(y_i \in \mathcal{C}_j\) and \(\bm{A}_{i,j} > \tau_{\text{conf}}\) (sketch below).
- This yields a concept-annotated dataset \(\mathcal{D}' = \{(\bm{x}_i, \bm{s}_i, y_i)\}_{i=1}^n\).
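The labeling rule reduces to a few lines given the affinity matrix; a minimal sketch, where `class_sets[j]` stands for \(\mathcal{C}_j\) and all names are illustrative:

```python
import numpy as np

def label_concepts(A, y, class_sets, tau_conf):
    """s[i, j] = 1 iff image i's class belongs to concept j's class set
    and the affinity A[i, j] clears the confidence threshold."""
    n, m = A.shape
    in_class = np.array([[y[i] in class_sets[j] for j in range(m)] for i in range(n)])
    return (in_class & (A > tau_conf)).astype(np.float32)
```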
3. CEA Metric (Concept-Efficient Accuracy)¶
A principled, information-theoretic metric that discounts classification accuracy by a logarithmic penalty on the size of the concept set, defined in terms of:
- \(k = \lceil \log_2 l \rceil\): the theoretical minimum number of binary concepts (bits) required to distinguish \(l\) classes, e.g., \(k = 7\) for \(l = 100\).
- \(m\): the number of concepts used.
- \(\beta \geq 0\): a temperature parameter (smaller values emphasize accuracy; larger values emphasize compactness).
Three desirable properties:
- Optimal efficiency: CEA → 1 as ACC → 1 and \(m \to k\).
- Adaptive scaling: logarithmic scaling with base \(k\) adaptively adjusts the penalty to task complexity.
- Theoretical grounding: aligned with Shannon information theory.
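As a reading aid, here is a minimal sketch of a CEA-style score consistent with the three properties above: it equals ACC when \(m = k\), is penalized logarithmically (base \(k\)) as \(m\) grows, and is modulated by \(\beta\). The multiplicative form, function name, and default \(\beta\) are assumptions, not the paper's exact definition:

```python
import math

def cea(acc: float, m: int, num_classes: int, beta: float = 0.15) -> float:
    """Concept-Efficient Accuracy-style score (assumed functional form).

    Discounts accuracy by a penalty that is zero at the information-theoretic
    minimum m = k = ceil(log2(l)) and grows logarithmically (base k) with m.
    """
    k = math.ceil(math.log2(num_classes))   # bits needed to separate l classes
    penalty = beta * math.log(m / k, k)     # log base k; zero when m == k
    return acc / (1.0 + max(penalty, 0.0))  # -> ACC as m -> k; bounded by 1

# Usage: cea(0.783, m=545, num_classes=100) ≈ 0.586
```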
Loss & Training¶
Concept Bottleneck Layer (CBL) Training:
- The backbone encoder \(\bm{\phi}\) is frozen; a projection layer \(\bm{g}: \mathbb{R}^d \to \mathbb{R}^{\hat{m}}\) is trained.
- Loss function: binary cross-entropy

\[
\min_{\bm{g}} \mathcal{L}_{\text{CBL}} = \frac{1}{n}\sum_{i=1}^n \text{BCE}\big(\bm{g}(\bm{\phi}(\bm{x}_i)), \bm{s}_i\big)
\]
Final Classification Layer (FCL) Training:
- A sparse linear classifier \(\bm{f}: \mathbb{R}^{\hat{m}} \to \mathbb{R}^l\).
- Loss function: cross-entropy plus elastic-net regularization

\[
\min_{\bm{f}} \mathcal{L}_{\text{FCL}} = \frac{1}{n}\sum_{i=1}^n \text{CE}\big(\bm{f}(\hat{\bm{g}}(\bm{x}_i)), y_i\big) + \lambda R_\alpha(\bm{W}_F)
\]

- Optimized using the GLM-SAGA optimizer.
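A minimal PyTorch sketch of the two training objectives, assuming precomputed frozen-backbone features; the dimensions, hyperparameter values, and the explicit elastic-net term are placeholders (the paper optimizes the FCL with GLM-SAGA rather than a plain penalized loss):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder dimensions: d = frozen-backbone feature dim, m_hat = number of
# concepts after refinement, l = number of classes. Hyperparameters are assumed.
d, m_hat, l = 1024, 545, 100
lam, alpha = 1e-4, 0.99  # elastic-net strength and L1/L2 mixing weight

cbl = nn.Linear(d, m_hat)   # concept bottleneck layer g
fcl = nn.Linear(m_hat, l)   # final sparse linear classifier f

def cbl_loss(features: torch.Tensor, concept_labels: torch.Tensor) -> torch.Tensor:
    # Stage 1: fit concept logits to the binary annotations s_i (backbone frozen).
    return F.binary_cross_entropy_with_logits(cbl(features), concept_labels)

def fcl_loss(features: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Stage 2: cross-entropy on class labels plus an explicit elastic-net
    # penalty R_alpha on the classifier weights W_F. Writing the penalty out
    # directly is only meant to make the objective concrete; the paper uses
    # the GLM-SAGA solver for this stage.
    logits = fcl(cbl(features).detach())  # concept layer is fixed in stage 2
    W = fcl.weight
    r_alpha = alpha * W.abs().sum() + 0.5 * (1.0 - alpha) * (W ** 2).sum()
    return F.cross_entropy(logits, targets) + lam * r_alpha
```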
Key Experimental Results¶
Main Results (11 Datasets, CLIP_RN50 Backbone)¶
| Method | Avg. ACC (%) ↑ | Avg. # Concepts ↓ | Avg. CEA (%) ↑ |
|---|---|---|---|
| LaBo | 72.8 | 7,900 | 51.6 |
| LF-CBM | 72.9 | 718 | 55.2 |
| LM4CV | 73.4 | 873 | 56.4 |
| DN-CBM | 77.3 | 8,192 | 53.4 |
| Res-CBM | 71.8 | 291 | 56.7 |
| VLG-CBM | 75.2 | 732 | 57.0 |
| V2C-CBM | 72.8 | 7,500 | 51.2 |
| DCBM | 70.9 | 2,048 | 49.5 |
| PS-CBM | 78.3 | 545 | 59.0 |
PS-CBM surpasses prior SOTA by 1.0%–7.4% in average ACC and 2.0%–9.5% in CEA, while using only 545 concepts — 7,647 fewer than DN-CBM.
Ablation Study¶
| Concept Strategy | ACC ↑ | CEA ↑ | Notes |
|---|---|---|---|
| Independent | Lower | Lower | High concept redundancy |
| Globally Shared | Medium | Medium | Unrelated classes forced to share concepts |
| Partially Shared | Highest | Highest | Selective sharing balances specificity and compactness |
Sensitivity to the confidence threshold \(\tau_{\text{conf}}\):

| Confidence Threshold \(\tau_{\text{conf}}\) | Avg. ACC (%) | # Concepts | CEA (%) |
|---|---|---|---|
| 0.10 | 76.20 | 548 | 57.41 |
| 0.15 | 76.14 | 548 | 57.36 |
| 0.20 | 78.35 | 545 | 59.02 |
| 0.25 | 72.71 | 458 | 55.16 |
| 0.30 | 57.55 | 145 | 46.84 |
Analysis of the number of exclusive concepts \(K\): ACC increases sharply from \(K=0\) to \(K=1\) and stabilizes at \(K \geq 2\); CEA peaks at \(K=1\) and decreases thereafter.
CLIP Score Comparison (Domain-Specific Datasets)¶
| Method | DTD | Resisc45 | UCF101 | Concept Generation | Concept Pool |
|---|---|---|---|---|---|
| LaBo | 0.227 | 0.222 | 0.230 | Language | Independent |
| DN-CBM | 0.192 | 0.187 | 0.187 | Vision | Globally Shared |
| V2C-CBM | 0.246 | 0.216 | 0.247 | Vision | Independent |
| PS-CBM | 0.249 | 0.255 | 0.265 | Language+Vision | Partially Shared |
Key Findings¶
- Decoupling accuracy from concept count: PS-CBM achieves the highest accuracy (78.3%) with the fewest concepts (545), demonstrating that concept quality matters more than quantity.
- Partial sharing is optimal: The partially shared strategy finds the best balance between the independent and globally shared extremes — selectively sharing semantically similar concepts reduces redundancy while preserving discriminability.
- \(K=1\) is most efficient: A single exclusive concept per class suffices to achieve the best CEA; additional class-specific concepts reduce efficiency.
- Advantage of multimodal concept generation: Combining LLM semantics with visual exemplars yields significantly higher CLIP Scores on domain-specific datasets compared to unimodal methods.
- Semantic consistency in concept–class mapping: Visualizations on CIFAR10 confirm that shared concepts correctly reflect semantic relationships (e.g., "hooves" and "long neck" are shared between deer and horse).
Highlights & Insights¶
- Elegant design of partial sharing: The greedy merging algorithm based on activation patterns is concise and effective — selecting the maximally mergeable set, inheriting the union of class sets, and capping the number of exclusive concepts.
- Theoretical grounding of CEA: The metric is derived from Shannon information theory with a well-defined upper bound (→1) and lower bound, offering greater theoretical elegance than existing metrics (CUE, NEC).
- Experimental breadth: Evaluation across 11 datasets spanning general, fine-grained, and domain-specific tasks ensures strong generalizability of the conclusions.
- Open-source release: Complete reproduction code and configurations are publicly available.
Limitations & Future Work¶
- Dependence on CLIP encoder: The quality of concept filtering and labeling is bounded by CLIP's alignment capability.
- Cost of GPT-4o: Concept generation requires querying GPT-4o, which incurs non-trivial costs at scale.
- Hyperparameter sensitivity: The choices of \(\tau_{\text{conf}}\) and \(\tau_{\text{merge}}\) substantially affect performance and require careful tuning.
- ImageNet subsampling: Due to dataset scale, only 10% of training images are used during merging, potentially affecting concept quality.
- Static concept set: The current framework does not support dynamic concept updates based on feedback.
Related Work & Insights¶
- vs. LF-CBM: LF-CBM employs a globally shared pool but suffers from concept redundancy; PS-CBM's partially shared strategy achieves a better balance between compactness and discriminability.
- vs. DN-CBM: DN-CBM discovers visual concepts via sparse autoencoders and achieves high accuracy but requires 8,192 concepts; PS-CBM surpasses it with only 545.
- CEA vs. CUE: CUE lacks a clear upper bound and is sensitive to text format; CEA is post hoc, task-adaptive, and training-agnostic.
Rating¶
- Novelty: ⭐⭐⭐⭐ — The partially shared strategy and CEA metric are well-motivated but not paradigm-shifting innovations.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ — 11 datasets, 8 baselines, and multi-dimensional ablations; highly comprehensive.
- Writing Quality: ⭐⭐⭐⭐⭐ — Clear structure, rich tables, excellent visualizations; the comparison matrix in Table 1 is particularly intuitive.
- Value: ⭐⭐⭐⭐ — Makes a substantive contribution to the CBM field; the CEA metric has strong potential for broad adoption.