There Was Never a Bottleneck in Concept Bottleneck Models¶
Conference: ICLR 2026 · arXiv: 2506.04877 · Code: None (per paper description)
Area: Interpretability / Concept Bottleneck Models
Keywords: Concept Bottleneck Models, Information Bottleneck, Information Leakage, Interventionability, Representation Learning
TL;DR¶
This paper identifies that Concept Bottleneck Models (CBMs) do not enforce a true "bottleneck" — the fact that a representation variable \(z_j\) can predict concept \(c_j\) does not imply it encodes only the information of \(c_j\). The paper proposes MCBM (Minimal Concept Bottleneck Model), which applies information bottleneck regularization to constrain each \(z_j\) to retain only the information of its corresponding concept, thereby achieving genuinely disentangled representations and reliable concept interventions.
Background & Motivation¶
- The Promise of CBMs: By training each component \(z_j\) of the representation to predict an interpretable concept \(c_j\), CBMs aim to provide both interpretability and interventionability.
- Information Leakage: The ability of \(z_j\) to predict \(c_j\) does not imply that \(z_j\) encodes only \(c_j\). In the extreme case, \(z_j\) may encode the entire input \(\mathbf{x}\) while still satisfying the CBM constraint.
- Two Consequences:
- Impaired Interpretability: \(z_j\) cannot be fully explained by \(c_j\).
- Ineffective Interventions: Modifying \(z_j\) alters not only \(c_j\) but also any other information encoded therein.
- Theoretical Flaw in CBM Interventions: There is no directed path from \(c_j\) to \(z_j\) in CBMs, leaving \(p(z_j|c_j)\) undefined in the graphical model. Existing interventions rely on an ad-hoc approximation via empirical quantiles of the sigmoid inverse.
Core Distinction: VM vs. CBM vs. MCBM¶
| | VM | CBM | MCBM |
|---|---|---|---|
| \(z_j\) encodes all of \(c_j\)? | ✗ | ✓ | ✓ |
| \(z_j\) encodes only \(c_j\)? | ✗ | ✗ | ✓ |
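The middle row is the failure mode at issue: a unit can predict its concept perfectly while still carrying nuisance information. A toy illustration (a hypothetical construction, not from the paper) makes this concrete — with a binary concept \(c_j \in \{-1, +1\}\) and bounded nuisance, the unit \(z_j = c_j + 0.5\,n\) satisfies the concept-prediction constraint exactly, yet within each concept class \(z_j\) is perfectly correlated with the nuisance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: binary concept c_j in {-1, +1} and a bounded nuisance n.
c = rng.choice([-1.0, 1.0], size=10_000)
n = rng.uniform(-1.0, 1.0, size=10_000)

# A "CBM-style" unit: since |0.5 * n| < 1, sign(z) recovers c_j exactly,
# so the concept-prediction constraint is fully satisfied...
z = c + 0.5 * n

concept_accuracy = np.mean(np.sign(z) == c)

# ...yet within a concept class, z still varies one-to-one with the nuisance.
leak = np.corrcoef(z[c == 1.0], n[c == 1.0])[0, 1]

print(f"concept accuracy: {concept_accuracy:.3f}")       # perfect prediction
print(f"within-class correlation with nuisance: {leak:.3f}")  # near 1.0
```

Predicting \(c_j\) places no upper bound on what else \(z_j\) encodes, which is exactly why an explicit minimization term is needed.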
Method¶
1. Data Generating Process¶
- Input \(\mathbf{x}\) is determined by concepts \(\mathbf{c}\) and nuisance \(\mathbf{n}\): \(p(\mathbf{x}, \mathbf{y}, \mathbf{c}, \mathbf{n}) = p(\mathbf{x}|\mathbf{c}, \mathbf{n}) p(\mathbf{y}|\mathbf{x}) p(\mathbf{c}, \mathbf{n})\)
- Nuisance is decomposed into task-relevant \(\mathbf{n}_y\) and task-irrelevant \(\mathbf{n}_{\bar{y}}\) components.
2. Three Information-Theoretic Objectives¶
Objective 1 (shared by VM/CBM/MCBM): Maximize \(I(Z; Y)\) — representation predicts the target.
Objective 2 (CBM/MCBM): Maximize \(I(Z_j; C_j)\) — \(z_j\) is a sufficient statistic for \(c_j\).
Objective 3 (MCBM only): Minimize \(I(Z_j; X | C_j)\) — information bottleneck.
This ensures that \(z_j\) retains no additional information about \(\mathbf{x}\) beyond what is captured by \(c_j\), enforcing the Markov chain \(X \leftrightarrow C_j \leftrightarrow Z_j\).
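The role of Objective 3 is visible from the chain rule of mutual information: the information \(z_j\) carries about the pair \((X, C_j)\) splits into a concept part and a residual part,

```latex
I(Z_j; X, C_j) \;=\; \underbrace{I(Z_j; C_j)}_{\text{maximized (Obj. 2)}} \;+\; \underbrace{I(Z_j; X \mid C_j)}_{\text{minimized (Obj. 3)}}
```

so driving the conditional term to zero leaves \(z_j\) with concept information only.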
3. MCBM Training Objective¶
- First term: task prediction loss.
- Second term (\(\beta\)-weighted): concept prediction loss.
- Third term (\(\gamma\)-weighted): information bottleneck regularization (exclusive to MCBM).
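Assembled from the description above (the precise losses and notation are the paper's), the overall objective has the form

```latex
\mathcal{L}_{\text{MCBM}}
\;=\; \mathcal{L}_{\text{task}}
\;+\; \beta \sum_{j} \mathcal{L}_{\text{concept}}^{(j)}
\;+\; \gamma \sum_{j} \mathcal{L}_{\text{IB}}^{(j)}
```

with \(\gamma = 0\) recovering a standard CBM-style objective.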
4. Intervention Mechanism¶
Interventions in MCBM are theoretically grounded: \(p(z_j|c_j) = q_\phi(z_j|c_j)\)
Since optimizing Objective 3 ensures \(z_j\) encodes only the information of \(c_j\), modifying \(z_j\) strictly corresponds to an intervention on \(c_j\).
5. Representation Head Design¶
- Binary concepts: \(g_\phi^z(c_j) = \lambda\) if \(c_j=1\), else \(-\lambda\)
- Multi-class concepts: \(g_\phi^z(c_j) = \lambda \cdot \text{one\_hot}(c_j)\) (prototype learning)
- Continuous concepts: \(g_\phi^z(c_j) = \lambda \cdot c_j\)
The encoder uses a stochastic formulation \(p_\theta(\mathbf{z}|\mathbf{x}) = \mathcal{N}(\mathbf{z}; f_\theta(\mathbf{x}), \sigma_x^2 I)\), trained via the reparameterization trick.
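A minimal sketch of how these pieces fit together for one binary-concept dimension (all names are illustrative, not the paper's code; it uses the fact that the KL between two Gaussians with shared variance reduces to a scaled MSE between the encoder mean and the concept prototype):

```python
import numpy as np

def concept_prototype(c_j, lam=1.0):
    """Representation head g_phi^z for a binary concept: +lam if c_j = 1, else -lam."""
    return lam if c_j == 1 else -lam

def ib_regularizer(z_mean, c_j, sigma=1.0, lam=1.0):
    """Gaussian-case information-bottleneck penalty.

    KL( N(z_mean, sigma^2) || N(g(c_j), sigma^2) ) = (z_mean - g(c_j))^2 / (2 sigma^2),
    i.e. a scaled MSE pulling the encoder mean toward the concept prototype.
    """
    return (z_mean - concept_prototype(c_j, lam)) ** 2 / (2.0 * sigma ** 2)

def sample_z(z_mean, sigma, rng):
    """Reparameterized sample z = mu + sigma * eps, differentiable w.r.t. mu."""
    return z_mean + sigma * rng.normal()

rng = np.random.default_rng(0)

# Encoder mean for one concept dimension (illustrative value of f_theta(x)_j).
z_mean = 0.4
penalty = ib_regularizer(z_mean, c_j=1)   # (0.4 - 1)^2 / 2 = 0.18

# An intervention on concept j: replace z_j by a sample from q_phi(z_j | c_j),
# here flipping the concept to c_j = 0.
z_intervened = sample_z(concept_prototype(0), sigma=1.0, rng=rng)
print(penalty)
```

The intervention step is the point of the design: because \(q_\phi(z_j|c_j)\) is an explicit distribution around the prototype, "setting concept \(j\)" is a well-defined sampling operation rather than a quantile heuristic.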
Key Experimental Results¶
Information Leakage Metric: URR (Uncertainty Reduction Ratio)¶
Measures how much nuisance information beyond the concept set is encoded in \(z\) (lower is better).
Task-Relevant Nuisance Leakage¶
| Method | MPI3D | Shapes3D | CIFAR-10 | CUB | AwA2 |
|---|---|---|---|---|---|
| Vanilla | 35.0 | 45.5 | 19.8 | 3.8 | 1.5 |
| CBM | 28.1 | 18.1 | 18.5 | 3.8 | 1.4 |
| CEM | 43.2 | 15.8 | 27.2 | 3.9 | 1.1 |
| ECBM | 25.2 | 47.1 | 18.1 | 4.5 | 1.1 |
| MCBM (high γ) | 0.0 | 0.0 | 17.6 | 2.4 | 0.7 |
Task-Irrelevant Nuisance Leakage¶
| Method | MPI3D | Shapes3D |
|---|---|---|
| Vanilla | 11.3 | 42.7 |
| CBM | 7.4 | 20.6 |
| CEM | 15.5 | 40.9 |
| MCBM (any γ) | 0.0 | 0.0 |
Key Findings¶
- CEM and ECBM exacerbate leakage: On certain datasets, leakage exceeds that of the Vanilla model.
- MCBM eliminates leakage entirely: Under high \(\gamma\), nuisance information is reduced to 0 across all datasets.
- ARCBM and HCBM offer no systematic advantage: Neither consistently outperforms standard CBM in controlling leakage.
- Trade-off: MCBM incurs a slight decrease in task accuracy, as task-relevant nuisance \(\mathbf{n}_y\) is also excluded.
Highlights & Insights¶
- Fundamental critique of CBMs: The paper demonstrates that CBMs are, in a principled sense, misnamed — a true bottleneck has never existed in these models.
- Natural introduction of the information bottleneck: The condition \(I(Z_j; X | C_j) = 0\) precisely formalizes the requirement that \(z_j\) encodes only its corresponding concept.
- Theoretical analysis of CBM intervention failures (Section 5): The paper proves that the intervention assumptions underlying CBMs are probabilistically ill-founded.
- Practical KL divergence regularization: Under Gaussian assumptions, the regularization reduces to a simple MSE loss.
- Visualization of disentangled representations: In MCBM's representation space, samples sharing the same concept value cluster tightly together.
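The Gaussian-to-MSE simplification mentioned above is a one-line check with the encoder's \(\mathcal{N}(f_\theta(\mathbf{x}), \sigma_x^2 I)\) form: for equal-variance Gaussians the log-variance and trace terms of the KL cancel, leaving

```latex
D_{\mathrm{KL}}\!\left( \mathcal{N}(f_\theta(\mathbf{x}), \sigma_x^2 I) \,\Vert\, \mathcal{N}(g_\phi^z(c_j), \sigma_x^2 I) \right)
= \frac{1}{2\sigma_x^2} \,\bigl\lVert f_\theta(\mathbf{x}) - g_\phi^z(c_j) \bigr\rVert^2 .
```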
Limitations & Future Work¶
- There is an inherent trade-off between task accuracy and concept purity: when the concept set is incomplete, excluding the task-relevant nuisance \(\mathbf{n}_y\) necessarily reduces task performance.
- Concept annotations are required — a limitation shared with all CBM-based methods.
- Handling of continuous concepts relies on Gaussian assumptions.
- The hyperparameter \(\gamma\) requires tuning to balance disentanglement and task performance.
- Validation on larger-scale models and more complex tasks remains to be conducted.
Related Work & Insights¶
- CBM Variants: CEM (concept embeddings), HCBM (hard bottleneck), ARCBM (autoregressive), SCBM (stochastic)
- Information Leakage Analysis: Margeloiu et al. 2021; Parisini et al. 2025
- Information Bottleneck: Tishby et al. 2000; Alemi et al. 2016 (Variational Information Bottleneck)
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ — A fundamental reexamination of the CBM paradigm.
- Technical Depth: ⭐⭐⭐⭐ — Rigorous information-theoretic formalization with clear variational derivations.
- Experimental Thoroughness: ⭐⭐⭐⭐ — Five datasets, 8+ competing methods, and multi-faceted analysis.
- Value: ⭐⭐⭐⭐ — Provides a principled solution toward genuinely interpretable concept-based models.