# ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts
**Conference:** NeurIPS 2025 | **arXiv:** 2510.26186 | **Code:** GitHub | **Area:** Human Understanding | **Keywords:** Dataset Bias, Sparse Autoencoder, Visual Concepts, Bias Detection, Interpretability
## TL;DR
This paper proposes ConceptScope, a framework that trains a sparse autoencoder (SAE) on representations from a visual foundation model to automatically discover and quantify visual concept biases in datasets, categorizing concepts as target, context, or bias without any manual annotation.
## Background & Motivation
Biases in machine learning datasets — such as high correlations between specific categories and specific backgrounds — are pervasive and degrade model generalization. For example, approximately 75% of "leatherback turtle" images in ImageNet are photographed on beaches, while only 15% are underwater. Existing methods either rely on costly human annotation or on descriptive text generated by VLMs; however, natural language descriptions suffer from inconsistent granularity and synonym substitution, making structured extraction of visual concepts difficult. This paper aims to construct a fully automatic and scalable framework for dataset bias analysis.
## Method

### Overall Architecture
ConceptScope operates in two stages (see the sketch below):

1. **Concept dictionary construction:** an SAE is trained on intermediate-layer token embeddings of a pretrained visual encoder (CLIP-ViT-L/14) to disentangle dense representations into sparse, interpretable concepts.
2. **Concept classification:** each concept is categorized as target, context, or bias based on semantic relevance and statistical frequency.
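A minimal sketch of this two-stage flow, assuming a patch-level encoder and a trained SAE; the names here (`encode_patches`, `sae.encode`, the max-pooling aggregation) are illustrative, not the paper's API:

```python
import numpy as np

def concept_scope(images, labels, encode_patches, sae):
    """Illustrative two-stage pipeline.

    Stage 1: encode each image into patch embeddings, sparsify with the SAE.
    Stage 2: hand the concept activations to the categorization step
    (target / context / bias), sketched further below.
    """
    activations = []
    for x in images:
        z = encode_patches(x)              # (num_patches, d) token embeddings
        f = sae.encode(z)                  # (num_patches, d') sparse concepts
        activations.append(f.max(axis=0))  # image-level concept presence (assumed)
    return np.stack(activations), np.asarray(labels)
```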
### Key Designs
**Sparse Autoencoder (SAE) Training:** Given an image \(x\), patch-level token embeddings \(\mathbf{z} = \{z_1, \ldots, z_l\}\) are extracted. The SAE encode-decode process is

\[
f(z_i) = \phi(z_i W_{\text{enc}} + b_{\text{enc}}), \qquad \hat{z}_i = f(z_i) W_{\text{dec}} + b_{\text{dec}},
\]

where \(\phi\) denotes the ReLU activation, \(W_{\text{enc}} \in \mathbb{R}^{d \times d'}\), \(W_{\text{dec}} \in \mathbb{R}^{d' \times d}\), and \(d'\) is much larger than \(d\) (expansion factor of 16 or 32).
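A minimal PyTorch sketch of this encoder–decoder; class and variable names are illustrative, and the default expansion factor is just one of the two values mentioned above:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete SAE over patch-token embeddings (illustrative sketch)."""

    def __init__(self, d: int, expansion: int = 16):
        super().__init__()
        d_hidden = d * expansion              # d' >> d (expansion 16 or 32)
        self.enc = nn.Linear(d, d_hidden)     # W_enc, b_enc
        self.dec = nn.Linear(d_hidden, d)     # W_dec, b_dec

    def encode(self, z: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.enc(z))        # f(z): sparse, non-negative

    def forward(self, z: torch.Tensor):
        f = self.encode(z)
        z_hat = self.dec(f)                   # reconstruction of the token
        return z_hat, f
```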
**Concept Classification — Alignment Score:** Two metrics, necessity \(N(c,y)\) and sufficiency \(S(c,y)\), measure, respectively, the drop in prediction confidence when concept \(c\) is removed and the predictive capacity when only \(c\) is retained.
Their average yields the alignment score \(A(c,y) = \frac{N(c,y) + S(c,y)}{2}\). A concept is classified as a target concept when \(A(c,y) \geq \mu_y^{\text{align}} + \alpha \times \sigma_y^{\text{align}}\); otherwise it is treated as a context concept.
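A sketch of this thresholding step, assuming per-class score vectors over all concepts; the default value of `alpha` is a placeholder, not the paper's choice:

```python
import numpy as np

def target_concept_mask(necessity: np.ndarray, sufficiency: np.ndarray,
                        alpha: float = 1.0) -> np.ndarray:
    """Boolean mask: True where A(c, y) clears the class-wise mean + alpha*std."""
    alignment = (necessity + sufficiency) / 2.0      # A(c, y)
    mu, sigma = alignment.mean(), alignment.std()    # mu_y^align, sigma_y^align
    return alignment >= mu + alpha * sigma           # target vs. context
```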
**Bias Concept Identification:** After excluding target concepts, the concept strength of each remaining context concept is computed as \(\tilde{f}_{c,y} = \operatorname{avg}_{\mathbf{z} \in Z_y}(f(\mathbf{z})_c)\), where \(Z_y\) is the set of embeddings from class \(y\). A concept is flagged as a bias concept when \(\tilde{f}_{c,y} \geq \mu^{\text{c.s.}} + \sigma^{\text{c.s.}}\), i.e., when its strength exceeds the mean of the context-concept strengths by more than one standard deviation.
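The corresponding bias test over the remaining context concepts, under the same illustrative conventions as the sketch above:

```python
import numpy as np

def bias_concept_mask(strength: np.ndarray, is_target: np.ndarray) -> np.ndarray:
    """Flag context concepts whose mean activation (concept strength)
    exceeds mean + 1 std of all context-concept strengths."""
    context_strength = strength[~is_target]          # exclude target concepts
    mu, sigma = context_strength.mean(), context_strength.std()
    return (~is_target) & (strength >= mu + sigma)
```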
### Loss & Training
The SAE training loss combines a reconstruction term with an L1 sparsity penalty:

\[
\mathcal{L} = \lVert z - \hat{z} \rVert_2^2 + \lambda \lVert f(z) \rVert_1 .
\]
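A direct PyTorch rendering of this objective; the coefficient `lam` is a placeholder, not the paper's value:

```python
import torch

def sae_loss(z: torch.Tensor, z_hat: torch.Tensor, f: torch.Tensor,
             lam: float = 1e-3) -> torch.Tensor:
    """Reconstruction MSE plus L1 sparsity penalty on concept activations."""
    recon = (z - z_hat).pow(2).sum(dim=-1).mean()    # ||z - z_hat||^2
    sparsity = f.abs().sum(dim=-1).mean()            # ||f(z)||_1
    return recon + lam * sparsity
```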
## Key Experimental Results

### Main Results
Concept prediction performance (binary classification accuracy and F1/AUPRC) across six annotated datasets:
| Method | Caltech101 | DTD | Waterbirds | CelebA | RAF-DB | Stanford40 | Avg. |
|---|---|---|---|---|---|---|---|
| BLIP-2 | 0.64 | 0.38 | 0.37 | 0.27 | 0.24 | 0.66 | 0.43 |
| LLaVA-NeXT | 0.61 | 0.40 | 0.57 | 0.62 | 0.45 | 0.80 | 0.58 |
| ConceptScope | 0.83 | 0.57 | 0.78 | 0.81 | 0.55 | 0.78 | 0.72 |
Bias discovery task (Precision@10):
| Method | Waterbirds | CelebA | NICO++(75) | NICO++(90) | NICO++(95) |
|---|---|---|---|---|---|
| DOMINO | 90.0% | 87.0% | 24.0% | 24.0% | 24.0% |
| FACTS | 100.0% | 100.0% | 55.0% | 60.8% | 61.0% |
| ConceptScope | 100.0% | 100.0% | 72.9% | 73.1% | 74.0% |
### Ablation Study
- SAE spatial attribution segmentation precision: AUPRC reaches 0.399 on ADE20K, significantly outperforming BLIP-2 (0.098) and LLaVA-NeXT (0.302).
- Pearson correlation between SAE activation values and CLIP similarity: \(r = 0.71\); Spearman \(\rho = 0.65\).
- Standard deviation across SAEs trained with four random seeds is below 0.01, demonstrating framework robustness.
### Key Findings
- Previously unannotated biases are discovered in ImageNet-1K: e.g., "necklace" frequently co-occurs with the "mannequin" category, and the "bridegroom" category is highly correlated with East Asian cultural scenes.
- An average of 2.45 bias concepts are detected per category.
- Model robustness diagnostic experiments show that the high-target + high-bias group achieves the highest accuracy and the low-target + low-bias group the lowest, a trend consistent across all 34 evaluated models.
## Highlights & Insights
- Fully automatic and unsupervised: Dataset biases are discovered without manual annotation; once the SAE is trained, it transfers to other datasets.
- The three-way concept classification (target / context / bias) is both theoretically grounded and practically useful.
- Bias discovery Precision@10 on NICO++ improves over the previous SOTA (ViG-Bias) by approximately 10 percentage points.
- The framework is extensible to multi-label settings (MS-COCO).
## Limitations & Future Work
- Concepts are constrained by the knowledge scope of CLIP representations; domain-specific datasets (e.g., medical imaging) require retraining the SAE.
- Segmentation masks are patch-level (16×16), limiting localization precision.
- Performance on domain-specific attributes (e.g., emotion, texture) is weaker than on general attributes.
## Related Work & Insights
- Unlike methods such as SpLiCE, ConceptScope requires no predefined bias categories to perform automatic discrimination.
- The successful application of SAEs in LLM interpretability is transferred to the visual domain.
- A natural follow-up question is whether ConceptScope could be applied to automatic dataset cleaning or active-learning sample selection.
## Rating
- ⭐ Novelty: 4/5 — Applying SAEs to visual dataset bias analysis represents the first systematic exploration of this approach.
- ⭐ Experimental Thoroughness: 5/5 — Covers 6 attribute datasets + 3 bias benchmarks + multiple real-world datasets + robustness analysis across 34 models.
- ⭐ Writing Quality: 4/5 — Well-structured with rigorous concept definitions.
- ⭐ Value: 4/5 — Provides a practical tool for dataset auditing and model diagnostics.