
Nonparametric Identification of Latent Concepts

Conference: ICML 2025
arXiv: 2510.00136
Code: To be confirmed
Area: Concept learning
Keywords: Concept learning, Identifiability, Nonparametric, Latent variables, Disentangled representation learning

TL;DR

This work proposes the first theoretical framework for nonparametric concept identifiability. It proves that hidden concepts can be identified (up to component-wise invertible transformations and a permutation) purely from the diversity of multi-class observations, without assuming concept types, specific functional relationships, or a parametric generative model.

Background & Motivation

Humans naturally possess the ability to learn concepts by comparing observations from different categories. For example, children identify unique concepts such as "predator," "streamlined body shape," and "shell" by comparing the differences between sharks and turtles. This cognitive mechanism has been widely validated in psychology and neuroscience.

In machine learning, extracting concepts from data is crucial for interpretability and generalization. Despite significant empirical success, general theoretical guarantees are lacking, and existing results rely on restrictive assumptions:

  • Linear assumptions: concepts are assumed to be linearly related to representations and are identified only up to linear transformations (Rajendran et al., 2024).
  • Object-centric learning: the generation process is assumed to be occlusion-free or additive (Brady et al., 2023; Lachapelle et al., 2023).
  • These restrictions make it difficult to explain empirical success in real-world scenarios.

Core Problem: Under general settings, which concepts can be reliably recovered with theoretical guarantees?

Method

Problem Setting

Observed variables \(\mathbf{x} \in \mathbb{R}^m\) are generated from latent concepts \(\mathbf{z} = (\mathbf{z}_A, \mathbf{z}_B) \in \mathbb{R}^n\) via an unknown diffeomorphism:

\[\mathbf{x} \coloneqq f(\mathbf{z})\]

where:

  • \(\mathbf{z}_A\): Class-dependent concepts, whose distribution depends on the observed class variable \(\mathbf{c}\).
  • \(\mathbf{z}_B\): Class-independent concepts (e.g., lighting, temperature).
  • Conditional independence: \(p(\mathbf{z}|\mathbf{c}) = p(\mathbf{z}_A|\mathbf{c})p(\mathbf{z}_B)\)
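  • Connection structure \(M\): A binary adjacency matrix whose rows index concepts and whose columns index classes, encoding which class-dependent concepts depend on which classes. Each concept is conditioned only on the classes it is connected to: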
\[p(\mathbf{z}_A|\mathbf{c}) = \prod_{i=1}^{n_A} p(\mathbf{z}_i | M_{i,\cdot} \odot \mathbf{c})\]
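To make the generative process concrete, the sketch below samples synthetic \((\mathbf{x}, \mathbf{c}, \mathbf{z})\) triples in pure Python. Everything beyond the structure stated above is an assumption for illustration, not the paper's data-generating code: one-hot classes, unit-variance Gaussian concepts whose mean shifts are hypothetical random values switched on by \(M_{i,\cdot} \odot \mathbf{c}\), \(m = n\), and a hypothetical mixing \(f(\mathbf{z}) = A_2\, g(A_1 \mathbf{z})\) with invertible \(A_1, A_2\) and the elementwise map \(g(u) = u + 0.5\tanh(u)\), which is smooth and strictly increasing, so \(f\) is a diffeomorphism.

```python
import math
import random


def random_invertible(n, rng):
    """Unit lower-triangular times unit upper-triangular matrix: det = 1, so always invertible."""
    L = [[1.0 if r == c else (rng.uniform(-1, 1) if r > c else 0.0) for c in range(n)] for r in range(n)]
    U = [[1.0 if r == c else (rng.uniform(-1, 1) if r < c else 0.0) for c in range(n)] for r in range(n)]
    return [[sum(L[r][k] * U[k][c] for k in range(n)) for c in range(n)] for r in range(n)]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def smooth_bijection(u):
    # u + 0.5*tanh(u) has derivative in (1, 1.5], so it is an elementwise diffeomorphism of R
    return [x + 0.5 * math.tanh(x) for x in u]


def sample(M, n_B, n_samples, seed=0):
    """Hypothetical sampler: z_A depends on c only through M_{i,:} * c; z_B ignores c; x = f(z)."""
    rng = random.Random(seed)
    n_A, n_classes = len(M), len(M[0])
    n = n_A + n_B
    shift = [[rng.uniform(-3, 3) for _ in range(n_classes)] for _ in range(n_A)]  # hypothetical
    A1, A2 = random_invertible(n, rng), random_invertible(n, rng)
    data = []
    for _ in range(n_samples):
        y = rng.randrange(n_classes)
        c = [1 if k == y else 0 for k in range(n_classes)]  # one-hot class (assumption)
        z_A = []
        for i in range(n_A):
            active = [M[i][k] * c[k] for k in range(n_classes)]  # M_{i,:} ⊙ c
            mean = sum(a * s for a, s in zip(active, shift[i]))
            z_A.append(rng.gauss(mean, 1.0))
        z_B = [rng.gauss(0.0, 1.0) for _ in range(n_B)]  # class-independent concepts
        z = z_A + z_B
        x = matvec(A2, smooth_bijection(matvec(A1, z)))
        data.append((x, c, z))
    return data
```

Calling `sample(M, n_B=2, n_samples=1000)` with any binary `M` of shape \(n_A \times\) (number of classes) returns triples whose \(\mathbf{z}_B\) part is drawn independently of the class, matching \(p(\mathbf{z}|\mathbf{c}) = p(\mathbf{z}_A|\mathbf{c})p(\mathbf{z}_B)\).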

Theorem 1: Local Comparison Learning (Pairwise Comparison)

For any pair of classes \(\mathbf{c}_i\) and \(\mathbf{c}_j\), with \(A_i\) denoting the set of concepts connected to class \(\mathbf{c}_i\), and under a mild non-degeneracy condition on the sample space (linear independence of the Jacobian support spaces), there exists a permutation \(\pi\) such that the concepts unique to each class are disentangled from the concepts of the other class:

\[\frac{\partial \hat{\mathbf{z}}_{\pi(A_i \setminus A_j)}}{\partial \mathbf{z}_{A_j}} = 0, \quad \frac{\partial \hat{\mathbf{z}}_{\pi(A_j \setminus A_i)}}{\partial \mathbf{z}_{A_i}} = 0\]
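Theorem 1's conclusion is stated in terms of the set differences \(A_i \setminus A_j\) and \(A_j \setminus A_i\). The helper below is a hypothetical illustration, not code from the paper: reading \(A_j\) off the columns of \(M\) (rows = concepts, columns = classes), it lists which concept blocks each pairwise comparison separates, using the shark/turtle example from the motivation plus an assumed shared concept "aquatic".

```python
from itertools import combinations


def concept_sets(M):
    """A[j] = indices of concepts connected to class j (rows of M are concepts, columns are classes)."""
    n_concepts, n_classes = len(M), len(M[0])
    return [{k for k in range(n_concepts) if M[k][j]} for j in range(n_classes)]


def pairwise_unique(M):
    """For each class pair (i, j), return (A_i - A_j, A_j - A_i): the blocks Theorem 1 separates."""
    A = concept_sets(M)
    return {(i, j): (A[i] - A[j], A[j] - A[i]) for i, j in combinations(range(len(A)), 2)}


# Hypothetical example: concepts 0=predator, 1=streamlined body, 2=shell, 3=aquatic (shared);
# classes 0=shark, 1=turtle.
M = [
    [1, 0],
    [1, 0],
    [0, 1],
    [1, 1],
]
print(pairwise_unique(M))  # {(0, 1): ({0, 1}, {2})}
```

The guarantee applies to each difference set as a block: the two shark-only concepts are separated from the turtle's concepts but not yet from each other, which is what comparisons over more classes are meant to resolve.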

Corollary 1: The result generalizes to local comparisons among arbitrary subsets of classes, yielding partial identifiability: even when the global conditions are not met, as many concepts as the available local diversity permits can still be identified.

Theorem 2: Global Comparison Learning

Under the structural diversity assumption (Assumption 1), for each class-dependent concept \(\mathbf{z}_i\), there exists a set of classes that allows it to be distinguished from other concepts:

  • Component-wise identifiability: \(\hat{\mathbf{z}}_i = h_i(\mathbf{z}_{\pi(i)})\), where \(h_i\) is an invertible function.
  • Block-identifiability: The class-independent block is identified up to an invertible map, \(\hat{\mathbf{z}}_B = h(\mathbf{z}_B)\).

The distribution variability condition only requires two classes with different conditional distributions, far fewer than the \(2n_A+1\) domains required by prior work.

Proposition 1: Identification of Class-Independent Concepts

Building upon Theorem 2, all concepts can be nonparametrically identified by imposing sparsity conditions on the connection structure between \(\mathbf{z}_B\) and \(\mathbf{x}\).

Proposition 2: Structure Recovery

The latent class-concept connection structure \(M\) can also be recovered up to a permutation matrix \(P\), i.e., \(\hat{M} = PM\), without requiring the structural diversity assumption.
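Checking this guarantee in a simulation amounts to asking whether the rows of an estimated \(\hat{M}\) are a reordering of the rows of \(M\) (rows index concepts, so \(P\) permutes concepts). A minimal sketch, assuming the real-valued estimate is first binarized at a hypothetical threshold `tau`; neither helper comes from the paper.

```python
def binarize(M_hat, tau=0.5):
    """Threshold a real-valued estimate to a 0/1 matrix; tau is a hypothetical choice."""
    return [[1 if abs(v) > tau else 0 for v in row] for row in M_hat]


def find_row_permutation(M_hat, M):
    """Return perm with M_hat[r] == M[perm[r]] for every row r (i.e. M_hat = P M), or None."""
    if len(M_hat) != len(M):
        return None
    unused = list(range(len(M)))
    perm = []
    for row in M_hat:
        # identical rows of M are interchangeable, so taking the first unused match is safe
        match = next((k for k in unused if list(M[k]) == list(row)), None)
        if match is None:
            return None
        unused.remove(match)
        perm.append(match)
    return perm


M = [[1, 0], [1, 0], [0, 1], [1, 1]]
M_hat_raw = [[0.1, 0.9], [0.8, 0.7], [0.95, 0.02], [0.6, 0.3]]
print(find_row_permutation(binarize(M_hat_raw), M))  # [2, 3, 0, 1]
```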

Estimation Method

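The estimator is a regularized maximum-likelihood objective:

\[\mathcal{L}(\theta) = \mathbb{E}_{(\mathbf{x},\mathbf{c})} \left[ -\log p_{\hat{f}^{-1}}(\mathbf{x} \mid \hat{M} \odot \mathbf{c}) + \lambda \mathbf{R} \right]\]

where conditioning on \(\hat{M} \odot \mathbf{c}\) means that each estimated concept \(\hat{\mathbf{z}}_i\) is conditioned on \(\hat{M}_{i,:} \odot \mathbf{c}\), as in the factorization of \(p(\mathbf{z}_A|\mathbf{c})\), and \(\mathbf{R}\) is an \(\ell_1\)-norm penalty on \(\hat{M}\) and on the Jacobian \(D_{\hat{\mathbf{z}}}\hat{f}\).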
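For intuition about the objective, the sketch below evaluates it in the simplest possible case, under assumptions that are all hypothetical: a linear decoder \(\hat{f}(\hat{\mathbf{z}}) = A\hat{\mathbf{z}}\), unit-variance Gaussian priors whose means for the first \(n_A\) components are switched on by \(\hat{M}_{i,:} \odot \mathbf{c}\) through parameters `mu`, and a standard-normal prior for \(\hat{\mathbf{z}}_B\). The likelihood uses the change-of-variables formula \(\log p(\mathbf{x}|\mathbf{c}) = \log p(\hat{\mathbf{z}}|\mathbf{c}) - \log|\det A|\), and \(\mathbf{R}\) is taken to be the entrywise \(\ell_1\) norm of \(\hat{M}\) plus that of the Jacobian \(D_{\hat{\mathbf{z}}}\hat{f} = A\). A practical implementation would use a nonlinear invertible network and automatic differentiation; this only computes the loss for fixed parameters.

```python
import math

LOG_2PI = math.log(2 * math.pi)


def solve_and_logdet(A, x):
    """Solve A z = x by Gaussian elimination with partial pivoting; also return log|det A|."""
    n = len(A)
    m = [row[:] + [x[r]] for r, row in enumerate(A)]  # augmented matrix [A | x]
    logdet = 0.0
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        if abs(m[p][k]) < 1e-12:
            raise ValueError("decoder matrix is singular")
        m[k], m[p] = m[p], m[k]  # row swaps flip the sign of det but not |det|
        logdet += math.log(abs(m[k][k]))
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for col in range(k, n + 1):
                m[r][col] -= f * m[k][col]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (m[r][n] - sum(m[r][col] * z[col] for col in range(r + 1, n))) / m[r][r]
    return z, logdet


def gauss_logpdf(v, mean):
    return -0.5 * (LOG_2PI + (v - mean) ** 2)


def regularized_nll(A, M_hat, mu, batch, lam):
    """Batch mean of -log p(x | M_hat ⊙ c) plus lam * (||M_hat||_1 + ||A||_1); A is the decoder Jacobian."""
    n_A = len(M_hat)
    total = 0.0
    for x, c in batch:
        z_hat, logdet = solve_and_logdet(A, x)  # z_hat = f_hat^{-1}(x)
        logp = 0.0
        for i, zi in enumerate(z_hat):
            if i < n_A:  # class-dependent: mean switched on by M_hat_{i,:} ⊙ c (hypothetical prior)
                mean = sum(M_hat[i][k] * c[k] * mu[i][k] for k in range(len(c)))
            else:  # class-independent: standard normal
                mean = 0.0
            logp += gauss_logpdf(zi, mean)
        total += -(logp - logdet)  # change of variables
    l1 = sum(abs(v) for row in M_hat for v in row) + sum(abs(v) for row in A for v in row)
    return total / len(batch) + lam * l1


A = [[2.0, 0.0], [0.0, 2.0]]
print(regularized_nll(A, M_hat=[[1.0]], mu=[[0.5]], batch=[([1.0, 0.0], [1])], lam=0.01))
```

The \(\ell_1\) term pushes both \(\hat{M}\) and the decoder Jacobian toward sparsity, which is how the sparse connection structures assumed in the theory are encouraged during estimation; its weight \(\lambda\) is a free hyperparameter.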

Key Experimental Results

| Setting | Dataset | Metric | Ours | Base |
|---|---|---|---|---|
| Class-dependent concepts | Synthetic data (various numbers of concepts) | MCC | High MCC, low variance | Low MCC, high variance |
| All concepts | Synthetic data (various numbers of concepts) | MCC | Significantly outperforms Base | Fails to disentangle |
| Real data | Fashion-MNIST | Semantic consistency | Identified sleeve length / torso length / shoulder width | - |
| Real data | AnimalFace | Semantic consistency | Identified concepts such as Ursidae / monochrome | - |
| Real data | Flower102 | Cross-environment robustness | Same concept identified consistently across viewing angles | - |
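MCC (mean correlation coefficient) is a widely used score for component-wise identifiability: correlate every estimated component with every true one, find the one-to-one matching that maximizes total absolute correlation (this absorbs the permutation \(\pi\)), and average the matched values. A minimal stdlib sketch, assuming Pearson correlation and brute-force matching over all permutations, which is only practical for a handful of components (`scipy.optimize.linear_sum_assignment` is the usual choice at larger scale). Pearson only fully rewards linear \(h_i\); rank (Spearman) correlation is a common alternative when \(h_i\) is nonlinear but monotone. Which variant the paper uses is not stated here.

```python
import math
from itertools import permutations


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def mcc(z_true, z_hat):
    """z_true, z_hat: lists of d components, each a list of samples. Returns MCC in [0, 1]."""
    d = len(z_true)
    corr = [[abs(pearson(z_hat[i], z_true[j])) for j in range(d)] for i in range(d)]
    # best one-to-one matching of estimated to true components (brute force over d! permutations)
    best = max(permutations(range(d)), key=lambda p: sum(corr[i][p[i]] for i in range(d)))
    return sum(corr[i][best[i]] for i in range(d)) / d


z_true = [[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]]
# Estimated components: swapped order and rescaled, i.e. a permutation plus component-wise affine maps
z_hat = [[2 * v + 1 for v in z_true[1]], [-v for v in z_true[0]]]
print(round(mcc(z_true, z_hat), 6))  # 1.0
```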

Key Findings:

  • When the structural diversity condition is met, the model achieves high MCC with low variance on synthetic data.
  • The Base model (without structural conditions) fails to disentangle most concepts.
  • The concepts identified on real data align semantically with human understanding.
  • The same concept in Flower102 can be identified consistently across different environments/angles.

Highlights & Insights

  1. First general nonparametric framework: Does not limit the type of concepts (linear/additive/disjoint), does not assume a parametric generative model, and relies solely on structural diversity across classes.
  2. Flexible local-to-global guarantees: Thm. 1 and Cor. 1 achieve partial identifiability, providing as many guarantees as possible when global conditions are difficult to satisfy in real-world scenarios.
  3. Highly relaxed conditions: Distribution variability only requires differences across 2 classes (vs. \(2n_A+1\) domains required previously); structure recovery does not even require the structural diversity assumption.
  4. Cognitively inspired: The theoretical framework directly aligns with the human cognitive mechanism of "learning concepts by comparison," scaling from pairwise comparisons to global understanding.
  5. Cross-domain impact: The theory provides insights for disentangled representation learning, causal representation learning, object-centric learning, and structure learning.

Limitations & Future Work

  1. Overlapping concepts: When concepts overlap heavily across all classes (e.g., all dog breeds share "barking" and "furry"), the structural diversity assumption fails and parametric assumptions are still required.
  2. Theory-to-practice gap: Extending the theoretical framework to practical tasks such as compositional generalization, controllable generation, and decision-making remains to be explored.
  3. Limited experimental scale: Real-data experiments primarily focus on small-to-medium-scale image datasets, without validation on foundation models or large-scale data.
  4. Sensitivity to regularization hyperparameters: The choice of \(\lambda\) for the \(\ell_1\) penalty lacks theoretical guidance.
  5. Extra assumption for class-independent concepts: Identifying them requires sparsity of the \(\mathbf{z}_B\)-\(\mathbf{x}\) connection structure, which reduces generality.

Related Work

  • Linear Concept Identification: Rajendran et al. (2024), Reizinger et al. (2024), which identify concepts up to linear transformations under linear assumptions.
  • Object-Centric Learning: Brady et al. (2023), Lachapelle et al. (2023), which require occlusion-free or additive assumptions.
  • Nonlinear ICA: Hyvärinen & Morioka (2016), Khemakhem et al. (2020), which require multiple domains or auxiliary variables.
  • Structure Learning: Shimizu et al. (2006), Zheng et al. (2022), which recover DAGs from exogenous noise.

Rating

  • Novelty: ⭐⭐⭐⭐⭐ — First nonparametric concept identifiability theory, with a novel partial identifiability approach via local comparison.
  • Experimental Thoroughness: ⭐⭐⭐ — Disentanglement theory is validated by synthetic experiments with qualitative demonstrations on real data, though it lacks large-scale quantitative evaluation.
  • Writing Quality: ⭐⭐⭐⭐⭐ — Smooth narrative from cognitive science to mathematical formalization, rich in examples.
  • Value: ⭐⭐⭐⭐ — Provides a theoretical foundation for the empirical success of concept learning, with broad cross-domain impact.