
Deep Incomplete Multi-View Clustering via Hierarchical Imputation and Alignment

Conference: AAAI 2026 | arXiv: 2601.09051 | Code: Available | Area: Multi-View Clustering | Keywords: Incomplete multi-view clustering, hierarchical imputation, energy-based model, contrastive alignment, missing views

TL;DR

This paper proposes DIMVC-HIA, a deep incomplete multi-view clustering framework that integrates hierarchical imputation with dual alignment. The method first imputes missing cluster assignments and then imputes missing features in a coarse-to-fine manner, maintaining robust performance under high missing rates (up to 70%).

Background & Motivation

Multi-view clustering (MVC) integrates heterogeneous multi-source data to provide richer insights than single-view approaches. However, the assumption that all views are fully observable rarely holds in practice—sensor failures, data corruption, and transmission errors commonly result in missing views.

Existing incomplete multi-view clustering (IMVC) methods fall into two categories:

Imputation-based methods: Reconstruct missing views but suffer from error propagation—poor imputation distorts structural patterns, which in turn degrades subsequent imputation quality, forming a vicious cycle.

Imputation-free methods: Learn a shared latent space directly from available views, but face challenges including instance misalignment, information imbalance, and representation uncertainty under high missing rates.

Core challenge: How to accurately impute missing views without introducing bias, while preserving cross-view semantic consistency and intra-cluster compactness?

Method

Overall Architecture

DIMVC-HIA comprises four key components:

  1. View-specific Autoencoders: Independent encoder-decoder per view with a shared cluster predictor
  2. Hierarchical Imputation Module: Imputes cluster assignments first, then latent features (coarse-to-fine)
  3. Energy-Based Semantic Alignment Module: Enhances intra-cluster compactness via EBM
  4. Contrastive Assignment Alignment Module: Enhances cross-view consistency and clustering confidence

Key Designs

1. View Feature Learning and Prediction

Each view \(v\) uses an independent AutoEncoder to extract latent features:

\[H_v = E_v(X_v; \phi_v^e), \quad \hat{X}_v = D_v(H_v; \phi_v^d)\]

Latent features are mapped to soft cluster assignments via a shared cluster predictor: \(Q_v(i) = F(H_v(i); \vartheta) \in \mathbb{R}^K\).
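
A minimal PyTorch sketch of this stage, assuming simple MLP encoders/decoders and a single linear softmax head as the shared predictor (layer widths, activations, and class names are illustrative assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class ViewAutoEncoder(nn.Module):
    """Independent encoder/decoder for one view: H_v = E_v(X_v), X_hat_v = D_v(H_v)."""
    def __init__(self, input_dim: int, latent_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, input_dim),
        )

    def forward(self, x):
        h = self.encoder(x)
        x_hat = self.decoder(h)
        return h, x_hat

class SharedClusterPredictor(nn.Module):
    """Shared head F(.; theta) mapping latent features to soft assignments Q_v in R^K."""
    def __init__(self, latent_dim: int, n_clusters: int):
        super().__init__()
        self.head = nn.Linear(latent_dim, n_clusters)

    def forward(self, h):
        return torch.softmax(self.head(h), dim=-1)  # each row sums to 1 over K clusters
```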

2. Hierarchical Imputation — Core Innovation

First Level: Cluster Assignment Imputation (Core of Coarse-to-Fine)

Core insight: The soft cluster assignment space encodes richer semantic information than raw features and more directly reflects the underlying cluster structure. Cross-view information transfer is therefore performed first in the assignment space.

Steps:

  1. For each co-observed view pair \((v, v')\), extract the soft cluster assignments of the co-observed samples
  2. Compute the cross-view similarity matrix: \(S_{v,v'} = Q_v^{v,v'} (Q_{v'}^{v',v})^\top\)
  3. Compute inter-view semantic alignment scores using label-aware contrastive similarity (excluding false negatives, i.e., samples from the same cluster but with different indices)
  4. Rank the reference views \(\mathcal{R}_v\) in descending order of similarity
  5. For each missing sample, select the most semantically aligned available view for assignment imputation

\[Q_v^*(i) = \begin{cases} Q_v(i), & \text{if } G(i,v) = 1 \\ Q_{\pi_v^i}(i), & \text{otherwise} \end{cases}\]
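
The sketch below illustrates the assignment-level imputation under simplifying assumptions: view-pair similarity is reduced to the mean agreement of co-observed assignments (the paper's label-aware contrastive scoring is omitted), and for each missing sample the most similar observed view is used as the source:

```python
import torch

def impute_assignments(Q, G):
    """Q: list of [N, K] soft assignments per view; G: [N, V] observation mask (1 = observed).
    Fills each missing assignment with the one from the most similar observed reference view."""
    V = len(Q)
    # Pairwise view similarity from co-observed samples (simplified agreement score)
    sim = torch.zeros(V, V)
    for v in range(V):
        for u in range(V):
            if u == v:
                continue
            both = (G[:, v] == 1) & (G[:, u] == 1)
            if both.any():
                sim[v, u] = (Q[v][both] * Q[u][both]).sum(dim=1).mean()
    Q_star = [q.clone() for q in Q]
    for v in range(V):
        missing = (G[:, v] == 0).nonzero(as_tuple=True)[0].tolist()
        ranked = torch.argsort(sim[v], descending=True).tolist()  # reference views R_v
        for i in missing:
            for u in ranked:
                if u != v and G[i, u] == 1:
                    Q_star[v][i] = Q[u][i]  # Q_v^*(i) = Q_{pi_v^i}(i)
                    break
    return Q_star
```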

Second Level: Latent Feature Imputation

Based on the imputed cluster assignments, missing features are reconstructed using intra-cluster statistics:

  1. Determine the most probable cluster label for each missing sample: \(\hat{y}_v(i) = \arg\max_k Q_v^*(i, k)\)
  2. Compute cluster prototypes (the mean latent feature of all available samples in each cluster)
  3. Use the cluster prototype as the imputed value for the missing feature

\[H_v^*(i) = \begin{cases} H_v(i), & \text{if } G(i,v) = 1 \\ \mathcal{C}_v(\hat{y}_v(i)), & \text{otherwise} \end{cases}\]
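
A sketch of the feature-level step, assuming prototypes are per-cluster means of the observed latent features; the fallback to the global mean for an empty cluster is an added assumption, not from the paper:

```python
import torch

def impute_features(H, Q_star, G):
    """H: list of [N, d] latent features per view; Q_star: imputed assignments;
    G: [N, V] observation mask. Missing rows are replaced by their cluster prototype."""
    V = len(H)
    H_star = [h.clone() for h in H]
    for v in range(V):
        observed = G[:, v] == 1
        missing = ~observed
        if not missing.any():
            continue
        K = Q_star[v].shape[1]
        labels_obs = Q_star[v][observed].argmax(dim=1)
        # Cluster prototypes C_v(k): mean latent feature over observed samples of cluster k
        prototypes = torch.stack([
            H[v][observed][labels_obs == k].mean(dim=0) if (labels_obs == k).any()
            else H[v][observed].mean(dim=0)  # fallback for an empty cluster (assumption)
            for k in range(K)
        ])
        y_hat = Q_star[v][missing].argmax(dim=1)   # \hat{y}_v(i)
        H_star[v][missing] = prototypes[y_hat]      # H_v^*(i) = C_v(\hat{y}_v(i))
    return H_star
```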

3. Energy-Based Semantic Alignment (EBM)

A view-shared energy function \(\mathcal{E}_{\theta_k}: \mathbb{R}^d \to \mathbb{R}^+\) is defined for each cluster, where lower energy indicates stronger compatibility with the cluster.

An "anchor" (the most reliable feature with minimum energy) is identified per cluster, and all intra-cluster features are encouraged to converge toward this anchor's energy level:

\[\mathcal{L}_\text{EBM}^k = \frac{1}{|\mathcal{H}_k|} \sum_{\mathbf{h} \in \mathcal{H}_k} |\mathcal{E}_{\theta_k}(\mathbf{h}) - \varepsilon_\text{min}^k|\]

Unlike conventional centroid-distance regularization, EBM allows flexible shaping of a continuous energy landscape.
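
A sketch of how such a per-cluster energy objective might look, assuming a small Softplus-terminated MLP as the energy function and a detached anchor energy so that only the other cluster members are pulled toward it (both choices are assumptions):

```python
import torch
import torch.nn as nn

class ClusterEnergy(nn.Module):
    """Energy function E_theta_k: R^d -> R^+ for one cluster (hypothetical architecture)."""
    def __init__(self, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Softplus(),  # Softplus keeps energies non-negative
        )

    def forward(self, h):
        return self.net(h).squeeze(-1)

def ebm_loss(energy_fns, H_star, labels):
    """Pull every intra-cluster feature toward the minimum (anchor) energy of its cluster."""
    loss = 0.0
    K = len(energy_fns)
    for k in range(K):
        members = H_star[labels == k]
        if members.shape[0] == 0:
            continue
        energies = energy_fns[k](members)
        anchor = energies.min().detach()  # epsilon_min^k: energy of the most reliable feature
        loss = loss + (energies - anchor).abs().mean()
    return loss / K
```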

4. Contrastive Assignment Alignment (CAA)

Two sub-objectives are included:

  • Contrastive alignment loss: Pulls the cluster assignment distributions of the same sample across different views closer together
  • Entropy regularization: Promotes balanced cluster assignments and prevents degenerate solutions (all samples assigned to one cluster)
\[\mathcal{L}_\text{CAA} = \frac{1}{2}\sum_v \sum_{v' \neq v} [\text{sim}(v,v') \cdot \mathcal{L}_\text{ca}^{v,v'} + \mathcal{L}_\text{reg}^{v,v'}]\]

Here \(\text{sim}(v, v')\) serves as an adaptive weight, assigning higher alignment weight to view pairs with greater semantic alignment.
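
A simplified version of the two sub-objectives, assuming an InfoNCE-style contrastive term where the same sample's assignment in the other view is the positive and all other samples are negatives, plus a negative-entropy regularizer on the marginal cluster distribution; the paper's exact similarity, masking (e.g., false-negative exclusion), and regularizer forms may differ:

```python
import torch
import torch.nn.functional as F

def caa_loss(Q_star, view_sim, temperature: float = 0.5):
    """Q_star: list of [N, K] imputed soft assignments; view_sim: [V, V] adaptive weights sim(v, v')."""
    V = len(Q_star)
    N, _ = Q_star[0].shape
    total = 0.0
    for v in range(V):
        for u in range(V):
            if u == v:
                continue
            a = F.normalize(Q_star[v], dim=1)            # [N, K]
            b = F.normalize(Q_star[u], dim=1)
            logits = a @ b.t() / temperature             # [N, N] cross-view sample similarities
            targets = torch.arange(N)                    # positive pair: same index across views
            l_ca = F.cross_entropy(logits, targets)
            p = Q_star[v].mean(dim=0)                    # marginal cluster distribution of view v
            l_reg = (p * torch.log(p.clamp_min(1e-8))).sum()  # minimizing this balances clusters
            total = total + view_sim[v, u] * l_ca + l_reg
    return total / 2
```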

Loss & Training

Overall objective: \(\mathcal{L} = \mathcal{L}_\text{REC} + \alpha \cdot \mathcal{L}_\text{EBM} + \beta \cdot \mathcal{L}_\text{CAA}\)

  • AutoEncoders are pre-trained independently for 100 epochs (with \(\mathcal{L}_\text{REC}\) only)
  • Joint fine-tuning for 200 epochs (full loss)
  • \(\alpha = 0.1\), \(\beta = 0.01\) (fixed across all datasets)
  • Learning rate 0.0001; experiments run on an NVIDIA RTX 3080 GPU
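
Putting the pieces together, a hypothetical two-stage training loop matching the reported schedule (reconstruction-only pre-training, then joint fine-tuning with the full objective); the helper functions are the sketches above, and the uniform view-similarity weights are placeholders:

```python
import itertools
import torch

def train(views, G, models, predictor, energy_fns,
          alpha: float = 0.1, beta: float = 0.01, lr: float = 1e-4,
          pretrain_epochs: int = 100, finetune_epochs: int = 200):
    """views: list of [N, d_v] tensors; G: [N, V] observation mask.
    models / predictor / energy_fns are instances of the sketches above."""
    params = itertools.chain(*(m.parameters() for m in models),
                             predictor.parameters(),
                             *(e.parameters() for e in energy_fns))
    opt = torch.optim.Adam(params, lr=lr)

    # Stage 1: pre-train each view-specific autoencoder with L_REC only
    for _ in range(pretrain_epochs):
        loss = sum(((m(x)[1] - x) ** 2).mean() for m, x in zip(models, views))
        opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2: joint fine-tuning with L_REC + alpha * L_EBM + beta * L_CAA
    for _ in range(finetune_epochs):
        H, X_hat = zip(*(m(x) for m, x in zip(models, views)))
        Q = [predictor(h) for h in H]
        Q_star = impute_assignments(Q, G)              # hierarchical imputation, level 1
        H_star = impute_features(list(H), Q_star, G)   # hierarchical imputation, level 2
        loss_rec = sum(((xh - x) ** 2).mean() for xh, x in zip(X_hat, views))
        loss_ebm = sum(ebm_loss(energy_fns, h, q.argmax(dim=1))
                       for h, q in zip(H_star, Q_star)) / len(views)
        view_sim = torch.ones(len(views), len(views))  # placeholder adaptive weights
        loss = loss_rec + alpha * loss_ebm + beta * caa_loss(Q_star, view_sim)
        opt.zero_grad(); loss.backward(); opt.step()
```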

Key Experimental Results

Main Results

Table 1: Clustering accuracy (ACC, %) at various missing rates η (NMI and PUR are omitted here)

| Dataset | Method | η=0.1 | η=0.3 | η=0.5 | η=0.7 |
|---|---|---|---|---|---|
| BDGP | DSIMVC | 98.00 | 96.08 | 93.56 | 91.12 |
| BDGP | DIMVC-HIA | 98.40 | 96.25 | 95.16 | 92.32 |
| MNIST-USPS | DCG | 99.05 | 97.48 | 96.09 | 92.58 |
| MNIST-USPS | DIMVC-HIA | 99.10 | 97.54 | 96.48 | 93.66 |
| Fashion | ProImp | 96.26 | 93.48 | 91.01 | 86.74 |
| Fashion | DIMVC-HIA | 98.84 | 97.16 | 96.51 | 95.27 |
| Handwritten | GIMVC | 92.14 | 93.58 | 90.73 | 86.10 |
| Handwritten | DIMVC-HIA | 96.85 | 96.35 | 95.15 | 94.05 |

Table 2: Best Baseline Comparison at High Missing Rate η=0.7

| Dataset | Best Baseline | Baseline ACC | DIMVC-HIA ACC | Gain |
|---|---|---|---|---|
| BDGP | PMIMC | 91.72 | 92.32 | +0.60 |
| MNIST-USPS | ProImp | 93.42 | 93.66 | +0.24 |
| Fashion-MNIST | ProImp | 86.74 | 95.27 | +8.53 |
| Handwritten | GIMVC | 86.10 | 94.05 | +7.95 |

Ablation Study

Impact of removing each component on Fashion-MNIST (η=0.5):

  • Removing \(\mathcal{L}_\text{CAA}\): Largest performance drop, confirming that contrastive assignment alignment is the most critical component
  • Removing \(\mathcal{L}_\text{EBM}\): Notable performance degradation, confirming the importance of energy-based alignment
  • Removing \(\mathcal{L}_\text{REC}\): Performance drop, indicating that reconstruction loss is indispensable for stable training

Hyperparameter sensitivity analysis: Performance remains stable without significant fluctuation within \(\alpha \in [0.01, 0.10]\) and \(\beta \in [0.01, 0.05]\).

Key Findings

  • More pronounced advantage at high missing rates: DIMVC-HIA outperforms ProImp by 8.53 percentage points on Fashion-MNIST at η=0.7, demonstrating the effectiveness of hierarchical imputation under severe missing scenarios
  • Graceful accuracy degradation: On BDGP, accuracy drops only from 98.40 to 92.32 (about 6 points) as η increases from 0.1 to 0.7, compared to a drop of nearly 7 points for DSIMVC
  • Stable convergence: Loss curves converge smoothly after a rapid decrease in the first 25 epochs

Highlights & Insights

  1. Elegant hierarchical imputation design: Imputing cluster assignments at the semantic level first, then using the imputed assignments to guide feature-level imputation in a coarse-to-fine manner, avoiding the noise issues associated with direct feature imputation
  2. Label-aware contrastive similarity: Excluding false negatives (same-cluster samples with different indices) improves the quality of cross-view semantic matching
  3. EBM as an alternative to distance regularization: Modeling intra-cluster compactness via an energy landscape rather than simple centroid-distance constraints offers greater flexibility
  4. Bridging imputation and imputation-free paradigms: Combines the advantages of both approaches—performing imputation while mitigating imputation errors through alignment

Limitations & Future Work

  • Cluster prototype imputation is non-parametric and may cause all missing samples to collapse toward cluster centroids in the feature space, limiting diversity
  • Experiments are conducted on only 4 standard benchmarks with relatively small data scales (up to 10,000 samples)
  • Scalability with increasing numbers of views is not discussed (at most 6 views evaluated)
  • EBM maintains independent energy functions per cluster, which may increase computational burden when the number of clusters \(K\) is large
  • Missing patterns are assumed to be missing completely at random (MCAR); missing not at random (MNAR) scenarios are not considered
  • Nevertheless, compared to SOTA methods such as DSIMVC and ProImp, the hierarchical imputation strategy retains a clear advantage at high missing rates
  • The application of EBMs to clustering is relatively novel and could transfer to other unsupervised learning tasks
  • The false-negative exclusion strategy in label-aware contrastive learning is worth exploring in other contrastive learning settings

Rating

  • Novelty: ⭐⭐⭐⭐ — The hierarchical strategy of imputing assignments before features is novel; the application of EBM to clustering is innovative
  • Technical Depth: ⭐⭐⭐⭐ — Four well-designed components with clear mathematical derivations
  • Experimental Thoroughness: ⭐⭐⭐ — Four datasets with four missing rates, though data scales are relatively small
  • Value: ⭐⭐⭐ — Practically meaningful for multimodal data fusion scenarios, though computational overhead may limit large-scale applications