Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis

Paper Info

  • Conference: ICCV 2025
  • arXiv: 2411.00144
  • Code: Project Page
  • Area: 3D Vision
  • Keywords: 3D Gaussian Splatting, Few-Shot Novel View Synthesis, Self-Ensembling Learning, Uncertainty-Aware Perturbation

TL;DR

SE-GS dynamically generates diverse 3DGS models during training via an uncertainty-aware perturbation strategy and uses a self-ensembling mechanism that lets the Σ-model aggregate information from the perturbed models, effectively mitigating overfitting under sparse-view settings and achieving state-of-the-art few-shot novel view synthesis across multiple datasets.

Background & Motivation

3D Gaussian Splatting (3DGS) excels at novel view synthesis but is prone to overfitting when trained with sparse views:

Severe Overfitting: Experiments in the paper (Fig. 2) show that training-set performance improves continuously with iterations, while test-set performance begins to degrade after approximately 2,000 iterations; overfitting is more pronounced in the 3-view setting.

Limitations of Prior Work:

  • Depth-prior-based methods (DNGaussian, FSGS) introduce noisy depth estimates and may actually hurt performance as the number of training views increases.
  • Multi-model regularization (CoR-GS) incurs high computational cost and lacks sufficient diversity among models.

Potential of Ensemble Learning: Ensemble learning has been proven effective at alleviating overfitting in detection and segmentation tasks, yet efficient ensembling within 3DGS remains unexplored.

Method

Overall Architecture

SE-GS jointly trains two models (see the sketch after this list):

  • Δ-model: Trained normally on the available training images and dynamically perturbed to produce diverse model variants.
  • Σ-model: Achieves self-ensembling by minimizing its discrepancy with the perturbed models; used at inference time.
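
A minimal sketch of one joint training step; all callables (render, photo_loss, reg_loss, perturb) and argument types are hypothetical stand-ins for this summary, not the authors' API:

```python
from typing import Any, Callable

def se_gs_step(render: Callable, photo_loss: Callable, reg_loss: Callable,
               perturb: Callable, delta: Any, sigma: Any,
               train_view: Any, pseudo_view: Any, step: int):
    """One SE-GS training step (hedged sketch)."""
    # Delta-model: plain photometric supervision on a training view.
    l_delta = photo_loss(render(delta, train_view), train_view)
    # Derive a temporary perturbed variant of the Delta-model using the
    # uncertainty of the pseudo-view (Key Design 1 below).
    perturbed = perturb(delta, pseudo_view, step)
    # Sigma-model: photometric loss on the training view plus a
    # self-ensembling consistency term against the perturbed variant,
    # both rendered at the pseudo-view (Key Design 2 below).
    l_sigma = (photo_loss(render(sigma, train_view), train_view)
               + reg_loss(render(sigma, pseudo_view),
                          render(perturbed, pseudo_view)))
    return l_delta, l_sigma
```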

Key Design 1: Uncertainty-Aware Perturbation

Naïve global random perturbation causes the model to deviate too far, leading to instability. SE-GS applies targeted perturbation via the following steps:

1. Create Pseudo-Views: Generate \(M\) pseudo-views via spherical linear interpolation (SLERP) between training views:

\[\hat{\mathbf{R}} = \text{SLERP}(\mathbf{R}_1, \mathbf{R}_2, \beta)\]
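
A minimal sketch of pose interpolation for one pseudo-view using SciPy's Slerp; blending the camera centers linearly and the \(\beta\) sampling range are assumptions, since the summary only specifies SLERP for rotations:

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def pseudo_pose(R1, R2, t1, t2, beta):
    """Interpolate between two training camera poses (R: 3x3, t: (3,))."""
    slerp = Slerp([0.0, 1.0], Rotation.from_matrix(np.stack([R1, R2])))
    R_hat = slerp(beta).as_matrix()          # SLERP(R1, R2, beta)
    t_hat = (1.0 - beta) * t1 + beta * t2    # assumed linear blend
    return R_hat, t_hat

# M pseudo-views from one view pair; beta sampled in (0, 1) (assumed).
betas = np.random.uniform(0.0, 1.0, size=10)
```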

2. Compute Uncertainty Maps: Store pseudo-view renderings from different training steps in a buffer and compute pixel-level uncertainty:

\[\mathbf{U} = \sqrt{\frac{1}{S}\sum_{i=1}^S (\mathbf{I}_i - \bar{\mathbf{I}})^2}\]

followed by local smoothing with a \(k \times k\) kernel (\(k=5\)) to obtain \(\hat{\mathbf{U}}\).
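
A minimal sketch of the uncertainty map, assuming the buffer holds \(S\) RGB renderings of one pseudo-view and that per-channel deviations are averaged into a single per-pixel value (the channel handling is an assumption):

```python
import torch
import torch.nn.functional as F

def uncertainty_map(buffer: torch.Tensor, k: int = 5) -> torch.Tensor:
    """buffer: (S, 3, H, W) renderings of one pseudo-view collected over
    the last S training snapshots; returns a smoothed (H, W) map."""
    # Population std across the buffer matches the formula above.
    U = buffer.std(dim=0, unbiased=False).mean(dim=0)   # (H, W)
    # k x k box smoothing to obtain U-hat (padding preserves the size).
    return F.avg_pool2d(U[None, None], k, stride=1,
                        padding=k // 2).squeeze(0).squeeze(0)
```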

3. Selective Perturbation: Perturb only Gaussians whose projected regions overlap with high-uncertainty pixels:

\[\hat{G}_\Delta^t = G_\Delta^t + \delta_t \cdot h(G_\Delta^t, \hat{\mathbf{U}}^t)\]

where the indicator function \(h\) checks whether the maximum uncertainty in a Gaussian's projected region exceeds threshold \(\tau\). A 6D continuous rotation representation is used for perturbation to ensure continuity.
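
A hedged sketch of the selective update, restricted to Gaussian centers and approximating each Gaussian's projected region by its projected center pixel; px, tau, and the Gaussian noise model are illustrative assumptions (the paper also perturbs rotations, via the 6D representation):

```python
import torch

def perturb_centers(xyz, px, U_hat, tau, delta_t):
    """xyz: (N, 3) Gaussian centers; px: (N, 2) integer pixel coords of
    the projected centers; U_hat: (H, W) smoothed uncertainty map."""
    # h(G, U-hat) = 1 iff the max uncertainty in the projected region
    # exceeds tau; approximated here by the center pixel's value.
    u = U_hat[px[:, 1], px[:, 0]]             # (N,) per-Gaussian value
    mask = (u > tau).float().unsqueeze(-1)    # (N, 1) indicator h
    noise = torch.randn_like(xyz)             # assumed Gaussian noise
    return xyz + delta_t * mask * noise       # perturb selected only
```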

Key Design 2: Self-Ensembling Regularization

The Σ-model is trained normally on training views while being regularized to maintain consistency with the perturbed models:

\[\mathcal{L}_r = (1-\lambda)\|\mathbf{I}_\Sigma^t - \mathbf{I}_\Delta^t\|_1 + \lambda\mathcal{L}_{\text{D-SSIM}}(\mathbf{I}_\Sigma^t, \mathbf{I}_\Delta^t)\]

where \(\lambda=0.2\). The regularization is applied on pseudo-views in a self-supervised manner, requiring no additional ground-truth supervision.

Loss & Training

\[\mathcal{L} = \mathcal{L}_{\text{RGB}} + \gamma\mathcal{L}_r\]

where \(\gamma=1\) and \(\mathcal{L}_{\text{RGB}}\) is the photometric loss on training views.
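
A minimal sketch of both losses, using torchmetrics' SSIM as a stand-in for the SSIM implementation used in the 3DGS codebase:

```python
import torch.nn.functional as F
from torchmetrics.functional import structural_similarity_index_measure as ssim

def self_ensembling_loss(I_sigma, I_delta, lam: float = 0.2):
    """L_r between the Sigma-model and perturbed Delta-model renderings
    of a pseudo-view, both (1, 3, H, W) tensors in [0, 1]."""
    l1 = F.l1_loss(I_sigma, I_delta)
    d_ssim = (1.0 - ssim(I_sigma, I_delta)) / 2.0   # D-SSIM
    return (1.0 - lam) * l1 + lam * d_ssim

def total_loss(l_rgb, l_r, gamma: float = 1.0):
    return l_rgb + gamma * l_r    # L = L_RGB + gamma * L_r, gamma = 1
```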

Key Advantages

  • Perturbed models are derived from the Δ-model rather than trained from scratch, making the computational overhead negligible.
  • Uncertainty is computed in 2D rendering space, naturally accommodating changes in the number of Gaussians during training.
  • Regularization is applied on pseudo-views, independent of external information such as depth.

Key Experimental Results

Main Results: LLFF Dataset

Method        3-view PSNR   6-view PSNR   9-view PSNR
3DGS          19.22         23.80         25.44
DNGaussian    19.12         22.01         22.62
FSGS          20.43         24.09         25.31
CoR-GS        20.45         24.49         26.06
SE-GS         20.79         24.78         26.36

SE-GS achieves the best PSNR across the 3/6/9-view settings and also leads on SSIM and LPIPS.

Ablation Study: Perturbation Strategy Comparison

Perturbation strategy                        PSNR       SSIM      LPIPS
No perturbation (cross-model only)           baseline   -         -
Global random perturbation                   degraded   -         -
Uncertainty-aware perturbation               highest    highest   lowest

  • Naïve global perturbation degrades performance due to excessive deviation.
  • Uncertainty-aware selective perturbation significantly outperforms other strategies.

Key Findings

  • Buffer size \(S=5\) and pseudo-view count \(M=10\) form the optimal configuration.
  • SE-GS is more efficient and effective than explicit ensembling of \(k\) independently trained models.
  • SE-GS maintains substantial gains over vanilla 3DGS even as the number of training views increases (e.g., 9 views).
  • Unlike depth-prior-based methods, SE-GS does not suffer from performance degradation as more views are added.

Highlights & Insights

  1. First introduction of self-ensembling into 3DGS: Cleverly exploits uncertainty signals emerging from training dynamics.
  2. Computationally efficient: Incurs negligible additional training cost compared to explicit multi-model ensembling.
  3. Plug-and-play: Orthogonal to depth-prior methods and can be combined with them.
  4. Self-supervised regularization: Requires no external ground-truth depth or generated novel-view images.

Limitations & Future Work

  • Still requires a certain number of initial SfM points for 3DGS initialization.
  • Pseudo-views must lie within the interpolation range of training views, limiting effectiveness on extrapolated scenes.
  • Uncertainty estimation may be unreliable in extremely sparse settings (e.g., 1–2 images).

Related Concepts

  • 3DGS: Explicit point-based representation supporting real-time rendering.
  • DNGaussian / FSGS: Leverage monocular depth priors to address sparse-view challenges.
  • CoR-GS: Trains multiple 3DGS models for cross-regularization.
  • Ensemble Learning: Improves robustness by aggregating multi-model predictions; related to temporal ensembling and consistency regularization.

Rating

  • Novelty: ⭐⭐⭐⭐ — The combination of uncertainty-aware perturbation and self-ensembling is novel.
  • Practicality: ⭐⭐⭐⭐ — Low overhead, no external data required, consistent improvements across datasets.
  • Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Comprehensively validated on LLFF/DTU/Mip-NeRF360/MVImgNet with extensive ablations.
  • Writing Quality: ⭐⭐⭐⭐ — Clear motivation and rigorous methodological derivation.