# Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis
## Paper Info
- Conference: ICCV 2025
- arXiv: 2411.00144
- Code: Project Page
- Area: 3D Vision
- Keywords: 3D Gaussian Splatting, Few-Shot Novel View Synthesis, Self-Ensembling Learning, Uncertainty-Aware Perturbation
## TL;DR
SE-GS dynamically generates diverse 3DGS models during training via an uncertainty-aware perturbation strategy. A self-ensembling mechanism then lets the Σ-model aggregate information from the perturbed models, effectively mitigating overfitting under sparse-view settings and achieving state-of-the-art few-shot novel view synthesis across multiple datasets.
## Background & Motivation
3D Gaussian Splatting (3DGS) excels at novel view synthesis but is prone to overfitting when trained with sparse views:
Severe Overfitting: Experiments show (Fig. 2) that training-set performance improves continuously with iterations, while test-set performance begins to degrade after approximately 2,000 iterations; overfitting is more pronounced in the 3-view setting.
Limitations of Prior Work:
- Depth-prior-based methods (DNGaussian, FSGS) introduce noisy depth estimates and may actually hurt performance as the number of training views increases.
- Multi-model regularization (CoR-GS) incurs high computational cost and lacks sufficient diversity among models.
Potential of Ensemble Learning: Ensemble learning has been proven effective at alleviating overfitting in detection and segmentation tasks, yet efficient ensembling within 3DGS remains unexplored.
## Method
### Overall Architecture
SE-GS jointly trains two models:
- Δ-model: trained normally on the available training images and dynamically perturbed to produce diverse model variants.
- Σ-model: achieves self-ensembling by minimizing its discrepancy with the perturbed models; this is the model used at inference time.
### Key Design 1: Uncertainty-Aware Perturbation
Naïve global random perturbation causes the model to deviate too far, leading to instability. SE-GS applies targeted perturbation via the following steps:
1. Create Pseudo-Views: Generate \(M\) pseudo-views via spherical linear interpolation (SLERP) between training-view camera poses.
2. Compute Uncertainty Maps: Store pseudo-view renderings from different training steps in a buffer of size \(S\) and compute a pixel-level uncertainty map from their variation, followed by local smoothing with \(k=5\) to obtain \(\hat{\mathbf{U}}\).
3. Selective Perturbation: Perturb only the Gaussians whose projected regions overlap high-uncertainty pixels. An indicator function \(h\) checks whether the maximum uncertainty within a Gaussian's projected region exceeds a threshold \(\tau\), and only those Gaussians receive noise. A 6D continuous rotation representation is used for the rotation perturbation to ensure continuity.
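The three steps above can be sketched in NumPy. This is an illustrative reimplementation, not the authors' code: `slerp` interpolates camera-rotation quaternions for pseudo-views, `uncertainty_map` takes the per-pixel standard deviation over a buffer of \(S\) renderings followed by \(k \times k\) box smoothing, and `perturb_mask` evaluates the indicator \(h\) (max uncertainty inside a Gaussian's 2D footprint vs. threshold \(\tau\)). The function names, the std-as-uncertainty choice, and the box-shaped footprints are assumptions made for the sketch.

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions."""
    q0 = q0 / np.linalg.norm(q0)
    q1 = q1 / np.linalg.norm(q1)
    dot = float(np.dot(q0, q1))
    if dot < 0.0:                      # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:                   # nearly parallel: linear fallback
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def uncertainty_map(buffer, k=5):
    """Per-pixel std over the S buffered renderings, then k x k box smoothing."""
    u = np.std(np.stack(buffer), axis=0)            # (H, W)
    pad = k // 2
    padded = np.pad(u, pad, mode="edge")            # edge-replicate the borders
    smoothed = np.empty_like(u)
    for i in range(u.shape[0]):
        for j in range(u.shape[1]):
            smoothed[i, j] = padded[i:i + k, j:j + k].mean()
    return smoothed

def perturb_mask(footprints, u_hat, tau):
    """Indicator h per Gaussian: max uncertainty in its 2D footprint exceeds tau.

    `footprints` holds one (row-slice, col-slice) box per projected Gaussian.
    """
    return np.array([u_hat[fp].max() > tau for fp in footprints])
```

Only Gaussians with a `True` mask entry would then receive noise on their parameters (with rotations perturbed in the 6D continuous representation).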
### Key Design 2: Self-Ensembling Regularization
The Σ-model is trained normally on the training views while being regularized to stay consistent with the perturbed models, with regularization weight \(\lambda = 0.2\). The regularization is applied on pseudo-views in a self-supervised manner, requiring no additional ground-truth supervision.
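The paper's exact consistency objective is not reproduced here; one plausible shape, assuming an L1 photometric discrepancy between the Σ-model's and a perturbed model's renderings on the \(M\) pseudo-views \(\{\mathbf{P}_m\}\) (the symbols \(\mathcal{R}_{\Sigma}\) and \(\mathcal{R}_{\Delta'}\) are illustrative), is:

```latex
\mathcal{L}_{\text{reg}} \;=\; \frac{1}{M} \sum_{m=1}^{M} \bigl\| \mathcal{R}_{\Sigma}(\mathbf{P}_m) - \mathcal{R}_{\Delta'}(\mathbf{P}_m) \bigr\|_1
```

If several perturbed variants are used simultaneously, an additional average over them would enter analogously.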
### Loss & Training
The total training loss combines the photometric loss \(\mathcal{L}_{\text{RGB}}\) on the training views with the self-ensembling regularization term, weighted by \(\gamma = 1\).
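With \(\gamma = 1\), the overall objective presumably takes the usual weighted-sum form (a sketch; the paper's exact notation may differ, and \(\lambda = 0.2\) may additionally appear inside the regularization term):

```latex
\mathcal{L} \;=\; \mathcal{L}_{\text{RGB}} \;+\; \gamma \, \mathcal{L}_{\text{reg}}
```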
### Key Advantages
- Perturbed models are derived from the Δ-model rather than trained from scratch, making the computational overhead negligible.
- Uncertainty is computed in 2D rendering space, naturally accommodating changes in the number of Gaussians during training.
- Regularization is applied on pseudo-views, independent of external information such as depth.
## Key Experimental Results
### Main Results: LLFF Dataset
| Method | 3-view PSNR | 6-view PSNR | 9-view PSNR |
|---|---|---|---|
| 3DGS | 19.22 | 23.80 | 25.44 |
| DNGaussian | 19.12 | 22.01 | 22.62 |
| FSGS | 20.43 | 24.09 | 25.31 |
| CoR-GS | 20.45 | 24.49 | 26.06 |
| SE-GS | 20.79 | 24.78 | 26.36 |
SE-GS achieves the best PSNR in all of the 3/6/9-view settings, and also leads on SSIM and LPIPS across the board.
### Ablation Study: Perturbation Strategy Comparison
The ablation compares three perturbation strategies (summarized here qualitatively):
- No perturbation (cross-model regularization only): serves as the baseline.
- Global random perturbation: degrades performance below the baseline because the perturbed models deviate too far.
- Uncertainty-aware perturbation: achieves the highest PSNR and SSIM and the lowest LPIPS, significantly outperforming the other strategies.
### Key Findings
- Buffer size \(S=5\) and pseudo-view count \(M=10\) are the optimal configurations.
- SE-GS is more efficient and effective than explicit ensembling of \(k\) independently trained models.
- SE-GS maintains substantial gains over vanilla 3DGS even as the number of training views increases (e.g., 9 views).
- Unlike depth-prior-based methods, SE-GS does not suffer from performance degradation as more views are added.
## Highlights & Insights
- First introduction of self-ensembling into 3DGS: Cleverly exploits uncertainty signals emerging from training dynamics.
- Computationally efficient: Incurs negligible additional training cost compared to explicit multi-model ensembling.
- Plug-and-play: Orthogonal to depth-prior methods and can be combined with them.
- Self-supervised regularization: Requires no external ground-truth depth or generated novel-view images.
## Limitations & Future Work
- Still requires a certain number of initial SfM points for 3DGS initialization.
- Pseudo-views must lie within the interpolation range of training views, limiting effectiveness on extrapolated scenes.
- Uncertainty estimation may be unreliable in extremely sparse settings (e.g., 1–2 images).
## Related Work & Insights
- 3DGS: Explicit point-based representation supporting real-time rendering.
- DNGaussian / FSGS: Leverage monocular depth priors to address sparse-view challenges.
- CoR-GS: Trains multiple 3DGS models for cross-regularization.
- Ensemble Learning: Improves robustness by aggregating multi-model predictions; related to temporal ensembling and consistency regularization.
## Rating
- Novelty: ⭐⭐⭐⭐ — The combination of uncertainty-aware perturbation and self-ensembling is novel.
- Practicality: ⭐⭐⭐⭐ — Low overhead, no external data required, consistent improvements across datasets.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Comprehensively validated on LLFF/DTU/Mip-NeRF360/MVImgNet with extensive ablations.
- Writing Quality: ⭐⭐⭐⭐ — Clear motivation and rigorous methodological derivation.