4C4D: 4 Camera 4D Gaussian Splatting¶
Conference: CVPR 2026 | arXiv: 2604.04063 | Code: Project Page | Area: 3D Vision | Keywords: 4D Gaussian Splatting, Sparse-View, Dynamic Scene Reconstruction, Neural Decaying Function, Geometry-Appearance Balance
TL;DR¶
This paper proposes the 4C4D framework, which employs a Neural Decaying Function to adaptively control Gaussian opacity decay, addressing the geometry–appearance learning imbalance in sparse-view (only 4 cameras) 4D Gaussian Splatting, and achieves state-of-the-art performance across multiple benchmarks.
Background & Motivation¶
Background: Novel view synthesis for 4D dynamic scenes typically requires dense camera arrays (tens to hundreds of cameras), severely limiting practical deployment. 3DGS/4DGS perform well under dense-view settings.
Limitations of Prior Work: Under extremely sparse views (e.g., only 4 cameras), 4DGS degrades dramatically. The root cause is an optimization bias: fitting appearance (color) is relatively easy, whereas recovering accurate geometry (depth) under such limited supervision is extremely difficult, and existing Gaussian formulations cannot balance the two objectives.
Key Challenge: Insufficient spatial supervision under sparse views → inadequate geometry learning → overfitting to training-view appearance → severe artifacts in novel views.
Key Observation: 4DGS can accurately reproduce appearance at training viewpoints but produces severely degraded depth geometry (see Fig. 3), indicating that the problem lies in optimization bias rather than model capacity.
Core Idea: Introduce a learnable opacity decaying function to redirect optimization gradients toward geometry learning.
Method¶
Overall Architecture¶
4D Gaussian primitives + Neural Decaying Function \(f_\theta\) + visibility-detection-based decoupled decaying strategy → jointly optimized via photometric rendering loss.
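As a concrete sketch, the per-primitive state and the time-conditioned opacity \(o \cdot \exp\!\big(-\tfrac{1}{2}(\tilde{t}-\mu_t)^2/\Sigma_{4,4}\big)\) used throughout the method can be held in a plain container. Field names and shapes are illustrative assumptions, not the authors' code:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian4D:
    # Illustrative container for one 4D Gaussian primitive (field names are assumptions).
    xyz: np.ndarray        # spatial mean (3,)
    rotation: np.ndarray   # orientation quaternion (4,)
    scale: np.ndarray      # per-axis spatial scale (3,)
    opacity: float         # base opacity o
    mu_t: float            # temporal mean, inherited from 4DGS
    s_t: float             # temporal scale; here Sigma_{4,4} = s_t ** 2 (an assumption)
    sh_coeffs: np.ndarray  # 4D spherical-harmonic color coefficients

    def temporal_opacity(self, t: float) -> float:
        """o * exp(-0.5 * (t - mu_t)^2 / Sigma_{4,4}): opacity at time t, before learned decay."""
        return self.opacity * np.exp(-0.5 * (t - self.mu_t) ** 2 / self.s_t ** 2)
```

The Neural Decaying Function described below multiplies this temporal opacity by an additional learned factor \(\tau\).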
Key Designs¶
- Neural Decaying Function: A lightweight neural network that takes Gaussian attributes (position, opacity, rotation) as input and predicts a decaying factor \(\tau = f_\theta(x, y, z, o, r)\). The final opacity is \(o(\tilde{t}) = \tau \cdot \exp\left(-\frac{1}{2}\frac{(\tilde{t} - \mu_t)^2}{\Sigma_{4,4}}\right) \cdot o\).
- Design Motivation: Opacity is a critical parameter for geometry learning in 4DGS. Modulating opacity through a neural network introduces additional learnable degrees of freedom, so gradients flow more toward geometric parameters (position, scale, etc.) rather than merely minimizing appearance error. This rebalances the optimization of geometry and appearance.
Visibility-Detection-Based Decoupled Decaying Strategy:
- Key Issue: Gradients only exist for Gaussians visible at the current viewpoint/timestep. Applying the same decay to invisible Gaussians would distort the optimization.
- Visibility Detection: \(G_m = Z_V(\tilde{v}, \sigma, Z_T(\tilde{t}, s_t, G))\)
- \(Z_V\): Spatial visibility (filters out Gaussians whose centers fall outside the current viewpoint)
- \(Z_T\): Temporal visibility (filters out Gaussians whose temporal span does not include the current timestep)
- Decoupled Strategy: \(\tau(g) = \begin{cases} f_\theta(x,y,z,o,r) & \text{if } g \in G_m \\ \beta = 0.999 & \text{if } g \in G_m^* \end{cases}\)
- Design Motivation: Visible Gaussians require precise decay learning; invisible Gaussians (the complement set \(G_m^*\)) are stabilized with a small constant decay, consistent with observations in AbsGS.
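The decoupled decay above can be sketched as follows. The MLP width, activations, and the k-sigma temporal visibility window are illustrative assumptions; the paper specifies only the inputs \((x, y, z, o, r)\) and the constant \(\beta = 0.999\):

```python
import numpy as np

BETA = 0.999  # constant decay applied to invisible Gaussians (the paper's beta)

def f_theta(attrs, W1, b1, W2, b2):
    """Tiny MLP standing in for the Neural Decaying Function (architecture is an assumption).
    attrs: (N, 8) rows of (x, y, z, o, r_quat[4]); sigmoid keeps tau in (0, 1)."""
    h = np.tanh(attrs @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2))).ravel()

def visibility_mask(t_query, mu_t, s_t, in_frustum, k=3.0):
    """Z_T then Z_V: temporally visible if |t - mu_t| <= k * s_t (k-sigma window is an
    assumption), spatially visible if the center projects inside the current frustum."""
    return (np.abs(t_query - mu_t) <= k * s_t) & in_frustum

def decoupled_tau(attrs, visible_mask, params):
    """tau(g) = f_theta(g) for visible Gaussians (G_m), BETA for invisible ones (G_m^*)."""
    tau = np.full(len(attrs), BETA)
    if visible_mask.any():
        tau[visible_mask] = f_theta(attrs[visible_mask], *params)
    return tau
```

Only the visible branch receives rendering gradients; the constant branch keeps invisible Gaussians from drifting, which is the point of the decoupling.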
Loss & Training¶
- Standard photometric rendering loss (L1 + SSIM)
- Neural Decaying Function and 4D Gaussians are jointly optimized
- 4D Gaussians inherit temporal attributes (\(\mu_t\), \(s_t\)) and 4D spherical harmonic coefficients from 4DGS
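A minimal sketch of the photometric loss, assuming the standard 3DGS-style weighting \((1-\lambda)\,\mathcal{L}_1 + \lambda\,\text{D-SSIM}\) (the exact \(\lambda\) is not stated here). The windowless SSIM below is a simplification of the usual 11×11 Gaussian-windowed version:

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed from whole-image statistics — a deliberate simplification of the
    windowed SSIM that 3DGS-family methods actually use."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2))

def photometric_loss(pred, gt, lam=0.2):
    """(1 - lam) * L1 + lam * D-SSIM, where D-SSIM = (1 - SSIM) / 2."""
    l1 = np.abs(pred - gt).mean()
    dssim = (1.0 - ssim_global(pred, gt)) / 2.0
    return (1.0 - lam) * l1 + lam * dssim
```

Both the Gaussian parameters and \(f_\theta\) receive gradients from this single objective, which is what lets the learned decay redirect optimization pressure toward geometry.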
Key Experimental Results¶
Main Results¶
| Dataset | Metric | 4C4D | 4DGS | 4DGaussians | Ex4DGS | Gain (vs. best) |
|---|---|---|---|---|---|---|
| Neural3DV | PSNR↑ | 22.29 | 20.60 | 20.82 | 19.33 | +1.47 |
| Neural3DV | LPIPS↓ | 0.146 | 0.244 | 0.190 | 0.239 | +23.2% |
| ENeRF-Outdoor | PSNR↑ | 24.32 | 23.52 | 18.21 | 21.89 | +0.80 |
| ENeRF-Outdoor | LPIPS↓ | 0.121 | 0.151 | 0.456 | 0.263 | +19.9% |
| Mobile-Stage | PSNR↑ | 22.36 | 22.15 | 20.15 | 17.85 | +0.21 |
| Mobile-Stage | LPIPS↓ | 0.121 | 0.180 | 0.226 | 0.260 | +32.8% |

(PSNR gains are absolute; LPIPS gains are relative reductions versus the best baseline.)
Ablation Study¶
| Configuration | PSNR↑ | DSSIM1↓ | LPIPS↓ | Note |
|---|---|---|---|---|
| w/o Neural Decaying | 22.60 | 0.097 | 0.147 | Fixed decay insufficient |
| w/o Visibility Detection | 24.49 | 0.075 | 0.127 | No visible/invisible separation |
| Full Model | 24.68 | 0.070 | 0.115 | Both components are complementary |
| Constant Decay | 24.31 | 0.075 | 0.125 | Inferior to learnable decay |
| Exponential Decay | 24.32 | 0.077 | 0.135 | Hand-crafted function suboptimal |
| Power Decay | 24.35 | 0.074 | 0.124 | Still inferior to neural network |
| Neural Decay (Ours) | 24.68 | 0.070 | 0.115 | Automatically learns optimal strategy |
Key Findings¶
- High-fidelity 4D reconstruction is achievable with only 4 cameras, substantially lowering the acquisition barrier compared to dense-array methods
- Neural decay vs. constant decay: PSNR +0.37, LPIPS −0.010, demonstrating the advantage of adaptive over fixed decay
- Contribution of visibility decoupling: PSNR +0.19, LPIPS −0.012
- The most significant improvements are observed on the outdoor complex scene benchmark (ENeRF-Outdoor)
- The self-collected dataset Dyn4Cam (4 GoPro cameras, <$1,500) enables temporally consistent 4D dynamic capture
Highlights & Insights¶
- Rigorous problem analysis: The paper clearly identifies the root cause of 4DGS failure under sparse views as a geometry–appearance optimization imbalance, rather than insufficient model capacity.
- Elegant solution: Without modifying the 3D representation, gradient flow is redirected solely through opacity modulation, yielding significant gains.
- High practical value: 4D capture with 4 GoPro cameras (<$1,500) is genuinely accessible to consumer-level users.
- Decoupled treatment of visible and invisible Gaussians is a subtle yet critically important design detail.
Limitations & Future Work¶
- Four viewpoints still leave occlusion blind spots, making scenes with severe self-occlusion difficult to handle.
- The Neural Decaying Function introduces additional computational overhead.
- Integration with monocular depth priors (e.g., MiDaS) remains unexplored.
- Dyn4Cam does not support quantitative evaluation due to the absence of ground-truth test viewpoints.
Related Work & Insights¶
- The constant decay for invisible Gaussians is consistent with observations reported in AbsGS.
- The view-inflated attention mechanism of MVDream may also be applicable to this setting.
- The "gradient redirection" paradigm could be extended to other imbalanced optimization problems (e.g., sparse-view NeRF).
Rating¶
- Novelty: ⭐⭐⭐⭐ Core idea is clear but not fundamentally disruptive
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Four datasets, detailed ablations, and a self-collected dataset — very solid
- Writing Quality: ⭐⭐⭐⭐ Problem analysis is well-articulated and the method is presented concisely
- Value: ⭐⭐⭐⭐ Consumer-level 4D capture carries significant practical significance