7DGS: Unified Spatial-Temporal-Angular Gaussian Splatting¶
Conference: ICCV 2025 arXiv: 2503.07946 Code: https://gaozhongpai.github.io/7dgs/ Area: 3D Vision / Dynamic Scene Rendering / 3D Gaussian Splatting Keywords: 7D Gaussian, Dynamic Scene, View-Dependent, Conditional Slicing, Real-Time Rendering
TL;DR¶
This paper proposes 7DGS, which models scene elements as 7-dimensional Gaussian distributions (3D spatial + 1D temporal + 3D view direction). A conditional slicing mechanism converts 7D Gaussians into time- and view-conditioned 3D Gaussians, unifying dynamic scene rendering with view-dependent appearance. On the proposed 7DGS-PBR dataset, 7DGS achieves up to 7.36 dB PSNR gain over 4DGS while using only 15.3% of the Gaussian primitives, with real-time rendering at 401 FPS.
Background & Motivation¶
High-quality real-time rendering of dynamic scenes requires simultaneous modeling along three dimensions: (1) spatial geometry, (2) temporal dynamics, and (3) view-dependent appearance. Existing methods address only subsets of these: 4DGS handles dynamics (space + time) but ignores view-dependent effects; 6DGS handles view dependence (space + direction) but is limited to static scenes. In the real world, all three dimensions are mutually coupled — for instance, specular highlights on a moving object vary simultaneously with object position and viewing direction. No prior method addresses all three in a unified framework.
Core Problem¶
How to simultaneously model spatial, temporal, and view-directional dependencies within a unified framework, enabling real-time rendering of dynamic scenes with view-dependent appearance?
Method¶
Overall Architecture¶
Each scene element is represented as a 7D Gaussian \(\mathcal{N}(\mu, \Sigma)\), where \(\mu = [\mu_p, \mu_t, \mu_d]\) (3D position + 1D time + 3D direction) and \(\Sigma\) is a \(7 \times 7\) covariance matrix. At render time, given the current time \(t\) and viewing direction \(d\), conditional slicing produces a 3D Gaussian, which is then passed to the standard 3DGS rasterization pipeline.
Key Designs¶
- 7D Gaussian Representation: The \(7 \times 7\) covariance matrix is parameterized via a Cholesky decomposition \(\Sigma = LL^T\) to ensure positive definiteness. Cross-covariance blocks \(\Sigma_{pt}\), \(\Sigma_{pd}\), and \(\Sigma_{td}\) encode spatial-temporal-directional coupling, a critical design that enables a single Gaussian to simultaneously capture motion-induced positional changes and view-dependent appearance variations.
- Conditional Slicing Mechanism: For a given \((t, d)\), the standard conditioning formula for multivariate Gaussians slices the 7D Gaussian into a conditioned 3D Gaussian:
$$\mu_{cond} = \mu_p + \Sigma_{p,(t,d)} \, \Sigma_{(t,d)}^{-1} \begin{pmatrix} t - \mu_t \\ d - \mu_d \end{pmatrix}$$
$$\Sigma_{cond} = \Sigma_p - \Sigma_{p,(t,d)} \, \Sigma_{(t,d)}^{-1} \, \Sigma_{p,(t,d)}^T$$
The conditional opacity is modulated by temporal and directional attenuation factors: \(\alpha_{cond} = \alpha \cdot f_{temp} \cdot f_{dir}\).
- Adaptive Gaussian Refinement (AGR): Conditional slicing adjusts position and opacity but keeps the covariance shape static. AGR employs a lightweight MLP (2 layers × 64 units) to predict residual corrections for key parameters (position, time, direction, covariance), dynamically adjusting Gaussian shape based on a temporal encoding \(\gamma(t)\) to capture non-rigid deformations.
- Compatibility with the 3DGS Pipeline: The sliced conditional 3D Gaussians are fed directly into the standard 3DGS projection and rasterization pipeline, leveraging its adaptive density control and efficient rendering, with only a minimal opacity threshold (\(\tau_{min}=0.01\)) added.
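The slicing step above is ordinary multivariate Gaussian conditioning applied to the spatial block given the (time, direction) block. A minimal NumPy sketch, with function and variable names that are ours rather than from the released code:

```python
import numpy as np

def slice_7d_gaussian(mu, L, t, d):
    """Condition a 7D Gaussian on time t (scalar) and view direction d
    (3-vector), returning the mean and covariance of the resulting 3D
    spatial Gaussian. `mu` is the 7-vector [mu_p, mu_t, mu_d]; `L` is the
    learned 7x7 lower-triangular Cholesky factor, so Sigma = L @ L.T is
    positive (semi-)definite by construction."""
    Sigma = L @ L.T
    mu_p, mu_td = mu[:3], mu[3:]          # spatial vs. (time, direction) parts
    S_pp = Sigma[:3, :3]                  # Sigma_p (3x3)
    S_p_td = Sigma[:3, 3:]                # cross-covariance Sigma_{p,(t,d)} (3x4)
    S_td = Sigma[3:, 3:]                  # Sigma_{(t,d)} (4x4)
    delta = np.concatenate(([t], d)) - mu_td
    gain = S_p_td @ np.linalg.inv(S_td)   # 3x4 conditioning ("regression") matrix
    mu_cond = mu_p + gain @ delta         # conditioned 3D mean
    Sigma_cond = S_pp - gain @ S_p_td.T   # Schur complement: conditioned 3D covariance
    return mu_cond, Sigma_cond
```

Note that when the cross-covariance blocks are zero the slice degenerates to the plain spatial Gaussian, and any nonzero coupling can only shrink the conditional spatial covariance; this is what lets a single primitive move and reshape with \(t\) and \(d\).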
Loss & Training¶
- Same loss function as 3DGS (L1 + D-SSIM)
- Temporal densification: splitting triggered when spatial-temporal covariance \(\Sigma_{pt}\) magnitude \(> 0.05\) and temporal scale \(> 0.25\)
- Single V100 (16 GB), Adam optimizer
- AGR training begins after 3,000 steps; \(\lambda_t\) and \(\lambda_d\) become trainable after 15,000 steps
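The paper describes AGR only at a high level (a 2-layer × 64-unit MLP over a temporal encoding \(\gamma(t)\), predicting residuals for position, time, direction, and covariance). The sketch below is a guess at that shape; the encoding width, output split, and all names are assumptions, not the authors' implementation:

```python
import numpy as np

def gamma(t, n_freqs=6):
    """Sinusoidal temporal encoding gamma(t); the frequency count is assumed."""
    freqs = (2.0 ** np.arange(n_freqs)) * np.pi
    return np.concatenate([np.sin(freqs * t), np.cos(freqs * t)])

def agr_residuals(t, params):
    """Tiny 2-layer x 64-unit ReLU MLP predicting residual corrections for a
    7D Gaussian's parameters. The output layout (3 position + 1 time +
    3 direction + 7 covariance-scale values) is hypothetical."""
    h = np.maximum(params["W1"] @ gamma(t) + params["b1"], 0.0)  # hidden layer 1
    h = np.maximum(params["W2"] @ h + params["b2"], 0.0)         # hidden layer 2
    out = params["W3"] @ h + params["b3"]                        # linear residual head
    return {"d_pos": out[:3], "d_t": out[3:4],
            "d_dir": out[4:7], "d_cov": out[7:]}
```

Because the head is a residual predictor, initializing its final layer to zero makes AGR a no-op at the start of training, which is consistent with enabling it only after 3,000 warm-up steps.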
Key Experimental Results¶
7DGS-PBR Dataset (Dynamic + View-Dependent)¶
| Method | PSNR↑ | SSIM↑ | LPIPS↓ | #Points | FPS |
|---|---|---|---|---|---|
| 4DGS | 27.79 | 0.934 | 0.079 | 641K | 193 |
| 7DGS | 32.50 | 0.958 | 0.051 | 98K | 175 |
| 7DGS (w/o AGR) | 31.77 | 0.955 | 0.055 | 88K | 376 |
On average, 7DGS gains +4.71 dB PSNR over 4DGS while using only 15.3% of the Gaussian primitives (98K vs. 641K); the heart1 scene gains +8.18 dB.
D-NeRF Dataset (Synthetic Dynamic)¶
| Method | PSNR↑ |
|---|---|
| 4DGaussians | 33.30 |
| 4DGS | 33.21 |
| 7DGS | 34.34 |
Technicolor Dataset (Real Multi-View)¶
| Method | PSNR↑ | SSIM↑ |
|---|---|---|
| Ex4DGS | 33.49 | 0.917 |
| STG | 33.23 | 0.912 |
| 7DGS | 33.58 | 0.912 |
Ablation Study: w/o AGR vs. w/ AGR¶
- 7DGS-PBR: 31.77 vs. 32.50 (+0.73 dB), with FPS dropping from 376 to 175
- D-NeRF: 33.26 vs. 34.34 (+1.08 dB)
- AGR contributes substantially to quality at the cost of rendering speed
Highlights & Insights¶
- Elegant unification: The 7D Gaussian naturally encodes spatial-temporal-directional cross-covariances, yielding a mathematically complete and intuitively grounded representation
- Elegance of conditional slicing: Reduces the high-dimensional problem to a standard 3D problem, remaining fully compatible with existing pipelines
- Remarkable parameter efficiency: Surpassing 4DGS by 4.7 dB with only 15% of the Gaussian count — a testament to the compressive power of high-dimensional representations
- View-dependent effect breakthrough: On scenes such as hearts, clouds, and flames in 7DGS-PBR, view-dependent effects that 4DGS entirely fails to handle are effectively captured by 7DGS
- Flexible fallback mode: Removing AGR yields close to 400 FPS, suitable for speed-critical applications
Limitations & Future Work¶
- The AGR MLP introduces additional computational overhead (FPS drops from 376 to 175)
- Advantages on real-world data such as Technicolor are relatively modest
- The 7D covariance has 28 independent parameters (Cholesky lower triangle), making training less stable than 3DGS
- Explicit motion modeling strategies such as keyframe interpolation have not been incorporated
Related Work & Insights¶
- 4DGS: 4D only (space + time); cannot handle view-dependent effects. 7DGS outperforms it across all datasets
- 6DGS: 6D (space + direction) but static only. 7DGS adds the temporal dimension to unify dynamics and view dependence
- Ex4DGS: Models motion explicitly via keyframe interpolation. Results on Technicolor are comparable, suggesting complementary strengths
- SSS / 3D-HGS: Improve kernel functions. 7DGS pursues dimensional extension — an orthogonal direction that could be combined with these approaches
The conceptual progression 3DGS (3D) → 4DGS (4D) → 6DGS (6D) → 7DGS (7D) exemplifies an elegant "unification through dimensionality" paradigm. Conditional slicing is grounded in elementary multivariate Gaussian conditioning, yet proves remarkably powerful. This framework paves the way for even higher-dimensional representations — e.g., 8D incorporating wavelength for spectral rendering.
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ — The 7D unified representation combined with conditional slicing is elegant and natural, filling the gap between 4DGS and 6DGS
- Experimental Thoroughness: ⭐⭐⭐⭐ — Three datasets, SOTA comparisons, and ablations; additional real-world validation would strengthen the work
- Writing Quality: ⭐⭐⭐⭐⭐ — Mathematical derivations are rigorous and complete; the logical progression from 3DGS to 6DGS to 7DGS is clearly articulated
- Value: ⭐⭐⭐⭐⭐ — Unifies dynamic and view-dependent rendering in a single framework; the 15% point count with +5 dB gain demonstrates high practical utility