7DGS: Unified Spatial-Temporal-Angular Gaussian Splatting¶
- Conference: ICCV 2025
- arXiv: 2503.07946
- Code: https://gaozhongpai.github.io/7dgs/ (project page)
- Area: 3D Vision
- Keywords: Gaussian Splatting, Dynamic Scene Rendering, View-Dependent Effects, Real-Time Rendering, Novel View Synthesis
TL;DR¶
This work extends 3D Gaussian Splatting (3DGS) to seven dimensions (3D spatial + 1D temporal + 3D directional). A conditional slicing mechanism projects 7D Gaussians into 3D Gaussians compatible with the standard 3DGS pipeline, achieving up to a 7.36 dB PSNR improvement on dynamic scenes with view-dependent effects while maintaining real-time rendering at 401 FPS.
Background & Motivation¶
Photorealistic rendering of dynamic scenes requires simultaneous modeling of three dimensions: spatial geometry, temporal dynamics, and view-dependent appearance. These three dimensions exhibit complex interdependencies — for instance, specular highlights on moving objects vary simultaneously with both viewing direction and position.
Existing methods address only individual sub-problems:

- 4DGS (spatial + temporal): handles dynamic scenes but ignores view-dependent effects
- 6DGS (spatial + directional): captures view dependence but is limited to static scenes
- No prior method handles all three dimensions within a unified framework while maintaining real-time performance
Key Challenge: High-dimensional representations offer stronger modeling capacity but introduce substantial computational overhead. Operating directly in 7D space precludes real-time rendering.
Core Idea: Scene elements are represented as 7D Gaussian distributions. Leveraging the conditional distribution property of multivariate Gaussians, a mathematically rigorous conditional slicing mechanism "slices" each 7D Gaussian into a 3D Gaussian conditioned on time and viewing direction, which seamlessly integrates into the existing 3DGS rendering pipeline.
Method¶
Overall Architecture¶
In 7DGS, each scene element is modeled as a 7D Gaussian \(X = (X_p, X_t, X_d) \sim \mathcal{N}(\mu, \Sigma)\), where \(X_p \in \mathbb{R}^3\) denotes spatial coordinates, \(X_t \in \mathbb{R}\) denotes time, and \(X_d \in \mathbb{R}^3\) denotes viewing direction. The cross-block terms \(\Sigma_{pt}, \Sigma_{pd}, \Sigma_{td}\) of the \(7\times7\) covariance matrix encode inter-dimensional correlations. At render time, the 7D Gaussian is reduced to 3D via conditional slicing and fed into the standard 3DGS rasterization pipeline.
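As a point of reference for the parameter count and block structure described above, the following is a minimal, hypothetical PyTorch sketch of how 28 free covariance parameters per Gaussian could be turned into a full \(7\times7\) matrix \(\Sigma = LL^\top\) and split into its blocks; the tensor layout and names are assumptions for illustration, not the released code.

```python
import torch

def build_covariance(raw: torch.Tensor) -> torch.Tensor:
    """Assemble Sigma = L L^T from 28 raw lower-triangular parameters per Gaussian."""
    n = raw.shape[0]
    rows, cols = torch.tril_indices(7, 7)            # the 28 lower-triangular positions
    L = torch.zeros(n, 7, 7, dtype=raw.dtype, device=raw.device)
    L[:, rows, cols] = raw
    # Exponentiate the diagonal so Sigma is positive definite by construction.
    eye = torch.eye(7, dtype=torch.bool, device=raw.device)
    L = torch.where(eye, L.exp(), L)
    return L @ L.transpose(1, 2)

# Hypothetical storage for N 7D Gaussians: mean ordered as [mu_p | mu_t | mu_d].
N = 10_000
mu = torch.zeros(N, 7)
Sigma = build_covariance(torch.randn(N, 28) * 0.1)   # (N, 7, 7)

Sigma_p  = Sigma[:, :3, :3]    # spatial block
Sigma_pt = Sigma[:, :3, 3:4]   # spatial-temporal cross-covariance
Sigma_pd = Sigma[:, :3, 4:]    # spatial-directional cross-covariance
Sigma_td = Sigma[:, 3:4, 4:]   # temporal-directional cross-covariance
```

The off-diagonal blocks \(\Sigma_{pt}\), \(\Sigma_{pd}\), and \(\Sigma_{td}\) are the terms that carry the spatial-temporal-angular coupling the method relies on.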
Key Designs¶
- 7D Gaussian Representation: The full 7D covariance matrix of each scene element is parameterized via a Cholesky decomposition \(\Sigma = LL^\top\) to guarantee positive definiteness. The cross-covariance blocks \(\Sigma_{pt}\) (spatial–temporal), \(\Sigma_{pd}\) (spatial–directional), and \(\Sigma_{td}\) (temporal–directional) naturally model the coupling among all three dimensions.
  - Design Motivation: Moving specular highlights must respond simultaneously to changes in position, time, and direction, a coupling that cannot be captured by modeling each dimension independently.
- Conditional Slicing Mechanism: Given an observation time \(t\) and viewing direction \(d\), the conditional distribution of the spatial component is derived from the multivariate Gaussian conditioning formula (see the sketch after this list):

  $$\mu_{cond} = \mu_p + \Sigma_{p,(t,d)} \Sigma_{(t,d)}^{-1} \begin{pmatrix} t - \mu_t \\ d - \mu_d \end{pmatrix}$$

  $$\Sigma_{cond} = \Sigma_p - \Sigma_{p,(t,d)} \Sigma_{(t,d)}^{-1} \Sigma_{p,(t,d)}^\top$$

  Opacity is simultaneously modulated by temporal and directional factors: \(\alpha_{cond} = \alpha \cdot f_{temp} \cdot f_{dir}\).
  - Design Motivation: This operation is mathematically exact with no approximation error, and the resulting 3D Gaussians can directly reuse the efficient 3DGS rasterizer.
- Adaptive Gaussian Refinement (AGR): A lightweight MLP (\(C_{in} \times 64 \times C_{out}\)) predicts residual corrections \(\Delta\mu_p, \Delta\mu_t, \Delta\mu_d, \Delta l\) from a feature vector \(f = \mu_p \oplus \mu_t \oplus \mu_d \oplus \gamma(t)\), dynamically adjusting Gaussian parameters prior to conditional slicing so that complex motions such as non-rigid deformations can be modeled.
  - Design Motivation: Conditional slicing keeps each Gaussian's shape fixed over time; AGR compensates for this limitation, enabling the same Gaussian to exhibit different spatial shapes at different time steps.
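To make the slicing step concrete, here is a minimal PyTorch-style sketch under stated assumptions: the per-Gaussian mean is laid out as [position (3) | time (1) | direction (3)] as in the notation above, and the opacity modulation is written as a single joint Gaussian falloff standing in for the paper's separate \(f_{temp}\) and \(f_{dir}\) factors; function and variable names are illustrative, not the authors' implementation.

```python
import torch

def slice_7d_to_3d(mu: torch.Tensor, Sigma: torch.Tensor, alpha: torch.Tensor,
                   t: float, d: torch.Tensor):
    """Condition each 7D Gaussian on (time t, view direction d), yielding a 3D Gaussian.

    mu:    (N, 7) means, ordered as [position (3) | time (1) | direction (3)]
    Sigma: (N, 7, 7) covariances
    alpha: (N,) base opacities
    t:     scalar observation time
    d:     (3,) viewing direction
    """
    n = mu.shape[0]
    mu_p, mu_td = mu[:, :3], mu[:, 3:]                    # (N, 3), (N, 4)
    Sigma_p   = Sigma[:, :3, :3]                          # (N, 3, 3)
    Sigma_ptd = Sigma[:, :3, 3:]                          # (N, 3, 4) cross block
    Sigma_td  = Sigma[:, 3:, 3:]                          # (N, 4, 4)

    # Deviation of the query (t, d) from each Gaussian's temporal/directional mean.
    query = torch.cat([torch.full((n, 1), float(t), dtype=mu.dtype, device=mu.device),
                       d.to(mu).expand(n, 3)], dim=1)
    delta = query - mu_td                                 # (N, 4)

    # mu_cond    = mu_p    + Sigma_ptd Sigma_td^{-1} (query - mu_td)
    # Sigma_cond = Sigma_p - Sigma_ptd Sigma_td^{-1} Sigma_ptd^T
    sol = torch.linalg.solve(Sigma_td, delta.unsqueeze(-1))              # (N, 4, 1)
    mu_cond = mu_p + (Sigma_ptd @ sol).squeeze(-1)
    Sigma_cond = Sigma_p - Sigma_ptd @ torch.linalg.solve(Sigma_td, Sigma_ptd.transpose(1, 2))

    # Assumed opacity modulation: a joint Gaussian falloff of the query under the
    # temporal/directional marginal, standing in for alpha * f_temp * f_dir.
    maha = (delta.unsqueeze(1) @ sol).reshape(-1)         # delta^T Sigma_td^{-1} delta
    alpha_cond = alpha * torch.exp(-0.5 * maha)
    return mu_cond, Sigma_cond, alpha_cond
```

The returned \((\mu_{cond}, \Sigma_{cond}, \alpha_{cond})\) triplets are exactly what a standard 3DGS rasterizer consumes, which is what makes the slicing backward compatible with the existing pipeline.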
Loss & Training¶
- The same loss function (L1 + SSIM), optimizer, and hyperparameters as 3DGS are adopted
- The minimum opacity threshold is modified to \(\tau_{min}=0.01\) to compensate for conditional opacity modulation
- The Gaussian splitting strategy adds a temporal criterion: splitting is triggered when \(\|\Sigma_{pt}\|\) exceeds a threshold and the temporal scale \(\Sigma_t\) is large (see the sketch after this list)
- The AGR network begins training after 3,000 iterations; modulation parameters \(\lambda_t, \lambda_d\) become learnable after 15,000 iterations
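For reference, a rough sketch of how such a temporally-aware splitting check could be expressed; the use of the Frobenius norm for \(\|\Sigma_{pt}\|\) and the threshold values are assumptions for illustration, as the summary above does not specify them.

```python
import torch

def temporal_split_mask(Sigma: torch.Tensor,
                        coupling_thresh: float = 0.01,
                        temporal_thresh: float = 0.05) -> torch.Tensor:
    """Flag Gaussians whose spatial-temporal coupling and temporal extent are both large.

    Sigma: (N, 7, 7) covariances laid out as [position (3) | time (1) | direction (3)].
    Threshold values are illustrative placeholders, not values from the paper.
    """
    Sigma_pt = Sigma[:, :3, 3]        # (N, 3) spatial-temporal cross-covariance
    Sigma_t  = Sigma[:, 3, 3]         # (N,)  temporal variance
    coupling = Sigma_pt.norm(dim=-1)  # ||Sigma_pt||
    return (coupling > coupling_thresh) & (Sigma_t > temporal_thresh)
```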
Key Experimental Results¶
Main Results¶
| Dataset | Method | PSNR↑ | SSIM↑ | LPIPS↓ | FPS↑ | # Points↓ |
|---|---|---|---|---|---|---|
| 7DGS-PBR (avg) | 4DGS | 27.79 | 0.934 | 0.079 | 192.6 | 641,960 |
| 7DGS-PBR (avg) | 7DGS | 32.50 | 0.958 | 0.051 | 174.7 | 98,440 |
| D-NeRF (avg) | 4DGS | 33.21 | 0.969 | 0.036 | 296.4 | 255,319 |
| D-NeRF (avg) | 7DGS | 34.34 | 0.972 | 0.032 | 194.2 | 47,378 |
| Technicolor (avg) | 4DGS | 33.25 | 0.905 | 0.216 | 84.2 | 838,892 |
| Technicolor (avg) | 7DGS | 33.58 | 0.912 | 0.198 | 79.2 | 416,390 |
The largest gain is observed on the heart1 scene: PSNR improves from 27.30 → 35.48 (+8.18 dB), while the point count drops from 694K to 83K (only 11.9% retained).
Ablation Study¶
| Variant | PSNR (7DGS-PBR) | FPS | # Points |
|---|---|---|---|
| 4DGS (baseline) | 27.79 | 192.6 | 641,960 |
| 7DGS w/o AGR | 31.77 | 376.0 | 88,393 |
| 7DGS (full) | 32.50 | 174.7 | 98,440 |
Even without AGR, 7DGS surpasses 4DGS by +3.98 dB while nearly doubling the frame rate to 376 FPS on 7DGS-PBR (377.8 FPS on D-NeRF), demonstrating the effectiveness of the core 7D representation. AGR contributes a further +0.73 dB at the cost of reduced rendering speed.
Key Findings¶
- 7DGS achieves the largest gains on scenes with pronounced view-dependent effects (heart, cloud, suzanne), with PSNR improvements of 4–8 dB
- 7DGS requires only 15–50% as many points as 4DGS, as the unified representation eliminates redundant Gaussians
- 7DGS w/o AGR offers a favorable speed–quality trade-off (401 FPS with quality far exceeding 4DGS)
Highlights & Insights¶
- Mathematically elegant unified representation: The closed-form solution of multivariate Gaussian conditioning reduces 7D to 3D without auxiliary networks, yielding a theoretically clean formulation
- Backward compatibility with the 3DGS ecosystem: The sliced 3D Gaussians directly reuse 3DGS rasterization and densification modules, facilitating practical deployment
- Efficiency: Higher quality is achieved with fewer points and rendering speeds up to 401 FPS
- A custom 7DGS-PBR dataset (real CT heart + volumetric cloud + volumetric flame) is introduced, filling the evaluation gap for dynamic scenes with view-dependent effects
Limitations & Future Work¶
- The 7D covariance matrix has 28 independent parameters per Gaussian, substantially more than 3DGS (6) or 4DGS (10), increasing memory overhead
- Gains on the Technicolor dataset are modest (+0.33 dB), suggesting limited generalization to complex real-world scenes
- The AGR MLP introduces additional computation, causing the full 7DGS to achieve lower FPS than 4DGS
- Color is still represented with spherical harmonics; time-dependent color modeling is not incorporated
- Conditional slicing requires inverting \(\Sigma_{(t,d)}\) (a \(4\times4\) matrix), which, while small, incurs overhead when applied to a large number of Gaussians
Related Work & Insights¶
- 3DGS / 4DGS / 6DGS: This work is a natural extension of the dimensionality-expansion line within the Gaussian Splatting family, unifying temporal and directional components
- D-NeRF / HexPlane: NeRF-based dynamic scene methods; 7DGS outperforms them comprehensively in both speed and quality
- Insight: The conditional slicing paradigm is generalizable to higher-dimensional Gaussians (e.g., incorporating lighting or material dimensions) and to other scene representations requiring dimensionality reduction
- Compared to Ex4DGS (which models motion via keyframe interpolation), 7DGS implicitly encodes motion through the covariance matrix without requiring explicit keyframes
Rating¶
- Novelty: ⭐⭐⭐⭐⭐ First unified 7D Gaussian representation for spatial–temporal–directional modeling; the conditional slicing mechanism is elegantly designed
- Experimental Thoroughness: ⭐⭐⭐⭐ Comprehensive evaluation on three datasets including a custom benchmark; additional validation on real-world scenes would strengthen the work
- Writing Quality: ⭐⭐⭐⭐⭐ Mathematical derivations are rigorous and algorithmic pseudocode is clear
- Value: ⭐⭐⭐⭐⭐ Unified framework with real-time performance represents a significant advance for dynamic rendering