4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video¶

Conference: CVPR 2025
arXiv: 2503.18421
Code: None
Area: 3D Vision
Keywords: 4D Gaussian Splatting, video compression, rate-distortion, free-viewpoint video, streamable

TL;DR¶

This paper proposes 4DGC, a rate-distortion-aware 4D Gaussian compression framework. By adopting motion-aware dynamic Gaussian modeling (multi-resolution motion grids + sparse compensatory Gaussians) and end-to-end compression (differentiable quantization + implicit entropy model), 4DGC achieves 16× compression over 3DGStream without sacrificing rendering quality.

Background & Motivation¶

Background¶

Background: 3D Gaussian Splatting (3DGS) enables high-quality free-viewpoint video (FVV) rendering. However, it requires storing a vast amount of Gaussian attributes (position, color, covariance, etc.) for each frame, resulting in extremely high storage and transmission costs.
Limitations of Prior Work: (1) Existing methods handle Gaussian representation and compression separately, ignoring the rate-distortion trade-off. (2) Inter-frame redundancy is underutilized, even though the Gaussian attributes of adjacent frames are highly similar. (3) Static 3DGS compression methods cannot be directly extended to dynamic scenes.
Key Challenge: High-quality FVV rendering requires massive Gaussian parameters, whereas streaming demands extremely low bitrates. It is necessary to consider compression efficiency during the representation design stage.
Goal: Design an end-to-end 4D Gaussian compression scheme that simultaneously optimizes rate-distortion performance at both representation and compression levels.
Key Insight: Utilize motion grids to capture inter-frame rigid motion (which covers most scene dynamics), and only use sparse compensatory Gaussians to represent residuals, drastically reducing the amount of information to be encoded.
Core Idea: Inter-frame motion modeling via motion grids + new region handling via sparse compensation + end-to-end rate-distortion optimized compression.

Proposed Approach¶

Method¶

Overall Architecture¶

4DGC comprises two core modules: (1) motion-aware dynamic Gaussian modeling, which estimates inter-frame motion via multi-resolution motion grids and handles newly appeared regions with sparse compensatory Gaussians; (2) end-to-end compression, which performs differentiable quantization on attributes and estimates bitrates with an implicit entropy model to jointly optimize rendering quality and bitrate.

Key Designs¶

Multi-Resolution Motion Grid
- Function: Estimate inter-frame rigid motion (translation + rotation)
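- Mechanism: \(\Delta\boldsymbol{\mu}_t = \Phi_{\mu}(\bigcup_{l=1}^L \text{interp}(\mathbf{P}_{t-1}^l, \mathbf{M}_t^l))\), where \(\mathbf{M}_t^l\) is the level-\(l\) motion grid for frame \(t\), \(\mathbf{P}_{t-1}^l\) is the level-\(l\) query derived from the previous frame's Gaussian positions, and \(\Phi_{\mu}\) decodes the concatenated features into a position offset. The formula shows the translation branch only; the rotation update is presumably decoded from the same interpolated features.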
- Design Motivation: Motion grids provide continuous, low-dimensional representations, which are far more efficient than storing motion vectors per Gaussian.
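
The grid lookup described above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the authors' implementation: the grid resolutions, the feature width, the dict-based grid storage, and the linear map standing in for \(\Phi_{\mu}\) are all assumptions made for the example, as are the helper names `make_grid`, `trilinear`, and `motion_offset`.

```python
import itertools
import random


def make_grid(res, feat_dim, rng):
    """Dense res^3 grid of small random features, keyed by voxel index (i, j, k)."""
    return {
        idx: [rng.uniform(-0.01, 0.01) for _ in range(feat_dim)]
        for idx in itertools.product(range(res), repeat=3)
    }


def trilinear(grid, res, p):
    """Trilinearly interpolate grid features at a point p in [0, 1]^3 (res >= 2)."""
    coords = [min(max(c, 0.0), 1.0) * (res - 1) for c in p]
    base = [min(int(c), res - 2) for c in coords]      # lower corner of the cell
    frac = [c - b for c, b in zip(coords, base)]       # offset inside the cell
    feat_dim = len(grid[(0, 0, 0)])
    out = [0.0] * feat_dim
    for corner in itertools.product((0, 1), repeat=3):
        weight = 1.0
        for d in range(3):
            weight *= frac[d] if corner[d] else 1.0 - frac[d]
        idx = tuple(base[d] + corner[d] for d in range(3))
        for f in range(feat_dim):
            out[f] += weight * grid[idx][f]
    return out


def motion_offset(grids, resolutions, decoder, p):
    """Concatenate per-level features and decode them into a 3D offset."""
    feats = []
    for grid, res in zip(grids, resolutions):
        feats.extend(trilinear(grid, res, p))
    # Linear stand-in for the decoder Phi_mu: one weight row per output axis.
    return [sum(w * f for w, f in zip(row, feats)) for row in decoder]


rng = random.Random(0)
resolutions = [4, 8, 16]          # hypothetical level resolutions
feat_dim = 2                      # hypothetical feature width per level
grids = [make_grid(r, feat_dim, rng) for r in resolutions]
decoder = [[rng.uniform(-1, 1) for _ in range(feat_dim * len(resolutions))]
           for _ in range(3)]

mu_prev = (0.37, 0.50, 0.90)      # a Gaussian center, normalized to [0, 1]^3
delta_mu = motion_offset(grids, resolutions, decoder, mu_prev)
mu_new = tuple(m + d for m, d in zip(mu_prev, delta_mu))
```

With these illustrative settings the three grids hold (64 + 512 + 4096) × 2 = 9,344 values no matter how many Gaussians the scene contains, whereas storing a 3D motion vector per Gaussian grows linearly with the Gaussian count.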
Sparse Compensatory Gaussians
- Function: Introduce additional Gaussians for newly appeared or rapidly changing regions.
- Mechanism: Two triggering conditions: gradient change (\(|\nabla| > \tau_g\)) and rapid displacement (\(|\Delta\mu| > \tau_\mu\)).
- Final Representation: \(\hat{\mathbf{G}}_t = \hat{\mathbf{G}}_{t-1}(\cdot) + \Delta\hat{\mathbf{G}}_t\)
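
As a rough illustration of the two triggers, the sketch below flags Gaussians for compensation. The summary does not say whether the conditions are combined with OR or AND; the code assumes that either trigger suffices. The function name `compensation_sites`, the thresholds, and the input values are hypothetical.

```python
import math


def compensation_sites(grad_norms, displacements, tau_g, tau_mu):
    """Indices of Gaussians flagged by the gradient or the displacement trigger."""
    flagged = []
    for i, (grad, disp) in enumerate(zip(grad_norms, displacements)):
        moved = math.sqrt(sum(c * c for c in disp))
        if grad > tau_g or moved > tau_mu:   # assumption: either trigger suffices
            flagged.append(i)
    return flagged


grad_norms = [0.001, 0.05, 0.002, 0.0005]                 # made-up |grad| values
displacements = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0),
                 (0.3, 0.4, 0.0), (0.001, 0.0, 0.0)]      # made-up motion offsets
sites = compensation_sites(grad_norms, displacements, tau_g=0.01, tau_mu=0.1)
print(sites)  # [1, 2]
```

Only the flagged sites would receive new Gaussians, which is what keeps the compensation term \(\Delta\hat{\mathbf{G}}_t\) sparse.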
End-to-End Rate-Distortion Compression
- Differentiable Quantization: Quantize Gaussian attributes directly during training.
- Implicit Entropy Model: Estimate the bitrate for each attribute.
- RD Loss: \(\mathcal{L} = \mathcal{L}_{render} + \lambda \cdot R\)
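
To make the quantization and rate terms concrete, here is a minimal sketch under stated assumptions. It uses additive uniform noise as the training-time surrogate for rounding, a common choice in learned compression, and a fixed logistic distribution in place of the learned implicit entropy model. Neither is confirmed by the summary as the paper's exact design, and the names `quantize` and `estimated_bits` are hypothetical.

```python
import math
import random


def quantize(x, step, training, rng=random):
    """Uniform scalar quantization with a noise surrogate for training."""
    if training:
        # Additive uniform noise keeps the operation differentiable in x.
        return x + rng.uniform(-0.5, 0.5) * step
    return round(x / step) * step


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def estimated_bits(x_hat, step, loc, scale):
    """Bits to code x_hat: -log2 of the probability mass of its quantization bin."""
    upper = _sigmoid((x_hat + step / 2 - loc) / scale)
    lower = _sigmoid((x_hat - step / 2 - loc) / scale)
    return -math.log2(max(upper - lower, 1e-12))


step, loc, scale = 0.1, 0.0, 0.05   # hypothetical step size and distribution parameters
x_hat = quantize(0.26, step, training=False)   # snaps to the nearest multiple of step
bits_near = estimated_bits(0.0, step, loc, scale)
bits_far = estimated_bits(1.0, step, loc, scale)
```

Values close to the distribution's center (`bits_near`) cost far fewer bits than outliers (`bits_far`), so minimizing \(R\) pushes attributes toward values the entropy model finds likely.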

Loss & Training¶

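The training objective is the rendering loss plus \(\lambda\) times the estimated bitrate, i.e. \(\mathcal{L} = \mathcal{L}_{render} + \lambda \cdot R\). The weight \(\lambda\) sets the rate-distortion trade-off: a larger \(\lambda\) penalizes bits more heavily and yields a smaller, lower-quality stream.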
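
The effect of \(\lambda\) can be seen with a toy example. The candidate (distortion, rate) pairs below are invented for illustration and are not results from the paper; `rd_loss` and `best_operating_point` are hypothetical helper names.

```python
def rd_loss(distortion, rate_bits, lam):
    """Rate-distortion objective: distortion plus lambda-weighted rate."""
    return distortion + lam * rate_bits


def best_operating_point(candidates, lam):
    """Pick the (distortion, rate) pair that minimizes the RD objective."""
    return min(candidates, key=lambda c: rd_loss(c[0], c[1], lam))


# Made-up operating points: (distortion, rate in bits).
candidates = [(0.010, 900.0), (0.012, 400.0), (0.020, 150.0)]

for lam in (1e-6, 1e-5, 1e-4):
    print(lam, best_operating_point(candidates, lam))
# 1e-06 (0.01, 900.0)
# 1e-05 (0.012, 400.0)
# 0.0001 (0.02, 150.0)
```

As \(\lambda\) grows, the minimizer moves from the high-rate, low-distortion point to the low-rate, high-distortion one, which is how a single framework can produce models at several bitrates.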

Key Experimental Results¶

Main Results¶

Method	Compression Ratio (vs. 3DGStream)	PSNR	Relative Storage
3DGStream	1×	Reference	1
4DGC	16×	≈ Reference	1/16

Ablation Study¶

Component	Effect
w/o Motion Grid	Significant drop in compression efficiency
w/o Compensatory Gaussians	Poor rendering quality in newly appeared regions
w/o End-to-End Training	Suboptimal RD performance
Full	Best RD trade-off

Key Findings¶

The motion grid captures the vast majority of inter-frame changes, with compensatory Gaussians accounting for only a small portion.
Implicit entropy models fit the distribution of Gaussian attributes better than traditional entropy coding.
Rendering quality is nearly lossless under 16× compression.

Highlights & Insights¶

Joint Optimization of Representation and Compression: High-quality representations are designed to be inherently compression-friendly, rather than compressing after modeling.
The motion grid design leverages the prior knowledge of dynamic scenes where "most regions exhibit rigid motion".
The end-to-end differentiable framework enables rate-distortion (RD) optimization.

Limitations & Future Work¶

Non-rigid motion (e.g., cloth, liquids) may require more complex motion modeling.
Streaming latency has not been fully evaluated.
Combining with other compression/detection methods may yield better results.
Evaluation on larger-scale datasets (e.g., longer video sequences, more diverse scenes) remains to be conducted.
Deployment optimization for different application scenarios (mobile, server) is worth exploring.
Theoretical analysis of the method can be further deepened.

vs 3DGStream: Lacks compression design; its storage cost is 16× that of 4DGC.
vs Compact3D: Only handles compression for static scenes.

Rating¶

Novelty: ⭐⭐⭐⭐ Strong idea of joint representation and compression design
Experimental Thoroughness: ⭐⭐⭐⭐ Comprehensive RD curve comparisons
Writing Quality: ⭐⭐⭐⭐ Clear technical descriptions
Value: ⭐⭐⭐⭐ Addresses a core problem in FVV streaming