SAGS: Structure-Aware 3D Gaussian Splatting¶

Conference: ECCV 2024
arXiv: 2404.19149
Code: Available
Area: 3D Vision
Keywords: 3D Gaussian Splatting, graph neural network, structure-aware, novel view synthesis, model compression

TL;DR¶

This work proposes SAGS, which implicitly encodes scene geometry using a local-global graph representation and graph neural networks. It improves the rendering quality of 3DGS, reduces storage requirements (up to 24× compression), and significantly suppresses floater artifacts while maintaining real-time rendering.

Background & Motivation¶

3D-GS optimizes each Gaussian kernel independently and in a geometry-agnostic manner, ignoring the intrinsic 3D structure of the scene. As a result, Gaussian kernels drift far from their initial positions, producing floater artifacts and poor depth maps. Existing compression methods (such as codebook quantization) also fail to exploit structural information.

Proposed Solution¶

This paper brings graph networks from point cloud analysis into 3D-GS, allowing adjacent Gaussian kernels to share information and learn topology-preserving displacements.

Method¶

Overall Architecture¶

Curvature-aware densification: Estimating the Gaussian curvature of point clouds and increasing point density in low-curvature regions using midpoint interpolation.
Structure-aware encoder: Constructing a $k$-NN graph and using a GNN to aggregate local and global features.
Refinement network: Decoding color, opacity, covariance, and displacement using four independent MLPs.

Key Designs¶

Curvature-Aware Densification: COLMAP often suffers from insufficient point sampling in textureless planar regions. By estimating curvature via local PCA, midpoints are inserted in low-curvature regions to supplement the point cloud density.
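The densification step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the curvature proxy (smallest PCA eigenvalue divided by the eigenvalue sum, often called surface variation), the neighbourhood size `k`, the `threshold`, and the choice to pair each flat point with its nearest neighbour are all assumptions made for illustration.

```python
import numpy as np

def knn_indices(points, k):
    """Brute-force k nearest neighbours of every point (the point itself excluded)."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def surface_variation(points, nbrs):
    """Curvature proxy from local PCA (assumed): smallest eigenvalue / eigenvalue sum."""
    curv = np.empty(len(points))
    for i, idx in enumerate(nbrs):
        patch = np.vstack([points[i:i + 1], points[idx]])  # the point plus its neighbours
        eig = np.linalg.eigvalsh(np.cov(patch.T))          # eigenvalues in ascending order
        curv[i] = max(eig[0], 0.0) / max(eig.sum(), 1e-12)
    return curv

def densify_low_curvature(points, k=8, threshold=0.01):
    """Insert a midpoint between each low-curvature point and its nearest neighbour."""
    nbrs = knn_indices(points, k)
    curv = surface_variation(points, nbrs)
    flat = np.where(curv < threshold)[0]                   # hypothetical flatness threshold
    mids = 0.5 * (points[flat] + points[nbrs[flat, 0]])
    mids = np.unique(mids, axis=0)                         # mutual neighbours give the same midpoint
    return np.vstack([points, mids]), curv
```

On a perfectly planar patch every point falls below the threshold, so the sketch adds midpoints everywhere; on a curved or noisy patch fewer points qualify. The brute-force neighbour search is quadratic in the number of points and is only meant for small examples.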

Structure-Aware GNN Encoder: Neighborhood information is aggregated with inverse-distance weights $w_{ij}$:

$$\Phi(\mathbf{p}_i, \mathbf{f}_i) = \phi\left(\sum_{j \in \mathcal{N}(i)} w_{ij} h_\Theta(\gamma(\mathbf{p}_j), \mathbf{f}_j - \mathbf{f}_i, \mathbf{g})\right)$$

Here $\mathbf{f}_j - \mathbf{f}_i$ are the relative features, $\gamma(\mathbf{p})$ is the positional encoding, and $\mathbf{g}=\max(\mathbf{f})$ is the global feature, taken as the maximum over the point features.
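The aggregation in the encoder equation can be sketched in NumPy as follows. This is an illustration under several assumptions, not the paper's network: $h_\Theta$ is reduced to a single linear layer (`W`, `b`), $\phi$ is a ReLU, $\gamma$ is a standard sinusoidal encoding with a hypothetical `num_freqs`, and the inverse-distance weights are normalised to sum to one per node.

```python
import numpy as np

def positional_encoding(p, num_freqs=4):
    """gamma(p): sin/cos of each coordinate at frequencies 2^0 .. 2^(num_freqs-1)."""
    freqs = 2.0 ** np.arange(num_freqs)
    angles = p[..., None] * freqs                           # (..., 3, L)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*p.shape[:-1], -1)                   # (..., 3 * 2L)

def aggregate(points, feats, nbrs, W, b, eps=1e-8):
    """One structure-aware aggregation step over a k-NN graph given by nbrs (N, k)."""
    n, k = nbrs.shape
    g = feats.max(axis=0)                                   # global feature g = max(f)
    dist = np.linalg.norm(points[nbrs] - points[:, None, :], axis=-1)
    w = 1.0 / (dist + eps)                                  # inverse-distance weights w_ij
    w = w / w.sum(axis=1, keepdims=True)                    # normalisation is an assumption
    msg_in = np.concatenate([
        positional_encoding(points[nbrs]),                  # gamma(p_j)
        feats[nbrs] - feats[:, None, :],                    # f_j - f_i
        np.broadcast_to(g, (n, k, g.shape[0])),             # g, shared by all edges
    ], axis=-1)
    msg = msg_in @ W + b                                    # h_Theta: single linear layer (assumed)
    agg = (w[..., None] * msg).sum(axis=1)                  # weighted sum over neighbours j
    return np.maximum(agg, 0.0)                             # phi: ReLU (assumed)
```

With `C`-dimensional input features and the default `num_freqs=4`, `W` needs shape `(24 + 2 * C, C_out)`. Using relative features `f_j - f_i` rather than raw neighbour features keeps each message dependent on local differences, in the spirit of edge convolutions from point cloud networks.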

Displacement Prediction: Gaussian positions are modeled as the initial COLMAP positions plus displacements $\Delta \mathbf{p}$ predicted by an MLP; keeping these displacements small preserves scene topology.
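A minimal sketch of the position update $\mathbf{p} = \mathbf{p}_0 + \Delta \mathbf{p}$ follows. The summary does not say how the displacement is kept small, so the `tanh` squashing and the `max_shift` bound below are purely illustrative assumptions, and `raw_offsets` stands in for the MLP output.

```python
import numpy as np

def displaced_positions(p_init, raw_offsets, max_shift=0.05):
    """Final positions = initial COLMAP positions + bounded displacement.

    raw_offsets stands in for the MLP output; the tanh bound and
    max_shift are assumptions, not details taken from the paper.
    """
    delta = max_shift * np.tanh(raw_offsets)   # each coordinate moves at most max_shift
    return p_init + delta
```

Whatever the exact mechanism, the point is that every Gaussian stays anchored to its initial location, so even a large raw prediction cannot send it floating away from the surface.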

SAGS-Lite: The network is trained only on keypoints, and midpoint attributes are obtained by interpolation, yielding an extremely lightweight model without any additional compression technique.
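The interpolation idea behind SAGS-Lite can be sketched as below. The summary only states that midpoint attributes are interpolated, so the plain average of the two parent keypoints and the `pairs` index layout are assumptions for illustration.

```python
import numpy as np

def interpolate_midpoint_attributes(key_attrs, pairs):
    """Derive midpoint attributes from the two keypoints each midpoint lies between.

    key_attrs: (K, C) attributes stored for keypoints only.
    pairs:     (M, 2) integer indices of the parent keypoints of each midpoint
               (hypothetical layout).
    """
    return 0.5 * (key_attrs[pairs[:, 0]] + key_attrs[pairs[:, 1]])
```

Because midpoint attributes are recomputed from their parents, only keypoint data has to be stored, which is where the storage saving comes from. A plain average is reasonable for quantities such as color or opacity; rotation-like attributes would likely need a more careful scheme.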

Loss & Training¶

\[\mathcal{L} = (1-\lambda)\mathcal{L}_1 + \lambda\mathcal{L}_{SSIM}, \quad \lambda=0.2\]
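The weighting in this loss can be illustrated with a small NumPy sketch. Two simplifications are assumed here: SSIM is computed once over the whole image rather than over local windows, and the SSIM term is taken as $1 - \mathrm{SSIM}$ so that identical images give zero loss. The constants `c1` and `c2` assume images scaled to $[0, 1]$.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error between rendered and ground-truth images."""
    return np.abs(pred - target).mean()

def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM over the whole image (a simplification of windowed SSIM)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def total_loss(pred, target, lam=0.2):
    """(1 - lam) * L1 + lam * (1 - SSIM), with lam = 0.2 as in the formula above."""
    return (1 - lam) * l1_loss(pred, target) + lam * (1.0 - global_ssim(pred, target))
```

With $\lambda = 0.2$, the pixel-wise L1 term carries 80% of the weight and the structural term the remaining 20%.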

Key Experimental Results¶

Main Results¶

Rendering Quality on Mip-NeRF360 / Tanks & Temples / Deep Blending:

Method	MipNeRF360 PSNR/SSIM/LPIPS	T&T PSNR/SSIM/LPIPS	DB PSNR/SSIM/LPIPS
3D-GS	28.69/0.870/0.182	23.14/0.841/0.183	29.41/0.903/0.243
Scaffold-GS	28.84/0.848/0.220	23.96/0.853/0.177	30.21/0.906/0.254
SAGS	29.65/0.874/0.179	24.88/0.866/0.166	30.47/0.913/0.241

Storage Compression Ratio (Compared to 3D-GS):

Method	MipNeRF360 (MB)	T&T (MB)	Deep Blending (MB)
3D-GS	693	411	676
Scaffold-GS	252 (2.8×↓)	87 (4.7×↓)	66 (10.2×↓)
SAGS	135 (5.1×↓)	75 (5.5×↓)	58 (11.7×↓)
SAGS-Lite	76 (9.1×↓)	35 (12×↓)	28 (24×↓)
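The reduction factors in the table are simply the 3D-GS size divided by each method's size. The short script below reproduces that arithmetic from the reported megabyte figures; the dictionary layout and function name are illustrative only.

```python
# Model sizes in MB, copied from the storage table above.
sizes_mb = {
    "3D-GS":       {"MipNeRF360": 693, "T&T": 411, "Deep Blending": 676},
    "Scaffold-GS": {"MipNeRF360": 252, "T&T": 87,  "Deep Blending": 66},
    "SAGS":        {"MipNeRF360": 135, "T&T": 75,  "Deep Blending": 58},
    "SAGS-Lite":   {"MipNeRF360": 76,  "T&T": 35,  "Deep Blending": 28},
}

def compression_ratios(sizes, baseline="3D-GS"):
    """Return baseline_size / method_size for every non-baseline method and dataset."""
    base = sizes[baseline]
    return {
        method: {dataset: base[dataset] / mb for dataset, mb in per_dataset.items()}
        for method, per_dataset in sizes.items()
        if method != baseline
    }

ratios = compression_ratios(sizes_mb)
for method, per_dataset in ratios.items():
    print(method, {d: round(r, 1) for d, r in per_dataset.items()})
```

The headline 24× figure corresponds to SAGS-Lite on Deep Blending (676 MB versus 28 MB); on the other two datasets the reduction is roughly 9× and 12×.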

Ablation Study¶

Ablation Item	DB PSNR (dB)	T&T PSNR (dB)
w/o Curvature Densification	29.87	23.97
w/o GNN	29.94	24.19
w/o Positional Encoding	30.21	24.31
w/o Global Features	30.17	24.42
w/o View-Dependent Positions	30.07	24.37
Full SAGS	30.47	24.88

Key Findings¶

Structure awareness constrains Gaussian kernel displacements within a small range, suppressing floater artifacts.
The depth maps generated by SAGS are significantly superior to those of 3D-GS and Scaffold-GS, capturing sharp edges and flat surfaces.
SAGS-Lite achieves up to 24× storage reduction (on Deep Blending) without using compression techniques, while maintaining rendering quality close to 3D-GS.

Highlights & Insights¶

First structure-aware 3DGS method: Bridging the fields of point cloud analysis and 3DGS.
Displacement prediction paradigm: Constraining Gaussian kernels to stay near the initial geometry, implicitly preserving scene topology.
Extremely lightweight SAGS-Lite: The midpoint interpolation scheme is simple and effective, achieving up to 24× compression without quantization.
Achieving superior rendering quality and smaller model sizes simultaneously.

Limitations & Future Work¶

GNN inference introduces extra computational overhead.
The method relies heavily on the quality of the initial COLMAP point cloud.
The method is not applicable to highly dynamic scenes.

Related Work & Insights¶

Scaffold-GS introduces a hierarchical structure but still employs structureless optimization.
Concepts from point cloud analysis GNNs (e.g., DGCNN, PointNet++) can be transferred to 3DGS.
Insight: Scene structure priors are crucial for reducing redundant Gaussian kernels and improving rendering quality.

Rating¶

Novelty: ★★★★★ First work to introduce GNNs to 3DGS, with a novel structure-aware concept.
Practicality: ★★★★☆ Real-time rendering and substantial compression hold great value for VR/AR applications.
Experimental Quality: ★★★★★ Fully evaluated on 13 scenes across 3 datasets, with comprehensive ablation studies.