# SGI: Structured 2D Gaussians for Efficient and Compact Large Image Representation

**Conference:** CVPR 2026 · **arXiv:** 2603.07789 · **Code:** https://github.com/zx-pan/SGI · **Area:** 3D Vision
**Keywords:** 2D Gaussian splatting, image representation, structured Gaussians, entropy coding, multi-scale fitting
## TL;DR
SGI organizes unstructured 2D Gaussian primitives via seed points and decodes their attributes with lightweight MLPs. Combined with context-model-driven entropy coding and a multi-scale fitting strategy, SGI achieves up to 7.5× compression and 6.5× speedup in high-resolution image representation while maintaining or improving fidelity.
## Background & Motivation
- Background: 2D Gaussian splatting has emerged as a new paradigm for image representation, enabling efficient rendering on low-end devices. However, scaling to high resolutions requires millions of unstructured Gaussian primitives, resulting in slow convergence and parameter redundancy.
- Limitations of Prior Work: Methods such as GaussianImage optimize each Gaussian independently without exploiting spatial locality—neighboring pixels typically share similar color and texture—leading to substantial parameter redundancy among adjacent primitives.
- Key Challenge: Anchor-based methods (e.g., Scaffold-GS) achieve effective compression in 3D scenes, but direct transfer to 2D yields limited gains (only ~3%) due to already-removed parameters such as opacity.
- Goal: Design a compact and efficient 2D Gaussian representation for high-resolution images.
- Key Insight: Introduce seed points to organize Gaussian primitives and perform entropy coding at the seed level for further compression.
- Core Idea: Seed points + shared MLPs → structured Gaussians → entropy coding to remove residual redundancy → multi-scale fitting to accelerate optimization.
## Method
### Overall Architecture

1. Seed points uniformly cover the image; each seed is associated with \(K\) Gaussian primitives, and two shared MLPs decode their color and covariance.
2. A context model combined with a binarized hash grid estimates the distribution of seed attributes for entropy coding.
3. A multi-scale fitting strategy progressively optimizes from coarse to fine.
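To make step 1 concrete, below is a minimal PyTorch sketch of the seed-to-Gaussian decoding path. All sizes, the MLP widths, and the 3-parameter covariance output are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Toy sizes (assumed): N seeds, K Gaussians per seed, feat_dim-dim seed features.
N, K, feat_dim = 1024, 10, 16

x_a   = torch.rand(N, 2)           # seed positions in normalized image coords
f_a   = torch.randn(N, feat_dim)   # per-seed feature vector f_a
s_o   = torch.rand(N, 1)           # per-seed offset scale s_o
delta = torch.randn(N, K, 2)       # K learnable offsets delta per seed

# Two shared MLPs decode all K Gaussians' color and covariance from f_a.
mlp_c     = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, K * 3))
mlp_sigma = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, K * 3))

# mu^(k) = x_a + delta^(k) * s_o  ->  (N, K, 2) Gaussian centers
mu    = x_a[:, None, :] + delta * s_o[:, None, :]
color = mlp_c(f_a).view(N, K, 3)       # RGB per Gaussian
cov   = mlp_sigma(f_a).view(N, K, 3)   # 2D covariance params (e.g. Cholesky terms)
```

Because color and covariance are regressed from the shared \(f_a\), per-Gaussian storage is amortized across each seed's \(K\) primitives.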
### Key Designs
- **Seed-Based 2D Neural Gaussians**
  - Function: Organize unstructured Gaussians into a compact seed-level representation.
  - Mechanism: Each seed located at \(x_a\) carries attributes \(\mathcal{A} = \{f_a, s_o, s_a, \delta\}\) (feature vector, offset scale, size scale, and \(K\) offsets). Gaussian positions are computed as \(\mu^{(k)} = x_a + \delta^{(k)} \cdot s_o\). Two shared MLPs, \(\text{MLP}_c\) and \(\text{MLP}_\Sigma\), decode color and covariance from \(f_a\). Each Gaussian requires only 8 parameters.
  - Design Motivation: Shared MLPs and seed-level feature vectors exploit spatial locality, substantially reducing the parameter count.
- **Context-Model-Driven Entropy Coding** (see the sketch after this list)
  - Function: Further compress residual spatial redundancy in seed attributes.
  - Mechanism: A learnable binarized hash grid \(\mathcal{H}\) encodes spatial consistency across seeds. A context MLP predicts the mean \(\mu_j^{(i)}\) and standard deviation \(\sigma_j^{(i)}\) of each attribute component from hash features, which are then used for arithmetic coding. During training, uniform noise injection simulates quantization, and quantization step sizes are adjusted via learnable refinement factors.
  - Design Motivation: The structural regularity introduced by the seed-based representation makes attribute distributions modelable, enabling effective compression. Seed structuring alone is insufficient (~3% compression); entropy coding is essential.
- **Multi-Scale Fitting Strategy** (a training-loop sketch appears after the Loss & Training section)
  - Function: Accelerate optimization and improve stability.
  - Mechanism: A Gaussian pyramid \(\{I_0 = I, I_1, \ldots, I_{M-1}\}\) is constructed. Optimization begins at the coarsest level, and its result initializes the next finer level (positions and scales upsampled by 2×). The total iteration count is fixed and distributed progressively across levels.
  - Design Motivation: Direct optimization at full resolution converges slowly and unstably, especially given the overhead of quantization-aware training and probabilistic modeling. Coarse-to-fine warm-starting significantly accelerates convergence.
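The entropy-coding path follows the standard learned-compression recipe: a uniform-noise quantization surrogate plus a per-component Gaussian likelihood whose negative log gives the bit cost. The sketch below assumes hypothetical feature dimensions and omits the hash-grid lookup and the learnable refinement of quantization steps.

```python
import torch
import torch.nn as nn

class ContextModel(nn.Module):
    """Predicts (mu, sigma) per seed-attribute component from hash-grid features."""
    def __init__(self, ctx_dim=8, attr_dim=16):   # dimensions are assumptions
        super().__init__()
        self.net = nn.Sequential(nn.Linear(ctx_dim, 32), nn.ReLU(),
                                 nn.Linear(32, 2 * attr_dim))

    def forward(self, hash_feat, attr, q_step=1.0):
        mu, log_sigma = self.net(hash_feat).chunk(2, dim=-1)
        dist = torch.distributions.Normal(mu, log_sigma.exp())
        # Training-time surrogate: additive uniform noise in [-q/2, q/2] mimics quantization.
        attr_q = attr + (torch.rand_like(attr) - 0.5) * q_step
        # Probability mass of the quantization bin; -log2 is the estimated bit cost
        # an arithmetic coder would pay under this model.
        p = dist.cdf(attr_q + q_step / 2) - dist.cdf(attr_q - q_step / 2)
        return attr_q, -torch.log2(p.clamp_min(1e-9)).sum()

ctx = ContextModel()
hash_feat = torch.randn(1024, 8)    # binarized hash-grid features per seed (assumed)
attr      = torch.randn(1024, 16)   # seed attributes to be entropy coded
attr_q, rate_bits = ctx(hash_feat, attr)
# The rate term enters training as L = L_distortion + lambda * rate_bits.
```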
## Loss & Training

Key hyperparameters: entropy-loss weight \(\lambda = 0.001\) (setting \(\lambda = 0\) disables entropy coding, as in the ablation), \(M = 3\) pyramid levels, and 15,000 total optimization steps.
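Since the iteration budget is fixed and split across pyramid levels, the multi-scale schedule can be expressed as a short loop. In the sketch below, the pyramid, the coarsest-first order, and the 2× upsampling of positions and scales follow the paper's description; `fit_level`, the equal budget split, and all sizes are placeholders.

```python
import torch
import torch.nn.functional as F

def gaussian_pyramid(img, M=3):
    """img: (1, 3, H, W); returns [I_0 (finest), ..., I_{M-1} (coarsest)]."""
    levels = [img]
    for _ in range(M - 1):
        levels.append(F.avg_pool2d(levels[-1], 2))  # stand-in for blur + 2x downsample
    return levels

def fit_level(positions, scales, target, n_iter):
    # Placeholder: n_iter steps of rendering + rate-distortion updates at this scale.
    return positions, scales

def multi_scale_fit(img, M=3, total_iters=15_000):
    levels = gaussian_pyramid(img, M)
    iters = [total_iters // M] * M   # assumed equal split; the paper distributes progressively
    # Toy initialization in pixel coordinates of the coarsest level.
    positions = torch.rand(1000, 2) * img.shape[-1] / 2 ** (M - 1)
    scales = torch.full((1000, 2), 2.0)
    for m in range(M - 1, -1, -1):   # coarsest level first
        positions, scales = fit_level(positions, scales, levels[m], iters[m])
        if m > 0:                    # warm-start the next finer level
            positions, scales = positions * 2.0, scales * 2.0
    return positions, scales

pos, sc = multi_scale_fit(torch.rand(1, 3, 512, 512))
```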
## Key Experimental Results
### Main Results
| Method | FGF2 PSNR↑ | Size (MB)↓ | Opt. Time (min)↓ |
|---|---|---|---|
| SGI (low-rate, 3.5M) | 31.24 | 16.33 | 48.43 |
| GaussianImage | 27.30 | 23.37 | 322.17 |
| LIG | 32.10 | 106.81 | 87.56 |
| SGI (high-rate, 10M) | 36.27 | 41.74 | 97.75 |
| 3DGS | 34.93 | 787.73 | 642.85 |
### Ablation Study

| Configuration | FGF2 PSNR↑ | Size (MB)↓ | Note |
|---|---|---|---|
| λ=0 (no entropy coding) | 32.36 | 104.08 | Seed structuring yields negligible compression |
| λ=0.001 | 31.24 | 16.33 | 6.4× compression |
| K=5 | 31.29 | 18.48 | Fewer Gaussians per seed |
| K=10 (default) | 31.24 | 16.33 | Optimal quality–compactness trade-off |
| M=1 (no multi-scale) | 30.58 | — | 71.59 min |
| M=3 (default) | 31.24 | — | 48.43 min |
### Key Findings
- Seed structuring alone yields only ~3% compression; entropy coding is the key component—compressing 104 MB to 16 MB.
- Multi-scale fitting not only accelerates optimization by 32% (71→48 min) but also improves PSNR by 0.66 dB.
- SGI outperforms JPEG at low bitrates (PSNR +3.3 dB @ 0.245 bpp).
- K=10 offers the best quality–compactness trade-off; pushing K higher would require larger MLP capacity and feature dimensionality.
## Highlights & Insights
- Parameter sharing via seeds and MLPs: Transforms unstructured Gaussians into a structured representation, making entropy coding tractable.
- Entropy coding is essential: Unlike 3D scenes where anchor structuring alone achieves large compression gains, 2D image representation requires explicit entropy modeling.
- Dual benefit of multi-scale fitting: Simultaneously accelerates convergence and improves quality, as coarse levels provide effective initialization for finer levels.
## Limitations & Future Work
- Both the seed count \(N\) and the per-seed Gaussian count \(K\) are fixed hyperparameters; content-adaptive allocation remains unexplored.
- Quantization is currently simulated with uniform noise; more advanced quantization-aware training techniques could yield further gains.
- Validation is limited to single-image representation; extension to video is a promising direction.
## Related Work & Insights
- vs. GaussianImage: GaussianImage optimizes each Gaussian independently without structural organization; SGI introduces seed-level structure and entropy coding, substantially reducing model size.
- vs. LIG: LIG employs hierarchical Gaussians for residual fitting; SGI uses full multi-scale fitting combined with entropy coding to achieve superior compression.
## Rating
- Novelty: ⭐⭐⭐⭐ — Seed-level entropy coding represents the first attempt at compression-oriented structured 2D Gaussian image representation.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Three dataset domains, comprehensive ablations, and comparisons against compression baselines.
- Writing Quality: ⭐⭐⭐⭐ — Clear structure with rich figures and tables.
- Value: ⭐⭐⭐⭐ — Provides an effective solution for compact high-resolution image representation.