UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping
Conference: CVPR 2025
arXiv: 2502.01846
Code: Project Page
Area: 3D Vision
Keywords: 3D Gaussian Splatting, UV Mapping, Structured Representation, Diffusion Models, 3D Generation
TL;DR
UVGS transforms unstructured 3D Gaussian Splatting (3DGS) into a structured 2D UV map representation via spherical mapping, which is further compressed into a 3-channel Super UVGS image. This allows pre-trained 2D image foundation models (VAEs, diffusion models) to be directly applied to 3DGS generation and compression in a zero-shot manner.
Background & Motivation
3D Gaussian Splatting (3DGS) has demonstrated exceptional quality and efficiency in 3D object and scene modeling. However, its unordered, discrete, and permutation-invariant nature poses significant challenges for generative tasks:
- Lack of Structure: Similar to point clouds, 3DGS lacks spatial structure, making it incompatible with image-based generative models (CNNs, Transformers).
- Permutation Invariance: Any permutation of the Gaussian set represents the same object, but image-style networks such as CNNs are sensitive to the order and layout of their inputs.
- Heterogeneous Attributes: Each Gaussian contains 14-dimensional heterogeneous attributes (3D position, 4D rotation, 3D scale, 3D color, 1D opacity), which exhibit highly disparate distributions.
Existing structuring solutions have limitations:
- Voxel Grids (GaussianCube): High computational cost, difficult at high resolutions, and information loss due to voxelization.
- Triplane Representation: Trade-off between quality and memory, risk of losing fine details.
- Direct Attribute Prediction (DiffGS): Limited to category-level generation, making it difficult to learn generic probability functions.
- Splatter Image: 3D-unaware projection, leading to poor multi-view consistency.
Method
Overall Architecture
The UVGS pipeline consists of three steps: (1) Spherical mapping projects the Gaussians of a 3DGS object onto an \(M \times N\) UV map with 14 channels per pixel; (2) A multi-branch mapping network compresses the 14-channel UVGS into a 3-channel Super UVGS image; (3) The Super UVGS is fed directly into pre-trained 2D foundation models for compression or generation. The reverse process reconstructs 3DGS via an inverse mapping network followed by inverse spherical mapping.
Key Designs
Design 1: Spherical Mapping — Mapping Unordered Gaussians to Structured 2D UV Maps
- Function: Resolve the unstructuredness and permutation invariance of 3DGS, establishing local and global correspondences between Gaussians.
- Mechanism: Inscribe the 3DGS object within a sphere centered at its geometric center. For each Gaussian, compute the spherical coordinates \((\rho_i, \theta_i, \phi_i)\) and map the azimuth \(\theta\) and polar angle \(\phi\) to UV image coordinates. Each UV pixel stores the 14-dimensional Gaussian attributes. Many-to-one conflicts are resolved through Dynamic Selection (retaining the Gaussian with the highest opacity along the same ray).
- Design Motivation: Spherical mapping naturally provides a deterministic 2D arrangement for 3D points, so any random permutation of the same object maps to the same UVGS representation. The proximity of neighboring Gaussians in 3D space is preserved in the 2D UV map, enabling CNNs to learn local and global features effectively.
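The spherical mapping and Dynamic Selection steps described above can be sketched in a few lines of plain Python. This is a minimal single-layer illustration, not the authors' implementation: the function name `spherical_uv_map`, the dict-based Gaussian record, the use of the position centroid as the geometric center, and the exact angle-to-pixel binning are all assumptions made for the example.

```python
import math

def spherical_uv_map(gaussians, height, width):
    """Project Gaussians onto a height x width UV grid (one layer).

    Each Gaussian is a dict with at least 'pos' (x, y, z) and 'opacity';
    any other attributes ride along unchanged. Returns {(row, col): gaussian}.
    """
    n = len(gaussians)
    # Assumption: the "geometric center" is the centroid of the positions.
    center = [sum(g["pos"][i] for g in gaussians) / n for i in range(3)]

    uv = {}
    for g in gaussians:
        x, y, z = (g["pos"][i] - center[i] for i in range(3))
        rho = math.sqrt(x * x + y * y + z * z)
        theta = math.atan2(y, x)                      # azimuth in [-pi, pi]
        phi = math.acos(z / rho) if rho > 0 else 0.0  # polar angle in [0, pi]

        # Assumed binning: azimuth -> column, polar angle -> row.
        col = min(int((theta + math.pi) / (2 * math.pi) * width), width - 1)
        row = min(int(phi / math.pi * height), height - 1)

        # Dynamic Selection: keep the most opaque Gaussian per pixel.
        kept = uv.get((row, col))
        if kept is None or g["opacity"] > kept["opacity"]:
            uv[(row, col)] = g
    return uv

# Two Gaussians on the +x ray and two on the -x ray collide pairwise.
demo = [
    {"pos": (1, 0, 0), "opacity": 0.9},
    {"pos": (2, 0, 0), "opacity": 0.3},
    {"pos": (-1, 0, 0), "opacity": 0.8},
    {"pos": (-2, 0, 0), "opacity": 0.2},
]
demo_map = spherical_uv_map(demo, 8, 8)
```

Because each pixel depends only on a Gaussian's angles and opacity, shuffling the input list yields the same map as long as opacities along a ray are distinct; ties would be broken by input order in this sketch.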
Design 2: Super UVGS + Multi-branch Mapping Network — Unified 3-channel Representation of Heterogeneous Attributes
- Function: Compress 14-channel heterogeneous attributes into a 3-channel image to achieve zero-shot compatibility with pre-trained 2D foundation models.
- Mechanism: The forward mapping network contains three branches: a position branch (handling \(\sigma\)), a transform branch (handling \(r, s\)), and an appearance branch (handling \(c, o\)). Each branch extracts feature maps, which are then concatenated and fed into a central branch (multi-layer Conv + BN + ReLU). The final layer uses a \(\tanh\) activation to output a 3-channel Super UVGS. The inverse mapping network reconstructs the 14-channel UVGS with a symmetric structure.
- Design Motivation: Position, rotation, scale, and color have completely different value distributions—positions and colors of adjacent Gaussians vary smoothly, while rotations and scales may change drastically. The branch processing strategy avoids gradient anomalies and slow convergence, allowing each branch to focus on the unique characteristics of its respective attributes.
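To make the branch grouping concrete, the sketch below splits one 14-channel UVGS pixel into the inputs of the three branches. The channel order (position, rotation, scale, color, opacity) follows the attribute list given earlier in these notes but is an assumption about the memory layout, and `split_for_branches` is a hypothetical helper, not part of any released code. The convolutional layers themselves are omitted.

```python
# Assumed channel layout of one UVGS pixel: 3 + 4 + 3 + 3 + 1 = 14.
LAYOUT = (("position", 3), ("rotation", 4), ("scale", 3), ("color", 3), ("opacity", 1))

# Which attributes each branch of the forward mapping network receives.
BRANCHES = {
    "position": ("position",),
    "transform": ("rotation", "scale"),
    "appearance": ("color", "opacity"),
}

def split_for_branches(pixel):
    """Split a 14-value pixel into per-branch channel lists."""
    if len(pixel) != 14:
        raise ValueError("expected 14 channels per UVGS pixel")
    fields, start = {}, 0
    for name, size in LAYOUT:
        fields[name] = list(pixel[start:start + size])
        start += size
    return {
        branch: [v for name in names for v in fields[name]]
        for branch, names in BRANCHES.items()
    }
```

Under this layout the position, transform, and appearance branches receive 3, 7, and 4 channels respectively.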
Design 3: Zero-shot Foundation Model Integration — Directly Utilizing Pre-trained 2D Models
- Function: Achieve highly efficient compression (99.5%+) and unconditional/conditional generation of 3DGS.
- Mechanism: Super UVGS can be fed directly into a pre-trained image AE/VAE/VQVAE for reconstruction without fine-tuning, reducing storage by over 99.5%. For generation, a latent diffusion model (LDM) is trained in the VAE latent space, either unconditionally or with text conditioning. Sampled latents are decoded into Super UVGS images, which are then reconstructed back into 3DGS objects via inverse mapping.
- Design Motivation: 2D foundation models are trained on massive datasets and possess powerful image understanding capabilities. Super UVGS "looks like an image" and is highly structured, allowing pre-trained VAEs to generalize directly (in the results table, zero-shot VAE reconstruction at K=1 reaches 30.6/30.9 dB PSNR versus 31.2/31.1 dB for Super UVGS, a drop of 0.2 to 0.6 dB).
Loss & Training
The mapping network is trained using a combined MSE + LPIPS loss: \(\mathcal{L}_{\text{uvgs}} = \mathcal{L}_{\text{mse}} + \lambda \cdot \mathcal{L}_{\text{UV-lpips}}\), where the LPIPS loss is computed separately for position, scale, rotation, and color attributes, and \(\lambda\) is increased from 0 to 10 during training.
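The weighting scheme can be sketched as follows. The notes only state that \(\lambda\) grows from 0 to 10, so the linear ramp here is an assumption, as is summing the four per-attribute LPIPS terms into one value. Both function names are hypothetical, and the MSE and LPIPS values are taken as precomputed numbers rather than computed from images.

```python
def lpips_weight(step, total_steps, lam_max=10.0):
    """Weight on the LPIPS term at a given training step.

    Assumption: a linear ramp from 0 at the first step to lam_max at the last.
    """
    if total_steps <= 1:
        return lam_max
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return lam_max * frac

def uvgs_loss(mse, lpips_terms, lam):
    """Combined loss: MSE plus lam times the per-attribute LPIPS terms.

    Assumption: the per-attribute terms (position, scale, rotation, color)
    are simply summed.
    """
    return mse + lam * sum(lpips_terms.values())
```

Early in training the loss is then dominated by the MSE term, with the perceptual terms phased in gradually.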
Key Experimental Results
Main Results: Reconstruction Quality and Compression Rate (Objaverse Cars/Full)
| Method | PSNR in dB (Cars/Full) | LPIPS (Cars/Full) | Compression Rate |
|---|---|---|---|
| Original 3DGS | 34.6/34.2 | 0.02/0.02 | 0% |
| UVGS (K=1) | 31.3/31.1 | 0.06/0.06 | 53.0% |
| UVGS (K=4) | 34.2/33.2 | 0.02/0.03 | 33.3% |
| Super UVGS (K=1) | 31.2/31.1 | 0.07/0.08 | 89.7% |
| VAE (K=1) | 30.6/30.9 | 0.07/0.09 | 99.5% |
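The PSNR cost of each compression stage can be read off the K=1 rows of the table. The small sketch below only redoes that subtraction; the values are copied from the table, and the helper name `psnr_drop` is made up for the example.

```python
# PSNR in dB as (Cars, Full), copied from the K=1 rows of the table above.
PSNR = {
    "UVGS": (31.3, 31.1),
    "Super UVGS": (31.2, 31.1),
    "VAE": (30.6, 30.9),
}

def psnr_drop(before, after):
    """Per-split PSNR loss in dB, rounded to the table's one-decimal precision."""
    return tuple(round(b - a, 1) for b, a in zip(before, after))

mapping_cost = psnr_drop(PSNR["UVGS"], PSNR["Super UVGS"])
vae_cost = psnr_drop(PSNR["Super UVGS"], PSNR["VAE"])
```

Going from UVGS to Super UVGS costs 0.1 dB on Cars and nothing on Full, while the zero-shot VAE adds a further 0.6 dB on Cars and 0.2 dB on Full.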
Comparison of Generation Quality
| Method | Issues |
|---|---|
| DiffTF | Low quality, low resolution |
| Get3D | 3D inconsistent, many artifacts |
| GaussianCube | Inconsistent symmetry |
| UVGS (Ours) | High quality, high resolution, 3D consistent |
Key Findings
- Pre-trained image VAEs can reconstruct Super UVGS in a zero-shot manner, with a PSNR drop of only about 0.2 to 0.6 dB at K=1 (31.2/31.1 dB down to 30.6/30.9 dB).
- Super UVGS alone achieves 89.7% compression, reaching 99.5% compression when encoded by VAE, with minimal quality loss.
- A 512×512 UV map can store up to 262K unique Gaussians.
- Text-conditioned 3DGS generation yields high-quality results even on complex geometric objects.
- The paper also demonstrates inpainting directly on 3DGS objects, which it presents as a first.
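The 262K capacity figure in the findings above is simply the pixel count of the UV map, since each pixel holds at most one Gaussian per layer. A minimal check, assuming that K in the results table denotes the number of stacked UV layers (the helper `uv_capacity` is hypothetical):

```python
def uv_capacity(height, width, layers=1):
    """Maximum number of Gaussians a UV map can hold: one per pixel per layer."""
    return height * width * layers

single_layer = uv_capacity(512, 512)      # 512 * 512 pixels
four_layers = uv_capacity(512, 512, 4)    # assumed meaning of K=4
```

A single 512×512 layer gives 262,144 slots, and under the stated assumption four layers would give 1,048,576.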
Highlights & Insights
- Simple and Elegant Core Idea: Use spherical mapping to bring structure to unordered Gaussians, completing the 3D→2D mapping without any learning.
- Discovery of Zero-shot Generalization: The finding that pre-trained VAEs can directly process Super UVGS is highly surprising and valuable.
- Scalability: Increasing the UV resolution allows accommodating more Gaussians, while multi-layer UV maps can handle complex objects.
Limitations & Future Work
- Spherical mapping has many-to-one conflicts for non-convex objects, requiring multi-layer UV maps.
- Currently, only object-level 3DGS has been tested; scene-level representation requires more complex mapping strategies.
- The 3-channel compression of Super UVGS inevitably loses some information, particularly in rotation and scale attributes.
- Future work can explore more efficient UV mapping schemes and larger-scale 3DGS generation.
Related Work & Insights
- GaussianCube: Structured 3DGS with voxel grids, but computationally intensive and limited in resolution.
- DiffGS: Continuous function representation of 3DGS, restricted to category-level.
- Splatter Image: Image-like 3DGS representation, but lacks 3D awareness.
- Insight: Finding the right representation shift is key to bridging model domains—transforming 3D problems into 2D problems allows direct utilization of mature 2D infrastructures.
Rating
⭐⭐⭐⭐ — The core idea is simple and inspiring; using spherical mapping to structure 3DGS is intuitive and effective. The discovery of zero-shot VAE generalization is an interesting empirical contribution. It provides a new paradigm for the field of 3DGS generation.