Mobile-GS: Real-time Gaussian Splatting for Mobile Devices

Conference: CVPR 2025
arXiv: 2603.11531
Code: https://xiaobiaodu.github.io/mobile-gs-project/
Area: 3D Vision
Keywords: 3D Gaussian Splatting, Real-time Rendering, Mobile Deployment, Order-independent Rendering, Model Compression

TL;DR

This paper proposes Mobile-GS, presented as the first method to render Gaussian Splatting in real time on a mobile GPU: 116 FPS on a Snapdragon 8 Gen 3, with only 4.6 MB of storage and visual quality comparable to the original 3DGS. It combines depth-aware order-independent rendering (which removes the sorting bottleneck), neural view-dependent enhancement, first-order SH distillation, neural vector quantization, and contribution-based pruning.

Background & Motivation

Background: 3DGS achieves high-quality novel view synthesis, but its high computational and storage demands make it difficult to run in real-time on resource-constrained mobile devices such as smartphones and AR glasses.

Limitations of Prior Work: Alpha blending requires sorting Gaussians by depth, and this sort is the primary computational bottleneck, taking up a significant share of inference time. In addition, 3DGS uses third-order SH, which means 48 color coefficients per Gaussian and a large storage footprint. Existing lightweight methods (such as Scaffold-GS and Mini-Splatting) still rely on sorting and therefore cannot remove this speed bottleneck.

Key Challenge: Sorting is a prerequisite for correct alpha blending but is also the primary speed bottleneck. Removing it accelerates rendering significantly but introduces transparency artifacts.

Goal: (1) Eliminate sorting bottlenecks to achieve order-independent rendering, (2) compensate for the quality loss caused by order-independent rendering, and (3) compress storage to adapt to mobile devices.

Key Insight: A depth-aware weight function can implicitly simulate near-to-far sorting effects, and a view-dependent MLP can compensate for transparency artifacts.

Core Idea: Replace sorting with depth-aware weights, compensate using neural networks, and apply multi-dimensional compression to achieve real-time 3DGS on mobile devices.

Method

Overall Architecture

A first-order SH student model is distilled from a pre-trained teacher model (Mini-Splatting). During training, depth-aware order-independent rendering and a view-dependent enhancement MLP are learned jointly. Vector quantization (initiated partway through training) and contribution-based pruning then compress the model. At inference time the sorting step is eliminated entirely, and the model is deployed on mobile devices through the Vulkan API.

Key Designs

Depth-aware Order-independent Rendering:
- Function: Eliminates the sorting dependency of alpha blending, replacing it with a weighted sum.
- Mechanism: The pixel color is \(\mathbf{C} = (1-T)\frac{\sum_i \mathbf{c}_i \alpha_i w_i}{\sum_i \alpha_i w_i} + T\mathbf{c}_{bg}\), where \(w_i = \phi_i^2 + \frac{\phi_i}{d_i^2} + \exp(\frac{s_{max}}{d_i})\) and \(d_i\) is the depth of Gaussian \(i\). The numerator and denominator are plain sums and \(T = \prod_j (1-\alpha_j)\) is a plain product, so the result does not depend on the order in which Gaussians are processed. \(T\) acts as the total transmittance and separates the foreground from the background.
- Design Motivation: Sorting is an \(O(N \log N)\) operation that dominates inference time. In order-independent weighted summation, the inverse depth term naturally suppresses the contributions of distant Gaussians, while the scale terms increase the weights of large Gaussians.
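
The blending formula above can be sketched for a single pixel as follows. This is a minimal illustration in plain Python, not the authors' GPU implementation: the dictionary layout, the helper names (`depth_aware_weight`, `blend_order_independent`, `make_random_gaussians`), and the random test values are all assumptions made for the example. It only demonstrates that shuffling the Gaussians leaves the blended color unchanged up to floating-point rounding.

```python
import math
import random


def depth_aware_weight(phi, depth, s_max):
    """w_i = phi_i^2 + phi_i / d_i^2 + exp(s_max / d_i), as in the formula above."""
    return phi ** 2 + phi / depth ** 2 + math.exp(s_max / depth)


def blend_order_independent(gaussians, background):
    """Blend one pixel without sorting.

    Each Gaussian is a dict with keys: color (RGB tuple), alpha, phi, depth, s_max.
    This layout is hypothetical and chosen only for readability.
    """
    numerator = [0.0, 0.0, 0.0]
    denominator = 0.0
    transmittance = 1.0  # T = prod_j (1 - alpha_j)
    for g in gaussians:
        aw = g["alpha"] * depth_aware_weight(g["phi"], g["depth"], g["s_max"])
        for k in range(3):
            numerator[k] += g["color"][k] * aw
        denominator += aw
        transmittance *= 1.0 - g["alpha"]
    if denominator == 0.0:
        return tuple(background)  # nothing covers this pixel
    return tuple(
        (1.0 - transmittance) * numerator[k] / denominator
        + transmittance * background[k]
        for k in range(3)
    )


def make_random_gaussians(n, seed=0):
    """Generate illustrative Gaussians; the value ranges are arbitrary."""
    rng = random.Random(seed)
    return [
        {
            "color": (rng.random(), rng.random(), rng.random()),
            "alpha": rng.uniform(0.05, 0.9),
            "phi": rng.uniform(0.0, 2.0),
            "depth": rng.uniform(1.0, 10.0),
            "s_max": rng.uniform(0.01, 0.5),
        }
        for _ in range(n)
    ]


if __name__ == "__main__":
    gaussians = make_random_gaussians(8)
    background = (0.0, 0.0, 0.0)
    reference = blend_order_independent(gaussians, background)
    shuffled = gaussians[:]
    random.Random(1).shuffle(shuffled)
    permuted = blend_order_independent(shuffled, background)
    # Compare with a tolerance: summation order changes rounding, not the value.
    print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(reference, permuted)))
```

Because every accumulator is a commutative sum or product, the loop can run over Gaussians in any order, which is what makes the per-frame sort unnecessary.
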
Neural View-dependent Enhancement:
- Function: Predicts view-dependent opacity and weights using an MLP to compensate for transparency artifacts caused by order-independent rendering.
- Mechanism: Let \(\mathbf{F} = \text{MLP}_f(\mathbf{P}_i, s_i, r_i, Y_i)\), \(\phi_i = \text{ReLU}(\text{MLP}_\phi(\mathbf{F}))\), and \(o_i = \sigma(\text{MLP}_o(\mathbf{F}))\). The inputs include the camera-to-Gaussian direction vector, scale, rotation, and SH coefficients.
- Design Motivation: Order-independent blending tends to cause semi-transparency in spatially overlapping regions. View-dependent opacity can dynamically suppress the transparency of occluded regions.
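
The three-network structure described above can be sketched as a forward pass. This is only an illustration under stated assumptions: the layer sizes (22 inputs, 16 hidden units), the single-layer heads, the ReLU on the shared feature, the quaternion rotation, and the random placeholder weights are not taken from the paper, and the class name `ViewDependentHead` is hypothetical. The part that follows the text is the output activations: ReLU for \(\phi_i\) and sigmoid for \(o_i\).

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def init_layer(n_in, n_out, rng):
    """Random placeholder weights; a trained model would load learned values."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class ViewDependentHead:
    """Shared feature network plus two small heads (sizes are assumptions)."""

    def __init__(self, n_in=22, hidden=16, seed=0):
        rng = random.Random(seed)
        self.feat = init_layer(n_in, hidden, rng)    # stands in for MLP_f
        self.phi = init_layer(hidden, 1, rng)        # stands in for MLP_phi
        self.opacity = init_layer(hidden, 1, rng)    # stands in for MLP_o

    def __call__(self, view_dir, scale, rotation, sh):
        # Concatenate direction (3), scale (3), rotation (4), first-order SH (12).
        x = list(view_dir) + list(scale) + list(rotation) + list(sh)
        f = relu(linear(x, *self.feat))              # shared feature F
        phi = max(0.0, linear(f, *self.phi)[0])      # phi_i = ReLU(MLP_phi(F))
        o = sigmoid(linear(f, *self.opacity)[0])     # o_i = sigmoid(MLP_o(F))
        return phi, o


if __name__ == "__main__":
    head = ViewDependentHead()
    phi, opacity = head((0.0, 0.0, 1.0), (0.1, 0.2, 0.05), (1.0, 0.0, 0.0, 0.0), [0.5] * 12)
    print(phi, opacity)
```

The ReLU keeps the weight term non-negative and the sigmoid keeps the opacity strictly between 0 and 1, so both outputs can be fed directly into the order-independent blending formula.
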
First-order SH Distillation + Neural Vector Quantization + Contribution Pruning:
- SH Distillation: Distills the teacher's third-order SH (48 coefficients per Gaussian) into a first-order student (12 coefficients per Gaussian), supervised by a color distillation loss and a scale-invariant depth distortion loss. This cuts the SH parameters by \(4\times\).
- Neural Vector Quantization: Employs K-Means grouping, multi-codebook sub-vector quantization, and Huffman coding. SH features are further decomposed into diffuse and view-dependent components, which are decoded by a lightweight MLP.
- Contribution Pruning: Jointly evaluates opacity and scale quantiles to filter out low-contribution Gaussians. Gaussians are permanently pruned after cumulative votes reach a predefined threshold.
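
The multi-codebook sub-vector quantization step can be illustrated with a generic encode/decode pair. This is a sketch of the general technique, not the paper's pipeline: the codebooks here are tiny hand-written placeholders, whereas the method described above learns them with K-Means and additionally applies Huffman coding and an MLP decoder, none of which is shown. The function names are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vector, codebooks):
    """Split `vector` into len(codebooks) equal sub-vectors and return,
    for each one, the index of its nearest codeword."""
    n_sub = len(codebooks)
    if len(vector) % n_sub != 0:
        raise ValueError("vector length must be divisible by the number of codebooks")
    width = len(vector) // n_sub
    indices = []
    for m, codebook in enumerate(codebooks):
        sub = vector[m * width:(m + 1) * width]
        nearest = min(range(len(codebook)), key=lambda k: squared_distance(sub, codebook[k]))
        indices.append(nearest)
    return indices


def decode(indices, codebooks):
    """Rebuild an approximate vector by concatenating the selected codewords."""
    out = []
    for idx, codebook in zip(indices, codebooks):
        out.extend(codebook[idx])
    return out


if __name__ == "__main__":
    # Two illustrative codebooks, each covering a 2-D sub-vector.
    codebooks = [
        [[0.0, 0.0], [1.0, 1.0]],
        [[0.0, 1.0], [1.0, 0.0]],
    ]
    indices = encode([0.9, 1.2, 0.1, 0.8], codebooks)
    print(indices, decode(indices, codebooks))
```

Only the indices and the shared codebooks need to be stored. With 256 entries per codebook, each index fits in a single byte, which is where most of the storage saving comes from before entropy coding.
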

Loss & Training

Using Mini-Splatting as the teacher model, the training process lasts for 60K iterations, with vector quantization initiated at 35K. The loss function is defined as \(\mathcal{L} = \mathcal{L}_{rgb} + \mathcal{L}_{distill} + 0.1\mathcal{L}_{depth}\).

Key Experimental Results

Main Results (Mip-NeRF 360 Dataset, RTX 3090)

Method	PSNR↑	Storage↓	FPS↑
3DGS	27.21	839.9 MB	174
SortFreeGS	27.02	851.4 MB	731
LocoGS-S	27.02	8.5 MB	292
C3DGS	27.03	30.6 MB	184
Mobile-GS	27.12	4.6 MB	1125

Mobile devices (Snapdragon 8 Gen 3): Mobile-GS achieves 127 FPS, compared to SortFreeGS at 24 FPS and 3DGS at 8 FPS.

Ablation Study

Configuration	PSNR↑	FPS↑	Storage↓
Full model	27.12	1125	4.6 MB
w/o Order-independent Rendering	27.26	684	4.5 MB
w/o View-dependent Enhancement	26.68	1227	4.4 MB
w/o Neural Quantization	27.33	841	121 MB
0th-order SH Distillation	27.04	1219	3.6 MB
2nd-order SH Distillation	27.13	917	7.3 MB

Key Findings

Eliminating sorting contributes the most to the speed improvement (from 684 to 1125 FPS, about \(1.6\times\)), but it needs view-dependent enhancement to recover quality (without it, the PSNR drops by 0.44 dB).
First-order SH is the best trade-off between accuracy and efficiency: 0th-order loses 0.08 dB but runs faster, while 2nd-order gains only 0.01 dB and is about 18% slower (917 vs. 1125 FPS).
Neural vector quantization compresses the storage from 121 MB to 4.6 MB (\(26\times\) compression) with only a 0.21 dB impact on PSNR.
Mobile-GS achieves 116 FPS on a mobile device in the Bicycle scene (at \(1600 \times 1063\) resolution), representing the first truly real-time mobile 3DGS.
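
The figures quoted in these findings follow directly from the tables above, and the arithmetic can be checked with a short script. The dictionary keys and the function name `derived_metrics` are hypothetical. The numbers are copied from the ablation table, plus the 3DGS storage figure from the main results table.

```python
# Rows copied from the ablation table (PSNR in dB, FPS on RTX 3090, storage in MB).
ABLATION = {
    "full": {"psnr": 27.12, "fps": 1125, "storage_mb": 4.6},
    "no_order_independent": {"psnr": 27.26, "fps": 684, "storage_mb": 4.5},
    "no_view_enhancement": {"psnr": 26.68, "fps": 1227, "storage_mb": 4.4},
    "no_quantization": {"psnr": 27.33, "fps": 841, "storage_mb": 121.0},
    "sh_order_0": {"psnr": 27.04, "fps": 1219, "storage_mb": 3.6},
    "sh_order_2": {"psnr": 27.13, "fps": 917, "storage_mb": 7.3},
}
BASELINE_3DGS_STORAGE_MB = 839.9  # from the main results table


def derived_metrics(rows, baseline_storage_mb):
    """Recompute the deltas and ratios discussed in the key findings."""
    full = rows["full"]
    return {
        "sorting_speedup": round(full["fps"] / rows["no_order_independent"]["fps"], 2),
        "psnr_drop_without_enhancement": round(full["psnr"] - rows["no_view_enhancement"]["psnr"], 2),
        "psnr_loss_sh0": round(full["psnr"] - rows["sh_order_0"]["psnr"], 2),
        "psnr_gain_sh2": round(rows["sh_order_2"]["psnr"] - full["psnr"], 2),
        "sh2_slowdown_percent": round(100 * (1 - rows["sh_order_2"]["fps"] / full["fps"]), 1),
        "quantization_ratio": round(rows["no_quantization"]["storage_mb"] / full["storage_mb"], 1),
        "psnr_cost_of_quantization": round(rows["no_quantization"]["psnr"] - full["psnr"], 2),
        "overall_ratio_vs_3dgs": round(baseline_storage_mb / full["storage_mb"], 1),
    }


if __name__ == "__main__":
    for name, value in derived_metrics(ABLATION, BASELINE_3DGS_STORAGE_MB).items():
        print(f"{name}: {value}")
```
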

Highlights & Insights

Identifying sorting as the computational bottleneck and eliminating it completely: Instead of merely optimizing the sorting algorithm, this work redesigns the rendering equation as a weighted summation, which removes the sort from the inference path altogether.
A three-pronged compression strategy of distillation, quantization, and pruning: The model size is compressed from 840 MB to 4.6 MB (about \(183\times\)) with almost no quality loss (27.12 dB vs. 27.21 dB PSNR for 3DGS).
The Vulkan deployment provides a valuable engineering reference for running 3DGS on mobile devices.

Limitations & Future Work

Order-independent rendering may exhibit worse performance on semi-transparent objects (e.g., glass, smoke), as the compensation provided by the MLP is limited.
The training time (1.5 hours) is longer than that of 3DGS (0.5 hours) due to the additional distillation and quantization steps.
The overhead of MLP inference on mobile environments is not analyzed in depth, which may present a new bottleneck on low-end devices.
The method is only validated on Snapdragon 8 Gen 3; its actual performance on low-end chips (e.g., Snapdragon 6 series) remains unknown.
The architecture of the view-dependent enhancement MLP (e.g., number of layers, hidden dimensions) was not searched systematically, so better configurations may exist.
The impact of codebook size (\(K=256\)) and sub-vector dimensions on the trade-off between compression ratio and quality during quantization has not been fully explored.
The coefficients combining the three terms in the depth-aware weight function \(w_i\) are empirically designed and lack rigorous theoretical analysis.

vs SortFreeGS: Both utilize order-independent rendering, but SortFreeGS still suffers from large storage requirements and lacks compression. Mobile-GS introduces a complete compression pipeline.
vs LocoGS: LocoGS-S compresses aggressively (8.5 MB) but renders at only 292 FPS. Mobile-GS is both smaller (4.6 MB) and faster (1125 FPS).
vs LightGaussian: LightGaussian only performs SH distillation (from 3rd to 2nd order), whereas this work is more aggressive (from 3rd to 1st order) and integrates quantization.
vs Compact3D/C3DGS: These methods apply codebook compression but retain sorting, reaching 30.6 MB of storage with almost no speedup (184 FPS vs. 174 FPS for 3DGS). Mobile-GS addresses both speed and storage simultaneously.

Rating

Novelty: ⭐⭐⭐⭐ Systematic innovation combining sorting elimination with comprehensive mobile optimizations.
Experimental Thoroughness: ⭐⭐⭐⭐⭐ Validation on 3 datasets, real-world mobile testing, and detailed ablation studies.
Writing Quality: ⭐⭐⭐⭐ Clear presentation with comprehensive experimental results.
Value: ⭐⭐⭐⭐⭐ The first truly real-time mobile 3DGS, demonstrating high practical value.