FastGS: Training 3D Gaussian Splatting in 100 Seconds

Conference: CVPR 2026
arXiv: 2511.04283
Code: fastgs.github.io
Area: 3D Vision
Keywords: 3D Gaussian Splatting, training acceleration, multi-view consistency, Gaussian density control, pruning strategy

TL;DR

FastGS is a multi-view consistency-based acceleration framework for 3DGS that controls the Gaussian count via View-Consistent Densification (VCD) and View-Consistent Pruning (VCP). It trains a scene in approximately 100 seconds on datasets such as Mip-NeRF 360, with rendering quality comparable to vanilla 3DGS. The headline speedup of over 15× over vanilla 3DGS is larger than the roughly 10.8× implied by the Mip-NeRF 360 times in Table 1 (20.93 vs. 1.93 min), so it presumably refers to a dataset where the gap is wider.

Background & Motivation

Training time bottleneck in 3DGS: Vanilla 3DGS typically requires tens of minutes to train a single scene. Its Adaptive Density Control (ADC) generates a large number of redundant Gaussians, which keeps computational overhead high throughout training and limits practical deployment.
Insufficient densification strategies: Although Taming-3DGS incorporates multi-view information, its scoring approach based on Gaussian-associated attributes (opacity, scale, gradient) fails to strictly enforce multi-view consistency, still producing millions of redundant Gaussians.
Limited effectiveness of existing pruning strategies: Speedy-Splat performs pruning via Hessian approximation accumulation, which only indirectly leverages multi-view information and leads to significant degradation in rendering quality. Other methods relying on simple opacity- or scale-based thresholds are similarly ineffective at eliminating redundancy.
Lack of strict multi-view consistency constraints: A large number of Gaussians contribute to rendering quality in only a few views while being nearly useless in others—that is, they do not satisfy a bundle-adjustment-style multi-view consistency constraint.
Limitations of budget mechanisms: Methods such as DashGaussian limit Gaussian count via budget mechanisms, yet scenes still require millions of Gaussians to maintain quality, leaving practical speedup limited.
Remaining inefficiency in the rasterization stage: The 3-sigma rule in vanilla 3DGS generates numerous redundant Gaussian–tile pairs. Even the precise tile-intersection strategy of Speedy-Splat does not fully resolve invalid coverage caused by marginal Gaussians.

Method

View-Consistent Densification (VCD)

The core idea is to evaluate whether each Gaussian requires densification from the perspective of multi-view reconstruction quality. The procedure is as follows:

Randomly sample \(K\) training views, render images, and compute per-pixel L1 error maps.
Apply min-max normalization to the error maps and mark high-error pixels using threshold \(\tau\).
Project each Gaussian into 2D image space to obtain its footprint region \(\Omega_i\).
Count the high-error pixels inside the footprint in each view and average the count over the \(K\) sampled views; this mean is the importance score \(s_d^i\).
A Gaussian is permitted to densify only when \(s_d^i\) exceeds threshold \(\tau_d\) (set to 5 in experiments).

This approach ensures that newly added Gaussians genuinely serve under-reconstructed regions across multiple views, preventing redundant growth that benefits only a small subset of views, without requiring a budget mechanism.
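The scoring steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the footprints are supplied as precomputed boolean masks instead of being obtained by projecting real Gaussians, the function names `high_error_mask` and `vcd_scores` are hypothetical, and the values of `tau` and `tau_d` are toy choices for a 4×4 image (the text above reports \(\tau_d = 5\)).

```python
import numpy as np

def high_error_mask(rendered, target, tau):
    """Per-pixel L1 error, min-max normalized to [0, 1], thresholded at tau."""
    err = np.abs(rendered - target).mean(axis=-1)          # (H, W)
    span = err.max() - err.min()
    norm = (err - err.min()) / span if span > 0 else np.zeros_like(err)
    return norm > tau                                       # bool (H, W)

def vcd_scores(masks, footprints):
    """Mean number of high-error pixels inside each footprint over K views.

    masks:      (K, H, W) bool, high-error pixels per view
    footprints: (N, K, H, W) bool, footprint of each Gaussian in each view
    """
    counts = (footprints & masks[None]).sum(axis=(2, 3))   # (N, K)
    return counts.mean(axis=1)                              # (N,)

# Toy data: K=2 views of a 4x4 image, ground truth all zeros.
K, H, W = 2, 4, 4
target = np.zeros((K, H, W, 3))
rendered = np.zeros((K, H, W, 3))
rendered[0, 0:2, 0:2] = 1.0   # view 0: 4 badly reconstructed pixels
rendered[1, 0:2, 0:3] = 1.0   # view 1: 6 badly reconstructed pixels

masks = np.stack([high_error_mask(r, t, tau=0.5)
                  for r, t in zip(rendered, target)])

footprints = np.zeros((3, K, H, W), dtype=bool)
footprints[0, :, 0:2, :] = True   # Gaussian 0: covers the bad region in both views
footprints[1, :, 2:4, :] = True   # Gaussian 1: covers only well-reconstructed pixels
footprints[2, 0, :, :] = True     # Gaussian 2: has a footprint in view 0 only

scores = vcd_scores(masks, footprints)
tau_d = 3                          # toy threshold (assumption for this tiny image)
densify = scores > tau_d
print(scores, densify)
```

In this toy setup only Gaussian 0, which overlaps high-error pixels in both views, passes the threshold. Gaussian 2 overlaps errors in a single view, so averaging over views pulls its score below the cutoff, which is the behavior the view-consistency criterion is meant to produce.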

View-Consistent Pruning (VCP)

Adopting a scoring strategy analogous to VCD, VCP additionally incorporates the overall photometric loss to assess each Gaussian's contribution to rendering quality degradation:

Compute the overall photometric loss \(E_{\text{photo}}\) (a combination of L1 and SSIM) for each sampled view.
The pruning score \(s_p^i\) is obtained by weighting each view's high-error pixel count within the footprint by that view's photometric loss, accumulating over the sampled views, and normalizing the result.
A Gaussian is removed when \(s_p^i\) exceeds threshold \(\tau_p\) (set to 0.9).

Compared to Hessian approximation-based methods, this strategy identifies the Gaussians that contribute least to multi-view rendering quality more directly and effectively.
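A minimal sketch of the pruning score follows, under several assumptions: the per-view photometric losses are passed in as plain numbers rather than computed from L1 and SSIM, the normalization is taken to be min-max across Gaussians (the text above does not specify it), and `vcp_scores` is a hypothetical name. The per-view counts would come from the same footprint and error-mask machinery that VCD uses.

```python
import numpy as np

def vcp_scores(counts, photo_loss):
    """Loss-weighted high-error pixel counts, min-max normalized to [0, 1].

    counts:     (N, K) high-error pixels inside each Gaussian's footprint per view
    photo_loss: (K,)   photometric loss of each sampled view
    """
    raw = counts @ photo_loss                     # (N,) weighted sum over views
    span = raw.max() - raw.min()
    if span == 0:
        return np.zeros_like(raw, dtype=float)
    return (raw - raw.min()) / span

# Toy inputs: 4 Gaussians, 2 sampled views (all values are made up).
counts = np.array([[4, 6],
                   [0, 0],
                   [4, 0],
                   [1, 1]], dtype=float)
photo_loss = np.array([0.2, 0.1])

scores = vcp_scores(counts, photo_loss)
tau_p = 0.9                                       # threshold reported in the text
prune = scores > tau_p
print(np.round(scores, 3), prune)
```

With these numbers only the first Gaussian exceeds \(\tau_p\). Because the score is normalized across Gaussians, the threshold acts on relative rank within the current set, not on an absolute error level; whether the paper normalizes in exactly this way is not confirmed here.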

Compact Bounding Box (CB)

Building upon the precise tile-intersection approach of Speedy-Splat, CB further tightens the bounding region:

A stricter effective-region threshold is set based on Mahalanobis distance.
A scaling factor \(\beta\) controls the effective support range of each 2D Gaussian.
Smaller \(\beta\) produces more compact ellipses, reducing invalid Gaussian–tile pairs at the margins.
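To illustrate why a tighter Mahalanobis cutoff reduces Gaussian–tile pairs, the sketch below counts the 16×16 tiles overlapped by the axis-aligned bounding box of the ellipse \((x-\mu)^\top \Sigma^{-1} (x-\mu) \le c\). It is a simplification under stated assumptions: it uses a plain bounding box instead of Speedy-Splat's precise tile intersection, it does not clamp to the image, and it parameterizes the cutoff directly as `cutoff_sq` because the exact way \(\beta\) enters the threshold is not given in this note. The function name is hypothetical.

```python
import math
import numpy as np

def tiles_in_bbox(mean, cov, cutoff_sq, tile=16):
    """Tiles overlapped by the bounding box of the ellipse
    (x - mean)^T cov^{-1} (x - mean) <= cutoff_sq."""
    r = math.sqrt(cutoff_sq)
    # Half-extents of the axis-aligned box enclosing the ellipse.
    half_w = r * math.sqrt(cov[0, 0])
    half_h = r * math.sqrt(cov[1, 1])
    x0 = math.floor((mean[0] - half_w) / tile)
    x1 = math.floor((mean[0] + half_w) / tile)
    y0 = math.floor((mean[1] - half_h) / tile)
    y1 = math.floor((mean[1] + half_h) / tile)
    return (x1 - x0 + 1) * (y1 - y0 + 1)

mean = np.array([40.0, 40.0])
cov = np.array([[100.0, 0.0],      # sigma_x = 10 px
                [0.0, 25.0]])      # sigma_y = 5 px

loose = tiles_in_bbox(mean, cov, cutoff_sq=9.0)   # 3-sigma extent
tight = tiles_in_bbox(mean, cov, cutoff_sq=4.0)   # stricter cutoff
print(loose, tight)
```

For this example the 3-sigma box touches 15 tiles and the stricter cutoff touches 9, so each such Gaussian is processed in fewer tiles during rasterization.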

Training Pipeline

Built upon 3DGS-accel (vanilla 3DGS + per-splat backpropagation from Taming-3DGS + SH optimization acceleration).
Densification is performed every 500 iterations and stops at 15K iterations.
Pruning is applied every 500 iterations before 15K and every 3,000 iterations thereafter.
Total training runs for 30K iterations using the Adam optimizer.
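The schedule above can be written as a small helper. The boundary handling is an assumption: this sketch treats iteration 15K itself as part of the late phase and applies no warm-up period, neither of which is specified here. The function name `schedule` and its parameters are hypothetical.

```python
def schedule(it, early_end=15_000, early_every=500, late_every=3_000):
    """Return (densify, prune) flags for 1-based iteration `it`."""
    if it < early_end:
        # Early phase: densify and prune together every `early_every` steps.
        hit = it % early_every == 0
        return hit, hit
    # Late phase: densification has stopped, pruning becomes sparser.
    return False, it % late_every == 0

TOTAL_ITERS = 30_000
flags = [schedule(it) for it in range(1, TOTAL_ITERS + 1)]
n_densify = sum(d for d, _ in flags)
n_prune = sum(p for _, p in flags)
print(n_densify, n_prune)
```

Under these assumptions a full run performs 29 densification steps and 35 pruning steps, so the multi-view scoring passes are a small fraction of the 30K optimization iterations.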

Key Experimental Results

Table 1: Training Speed and Quality Comparison on Static Scenes (Mip-NeRF 360, RTX 4090)

Method	Time (min)↓	PSNR↑	SSIM↑	#Gaussian↓	FPS↑
3DGS	20.93	27.53	0.812	2.63M	146
Taming-3DGS	5.36	27.48	0.794	0.68M	221
DashGaussian	6.35	27.73	0.817	2.40M	155
Speedy-Splat	13.38	26.91	0.781	0.30M	552
FastGS	1.93	27.56	0.797	0.40M	579
FastGS-Big	3.58	27.93	0.820	1.15M	469
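The relative speedups implied by Table 1 can be derived directly from the training times. The snippet below only restates the table's numbers; the variable names are illustrative.

```python
# Training times in minutes, copied from Table 1.
train_min = {
    "3DGS": 20.93,
    "Taming-3DGS": 5.36,
    "DashGaussian": 6.35,
    "Speedy-Splat": 13.38,
    "FastGS": 1.93,
    "FastGS-Big": 3.58,
}

baseline = train_min["3DGS"]
speedup = {name: baseline / t for name, t in train_min.items()}

# FastGS training time in seconds, and FastGS-Big's time saving vs. DashGaussian.
fastgs_seconds = train_min["FastGS"] * 60
big_vs_dash = 1 - train_min["FastGS-Big"] / train_min["DashGaussian"]

for name, s in speedup.items():
    print(f"{name:13s} {s:5.2f}x vs. vanilla 3DGS")
print(f"FastGS: {fastgs_seconds:.0f} s, FastGS-Big saves {big_vs_dash:.1%} vs. DashGaussian")
```

On these figures FastGS is about 10.8× faster than vanilla 3DGS on Mip-NeRF 360 and trains in roughly 116 seconds, while FastGS-Big cuts training time by about 43.6% relative to DashGaussian.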

Table 2: Ablation Study (Mip-NeRF 360)

Method	Time (min)↓	PSNR↑	#Gaussian↓	FPS↑
3DGS-accel (baseline)	7.10	27.46	2.64M	182
+VCD	3.53	27.69	0.53M	222
+VCP	5.32	27.70	1.96M	285
+CB	6.13	27.44	2.78M	303
Full (VCD+VCP+CB)	1.93	27.56	0.40M	579

VCD is the largest single contributor, reducing Gaussian count from 2.64M to 0.53M (an 80% reduction) and providing over 2× training speedup.
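As a check, the figures quoted above follow directly from the Table 2 entries:

\[
1 - \frac{0.53}{2.64} \approx 0.799, \qquad \frac{7.10}{3.53} \approx 2.01, \qquad \frac{7.10}{1.93} \approx 3.68
\]

The first two values are the Gaussian-count reduction and training speedup of +VCD over the 3DGS-accel baseline; the third is the speedup of the full configuration over the same baseline. The full model is faster than any single component would suggest (+VCP alone gives \(7.10 / 5.32 \approx 1.33\), +CB alone \(7.10 / 6.13 \approx 1.16\)), which indicates that the three components compound rather than overlap.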

Highlights & Insights

Extreme training speed: Scene training completes in as little as 77 seconds (Tanks & Temples) and averages approximately 100 seconds, substantially faster than prior acceleration methods such as Taming-3DGS (5.36 min) and DashGaussian (6.35 min) on Mip-NeRF 360.
Simplicity and generality: VCD and VCP require no budget mechanism and can be directly applied to dynamic reconstruction, surface reconstruction, sparse-view reconstruction, large-scale reconstruction, and SLAM, achieving 2–6× speedup across all tasks.
Principled multi-view consistency: Analogous to bundle adjustment, the framework requires each Gaussian to make a positive contribution to multi-view rendering, rather than serving only individual views.
Compatibility with multiple backbones: FastGS achieves 8.8× speedup on Mip-Splatting and 3.6× on Scaffold-GS while maintaining rendering quality.
FastGS-Big surpasses DashGaussian: The larger variant achieves 0.2 dB higher PSNR, reduces training time by 43.6%, and uses half the number of Gaussians.

Limitations & Future Work

Incompatibility with post-training of feed-forward 3DGS: Such methods output extremely dense Gaussians, making it difficult for VCP to effectively prune large numbers of points within a few thousand iterations; even 3K-iteration post-training still requires approximately 20 seconds.
Rendering quality not optimal: The default FastGS configuration achieves slightly inferior LPIPS scores compared to DashGaussian and vanilla 3DGS under maximum speedup settings.
Hyperparameter sensitivity: Hyperparameters such as \(\tau_d\), \(\tau_p\), and \(\beta\) require per-scene tuning, and the paper does not thoroughly discuss their robustness.
Evaluation limited to RTX 4090: The transferability of speedup gains to other GPU hardware is not demonstrated.
Persistent quality–speed trade-off: FastGS-Big achieves higher quality at the cost of roughly doubled training time (3.58 vs. 1.93 min on Mip-NeRF 360), indicating that the Pareto frontier between the two remains open for further exploration.

vs. Taming-3DGS: Both leverage multi-view information, but Taming indirectly scores Gaussians via their attributes (opacity/scale/gradient), imposing insufficient constraints and still requiring 0.68M Gaussians. FastGS directly evaluates contribution to reconstruction quality and achieves comparable quality with only 0.40M Gaussians.
vs. Speedy-Splat: Its pruning relies on accumulated Hessian approximations, which only indirectly exploit multi-view consistency and lead to severe quality degradation (PSNR 26.91 vs. FastGS 27.56). FastGS's VCP is more precise while preserving quality.
vs. DashGaussian: The current state of the art uses resolution scheduling to maintain quality but still requires 2.40M Gaussians. FastGS-Big surpasses its quality with half the Gaussian count.
vs. Mini-Splatting: Uses an intersection-preserving simplification strategy; although its Gaussian count is low (0.53M), its training time (17.69 min) is far longer than FastGS's 1.93 min.

The multi-view consistency scoring concept can be generalized to any 3D representation learning task requiring control over point cloud or primitive count. The error-map-based scoring of VCD/VCP is compatible with importance sampling for NeRF acceleration. The Mahalanobis distance-based bounding in CB is transferable to other tile-based rasterization methods.

Rating

Novelty: ⭐⭐⭐⭐ — The VCD/VCP designs are concise and effective; individual components are not complex, but their combination yields strong results.
Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Six task categories, multiple backbones, multiple datasets, and comprehensive ablation studies.
Writing Quality: ⭐⭐⭐⭐ — Motivation is clearly articulated, visual comparisons are intuitive, and supplementary materials are thorough.
Value: ⭐⭐⭐⭐⭐ — Training 3DGS in 100 seconds has significant practical value and strong generalizability.