UrbanGS: A Scalable and Efficient Architecture for Geometrically Accurate Large-Scene Reconstruction

  • Conference: ICLR 2026
  • arXiv: 2602.02089
  • Code: Not released
  • Area: 3D Vision / Large-Scale Scene Reconstruction
  • Keywords: 3D Gaussian Splatting, Large-Scale Reconstruction, Depth-Normal Regularization, Gaussian Pruning, Urban Scene

TL;DR

This paper proposes UrbanGS, a scalable 3DGS reconstruction framework for urban-scale scenes that simultaneously improves geometric accuracy, rendering quality, and memory efficiency through depth-consistent D-Normal regularization, spatially adaptive Gaussian pruning (SAGP), and a unified partitioning strategy.

Background & Motivation

3DGS performs well in bounded scenes, but scaling to large urban environments poses three major challenges:

Poor geometric consistency: Supervising only rendered normals updates rotation parameters but not positional parameters, resulting in inaccurate surface reconstruction.

Low memory efficiency: Homogeneous regions (sky, distant building facades) generate large numbers of redundant Gaussian primitives.

Poor computational scalability: Partitioning schemes introduce boundary discontinuities, and processing irrelevant viewpoints wastes computation.

Method

Overall Architecture

UrbanGS comprises three core modules:

  1. Depth-consistent D-Normal Regularization (geometric accuracy)
  2. Spatially Adaptive Gaussian Pruning (SAGP) (memory efficiency)
  3. Unified partitioning and view assignment (scalability)

1. Depth-Consistent D-Normal Regularization

Problem: Directly supervising rendered normals \(\hat{N}\) with pseudo-normals \(N\) updates only rotation parameters \(R\) via gradients, and cannot effectively update positional parameters \(u\).

Solution: D-Normals \(\bar{N}_d\) are derived from the rendered depth map:

\[\bar{N}_d(n,p) = \frac{\nabla_v d(n,p) \times \nabla_h d(n,p)}{|\nabla_v d \times \nabla_h d|}\]

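where \(d\) denotes the 3D coordinates obtained by back-projecting the rendered depth map, and \(\nabla_v\), \(\nabla_h\) are its differences along the vertical and horizontal image directions. The D-Normal regularization loss is: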

\[\mathcal{L}_{dn} = \|\bar{N}_d - N\|_1 + (1 - \bar{N}_d \cdot N)\]

By linking geometric constraints to depth through D-Normals, both positional and rotation parameters are updated simultaneously.
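The D-Normal computation and its loss can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the forward finite differences, and all function names are assumptions made for the example.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to (H, W, 3) camera-space points (pinhole model, assumed)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def d_normals(points, eps=1e-12):
    """Unit normals from the cross product of vertical and horizontal point differences.

    Output is (H-1, W-1, 3). The sign of the normal depends on the camera
    convention; here a fronto-parallel plane yields (0, 0, -1).
    """
    grad_v = points[1:, :-1] - points[:-1, :-1]   # one step down a row
    grad_h = points[:-1, 1:] - points[:-1, :-1]   # one step right a column
    n = np.cross(grad_v, grad_h)
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + eps)

def d_normal_loss(n_d, n_ref):
    """Per-pixel L1 distance plus (1 - cosine similarity), averaged over the image."""
    l1 = np.abs(n_d - n_ref).sum(axis=-1)
    cos_term = 1.0 - (n_d * n_ref).sum(axis=-1)
    return float((l1 + cos_term).mean())
```

Because the normals are a function of the rendered depth, a gradient of this loss in an autodiff framework would flow back into the depth and hence into the Gaussian positions, which is the property the paragraph above relies on.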

Depth Consistency Regularization

To ensure multi-view depth consistency, an inverse depth loss and adaptive confidence weighting are introduced:

Inverse depth loss:

\[\mathcal{L}_{id}(u,v) = |\hat{D}^{-1}(u,v) - D_{ext}^{-1}(u,v)|\]

Geometry-aware confidence:

\[w_d = \exp\left(\frac{\cos\phi - 1}{0.01}\right) \cdot \exp\left(-\frac{\epsilon_d}{0.1}\right)\]

where \(\cos\phi\) measures depth gradient consistency and \(\epsilon_d\) measures normalized inverse depth deviation.
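As a rough sketch of the two terms above, the snippet below evaluates the inverse depth loss and the confidence weight per pixel. How \(\cos\phi\) and \(\epsilon_d\) are computed is not specified in these notes, so they are taken as precomputed inputs; the function names and the small `eps` guard are assumptions.

```python
import numpy as np

def inverse_depth_loss(d_render, d_ext, eps=1e-6):
    """Absolute difference of inverse depths; eps guards against division by zero."""
    return np.abs(1.0 / (d_render + eps) - 1.0 / (d_ext + eps))

def confidence(cos_phi, eps_d):
    """Geometry-aware weight: 1 when gradients agree and deviation is 0, decaying otherwise."""
    return np.exp((cos_phi - 1.0) / 0.01) * np.exp(-eps_d / 0.1)

def weighted_depth_loss(d_render, d_ext, cos_phi, eps_d):
    """Confidence-weighted inverse depth loss averaged over all pixels."""
    return float((confidence(cos_phi, eps_d) * inverse_depth_loss(d_render, d_ext)).mean())
```

The temperature 0.01 makes the first factor very sharp: a pixel with \(\cos\phi = 0.99\) already has its weight reduced by a factor of \(e^{-1}\), so only pixels with closely matching depth gradients contribute strongly.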

Total loss:

\[\mathcal{L}_{total} = \mathcal{L}_{RGB} + \lambda_1 \mathcal{L}_n + \lambda_2 \mathcal{L}_{dn} + \lambda_3 (w_d \cdot \mathcal{L}_{id})\]

2. Spatially Adaptive Gaussian Pruning (SAGP)

Scene partitioning: The scene is divided into voxel cells with characteristic length correlated to global Gaussian density:

\[\ell = \lambda \left(\frac{\mathcal{V}_{scene}}{\mathcal{N}}\right)^{1/3}\]
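To make the cell-size formula concrete, here is a small sketch that computes \(\ell\) and assigns Gaussians to voxels. It assumes \(\mathcal{V}_{scene}\) is the volume of the axis-aligned bounding box of the Gaussian centers, which the notes do not state; the function names are illustrative.

```python
import numpy as np

def voxel_length(positions, lam=1.0):
    """Cell length lam * (V_scene / N)^(1/3), with V_scene taken as the bounding-box volume."""
    extent = positions.max(axis=0) - positions.min(axis=0)
    volume = float(np.prod(extent))
    return lam * (volume / len(positions)) ** (1.0 / 3.0)

def voxel_indices(positions, length):
    """Integer (i, j, k) voxel index of each Gaussian center."""
    return np.floor((positions - positions.min(axis=0)) / length).astype(np.int64)
```

With \(\lambda = 1\), each cell holds about one Gaussian on average if the centers are spread uniformly, so \(\lambda\) controls roughly how many primitives share a local neighborhood.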

Local volume normalization (sublinear transform to suppress oversized primitives):

\[w_{v,i} = \left(\min\left(\frac{v_i}{\vartheta_{local}^{(t)}}, 1\right)\right)^{\kappa}\]

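With \(\kappa=0.5\) (square root), ratios below 1 are pushed toward 1, so small primitives receive a larger weight than under a linear mapping and the importance of fine-grained structures is amplified.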
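A short numeric sketch of the sublinear weight follows. The local reference volume \(\vartheta_{local}^{(t)}\) is passed in as `local_ref`, since the notes do not say how it is derived; that parameter name is an assumption.

```python
import numpy as np

def volume_weight(volumes, local_ref, kappa=0.5):
    """Clamp the volume ratio at 1, then apply the power kappa."""
    ratio = np.minimum(volumes / local_ref, 1.0)
    return ratio ** kappa
```

For example, with `local_ref = 1.0` the volumes 0.04, 0.25, and 4.0 map to weights 0.2, 0.5, and 1.0: the oversized primitive gains nothing beyond the clamp, while the smallest one is lifted from 0.04 to 0.2.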

Importance score (product of three factors):

\[S_i = \phi_i \cdot \tau_i \cdot w_{v,i}\]
  • \(\phi_i\): normalized ray intersection frequency
  • \(\tau_i\): opacity mapped through Sigmoid
  • \(w_{v,i}\): sublinear volume weight

Because the score is a product, a low value in any single factor suppresses it: a Gaussian is retained only when it is frequently observed, sufficiently opaque, and of appropriate geometric scale.
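The scoring step can be sketched as below. Two details are assumptions rather than facts from the paper: \(\phi_i\) is normalized by the maximum hit count, and pruning keeps a fixed top fraction (`keep_ratio`) of Gaussians by score.

```python
import numpy as np

def importance(hit_counts, opacity_logits, vol_weights):
    """S_i = phi_i * tau_i * w_{v,i} for every Gaussian."""
    phi = hit_counts / max(hit_counts.max(), 1)       # normalized ray-hit frequency (assumed max-normalization)
    tau = 1.0 / (1.0 + np.exp(-opacity_logits))        # sigmoid of the raw opacity parameter
    return phi * tau * vol_weights

def keep_mask(scores, keep_ratio=0.5):
    """Boolean mask selecting the top `keep_ratio` fraction by score (hypothetical rule)."""
    k = max(1, int(round(keep_ratio * len(scores))))
    threshold = np.sort(scores)[-k]
    return scores >= threshold
```

In this sketch a Gaussian that is never hit by a ray gets a score of exactly 0 regardless of its opacity, which illustrates how the product form lets one weak factor veto the primitive.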

3. Partitioning Strategy

Building upon CityGS with the following improvements:

  • Global coarse 3DGS is first pruned via SAGP to reduce redundant Gaussians that attract irrelevant views.
  • Sub-block boundaries retain shared Gaussian primitives to avoid geometric discontinuities.
  • Camera view assignment is based on geometry and SSIM.
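One simple way to realize shared boundary primitives is to dilate each block by a margin, so that Gaussians near a border belong to both neighbors. The sketch below illustrates that idea only; the regular grid, the `margin` parameter, and the function name are assumptions, and the SSIM-based view assignment is not covered.

```python
import numpy as np

def block_membership(xy, bounds_min, bounds_max, grid=(2, 2), margin=0.0):
    """Map each block (i, j) to the indices of Gaussians inside its dilated footprint.

    xy: (N, 2) ground-plane coordinates of Gaussian centers.
    A Gaussian within `margin` of a block border is listed in both adjacent blocks.
    """
    size = (bounds_max - bounds_min) / np.array(grid)
    members = {}
    for i in range(grid[0]):
        for j in range(grid[1]):
            lo = bounds_min + size * np.array([i, j]) - margin
            hi = lo + size + 2 * margin
            inside = np.all((xy >= lo) & (xy <= hi), axis=1)
            members[(i, j)] = np.flatnonzero(inside)
    return members
```

With `margin = 0` the blocks are disjoint apart from points exactly on a border; a positive margin trades some duplicated primitives for continuity across block seams.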

Key Experimental Results

Datasets

  • Mill19: Building, Rubble (aerial scenes)
  • UrbanScene3D: Residence, Sci-Art (urban scenes)

Main Results (Rendering Quality)

| Method | Building PSNR | Rubble PSNR | Residence PSNR | Sci-Art PSNR |
|---|---|---|---|---|
| 3DGS | 22.53 | 25.51 | 22.36 | 24.13 |
| CityGS-v2 | - | - | - | - |
| VCR-GauS | - | - | - | - |
| UrbanGS | Best | Best | Best | Best |

UrbanGS achieves state-of-the-art or near-state-of-the-art SSIM, PSNR, and LPIPS across all datasets.

Geometric Accuracy

Qualitative comparison of rendered depth maps shows:

  • UrbanGS produces smoother object surfaces.
  • CityGS-v2 and VCR-GauS exhibit distortions on distant buildings and in complex regions.

Memory Efficiency

SAGP achieves significant model compression (see ablation for specific ratios) while maintaining rendering quality. VCR-GauS fails with out-of-memory errors on an A5000 GPU, whereas UrbanGS runs without issue.

Ablation Study

| Ablation | Effect |
|---|---|
| w/o D-Normal regularization | Positional parameters cannot be effectively updated; surfaces appear rough |
| w/o depth consistency | Multi-view depth misalignment |
| w/o confidence weighting | Unreliable depth predictions interfere with optimization |
| w/o SAGP | Gaussian count explodes; out-of-memory failure |
| Global vs. adaptive pruning | Adaptive pruning preserves more detail |

Highlights & Insights

  1. D-Normal regularization elegantly resolves the problem that normal supervision cannot update positional parameters.
  2. The combined depth and normal supervision is theoretically well-motivated and mathematically justified.
  3. SAGP is the first pruning framework specifically designed for urban-scale 3DGS.
  4. The approach offers a systematic solution balancing geometric accuracy, memory efficiency, and scalability.
  5. Large-scale scene reconstruction is demonstrated on a workstation-class A5000 GPU.

Limitations & Future Work

  1. The method depends on the quality of external depth estimators (DepthAnything-v2) and normal estimators.
  2. Hyperparameters of SAGP (\(\lambda, t, \kappa\)) require tuning.
  3. The partitioning strategy is largely inherited from CityGS, offering limited novelty.
  4. Evaluation is restricted to aerial/urban scenes; applicability to large-scale indoor scenes remains unverified.
  5. The inverse depth loss may over-smooth nearby objects.

Related Work

  • Large-scale 3DGS: VastGaussian (Lin et al., 2024) employs block partitioning but suffers boundary inconsistencies; CityGaussian (Liu et al., 2024a) requires time-consuming post-processing; CityGS-v2 (Liu et al., 2024b) adopts 2DGS but at the cost of rendering quality.
  • Geometric optimization: 2DGS (Huang et al., 2024a) and VCR-GauS (Chen et al., 2024b) introduce depth/normal regularization but fail to sufficiently update positional parameters.
  • Gaussian pruning: Fan et al. (2023) apply simple global-metric-based pruning that oversimplifies large-scale scenes.

Rating

  • Novelty: ⭐⭐⭐⭐ — Both D-Normal regularization and SAGP represent targeted contributions.
  • Practicality: ⭐⭐⭐⭐⭐ — Directly addresses real-world pain points in urban-scale reconstruction.
  • Clarity: ⭐⭐⭐⭐ — Methods are systematically described with sufficient theoretical analysis.
  • Significance: ⭐⭐⭐⭐ — Provides a complete solution for large-scale 3DGS.