OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering¶
Conference: ICCV 2025 arXiv: 2503.16177 Code: Project Page Area: 3D Vision Keywords: 3D Gaussian Splatting, Large Scene Reconstruction, Occlusion-Aware, Scene Partitioning, Rendering Acceleration
TL;DR¶
This paper proposes an occlusion-aware scene partitioning strategy and region-based rendering technique. By clustering a camera co-visibility graph, it achieves partitions aligned with the scene layout, significantly improving reconstruction quality and rendering speed for large-scale 3DGS.
Background & Motivation¶
Large scene reconstruction is critical for autonomous driving, cultural heritage preservation, and VR/AR applications. 3D Gaussian Splatting (3DGS) faces scalability challenges in large scenes due to its memory-intensive representation, and thus commonly adopts a divide-and-conquer strategy to partition the scene into smaller sub-regions for independent reconstruction.
However, existing scene partitioning methods share a common flaw: occlusion unawareness. They partition primarily by camera positions or by uniform subdivision of point clouds, neglecting the scene layout and occlusion relationships. This leads to:
Severe occlusion within regions — cameras separated by walls or buildings are grouped into the same region
Low inter-camera correlation — cameras within a region share little co-visible content, resulting in low average contribution
Degraded reconstruction quality — training resources are dispersed across unrelated regions
This issue is particularly pronounced in ground-level captured scenes, where frequent occluders such as walls and buildings are common. Furthermore, after reconstruction, large scenes contain a massive number of Gaussian primitives, making rendering speed a critical bottleneck.
Method¶
Overall Architecture¶
OccluGaussian introduces two core innovations:

1. Occlusion-Aware Scene Partitioning — a camera partitioning strategy based on attributed graph clustering
2. Region-Based Rendering Acceleration — culling occluded Gaussians invisible to the current viewpoint
Occlusion-Aware Scene Partitioning¶
Attributed View Graph Construction¶
An undirected attributed graph \(\mathcal{G}=(\mathcal{V},\mathcal{E},X)\) is constructed as follows:

- Nodes: each node corresponds to a camera
- Edges: an edge is established between two cameras if they share visible content, with edge weight equal to the number of matched feature points, yielding the adjacency matrix \(A \in \mathbb{R}^{n \times n}\)
- Features: the 3D coordinates of each camera are position-encoded as node features \(X \in \mathbb{R}^{n \times d}\)
Cameras that are occluded from each other or spatially distant typically share very few overlapping views and can thus be effectively distinguished in the graph.
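The construction above can be sketched directly from SfM matching output. In this sketch, the sinusoidal positional encoding and its frequency count are assumptions, since the exact encoding is not specified here; `match_counts` stands in for whatever pairwise match statistics the SfM pipeline exposes.

```python
import numpy as np

def build_view_graph(match_counts, cam_positions, num_freq=4):
    """Build an attributed view graph from SfM matching results.

    match_counts: dict mapping camera-index pairs (i, j) to the number
                  of matched feature points between the two images.
    cam_positions: (n, 3) array of camera centers.
    Returns the adjacency matrix A and position-encoded features X.
    """
    n = len(cam_positions)
    A = np.zeros((n, n))
    for (i, j), m in match_counts.items():
        A[i, j] = A[j, i] = m  # edge weight = number of matched features

    # Sinusoidal positional encoding of the 3D camera centers
    # (an assumed choice of encoding).
    freqs = 2.0 ** np.arange(num_freq)           # (F,)
    scaled = cam_positions[:, :, None] * freqs   # (n, 3, F)
    X = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1).reshape(n, -1)
    return A, X
```

Cameras with no shared matches simply get no edge, so occluded or distant camera pairs end up weakly connected in \(A\).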
Graph Clustering¶
An attributed graph clustering algorithm is applied, first performing graph convolution to produce smoothed features, followed by spectral clustering:
Graph convolution filter: \(G = (I - \frac{1}{2}L_s)^r\), where \(L_s\) is the symmetric normalized Laplacian of \(A\) and \(r\) is the filter order controlling the degree of smoothing.
Filtered features are obtained as \(\bar{X} = GX\). A similarity matrix is then computed from the filtered features, and spectral clustering is applied to group cameras that share substantial overlapping views or are spatially proximate into the same region.
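A numpy-only sketch of this step follows. The low-pass filter matches the formula above; the cosine-similarity affinity and the minimal k-means inside the spectral step are simplifications standing in for the paper's exact clustering algorithm.

```python
import numpy as np

def smooth_features(A, X, r=2):
    """Low-pass filter node features with G = (I - L_s / 2)^r,
    where L_s is the symmetric normalized Laplacian of A."""
    n = A.shape[0]
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    L_s = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return np.linalg.matrix_power(np.eye(n) - 0.5 * L_s, r) @ X

def spectral_cluster(S, k, iters=50):
    """Minimal spectral clustering on a similarity matrix S: embed
    nodes with the top-k eigenvectors of the normalized similarity,
    then run a small k-means (farthest-point init) in the embedding."""
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(S.sum(axis=1), 1e-12))
    M = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    emb = vecs[:, -k:]                   # top-k eigenvectors as embedding
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    centers = [emb[0]]                   # deterministic farthest-point init
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(emb - c, axis=1) for c in centers], axis=0)
        centers.append(emb[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(iters):               # Lloyd's iterations
        labels = np.argmin(((emb[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = emb[labels == c].mean(axis=0)
    return labels

def cluster_cameras(A, X, k, r=2):
    """Smooth features, build a cosine-similarity affinity, cluster."""
    X_bar = smooth_features(A, X, r)
    Xn = X_bar / np.maximum(np.linalg.norm(X_bar, axis=1, keepdims=True), 1e-12)
    S = np.clip(Xn @ Xn.T, 0.0, None)    # nonnegative affinity
    return spectral_cluster(S, k)
```

Two camera groups connected by many matches internally but few matches across (e.g. separated by a wall) form two near-disconnected blocks in the affinity, so the spectral step separates them cleanly.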
Adaptive Cluster Number Determination¶
Starting from an initial cluster count \(K\), the algorithm recursively refines the partition by:

- Splitting clusters that contain too many cameras
- Ignoring clusters with too few cameras, or those whose convex hull is entirely covered by another cluster
- Iterating recursively until all clusters reach a balanced size
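The refinement loop can be sketched as follows. This is a simplification: the convex-hull coverage check is replaced by a plain minimum-size filter, and `split_fn` stands in for re-running the graph clustering with \(k=2\) on an oversized cluster.

```python
def refine_clusters(clusters, min_size, max_size, split_fn):
    """Recursively balance cluster sizes.

    clusters: list of camera-index lists.
    split_fn: callable that splits one oversized cluster into two
              (e.g. graph clustering restricted to that cluster).
    Oversized clusters are split and re-examined; undersized ones are
    dropped (a stand-in for the paper's ignore/coverage rules).
    """
    balanced = []
    stack = list(clusters)
    while stack:
        c = stack.pop()
        if len(c) > max_size:
            stack.extend(split_fn(c))   # split and revisit both halves
        elif len(c) >= min_size:
            balanced.append(c)          # within the balanced size range
        # else: ignore the undersized cluster
    return balanced
```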
Region Reconstruction¶
Three types of training cameras are selected for each region:

- Base set: cameras located within the region
- Extension set: cameras outside the region that capture sufficient visible content within it
- Boundary set: cameras facing the region but occluded from it, used to constrain Gaussian primitives near the boundary
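A toy sketch of the base/extension selection. The per-camera visibility fraction `visible_frac` and the threshold `tau` are assumed proxies for "sufficient visible content"; the boundary set additionally requires an occlusion test against the scene geometry and is omitted here.

```python
import numpy as np

def select_training_cameras(in_region, visible_frac, tau=0.3):
    """Split cameras into the base set (inside the region) and the
    extension set (outside it, but seeing enough content within it).

    in_region: boolean mask, True if a camera lies inside the region.
    visible_frac: fraction of each camera's SfM points that fall
                  inside the region (assumed visibility proxy).
    """
    in_region = np.asarray(in_region)
    visible_frac = np.asarray(visible_frac)
    base = np.where(in_region)[0]
    extension = np.where(~in_region & (visible_frac >= tau))[0]
    return base, extension
```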
Region-Based Rendering Acceleration¶
For each region, the 3D Gaussians visible to all training cameras within that region are recorded. During rendering, only the Gaussians recorded for the region containing the current viewpoint are processed, effectively culling occluded and invisible Gaussians. Regions can be further subdivided into smaller sub-regions for additional acceleration.
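The culling scheme amounts to a precomputed per-region lookup, sketched below. The per-camera visible-ID sets (here `per_cam_visible`) are assumed to come from the rasterizer during training; at render time the lookup replaces processing the full Gaussian set.

```python
import numpy as np

def precompute_region_visibility(cam_regions, per_cam_visible):
    """For each region, union the Gaussian IDs visible to any of its
    training cameras.

    cam_regions: cam_regions[c] is the region index of camera c.
    per_cam_visible: dict mapping camera index -> set of Gaussian IDs
                     that camera sees (from rasterization).
    """
    region_vis = {}
    for cam, region in enumerate(cam_regions):
        region_vis.setdefault(region, set()).update(per_cam_visible[cam])
    # Store as sorted index arrays, ready to gather Gaussians with.
    return {r: np.array(sorted(ids)) for r, ids in region_vis.items()}

def render_ids(viewpoint_region, region_vis):
    """At render time, process only the Gaussians recorded for the
    region containing the current viewpoint; the rest are culled."""
    return region_vis[viewpoint_region]
```

Subdividing a region into smaller sub-regions just means running the same precomputation at a finer granularity, shrinking each lookup set further.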
Key Experimental Results¶
Main Results — OccluScene3D Dataset¶
| Scene | Method | PSNR↑ | SSIM↑ | LPIPS↓ | FPS↑ |
|---|---|---|---|---|---|
| Gallery | VastGaussian | 25.09 | 0.903 | 0.095 | 215 |
| Gallery | CityGaussian | 21.98 | 0.808 | 0.294 | 120 |
| Gallery | Hierarchical-GS | 22.23 | 0.800 | 0.182 | 216 |
| Gallery | OccluGaussian | 25.81 | 0.903 | 0.094 | 289 |
| Canteen | VastGaussian | 24.60 | 0.890 | 0.105 | 211 |
| Canteen | OccluGaussian | 25.25 | 0.900 | 0.100 | 312 |
| ClassBuilding | VastGaussian | 24.05 | 0.884 | 0.111 | 270 |
| ClassBuilding | OccluGaussian | 25.33 | 0.921 | 0.083 | 340 |
Comparison on Zip-NeRF Dataset¶
| Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| MERF | 23.49 | 0.747 | 0.445 |
| SMERF | 27.28 | 0.829 | 0.340 |
| Zip-NeRF | 27.37 | 0.836 | — |
Key Findings¶
- OccluGaussian achieves the best reconstruction quality across all scenes, with PSNR improvements of 1–3 dB
- Rendering speed is significantly improved, with FPS increasing from 200+ to 280–340; region-based rendering effectively eliminates redundant computation on invisible Gaussians
- The occlusion-aware partitioning strategy yields stronger inter-camera correlation and higher average contribution within each region
Highlights & Insights¶
- Precise problem formulation — Reframing scene partitioning from the perspective of camera co-visibility precisely identifies the root cause of quality degradation due to occlusion
- Effective reuse of SfM information — The view graph is built directly from SfM matching results with no additional computational overhead
- Unified rendering acceleration and partitioning — The occlusion-aware region partitioning naturally supports region-based rendering culling, achieving two goals simultaneously
- Strong applicability to ground-level scenes — Particularly well-suited for indoor and urban environments with frequent occlusions
Limitations & Future Work¶
- Relies on camera poses and matching information provided by SfM
- Primarily targets ground-level captured scenes; improvements are limited for aerial or open-world scenarios
- The adaptive cluster number determination requires recursive refinement, which may incur non-trivial overhead for extremely large scenes
Related Work & Insights¶
- Large scene reconstruction: BlockNeRF, Mega-NeRF, VastGaussian, CityGaussian, and other divide-and-conquer approaches
- Camera clustering: COLMAP's Metis graph partitioning, Out-of-Core-BA
- Rendering acceleration: Octree-GS, LightGaussian, Hierarchical-3DGS, and other LoD-based methods
Rating¶
- Novelty: ⭐⭐⭐⭐ (The unified design of occlusion-aware partitioning and region-based rendering is novel)
- Technical Depth: ⭐⭐⭐⭐ (Graph clustering methodology is complete and well-motivated)
- Experimental Thoroughness: ⭐⭐⭐⭐ (Multi-dataset validation with comprehensive quantitative and qualitative evaluation)
- Practical Value: ⭐⭐⭐⭐⭐ (Directly addresses practical pain points in large scene reconstruction)