CoMapGS: Covisibility Map-based Gaussian Splatting for Sparse Novel View Synthesis
Conference: CVPR 2025
arXiv: 2503.20998
Code: youngkyoonjang.github.io/projects/comapgs
Area: 3D Vision
Keywords: Sparse View Synthesis, 3D Gaussian Splatting, Covisibility Map, Point Cloud Enhancement, Uncertainty-aware
TL;DR
This paper proposes CoMapGS, which uses pixel-level covisibility maps to guide both initial point cloud enhancement and adaptively weighted supervision in sparse-view 3DGS. The authors present it as the first method to explicitly target and recover high-uncertainty single-view regions.
Background & Motivation
Sparse novel view synthesis faces three core challenges:
- Imbalanced Regional Supervision: Highly covisible regions are over-optimized due to multi-view availability, whereas single-view regions (visible under only one training view) are neglected due to the lack of multi-view constraints.
- Sparse Point Cloud Initialization: COLMAP keypoint matching under a small number of training images yields extremely sparse point clouds, lacking geometric details.
- High-Uncertainty Regions: Single-view regions lack multi-view geometric constraints; existing methods either ignore or penalize these regions instead of leveraging their information.
Existing methods (e.g., FSGS, CoR-GS) primarily optimize highly covisible regions and lack an effective recovery strategy for single-view regions. The core idea of this paper is to use covisibility maps to quantify the uncertainty of each pixel, and then to apply region-specific point cloud enhancement and supervision weighting.
Method
Overall Architecture
Based on sparse-view 3DGS methods like CoR-GS, CoMapGS introduces three key steps: (1) generating pixel-level covisibility maps using MASt3R dense correspondence predictions; (2) enhancing the initial point cloud in both low- and high-uncertainty regions through dense correspondence triangulation and monocular depth estimation alignment; (3) training a proximity MLP classifier and combining it with the covisibility map for adaptively weighted proximity loss supervision.
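Step (1) can be illustrated with a minimal, dependency-free sketch. It assumes correspondences are already available as pixel pairs per image pair; the `pair_matches` format and the function name `covisibility_maps` are hypothetical and are not MASt3R's actual output or API. It also assumes a pixel is counted at most once per image pair, so the count reads as "number of other views that see this pixel".

```python
def covisibility_maps(shapes, pair_matches):
    """Accumulate per-pixel match counts M_i(x, y) for every training view.

    shapes: {view_id: (height, width)}
    pair_matches: {(i, j): [((xi, yi), (xj, yj)), ...]}
        Hypothetical format standing in for dense correspondences.
    """
    maps = {v: [[0] * w for _ in range(h)] for v, (h, w) in shapes.items()}
    for (i, j), matches in pair_matches.items():
        # Deduplicate so each pixel is counted once per image pair
        # (an assumption of this sketch).
        seen_i = {pi for pi, _ in matches}
        seen_j = {pj for _, pj in matches}
        for x, y in seen_i:
            maps[i][y][x] += 1
        for x, y in seen_j:
            maps[j][y][x] += 1
    return maps


# Toy example: three 2x3 views with made-up matches.
shapes = {0: (2, 3), 1: (2, 3), 2: (2, 3)}
pair_matches = {
    (0, 1): [((0, 0), (1, 0)), ((1, 0), (2, 0))],
    (0, 2): [((0, 0), (0, 1))],
}
maps = covisibility_maps(shapes, pair_matches)

# Single-view (high-uncertainty) pixels of view 0 are those with a zero count.
single_view_0 = [[count == 0 for count in row] for row in maps[0]]
```

In this toy case pixel (0, 0) of view 0 is matched in both pairs, so its count is 2, while every pixel with a zero count falls into the single-view region.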
Key Designs
- Covisibility Map Generation and Initial Point Cloud Enhancement:
    - Function: Quantify how often each pixel is covisible across views, and enhance the sparse point cloud region by region accordingly.
    - Mechanism: MASt3R predicts dense correspondences for each training image pair, and the match counts \(M_i(x,y)\) are accumulated per pixel to obtain the covisibility map. In low-uncertainty regions (\(M_i \geq 1\)), the dense correspondences are triangulated into points \(P_T\), which supplement the COLMAP sparse point cloud \(P_C\); only new points whose distance to \(P_C\) exceeds a threshold \(\epsilon\) are kept. In high-uncertainty regions (single-view regions where \(M_i = 0\)), points \(P_d^{high}\) are generated by back-projecting monocular depth estimates, and an anisotropic scaling transformation \(f_{scale}\) is learned to align them to the coordinate system of the triangulated point cloud.
    - Design Motivation: With few images, COLMAP keypoint matching yields extremely sparse points, whereas dense correspondences from MASt3R can significantly increase point cloud density. Monocular depth has arbitrary scale, but it can be aligned by learning from known regions.
- Covisibility Map-Weighted Proximity Loss:
    - Function: Adaptively adjust supervision strength based on regional uncertainty levels.
    - Mechanism: A 3-layer MLP classifier \(f_p\) is trained to distinguish the enhanced point cloud \(P_{final}\) (positive samples) from randomly shifted points (negative samples), and outputs a proximity score \(s \in [0,1]\). For Gaussians inside the camera frustum, the weight \(w_{in} = 1/(M_i(\pi(g,\mathbf{H}_i)) + 1)\) decreases as the covisibility count grows: it is largest (=1) in single-view regions and smaller in highly covisible regions. For Gaussians outside the camera frustum, a linear decay weight \(w_{out}\) based on the scene's average covisibility score \(S\) is enabled when \(S > 0.7\).
    - Design Motivation: Highly covisible regions are already sufficiently supervised by standard reconstruction losses, so the proximity loss should focus more on under-constrained single-view regions. Gaussians outside the frustum in highly covisible scenes should also be moderately constrained.
- Regional Strategy for Enhanced Point Cloud:
    - Function: Divide the enhanced point clouds into regions of different confidence levels based on covisibility for separate processing.
    - Mechanism: The final point cloud is \(P_{final} = P_u^{low} \cup P_s^{high}\), where \(P_u^{low}\) comes from triangulation (high confidence) and \(P_s^{high}\) comes from aligned monocular depth back-projection (low confidence). The classifier and weighted supervision naturally distinguish the reliability of points from different sources.
    - Design Motivation: Triangulated points are more reliable due to multi-view verification, while depth back-projected points, although less accurate, are crucial for filling empty regions.
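The point cloud enhancement described above can be sketched on toy data. This is an illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, the nearest-neighbour search is brute force, and a closed-form per-axis least-squares fit stands in for the learned anisotropic scaling \(f_{scale}\). Treating the low-uncertainty set as \(P_C\) plus the filtered triangulated points is also an assumption of this sketch.

```python
import math


def filter_new_points(p_c, p_t, eps):
    """Keep triangulated points whose nearest COLMAP point is farther than eps."""
    kept = []
    for q in p_t:
        nearest = min(math.dist(q, p) for p in p_c)
        if nearest > eps:
            kept.append(q)
    return kept


def fit_axis_scales(src, dst):
    """Per-axis scale s_k minimising sum_n (s_k * src[n][k] - dst[n][k])**2.

    Stand-in for the learned f_scale (scale only, no offset).
    """
    scales = []
    for k in range(3):
        num = sum(s[k] * d[k] for s, d in zip(src, dst))
        den = sum(s[k] * s[k] for s in src)
        scales.append(num / den)
    return scales


def apply_scales(points, scales):
    """Scale every point axis by axis."""
    return [tuple(c * s for c, s in zip(p, scales)) for p in points]


# Toy data; all values are made up for illustration.
p_c = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]   # COLMAP points
p_t = [(0.01, 0.0, 0.0), (0.5, 0.5, 0.0)]  # triangulated points
p_low = p_c + filter_new_points(p_c, p_t, eps=0.1)

# Points known in both systems (depth back-projection vs. triangulation),
# taken from covisible regions, are used to fit the scales.
depth_known = [(1.0, 1.0, 1.0), (2.0, 2.0, 2.0)]
tri_known = [(2.0, 3.0, 4.0), (4.0, 6.0, 8.0)]
scales = fit_axis_scales(depth_known, tri_known)

# Depth points from single-view regions are then mapped with the same scales.
p_high = apply_scales([(1.0, 0.0, 0.5)], scales)
p_final = p_low + p_high
```

The first triangulated point lies within `eps` of an existing COLMAP point and is dropped, while the second is kept; the fitted scales are then reused for the single-view depth points, which is where the alignment "learned from known regions" pays off.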
Loss & Training
A proximity loss term \(\mathcal{L}_p\) is added to the total loss of the base method,
where \(\mathcal{L}_p = \frac{1}{|G|}\sum_{g \in G}(\chi(g)w_{in} + (1-\chi(g))w_{out}) \cdot (1-s)\). Here \(G\) is the set of Gaussians, \(\chi(g)\) indicates whether Gaussian \(g\) is inside the camera frustum, and \(s\) is its proximity score from \(f_p\). The method can be integrated into existing methods such as FSGS and CoR-GS.
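A small sketch of how the proximity loss formula evaluates, assuming each Gaussian has already been reduced to three values: whether it is inside the frustum, the covisibility count at its projected pixel, and its proximity score. The tuple layout and function names are hypothetical. Because the exact form of the linear decay for \(w_{out}\) is not given here, it is passed in as a plain number.

```python
def w_in(covis_count):
    """In-frustum weight 1 / (M + 1): 1.0 for single-view pixels, smaller otherwise."""
    return 1.0 / (covis_count + 1)


def proximity_loss(gaussians, w_out):
    """Mean of weight * (1 - s) over all Gaussians.

    gaussians: list of (in_frustum, covis_count, score) tuples
        (hypothetical layout for this sketch).
    w_out: weight for Gaussians outside the frustum, supplied by the caller.
    """
    total = 0.0
    for in_frustum, covis_count, score in gaussians:
        weight = w_in(covis_count) if in_frustum else w_out
        total += weight * (1.0 - score)
    return total / len(gaussians)


# Toy example with made-up scores.
gaussians = [
    (True, 0, 0.5),   # single-view region: weight 1.0
    (True, 3, 0.5),   # seen by three other views: weight 0.25
    (False, 0, 0.0),  # outside the frustum: weight w_out
]
loss_without_out = proximity_loss(gaussians, w_out=0.0)
loss_with_out = proximity_loss(gaussians, w_out=0.5)
```

With the same score of 0.5, the single-view Gaussian contributes 0.5 to the sum while the highly covisible one contributes only 0.125, which is the intended shift of supervision toward under-constrained regions.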
Key Experimental Results
Main Results
| Dataset/View | Metric | CoR-GS | CoR-GS + CoMapGS | Gain |
|---|---|---|---|---|
| LLFF 3-view | PSNR/SSIM/LPIPS | 20.47/0.717/0.199 | 21.11/0.747/0.182 | +0.64/+0.030/-0.017 |
| LLFF 6-view | PSNR/SSIM/LPIPS | 24.78/0.844/0.116 | 25.20/0.854/0.108 | +0.42/+0.010/-0.008 |
| LLFF 9-view | PSNR/SSIM/LPIPS | 26.48/0.881/0.086 | 26.73/0.886/0.082 | +0.25/+0.005/-0.004 |
| Mip-NeRF 360 12-view | PSNR/SSIM/LPIPS | 19.16/0.574/0.414 | 19.68/0.591/0.394 | +0.52/+0.017/-0.020 |
| Mip-NeRF 360 24-view | PSNR/SSIM/LPIPS | 23.32/0.729/0.271 | 23.46/0.734/0.264 | +0.14/+0.005/-0.007 |
Ablation Study (LLFF 6-view)
| Configuration | PSNR↑ | SSIM↑ | LPIPS↓ | Description |
|---|---|---|---|---|
| CoR-GS baseline | 24.777 | 0.844 | 0.116 | Baseline |
| + Proximity loss only | 24.787 | 0.845 | 0.116 | Limited gain without point cloud enhancement |
| + Low-uncertainty point cloud enhancement △ | 24.90 | 0.849 | 0.112 | Point cloud in dense covisibility regions |
| + △ + Proximity loss | 25.153 | 0.854 | 0.109 | Significant synergistic effect |
| + Full point cloud enhancement | 25.076 | 0.852 | 0.109 | Adding single-view region points |
| Full CoMapGS | 25.204 | 0.854 | 0.108 | All components |
Key Findings
- Synergistic effect between initial point cloud enhancement and weighted supervision: Using proximity loss alone yields a marginal improvement of 0.01 PSNR (24.777 to 24.787), but adding it on top of the low-uncertainty enhanced point cloud improves PSNR by more than 0.25 (24.90 to 25.153).
- Greater improvements are observed with fewer views on LLFF (0.64 PSNR gain for 3-view vs 0.25 for 9-view), suggesting that the method is particularly effective for extremely sparse inputs.
- Enhancing point clouds only in low-uncertainty regions (△) already gives a clear improvement over the baseline (+0.12 PSNR, -0.004 LPIPS), indicating that point cloud density is a key bottleneck.
- The LPIPS improvement on Mip-NeRF 360 is particularly prominent at 12 views (-0.020), which is attributed to these scenes containing more high-uncertainty regions.
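The PSNR gains quoted in these findings can be recomputed from the ablation table. The short script below only restates the table's numbers; the dictionary keys are shorthand labels chosen for this sketch.

```python
# PSNR values copied from the LLFF 6-view ablation table.
ablation_psnr = {
    "baseline": 24.777,
    "prox_only": 24.787,
    "low_enh": 24.90,
    "low_enh_prox": 25.153,
    "full_enh": 25.076,
    "full": 25.204,
}


def gain(before, after):
    """PSNR difference between two configurations, rounded to 3 decimals."""
    return round(ablation_psnr[after] - ablation_psnr[before], 3)


prox_alone = gain("baseline", "prox_only")        # proximity loss without enhancement
low_enh_alone = gain("baseline", "low_enh")       # low-uncertainty enhancement only
prox_on_low = gain("low_enh", "low_enh_prox")     # proximity loss on top of that
prox_on_full = gain("full_enh", "full")           # proximity loss on full enhancement
total_gain = gain("baseline", "full")             # all components vs. baseline

for name, value in [
    ("proximity loss alone", prox_alone),
    ("low-uncertainty enhancement alone", low_enh_alone),
    ("proximity loss on low-uncertainty enhancement", prox_on_low),
    ("proximity loss on full enhancement", prox_on_full),
    ("full CoMapGS vs. baseline", total_gain),
]:
    print(f"{name}: {value:+.3f} dB")
```

Laying the differences side by side makes it explicit which pair of rows each quoted gain refers to.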
Highlights & Insights
- First to explicitly focus on single-view regions: While previous methods ignore or penalize high-uncertainty regions, this paper does the opposite, applying stronger constraints to these regions through proximity loss.
- The concept of the covisibility map is simple yet powerful, compressing complex multi-view geometric relationships into a single integer count per pixel.
- Plug-and-play design: CoMapGS can be directly integrated into FSGS and CoR-GS without altering original training pipelines.
Limitations & Future Work
- Reliance on MASt3R for dense correspondence prediction increases preprocessing computational costs.
- The proximity MLP classifier is trained offline and does not participate in the online optimization of 3DGS, potentially failing to exploit dynamic geometric variations during training.
- Monocular depth alignment employs simple linear regression (anisotropic scaling), which may not sufficiently handle complex non-linear depth scale variations.
- Slightly lower PSNR than ReconFusion (a diffusion-model-based method) on 3-view, but superior in SSIM/LPIPS.
Related Work & Insights
- Unlike the covisibility concept proposed in DyCheck, this work extends covisibility from an evaluation tool to a training signal.
- The point cloud enhancement strategy (dense correspondence triangulation + depth alignment) can be used independently, benefiting all 3DGS-based methods.
- The proximity classifier workflow is similar to occupancy prediction in SDF fields but is more lightweight, making it worth exploring in other scene reconstruction tasks.
Rating
- Novelty: ⭐⭐⭐⭐ The idea of covisibility-map-guided adaptive supervision is novel, and the work is presented as the first to focus on recovering single-view regions.
- Experimental Thoroughness: ⭐⭐⭐⭐ Evaluated across multiple settings on LLFF and Mip-NeRF 360 with comprehensive ablation studies and comparisons against multiple baselines.
- Writing Quality: ⭐⭐⭐⭐ Rigorous notation definitions, systematic methodology description, and clear figures.
- Value: ⭐⭐⭐⭐ A practical plug-and-play module that improves existing sparse-view 3DGS pipelines such as FSGS and CoR-GS.