Multi-View Pose-Agnostic Change Localization with Zero Labels¶

Conference: CVPR 2025
arXiv: 2412.03911
Code: https://MV-3DCD.github.io
Area: 3D Vision / Change Detection
Keywords: Multi-view change detection, 3D Gaussian Splatting, Zero-label, Pose-agnostic, DINOv2

TL;DR¶

This paper proposes the first zero-label, pose-agnostic multi-view change detection method. It fuses multi-view change information in a change-aware 3D Gaussian Splatting (3DGS) representation, reaches roughly 1.7× the mIoU of the baseline, and can generate change masks for unseen views.

Background & Motivation¶

Background: Change detection typically relies on precisely aligned pre- and post-event image pairs, which limits its application scenarios. A few methods support inconsistent poses but require labeled training data.

Limitations of Prior Work: Existing label-free methods (such as OmniPoseAD and SplatPose) compare image pairs one view at a time, so they are strongly affected by view-dependent pseudo-changes (e.g., reflections and shadows).

Key Challenge: Single-view comparisons tend to yield many false positives, whereas multi-view fusion lacks an effective 3D representation.

Goal: To build a 3D change representation leveraging multi-view information to suppress view-dependent false positives.

Key Insight: Embedding change channels into 3DGS and modeling view-independent changes using zero-order spherical harmonic coefficients.

Core Idea: Learning additional change channels (change magnitude + change opacity) within the 3DGS of the evaluation scene to fuse multi-view feature- and structure-aware change masks.

Method¶

Overall Architecture¶

(1) Reconstruct 3DGS_ref using reference scene images; (2) Render reference scene images from the viewpoints of the evaluation scene; (3) Compare the rendered images with actual images to generate feature- and structure-aware change masks; (4) Embed change information into Change-3DGS_inf of the evaluation scene; (5) Render multi-view change masks from arbitrary novel viewpoints.
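The five steps can be read as a small orchestration routine. The sketch below is only an illustration of the data flow: every callable (`fit_3dgs`, `render`, `compare`, `fit_change_channels`, `render_change`) is a hypothetical placeholder injected by the caller, not an API from the authors' code, and it assumes that Change-3DGS_inf is obtained by first fitting a 3DGS to the evaluation images.

```python
def localize_changes(ref_images, ref_poses, inf_images, inf_poses, novel_poses,
                     *, fit_3dgs, render, compare,
                     fit_change_channels, render_change):
    """Data-flow sketch of the five pipeline steps (all callables are hypothetical)."""
    # (1) Reconstruct the reference scene.
    gs_ref = fit_3dgs(ref_images, ref_poses)

    # (2) + (3) Render the reference scene at each evaluation pose and compare
    # the rendering with the real evaluation image to get a per-view mask.
    masks = [compare(render(gs_ref, pose), image)
             for image, pose in zip(inf_images, inf_poses)]

    # (4) Fit the evaluation scene, then learn its change channels from the masks.
    gs_inf = fit_3dgs(inf_images, inf_poses)
    change_gs = fit_change_channels(gs_inf, masks, inf_poses)

    # (5) Render change masks from arbitrary (possibly unseen) viewpoints.
    return [render_change(change_gs, pose) for pose in novel_poses]
```

Passing the stages as arguments keeps the sketch independent of any particular 3DGS implementation.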

Key Designs¶

Feature- and Structure-Aware Change Mask:
- Function: Detect candidate change regions from a single viewpoint.
- Mechanism: Extract feature differences using DINOv2 to obtain the feature-aware mask \(M_F^k\); compute SSIM to acquire the structure-aware mask \(M_S^k\); perform element-wise multiplication of both to obtain the combined mask \(M_{F,S}^k\).
- Design Motivation: Feature and structural information are complementary—DINOv2 captures semantic changes, while SSIM captures pixel-level changes.
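
The element-wise combination can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the DINOv2 feature maps are assumed to be already extracted and resized to a common `(H, W, C)` grid, the SSIM map is assumed to be precomputed per pixel, and the use of cosine distance and `1 - SSIM` as mask values is an assumed choice. All function names are hypothetical.

```python
import numpy as np

def feature_mask(feat_ref, feat_inf, eps=1e-8):
    """Per-pixel cosine distance between two (H, W, C) feature maps, scaled to [0, 1]."""
    dot = np.sum(feat_ref * feat_inf, axis=-1)
    norm = np.linalg.norm(feat_ref, axis=-1) * np.linalg.norm(feat_inf, axis=-1)
    cos = dot / (norm + eps)
    return np.clip((1.0 - cos) / 2.0, 0.0, 1.0)

def structure_mask(ssim_map):
    """Turn a per-pixel SSIM map (1 = identical) into a change score in [0, 1]."""
    return np.clip(1.0 - ssim_map, 0.0, 1.0)

def combined_mask(feat_ref, feat_inf, ssim_map):
    """Element-wise product: a pixel scores high only if both cues agree."""
    return feature_mask(feat_ref, feat_inf) * structure_mask(ssim_map)
```

Because the two masks are multiplied, a pixel flagged by only one cue (for example a shadow that lowers SSIM but leaves the features unchanged) ends up with a low combined score.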
3DGS Change Channel Embedding:
- Function: Encode change information within the 3D representation to achieve multi-view fusion.
- Mechanism: Append two parameters to each Gaussian: change magnitude \(\tilde{c}\) and change opacity \(\tilde{\alpha}\). The change magnitude is modeled with zero-order spherical harmonic coefficients only, which makes it view-independent, and the rendered change channel is supervised with an L1 + D-SSIM loss.
- Design Motivation: Zero-order spherical harmonic coefficients model the changes as view-independent, effectively suppressing view-dependent false positives such as reflections and shadows.
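
To see why a zero-order coefficient is view-independent, the sketch below evaluates real spherical harmonics of degree 0 and degree 1 for different viewing directions, and then alpha-composites per-Gaussian change values along one ray. It is an illustration under assumptions: the basis constants and sign convention follow the common 3DGS reference code, the 0.5 offset used there for colors is omitted, and `eval_sh` and `composite` are hypothetical helper names.

```python
import numpy as np

SH_C0 = 0.28209479177387814  # degree-0 basis: a constant, no direction term
SH_C1 = 0.4886025119029199   # scale of the three degree-1 basis functions

def eval_sh(coeffs, view_dir):
    """Evaluate a scalar SH expansion with 1 (degree 0) or 4 (degree 1) coefficients."""
    coeffs = np.asarray(coeffs, dtype=float)
    d = np.asarray(view_dir, dtype=float)
    x, y, z = d / np.linalg.norm(d)
    value = SH_C0 * coeffs[0]
    if coeffs.size >= 4:
        # Degree-1 terms depend on the viewing direction.
        value += SH_C1 * (-y * coeffs[1] + z * coeffs[2] - x * coeffs[3])
    return float(value)

def composite(values, alphas):
    """Front-to-back alpha compositing along one ray (inputs sorted near to far)."""
    values = np.asarray(values, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    # Transmittance before each sample: product of (1 - alpha) of all nearer samples.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    return float(np.sum(values * alphas * transmittance))

# Degree 0: identical value from any direction.
same = eval_sh([2.0], [0, 0, 1]) == eval_sh([2.0], [1, 0, 0])
# Degree 1: the value changes with the direction.
differs = eval_sh([2.0, 1.0, 0.0, 0.0], [0, 1, 0]) != eval_sh([2.0, 1.0, 0.0, 0.0], [0, 0, 1])
```

With only the degree-0 term, a Gaussian cannot explain a mask value that appears from one viewpoint and vanishes from another, so such view-dependent responses tend to be averaged out during optimization.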
Data Augmentation Strategy:
- Function: Increase the number of training masks used for learning change channels.
- Mechanism: Use Change-3DGS_inf to render evaluation-scene images from the reference-scene viewpoints, compute change masks in this reverse direction, and merge them with the forward masks to enlarge the training set.
- Design Motivation: The number of evaluation scene images can be very small (as few as 5 images); data augmentation improves the quality of change channel learning.

Loss & Training¶

Change channel learning utilizes L1 + D-SSIM loss. The final change mask is filtered via the alpha channel to exclude unseen regions: \(M^k = M_{ren}^k \cdot \mathbf{1}(A_{ren}^k \geq 0.5)\).
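
A minimal NumPy sketch of the loss and the alpha filter follows. Several details are assumptions rather than facts from the paper: the weighting \((1-\lambda)\,L_1 + \lambda\,(1 - \mathrm{SSIM})\) with \(\lambda = 0.2\) is the convention of the original 3DGS code, and the SSIM here is a simplified single-window (global) variant used only to keep the example short. The function names are hypothetical.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM computed over the whole image as one window."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def change_loss(rendered, target, lam=0.2):
    """(1 - lam) * L1 + lam * D-SSIM, with D-SSIM taken as 1 - SSIM (assumed)."""
    l1 = np.abs(rendered - target).mean()
    d_ssim = 1.0 - global_ssim(rendered, target)
    return float((1.0 - lam) * l1 + lam * d_ssim)

def filter_by_alpha(m_ren, a_ren, threshold=0.5):
    """M = M_ren * 1(A_ren >= threshold): zero the mask where the scene is unseen."""
    return m_ren * (a_ren >= threshold)
```

For example, `filter_by_alpha(np.array([0.9, 0.8]), np.array([0.6, 0.4]))` keeps the first value and zeroes the second, because only the first pixel has rendered alpha of at least 0.5.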

Key Experimental Results¶

Main Results¶

Dataset	Metric	Ours	SplatPose	Gain
MAD-Real	mIoU	0.132	0.077	1.7×
MAD-Real	F1	0.210	0.123	1.7×
ChangeSim	mIoU(C)	0.407	-	1.7× vs CSCDNet
PASLCD	Mean mIoU	Highest	Second Highest	Leading in all scenes
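
For reference, the two reported metrics can be computed per image from binary masks as in the sketch below. This uses the standard definitions of IoU and F1; how the paper averages them into mIoU (over images, scenes, or classes) is not stated here, so any averaging on top of this is an assumption. The helper name `iou_and_f1` is hypothetical.

```python
import numpy as np

def iou_and_f1(pred, gt):
    """IoU and F1 of the 'changed' class for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.sum(pred & gt)    # changed pixels correctly detected
    fp = np.sum(pred & ~gt)   # false alarms
    fn = np.sum(~pred & gt)   # missed changes
    union = tp + fp + fn
    if union == 0:            # both masks empty: treat as a perfect match
        return 1.0, 1.0
    return float(tp / union), float(2 * tp / (2 * tp + fp + fn))
```

The "1.7×" entries in the table are ratios of the two scores, for example 0.132 / 0.077 ≈ 1.71 for mIoU on MAD-Real.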

Key Findings¶

- As few as 5 evaluation-scene images suffice to learn effective change channels.
- The method can generate change masks for novel viewpoints that appear in neither the evaluation nor the reference image set.
- Zero-order spherical harmonic coefficients perform better than higher-order ones, which supports the assumption that true changes are view-independent.

Highlights & Insights¶

- Lifts change detection to the 3D representation level for the first time, achieving genuine multi-view fusion.
- Contributes the PASLCD dataset containing 10 real-world scenes.
- The method can serve as a multi-view extension for any single-view change detection method.

Limitations & Future Work¶

- Relies on COLMAP to register evaluation scene images to the reference scene.
- Extreme appearance changes (such as complete darkness) may lead to COLMAP registration failure.
- Training the change channels requires scene-by-scene optimization.

Rating¶

Novelty: 9/10 — For embedding change channels in 3DGS for the first time.
Technical Depth: 8/10 — The design of modeling change with zero-order spherical harmonics is theoretically sound.
Experimental Thoroughness: 8/10 — Three datasets plus a newly contributed dataset.
Writing Quality: 8/10 — Clear description of the methodology.