Spectral-Geometric Neural Fields for Pose-Free LiDAR View Synthesis¶

Conference: CVPR2025
arXiv: 2603.12903
Code: TBD
Area: Autonomous Driving
Keywords: LiDAR view synthesis, NeRF, pose estimation, spectral embedding, point cloud reconstruction

TL;DR¶

SG-NLF is a pose-free LiDAR NeRF framework. It reconstructs smooth geometry with a hybrid spectral-geometric representation, aligns poses globally with a confidence-aware pose graph, and enforces cross-frame consistency with adversarial learning. In low-frequency LiDAR scenarios on nuScenes, it reduces Chamfer Distance (CD) by 35.8% and absolute trajectory error (ATE) by 68.8% relative to the prior state of the art.

Background & Motivation¶

LiDAR novel view synthesis (NVS) is crucial for expanding the perception range and enhancing the robustness of autonomous driving systems.
Traditional LiDAR simulation (ray casting) struggles to accurately model the intensity and ray-drop characteristics of real-world LiDAR.
While NeRF has been successfully extended to LiDAR NVS, most methods rely heavily on accurate poses, which are difficult to acquire in practice.
LiDAR data is sparse and lacks texture, so interpolation-based encodings (such as multi-resolution hash encoding) struggle to reconstruct continuous surfaces, leaving holes and discontinuities in the geometry.
The existing pose-free method, GeoNLF, relies on pairwise alignment constraints, making it difficult to guarantee global pose accuracy.
Low-frequency LiDAR sequences (characterized by large inter-frame motion and low overlap rates) further exacerbate the challenges of multi-view consistency.

Method¶

Overall Architecture¶

SG-NLF consists of three core components: (1) a hybrid spectral-geometric representation for smooth and consistent scene reconstruction; (2) a confidence-aware pose graph for global pose optimization; and (3) an adversarial learning strategy to enhance cross-frame consistency. Given a sequence of multi-frame LiDAR point clouds as input, it jointly recovers global poses and reconstructs a continuous implicit scene representation.

Key Designs¶

1. Hybrid Spectral-Geometric Representation
   - Geometric Encoding: Extracts local geometric features \(\boldsymbol{f}_{\text{geo}}(\mathbf{x})\) based on a multi-resolution hash grid.
   - Spectral Embedding: Learns the first \(K\) eigenfunctions \(\Psi_k(\mathbf{x})\) of the Laplace-Beltrami operator, which possess intrinsic isometric invariance.
     - Differentiably approximates the eigenfunctions via an MLP while minimizing the Rayleigh quotient.
     - Employs rejection sampling to uniformly sample points on the implicit surface to compute the area elements of the First Fundamental Form.
     - Orthogonality loss \(\mathcal{L}_{\text{ortho}}\) + Normalization loss \(\mathcal{L}_{\text{norm}}\).
   - Progressive Fusion: Gradually fuses the spectral and geometric features into \(\boldsymbol{f}_{\text{hyb}}(\mathbf{x})\) during training.
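The three spectral terms named in design 1 (Rayleigh quotient, orthogonality, normalization) can be sketched numerically. This is a minimal NumPy illustration, not the authors' implementation: the function name `spectral_losses`, the array layout, and the use of precomputed gradients and per-sample area weights (standing in for an MLP with automatic differentiation and rejection-sampled surface points) are all assumptions. The sanity check uses the unit circle, where the Laplace-Beltrami eigenfunctions are known in closed form.

```python
import numpy as np

def spectral_losses(psi, grad_psi, area):
    """Discrete Rayleigh quotient, orthogonality and normalization terms.

    psi:      (N, K) values of K candidate eigenfunctions at N surface samples
    grad_psi: (N, K, D) tangential gradients of those functions
    area:     (N,) area element attached to each sample
    All names and shapes are illustrative assumptions.
    """
    w = area[:, None]
    # Dirichlet energy per function: integral of |grad psi_k|^2 over the surface.
    dirichlet = np.sum(w * np.sum(grad_psi ** 2, axis=-1), axis=0)
    # Area-weighted Gram matrix: gram[k, l] approximates the integral of psi_k * psi_l.
    gram = psi.T @ (w * psi)
    mass = np.diag(gram)
    rayleigh = dirichlet / mass                        # one quotient per function
    loss_ortho = np.sum((gram - np.diag(mass)) ** 2)   # pushes off-diagonal entries to 0
    loss_norm = np.sum((mass - 1.0) ** 2)              # pushes diagonal entries to 1
    return rayleigh, loss_ortho, loss_norm

# Sanity check on the unit circle with cos(t), sin(t), cos(2t), scaled to unit norm.
n = 1000
theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
area = np.full(n, 2.0 * np.pi / n)
scale = 1.0 / np.sqrt(np.pi)
psi = scale * np.stack([np.cos(theta), np.sin(theta), np.cos(2 * theta)], axis=1)
grad_psi = scale * np.stack(
    [-np.sin(theta), np.cos(theta), -2 * np.sin(2 * theta)], axis=1
)[..., None]
rayleigh, loss_ortho, loss_norm = spectral_losses(psi, grad_psi, area)
print(rayleigh, loss_ortho, loss_norm)
```

On the circle the Rayleigh quotients recover the eigenvalues 1, 1 and 4, and both penalty terms vanish up to floating-point error. In training, the quotients would be minimized jointly with the two penalties so that the learned functions stay orthonormal.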

2. Confidence-Aware Pose Graph
   - Constructs a pose graph \(\mathcal{G} = (\mathcal{V}, \mathcal{E})\) containing sequential edges as well as non-adjacent, highly-compatible edges.
   - Establishes point correspondences using a coarse-to-fine Mutual Nearest Neighbor (MNN) matching based on hybrid features.
   - Edge Compatibility Score: Measures the average cosine similarity \(E^{ij}\) of corresponding feature pairs, with adaptive thresholding controlling edge selection.
   - Spatial Consistency Weighting: Computes a distance-preservation score \(P_{mn}\) between corresponding pairs to serve as the edge weight \(\alpha^{ij}\).
   - Pose Graph Loss: Weighted Chamfer Distance \(\mathcal{L}_{\text{graph}} = \sum_{(i,j) \in \mathcal{E}} \alpha^{ij} \cdot \mathcal{L}_{\text{cd}}^{ij}\).
   - Pose Parameterization: Employs 6D Lie algebra + exponential mapping, omitting the Jacobian to stabilize convergence.
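The pose-graph ingredients in design 2 can be sketched as follows. This is a simplified NumPy illustration under several assumptions, not the paper's code: all function names are hypothetical, the matching is single-stage rather than coarse-to-fine, the edge weights \(\alpha^{ij}\) are passed in rather than derived from \(P_{mn}\), and "omitting the Jacobian" is read as using the translation part of the 6D vector directly instead of multiplying it by the left Jacobian of the rotation.

```python
import numpy as np

def mutual_nearest_neighbors(feat_a, feat_b):
    """Index pairs (m, n) whose features are each other's best cosine match."""
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    sim = a @ b.T                       # (Na, Nb) cosine similarities
    best_b = sim.argmax(axis=1)         # best match in b for every row of a
    best_a = sim.argmax(axis=0)         # best match in a for every row of b
    idx_a = np.arange(len(a))
    mutual = best_a[best_b] == idx_a    # keep only matches that agree both ways
    return idx_a[mutual], best_b[mutual], sim

def edge_compatibility(sim, idx_a, idx_b):
    """Average cosine similarity over matched pairs (the score E^{ij})."""
    return float(sim[idx_a, idx_b].mean())

def se3_exp_no_jacobian(xi):
    """Map xi = (rho, phi) to a 4x4 pose: Rodrigues rotation, translation = rho."""
    rho = np.asarray(xi[:3], dtype=float)
    phi = np.asarray(xi[3:], dtype=float)
    theta = np.linalg.norm(phi)
    k = np.array([[0.0, -phi[2], phi[1]],
                  [phi[2], 0.0, -phi[0]],
                  [-phi[1], phi[0], 0.0]])
    if theta < 1e-8:
        rot = np.eye(3) + k             # first-order approximation near zero
    else:
        rot = (np.eye(3) + np.sin(theta) / theta * k
               + (1.0 - np.cos(theta)) / theta ** 2 * (k @ k))
    pose = np.eye(4)
    pose[:3, :3] = rot
    pose[:3, 3] = rho                   # assumption: no left-Jacobian factor here
    return pose

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets of shape (N, 3) and (M, 3)."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def pose_graph_loss(clouds, poses, edges, weights):
    """Sum of alpha^{ij} * Chamfer distance over edges, in the world frame."""
    def to_world(points, pose):
        return points @ pose[:3, :3].T + pose[:3, 3]
    return sum(alpha * chamfer_distance(to_world(clouds[i], poses[i]),
                                        to_world(clouds[j], poses[j]))
               for (i, j), alpha in zip(edges, weights))
```

With correct poses, every frame maps onto the same world-frame geometry and the loss approaches zero. A non-adjacent edge contributes a term of exactly the same form as a sequential one, which is how such edges can constrain the trajectory globally.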

3. Adversarial Learning
   - For adjacent frames \((i,j)\), transforms the reconstructed point cloud \(\hat{\mathcal{S}}_i\) into the coordinate system of frame \(j\) using the estimated relative pose to render a depth map.
   - Constructs real pairs \([\hat{D}_{ij}, D_j]\) and fake pairs \([D_{ij}, D_j]\).
   - Multi-scale PatchGAN discriminator + hinge loss.
   - The discriminator simultaneously evaluates the quality of frame-by-frame reconstruction and the accuracy of cross-frame geometric alignment.
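The hinge objective used with a multi-scale patch discriminator can be sketched in a few lines. This is the standard hinge GAN formulation written in NumPy for illustration. The function names, the channel-wise stacking of depth-map pairs, and the plain averaging over scales are assumptions, and the discriminator network itself is left out.

```python
import numpy as np

def make_pair(depth_a, depth_b):
    """Stack two (H, W) depth maps into one (2, H, W) discriminator input."""
    return np.stack([depth_a, depth_b], axis=0)

def discriminator_hinge_loss(real_logits, fake_logits):
    """Hinge loss for the discriminator, averaged over scales and patches.

    real_logits / fake_logits: lists with one patch-score map per scale.
    """
    per_scale = [
        np.mean(np.maximum(0.0, 1.0 - r)) + np.mean(np.maximum(0.0, 1.0 + f))
        for r, f in zip(real_logits, fake_logits)
    ]
    return float(np.mean(per_scale))

def generator_hinge_loss(fake_logits):
    """Reconstruction side: raise the discriminator's score on fake pairs."""
    return float(np.mean([-np.mean(f) for f in fake_logits]))

# Two scales of patch scores from a hypothetical discriminator.
real = [np.full((8, 8), 2.0), np.full((4, 4), 2.0)]
fake = [np.full((8, 8), -2.0), np.full((4, 4), -2.0)]
print(discriminator_hinge_loss(real, fake), generator_hinge_loss(fake))
```

Because the discriminator sees both depth maps of a pair at once, a pair can be flagged as fake either when the rendered geometry is poor or when the relative pose misaligns the two maps. This matches the dual role described in the last bullet of design 3.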

Loss & Training¶

Range image supervision (depth + intensity + ray-drop).
Spectral loss \(\mathcal{L}_{\text{spe}}\) (Rayleigh quotient + orthogonality + normalization).
Pose graph loss \(\mathcal{L}_{\text{graph}}\).
Adversarial consistency loss \(\mathcal{L}_{\text{con}}\).
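
One plausible way to combine the four terms above is a weighted sum. The weights \(\lambda_{\text{spe}}\), \(\lambda_{\text{graph}}\), \(\lambda_{\text{con}}\) and the symbols \(\mathcal{L}_{\text{total}}\) and \(\mathcal{L}_{\text{range}}\) (the range image supervision) are notation assumed here for illustration. This summary does not state the actual weighting or training schedule.

\[
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{range}} + \lambda_{\text{spe}}\,\mathcal{L}_{\text{spe}} + \lambda_{\text{graph}}\,\mathcal{L}_{\text{graph}} + \lambda_{\text{con}}\,\mathcal{L}_{\text{con}}
\]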

Key Experimental Results¶

Main Results (KITTI-360 Low-Frequency Configuration)¶

Method	CD↓	F-score↑	Depth RMSE↓	Depth PSNR↑	Intensity PSNR↑
LiDAR4D (GT pose)	0.2760	0.8843	4.7303	24.73	16.95
GeoNLF	0.2363	0.9178	4.0293	25.28	16.58
SG-NLF	0.1695	0.9191	2.9514	28.71	19.27

nuScenes Low-Frequency Configuration¶

Method	CD↓	Depth RMSE↓	Intensity RMSE↓
GeoNLF	0.2408	5.8208	0.0378
SG-NLF	0.1545	3.0706	0.0299

On nuScenes, CD decreases by 35.8% relative to GeoNLF (from 0.2408 to 0.1545), and ATE decreases by 68.8%.
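
The relative CD reduction can be rechecked from the nuScenes table above. The helper `relative_reduction` is a hypothetical name introduced for this check. ATE values are not listed in this summary, so the 68.8% figure cannot be rechecked the same way.

```python
def relative_reduction(baseline, ours):
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline

# nuScenes low-frequency values copied from the table (GeoNLF vs. SG-NLF).
cd = relative_reduction(0.2408, 0.1545)
depth_rmse = relative_reduction(5.8208, 3.0706)
intensity_rmse = relative_reduction(0.0378, 0.0299)
print(f"CD: {cd:.1f}%  depth RMSE: {depth_rmse:.1f}%  intensity RMSE: {intensity_rmse:.1f}%")
```

The CD entry evaluates to 35.8%, matching the reported figure.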

KITTI-360 Standard-Frequency Configuration¶

Method	CD↓	Depth PSNR↑	Intensity PSNR↑
LiDAR-NeRF	0.0923	26.77	16.17
LiDAR4D	0.0894	27.88	17.45
GeoNLF	0.1855	29.39	16.57
SG-NLF	0.0867	32.72	19.55

Key Findings¶

On the KITTI-360 low-frequency configuration, the pose-free SG-NLF outperforms LiDAR4D, which uses GT poses, on every reported metric (e.g., CD 0.1695 vs. 0.2760 and depth RMSE 2.9514 vs. 4.7303).
Spectral embedding significantly reduces geometric holes, rendering reconstructed surfaces more continuous and smooth.
Non-adjacent edges in the pose graph effectively improve global trajectory accuracy.
Adversarial learning yields a pronounced improvement in cross-frame consistency.

Highlights & Insights¶

Innovative Application of Spectral Embedding: First to introduce LBO eigenfunctions into LiDAR NeRF, leveraging intrinsic isometric invariance to reconstruct smooth geometry.
Global vs. Pairwise Alignment: The confidence-aware pose graph discovers non-adjacent loop closure constraints through feature compatibility, breaking the limitations of pairwise alignment.
Outperforming GT-Pose Methods: Without requiring pre-computed poses, SG-NLF outperforms methods utilizing ground-truth poses in reconstruction quality.
Adversarial Learning for Geometric Consistency: The PatchGAN discriminator simultaneously evaluates reconstruction quality and pose accuracy.

Limitations & Future Work¶

The MLP optimization for spectral embedding increases training time, and inference efficiency is not reported in detail.
Validated only on two autonomous driving datasets (KITTI-360 and nuScenes), with no testing on indoor or unstructured scenarios.
The sensitivity of performance to the adaptive threshold selection in pose graph construction is not fully analyzed.
The stability of adversarial training may be affected under extreme scenarios.

Compared to the pairwise alignment in GeoNLF, the global optimization using a pose graph combined with feature compatibility is a key breakthrough.
The concept of spectral embedding can be generalized to other sparse 3D reconstruction tasks (e.g., RGB-D, event cameras).
Adversarial learning-based cross-frame consistency supervision can be applied to other scene reconstruction frameworks.
SG-NLF provides a high-quality view synthesis tool for LiDAR data augmentation and simulation.

Rating¶

Novelty: ⭐⭐⭐⭐⭐ (Triple innovation of spectral embedding + global pose graph + adversarial learning)
Experimental Thoroughness: ⭐⭐⭐⭐ (Multiple datasets and configurations, comprehensive ablation studies)
Writing Quality: ⭐⭐⭐⭐ (Clear description of methodology, complete mathematical derivation)
Value: ⭐⭐⭐⭐⭐ (Significantly advances the pose-free LiDAR NVS state-of-the-art)