Spectral-Geometric Neural Fields for Pose-Free LiDAR View Synthesis
Conference: CVPR 2026 arXiv: 2603.12903 Code: Unavailable Area: Autonomous Driving Keywords: LiDAR view synthesis, NeRF, pose-free, spectral embedding, pose graph optimization
TL;DR
This paper proposes SG-NLF, a framework that achieves pose-free LiDAR novel view synthesis via a hybrid spectral-geometric representation, combined with a confidence-aware pose graph and adversarial learning strategy. It significantly outperforms state-of-the-art methods on KITTI-360 and nuScenes (Chamfer Distance reduced by 35.8%, ATE reduced by 68.8%).
Background & Motivation
Background: NeRF has been successfully extended to LiDAR novel view synthesis (NVS), with methods such as LiDAR-NeRF and LiDAR4D implicitly reconstructing scenes via volume rendering. However, two critical bottlenecks remain: reliance on accurate poses and geometric discontinuities caused by LiDAR data sparsity.
Limitations of Prior Work: (a) Nearly all existing methods (LiDAR-NeRF, NFL, LiDAR4D, STGC) require accurate sensor pose inputs, which are difficult to obtain in practice. (b) Geometry interpolation based on multi-resolution hash encoding tends to produce geometric holes and discontinuous surfaces in textureless regions (as illustrated in Fig. 2).
Key Challenge: The only existing pose-free method, GeoNLF, employs pairwise alignment constraints that cannot guarantee global trajectory accuracy. Furthermore, purely geometric interpolation representations fail to reconstruct continuous surfaces in sparse LiDAR regions.
Goal: To simultaneously achieve high-quality LiDAR view synthesis and accurate pose estimation without requiring precise pose inputs.
Key Insight: Spectral embeddings are introduced to provide global structural priors for geometric representation, and a confidence-aware pose graph based on feature compatibility is constructed for global pose optimization.
Core Idea: The isometry-invariant property of spectral embeddings is naturally suited to filling geometric holes in sparse LiDAR data. Combined with geometric encoding and a global pose graph, this approach simultaneously addresses the dual challenges of scene representation and pose estimation.
Method
Overall Architecture
Given a multi-view LiDAR sequence \(\{S_i\}\), SG-NLF proceeds as follows: (1) projects LiDAR point clouds into range images; (2) encodes 3D point features with a hybrid spectral-geometric representation; (3) optimizes global poses via a confidence-aware pose graph; (4) feeds optimized poses and hybrid features into a NeRF to render novel views; (5) applies an adversarial learning strategy to enhance cross-frame consistency.
Key Designs
- Hybrid Spectral-Geometric Representation:
  - Geometric encoding \(f_\text{geo}(x)\): multi-resolution hash grid encoding that captures local structure and high-frequency details.
  - Spectral embedding \(f_\text{spe}(x)\): an MLP approximates the first \(K\) eigenfunctions of the Laplace-Beltrami Operator (LBO), which are intrinsically isometry-invariant.
  - Core optimization objective: minimize the discrete Rayleigh quotient subject to orthogonality and normalization constraints.
  - The two representations are progressively fused into a hybrid representation \(f_\text{hyb}(x)\) that balances low-frequency smooth geometry against high-frequency detail.
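The spectral objective can be pictured as a graph version of the Rayleigh quotient. Below is a minimal NumPy sketch, assuming a precomputed k-NN graph over points sampled from the implicit surface; the array layout, the penalty form, and the unit weighting of the three terms are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def spectral_loss(phi, neighbors, weights):
    """Sketch of a spectral objective: discrete Rayleigh quotient
    (graph Dirichlet energy) plus orthogonality and normalization
    penalties on K embedding channels.

    phi:       (N, K) embedding values at N sampled surface points
    neighbors: (N, M) indices of the M nearest neighbors of each point
    weights:   (N, M) graph edge weights (e.g. a Gaussian kernel)
    """
    N, K = phi.shape
    # Dirichlet energy: sum_ij w_ij * ||phi_i - phi_j||^2
    diffs = phi[:, None, :] - phi[neighbors]          # (N, M, K)
    rayleigh = np.sum(weights[..., None] * diffs**2) / N
    # Orthogonality: off-diagonal entries of the Gram matrix -> 0
    gram = phi.T @ phi / N                            # (K, K)
    ortho = np.sum((gram - np.diag(np.diag(gram)))**2)
    # Normalization: each channel should have unit mean-square norm
    norm = np.sum((np.diag(gram) - 1.0)**2)
    return rayleigh + ortho + norm
```

Minimizing the Dirichlet energy under these constraints drives the K channels toward the low-frequency LBO eigenfunctions, which is what gives the embedding its smooth, hole-filling behavior.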
- Confidence-Aware Global Pose Optimization:
  - A pose graph \(G = (V, E)\) is constructed, where nodes are LiDAR frames and edges comprise temporally adjacent edges plus non-adjacent high-compatibility edges.
  - Point correspondences are established via coarse-to-fine mutual nearest neighbor (MNN) matching on the hybrid features.
  - Edge compatibility scores are the mean cosine similarity of matched feature pairs; an edge is kept only when its score exceeds an adaptive threshold.
  - Edge weights are derived from the spatial consistency score (distance preservation) of the correspondences.
  - Pose graph loss: weighted Chamfer Distance.
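The matching and weighting steps above can be sketched as follows. This is a minimal NumPy version assuming brute-force cosine-similarity matching and a fixed distance-preservation threshold `tau`; function names and the exact consistency test are illustrative, not the authors' implementation.

```python
import numpy as np

def mnn_matches(feat_a, feat_b):
    """Mutual nearest neighbor matching in feature space.
    feat_a: (Na, D), feat_b: (Nb, D).
    Returns matched index pairs and their cosine similarities."""
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    sim = a @ b.T                               # (Na, Nb) cosine similarity
    nn_ab = sim.argmax(axis=1)                  # best match in b for each a
    nn_ba = sim.argmax(axis=0)                  # best match in a for each b
    mutual = nn_ba[nn_ab] == np.arange(len(a))  # keep mutual agreements only
    idx_a = np.nonzero(mutual)[0]
    idx_b = nn_ab[idx_a]
    return idx_a, idx_b, sim[idx_a, idx_b]

def edge_weight(pts_a, pts_b, tau=0.5):
    """Spatial consistency score for one graph edge: correspondences of a
    rigid motion must preserve pairwise distances.
    pts_a, pts_b: (P, 3) matched points in the two frames."""
    da = np.linalg.norm(pts_a[:, None] - pts_a[None], axis=-1)
    db = np.linalg.norm(pts_b[:, None] - pts_b[None], axis=-1)
    return (np.abs(da - db) < tau).mean()       # fraction of preserved pairs
```

The mean of the returned similarities would play the role of the edge compatibility score, and `edge_weight` the role of the spatial consistency weight in the weighted Chamfer loss.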
- Cross-Frame Consistency:
  - An adversarial learning strategy transforms reconstructed point clouds into adjacent-frame coordinate systems using the estimated relative poses and renders them as depth maps.
  - Fake samples (synthesized depth + real depth) and real samples (real transformed depth + real depth) are constructed.
  - A multi-scale PatchGAN discriminator evaluates geometric alignment quality.
  - Hinge loss is used to stabilize training.
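The hinge formulation referenced above is the standard one from GAN training; a minimal sketch (operating on raw discriminator scores, with function names chosen for illustration):

```python
import numpy as np

def hinge_d_loss(d_real, d_fake):
    """Discriminator hinge loss: push scores on real depth pairs above +1
    and scores on synthesized (fake) pairs below -1."""
    return (np.mean(np.maximum(0.0, 1.0 - d_real))
            + np.mean(np.maximum(0.0, 1.0 + d_fake)))

def hinge_g_loss(d_fake):
    """Generator (renderer) side: raise the discriminator's score on fakes."""
    return -np.mean(d_fake)
```

Because the fake pairs are built from rendered depth warped by the *estimated* relative poses, a low discriminator score penalizes both poor reconstruction and poor pose estimates at once.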
Loss & Training
- The overall training objective comprises: range image supervision loss (depth / intensity / raydrop) + pose-graph-weighted Chamfer Distance loss + spectral loss (Rayleigh quotient + orthogonality + normalization) + adversarial consistency loss.
- Training runs for 60,000 iterations with a batch size of 4,096 rays, an initial learning rate of 0.01, and linear decay.
- Poses are parameterized as 6D Lie algebra vectors and optimized as increments in \(\mathfrak{se}(3)\) space.
- Training is feasible on a single RTX 4090 GPU.
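The \(\mathfrak{se}(3)\) increment step can be made concrete with the standard Rodrigues-based exponential map; this is textbook Lie-group machinery, not code from the paper:

```python
import numpy as np

def se3_exp(xi):
    """Exponential map from a 6D se(3) vector xi = (rho, omega) to a 4x4
    homogeneous transform: rho is the translation part, omega the
    axis-angle rotation part."""
    rho, omega = xi[:3], xi[3:]
    theta = np.linalg.norm(omega)
    K = np.array([[0.0, -omega[2], omega[1]],     # skew-symmetric [omega]_x
                  [omega[2], 0.0, -omega[0]],
                  [-omega[1], omega[0], 0.0]])
    if theta < 1e-8:
        R, V = np.eye(3), np.eye(3)               # small-angle limit
    else:
        # Rodrigues' rotation formula and the left Jacobian V
        R = (np.eye(3) + np.sin(theta) / theta * K
             + (1.0 - np.cos(theta)) / theta**2 * K @ K)
        V = (np.eye(3) + (1.0 - np.cos(theta)) / theta**2 * K
             + (theta - np.sin(theta)) / theta**3 * K @ K)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ rho
    return T

# Each optimization step applies a small learned increment delta_xi:
#   T_new = se3_exp(delta_xi) @ T_old
```

Optimizing small increments in the tangent space keeps the pose update on the SE(3) manifold without needing explicit rotation-matrix constraints.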
Key Experimental Results
Main Results: KITTI-360 Low-Frequency Setting
| Method | Pose | CD↓ | F-score↑ | Depth RMSE↓ | Depth PSNR↑ | Intensity PSNR↑ |
|---|---|---|---|---|---|---|
| LiDARsim | GT | 11.04 | 0.598 | 10.20 | 17.94 | 13.61 |
| PCGen | GT | 1.036 | 0.786 | 7.57 | 20.62 | 13.42 |
| LiDAR4D | GT | 0.276 | 0.884 | 4.73 | 24.73 | 16.95 |
| GeoNLF | Pose-free | 0.236 | 0.918 | 4.03 | 25.28 | 16.58 |
| SG-NLF | Pose-free | 0.170 | 0.919 | 2.95 | 28.71 | 19.27 |
Ablation Study: Component Contributions on nuScenes
| Method | HR | GP | CFC | CD↓ | Depth PSNR↑ | Intensity PSNR↑ | ATE (m)↓ |
|---|---|---|---|---|---|---|---|
| Baseline | ✗ | ✗ | ✗ | 0.618 | 21.32 | 25.86 | 1.328 |
| w/o HR | ✗ | ✓ | ✓ | 0.217 | 25.10 | 28.43 | 0.204 |
| w/o GP | ✓ | ✗ | ✓ | 0.463 | 23.94 | 27.55 | 0.798 |
| w/o CFC | ✓ | ✓ | ✗ | 0.182 | 26.60 | 29.30 | 0.076 |
| Full SG-NLF | ✓ | ✓ | ✓ | 0.155 | 28.41 | 30.50 | 0.071 |
Key Findings
- Under the nuScenes low-frequency setting, SG-NLF reduces CD by 35.8% and ATE by 68.8% compared to GeoNLF.
- Even compared to LiDAR4D, which uses GT poses, the pose-free SG-NLF achieves improvements of 38.5%, 37.5%, and 25.4% in CD, depth RMSE, and intensity RMSE, respectively.
- SG-NLF also achieves state-of-the-art performance on standard-frequency KITTI-360, demonstrating generalization ability.
- The contribution of spectral embedding: it produces smoother and more complete geometry (see Fig. 6), though using it alone leads to missing high-frequency details; fusion with geometric encoding achieves the best of both.
Highlights & Insights
- First introduction of spectral methods into LiDAR NeRF: The isometry-invariant eigenfunctions of the LBO naturally compensate for LiDAR sparsity and texturelessness, making them a well-motivated design choice.
- Global pose graph vs. pairwise alignment: By leveraging feature compatibility to discover constraints between non-adjacent frames, trajectory accuracy is substantially improved.
- Clever application of adversarial learning: Cross-frame depth map pairs serve as real/fake samples, allowing the discriminator to jointly evaluate reconstruction quality and pose accuracy.
- Strong performance in low-frequency scenarios: The advantage is especially pronounced in low-frequency sequences characterized by large inter-frame motion and limited overlap.
Limitations & Future Work
- The paper acknowledges that only one effective instantiation of SG-NLF is presented; future work may explore additional technical combinations.
- Spectral embedding requires sampling on implicit surfaces and solving the LBO, which increases computational complexity.
- Dynamic scenes are not handled (a capability already present in LiDAR4D and STGC).
- Edge filtering in the pose graph relies on adaptive thresholds, and the robustness of the thresholding strategy has not been thoroughly analyzed.
Related Work & Insights
- Compared to GeoNLF (pairwise alignment), SG-NLF achieves global optimization via the confidence-aware pose graph, yielding ATE reductions of 56%–69%.
- Successful practices from spectral methods (SNS, Neural Geometry Processing) in 3D geometric processing are introduced into LiDAR NeRF for the first time.
- The application of adversarial learning for cross-frame consistency can inspire other multi-frame reconstruction tasks.
- The framework is extensible to joint LiDAR-camera representation learning.
Rating
- Novelty: ⭐⭐⭐⭐⭐ Introducing spectral embeddings into LiDAR NeRF is a highly creative design; the global pose graph and adversarial consistency also constitute substantial contributions.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Two datasets × two frequency settings, with comprehensive comparisons against both pose-supervised and pose-free methods, and clear ablation studies.
- Writing Quality: ⭐⭐⭐⭐ Well-structured with complete mathematical derivations, though some sections are notation-heavy.
- Value: ⭐⭐⭐⭐⭐ A significant advance in pose-free LiDAR view synthesis, with particularly strong gains over state-of-the-art in low-frequency scenarios.