SG-NLF: Spectral-Geometric Neural Fields for Pose-Free LiDAR View Synthesis¶
Conference: CVPR 2026 arXiv: 2603.12903 Code: N/A Area: Autonomous Driving Keywords: Pose-free LiDAR, NeRF, spectral embedding, confidence-aware graph optimization, adversarial cross-frame consistency
TL;DR¶
SG-NLF proposes a pose-free LiDAR NeRF framework that addresses the geometric hole problem arising from LiDAR sparsity via a spectral-geometric hybrid representation, achieves global pose optimization through a confidence-aware pose graph, and enforces cross-frame consistency via adversarial learning. On nuScenes, it reduces Chamfer Distance by 35.8% and pose error (ATE) by 68.8% relative to the previous state of the art, GeoNLF.
Background & Motivation¶
LiDAR novel view synthesis (NVS) is critical for autonomous driving perception, as it can extend the perceptual field of view and enhance system robustness. Existing methods face two core challenges: (1) most LiDAR NeRF methods rely on accurate camera poses, which are difficult to obtain in real-world scenarios; (2) LiDAR point clouds are inherently sparse and texture-free, making it difficult for conventional geometric interpolation encodings (e.g., multi-resolution hash encodings) to reconstruct continuous and complete surfaces in unobserved regions, leading to geometric holes and discontinuities. The existing pose-free method GeoNLF attempts to perform registration and reconstruction jointly, but relies solely on pairwise alignment constraints, which limits global trajectory accuracy. These issues are further exacerbated in low-frequency LiDAR sequences characterized by large inter-frame motion and limited overlap.
Core Problem¶
How can high-quality scene reconstruction and accurate global pose estimation be achieved simultaneously from sparse LiDAR point cloud sequences without relying on precise poses? The key difficulty is that the sparse, texture-free nature of LiDAR data prevents purely geometric interpolation representations from filling in geometric information in unobserved regions, while pairwise pose alignment cannot guarantee global trajectory consistency.
Method¶
Overall Architecture¶
Given a multi-frame LiDAR sequence \(\{S_i\}\), each point cloud is projected into a range image and each laser beam is modeled as a ray. The framework consists of three core modules: (1) a spectral-geometric hybrid representation for scene feature extraction; (2) a confidence-aware pose graph built from hybrid features for global pose optimization; and (3) an adversarial learning strategy to enforce cross-frame consistency. The optimized poses and hybrid features are fed into a NeRF, which synthesizes depth, intensity, and ray-drop probability via volume rendering.
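The range-image projection described above can be sketched as a standard spherical projection. The vertical field-of-view bounds and image resolution below are hypothetical placeholders (roughly matching a 32-beam sensor); the paper does not specify its exact projection parameters.

```python
import numpy as np

def points_to_range_image(points, h=32, w=1024, fov_up=10.0, fov_down=-30.0):
    """Project an (N, 3) LiDAR point cloud to an (h, w) range image.

    fov_up/fov_down (degrees) and (h, w) are illustrative assumptions,
    not values taken from the paper.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)                 # range per point
    yaw = np.arctan2(y, x)                             # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))
    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = ((1.0 - (yaw + np.pi) / (2.0 * np.pi)) * w).astype(int) % w
    v = np.clip(((fov_up_r - pitch) / (fov_up_r - fov_down_r) * h).astype(int),
                0, h - 1)
    img = np.zeros((h, w), dtype=np.float32)           # 0 = no return
    order = np.argsort(-r)                             # far-to-near, so the
    img[v[order], u[order]] = r[order]                 # nearest return wins
    return img
```

Each valid pixel of the resulting range image then defines one ray for volume rendering.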
Key Designs¶
- Hybrid Spectral-Geometric Representation: On top of the geometric encoding \(f_\text{geo}\) from multi-resolution hash grids, a differentiable spectral embedding \(f_\text{spe}\) based on the Laplace–Beltrami operator (LBO) is introduced. An MLP approximates the first \(K\) LBO eigenfunctions by minimizing the Rayleigh quotient, with orthogonality and normalization constraints imposed to ensure validity. The spectral embedding exhibits intrinsic isometry invariance and captures global surface structure priors, compensating for the limitations of pure geometric interpolation in unobserved regions. The two representations are progressively fused into a hybrid feature \(f_\text{hyb}\): the low-frequency spectral embedding provides smooth and globally continuous geometry, while the high-frequency geometric encoding preserves local detail.
- Confidence-Aware Graph Pose Optimization: A pose graph \(G = (V, E)\) is constructed in which vertices represent individual frames and their associated poses. In addition to temporally adjacent frame pairs, edges between non-adjacent frames are introduced based on a compatibility score computed from hybrid feature similarity (using a coarse-to-fine mutual nearest neighbor (MNN) strategy to establish point-level correspondences, with cosine similarity as the compatibility score). Each edge is weighted by a spatial consistency score that measures distance preservation between corresponding point pairs. Global poses are then optimized via a weighted Chamfer Distance loss. Compared to the pairwise constraints used in GeoNLF, this graph-based optimization achieves global trajectory accuracy.
- Cross-Frame Consistency via Adversarial Learning: Existing methods apply pixel-level supervision only on single-frame range images, neglecting cross-frame structural information. SG-NLF transforms reconstructed point clouds into neighboring frame coordinate systems using estimated relative poses, renders "fake" depth maps, and pairs them with "real" depth maps obtained from ground-truth transformations. These pairs are fed into a multi-scale PatchGAN discriminator for adversarial training. The discriminator simultaneously evaluates reconstruction quality and pose accuracy, detecting geometric misalignment at both global and local scales.
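A minimal numpy sketch of the spectral objective in the first design: a kNN graph Laplacian stands in for the discretized LBO, and the loss combines the Rayleigh quotient with orthogonality and normalization penalties. The graph construction, penalty weights, and the fact that `F` would come from an MLP are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def knn_graph_laplacian(points, k=8, sigma=1.0):
    """Unnormalized graph Laplacian of a kNN graph -- a common discrete
    stand-in for the Laplace-Beltrami operator on a sampled surface."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    n = len(points)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]              # skip self
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                             # symmetrize
    return np.diag(W.sum(1)) - W

def spectral_loss(F, L, lam_orth=1.0, lam_norm=1.0):
    """Rayleigh-quotient loss with orthogonality/normalization penalties.

    F : (N, K) candidate eigenfunction values (MLP outputs in the paper).
    Minimizing trace(F^T L F) subject to F^T F / N = I drives the columns
    of F toward the first K LBO eigenfunctions."""
    n = F.shape[0]
    rayleigh = np.trace(F.T @ L @ F) / n
    gram = F.T @ F / n
    orth = np.sum((gram - np.diag(np.diag(gram))) ** 2)
    norm = np.sum((np.diag(gram) - 1.0) ** 2)
    return rayleigh + lam_orth * orth + lam_norm * norm
```

As a sanity check, the constant function (the trivial first eigenfunction of a connected graph) attains zero Rayleigh energy under this loss.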
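The edge construction in the second design can be sketched as follows: mutual-nearest-neighbor matching under cosine similarity yields candidate correspondences with compatibility scores, and a distance-preservation check gives the spatial-consistency edge weight. The threshold `tau` and the fraction-based score are illustrative assumptions standing in for the paper's coarse-to-fine scheme.

```python
import numpy as np

def mutual_nn_matches(feat_a, feat_b):
    """Mutual-nearest-neighbor correspondences under cosine similarity."""
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    sim = a @ b.T
    ab = sim.argmax(1)                   # best match in B for each A
    ba = sim.argmax(0)                   # best match in A for each B
    idx_a = np.where(ba[ab] == np.arange(len(a)))[0]  # mutual pairs only
    return idx_a, ab[idx_a], sim[idx_a, ab[idx_a]]

def spatial_consistency(pts_a, pts_b, tau=0.1):
    """Edge weight: fraction of corresponding point pairs whose mutual
    distances are preserved within tau (rigid motions preserve them).
    tau is a hypothetical threshold, not the paper's value."""
    da = np.linalg.norm(pts_a[:, None] - pts_a[None, :], axis=-1)
    db = np.linalg.norm(pts_b[:, None] - pts_b[None, :], axis=-1)
    mask = ~np.eye(len(pts_a), dtype=bool)
    return float((np.abs(da - db)[mask] < tau).mean())
```

A high spatial-consistency score between matched subsets of two frames justifies adding a non-adjacent edge to the pose graph.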
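For the third design, the real/fake pairing can be illustrated with a toy sketch: the same reconstructed points are warped by the estimated and by the ground-truth relative pose, and any pose error shows up as a discrepancy between the two results. Per-point range is used here as a simplified 1D stand-in for the rendered depth maps and the multi-scale PatchGAN discriminator of the paper.

```python
import numpy as np

def apply_pose(points, R, t):
    """Rigidly transform (N, 3) points: p -> R p + t."""
    return points @ R.T + t

def real_fake_depth_pair(points_i, R_est, t_est, R_gt, t_gt):
    """Warp frame-i points into a neighboring frame with the estimated
    relative pose ('fake') and the ground-truth one ('real'); per-point
    range stands in for the rendered depth maps."""
    fake = np.linalg.norm(apply_pose(points_i, R_est, t_est), axis=1)
    real = np.linalg.norm(apply_pose(points_i, R_gt, t_gt), axis=1)
    return fake, real
```

When the estimated pose is accurate the two depth signals coincide, so a discriminator trained on such pairs penalizes both reconstruction and pose errors.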
Loss & Training¶
The total loss comprises: spectral loss (Rayleigh quotient + orthogonality + normalization) + graph optimization loss (weighted CD) + cross-frame consistency loss (adversarial hinge loss) + range image supervision loss. Training runs for 60k iterations with a batch size of 4096 rays, using the Adam optimizer with linear power decay of the learning rate. Poses are optimized directly in the Lie algebra se(3), which avoids explicit Jacobian computation and yields more stable convergence.
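The Lie-algebra pose parameterization mentioned above amounts to representing each pose as an unconstrained 6-vector and mapping it to SE(3) through the exponential map. A minimal numpy sketch of that map (standard formula, not taken from the paper's code):

```python
import numpy as np

def se3_exp(xi):
    """Exponential map from a 6-vector twist xi = (rho, phi) in se(3)
    to a 4x4 SE(3) matrix, allowing poses to be optimized as
    unconstrained 6D parameters."""
    rho, phi = xi[:3], xi[3:]
    theta = np.linalg.norm(phi)
    K = np.array([[0, -phi[2], phi[1]],
                  [phi[2], 0, -phi[0]],
                  [-phi[1], phi[0], 0]])            # hat operator of phi
    if theta < 1e-8:                                # small-angle limit
        R, V = np.eye(3) + K, np.eye(3) + 0.5 * K
    else:                                           # Rodrigues' formula
        s, c = np.sin(theta), np.cos(theta)
        R = np.eye(3) + (s / theta) * K + ((1 - c) / theta**2) * (K @ K)
        V = (np.eye(3) + ((1 - c) / theta**2) * K
             + ((theta - s) / theta**3) * (K @ K))
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ rho
    return T
```

Gradients flow through `se3_exp` to the 6D twist, so no pose constraint (e.g. orthonormality of R) needs to be enforced explicitly during optimization.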
Key Experimental Results¶
Low-Frequency Setting (KITTI-360, 2 Hz sampling)¶
| Method | CD↓ | Depth PSNR↑ | Intensity PSNR↑ |
|---|---|---|---|
| LiDAR4D (w/ GT pose) | 0.276 | 24.728 | 16.951 |
| GeoNLF (pose-free) | 0.236 | 25.276 | 16.581 |
| SG-NLF (Ours) | 0.170 | 28.707 | 19.265 |
Low-Frequency Setting (nuScenes, 2 Hz sampling)¶
| Method | CD↓ | Depth PSNR↑ | Intensity PSNR↑ |
|---|---|---|---|
| LiDAR4D (w/ GT pose) | 0.567 | 17.092 | 24.475 |
| GeoNLF (pose-free) | 0.241 | 22.947 | 28.608 |
| SG-NLF (Ours) | 0.155 | 28.409 | 30.499 |
Pose Estimation (ATE, m)¶
| Method | KITTI-360 | nuScenes |
|---|---|---|
| GeoNLF | 0.170 | 0.228 |
| SG-NLF | 0.074 | 0.071 |
Ablation Study¶
- Spectral embedding contributes most: Removing geometric encoding and using only spectral embedding (w/o GE) still substantially outperforms GeoNLF (CD: 0.181 vs. 0.241), confirming that the spectral prior is the core component.
- Hybrid representation is optimal: Adding geometric encoding further improves performance (CD: 0.155), as high-frequency details require the complementary geometric encoding.
- All three modules are necessary: Removing any single module (HR/GP/CFC) leads to significant performance degradation: without the hybrid representation (w/o HR), CD rises to 0.217, and without global pose optimization (w/o GP), CD rises to 0.463.
- Cross-frame consistency is effective: Even without pose optimization, adding CFC improves training through regularization (as shown by comparing the baseline and w/o GP variants).
Highlights & Insights¶
- Spectral embedding for LiDAR NeRF is a clever design: Exploiting the isometry invariance of LBO eigenfunctions to compensate for geometric holes caused by LiDAR sparsity represents a principled integration of differential geometry tools into volumetric rendering, offering structural prior advantages over pure hash encoding interpolation.
- GAN discriminator as a cross-frame consistency verifier: By constructing real/fake depth map pairs via pose-based transformation, the discriminator jointly validates reconstruction quality and pose accuracy. This paradigm of using a discriminator for self-supervised regularization is transferable to other multi-view reconstruction tasks.
- Compatibility-based edge selection in pose graphs: Using learned feature similarity to determine graph connectivity is more flexible than fixed temporal adjacency, and is particularly well-suited for low-frequency (large-motion) scenarios.
Limitations & Future Work¶
- The method currently handles only static scenes and does not account for dynamic objects (extensions to dynamic scenes have been explored in LiDAR4D and STGC).
- Spectral embedding requires additional Monte Carlo sampling and an eigenfunction MLP, increasing computational overhead; efficiency is not discussed in detail in the paper.
- Evaluation is limited to KITTI-360 and nuScenes; generalization to other LiDAR sensor configurations remains untested.
- The paper characterizes the approach as "one effective implementation," implying that alternative realizations of the framework have not been explored.
Related Work & Insights¶
- vs. GeoNLF: The most direct baseline, also a pose-free LiDAR NeRF. GeoNLF relies on pure geometric interpolation and pairwise alignment, whereas SG-NLF employs spectral-geometric hybrid representation, global graph optimization, and adversarial learning, achieving comprehensive improvements (CD reduced by 35.8%, ATE reduced by 68.8% on nuScenes).
- vs. LiDAR4D: Despite using GT poses, LiDAR4D is outperformed by the pose-free SG-NLF (CD reduced by 38.5% on KITTI-360), demonstrating that gains in representational capacity can matter more than access to accurate poses.
- vs. image-based pose-free methods (BARF, HASH, etc.): Adapting these methods to LiDAR yields poor results, highlighting the need for LiDAR-specific designs tailored to sparse data.
Relevance to My Research¶
- The idea of spectral embedding as a geometric prior is transferable to other 3D tasks, such as 3D occupancy prediction and point cloud registration.
- The edge selection strategy in confidence-aware graph optimization can serve as a reference for dynamically selecting reliable view pairs in multi-view fusion.
Rating¶
- Novelty: ⭐⭐⭐⭐ — The combination of spectral embedding and LiDAR NeRF is novel, though each individual component (spectral analysis, graph optimization, GAN) is not new in isolation.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ — Two datasets, low-frequency and standard-frequency settings, extensive ablations, and comprehensive quantitative and qualitative comparisons.
- Writing Quality: ⭐⭐⭐⭐ — Clear structure, complete mathematical derivations, and highly informative figures and tables.
- Value: ⭐⭐⭐ — The spectral embedding idea is inspiring, though LiDAR NVS is not a core research focus.