GS-LIVM: Real-Time Photo-Realistic LiDAR-Inertial-Visual Mapping with Gaussian Splatting

Conference: ICCV 2025 arXiv: 2410.17084 Code: GitHub Area: Autonomous Driving Keywords: 3D Gaussian Splatting, LiDAR-IMU-Visual Fusion, Real-Time Mapping, Gaussian Process Regression, Novel View Synthesis

TL;DR

This paper proposes GS-LIVM, the first real-time photo-realistic LiDAR-inertial-visual mapping framework designed for large-scale unbounded outdoor scenes. It addresses the problem of sparse and non-uniform LiDAR point clouds via voxel-level Gaussian Process Regression (Voxel-GPR), and leverages a covariance-centric design to rapidly initialize 3D Gaussian parameters. The method achieves state-of-the-art mapping efficiency and rendering quality across multiple outdoor datasets.

Background & Motivation

SLAM is a foundational technology for robotics and autonomous driving. Traditional SLAM relies on sparse feature representations, which are effective for localization but incapable of high-quality 3D reconstruction and photo-realistic rendering. The emergence of neural scene representations such as NeRF and 3DGS has transformed this landscape; however, applying them to large-scale unbounded outdoor scenes remains highly challenging:

Sparse and Non-Uniform Point Clouds: Point clouds generated by multi-line spinning LiDARs and non-repetitive scanning LiDARs are unevenly distributed. Directly applying them to 3DGS leads to memory inefficiency and optimization difficulties.

Directional Bias: Outdoor SLAM predominantly involves unidirectional motion, causing 3D Gaussian optimization to bias toward the camera's viewing direction and significantly degrading novel view synthesis quality.

Real-Time Bottleneck: Existing methods (e.g., Gaussian-LIC) require approximately 1 second per frame, far from satisfying real-time requirements (~100 ms), and fail to process complete outdoor sequences due to GPU memory overflow.

Overfitting in Offline Methods: Offline optimization methods overfit to supervised viewpoints, resulting in poor generalization to novel views.

Core Idea: A covariance-centric pipeline is proposed: Voxel-GPR generates uniformly distributed point clouds from sparse LiDAR input → GPR-derived covariances initialize the scale and rotation of 3D Gaussians → GPR parameters and the Gaussian map structure are iteratively updated to achieve real-time, high-quality mapping.

Method

Overall Architecture

The system consists of a tracking thread and a mapping thread. The tracking thread employs an ESIKF algorithm to fuse LiDAR/IMU/camera data for real-time state estimation, outputting odometry at IMU frequency. The mapping thread receives colored point clouds, applies Voxel-GPR to generate uniformly distributed predicted point clouds, initializes 3D Gaussians, and integrates them into a dense map for rendering optimization.
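The two-thread split can be sketched as a producer/consumer pipeline. Everything below is a structural illustration only: the queue, the payload dictionaries, and the thread bodies are stand-ins for the ESIKF tracking front-end and the Voxel-GPR mapping back-end, not the authors' implementation.

```python
import queue
import threading

scan_queue = queue.Queue(maxsize=8)   # colored point clouds, tracking -> mapping

def tracking_thread():
    # Stand-in for ESIKF state estimation running at sensor rate:
    # each iteration emits one colored point cloud with its pose.
    for frame_id in range(5):
        scan_queue.put({"id": frame_id, "points": [], "pose": None})
    scan_queue.put(None)              # sentinel: sequence finished

def mapping_thread(processed):
    # Stand-in for Voxel-GPR + Gaussian initialization + map optimization.
    while True:
        cloud = scan_queue.get()
        if cloud is None:
            break
        processed.append(cloud["id"])

processed = []
t_track = threading.Thread(target=tracking_thread)
t_map = threading.Thread(target=mapping_thread, args=(processed,))
t_track.start(); t_map.start()
t_track.join(); t_map.join()
```

The bounded queue naturally applies back-pressure: if mapping falls behind, tracking blocks rather than accumulating unbounded backlog.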

Key Designs

  1. Voxel-Level Gaussian Process Regression (Voxel-GPR):

    • Function: Converts sparse, non-uniform LiDAR point clouds into uniformly distributed grid point clouds.
    • Mechanism: PCA is performed on the points \(\mathcal{P}_\alpha\) within each voxel to identify the principal directions, reducing the 3D problem to a 2D→1D GPR formulation. The coordinate axis most closely aligned with the smallest-eigenvalue eigenvector (approximately the surface normal) is designated the value axis \(\mathbf{f}_\alpha\), while the remaining two axes serve as parameter axes \(\mathbf{x}_\alpha\). A uniform grid \(\mathbf{x}_{\alpha*}\) is generated on the parameter plane, and GPR predicts the value-axis coordinates \(\mathbf{f}_{\alpha*} = \boldsymbol{\mu}_{\alpha*} = \mathbf{K}_{\alpha*}^\top (\mathbf{K}_\alpha + \sigma^2 \mathbf{I})^{-1} \mathbf{f}_\alpha\) together with the predictive covariance \(\boldsymbol{\Sigma}_{\alpha*}\). Hundreds of voxels are processed in parallel via CUDA, keeping per-frame latency under 30 ms.
    • Design Motivation: Non-uniform point clouds lead to uneven gradient supervision, making 3DGS difficult to converge under fast motion. The uniform point clouds produced by GPR maintain or improve performance with fewer 3D Gaussians, significantly reducing GPU memory consumption.
  2. Covariance-Driven Fast 3D Gaussian Initialization:

    • Function: Uses GPR-output covariances to rapidly estimate the scale and rotation parameters of each 3D Gaussian.
    • Mechanism: Each voxel is subdivided into \(n_s \times n_s\) sub-grids. For the \(\beta\)-th sub-grid, a weighted center \(\mathbf{p}^\beta\) and weighted covariance matrix \(\Phi^\beta\) are computed, using the inverses of the GPR predictive covariances as weights: \(\Phi^\beta = \frac{\mathbf{Q}^\top \cdot \text{diag}(w_1^\beta, \ldots, w_{n_r^2}^\beta) \cdot \mathbf{Q}}{\sum_i w_i^\beta}\), where \(\mathbf{Q}\) stacks the predicted grid points centered at \(\mathbf{p}^\beta\). The scale parameter is set to \(S^\beta = \text{diag}(\Phi^\beta)\), rotation is initialized as the identity quaternion, and colors are obtained by reprojecting positions onto the current image.
    • Design Motivation: The original 3DGS computes scale via nearest-neighbor distances and initializes rotation to a constant, resulting in slow convergence. Regions with low covariance (high confidence) are assigned higher weights, ensuring geometrically reliable initialization.
  3. Iterative Optimization Framework (Map Extension + Covariance Update + Similarity Regularization):

    • Function: Continuously updates GPR parameters and the 3D Gaussian map structure.
    • Mechanism: Voxels are categorized into four states (unexplored / pending / active / converged); only pending and active voxels are processed to improve efficiency. Two regularization losses are introduced:
      • Delta Depth Similarity Loss \(\mathcal{L}_d\): Constrains consistency between rendered depths of adjacent frames, enhancing robustness in novel view synthesis.
      • Structural Similarity Loss \(\mathcal{L}_p\): Measures the Euclidean distance between the current 3D Gaussian map and the latest LiDAR frame to accelerate optimization.
    • Design Motivation: Relying solely on photometric loss tends to cause overfitting in outdoor scenes with predominantly unidirectional motion.
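The first two designs can be sketched in NumPy under simplifying assumptions: a single voxel, an RBF kernel, and one sub-grid per voxel. All function names, the kernel choice, and the length-scale/noise defaults are ours for illustration, not the paper's settings; the structure follows the summary above (PCA picks the value axis, GPR predicts a uniform grid with predictive variance, and inverse-variance weighting yields the Gaussian center and scale).

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.5):
    """Squared-exponential kernel between two sets of 2-D parameter points."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def voxel_gpr(points, n_grid=4, noise=1e-2):
    """Fit a GPR inside one voxel and predict a uniform grid of points.

    points: (N, 3) LiDAR points falling in the voxel.
    Returns predicted (n_grid**2, 3) points and per-point predictive variance.
    """
    center = points.mean(0)
    # PCA: the smallest-eigenvalue eigenvector approximates the surface normal,
    # so it becomes the value axis; the other two axes parameterize the plane.
    cov = np.cov((points - center).T)
    eigval, eigvec = np.linalg.eigh(cov)      # eigenvalues in ascending order
    R = eigvec[:, [1, 2, 0]]                  # parameter axes first, value axis last
    local = (points - center) @ R             # rotate into the voxel frame
    x, f = local[:, :2], local[:, 2]          # 2-D inputs, 1-D targets

    # Uniform grid on the parameter plane.
    lo, hi = x.min(0), x.max(0)
    gx, gy = np.meshgrid(np.linspace(lo[0], hi[0], n_grid),
                         np.linspace(lo[1], hi[1], n_grid))
    x_star = np.stack([gx.ravel(), gy.ravel()], -1)

    # GPR posterior mean and variance: f* = K*^T (K + s^2 I)^-1 f.
    K = rbf_kernel(x, x)
    Ks = rbf_kernel(x, x_star)
    A = K + noise * np.eye(len(x))
    f_star = Ks.T @ np.linalg.solve(A, f)
    var = rbf_kernel(x_star, x_star).diagonal() - (Ks * np.linalg.solve(A, Ks)).sum(0)

    pred = np.column_stack([x_star, f_star]) @ R.T + center
    return pred, np.maximum(var, 1e-9)

def init_gaussian(pred, var):
    """Covariance-weighted center and scale for one sub-grid of predictions."""
    w = 1.0 / var                              # low variance -> high confidence
    p = (w[:, None] * pred).sum(0) / w.sum()   # weighted center
    Q = pred - p                               # centered predicted points
    Phi = (Q.T * w) @ Q / w.sum()              # weighted covariance
    return p, np.sqrt(np.diag(Phi))            # scale from the diagonal
```

Running `voxel_gpr` on a noisy planar patch yields an evenly spaced grid even where the raw scan is sparse, and `init_gaussian` then produces a center and scale in one pass, mirroring the covariance-centric initialization described above.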

Loss & Training

\[\mathcal{L} = (1-\lambda_s)\|C - C_{gt}\|_1 + \lambda_s \mathcal{L}_{ssim} + \lambda_d \sum \mathcal{L}_d(\mathcal{F}_*, \mathcal{F}_{*+1}) + \lambda_p \mathcal{L}_p\]

Joint optimization is performed over the current camera window \(\mathcal{Q}_{curr}\) and randomly sampled historical frames \(\mathcal{Q}_{hist}\) to prevent catastrophic forgetting.
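The total loss above can be sketched as a weighted sum; the helper names and \(\lambda\) defaults below are illustrative (not the paper's values), and the SSIM, delta-depth, and structural terms are passed in as precomputed scalars rather than implemented.

```python
import numpy as np

def l1_loss(rendered, gt):
    """Mean absolute photometric error between rendered and ground-truth images."""
    return np.abs(rendered - gt).mean()

def composite_loss(rendered, gt, ssim_term, dds_terms, struct_term,
                   lam_s=0.2, lam_d=0.1, lam_p=0.01):
    """Total loss: (1 - lam_s) * L1 + lam_s * SSIM
    + lam_d * sum of delta-depth-similarity terms over adjacent frames
    + lam_p * structural similarity term against the latest LiDAR frame."""
    photo = (1 - lam_s) * l1_loss(rendered, gt) + lam_s * ssim_term
    return photo + lam_d * sum(dds_terms) + lam_p * struct_term
```

In training, this scalar would be evaluated over both the current window \(\mathcal{Q}_{curr}\) and sampled historical frames \(\mathcal{Q}_{hist}\), so the regularizers constrain novel views while the photometric term fits the supervised ones.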

Key Experimental Results

Main Results

Dataset/Sequence    Metric               GS-LIVM            3DGS (Offline)     MonoGS             NeRF-SLAM
hku_campus_seq_00   PSNR↑/SSIM↑/LPIPS↓   22.43/0.719/0.247  21.74/0.719/0.302  12.14/0.368/0.608  13.23/0.410/0.653
Visual_Challenge    PSNR↑                21.81              18.55              13.48              12.98
eee_02 (Ouster-16)  PSNR↑                20.72              20.39              11.63              8.32
1005_00 (Botanic)   PSNR↑                21.12              22.17              14.56              9.79

Ablation Study

Configuration                          PSNR   SSIM   Notes
w/o Voxel-GPR (raw point cloud)        –      –      Non-uniform distribution hinders optimization
w/o Covariance Initialization          –      –      ~1.3× optimization time; slower convergence
w/o Structural Similarity Loss (SS)    –      –      ~1.3× optimization time; insufficient detail representation
\(n_s=3\), \(\eta=0.3\), +SS, +DDS     19.29  0.640  Baseline full configuration
\(n_s=4\), \(\eta=0.3\), +SS, +DDS     20.14  0.672  More Gaussians improve quality
\(\eta=0.1\) (stricter convergence)    18.49  0.637  ~6× time increase with no quality gain

Key Findings

  • All sequences can be fully mapped on an 8 GB GPU (\(n_s=3\)), with peak memory consumption of approximately 3.5 GB.
  • Mapping time for the Botanic Garden 1018_13 sequence (208 s) consistently remains below the real-time threshold.
  • Voxel-GPR reduces the number of Gaussians while maintaining or improving rendering quality.
  • The choice of \(n_s\) involves an accuracy-speed trade-off: gains are significant in small scenes, but GPU memory may overflow in large scenes.

Highlights & Insights

  • First Real-Time Photo-Realistic LiDAR-Inertial-Visual SLAM Framework: Achieves genuine real-time performance in large-scale outdoor scenes, in the sense that per-frame processing time satisfies \(t \leq \mathcal{D}/\mathcal{C}\).
  • Covariance-Centric Design: The GPR covariance simultaneously serves 3D Gaussian initialization and GPR parameter updates, yielding dual benefits from a single computation.
  • CUDA Parallelization: Voxel-GPR processes hundreds of voxels in under 30 ms, making the system practically deployable.
  • Voxel state classification (unexplored / pending / active / converged) eliminates unnecessary computation, representing an important engineering optimization.

Limitations & Future Work

  • The choice of \(n_s\) may cause memory issues in large-scale scenes, necessitating adaptive adjustment strategies.
  • Only zeroth-order spherical harmonics (SH) are used, limiting view-dependent color representation.
  • Dynamic object modeling is not supported; moving objects introduce artifacts.
  • Localization accuracy depends on the front-end odometry (FAST-LIVO/R3LIVE), and loop closure capability is limited.
  • The delta depth similarity loss may introduce false-positive constraints during rapid motion.
  • The Voxel-GPR concept is generalizable to other tasks requiring point cloud completion, such as sparse LiDAR semantic segmentation.
  • The covariance-based initialization approach has broader applicability in other 3DGS settings, such as city-scale reconstruction.
  • The iterative voxel state update strategy is analogous to keyframe management in incremental SLAM.

Rating

  • Novelty: ⭐⭐⭐⭐ The combination of Voxel-GPR and covariance-driven initialization is novel, though each component has precursors in prior work.
  • Experimental Thoroughness: ⭐⭐⭐⭐ Validated across multiple LiDAR configurations with detailed ablations; however, direct numerical comparison with the recent Gaussian-LIC is absent.
  • Writing Quality: ⭐⭐⭐⭐ Algorithm pseudocode is clear, system design is thoroughly described, and figures are informative.
  • Value: ⭐⭐⭐⭐⭐ Real-time large-scale outdoor photo-realistic mapping is a critical need for robotics and autonomous driving; open-source code is an additional merit.